Data is not like Sheep

I hate it when I hear or read phrases like ‘these data’. It looks, feels and sounds plain wrong to me, and if not wrong then clumpy and turgid, like something you’d regret stepping in.

Of course, there are pluralised words (by which I mean words that take the same form in the singular and the plural, without adjustment), such as ‘sheep’, that can theoretically be used either way. For example:

  • This sheep is ‘Baaahhhhing’;
  • These sheep are walking to the kebab shop;
  • This herd of sheep is grazing in beautiful pasture.

In a moment, I will argue that the first usage above is actually a little clunky too. For now: in the first usage it is singular, in the second it is plural (‘This’ becomes ‘These’, ‘is’ becomes ‘are’), and in the third usage… well, I think it is technically operating as a plural, but the ‘herd’ is operating as a ‘singular group’ (so we translate: ‘This one herd of many sheep is…’). I’m sure a mathematician could draw some analogy here to a set, and didn’t that last translation start to sound like a relation?

Let’s revisit ‘This sheep…’. I feel this is almost as clunky as ‘These data’. Why? Because to me, the more common usage of the word sheep is to indicate the plural, and when you wish to indicate a singular it almost always could or should be replaced by a more specific singular noun (this lamb, this ewe, this ram, etc.).

Let’s try and use ‘data’ in similar forms to our sheep example above:

  • This data is inconsistent;
  • These data indicate that the world is round; Yuk Yuk Yuk
  • This set of data comes from the Hubble telescope.

It’s certainly turning up a few oddities. Look at that first ‘This data is inconsistent’ again. It sounds like a singular (This) but what is it inconsistent with? Well, as I read it, it means ‘inconsistent with itself’… which suggests plural, right? And if it’s plural shouldn’t it be ‘The data are inconsistent.’? OK, so now we’ve become very confused, because we’ve got all mixed up and we’re just one step away from ‘These data are’ and hell I’m just very confused. But I do know that ‘This data are…’ or ‘These data are…’ are just plain wrong.

However, just as the singular of sheep sounds a little odd and can be replaced by more appropriate and specific singular nouns, data when used in the singular can be more completely dragged down to the singular by the surrounding words, and in particular by a specifier:

This data-point is ‘true’… (you could also try value, or perhaps tuple[?] instead of point) and you will sometimes hear…

This datum is… though I would suggest this is not in common usage amongst programmers.

And similar to ‘types of singular sheep’

But please, although some people treat it as a plural, always treat ‘data’ as a singular with your verbs and your other wordage!

A helpful hint, gleaned from, is to mentally substitute ‘information’ when handling the plural cases.

And thankfully, the web reference above also nicely discusses these issues, though it does not go as far as to pillory the New York Times or the ‘scientists’ for their cringification (v. Cringify to make more cringely, uncomfortable, distasteful) of the language.

