Consider this tableau:
And then this one:
You’ve seen tableau one, a word (or tag) cloud, before, a now-standard, next-big-thing journo tool, this one floated into the firmament by Kristen Long of the Politico web site.
Tableau two scrapes the same data – the text of President Obama’s September 8 jobs bill exhortation – but here makes primitive resort to your correspondent’s key-word spreadsheet (which I continue to refine), introduced in the previous post. The obvious question, then: how do the two modes of presentation compare?
Sure – you’ll accuse me of irredeemable, self-leaning bias, and I’ll have to take that hit; and while I’m at it, I’ll add another, incriminating mea culpa: I don’t think well visually, either. Ok, then; that means you’ll have to humour, or even better, patronize, me as we move along. But hey – it’s my blog.
But enough about me. We know perfectly well what it is the cloud means to tell us, but
- With what precision is relative word incidence conveyed? It’s clear the cloud invokes two presentational parameters, word size and shading, in the service of the comparisons. Are we entitled to assume as a result that sizing is properly proportioned, i.e., does a word appearing 12 times in the speech grow twice as large as a word appearing six? And how do the blacks and grays comport with one another? I see no apparent rule at work for daubing this or that hue across this or that word. Perhaps there is no rule, and the colors merely subserve some fetching aesthetic variety. But is this something about which we should be wondering?
- More subtly perhaps, we could ask as well about word positioning. It’s clear that the words above are not made to stack in upright, size-hierarchical relation, but if not that, then what? If the design intent is merely random, ok – but what then do we make of that presentational gestalt, and its news-informational value?
- The word cloud conceit surmises a way to look at categorical data – essentially qualitative information, of the classic gender-religion-ethnicity stripe. These data can’t be directly scaled – maleness can’t be “more” of anything than femaleness – though of course they can be counted. Here the data are words, and because they aren’t mapped to either of the great existential properties of space or time – that is, they aren’t forced into place – the words can prance anywhere across the viz, owing to designer discretion. By way of counter-example: a data viz of crimes mapped to latitudes and longitudes necessarily slots the crimes into the coordinates at which they’ve been perpetrated. Word clouds incur no such compulsion, thus vesting placement decisions in the designer alone. Where, then, do the words go? The point is that they can go anywhere – and that is the point of the question.
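To make the sizing question concrete, here is a minimal Python sketch of two rules a cloud designer might plausibly use. Nothing here describes how the Politico cloud was actually built; the function `font_sizes`, the 48-point ceiling, and the counts are all invented for illustration. The point is only that "twice as large" can mean twice the letter height or twice the inked area, and the two give different pictures.

```python
import math

def font_sizes(counts, max_pt=48.0, mode="height"):
    """Map word counts to font sizes under a hypothetical scaling rule.

    mode="height": letter height is proportional to the count.
    mode="area":   letter area is proportional to the count, so the
                   height grows with the square root of the count.
    """
    top = max(counts.values())
    sizes = {}
    for word, n in counts.items():
        share = n / top
        if mode == "area":
            share = math.sqrt(share)
        sizes[word] = round(max_pt * share, 1)
    return sizes

# Invented counts, NOT taken from the speech.
counts = {"jobs": 12, "tax": 6, "teachers": 3}

by_height = font_sizes(counts, mode="height")
by_area = font_sizes(counts, mode="area")

for word in counts:
    print(f"{word:<10}{counts[word]:>3}{by_height[word]:>8}{by_area[word]:>8}")
```

Under the height rule the six-count word is set at half the point size of the twelve-count word; under the area rule it is set at roughly 70 percent. A reader looking at the cloud alone has no way to tell which rule, if either, is in force.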
My prosaic little spreadsheet-driven word count, on the other hand, does just that – it counts word frequencies rather determinately, and sorts them too. Pushing the matter to the retro breaking point, you could even do this:
Now that’s not cool. But what does the reader want and need to know?
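For what it’s worth, the count-and-sort routine doesn’t need a spreadsheet either. What follows is a rough Python sketch of the same idea, not a port of my workbook: the stop-word list is a tiny illustrative stand-in, the function name `word_counts` is my own, and the sample sentence is a made-up placeholder rather than anything from the speech.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "to", "of", "in", "is", "it", "that"}

def word_counts(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

# Placeholder text, NOT a quotation from the speech.
sample = "Jobs now, jobs here, and jobs for teachers. Teachers need jobs."

table = word_counts(sample)
for word, n in table:
    print(f"{word:<10}{n}")
```

The output is a plain two-column list, word and count, sorted from most to least frequent. Not cool either, but every number is right there to be read.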
There’s something curiously interstitial about word clouds. In thrall to word frequencies – typically the province of academics and game designers – the clouds yet give pride of place to the imagery festooning the data. We’re given a beautiful arrangement of nuts and bolts – but for whom?
Do a Google Images search for word clouds and you’ll see what I mean. Or wend your way to any of the word cloud-construction sites – e.g., wordle, tagxedo, worditout – and check out the possibilities. It’s fun to make word clouds, and there’s nothing wrong with that – but what are journalists doing with them?
True – Kristen Long’s cloud is a good deal more sober than those espoused by the novelty sites, and it delivers its macro point, to be sure. But don’t the basic informational questions remain there to be asked? (Also look at Wikipedia’s examples in its entry on clouds, including a nice key-word comparison between George Bush’s 2002 State of the Union speech and Barack Obama’s 2011 oration, the words sorted in alphabetical order.) How do clouds contend with the demands for reportorial precision?
Needless to say, all of the above points to the much larger question about the visual portrayal of data that might be otherwise delivered through more plebeian means, e.g., spreadsheets. Sorry – I can’t answer the question here, only ask it. But are journalists properly seeding the clouds?