Good day from the nation’s capital – London, that is. I don’t know about you, but because I’d exercised my franchise via the mails weeks ago, breaking off a rather deft end-run past all those madding crowds, your correspondent was left to spend the better part of his Election Day doing what you were really, secretly, hungering to do – track Obama and Romney tweets.
And that’s what I did. My methodology, faulty but dogged, comprised a sample-and-tally of tweets containing mentions of the respective candidates (again, the data source was www.twdocs.com) via the process I described a few posts ago. The intent: to learn how the tweet volumes stacked and tracked across the day. And so I conducted my own election count, launching 13 paired candidate name-search sorties into the tweetosphere, starting at around 6:30 am New York time (no, I didn’t get up that early; remember I live in London) and concluding around 6 pm – because citizen journalism is hard work and its practitioners need their beauty rest. (And remember again that the searches netted a maximum of 1,500 hits each; each batch’s count was divided by the time elapsed between its first and last tweets, and the result was then projected to an hourly estimate.)
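For the record, here is roughly what that projection amounts to, as a minimal Python sketch. The function name `hourly_rate` and the demo timestamps are my own illustrative inventions, and the sketch assumes each batch arrives as a list of tweet times; it is not the twdocs export format.

```python
from datetime import datetime, timedelta


def hourly_rate(timestamps):
    """Project a tweets-per-hour figure from one batch of tweet timestamps.

    Hypothetical helper: divides the batch size by the time elapsed
    between the earliest and latest tweet, then scales to an hour.
    """
    first, last = min(timestamps), max(timestamps)
    elapsed = (last - first).total_seconds()
    if elapsed <= 0:
        raise ValueError("need at least two distinct timestamps")
    return len(timestamps) * 3600 / elapsed


# Illustrative data only: 61 tweets, one per second, spanning 60 seconds.
start = datetime(2012, 11, 6, 6, 30)
demo = [start + timedelta(seconds=s) for s in range(61)]
print(hourly_rate(demo))  # 61 tweets over 60 seconds -> 3660.0 per hour
```

Note that the shorter the span a batch covers, the more any burst or lull inside it gets magnified by the scaling to an hour.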
On each of the 13 go-rounds I fired off a search for the term Obama, following that inquiry a couple of minutes later with a request for Romney in an attempt to synchronize search times; and while that routine was practicable enough, it raises the important prior question of what one could hope to learn by it.
The Twitter universe, after all, is nothing if not a self-selected cosmos – and an Obamacentric one. As of this writing, the once and future president had 22,000,000-plus followers; his erstwhile opponent had attracted 1,700,000 (and he had as many during the campaign, too). Endeavoring to draw definitive voter conclusions from tweet counts, then, is tantamount to wandering through Yankee Stadium and sampling major league team support across the country – you’re asking for an exceedingly nasty skew.
Classic sampling caveats apply here as well. It’s impossible to know how many election-day tweeters were eligible to vote, eligible but not interested in voting, or citizens of some country other than the US.
And some tweet-specific cautions need to be entered, too. It should go without saying that, unlike votes, a good many of the tweets vituperated against, rather than endorsed, this or that candidate, and in any event no shortage of tweets issued across the day from the same hand – a cyber-riff on the storied, ancient, vote-early-and-often exhortation allegedly broadcast to the Chicago electorate. And let’s keep in mind too that many tweets recorded the names of both candidates, and so were counted once for each, matting the analysis with a layer of redundancy.
So what was there to learn? Sheer volume, for one thing, and candidate tweet differentials, too.
For each of the 13 tranches, the two tables below archive the time of the last tweet transmitted, the projected per-hour tweet volume, and the percentage of retweets.
A few conclusions beg to be drawn. First, and most obviously, Obama tweets outpaced Romney’s at every time-point, though by nothing like the president’s follower advantage. Note as well, however, the undulant Obama-Romney tweet ratios woven across the day, and observe the nascent, mutual tweet spike in the last sampling. In addition, tweet volume was mightily correlated with retweet percentage – .753 for Obama, and .826 for Romney. The very top-heaviness of the retweet distribution is cause for rumination as well; that so many tweeters saw fit to forward someone else’s comment on the most politically charged day of the year bespeaks the kind of community so often imputed to “social networking” (rather redundant, that term; isn’t all networking social?).
But a subtle pause-giver haunts the 1,500-tweet samples. Practitioners of the polling arts will tell you that a sample so sized is acceptably large, fit to capture most of the variation criss-crossing a given population (provided the sample has been properly drawn). But a spate of, say, 500,000 tweets per hour translates to about 139 tweets per second; put otherwise, a sample of 1,500 drawn at that velocity shakes out across about 11 seconds – and 11 seconds is a small sample window. (Imagine a department of highways installing a light at an intersection on the basis of an 11-second scan of the traffic there.) String a couple of contiguous 11-second stretches together and you might thresh very different tweet yields.
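The arithmetic behind that 11-second figure is easy to check. A small Python sketch follows; the function name `sample_window_seconds` is my own label, and the 500,000-per-hour rate is the hypothetical one used above.

```python
def sample_window_seconds(tweets_per_hour, sample_size=1500):
    """Seconds a fixed-size sample spans at a given hourly tweet rate."""
    per_second = tweets_per_hour / 3600
    return sample_size / per_second


rate = 500_000                      # hypothetical hourly volume from the text
print(round(rate / 3600))           # tweets per second, rounded: 139
print(round(sample_window_seconds(rate), 1))  # window in seconds: 10.8
```

Run in reverse, the same relation says a slower stream stretches the window: halve the hourly rate and the 1,500 tweets cover twice as many seconds.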
And that’s exactly what I did earlier today, executing three one-after-the-other searches for the name Obama and realizing these per-hour extrapolations: 91,843, 128,802, and 266,700. That is a nearly threefold spread across a pinched time frame, hinting in turn that my data above could be riddled with considerable plus-minus tolerance (even as the Obama tweet margin over Romney prevailed at each sample point – and that finding is likely significant). Additionally noteworthy, though, was the great dearth of repeat tweet contributors – 3,041 unique tweeters to 3,080 tweets (and only 3,080, because the original 3×1,500 sample contained many time-overlapping tweets, and the duplicates had to be winnowed out). The bottom line, then: more research needed, and samples drawn across a broader span of time.
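For anyone wanting to repeat the winnowing, here is one way it could be done in Python. This is a sketch under assumptions: it supposes each tweet is a record with an `id` and a `user` field, which are hypothetical names of mine rather than the actual twdocs column headers, and the toy batches are invented purely for illustration.

```python
def merge_batches(batches):
    """Combine overlapping search batches, keeping each tweet id once."""
    seen = {}
    for batch in batches:
        for tweet in batch:
            # The first copy of a tweet wins; later duplicates are dropped.
            seen.setdefault(tweet["id"], tweet)
    return list(seen.values())


def repeat_stats(tweets):
    """Return (number of tweets, number of distinct tweeters)."""
    authors = {tweet["user"] for tweet in tweets}
    return len(tweets), len(authors)


# Toy data only: two batches that overlap on tweet 3.
batch_a = [{"id": 1, "user": "ann"}, {"id": 2, "user": "bob"},
           {"id": 3, "user": "ann"}]
batch_b = [{"id": 3, "user": "ann"}, {"id": 4, "user": "cat"}]

merged = merge_batches([batch_a, batch_b])
print(repeat_stats(merged))  # (4, 3): four distinct tweets, three tweeters
```

The gap between the two numbers is the count of tweets contributed by repeat tweeters beyond their first.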
And I hope to get to just these questions as soon as the…ahem – research grant – kicks in.