UK Election Poll Data: A Vote for Change(s)

21 Jun

It was Emerson who had little time for a foolish consistency, but Emerson never designed a spreadsheet; and had that 21st Century chore been thrust upon him he might well have backed away from the adage.

Because consistency – it all its unthrilling, small-minded, philistine tenacity – is part of the deal. Nothing foolish here; datasets, and databases (I’m not equating the two, but humor me), won’t work until each of their fields treat their records with the evenest of hands. After all, we’ve more than once time observed and experienced the oily slicks a subtotal row can spill upon a pivot table, and the havoc wrought by a column of dates, half of whose entrants are text frauds, flashing their bogus IDs at the bouncer. There’s such a thing as a foolish inconsistency, too.

Bad metaphor, but get your data on the same page, then, if you want them to work properly. And the UK election poll data lined up by the Guardian and the market researcher ICM  that’s tugging at your sleeve in this spreadsheet

 Copy of All Guardian_ICM poll results

may have a lesson or two to teach about the consistency thing.

The sheet, founded in Google Docs mode (and you can get that iteration here), has an interesting story to tell, hammering in place an extended track of UK national-election poll data proceeding from June 1984, with the actual election numbers booking passage en route. First things first: I would rule the Fieldwork dates field disposable by themselves (you’ll see what that means shortly); its holdings merely submit text-formatted corroborations of the more-or-less (we’ll see) genuine dates holding down the other side of the data set in A  – and so I’d drop a blank row between G and H, thus distancing – but not obliterating – the field.

But you’ll also note the sudden capitulation of the data in A to the text format picking up in row 390, and worsened in 400 by those campaign poll entries, which harbor no date pretensions whatsoever (something about consistency). You can’t use these as they stand, of course, and I don’t know how they found their way into the field at all. In any case, rather than stepping through a gauntlet of formulaic repairs, I’d simply latch on to A389 – owning up to a date-worthy format, click the Format Painter button on it, copy its format through A391-A496, and enter the first date in the corresponding Fieldwork date hyphenated spans (e.g., entering 13 June 2014 in A390), and thus properly dating all the items in A. (Note: if you’re trying this in the US, be vigilant about the regional date format disparities.)

Next, the informational rows 407-409 must be furloughed from the principal data; again, a simple deletion or an intervening blank row will make that happen. O, the other hand the field colorings of the party-specific poll percentages hueing columns B through E are, for the investigator, neither here nor there; they bear no conditionally-formatted utility, and rather do what they purport to do – simply differentiate the parties by color.

Slightly more eyebrow-furrowing, though, is the sudden reversion in row 320 to the two decimal points now puffing all the percentages, in addition to the actual election-day rows in which the points have already been vested – except for the election report in row 405, which shows up in but one decimal:

poll1

I should add that the native Google Docs sheet doesn’t give way to the two decimals in mid-sheet. It seems the document has undergone some loss in the translation to Excel that I can’t yet explain, though the data as such are of course none the worse for the experience. 37.00% is one and the same with 37%, but the reader is left to wonder about the intermittency of those decimals as a result, particularly as their zeroes bring no material clarification of the numbers. A little more consistency would seem in order here.

And the same could be said about the most variable row heights oscillating across the records, a consequence of the Wrap Text enforcement meant to rein in the extended GENERAL ELECTION RESULT postings in the Sample field in Column. G. The simplest palliative: sizing an unvarying, maximized row height that by definition makes room for every entry.

Of course these contrarieties only beset the formatting properties of the workbook, and as such don’t adulterate the data whose values ultimately matter most. But a book’s patchwork presentational mien gives pause about the fitness of the data as well, a perception to which no spreadsheet designer wants to give succor.

Still another presentational curiosity lands itself in the CON LEAD OVER LABOUR field, wherein Labour poll margins over the Conservatives appear in reddened characters and Tory advantages effect a blue tint. These could and should have been conditionally formatted, and I see no evidence that they were so treated, neither here nor in the first-instance Google Doc (I’m prepared to be corrected here, however). Moreover, I cannot account for the surprise insurgency of a formula in F322 in lieu of the hard codes evidenced in the field till then; and after a two-row reset the formulas resume in earnest in F325 and don’t let up for the remainder of the sheet. Again, that little shock doesn’t contaminate the numbers – but why start writing formulas in F322?

And finally, perhaps, we need to take note and stock of the seven actual General Election result cells mingling with the other records. By definition of course these aren’t poll-sampled results, and as such perhaps don’t belong here at all, though they do not inflict the same double-count contortions of a subtotal. If you decide they need to be disinvited, you could conduct a seat-of-the-pants find and replace, finding the word General in the Sample column and replacing it with ZZ, sorting these new entries to the bottom of the data set, and severing them from the pack with a blank row (you need that ZZ replacement to begin with because many of the other cells in the field report an n/a, which would sort beneath the original General Election Result legends).

Now you could, for example, do some poll result grouping by date, provided you have in fact reformatted those text scammers in the End of fieldwork/election data field in A.

But we need to ask: had you or I had designed this sheet, wouldn’t you have proceeded with a bit more consistency?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: