Double Formats and Double Counts: UK Voter Data

3 Jul

It was somewhere – I think it was here, for example – that I filed my brief for the two-spreadsheet thesis, a meek bit of iconoclasm about the variable intentions by which the sheets could be guided, and how the sheets might look as a consequence.

You’re requesting a translation. What I strove to say then is that some spreadsheets are meant to be merely read and considered, while others – again, in virtue of their construction – place themselves in the service of those who would analyze the data before them. And a prescriptive follow-on recommends that data gatherers should, if the inconvenience to them isn’t prohibitive, make their spreadsheet available in two versions, one for the readers, another for the analysts – because the two could look very different.

Well it turns out that the United Kingdom’s Office of National Statistics (ONS) has commendably done just that, at least for one of its myriad of spreadsheets: a count of UK voters eligible to apply their franchise to local-government and/or parliamentary elections, broken out by what the sheet calls local government areas and counties/boroughs. If nothing else the data daubs a bit of deep background to the Brexit referendum that’s roiled the country.

Two separate ONS workbooks iterate and reiterate the data, one in a book described as formatted, the other unformatted:

PIVOT TABLES-electoralstatistics2015uk Formatted

PIVOT TABLES – electoralstatisticsuk2015 Unformatted

(As always, you’ll want to consider the clarificatory footnotes attaching to both workbooks.)

What then, is “Formatted” about the formatted workbook? That, as it turns out, is a question that lends itself to both literal and slightly more figurative understandings. The bold-facing of principal headings on the sheets epitomizes the most text-bookish of formatting highlights, of course, but I think more is meant by the term here. Note for starters the blank C, E, G, I, and K columns that interrupt the data and, left as is, would almost literally stand in the way of a fruitful manipulation of the numbers (small exception: I see the value 0 in I45, and I’m not sure what it’s doing there).

No less evident of course are the blank rows that stream between key regional areas on the sheet, the classic data-entry no-no that could perhaps be deemed forgivable here because again, this sheet is made for reading, and not number-crunching. But at the same time, and as a matter of definition, it could be asked if blank columns and rows implement format changes; it seems to me that these devices are better appreciated as substantive design elements, if that quibble matters.

But there’s another data quirk which could, one supposes, fitly enwrap itself in the format banner. In the Formatted rendition, Table 1 layers its data in a kind of stepped fashion, with smaller political demarcations recessed further the to the right of the sheet, as per bullet sub-points. But look at Cell B23, for example, and the cells beneath it:

format1

The entry for Gateshead exhibits an indent, one brought about simply – and questionably – by two taps of the space bar:

format2

That vertical line is the cursor, flickering in its favored habitat, the formula bar; and if you subject B23 to some LEN-formula scrutiny you’ll get 10, even as the city name comprises eight characters.

Now again one could retort that since we’re studying the Formatted sheet, the one cast in effect into read-only status, the space-padding doesn’t matter much, and that’s probably technically true. But superfluous spaces are superfluous, and in view of the fact that they were apparently imposed upon the names in numerous rows down the sheet, the designers committed themselves to a fairly painstaking and ultimately unnecessary space-bar-tapping chore. Unnecessary, because much the same effect can be attained more productively and elegantly via an option I’m not sure I’ve ever used in a real spreadsheet – the Increase Indent button:

format3

It’s right there in the Alignment button group, but I’ve barely said a word to it (and vice versa). But Increase Indent does something interesting: click it on a text-bearing cell and the text pushes rightward, as if you’ve typed a few superfluous spaces. But type the word “hello”, say in A3, return to the cell and click Increase Indent, and then write

=LEN(A3)

somewhere else, and the formula returns 5, because Excel regards the indents as character-free.

Thus Increase Indent does a pair of useful things: it drags its text deeper into its cell, befitting an outlined sub-point, but at the same time does nothing to impair its actual length. And I think it could have been used here to particularly to good effect, for example in cells A10 and A12 in which the entries ENGLAND and NORTH EAST are meant to subordinate themselves to the subsuming UNITED KINGDOM, and yet are made to appear as equivalently aligned co-equals.

And what of the unformatted version? For one thing, and I’m looking at Table 1, it properly piles its data into contiguous rows sans any blanks; but by top-loading the data set with voter aggregates – e.g., the figures for United Kingdom, England, Wales, etc. – the data set capitulates to the double-count error, by seating national, aggregated data at the same table as individually itemized rows. If you want to pivot table the data, then, you’ll need to identify and then disaffiliate the double-counting rows from the set, a mandate whose fulfillment I found surprisingly challenging – because the codes keying each row’s regional/demographic level attest those levels begrudgingly.

In other words: it’s clear that the data’s first 16 rows – 2 through 16 – communicate aggregated data, and as such need to be sent on their way, even as we want to remember the numbers in row 2, which after all total the UK’s registered electorate for 2014 and 2015 (see the definitions of Attainers on the sheet). We thus want to work with non-aggregating rows whose voters ultimately total 46,828,163 in column C, for example – that representing the UK overall voter total.

But delete rows 2-16 and add the remainers (no Brexit pun, there) in C and I get 70,544,681, or about 24,000,000 voters too many. That means there’s a whole lot of double-counting still going on, driving me in turn to the Table 1 Metadata tab and its code legend. You’ll note that the codes through row 24 denote the 16 rows we’ve already deleted; I then pursued an educated guess and deleted all rows bearing codes commencing with the E 1 preifx, these standing for County data that I presumed to contribute to the double-count as well. My new total for column C, which now comprises 391 rows: 46,045,889, close to but surely not coterminous with the national United Kingdom figure in C we’ve since deleted. But copying the sum formula in C onto D yields 46,204,725, precisely the United Kingdom total for 2015 voters, and a further copy to columns E and F gets the F (Attainers) number exactly right, and with another near-miss for column E. That is, the 2015 data seem to have now been properly adjusted for data-doubling, while the 2014 remain a wee bit discrepant. I’m open to suggestions about it all.

Also, you’ll note the numbers in the 2015 columns feature commas, and the 2014 data don’t. But that’s a formatting matter.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: