Here’s a droplet from the torrent of spreadsheets loosed weekly by the UK Office of National Statistics, one that embodies an interesting, if messy, set of problems for the analyst:
The data historicize British monthly unemployment claims data from a June 1881 point of inception, and drill down to May of this year, across two different sheets.
There are, first of all, significant comparability issues across the years and the sheets, some of which are expounded in the linked document on the Historic Data sheet as referenced in cell A5. If your intentions here are investigatory, those assuredly beg your attention. Indeed, the qualifier in cell A7 of that sheet warns against a melding of the UK and Historic Data – the latter only counting claims from 1881 through 1970 – a recommendation that presumably justifies the very fact of two sheets.
That’s one issue and a real one, albeit of the extra-spreadsheet kind. But there yet might something to learn from each sheet regarded singly – namely, something about unemployment claims by year.
And to advance that prospect, all good pivot tablers will doubtless move to group the Historic Data (let’s start there) by year(s) and proceed to average their claim rates (which may roughly proxy for unemployment percentages).
And it’s right there where you have to start paying attention. The dates tumbling down the curiously untitled field (we’ll call it Date) in A don’t seem to be dates at all, but rather indolent text; but for once, as we hope to see, these impersonators will serve the analysis well – because Excel can’t evaluate putative dates that were calendared before its baseline day of January 1, 1900. That day receives the date number 1, with every succeeding day counting off its relation to the foundational January 1 by an increment of 1. Thus December 17, 1983 quantifies itself as 30667, for example, the number of days distancing itself from the January 1, 1900 touchstone.
Thus Excel can’t ratify Jun 1881 in native date mode (though a VBA-macro –driven workaround is described here) – and indeed, even if all the entries in the Historic Data sheet postdated January 1, 1900 they’d still resist grouping, because they’re text.
But for our purposes, that insufficiency shouldn’t matter. To commence what could be billed as faux date grouping, all we do is check into the D column, head it Year, and enter, in D12:
That expression drags the 1899 out of the A12’s Jun 1899 and upgrades it to full-fledged quantitative standing. Copy it down D and you’re in business, priming this sort of pivot table
Row Labels: Year
Values: Rate (%) (Average, honed to two decimal points. But note that 31 months report NA in lieu of an actual rate).
Because the Years are duly numeric, we can group these into say, five-year bins, yielding:
You’ll note the Depression-driven 19.10% in the 1931-1935 bin.
Now there is, apropos the discussion above, a formatting curiosity that’s wormed its way into the dates. You’ll see that the A12:A234 range bears the General format, befitting a non-committal receptivity to any and all date/numeric types, and these precisely dovetail the pre-1900, text data. But A235 – entertaining the Jan 1900 date –asserts a Custom format, consonant with a numeric entry – even as we’ve allowed that all the field’s cells are text. And sure enough – click A235 and jab at the Number Format drop-down menu in the Home > Number button group:
You’ll see that every formatting variant therein fails to vary, each one reporting the same Jan 1900 expression; and that very constancy, Custom format notwithstanding, means – text.
But ok – again, that discovery does nothing but rubber-stamp our earlier averral, to the effect that all of the entries set down in the Date field are merely, and only, text. But what’s interesting about Jan 1900 – and by that I mean ‘Jan 1900 – is that, some wordplay intended, it isn’t completely valueless.
To explain: given the overweening centrality of numbers to the spreadsheet remit, Excel accords certain quantitative properties to apparent numbers costumed as text. Enter ’17 in A3, for example (and the apostrophe textualizes the entry) and go on to enter =A3*2 in A4, and you’ll get…34, a really, truly, usable number. And less obviously, enter ‘Jan 1900 in A3 and type =A3*2 in A4, and you’ll get 2 – and that’s because Excel takes the liberty of treating ‘Jan 1900 as January 1,1900 – numerical value 1. Multiply it by 2, then, and you get 2.
Thus even as ‘Jan 1900 owns up to its text-label lineage – and as such, can’t be grouped by a pivot table, for example – it’s happy to consort with numeric operators, and pass itself off as a number when asked to assent to the deception.
And if you want to work with the presumptive years in the UK sheet, you’ll want
because the years in the sheet’s A column position themselves before the month in the UK date cells. And because they do, they won’t return the same kind of de facto numeric values that surprised me in the Historic Data sheet. If you want those, you’ll have to restructure the data thusly, e.g., the 1971 Jan entry in A6:
The RIGHT/LEFT extraction reverses the placements of the 1971 and the Jan, after which an authentic date/number should emerge.
Should. But scroll A column downwards, and you’ll drive past entries such as this one, in A230:
That footnoting pound sign, or hashmark (its name depends on the side of the Atlantic on which you’re reading this), inflicts one more dollop of complication into the mix, requiring this formulaic rewrite:
The nested MID rescues the Sep from its entanglement with that sign,as that month name begins in character position 6 in A230.
Ho hum; all in a date’s work.