Given the indispensability of its remit, the UK’s National Health Service can’t help but make news, whether it wants to not. Nothing about the Service is low-profile – be it budgetary conundrums, misgivings about the fluctuating quality of its care, debates about reorganization, or charges of cover-ups – there’s always something there
And so it is with NHS data, made available here, and more specifically in a recent Guardian entry on the demographics of visitors to NHS Accident and Emergency (A&E) departments, coming to you in Excel form here:
Let’s direct our collective scrutiny to the Figure 3 tab, an hourly breakout of A&E patient arrival times extended across six consecutive yearly tranches (you’ll want to ask why the respective column headers comprise hyphenated year references, e.g. 2011-12, and I’m not sure of the answer and the yearly endpoints they mark. Moreover, the attendant Guardian story adds that the 2012-13 data encompass but 11 months). You’ll note in addition a second set of supplementary columns, these computing the percent of all visits by hour, and thus read down the columns.
In any event, the spreadsheet appears to have done most of work for us, intelligibly lining up the visits with their hourly points in time and culminating in a set of annual totals (row 28) that demonstrate the spike in A&E visits across the report span (again, recall that the 2012-13 numbers record only 11 months’ worth of data). But we could add a measure of value to the data by grouping the hours into larger, aggregated bins – set, say, to three-hour intervals, and seeing how the totals play out there.
Our first, necessary tweak toward that end is to install a blank row above those total data in row 28, thus excommunicating the totals from the remainder of the data set. We’ve discussed this matter before; leaving the totals in place and batching them into a pivot table would have the effect of counting and/or adding the same numbers twice, and that’s very bad form (and I also don’t’ quite appreciate the need to have merged cells A29:F29 in order to store that Source: caption. Text can happily splurge across empty column unimpaired, but that’s a separate and ultimately minor matter.)
But there’s something else amiss with the data that begs our attention, and that must be set aright. Remember first of all that time (and date) data are really, once the makeup is peeled off, prettified numbers, and it turns out that some of the hours posted to the A column are nothing but text. If you’re skeptical, enter
in the now-blank A28; you’ll get 16. COUNT, it will be recalled, counts only those cells bearing numbers – and we’re working with 24 hours, and thus pulling up eight numbers short. Click each cell in the A4:A27 range, and you’ll be told that some of them have been vested with the General, as opposed to the Time format, and while a General-formatted entry could be numeric, these Generals aren’t. Moreover, the very first hourly posting – 00:00, or midnight – is one of those text pretenders, and so any copy-down-the-column procedure can’t start there, because doing so will inflict a non-numeric format all the way down. So what I’d do is copy the contents of A5 – a bona fide time entry – and copy it to A4. That operation does two things: it delivers the authentic Time format to A4, along with an actual time value – 1:00 AM. You need both, because simply reformatting A4 won’t metamorphose 00:00 into a number. The 1:00 AM replaces our obdurate text at last, after which you can simply delete the 1: and dust off an actual, quantifiable 00:00 AM.
That remedy applied, we can select A4 and A5, grab the fill handle and copy down the column, braking at A27. Now all the hourly representations have been duly quantified, spaced one real hour apart. (Note: again, don’t think we’ve just bulged each adjacent hour by an increment of “one”. In fact the operative additive is here .041666, or 1/24 of a day. Thus 8:00 AM “really” possesses the numeric evaluation of .3333.)
Now back to that grouping scheme. We’re aiming to consolidate the 24 hourly records into eight bins of three hours, and to tray these into the Row Labels area of a pivot table, e.g.
And so on.</p>
We’ve done this before. Stoke a pivot table, and drag Hour of Arrival into the aforesaid Row Labels:
Nothing you haven’t seen before. Click PivotTable Tools > Options > Group Selection, deselect Months, and click Hours:
You may not realize it yet, but now we have a problem (and I’m not hinting at the 1/1/1900 end date supplied by the grouping defaults above; let’s hold off on that one for now).
And here it is. If you group a set of numbers under standard-issue grouping auspices, you’re asked to name a grouping interval by which the numbers can be bundled, in the By field e.g.:
But no such capability avails in the Grouping dialog box, if it’s oriented to time data. True; times are numbers, but once they’re conceived in chronological terms the dialog box changes, along with its possibilities. There appears to be no prospect for imposing a grouping interval upon time data.
Of course you could reformat the above hours into their native numerical character and propose to group them:
But do so and you’ll merely group the numbers as you see them above, stemming any recourse to the hourly motif (the 0.125 in the By field, by the way, references the desired three-hour grouping interval. .125=one-eighth of a day).
And I know what you’re thinking: group the numbers as you see them above, and then reformat them into their hourly guises. Good idea, but it won’t work: once a grouping has been effected, the data reverts to text and is no longer subject to numerical adjustments. After all, a grouped entity such as
sports that nettling hyphen, which augurs text status – and that’s that. So what to do?
I’m working on it.