Time for a bit of review. Spreadsheetjournalism is in the first instance about how spreadsheets might be placed in the service of story-seeking, a marching order coupled to a corollary remit – namely, to attempt, through a judicious tweaking of the data as they currently stand, to synergize some novel, reportable take on the information that wasn’t there before, or at least that had been hiding-and-seeking coyly in the brush. I’m not necessarily in love with the term, but we could call this sort of enterprise value-added journalism, an instance of which we saw it in the post on the 2008 election. Here’s instance number two.
Crime, like it or not, is a reporter’s staple, and crime data abound on the net, variously guised and variously usable. One notable stockpile of data occupies the shelves at data.dc.gov, a site devoted to informational bits and pieces on the nation’s capital; and I’ve adapted their Crime Incidents 2011 workbook for download here:
(The data in their original form had to be taken back to the shop for some retooling; had I asked you to call up the workbook directly from the data.dc site the attendant issues would have pushed us off message).
Look around, and begin to think about what’s up with the data. (Click here for an explanation of some the more recondite fields on the sheet; thanks to Christopher Marshall of dc.gov for the heads-up.)
Of course you could begin unprepossessingly, by simply aggregating all crimes by their categories (needless to say, our view is confined to reported crimes) by running them through a pivot table, or by subjecting them to the DSUM function or even, if you insist, trotting out the hoary and ungainly Subtotal option, a feature I never use. You could do any of these things, to be sure, but your findings won’t quite qualify as stop-the-presses material; we’re entitled to assume that somewhere in DC that story has already been filed. But wait – I think I’ve experienced a low-grade epiphany: what if we were to analyze the distribution of crimes by day of the week? That might stake us to a less-than-obvious angle on the matter, particularly if we could bare day-specific variations in crime types. There’s just one problem with all that: even as the workbook before us indeed sports a REPORTDATETIME field in the C column, actual days of the week are nowhere recorded. So that’s where we come in; we need to somehow pluck days from the dates streaming down C.
You’ll be pleased to know that the means for doing this is rather simple. Make your way to the K column, rewrite its heading to WEEKDAY or something kindred, and enter in K2:
(You’re doubtless wondering why I elected the K column. It’s because the data there are completely redundant, consisting exclusively of the word WASHINGTON, thus adding nothing to the narrative. Rather than advancing to the next empty column – W – I’ve opted for a column far closer to the data with which we’re working.)
You should return a number somewhere between 1 and 7. I get 1 (but if you’ve sorted the dates you may see a different value), standing for the first day of the week – i.e., Sunday, day one in Excel’s calendrical view. (And don’t be fooled, by the way – WEEKDAY gives back any day of the week, not just Monday through Friday.) Copy that formula down the column, and you should get something like this:
You see what WEEKDAY does, and does easily – it extracts the day, in numerical terms, from a date entry. But I know what you’re thinking: wouldn’t readability be served by replacing the numerical day entry with its associated name, e.g., Sunday in lieu of 1? Agreed; and here’s what I’d do: in a blank area, say starting in cell Z1, enter the values 1 through 7 down the Z column. In the adjoining column AA enter the days of the week beginning with Sunday (you may know that you need only actually type Sunday in AA1; return to that cell, click on its fill handle and drag down, and the other day names appear in sequence, a consequence of one of Excel’s built-in fill routines). Then select cells Z1:AA7 and name the range DAYS (by clicking in the Name box and typing DAYS and tapping enter:
There’s a very good reason to name a range, as we’ll see. Now return to K2 – the first of the day-identifying cells – and now type and follow with Enter:
What’s happening here? We’ve deployed a VLOOKUP function, a near-indispensable tool in the box that consists, in its default mode, of three elements or arguments, as they’re officially termed. The first alludes to that which is being looked up – in our case the numbered day realized by WEEKDAY, which has now been nested in the lookup. The second argument – the reference to DAYS – names the lookup range in which the numerical value we’ve just returned with WEEKDAY is to be looked up; and the final argument, here the number 2 – pinpoints the column in which the “answer” – that is, the day name – is stored.
In other words, and assuming the day in cell C2 is Sunday, the lookup returns the number 1 via the nested WEEKDAY formula, then reckons that number against the DAYS range (in that range’s first column), and then “looks” in turn to the second column – the 2 in the VLOOKUP formula – for the day name, in this case Sunday. In other words, 1=Sunday. And naming Z1:AA7 liberates you from the chore of having to deal with all those nasty absolute-reference dollar signs that beg your attention when you copy the range down a column. Naming Z1:AA7 as DAYS means that those coordinates always remain Z1:AA7, irrespective of any new, copied location.
And once the weekday names have been engineered in K, it’s but a short commute to a revealing pivot table (note: you should first click the Refresh All button in the Data ribbon before you proceed, so as to incorporate that amended WEEKDAY K-column heading into the table-to-be). Then once you’ve inserted the pivot table
Drag WEEKDAY to the Row Labels area.
Drag OFFENSE to the Column Labels area.
Drag OFFENSE (yes, the selfsame field – see the quick note-in-passing on this issue in my August 7 post) to the Values area.
You should see:
(ADW stands for assault with a dangerous weapon. THEFT F/AUTO signifies theft from an auto.)
OK – that looks moderately cool, but before we scan the data you probably want to know why Monday is instated first among the days, particularly after we observed that the WEEKDAY function nominates Sunday as day 1. Good question, the answer to which seems to turn on the day listing in Excel’s Custom Lists area (reachable via the File tab>Options>Advanced>General drill-down), in which Monday indeed is assigned the first position.
In any event, if you’re happy with the alignment above, fine. If, however, you do want Sunday to head the day list, right click on Sunday and maneuver here:
and click accordingly.
Now for the data. It’s notable that far fewer thefts are committed on Sunday; whether that ties in to a day-of-rest mindset I don’t know. Note in addition the homicide and ADW totals for Saturday, for example. There’s a good deal to think about here, as some crimes don’t appear to be distributed randomly – and the sample size of over 32,000 offenses is pretty formidable.
Now watch this. Click in any cell in the pivot table and click Pivot Table Tools>Options>Show Values As>% of Column Total:
34% of all arsons committed on Tuesdays? Hmmm. 44% of all homicides perpetrated on the two weekend days? The next step might be to contact a criminologist at a local university, or some police bigwig for some learned explication. In any case, one trusts you see the possibilities.
Of course, this is Washington DC, and you probably don’t live there. But I’d suggest it’s likely that your local police keep day-sensitive crime data; whether they’ll release it to you is another matter. If they won’t, get back to me: I’ll write you a supporting letter on my stationery.