London Police Stats, Part 1: A Look at the Evidence

23 Jan

 

If the way in which the official crime rate is compiled is dodgy, too, then you have a problem; and something like that, at least in allegation form, is what’s happening in the UK nowadays. The country’s Statistics Authority has recalled its “national statistics” accreditation of police-issued crime data, pending an Office of National Statistics-led inquiry into the manner in which the numbers are put together, and its methodological rectitude.

Serious business, to be sure, serving up a helping of scepticism upon the spreadsheets banked in the (London) Metropolitan Police web site from which this 2012-2013 report is featured:

 wards

Of course a spreadsheet can’t body-scan the ethical innards undergirding its data, but there may be some things to be said on that count. But in the first instance, there are enough spreadsheet-qua-spreadsheet goings-on here to keep us busy for a while.

The spreadsheet interestingly breaks out London crime reports by borough as well as the 624 city electoral wards (remember that number), these averaging about 20 per borough. As I said, there’s a lot to do here. First – and we’ve certainly seen some of these oddments before – I’d muscle a blank row between 4 and 5. Row 4 contains merged global titles which could only complicate any pivot tabling endeavour with faux field headings, and we don’t need that, do we? Second, note the portico of hidden columns locked behind B and G, but I want to hold off on these for the moment. Now check out the field headings in B6, G6, and H6. The ones in the B and H columns have been colored white, for reasons to which I’m not privy; I’d color them black in the interests of simple visibility. But B6 is wrongly identifies its data Borough Name, when in fact the contents that issue below are wards; I’d thus rename the field Borough Ward. G6, on the other hand, has no title at all, and I can’t explain that either. I’d call the field Borough Name, because that’s where that heading belongs. I’d also select row 6 and free its columns of their Wrap Text shackles, which are snapping field names in odd places, and follow up with a global column autofit. And because column B is thus unnecessarily extruded in virtue of that marquee-length title in B2, I’d either delete the title or move it to the nearly-abandoned A column and autofit B anew.

And note – and this is something I admit I’d never considered – the autofit straightaway uncovers the hidden columns in C:F, too (there seems to be a Visual Basic workaround here if you want the columns to resist exposure; see this discussion It is also fair to ask why these were hidden to begin with, a question that does not mean to allege spreadsheet foul play; after all, the columns were merely and always a double-click away from revelation. It’s possible that these behind-the-scenes data were meant to serve the analytical intents of police only, and were deemed to offer little edification to the public. Still, the question remains.)

But now that those columns are in fact in view, you’ll observe that none of these are titled, either. But before we do the renaming thing, take a look at the formula in C7:

=D17&”/”&E7

That expression means to string together the Month data in the D column – properly realized via the MONTH function – with the slash symbol “/”, and the year, reported in E (i.e., =YEAR(B7) – but we need to return to this field). But what is this formula doing? The formulas’ yields – e.g. 1/2012 – are pure, inanimate text, and I’m at a loss to understand what quantum of additional understanding these outcomes bring to the narrative, though as always, I am happy to be re-educated.

In any event, if you decide to leave the column in place you need to title it; I’ll call it Date (though were I broaching the sheet in earnest I’d probably delete the column altogether). You’ll then need to title columns D, E, and F. I’ll go with Month, Year, and Ward Count, respectively – which has the bewildering effect of releasing a cascade of #VALUE! error messages down the better part of three columns and which, needless to say, need to be explained. So let’s repeal those headings and proceed to something less unnerving – like putting the scissors to all those decimals in C:E, none of which seem to be needed (I think you’ll need to turn to the Format Cells command here in order to do this; clicks of the Decrease Decimal button on the Home ribbon won’t do the job here).

But it gets much curioser, and we haven’t even gotten to ask what the Ward Count field is doing there at all. Look at the formulas in D8 and E8:

D8: =IF(F8=624,IF(D7=12,1,D7+1),D7)

E8: =IF(AND(D8=1,F8=1),E7+1,E7)

In view of the completely unimaginative but wholly satisfactory formulas in the preceding row, I have no eureka-emitting idea why the above pair and their copied minions beneath them were interjected into the sheet, not when =MONTH and =YEAR are available, and again, these were actually written to row 7. (The first, a nested IF statement, turns out month numbers after seeing to it whether any given Ward Count exceeds the maximum 624 (there’s that number again) and the entry immediately above it in the D column equals 12, or December. If both conditions are met the IF statement returns 1 for January, restarting the year-month sequence for the next year, in our case 2013 (the data after all only capture two years’ worth of records).

The slightly more self-evident formula in E8 (bearing the year data), a multiplex IF/AND formulation, stipulates that if the accompanying month and Ward Count data in the D and F columns both compute to the number 1 – hence declaring a January month and the very first ward of the 624 – the expression culminates in the year-value 2013, that is, the value in the preceding E cell plus one. Thus E7495 – the first of the 2013-sporting cells – canvasses a 1 in both D7495 and F7495, and so having met both IF/AND conditions, adds 1 to the value in E7494, which happens to be 2012.

But none of this accounts for the placement of these formulas themselves to begin with – not when =MONTH and =YEAR could have been copied down the relevant columns.

All of which takes us to the field previously known as Ward Column and those #VALUE! exclamations. What seems to be happening here is a simple numbering, and renumbering, of all 624 London wards by each monthly mention, a kind of de facto internal check seeing to it that each ward is reliably present and accounted for throughout the data.  It’s not inconceivable that the field was dedicated to that very end by the spreadsheet designers, but if that were the case it seems to me its utility has since obsolesced.

But what about those #VALUE!s? They’re a gruesome resultant of a wholly honorable intention – my decision to enter the heading Year (or any other textual posting, for that matter) at the top of the E column. Yet entering Year in E6 wrought its share of havoc because the formula in F7 reads

=IF(E6=624,1,E6+1)

thus referencing none other than the title-bearing E6, which, in view of its textual content, discombobulates the formula with unreadable material; and the error message that ensues unleashes a knock-on effect on all the other, dependent formulas.

That’s one problem. Now look at the formula in F8, the next row down:

=IF(F7=624,1,F7+1)

Same column, and yet the Es have shifted to Fs.

And I can’t explain that discrepancy either. But if you’re looking for fixes, try this in F7 and copy it down the column:

=COUNTIF(B$7:B7,B7)

That should work, but I get paid by the word and my editor is getting nervous. Let me aim for an explanation in the next post. But at least now you can title columns D, E, and F.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: