Striving for cheap dramatic effect is my expository stock in trade, and I did myself proud last post. I had closed the first installment of our survey of British Transport Police crime data with a boffo teaser of a homework assignment, of sorts: namely, what needs to be done about the aggregate station passenger numbers (when available) which restate themselves in every station-specific record. Again, for example, Abbey Wood’s 1515106 passenger figure – an overall tally for the station – accompanies all the Abbey Wood entries, even as these count individual crime totals for a particular month in a particular year. (I should add that I’m still not unassailably sure whether the passenger numbers have rolled up the traffic across the entire February 2011-January 2013 interval with which the data work, but I assume they do. I have emailed data.gov.uk about this.)
So we’re left with a confusion of strata, as it were: month-specific crime totals always accounted against what appear to be 24 months’ worth of passenger traffic. We seem to be buying an apple mixed among the oranges, then, when in fact we’re looking for a given month’s crime figure packaged with that month’s – and only that month’s – ridership total. And that doesn’t seem to be there.
Still, there is a bit of useful work to be done here, I’d submit. You could divide each record’s Crime Count by its passenger number, churning out a minuscule but meaningful percentage that could be added to all the other same-station percentages. Because each crime count would divide itself by the same, unvarying station passenger number, the combined percentages should build a supervening percentage of all station crimes, as a fraction of its passenger traffic.
And so those marching orders would direct me to name a new field in the M column, say Percent of Passenger Traffic (again, I’m assuming, as per the previous post, that the All Crime and ASB records have been shown the door), and step down into M2 and enter, with all due mindfulness of the fact that some stations have no passenger numbers:
Then copy down the column. Because of their smallness, a good many of the numbers evaluate to apparent zeros or scientifically-notated cocoons; and while all of these could be reformatted into legible, multi-decimal-pointed values we’re really interested in their combined effect, once they’ve been channelled into this pivot table:
Row Labels: Station_Name
Values: Percent of Passenger Traffic (Sum, formatted in Percentage terms, say to five decimal places. Remove the Grand Totals).
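For readers who would rather see the arithmetic outside a spreadsheet, here is a minimal Python sketch of the same idea: divide each record's crime count by its station's passenger number, sum those shares by station, and rank the result. The station names and figures are invented placeholders, not BTP data, and the function names are hypothetical. Treating a missing passenger number as a zero share is my assumption about how to handle the stations that have none.

```python
from collections import defaultdict

# Hypothetical sample rows: (station, crime_count, passengers).
# None marks a station with no reported passenger figure.
# All numbers are invented for illustration.
records = [
    ("Station A", 2, 1000),
    ("Station A", 3, 1000),
    ("Station B", 1, 100),
    ("Station C", 4, None),
]

def crime_share_by_station(rows):
    """Sum each record's crime_count / passengers, grouped by station."""
    totals = defaultdict(float)
    for station, crimes, passengers in rows:
        # Assumption: a missing passenger number yields a zero share
        # rather than a division error.
        share = crimes / passengers if passengers else 0.0
        totals[station] += share
    return dict(totals)

def top_n(shares, n):
    """Return the n stations with the largest summed share, descending."""
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:n]

shares = crime_share_by_station(records)
for station, share in top_n(shares, 20):
    print(f"{station}: {share:.5%}")
```

Because every record for a station divides by the same passenger number, the summed shares equal the station's total crimes over that one number: Station A's 2/1000 + 3/1000 comes to 5/1000.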
Once you’ve gotten this far you could, for example, right click among the station names and conduct a Filter > Top 10 run-through, opting for say the top 20 (you’ll have to go ahead and sort these descendingly by yourself, though). I get:
Confession: I don’t think I’ve heard of any of these stations, even if Google and Wikipedia have. The Ardwick outpost is somewhere in or near Manchester, and it won’t be pleased to know that in purely percentage terms its 6 reported crimes – reckoned against a reported passenger traffic total of only 334 – put it atop the list. Do those numbers sound right? No, but they apparently are; Wikipedia tells us the station is unstaffed, adding that “In [the] 2004-2005 financial year only 285 passengers used the station, or fewer than one per day, increasing to 358 in 2005-2006”. Interesting place, and yet Ardwick rocks with activity when sized against number two New Clee, in Northeast Lincolnshire (I don’t know where that is, either), and its 149 straphangers, though at 149 no one is hanging onto any straps in its cars.
On the other hand, Mr. Wikipedia counts 616 commuters for Ardwick across the 2012-2013 financial year, and 334 for New Clee for 2011-12, and those departures from the BTP data are bothersome, raising the question of other under- or overcounted stations. That claim on our investigatory attentions is in the first instance to be put to the data gatherers; but whether the question is at the same time slightly political is for the powers that be, and those who write about them.
In any case if you add Crime Count (Sum, and formatted to decimal-free Number mode) to the Values area and rerun a top 20 (by Sum of Crime Count), you’ll get:
I’ve heard of most of these. In purely literal, quantitative terms the Victoria station (excluding its underground stop, which ranks 18th above) heads the enumeration, even as its ranking here by percent – 13th – does not. Note as well the unrelieved zeros lining up by the St. Pancras International (that is, Eurostar) station, a consequence of its unreported passenger figures.
We can begin to wind this discussion up with this small, quick table:
Row Labels: Location_Type
The measure is a subtle one; the figures perhaps reflect the crime-“facilitating” properties of duration and place. Train journeys typically extend longer than the waits for them to commence, and so in theory allow more time for wrongdoing; but stations afford swifter egress from the scene. We see that 64% of the crimes were station-bound, a proportion that needs to be thought about with some due diligence.
Then reset the table thusly:
Row Labels: Crime_Type
Column Labels: Location_Type (moved here for presentational reasons)
Values: Location_Type (Count, Show Values As > % of Column Total; retract Grand Totals):
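The % of Column Total setting can be sketched in Python as well: count each Crime_Type and Location_Type pairing, then divide by the total for that location, so that each location's column sums to 100%. The rows below are invented placeholders rather than BTP records, and the category labels and function name are hypothetical.

```python
from collections import Counter

# Hypothetical (crime_type, location_type) pairs, invented for illustration.
rows = [
    ("Drugs", "Station"), ("Drugs", "Station"),
    ("Other Theft", "Station"), ("Other Theft", "Station"),
    ("Other Theft", "Train"), ("Other Theft", "Train"),
    ("Other Theft", "Train"), ("Drugs", "Train"),
]

def percent_of_column_total(pairs):
    """Map (crime_type, location_type) to its share of that location's crimes."""
    cell_counts = Counter(pairs)
    # One total per column, i.e. per location type.
    column_totals = Counter(location for _, location in pairs)
    return {
        (crime, location): count / column_totals[location]
        for (crime, location), count in cell_counts.items()
    }

table = percent_of_column_total(rows)
for (crime, location), pct in sorted(table.items()):
    print(f"{crime:<12} {location:<8} {pct:.1%}")
```

Reading down one location's entries shows how that site's crimes divide among the types, which is the comparison the pivot table draws between stations and trains.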
Note the likenesses and the disparities. Public Disorder and Weapons (however defined) account for about the same distributions across the two sites, but Other Thefts (again we need a definition) proliferate on trains (drug activity, on the other hand, is by far the more likely in stations, where perhaps rapid transactions can be consummated, followed by rapid disappearances).
And now I don’t know about you, but I’m tempted to book my ticket to New Clee. Good seats still available.