Chicago Homicide Data: Two Months, and Sixteen Years

8 Mar

Even given crime’s perennial, dark-sided newsworthiness, the burgeoning toll of homicides in Chicago has come in for recent, concentrated scrutiny. The New York Times has subjected the recent spate of killings in the city to recurring coverage, and even the president has informed his followers that if the bloodshed isn’t staunched there he’ll send the “feds” in to enforce the peace.

While it’s not clear who the feds are, less uncertain data on the totals are available, and with them some perspective. A recent piece on the FiveThirtyEight site reminds us that while Chicago’s homicide rates have indeed bolted upward in the past two years, the figures don’t approach Chicago’s deadly accumulations of the ’90s – not exactly good news to be sure, but a measure of context, at least, that grounds the discussion.

And as could be expected these days Chicago’s Open Data portal makes the homicide data available to us, along with the records of other reported crimes. You can access the crime data by traveling to the portal and clicking this link on its home page:


(Note that the separate Crimes – 2001 to Present – Dashboard link on the page will take the user to a series of charts built on the data.)

I then filtered the crime data set for records beginning January 1, 2016:
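In script form, that filter might look something like the minimal Python sketch below. It assumes the export is a CSV whose Date column holds strings such as 01/01/2016 12:05:00 AM; the sample rows and the helper name on_or_after are invented for illustration and are not taken from the portal.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout of the CSV export (an assumption, not confirmed here).
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Tiny made-up sample standing in for the downloaded file.
SAMPLE_CSV = """Date,Primary Type
12/31/2015 11:15:00 PM,THEFT
01/01/2016 12:05:00 AM,HOMICIDE
02/27/2017 09:30:00 PM,NARCOTICS
"""

def on_or_after(rows, cutoff):
    """Keep only the rows whose Date falls on or after the cutoff."""
    return [r for r in rows
            if datetime.strptime(r["Date"], DATE_FORMAT) >= cutoff]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
recent = on_or_after(rows, datetime(2016, 1, 1))
print([r["Primary Type"] for r in recent])
```

To run it against a real download, swap the io.StringIO sample for an opened file handle.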



If you’re doing the same, and have downloaded the filtered 300,000-plus records (again, these record all reported crimes), you may want to shed some of their fields, ones you’re not likely to apply towards any analysis: e.g., the X and Y coordinates that in effect duplicate latitudes and longitudes; the Year field, whose information can be derived from Date; and the curious Location, whose records bundle crime-location latitudes and longitudes that already appear singly and more usefully in Latitude and Longitude.
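If you are working in a script rather than a spreadsheet, the same pruning could be sketched as follows. The header names in REDUNDANT are my assumption about how the fields are labeled in the export, and the sample record, including its coordinate values, is made up.

```python
# Field names as assumed to appear in the export's header row.
REDUNDANT = {"X Coordinate", "Y Coordinate", "Year", "Location"}

def drop_fields(record, unwanted=REDUNDANT):
    """Return a copy of the record without the unwanted fields."""
    return {k: v for k, v in record.items() if k not in unwanted}

# Made-up record for illustration only.
sample = {
    "Date": "01/01/2016 12:05:00 AM",
    "Primary Type": "HOMICIDE",
    "X Coordinate": "1170000",
    "Y Coordinate": "1900000",
    "Year": "2016",
    "Latitude": "41.88",
    "Longitude": "-87.63",
    "Location": "(41.88, -87.63)",
}

slim = drop_fields(sample)
print(sorted(slim))
```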

And because our download request gathers records for 2017 as well we can begin to develop some inter-year comparisons between homicide rates.

It should be noted by way of additional preamble that Donald Trump’s tweeted augury about federal action dates back to January 25, at which point the Chicago homicide total for 2017 exceeded that as of the comparable date last year by 23.5%. But as we’ve since advanced more deeply into the year (my download runs through February 27), let us extend the comparison through this pivot table:

Rows: Date (Grouped by Month and Year)

Values: Primary Type

Slicer: Primary Type, and tick Homicide

(Note the deployment of the Primary Type field in both the Values and Slicer position – an allowable tack).
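For anyone who would rather tally than pivot, here is a rough Python equivalent of that table: count records by year and month after restricting to one Primary Type. The records are invented, the timestamp format is an assumption about the export, and monthly_counts is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed export format

# Made-up (date, primary type) pairs standing in for the download.
records = [
    ("01/03/2016 02:10:00 AM", "HOMICIDE"),
    ("01/19/2016 11:45:00 PM", "HOMICIDE"),
    ("02/07/2016 08:00:00 PM", "NARCOTICS"),
    ("01/11/2017 01:30:00 AM", "HOMICIDE"),
    ("02/20/2017 10:15:00 PM", "HOMICIDE"),
]

def monthly_counts(records, primary_type):
    """Count records of one Primary Type per (year, month)."""
    counts = Counter()
    for stamp, kind in records:
        if kind == primary_type:  # plays the role of the Slicer
            when = datetime.strptime(stamp, DATE_FORMAT)
            counts[(when.year, when.month)] += 1
    return counts

for (year, month), n in sorted(monthly_counts(records, "HOMICIDE").items()):
    print(year, month, n)
```

Changing the second argument to another Primary Type mimics ticking a different item in the Slicer.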

I get:


You’ll note that the 2017-2016 January/February differential has disappeared. Homicide totals are now nearly identical for the two months; again, the Trump tweet referenced a homicide total of 42 through January 25, and likened it to last year’s figure of 34 through the same day. Those numbers of course are relatively (and thankfully) small, and raise a corollary sample-size question. And of course it remains to be seen if the May-August surge in homicides last year will be duplicated across the same interval in 2017.
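As a quick check on the January 25 comparison, the 23.5% figure follows directly from the two counts of 42 and 34. A minimal sketch, with pct_change as a helper name of my own:

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Homicide counts through January 25: 42 in 2017 versus 34 in 2016.
print(round(pct_change(42, 34), 1))  # 23.5
```

With bases this small, a difference of eight incidents is enough to move the percentage by more than twenty points, which is the sample-size concern in a nutshell.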

Of course, once the Slicer’s Primary Type field selection is in place, any and all of its items remain available for the analysis; and clicking through some of the other Primary Types shows that the January/February inter-year comparisons don’t trend uniformly. For example, tick Narcotics in the Slicer (and I’ve turned off the pivot table subtotals) and I get:


Here the 2017 January/February totals fall far beneath the 2016 aggregates for those two months, by about 40%, and I don’t know why. The temptation is to ascribe the drop to some rethink of the reporting protocol, but I doubt that’s the case, though my surmise is easily researched. Click Weapons Violation, on the other hand, and I get:

Here the 2017 figures far outpace those of the preceding year, even as the homicide totals for the January/February intervals remain, again, near-equivalent. Perhaps the publicity about weapon-inflicted murders in Chicago has spurred the city’s police to identify weapon wielders more concertedly, with that aggressive enforcement pre-empting still more homicides. But that is speculation.

It also occurred to me that, because the larger Chicago crime data set reaches back to 2001, we could download the homicide data for all the available years, and analyze these in a data set dedicated to that lethal offense. I thus returned to the Chicago set and filtered:


The records of 8,334 homicides, again dating from 2001, populate the rows. A simple first pivot table could confirm the yearly totals:

Rows: Date (Grouped by year; remember that the 2016 version of Excel will perform that grouping automatically)

Values:  Date (Count)
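The same yearly tally could be sketched in Python along these lines. The timestamps are made up, the format string is an assumption about the export, and yearly_counts is a name I have chosen for illustration.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed export format

# Made-up homicide timestamps, not real records.
homicide_dates = [
    "03/14/2013 10:20:00 PM",
    "07/04/2016 01:15:00 AM",
    "08/21/2016 11:50:00 PM",
]

def yearly_counts(stamps):
    """Count records per calendar year, like grouping Date by year."""
    return Counter(datetime.strptime(s, DATE_FORMAT).year for s in stamps)

for year, n in sorted(yearly_counts(homicide_dates).items()):
    print(year, n)
```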

I get:


The precipitous ascent of the city’s homicide total, up 85% from 2013 to 2016, is confirmed. (Strictly speaking, a rate would need to control for population change, though Chicago’s population may have actually declined slightly over the period reported above.) Note as well that columns E through G contain uniformly identical data down their rows, and as such could be deleted.

But some deeper looks at the data are also available. For example: have the distributions of the crime across the 24 hours of the day varied over the 2001-2016 span? By itself, that question presents what is normally a straightforward task for a pivot table, but in this case we need to call for a workaround. That’s because if we want to simultaneously break out the data by year and hour of day we’d have to derive those data from the same field, i.e., Date in two different Grouping modes, and install these in the Row and Column areas; but you can’t assign one and the same field to both Row and Column.

And because you can’t, I’d claim the next available column, call it Hour, and enter in row 2:


And copy down the column. Now we have a second, independent field that reports a time reading, enabling us to proceed:

Rows: Date (by Years)

Columns: Hour

Values: Hour (Count, % of Row Total).
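Outside Excel the one-field restriction doesn't arise, since year and hour can both be read off the same timestamp. A minimal sketch of the year-by-hour table with row percentages, again on invented timestamps, with an assumed date format and a helper name (hour_share_by_year) of my own:

```python
from collections import Counter, defaultdict
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed export format

# Made-up homicide timestamps for two years.
stamps = [
    "05/02/2004 07:15:00 AM",
    "06/11/2004 09:40:00 PM",
    "08/23/2004 09:05:00 PM",
    "11/30/2004 09:55:00 PM",
    "02/14/2005 09:20:00 PM",
    "09/09/2005 01:05:00 AM",
]

def hour_share_by_year(stamps):
    """For each year, the percentage of that year's records in each hour."""
    table = defaultdict(Counter)
    for s in stamps:
        when = datetime.strptime(s, DATE_FORMAT)
        table[when.year][when.hour] += 1  # hour runs 0-23
    shares = {}
    for year, hours in table.items():
        total = sum(hours.values())
        shares[year] = {h: 100 * n / total for h, n in sorted(hours.items())}
    return shares

for year, row in sorted(hour_share_by_year(stamps).items()):
    print(year, row)
```

Each row sums to 100, matching the % of Row Total setting in the pivot table.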

Filtering out the incomplete 2017 data I get:


The table is dense but readable and worth reading, at least selectively. Keep in mind that the percentages read horizontally across each year, returning the proportion of that year’s homicides perpetrated in each hour (for example, a percentage beneath the number 7 records the share of the year’s homicides committed between 7:00 and 7:59 am). Numerous notable variations are there to be considered; e.g., homicides in the 7:00 am time band accounted for 3.96% of 2004’s total, but in the following year the figure for that hour fell to 0.44%. In absolute terms the numbers were 18 and 2. The percentages at 21:00 (9:00 pm) for 2006 and 2012 come to 8.81% and 3.18% respectively; the actual totals stood at 42 and 16.

Are these fluctuations predictably “chance”-driven, or rather, statistically and sociologically significant?

For that question, I’m not confident about my confidence-level skills.
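One rough way to start on that question is a two-proportion z-test on a pair of cells, sketched below for the 7:00 am figures of 2004 and 2005. This is a swapped-in textbook technique, not anything the pivot table supplies. The yearly totals of 454 are back-calculated approximations from the counts and percentages quoted above (18 / 3.96% and 2 / 0.44%), not figures read from the data, and two_proportion_z is a hypothetical helper name.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 7:00 am homicides: 18 in 2004, 2 in 2005.
# The totals (454 each) are approximations inferred from the percentages.
z, p = two_proportion_z(18, 454, 2, 454)
print(round(z, 2), p)
```

A small p-value here should be read with caution: this pair was singled out from a table of 16 years by 24 hours precisely because it looked extreme, and some gaps of that size are to be expected by chance among so many cells. A chi-square test over the whole table, or a correction for multiple comparisons, would be a fairer next step.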
