The Plot Thus Far
You like to tweet? I like to tweak. Before I so rudely interrupted myself with that simpering apology below, the one about which you remain undecided, you’ll recall we spent the better part of the first installment (even farther below) assaying the King County crime-tracking spreadsheet, with the intention of reframing it for a new set of purpose-driven looks at the data. Again, the practice of emending someone else’s data needn’t always be construed as a headlong critique of the original; rather, the overhauls we want to carry out here should be read as an attempt to personalize the data, so to speak. If you’ve yet to download the workbook you can do it here:
(You should then review the data modifications we enacted in Part 1.)
Picking up the sheet, then, at the point at which we had left it at the close of Part 1, another perambulation across the data yields 20 records (at least that’s what I get) whose Months Reported amount to fewer than 12, or in other words, less than a year’s worth of data. Of these, four rows offer up 0 months, meaning that, for whatever reason, there’s nothing here to report – no crimes, nothing. Leaving aside the reasons for that data dearth, I’d sort Months Reported descendingly and jam a blank row immediately above those zeroes, thus isolating them from the substantive rows.
But those remaining, fewer-than-12-month rows need to be thought about, and thought through. After all, these 16 or so records do divulge crime data, but because their totals are necessarily fractional we need to extrapolate these to an entire year. For example – the Index Crime Total (the sum of all reported crimes) in column E for Algona in 2004 shows 40 for its nine months’ worth of tracking; but 40 doesn’t represent a rate, but rather an absolute figure, and as such needs to be mapped to 12 normalized months. What I’d do here is motor to the next available column – T – head it something like Projected Crime Total (if you’re bothered by the disparate title formatting you can click on S5, click Format Painter on the Clipboard button group, and click T5), and enter this formula in T6:
and copy it down the column. (Important note: if you’ve copied here with the fill-handle-double-click technique you may find the copied results extending even to those cells on the other side of the blank-row divides we imposed on the data. This curious overreaction on Excel’s part seems to have been stirred by the auto filter buttons sewn to the column headings, though I don’t have a confident interpretation yet as to exactly why. Turning the filter buttons off, though, appears to forestall the complication; but if you do find zeroes in the blank, demarcating row near the bottom of the data you need to delete them.)
In any case the formula above considers the value in the S column, simply returning the Index Crime Total if the S value is 12 (thus signifying an entire year), or if not, multiplying the Index Crime Total by the appropriate reciprocal fraction. Thus for our Algona case, the 9-month Crime Total of 40 will be swelled by 12/9, ratcheting the total to 53.33. Of course this sort of linear, straight-line conjecture is likely imperfect, but 53.33 better surmises the Algona data than the original, but partial, 40.
In view of the above, prudence recommends we put an effective crime rate into play as well. Hang a right to the U column, title it Effective Crime Rate, and enter the following in U6:
and copy it down the column (keeping that caution about the blank-row-zeros in mind).
There are Rates, and There are Rates
Now what about these crime rates – that is, aggregated, year-by-year rates for King County writ large? Those data aren’t in place in the worksheet yet; what are available to us right now are the County’s city-by-year rates, and true – we could orchestrate a pivot table assay of a yearly average of the rates, by simply dragging the Total Crime Rates per 1,000 Pop. (column F) into the Values area, breaking it all out by the Year field, and selecting the Summarize by > Average option. In fact, that tack is in my view an arguable strategy, but it accords disproportionate weight to smaller cities (think, for example, about simply averaging two batting averages of .200 and .300, in which one player has 10 at-bats, and the other has 600), and doesn’t hold any pride of place as a standard metric.
What I think we really want, then, is to hard-wire this basic equation
(All Crimes/Total King County Population)*1,000
into a pivot table and see how these recombined data play out by year. And in order to put those numbers on the board we need to introduce a new term into the equation – the calculated field.
I suspect that a great many pivot table devotees have yet to scavenge it from the toolbox, but the calculated field is something you’d do well to know about, and bringing it to your table isn’t a fearsome deal either.
A calculated field is an on-the-fly field gleaned, or calculated, chiefly from existing fields in the pivot table’s data source, but at the same time it remains accessible only through a pivot table. Now that’s a terribly abstract first pass at a definition, but as we proceed the concept should sharpen into intelligibility.
Keep in mind what we want to do: Total all the crimes in King County by year, divide that total by that year’s King County combined population that year, and multiply the yield by 1,000. But precisely because the crime data in our spreadsheet is atomized by year and city, it falls to us to do the aggregating, and here’s how:
Start up a pivot table and
Drag Year to the Row Labels area
Make sure you’ve stationed the cell pointer anywhere in the Row Labels area, and click PivotTable Tools> Options > Fields, Items, and Sets > Calculated Field. You should see:
Type a name in the Name field, say County Crime Rate, click in the Formula field, and then click on Projected Crime Total (remember this is the field we appended to the data source) in the Fields area and click Insert Field (you can also double-click the field name). Type a “/”, the division sign, and click Population and Insert Field. Enter parentheses on either side of this incipient expression so far (but to the right of the equal sign) and conclude the process by entering *1000 to the right of the close parenthesis. You should see
Click OK. Now you should see
The calculated field immediately whisks into the pivot table, and also enrolls in the Pivot Table Field List:
Remember that as a calculated field, County Crime Rate won’t suddenly – or ever – line up in the next available source data column; it remains pivot-table specific, just sort of out there, and confined to guest appearances in pivot tables alone.
If you want to refine these results with decimals, proceed as usual – right-click anywhere in its column in the table and select Value Field Settings > Number Format > Decimal Places. I’ve added two decimals, but that’s up to you. (I should add that the outcome we’ve just choreographed could have been reproduced via some SUMIF formula-based derring-do, but that approach is the more ungainly one, and I’ve passed on it here.)
You’ll note the obvious – the extraordinary scale-down in King County crime across the 1985-2011 swath by nearly two-thirds from its 1987 peak, again to be qualified with the understanding that numerous County cities are AWOL from some of those 27 years.
And while we haven’t gotten into charting here (at least not yet), you can boldface the finding above by churning out a Pivot Chart tout suite. Just click anywhere among the data and click PivotTable Tools > Options > Pivot Chart. Click on a basic Line Chart option, click OK, and for starters you get something along these lines (or line):
(We won’t pursue chart formatting issues here, though.)
And to exemplify those missing city years – which, after all, could “artificially” depress or dilate the crime rates – let’s reconstruct the pivot table.
Remove our County Crime Rate calculated field (it’ll always remains on call, however), and drag the year field to the Column Labels area.
Drag City to the Row Labels area.
Drag Effective Crime Rate to the Values area. In excerpted form, you should see something like this:
You’ll note the vacant cells, denoting years for which a given city has no data. (You probably should also turn off the Grand Totals; their sums are meaningless in our context. Adding crime rates is akin to adding a group of batting averages.)And if you don’t like all those geeky decimals, you can again right-click any cell in the Value area and take the Value Field Setting > Number Format route. (Note by the way that, multiple columns notwithstanding, all the year data in there belong to the same Year field – and because pivot tables treat a field’s data of a piece, formatting one cell here formats them all.)
A Top-Rank Feature
Now watch this space – but before you rigidify your gaze make way for abject apology no. 2, really a reiterated one (see previous post). So far as I know, the feature we’re about to expound swept in with the 2010 release, and so 2007 users will have to sit this one out. I told you I was sorry.
Click anywhere among the data and then click PivotTable Tools > Options > Show Values As > Rank Largest to Smallest. You should see:
Click OK. What you get is an incisive, year-by-year arraying of crime rankings, vesting the lower numbers in the more crime-ridden cities – that is, largest to smallest:
Had you nominated Year as the Base Field instead, you’ve have strung together an intra-city crime ranking, in which each city’s yearly rates would be compared to that selfsame city’s other years (try it).
I’d say that’s pretty striking. Note Tukwila’s absolute hegemony over the number 1 crime rank, holding fast across all 27 years. And if you spot-check the source data, you’ll also note the spike in the city’s population from 1989 to 1990, along with a commensurate, nearly 50% drop in crime across these adjoining years (not enough to displace it from the first position, alas). I placed that curiosity before Dr. Nathan Brown, the spreadsheet’s designer, and he ventured that some manner of territorial annexation might explain both, i.e., a city merger with nearby, more peaceable districts. But that’s something to be scrutinized, along with all sorts of other data possibilities in there (for example – how about another calculated field shaping the yearly proportions of property to violent crime?).
Advising me to watch for new articles on the data, Dr. Brown allowed that Tukwila and other cities show some “interesting multi-year patterns”. Is it too nerdy to suggest that he’s right?