The World as Spreadsheet, Part 2: Heat-Mapping the Numbers

1 Nov

The Plot Thus Far

Now, let’s see, where was I…..oh yes, it’s over here. Our reconnoiter of Washington DC crime data developed the intelligence to stock a pivot table that pinned crimes to place as reckoned by their latitudes and longitudes, thus mapping the numbers with some geographic fidelity, e.g.:

But before we ascend to the next step – a conditional formatting of the numbers that’ll color-code crime incidence along an frequency continuum – we need to post a codicil to the pivot tabling activity we described in Part 1, one pointing to a potential disjuncture in the maps.

Pivot tables work with the data they have – meaning in our case that the grouped lats and longs derive from, and only from, the actual coordinates lodged in the contributory source data. Any missing coordinates – or put very simply, those lats or longs running somewhere down or across DC that went crime-free – will appear neither in the data nor their kindred pivot table; and that near-obvious truth drives the reality that pivot tables won’t fill in those lat/long gaps, and extrapolate them to the Row or Column Label areas. For example, if the grouped latitude 77.054–77.044you see above had experienced no crimes, the table would have suffered a corresponding absence of those numbers – with a correlative crimp in the “map’s” proportionality.

Needless to say, a city reporting over 32,000 incidents is likely to bring all its lats and longs to the table (to a given degree of precision, at least), but the problem could beset other, smaller data sets. (One remedy: add dummy records of the missing lats and longs.  They will convey no crime data, but because they’ll worm their way into the pivot table they’ll help pave over any lat/long dearths in the Row and/or Column areas.)

And Finally, Those Conditional Formats

And once your numbers crunch melodiously you can head toward the conditional formatting department. Remember that our intention is to plot a heat map of the crime data, whereby various crime magnitudes are variously colored, along some continuum of shades – and that’s exactly the sort of thing conditional formats do.

You’ve probably had some prior dealings with this valuable feature, through which data are colored, italicized, or otherwise reformatted, provided they satisfy one (or in theory even several) conditions. In fact most conditional formats are readily and simply on call, particularly the ones marqueed here:

 

(though the options can get increasingly opaque as you descend that menu. Experimentation here can both enlighten and frustrate; don’t say I didn’t warn you. The experience can be Kafkaesque). But in any case, because we’re working with a pivot table the contingencies are slightly different, and so we need to click anywhere inside the pivot table data area (you need to click here, on not in the Row or Label areas) and click Home tab > Conditional Formatting in the Styles button group > New Rule:  You should see this:

 

 

While the above menu won’t make any user friends I wasn’t consulted about it, and we’ll have to make do with it as is. And in fact, to give credit where’s it’s due, Excel has rightly noted its pivot table surroundings by asking us, in the menu’s upper third, exactly which segment of the table we want to conditionally format, though its three answers need explaining.

Selected cells, the default and least likely option, will conditionally format only those cell(s) on which you’ve happened to click in order to trigger the menu to begin with. The All cells showing “Count of Offense” values selection will consider every data cell in the pivot table, including those addressed in any Total rows or columns, should these be turned on. You’ll almost surely pass on this option too, because Totals data will by definition be freighted with the largest numbers, and as such will decisively skew the conditional format coloring. We’re thus left with option 3, All cells showing “Count of OFFENSE” values for “LAT” and “LONG” – a curious-sounding locution, because it simply enumerates all of the fields currently in force in the table, and doesn’t appear to tell you anything you don’t already know. But because it’s possible, for example, that the Row Labels area could receive two fields at the same time – in which a superordinate field would be breaking out a subordinate one (e.g., Ward breaking out LAT) – this third option would then require you to start the conditional format by clicking on a cell positioned at the intersection of two of the three fields installed in the labels areas – either Ward and LONG, or LAT and LONG. But all right; if that anodyne digression has left you reaching for your brew, not to worry. That eventuality isn’t happening here.

In any event, we’re going to adopt the most programmatic conditional format available to us on the menu – the one holding down its lower third, offering up that 2-color scale possibility. What we’re about to get is that continuum, the one that tints the pivot table data according to their respective size. All you need to do is select the colors you want – the ones that will anchor what Excel terms the data’s Lowest and Highest values. I’ll work with the defaults; and so after making sure I’ve clicked that third menu option I endorsed above, I’ll just go ahead and click OK. I see this, more or less:

 

Excel has brought some heat to the map, color-grading the data by their relation to their highest and lowest numbers dotting the pivot table. If you don’t like the gradation, return to the conditional format and pick two different colors (and you start the revision process by clicking Conditional Formatting > Manage Rules > Edit Rule, which you’ll find when the Conditional Formatting Rules Manager window opens). It’s mix and match – or, rather, mix and contrast.

That procedural part is easy, but it comes with a qualification. What I can’t identify yet are the intervals across the resulting color continuum at which one color gives way to the next. As we’ve indicated, our conditional format is bounded at the extremes by Highest and Lowest-number colors, but  the data in-betweens call up intermediate-value colors as well – and I just don’t know precisely where one such color elides into the next. I’ve consulted several Excel books on this little conundrum, and I’ve yet to turn up an answer. Even my books don’t discuss it.

Now how’s that for transparency?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: