Let’s keep an exciting thing going, and in line with our leitmotif de semaine that means extending our look at North American Municipal Services by tip-toeing across the Canada-US border to the great city of Boston, home of the world champion Red Sox (that’s a xenophopic baseball reference, if you’re reading this in Croatia), and its scintillating log of property code violations for calendar 2013.
And true – if you are reading this in Croatia, keeping yourself current with Boston’s city’s improper trash storage infractions and the like isn’t bulking large on your to-do list; but the spreadsheet’s the thing, and all that poking for patterns among the trash and overfilled barrels means to speak to some larger analytical truths – we hope.
The data in question are archived on Boston’s open data site, with its now-familiar interface (e.g., it nearly reproduces the New York open data environment; see my April 18 and 15, 2013 posts), under the name Code Enforcement _-_Building_and_Property_Violations (the underscores suggest the data derive from some pre-spreadsheet native source). The on-site original packs about 27 megabytes of low-grade urban misbehavior into its rows, is continually updated (the January 8, 2014 violations have already been duly filed), and it dates its recording activity back to 2003; but in the interests of concision I’ve squeezed the tweezers on only the 2013 data right here:
Even with all that abridging, the sheet rams 4.65 MB into your RAM – not quite asphyxiating, to be sure, but in matters of file size, smaller is always better. And in fact you’re almost sure to find a good many of the data’s 16 fields expendable and hence amenable to deletion, although you’re perfectly free to simply leave those fields alone and ignore them. If you ask me I’d show the door to Ticket_ No, St No, StHigh (presumably a consecutive number for sites with shared addresses, e.g., 45-47 Broadway), Street, Suffix, State (all the data are MA or Massachusetts-centric, after all), and Location. That last-named parameter warrants the sack simply because it implodes a location’s latitude and longitude into a single text-stamped field, even as the two columns to its left split lats and longs into far more usable states. (I haven’t made my mind up about Code and Value, as the former presumably dusts off some statutory entry that might ultimately prove material. I don’t yet know what the codes and values mean, though). These cutbacks relieve the file of about 1.5 MB, and the following treatment assumes you’ve implemented these field austerity measures (and yes, you’ll also want to conduct the usual column auto-fit upon the remaining fields).
But before we actually step through the data you’ll also want and need to do something about another, classic formatting puzzlement that badgers the Zip field. Zip stands for Zip Code (Postal Code to you, if you get your mail in the UK), a five-digit locational identifier that homes in on your home for precision’s sake (Zip codes nowadays have also been stretched by an additional four-digit extensions, but I do digress). That’s right, I said five digits – but you’re viewing only four of them per entry (though eleven records are entirely code-less), and in order to understand that deficit you need to know something about zip coding protocols and, more to the point, how Excel treats leading zeros.
American zip codes ascend numerically from their starting point in the country’s northeast region, in seeming obeisance to the country’s historical, transcontinental unfolding. Massachusetts, the state of which Boston is the capitol, maps itself in that very region (called New England, in fact), and as a result Massachusetts zip codes begin with the number 0 – the number that is nowhere to be found among the codes.
The problem is that any default-governed attempt to enter, say
will be repackaged by Excel as 2128, because the leading 0 adds no substantive value to the entry, and as such will be disregarded by the cell (see also my January 14, 2013 post). The same fate would befall an ID number such as 001, for example, being distilled back to 1.
But help is available, via a variety of benefactions. The programmatic, textbook remedy asks you to select the Zip column and right-click any cell therein (or turn to the Format Cells button on the Home ribbon). Next click the Number tab > Special Zip Code, and OK:
Each and every Boston zip code now assumes its leading zero. The Zip Code routine has clearly foretold the 0 problem, and readied its digit(s) for just the kind of data exemplified by the Boston codes.
But keep in mind that the Zip Code format is very much off-the-shelf, and thus isn’t up to any and all data entry challenges you may have to confront. If you enter the number 1 in the hopes that Zip Code will pad the entry into your ID 001, you’re going to get 0001 instead, because Zip Code always forces five-digit solutions on existing cell contents, and achieves that length by shipping insurgent zeros – and only zeroes – into the expressions. If the value already possesses at least five digits, Zip Code simply leaves the number alone.
But ok – exactly how do you get that coveted 001, in the event you merely type 1? It goes something like this: enter 1 in the desired cell. Right click back into Format Cells but here click Custom, and enter in the Type field
Here, however, the zeros play a role slightly less literal than the one with which they’re charged in their Zip Code capacity. The zeros above stand for a general place-holding code, and not an absolutely-valued zero. Each zero here is assigned a column, such that if I enter, say, 23 in any cell and subject it to the above triple-zero Custom format, I’ll realize 023. If I want 0023, I’ll have to enter 0000 in the Type field.
And I hope to actually get to those violations in the next post.