The FactFinder Data: My Two Cents About the Census

7 Feb

I’m a legal alien, which means that my star-spangled passport gets stamped in Heathrow these days at journey’s end; it’s what happens when you marry a Londoner and move to her side of the pond. But I can still access scads of demographic data about my country at

http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml

and so can you, even if Uncle Sam won’t cough up that green card.

Once you’re onsite, a variety of access routes to the data point themselves at you, guiding the way with signposts of variable clarity; and once you get where you want to go the data beckon to be downloaded to one of a range of formats including Excel, and that’s where we come in. Now prepare to be disturbed.

What do I mean? To appreciate that caution, let’s download this workbook, which details the demographics of the 10023 zip code on Manhattan’s Upper West Side:

 ACS_11_5YR_DP0210023

Put the data in (what appears to be) the E column in your sights and consider the numbers there: that tell-tale left alignment, their upper-left cell corners discreetly festooned with that green indicator and companion exclamation-marked alarm, boding trouble:

us1

Click the mark, and Excel proceeds to confirm your disquiet: these are numbers manqué, valueless, textual interlopers who can’t be added, subtracted, or pivot tabled – at least not yet. To date, every Census workbook I’ve downloaded – a paltry numerator juxtaposed atop the whole, to be sure – freights its numbers with the same impairment, and I don’t know why.

But help is available. You could select the entries in the E-column and click that exclamation mark, divulging the Convert to Number option:

us2

 A click there should restore the data to quantitative form. (You could also select the cells and click Number in the drop-down menu in the Number button group on the Home tab.)

But it’s fit and proper to ask why we’re bidden in the first place to apply remedies to a data-management issue that should have been pre-empted before it hit our screens.

And there’s more. I seem to have placed the above-referenced text/value data in the E column to date, but I’m not too proud to admit I’ve been wrong. The name box in the above shot identifies the selected cell as D10, even as the Alignment button group supplies notice that D10 has been merged. And it has; in fact cells D7 through D199 have amalgamated with the corresponding, same-row cells in the E and F columns, and again I remain mystified by the procedure. It’s possible that a wish to center the title in cell D7 across columns D through I inspired the merge, but that title yet clings to its default left alignment, and there appears to be no need to have extended the merge motif to the cells beneath it. No pun intended, but where’s the justification? Note as well the merge of the text-laden content in what were singly B3 and B4, along with the no-less-curious merge engineered in B210.

The suspicion, then, is that these data have crossed the transom from some other application, and suffered the proverbial loss in translation as a result. The suggestion, then, is that the sheets should have submitted to the necessary retooling before they backed into the web site. (And yes, I’ve seen the merge problem in another Census sheet, too.)

Note also the Wrap Text jag by which the sheet seems to have seized. It appears as if every data-bearing cell has capitulated to that tweak, and some cells have submitted both to Wrap Text and Merge & Center, and I don’t even know what’s supposed to happen to the data when that tandem strikes. What we see, however, is that cell I8 – both merged and wrapped – appears to us as Percent, when in fact the entry really states Percent Margin of Error. Check the Formula Bar.

All of which stirs perhaps even a larger question, one you – and the Census Bureau – may deem unfairly posed. Does any profit accrue to downloading the data in spreadsheet form, in view of the way they’re organized on this and the other Census sheets I’ve viewed? The Census download options include PDF mode – a proposal to which you’d reply: ah, but saving to PDF means you won’t be able to do anything with the data. That’s basically true, but you need to ask what you’ll be able to do in any event, even when it’s Excel that incarnates the data. Study the workbook and think about where you’ll be able to take the data next. Here fertility data are arrayed with household, marital status, and school enrollment figures, and quite apart from the itinerant blank rows, and total rows that cause the kind of redundancy problems I recounted in last week’s post, it’s proper to wonder where the data can go next. I’ve asked the question in an earlier post, but I’ll issue it anew: where can you take the sheet from here?

An unfair question? Perhaps it is. It may suffice to read the data and learn from them, and that’s fine. There’s a great deal in the Census site to learn. But a spreadsheet is an admirably unstable thing; prime its data properly and it can potentiate into something new and instructive, something that wasn’t there before. Is it treasonous, then, to allow that the Census data – at least the data I’ve seen – won’t allow that to happen?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: