New York City Homeless Data, Part 2: A First and Second Look

24 Jun

OK – I know you’ve been wondering, about that tantalizing question with which I put lock and key to last week’s post, the one about some oddly large daily shelter census figures that seem to have found their way into New York’s Department of Homeless Services reports. I said I’d ask the DHS about them, and I did – and I haven’t heard.

Due diligence aside, we need to press on; that above concern is in any event a data-quality affair, and here we want to make some room for a couple of design/organizational points as well. Here’s one: Understood in isolation, I’d move to submerge the Total Adults in Shelter/Total Children in Shelter beneath a unitary Persons in Shelter field (a bit of centralization that would, for example, free you to filter the field by either Adults or Children and also superannuate the Total Individuals in Shelter field, which could be returned as a matter of course in a pivot table), as per my discussions on this issue that I’ve cited in any number of posts. But here – and unlike those previous deliberations – because each record comprises multiple, properly distinct and cross-cutting fields (e.g. Adults in Centers, Families with Children and the like), the sheet can’t hope to perform multiple consolidations on multiple fields, and have that all inform a single record. It’s one thing to rein these subject fields:


into one overarching Subject parameter, but quite another to do the same, simultaneously, to each and every bona fide field – because what would the reconstituted records look like?
For example – proceed with instituting a Persons in Shelter field, whose entries would look something like this:


Ok; but what are we going to do now with the discrete Single Adult Men/Women in Shelter data? We can’t here too aim to consolidate these, e.g.


Because we can’t sensibly align the Men data on, say the same row/record as Children in the previous shot, and Women with Adults. The apples and the oranges make the compote unpalatable. Conclusion, perhaps: you may just have to leave the data be, unless you’re prepared to parcel them into what are in effect sub-worksheets, for example, a sheet devoted only to Adults and Children in Shelter, another one keyed to Single Men and Women in Shelter, and so on.

Now a slightly cognate problem poses itself in the New York Open Data Directory of Homeless Population by Year worksheet:

Directory_Of_Homeless_Population_By_Year New York City

It’s a small sheet with far smaller homeless numbers, leaving me to ask about the relation of its data set to the one we’ve considered above. But a glance at the fine print appears to deliver an accounting. The Directory sheet counts the number of New York’s unsheltered homeless, and I think we must assume in addition that the single number attached to each borough per year stands for some daily estimate (I’ve put a query about that inference out there, here too without response thus far).

And if we’re prepared to acquiesce in that premise, the data read pretty straightforwardly – but you’ll at the same time need to think about the yearly rows that compile what the sheet calls Surface totals, along with Total Unsheltered Individuals. These entries aggregate the city’s homeless residing somewhere above ground, and then in the second of the two metrics combine those figures with those of individuals living in the city’s subway system. Here – as opposed to the previous, multi-variate workbook that loosed the complexities recounted earlier – we can properly and simply delete these totals, because left unattended and invested with the status of a standard record, they’ll inflict a double and triple count of any pivot tabling of the numbers. But again, because the data are small, comprising homeless data for New York’s boroughs across four years, you need recall only eight rows from the set (in fact if you sort the data by the Area field those eight happen to descend to the bottom of the data, from where that can be easily effaced).

What you won’t want to conduct, however, is a like repudiation of the four Subways records, even though each assembles what is in effect a cross-borough total by year. Because New York can’t accurately tie the subway-resident homeless to a specific borough above ground, it hasn’t much choice but to present these overall figures in single-record terms, one per year, and without any spectre of a double count. The Subway data can thus be deemed, for the purposes of the analysis, as a sixth borough.

And once that’s understood the data lend themselves deftly to this pivot table:

Row Labels: Area

Column Labels: Year

Values: Homeless Estimates (Sum):


The grand totals alternately wane and wax, and don’t map compatibly with the shelter day we reviewed last week, which pull upward. Note however the steady increments among the Subways population -accounting for 50% of all homeless in 2012 – and the notable spike in the Manhattan surface population in 2010. Are the ups and downs creatures of sampling issues, governmental interventions, or economic turnabouts?

That’s a matter for even a third look.

One Response to “New York City Homeless Data, Part 2: A First and Second Look”


  1. Assignment 6 - October 17, 2017

    […] want to get into the nitty-gritty. If you want to use this dataset, here are two blog posts (p1, p2) that provide a decent summary of how to process and use […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: