NYC School Attendance Data, Part 2: What I’ve Learned

23 Aug

Once we’ve decided we’re pleased with the New York City school attendance data in their current, emended state (per last week’s post), we can move on to ask some obvious but edifying questions about what we’ve found.

First, a breakout of attendance percentages by day of the week is something we – and certainly Board of Education officials – will want to see. That means deciding, again, whether to break out the attendance percentages arrayed in the %_OF_ATTD_TAKEN field, the numbers we derived with our ActualTotals calculated field, or both. ActualTotals accords numerical parity to each and every student, and as such it seems to me the fitter for purpose here (of course we could deploy both fields, but let me err on the side of presentational tidiness).

But in the course of tooling through and sorting the data by the above %_OF_ATTD_TAKEN, I met up with a few additional complications. Sort that field Smallest to Largest, and you’ll have gathered a large number of records reporting days on which absolutely no students attended their respective schools – 7,245 to be exact; and while an accounting for these nullities can’t be developed directly from the dataset, we could be facing an instance of mass, errant data entry, and/or evidence of a requirement to furnish a daily record for a day on which classes weren’t held. And in fact, over 14,000 records attest to attendance levels beneath 50% on their days, and I don’t know what that means either. It all justifies a concerted look.

But in the interests of drawing a line somewhere, let’s sort %_OF_ATTD_TAKEN Largest to Smallest and split the data above row 513796 – the first to bear a 0 attendance percentage – with a blank row, thus preserving an operative, remaining dataset of 513974 records. But still, I’d submit that more thinking needs to be done about the low-attendance data.
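For readers working outside the spreadsheet, the same sort-and-split can be sketched in Python. This is only an illustration under assumptions: the four rows are made up, the School key is a hypothetical stand-in for whatever identifies a school in the data, and only %_OF_ATTD_TAKEN is a field name taken from the dataset.

```python
# Made-up sample rows; only the %_OF_ATTD_TAKEN name comes from the dataset.
PCT = "%_OF_ATTD_TAKEN"
records = [
    {"School": "A", PCT: 93.5},
    {"School": "B", PCT: 0.0},
    {"School": "C", PCT: 47.2},
    {"School": "D", PCT: 88.0},
]

# Sort Largest to Smallest, as in the worksheet.
ordered = sorted(records, key=lambda r: r[PCT], reverse=True)

# The blank row's job: everything above the first zero is the working dataset.
working = [r for r in ordered if r[PCT] > 0]
zero_days = [r for r in ordered if r[PCT] == 0]

# Records that survive the split but still report under 50% attendance.
below_half = [r for r in working if r[PCT] < 50]

print(len(working), len(zero_days), len(below_half))
```

Keeping the zero-attendance and sub-50% records in their own lists, rather than discarding them, leaves them on hand for the further thinking they deserve.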

Returning now to our day-of-the-week concerns, the pivot table that follows is rather straightforward:

Rows: Day

Values: ActualTotals

I get:


(Note that you’d likely want to rename that default Sum of ActualTotals header, because the calculated field formula itself comprises an average, in effect – NumberStudents/REGISTER*100. You’ll also want to know that calculated fields gray out the Summarize Values option, and thus invariably and only sum their data. Remember also that 2 signifies Monday.)
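Because a calculated field works on the summed fields, what the pivot table reports for each day is in effect total students present divided by total students registered, not an average of per-school percentages. A small Python sketch with made-up numbers (a 100-student school and a 500-student school) shows how the two can part company:

```python
from collections import defaultdict

# Made-up rows: (day number, NumberStudents, REGISTER); 2 signifies Monday.
rows = [
    (2, 90, 100),
    (2, 400, 500),
    (3, 95, 100),
    (3, 450, 500),
]

present = defaultdict(int)
enrolled = defaultdict(int)
pct_lists = defaultdict(list)

for day, number_students, register in rows:
    present[day] += number_students
    enrolled[day] += register
    pct_lists[day].append(number_students / register * 100)

# What the calculated field does: sum the numerator, sum the denominator, divide.
pooled = {d: present[d] / enrolled[d] * 100 for d in present}

# What averaging a per-record percentage field would do instead.
mean_of_pcts = {d: sum(v) / len(v) for d, v in pct_lists.items()}

for d in sorted(pooled):
    print(d, round(pooled[d], 2), round(mean_of_pcts[d], 2))
```

The pooled figure weights every student equally; the mean of percentages weights every record equally, so the small school counts for as much as the large one.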

I for one was surprised by the near-constancy of the above figures. I would have assumed that the centrifugal pull of the fringes of the week – Monday and Friday – would have induced a cohort of no-shows larger than the ones we see, though attendance indeed slinks back a bit on those two days. But near-constancy does feature strikingly in the middle of the week.

And what of comparative attendance rates by borough? Remember we manufactured a Borough field last week, and so:

Rows: Borough

Values: ActualTotals

I get:


By way of reorientation, those initials point to these boroughs:

K – Brooklyn

M – Manhattan

Q – Queens

R – Richmond (Staten Island)

X – The Bronx

The disparities here are instructive. Queens students are literally the most attentive, with Bronx students the least. Of course, these outcomes call for a close drilldown into the contributory variables – e.g., economic class, ethnicity, and more – that can’t be performed here.
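The borough breakout can be sketched in Python as well. I'm assuming here that the borough initial is the third character of the school code (as it is in 02M475); every code and count in the sample rows is made up for illustration.

```python
from collections import defaultdict

BOROUGHS = {
    "K": "Brooklyn",
    "M": "Manhattan",
    "Q": "Queens",
    "R": "Richmond (Staten Island)",
    "X": "The Bronx",
}

def borough_of(school_code):
    # Assumption: the third character of the code is the borough initial.
    return BOROUGHS[school_code[2]]

# Made-up rows: (hypothetical school code, NumberStudents, REGISTER).
rows = [
    ("99M001", 900, 1000),
    ("99Q001", 950, 1000),
    ("99X001", 800, 1000),
    ("99X002", 850, 1000),
]

present = defaultdict(int)
enrolled = defaultdict(int)
for code, number_students, register in rows:
    name = borough_of(code)
    present[name] += number_students
    enrolled[name] += register

# Pooled rate per borough, mirroring the calculated field's sum-then-divide.
rates = {name: present[name] / enrolled[name] * 100 for name in present}
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(rate, 2))
```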

We can next try to learn something about attendance rates by month, understanding that the data encompass two school years. Try

Rows: Date (grouped by Months only)

Values: ActualTotals

I get:


The school year of course commences in September, with those early days perhaps instilling a nascent, if impermanent, ardor for heading to class. We see that attendance peaks in October, and begins to incline toward the overall average in December.

The question needs to be asked about June, or Junes, in which the attendance aggregate crashes to 85.21%, deteriorating 4.69% from the preceding May(s). While explanations do not volunteer themselves from the data, an obvious surmise rises to the surface – namely, that students beholding the year’s finish line, and perhaps having completed all material schoolwork and exams, may have decided to grab a few discretionary absences here and there. It’s been known to happen.
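One thing to keep in mind about grouping by Months only: both school years' Junes (and Septembers, and so on) land in the same bucket. A short Python sketch with made-up dates and counts shows the pooling, alongside the year-and-month alternative that would keep the two Junes apart:

```python
from collections import defaultdict
from datetime import date

# Made-up rows: (date, NumberStudents, REGISTER).
rows = [
    (date(2015, 9, 10), 900, 1000),
    (date(2016, 9, 12), 940, 1000),
    (date(2016, 6, 20), 850, 1000),
    (date(2017, 6, 19), 860, 1000),
]

def pooled_rates(key):
    """Sum present and registered per group, then divide."""
    present = defaultdict(int)
    enrolled = defaultdict(int)
    for day, number_students, register in rows:
        present[key(day)] += number_students
        enrolled[key(day)] += register
    return {k: present[k] / enrolled[k] * 100 for k in present}

# Months only: the two Junes are merged into one figure.
by_month = pooled_rates(lambda d: d.month)

# Year and month: each June gets its own figure.
by_year_month = pooled_rates(lambda d: (d.year, d.month))

print(by_month)
print(by_year_month)
```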

But let’s get back to those zero-attendance days and their polar opposite, days on which every student on a school’s roster appeared, or at least was there at 4 pm. The data show 1,641 records in which each and every enrollee in the referenced institution was there, a count that includes 31 days’ worth of school code 02M475, i.e. Manhattan’s renowned Stuyvesant High School; a pretty extraordinary feat, in view of the school’s complement of around 3,300. And while we’re distributing kudos, mark down September 21, 2016, the day on which all 3,965 of Staten Island’s Tottenville High School registrants showed up, and June 24 of that year – a Friday, no less – on which the entire, 3,806-strong enrollment of Queens’ Forest Hills High School settled into their home rooms. But OK; you could insist that these laudable numbers should likewise be subjected to a round or two of scrutiny, and you’d probably be right.

Now for a bit of granularity, we could simply calculate the average daily attendance rate for each school and sort the results, and at the same time get some sense of whether attendance correlates with school size as well. It could look something like this:


Values: REGISTER (Average, to two decimals)

%_OF_ATTD_TAKEN (Average, to two decimals)

Remember first of all that a given school’s enrollment is by no means constant, swinging lightly both within and across school years. Remember as well that because you can’t average calculated field totals, I’ve reverted to the %_OF_ATTD_TAKEN field that’s native to the data set.
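The per-school averages can be sketched in Python too. The school labels and numbers below are made up, and note that averaging %_OF_ATTD_TAKEN gives every daily record equal weight, which is a different quantity from the pooled ActualTotals figure used earlier.

```python
from collections import defaultdict
from statistics import mean

# Made-up daily rows: (hypothetical school label, REGISTER, %_OF_ATTD_TAKEN).
rows = [
    ("S1", 500, 96.0),
    ("S1", 510, 94.0),
    ("S2", 3300, 91.0),
    ("S2", 3290, 90.0),
    ("S3", 200, 42.0),
]

registers = defaultdict(list)
percentages = defaultdict(list)
for school, register, pct in rows:
    registers[school].append(register)
    percentages[school].append(pct)

# One line per school: average enrollment and average daily attendance rate.
summary = [
    (school, round(mean(registers[school]), 2), round(mean(percentages[school]), 2))
    for school in registers
]

# Sort by average attendance, Largest to Smallest.
summary.sort(key=lambda row: row[2], reverse=True)

for school, avg_register, avg_pct in summary:
    print(school, avg_register, avg_pct)
```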

Sort by %_OF_ATTD_TAKEN from Largest to Smallest and I get, in excerpt:


That’s the Christa McAuliffe School (named after the astronaut who died in the Challenger explosion) in Bensonhurst, Brooklyn on the valedictorian’s podium, followed very closely by the Vincent D. Grippo School (physically close to the McAuliffe school, too). And if you’re wondering, I find Stuyvesant High School in the 907th position, with its daily attendance average put at 90.87. Tottenville High, interestingly enough, pulls in at 1155 out of the 1590 schools, its average figuring to a below-average 87.43. At the low end, the Research and Service High School in Brooklyn’s Crown Heights reports a disturbing 41.84% reading, topped slightly by the Bronx’s Crotona Academy High School (remember that you can learn more about each school by entering its code in Google). These readings merit a more determined look, too.

And for that correlation between school size and attendance: because my pivot data start on row 4, I could enter, somewhere:


I get .170, a small positive association between the parameters. That is, with increased school size comes a trifling increment in attendance rates.
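Assuming the correlation in question is the ordinary Pearson coefficient (the kind Excel's CORREL function returns), the calculation can be sketched in Python. The two lists are made-up stand-ins for the pivot table's average-REGISTER and average-%_OF_ATTD_TAKEN columns, so the result will not reproduce the .170 above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and each variable's sum of squares.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Made-up per-school averages, one pair per school.
avg_register = [200, 505, 900, 3295]
avg_pct = [88.0, 95.0, 86.0, 90.5]

r = pearson(avg_register, avg_pct)
print(round(r, 3))
```

A coefficient near 0 means school size tells us next to nothing about attendance; values near 1 or -1 would signal a strong positive or negative linear association.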

But if you want to learn more about correlations you’ll have to come to class.
