The question with which we wound up the previous post bears repeating, so let me repeat it: does a cognizance of the data quality issues that compromise the US school dataset inflict a failing grade upon the workbook? That is, are we right to proceed with a look at the data just the same and give them a proper hearing, or should we bail out and take our pivot tables elsewhere?

I’d allow that the answer I submitted in that post bears repeating too – i.e., something like a yes. It seems to me that an analysis should properly go ahead, because the weight of the records – all 97,000 or so of the usable ones (let’s remember that number) – should, by virtue of its very size, yield a plausible summary of the shape and quantity of America’s schools. Again, of course, the numbers are “wrong”; but they’re invariably wrong for any demographic take built on US census data, which after all date from 2010. Still, some general picture of the country’s institutional landscape should be there for the limning, if we’re persuaded.

We can commence, then, by performing a simple count and breakdown of schools by state:

Rows: STATE

Values: STATE (Sorted, Largest to Smallest)

I get, in excerpt:

You’re not likely to be surprised by the above. America’s most populous state, California, predictably registers the most schools, though a check of Wikipedia’s state population totals – furnished both for the 2010 census and for 2016 estimates – points to a relation between population and school numbers that isn’t quite linear. Note in addition that our pivot table makes no place for the 28 schools in American Samoa, though here it could have; you’ll recall that we provisionally banished that territory’s records from the data because its student totals were unavailable. But since we’re counting school names alone, the AS data could have been accommodated – its school names, at least, are indeed listed.
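(If you’d like to reproduce the pivot outside the spreadsheet, the same count sorts out in a few lines of pandas. The toy rows and the SCH_NAME column name are my own inventions; only the STATE field comes from the dataset.)

```python
import pandas as pd

# Toy stand-in for the workbook's rows; SCH_NAME is a hypothetical column name.
schools = pd.DataFrame({
    "STATE": ["CA", "CA", "CA", "TX", "TX", "MT"],
    "SCH_NAME": ["A", "B", "C", "D", "E", "F"],
})

# Pivot equivalent: Rows = STATE, Values = Count of STATE,
# sorted Largest to Smallest.
counts = schools.groupby("STATE")["SCH_NAME"].count().sort_values(ascending=False)
print(counts)
```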

Next, we could just as easily consider the average number of students per school by state, but before pressing ahead let’s entertain a pre-verification surmise: namely, do the more populous states average more students per school? That’s a sensible conjecture, but one that need not invariably follow; after all, it’s possible that the sparser states simply maintain fewer schools, whose average size compares with that of the denser ones.

Let’s see. You can leave STATE alone in Values, and add to it ENROLLMENT, ticking Average. Then sort the results by ENROLLMENT. I get, in part:

That’s Guam at the head of the class, its 40 schools averaging 781 students per institution (again, the bundle of 55 “STATEs” owes its hyper-count to the inclusion of American territories). At bottom sits sparsely populated Montana, whose per-school average of 171 makes it easy for teachers to space their students at exam time, if nothing else.
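(The average-enrollment pivot likewise reduces to a one-line groupby in pandas – a sketch with invented enrollments, chosen so the toy averages echo the Guam and Montana figures above.)

```python
import pandas as pd

# Invented enrollments; the resulting averages mimic the 781 and 171 in the text.
schools = pd.DataFrame({
    "STATE": ["GU", "GU", "MT", "MT"],
    "ENROLLMENT": [800, 762, 150, 192],
})

# Pivot equivalent: Values = Average of ENROLLMENT, sorted Largest to Smallest.
avg = schools.groupby("STATE")["ENROLLMENT"].mean().sort_values(ascending=False)
print(avg)  # GU 781.0, MT 171.0
```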

The disparities here, then, affirm a most mixed set of students-per-school averages, recalling the question of whether the numbers of schools by state indeed correlate with average school size. A rough read on that relationship becomes available with CORREL, in my case looking like this:

=CORREL(B4:B58,C4:C58)

And that expression evaluates to .3117, mustering a loose but meaningful association between the two variables – that is, as the number of schools across the states ascends, so does average school size. At least slightly.
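(CORREL’s pandas counterpart is Series.corr, which defaults to the same Pearson coefficient – a sketch over invented state-level aggregates, not the workbook’s actual columns B and C.)

```python
import pandas as pd

# Hypothetical per-state aggregates: school counts and average enrollments.
by_state = pd.DataFrame({
    "SchoolCount": [100, 200, 300, 400],
    "AvgEnrollment": [400, 450, 500, 700],
})

# Same Pearson r that =CORREL(B4:B58,C4:C58) returns in the sheet.
r = by_state["SchoolCount"].corr(by_state["AvgEnrollment"])
print(r)
```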

And what of teacher-student ratios across the states? Again, a preliminary question, one we’ve raised several times in the past, needs to be asked and answered: does one calculate the ratios by according equal weight to every school irrespective of absolute school size, or does one instead compute a grand average of all of a state’s students divided by all its teachers (apparently defined here as full-time, and so perhaps counting two half-time instructors as one full-timer)?

In fact, both alternatives are credible. There may be good analytical reason to treat schools as equally-sized entities in the averages, in the same way, for example, that we could compare the standards of living of residents in different countries. And weighting 100- and 1000-student schools equivalently can’t be worse than granting one vote to every country in the United Nations General Assembly or apportioning two US Senators to each state, can it?
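(To see the two alternatives diverge, here’s a pandas sketch with two made-up schools – one of 100 students, one of 1,000. The unweighted mean treats them as peers; the grand average lets the big school dominate.)

```python
import pandas as pd

# Two hypothetical schools of very different sizes.
schools = pd.DataFrame({
    "ENROLLMENT": [100, 1000],
    "FT_TEACHER": [10, 50],
})

# Alternative 1: average each school's own ratio (every school votes equally).
unweighted = (schools["ENROLLMENT"] / schools["FT_TEACHER"]).mean()   # (10 + 20) / 2 = 15.0

# Alternative 2: grand average - all students over all teachers.
weighted = schools["ENROLLMENT"].sum() / schools["FT_TEACHER"].sum()  # 1100 / 60, about 18.33
```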

But before we tangle with the numbers we need to broaden our look at the FT_TEACHER field, because a measurable batch of records there reports faculty complements at zero or in the negative numbers. Sort the field Largest to Smallest, and lower a blank row atop the first zero. But you know that routine.
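(In pandas terms that blank-row maneuver is a plain filter – a sketch with invented rows, keeping only schools that report a positive faculty count.)

```python
import pandas as pd

# Invented rows illustrating the zero/negative FT_TEACHER problem;
# SCH_NAME is a hypothetical column name.
schools = pd.DataFrame({
    "SCH_NAME": ["A", "B", "C", "D"],
    "FT_TEACHER": [25.0, 0.0, -1.0, 12.5],
})

# Keep only the usable records, i.e. those above the "blank row" cutoff.
usable = schools[schools["FT_TEACHER"] > 0]
```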

Now if we pursue the first option, by in effect averaging each school’s student-faculty ratio, we need to title the next available column (where that is depends on how many fields you deleted in the last post) StuStaffRatio or something like it, and simply enter a formula in row 2 dividing ENROLLMENT by FT_TEACHER and copy it down (again, column references may vary). Back in the pivot table, you need to rewrite the range coordinates of the source data set (in PivotTable Tools > Analyze > Change Data Source), and follow up by recruiting StuStaffRatio into Values, averaging the results to two decimal places. I get, in excerpt, after sorting the field Largest to Smallest:

The inter-state variation is considerable, and it’s most noteworthy that, large school sizes notwithstanding, Guam has managed to ensure a pleasingly low (13.75) student-to-teacher ratio – so low you can’t see it in the screen shot – with California faring worst, at 23.76.

Now a suitable working hypothesis might foretell a perfect correlation between average enrollments and faculty sizes, irrespective of the absolute state numbers involved. Thus alerted to that prospect, we could write

=CORREL(C4:C58,D4:D58)

I get .4076, pretty serviceable by social-scientific standards, but surely not a 1. In other words, a real if imperfect positive association obtains between schools’ enrollments and the sizes of their teaching teams.

And speaking of school numbers, if you sort the ENROLLMENT field Largest to Smallest you’ll discover the honors going to the Ohio Virtual Academy, an online institution that serves kindergartners through 12th graders and 11,640 students – somewhere (its governmental web page counts 11,278 as of the 2014-15 school year).
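(That sort-and-peek reduces, in pandas, to an idxmax lookup – toy rows again, apart from the Ohio Virtual Academy figure quoted above; the other school names are invented.)

```python
import pandas as pd

# Toy rows; only the Ohio Virtual Academy enrollment comes from the text.
schools = pd.DataFrame({
    "SCH_NAME": ["Ohio Virtual Academy", "Anytown High", "Smallville Elementary"],
    "ENROLLMENT": [11640, 3200, 450],
})

# Equivalent of sorting ENROLLMENT Largest to Smallest and reading the top row.
top = schools.loc[schools["ENROLLMENT"].idxmax()]
print(top["SCH_NAME"], top["ENROLLMENT"])
```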

That means there are some crossing-guard monitors in Ohio with a whole lot of time on their hands.