Air Traffic Data: Some Reservations, Part 2

31 Oct

The line between instructive reiteration and a point insufferably belabored may be molecularly slight, but I'll take my chances with your judgment and go ahead. There are, again, two types of spreadsheets: ones to be perused and read with a passive regard, and ones to be analyzed, and pushed and pulled about for some new, information-gleaning end. The anna.aero data in our collective sights must, as they're presently organized, answer to the first criterion. After all, the spate of inconsistencies flooding the sheets won't matter if the reader simply needs to know what's up with air traffic numbers and move on. Maybe.

I'm equivocating even here, because similar data suited up in different clothes can give pause, even to readers happy to give the data the barest minimum of attention. I'm thinking about the Var. fields at the back end of the european-airlines-traffic-trends LF (load factors) sheets, which do nothing more than subtract the previous year's airline-specific LFs from the ones in the given year's sheet.

Check out, for example, the 2012 LF sheet’s column P, and its alphabetically apt pp format:

 res1

We learn first of all that Excel happily appends purely textual addenda to its numeric entries, as per the screen shot. Move into Format Cells > Custom, type a 0 in the Type: field, and tack on the text you want. (That 0 is a code representing a numeric entry.) Cells subject to the makeover thus receive any numbers along with their alpha paddings, and with no loss of quantitative value – a very good thing to know, and good for me, too. The question here, though, is what the pp – abbreviating "percentage points", I think – is doing for or with the numbers. I'd allow that, because the actual numbers in the column simply report an absolute, subtracted difference in load factor between 2011 and 2012, the designers wanted to stave off the impression that the figures stand for some divided result, e.g., 2012/2011-1, or some such. That's a sage clarification on their part, but apart from a coarse rounding off of the values (the number in P4 is closer to .42% than the reported 1%) the numbers are hard-coded; that is, they were simply entered, sans formula. Except for the result in P22, that is, which emerges from a

=O22-B22

calculation.
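If you want the trick for your own sheets, a custom string along these lines (my reconstruction of what the designers likely typed, not a copy of it) will display an entry of 1.3 as 1pp while leaving 1.3 in the cell for any downstream math:

0"pp"

The quotation marks around the pp simply tell Excel to treat those letters as literal text rather than as format codes.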

What deepens my puzzlement, though, is the Var. field in the 2011 sheet:

res2

Those +/- 0 entries are labels, and in view of the fact that real, determinate values could have issued from the data via simple cell subtractions, I don't know why they're there; and in any case we don't see these labels in the 2012 sheet. Load Factors for 2013? They're not there, even though, with the contributory data in place, they could have been.
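Had the designers wanted real numbers there, something on this order, copied down the Var. column, would presumably have delivered them (I'm assuming the 2011 sheet stacks its columns the way the 2012 sheet does, with the given year's LF in O and the prior year's in B, and the first airline in row 5):

=O5-B5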

Now for something completely different but problematic: open the anna.aero SVID calculator below:

 

The sheet sires a kind of cousin metric to standard deviation, measuring the month-to-month variability in a given airport's traffic and culminating in the index in D19 (the readings in F19 and H19 work with demo data). Note first of all the hidden E column and its operative formula, e.g.

=((100*D5/(D$17/12))-100)^2

 

It seems to me that

=AVERAGE(D$5:D$16)

might properly replace the (D$17/12) term, thereby retiring the hard-coded 12 (formulas respond to data-entry contingencies; the number 12 can't). And one pair of parentheses (the pair opening just before the first 100 and closing after the divisor) could be safely banished too; the outermost pair has to stay, though, so that the ^2 squares the whole difference.
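Put both changes in place and the hidden-column formula would read something like:

=(100*D5/AVERAGE(D$5:D$16)-100)^2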

The calculator asks the user to paste traffic data from one of the sheets drawn from one of the links stowed beneath the calculator (the American traffic link doesn't work, by the way; you'll need to divert to the anna.aero database page directly for those data); but note that the data therein are planed (sorry!) horizontally, even as the calculator wants them oriented along the vertical. As such, you'll need to hurtle through a Copy-Paste Special-Transpose hoop, one that could have been cleared from the process.
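One way the hoop could have been designed out, by the way: let users paste the horizontal data into a scratch row and have the calculator flip them itself. Assuming a scratch row at, say, B22:M22 (entirely my invention), D5:D16 could be filled with

=TRANSPOSE(B22:M22)

entered as an array formula across D5:D16 (Ctrl-Shift-Enter in the Excel of 2014; current versions simply spill the result).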

But it's the formula in D20 – the one that returns the textual airport rating – that's the grabber. It reads:

=IF(D19<$A$24,$D$23,IF(D19<$A$25,$D$24,IF(D19<$A$26,$D$25,IF(D19<$B$27,$D$26, $D$27))))

 

It's nesting a series of IF statements, each assessing a value falling somewhere between the numbers in A23:A27. (Note, by the way, the space poking between the $D$26 and $D$27 references at the back end of the formula. I didn't know you could do that.) Now if you're wondering why the iterations above, which propose alternative after alternative in the event the number in D19 meets or fails to meet each logical test, couldn't have been emulated by a garden-variety VLOOKUP formula, I share your curiosity, but there's a reason – sort of.

The reason is supplied by the isolated > sign, nothing but a label all by its lonesome in A27, and in a range otherwise populated by values (those hyphenated constructs, e.g. 0.0 – 1.9, pull across three different cells; only the 0.0 finds itself in A23). And because that "greater than" is there, VLOOKUP can't work with it. Thus the sheet designer plumped for the formulaic plan B above.

 res3

And if you doubt me, consider this excerpt from the formula:

IF(D19<$B$27,$D$26, $D$27))))

 

All the other IFs in there test values in A23:A26, but suddenly we're testing a value in the B column – B27 – because that's where the actual, quantifiable 100 sits; and because that's where it is, it can't contribute to the lookup column in A. And that's why that flurry of IFs jams the formula.

But I'll continue to allow that a VLOOKUP will work, particularly once you scan back a couple hundred words and rethink that pp numeric format we reviewed all the way up there. Click in A27, return to Format Cells > Custom, type >0 (again, the zero is a coded placeholder for a numeric value) and ratify it with OK. Then simply type 100 in A27, and you should see >100, in all its adorned but value-sporting glory. Then enter

=VLOOKUP(D19,A23:D27,4)

in D20. Now when and if you paste air traffic numbers into D5:D16, you'll summon the same textual results evoked by the IF-laden formula (you'll also note D20's text-resonant, colored conditional formats). Note, too, the VLOOKUP's omitted fourth argument: that leaves the function in its approximate-match mode, which reads the ascending values in A23:A27 as range thresholds, exactly what we want here.

I like that solution, if I do say so myself. As a matter of fact, I may even email anna.aero about it. And I’ll even accept frequent flyer miles in lieu of my customary fee.


Air Traffic Data: Some Reservations, Part 1

26 Oct

Stop me before I pun again. It's the data's fault, you see, a sheaf of multi-sheet dossiers of yearly traffic data for European airports and airlines, all embarking from the anna.aero site (there's a new domain for you, or at least for me), and just asking for a jetstream of non-stop wordplay… but I thought I told you to stop me.
So let's cut to an actual workbook – fast. Have a look at this one:

european-airlines-traffic-trends

Three sorts of sheets are loose-leafed into the book: breakouts of passenger numbers by airline and month for a particular year (the Pax sheets); monthly percentage passenger comparisons, presumably against the corresponding months from the previous year (the vly sheets; and no, I don't know what that abbreviation means either); and monthly/airline figures for load factors, a measure of passenger capacity against a miles-traveled factor (see a definition here), in the sheets coded LF.

Note first of all the obvious watch-this-space vacancies in the 2014 sheets; I’m not sure why they’re presently there at all as a consequence, but note as well the column of links to respective-airline data in Q peculiar to the 2014 sheet alone, perhaps lying in wait for the folks in IT until year’s end, when the numbers finalize. Cut then to the 2013 PAX sheet and think about the workbook’s prevailing custom numeric format, e.g. as registered in cell B5:

That zero acts to round off such numbers as require it to decimal-free status, but the data here appear resolutely whole-numbered in any event, and as such have no need for that contingency plan. Do fractional passengers have to pay full fare?
But it's the format's defrocking of commas that's the more bothersome fillip. Is clarity really better served by those insurrectionary spaces driving seven-digit numbers apart? And there's an additional, more substantive quarrel to be had with the format. The first of its pound signs (or hash marks, to Britons) deals a kind of wild card to its numbers, i.e. that sign stands for all the digits that might extend to its left (remember our numeric notation is Arabic) – and these wouldn't be appropriately spaced, even where a space might be expected to appear. That means, for example, that if the entry in B5 were in fact

7,899,653,000

the format will redistribute it thusly:

7899 653 000

Probably not what they had in mind. (For a thoroughgoing treatment of customized numeric formatting consult this review.)
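For what it's worth, a custom string along these lines (my reconstruction, not a transcript of the workbook's) reproduces the misbehavior, spacing only the two rightmost groups and letting any additional leading digits bunch up against the first pound sign:

# ### ##0

whereas the stock comma-grouped format

#,##0

drops a separator into every three-digit group, however long the number grows.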
And a spot-check of the sheets snags a few other spotty design and format elements, for example the gray fills streaming across some, and only some, of the blank cells in the 2012 Pax sheet:

I can’t explain the fill’s intermittency, nor can I explain the incidence and white-font format of the formula in G14:

=(2762922-1872434/2)

Embed a white font in a white cell and you get a result that can’t be seen.
In fact, if you F5 your way to the Go To window and proceed to Special > Formulas, you’ll emblazon in blue every formula-sporting cell in the sheet, some rather unexpectedly placed, and some heretofore invisible like G14:

anno1

It gets murkier. Try F5 on the next-door 2012 vly sheet and:

anno2

Each shaded cell sits atop a formula, leaving us again to account for their unshaded confreres. Why are some, but not all, of the cells invested with formulas?
And what are those formulas doing, by the way? They’re computing monthly traffic changes by airline from 2011 to 2012, performing a couple of VLOOKUPs on the data that track down month/airline numbers in the two sheets, subtracting 2011 from 2012, and dividing by the former year to stoke the percentages. Now consider the D$3 recurrently referenced in the formula strung together in D6, for example:
=(VLOOKUP($A6,'European Airlines 2012 Pax'!$A:$N,D$3,FALSE)-VLOOKUP($A6,'European Airlines 2011 Pax'!$A:$N,'European Airlines 2012 vly'!D$3,FALSE))/VLOOKUP($A6,'European Airlines 2011 Pax'!$A:$N,'European Airlines 2012 vly'!D$3,FALSE)

It points to one of a bank of hidden values (i.e. green numbers made to disappear into green cell fills) ranging across C3:N3, each one naming the lookup column number of the month below it in row 4. Thus in our formula the D$3 returns 4, directing the lookup to find its appointed value in the column topped by February. It seems to me that a rewrite might instill a quantum of elegance into the matter, by dropping MATCH into the mix in lieu of that D$3, e.g., for the first term of the formula:

=(VLOOKUP($A6,'European Airlines 2012 Pax'!$A:$N,MATCH(D$4,$A$4:$N$4,0),FALSE)

(For a run-through of MATCH look here. MATCH identifies the position, conveyed in numeric terms, of a searched-for value in a single-row or single-column range or array. In the instance above, MATCH finds that the entry in D4 – February – occupies position 4 in A4:N4.)
You'll ask what's elegant about a replacement expression that has the effect of actually lengthening the formula. It's elegant, I'd rebut, because the formula now engages with, and only with, the air traffic data themselves (i.e. the row of month names). The numbers in row 3 superimpose a layer of lookup values that, we see, could have been derived instead from the information as it already stands. But that's just one man's opinion.
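Pressed into the whole of the D6 formula, the rewrite would run something like this (it assumes, as the original lookup-number scheme already does, that the month headings in row 4 of the vly sheet line up with the Pax sheets' columns; and do audit the parentheses before trusting it):

=(VLOOKUP($A6,'European Airlines 2012 Pax'!$A:$N,MATCH(D$4,$A$4:$N$4,0),FALSE)-VLOOKUP($A6,'European Airlines 2011 Pax'!$A:$N,MATCH(D$4,$A$4:$N$4,0),FALSE))/VLOOKUP($A6,'European Airlines 2011 Pax'!$A:$N,MATCH(D$4,$A$4:$N$4,0),FALSE)

Copied across and down, the MATCH recalculates its column number from the month heading above each cell, and the hidden helpers in row 3 could be retired.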
But now what are we to make of this formula, in F12?

=('European Airlines 2012 Pax'!F12-258267)/258267

Or this one, in J20?

='European Airlines 2012 Pax'!J20/'European Airlines 2011 Pax'!J28-1

Spread out before us, I think you'll agree, is a most curious spreadsheet, its data subject to all manner of formulas irrigating a tract of cells in which formulaic uniformity would seem to be the necessary rule. It is a near postulate of spreadsheet construction that ranges of cells doing the same kind of work should boast the same kind of formula, apart from the standard allowance for relative cell references. Yet the formulas here are strangely inconstant – in cells where there's a formula to be viewed at all. That F12 expression, among other things, references the April 2012 traffic for BMI British Midland Air, but the corollary number for 2011 appears to be there, and thus available as a cell reference, in that sheet's F17, and that number isn't 258267, either. Yet the F17 value isn't used, and I don't know why. As usual, I am happy to be reeducated.
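At a minimum, the hard-coded 258267 could have given way to that cell reference; something on this order (with, as described above, the 2011 Pax sheet's F17 taken to hold BMI's April 2011 figure) would at least have kept the arithmetic tied to the sheets:

=('European Airlines 2012 Pax'!F12-'European Airlines 2011 Pax'!F17)/'European Airlines 2011 Pax'!F17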

Note, by the way, that nowhere am I impugning the worksheet’s math in any of this; that is, I’m perfectly prepared to endorse the accuracy of the actual air traffic numbers that fan across the sheet (granted, I’ve yet to be asked for an endorsement). It’s how they got there, in their heterodox modes of transport, that’s the puzzler.

Not bad numbers, but odd form.

Excel Live in London, Nov 3-24: Yours Truly On the Bill

21 Oct

You write enough posts, they give you a free ad. Here’s mine: Your correspondent is scheduled to give a quartet of Excel courses in London on consecutive Monday evenings, November 3-24, to be exact, at the offices of the Guardian newspaper near the Kings Cross station. The topics:

• Pivot Tables
• Formulas and Functions
• Visual enhancements (e.g., charts, sparklines, conditional formats)
• Grabbing and Grooming the Numbers (Text to Columns, Data Validation, removing duplicates, issues in spreadsheet design, etc.)

Each offering can be taken singly or in discounted group mode.

To learn more, see http://www.theguardian.com/guardian-masterclasses/the-essentials-of-excel-abbott-katz-digital-course

Handouts autographed upon request.

Mystery Solved, II: A Crackerjack Sequel

8 Oct

Didn’t I tell you Debra Dalgleish is a pivot table expert? You’ll recall I prevailed upon her expertise, in last week’s nail-biter of an installment about grouped pivot table data whose tousled decimal points would not be rounded.

And you’ll further recall my self-satisfying workaround: importing a faux, zero-bearing record into the data set, a supplement that convinced the numbers above it to submit to a rounding at last.

That patch did what it set out to do, but at a cost: the dummy zero could bias record counts, averages, minimums and the like, and so my Excalibur exhibits a bit of rust.

And that’s where Debra Dalgleish comes in. In an email communique beamed all the way from Canada she reported this spot-on resolution:

Once the grouping sequence is set in motion and you get here, e.g.

round1

Once there, simply alter the Starting at number to the degree of precision you need, for example:

round2

Or eliminate all the decimals if you wish. Then click OK. The grouping commences, and hews to your decimal significance:

round3

Tres cool. Note that the tweak need only recalibrate the Starting at field, and happily swerves past the Ending at junction. All due kudos to Debra for the superior tack, one which decisively obviates the need to contrive my counterfeit zero record, and which smuggles a meaningful round-off into the grouping process at just the right point.

That last allusion reminds us about the essential, petrified character of grouped pivot table numbers. They’re typically emplaced in the Row Labels area because they’re…labels, in spite of their indubitably numeric inception. After all, a grouped figure such as

3000-3500

can't really be anything but. We would normally hold this truth to be self-evident – namely, that a hyphenated phone number is not a quantitative thing – but again, because grouped numbers start out as numbers, one needs to appreciate the loss in translation once a grouping is thrust upon them.

Of course, if the data happen to boast a zero or two they will group nicely all by themselves as a matter of course, and as such won't need to call up the Starting at intervention. But Debra Dalgleish's tip should work all the time.

And it’s holiday time for me again. Sorry – I’ve been working nights to earn this vacation.

Mystery Solved, I Think

3 Oct

Mystification has its place. As an antidotal corrective to haughtiness and unbecoming outbreaks of self-confidence, an anomic stumble or two through the data can work wonders for the reality checker within.

Beware, my brother’s philosophy professor warned, of people who are sure of themselves; and sureness gets taken for a ride through the Beds Open Overnight workbook that scratched at my head in the previous post and across the last couple of weeks.

The problems, again, are those intermittently cooperative pivot tables we had put to the task of grouping various worksheet data, some reporting the data in fastidiously rounded-off integers, and some in tranches spewing rancid jets of decimals, like this:

 group1

These several outcomes appeared, perhaps, to have been geared to their respective formats, with the excesses above perpetrated by the garden-variety Accounting mode, and the prim, decimal-less groupings tonsured by some exotic custom format, e.g.

 group2

But, if I may anticipate your question, when I painted the custom format atop the numbers suited in Accounting, the decimals yet refused to go away; nothing changed. And that’s mystifying.

In search of enlightenment I retold the mystery to Michelle Tenwick, the contact person making herself thoughtfully available right there in the NHS Trust by Sector sheet. She replied, in part:

“Here the variations seen in the formatting of the data fields are as mentioned so that they are consistent with previous publications. Most importantly, the figures remain the same regardless of whether they are custom or accountancy format.”

That’s all true, but not quite explanatory, and indeed – why would the sheets have been troubled to assume different numeric formats to begin with?

I then put the matter before Debra Dalgleish, a bona fide pivot table expert who's authored three books on the feature. She offered that because some of the data cells reveal hyphens in lieu of numbers (apparently standing in for 0), Excel would treat the entire field in Text terms, thus resisting any efforts at rounding. Perhaps, but when I drew up a demo data set comprising nothing but decimal-freighted numbers, the rounding off would not happen.

I was thus left to speculate that the operative formattings on the sheets – be they Custom or Accounting – have nothing to do with the problem. My suspicion – though remember, I'm officially mystified – was that the data themselves had been put through some sort of wringer, perhaps via a download from some other, presently unknown source or sources (note Tenwick's note about consistency with past publications). I should add that my past experience with the grouping/rounding issue has instructed me that decimal retention is the default, and so it's the data here that are amenable to the rounding that pose the question.

But then I had an idea, one which returns me, via the back door, to Debra Dalgleish's thought about the hyphenated entries, viewed this time on its head. Remember that in fact the hyphens aren't text – they're numeric zeros, formatted as an accountant might (keep the jokes to yourself). It seems, after having subjected the data to a bit of controlled experimentation, that as long as the data have at least one zero, however it's formatted, the grouping will consent to a rounding. It may be that the intervallic intelligence of the grouping feature requires the grounding of a zero somewhere in the data, thus establishing a floor atop which the intervals can be built.

 

But whatever the reasoning, a data set seems to call for at least one zero, at least if you like your grouped numbers all smoothed and polished and rid of those scruffy decimals. But what if your data have no zero? You could throw in a dummy, zero-valued record, but that intention would have to be thought through, particularly if you’re counting and/or averaging the data.

And so if the data are zero-less you may instead want, or need, to round the numbers in a user-introduced column in the data set with the ROUND, or FLOOR, or CEILING functions, a matter I've already broached here:

http://datadrivenjournalism.net/resources/groupthink_grouping_values_in_excel_pivot_tables
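For instance (the column assignments are mine, purely for illustration: raw figures in column D, the helper alongside in E), a column carrying

=ROUND(D2,0)

copied down would hand the pivot table whole numbers to group, with no scruffy decimals to betray it.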

And that link points to a wider pivot table strategy, an open secret of the enterprise: the trick is to do as much preparatory work in the data set as you can before it washes up onto the pivot table. For all their aggregating might, pivot tables won't do everything, and a little data-organizational proactivity can help the table along.

And that’s because the pivot table Group Selection command is zero-tolerant.

NHS Bed-Occupancy Data: I Need to Sleep On It

19 Sep

If health service provision is here, can the provision of data be far behind? That affirmatively-answered question comes to you from the UK’s National Health Service and corollary entities, and a great many of the holdings are columned and rowed, e.g.

http://www.england.nhs.uk/statistics/

http://www.hscic.gov.uk/searchcatalogue

http://www.ons.gov.uk/ons/taxonomy/index.html?nscl=Health+and+Social+Care

And among the many, you’ll find this workbook, compiling a series of counts of overnight bed occupancy numbers and rates across 226 hospitals nationwide:

 Beds-Open-Overnight-Web_File-Final-Q1-2014-15-34655

 

Now in fact the book’s Area Team by Sector sheet appears to have done much of the analytical work for us, by aggregating the totals by what it calls Area Teams from the finer-grained NHS Trust by Sector sheet alongside of it, with “sector” presumably denoting the medical rubrics beneath which the bed figures gather: General & Acute, Learning Disabilities, Maternity, and Mental Illness. Aggregation there is, but the sheet’s geography suggests that attainment was spurred by some instrument other than a pivot table, and that’s fine, of course. Where does the Constitution of your or any other country mandate pivot tables upon its citizen data crunchers?

But there could be more. Apart from reconsidering the question advanced by last week’s post, the one asking what remains to be done with data that seem to have already been put through their paces, we’re yet left with the Occupied by Specialty sheet, breaking out the bed-occupancy numbers by 76 medical treatment categories. If you want to pivot-table that one, think about these emendatory matters:

  • Blank row 17 must be deleted.
  • As must the grand totals reaching across row 16. Again, a totals row masquerading as just another record will double any tabled results – and that's just wrong; and affirming England in F16 as but another Org Name is likewise self-evidently wrong.
  • The data in Columns B and C – Year and Period – are identical throughout and hence superfluous, but as such inflict no actual impairment on the sheet and could simply be ignored. I’d delete them, though.
  • Note the same-width speciality columns and their variably-spaced headings; the broken text wraps in the latter are a necessary consequence of the former. That's a simple design complication, but if the concern is to promote parity among specialities through column equivalence, the answers are none too forthcoming. A scaled-down font would ultimately achieve a universal heading fit, but will take its toll on legibility. Dilating columns to about 22 from their current 12 would smooth out the wrap-raised bumps, but users will have to do that much more scrolling across the sheet before they'll be able to see it all.
  • On a deeper level (and we’ve been here before, too): the specialities, each of which has been allotted a field all its own, should have been demoted to item status and made to stand beneath one all-subsuming field. Given the plurality of specialties it would be useful to be able to tick each one up and down a single Slicer – but you can’t, because Slicers slice through one field at a time:

nhs1

76 specialities? You’ll need 76 Slicers.

And the transformative Alt-D, P recombination of fields into the desirable data set I detailed in my August 29, 2013 post won't work here, because Alt-D, P will err on the side of a hyper-inclusiveness, investing only the left-most Year field with Row Label status and relegating all the other labelled fields to item status along with the specialties – and that won't do. You don't want Org Name to liken itself to Ophthalmology; one's a bona fide Row/Column parameter, the other a proper Values-sited datum. Try it, if you don't see what I mean.

  • Click on any cell among the specialty data and it turns out the bulk of them drag a long train of decimals behind or before them. It is clear these numbers must have emerged from some spell of formulaic activity – they’re averages, after all, and they’ve simply been treated to a bit of presentational sprucing by rounding themselves off – kind of. Now click in any number-sporting cell and observe the format – Custom. Check into Number Format and you’ll see

nhs2

 

Now I have neither used nor understood that format yet, and I need to take it into the shop and probe its whys and wherefores, assuming it's been used with all due number-managerial intent. But click in a number in the Area Team by Sector sheet and its number format reports Accounting instead. I can't account (no pun intended, but let's go with it) for the disparity here, but it matters, and it matters in this way: remember our numbers are really extended affairs, pulling across all those decimals, and if you Accounting-format them, throw them into a pivot table and group them, they'll look something like this:

nhs3

That is, the decimals overthrow their Accounting round-offs and burst into the pivot table as is, and it isn’t a pretty sight, and it can’t be reformatted, because Row Labels are labels and numeric-format resistant. But impose the Custom format above upon a set of values and they obediently group, as they do in Occupied by Specialty e.g.

nhs4

 

But for other numbers in the workbook, Accounting-formatted grouping does happen, and for some other Custom-formatted values it doesn't. I'm duly puzzled; there has to be an explainable take on these several outcomes, but I don't have it yet, and it requires a think-through. I hope to get back to you on it shortly. But I'm on vacation now, and they tell me if I don't take it now I can't roll it over. Hope to see you in about two weeks.

Class Attendance, Part 2: British Elites and Their Schools

12 Sep

Now that we've righted those faux schooling percentages – that is, the school-attendance-reporting whole numbers posing as fractions of 1 – of the achievers enrolled in the Elite Britain study's spreadsheet, a few additional nuts and bolts need to be tightened in turn (and it goes without saying that the rectifications described in last week's post work for the Profession by school type sheet in the workbook, too).

First, since we’ve moved past the percent misrepresentations, the field-heading (%) indicators dotting columns E through H need to go. Second, I’m not crazy about the Percentage known field; its informational contribution is trifling, because if necessary those percentages could be derived (Population known/Population size, after all), and a little design parsimony won’t impoverish the narrative.

One noteworthy stat suggests itself: the proportion of university attendees who walked through the ivied doors of Russell Group institutions. These "leading" schools – 24 in number (out of a couple of hundred in the country all told; how the global count tallies depends on designations of institutes and the like) – seem to have commanded the lion's share of elites' tuitions; by simply dividing the summed Russell Group field by the summed Any University field we arrive at 67.55% of all those in the survey who plied some manner of higher education, correlative evidence of an "elite" clustering. On the other hand, of course, correlations don't prove a relation of necessity between A and B. In theory, at least, a modestly-backgrounded young person could rightfully earn a place at an estimable school, an attainment which by itself would not serve to clinch the "elite" thesis. Speaking in the abstract, a duly deserved seat somewhere in the Russell Group would simply and only validate the achiever's just desert. An equitably framed reward system that succeeds in promoting the best and brightest and assembling these in better schools (however understood) could be said to be doing nothing but its job; but of course, our data can't by themselves point to that ideal. Sociological robustness would call for research traveling back to the square ones – the economic class inceptions of the 4,200 biographies numbered here – before untrammelled merit could be properly ushered into the equation.
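In spreadsheet terms that's a one-line job; something like the following, with the column letters and row extent guessed at purely for illustration (say the Russell Group counts in F2:F29 and Any University in E2:E29), would deliver that share:

=SUM(F2:F29)/SUM(E2:E29)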

Of course, a purely methodological question jabs at all of the above: the disproportions of individual profession counts yanking that 67.55% in…proportion to their numbers. Residents of the House of Lords and Parliament contribute 31% of the survey complement, but in fact their Russell Group/Any University figure of 70% doesn't badly skew the overall ratio. And if instead we divide all Russell Groupers by the entire Population known, the aggregate 55.85% (that's sans the House numbers) compares closely with the Houses' 57.73%; not much difference there at all.

And a final consideration might qualify as something of a non-finding, notable precisely in virtue of its apparent uneventfulness. As we have them, the data don't seem, to fall back upon a perfectly dreadful modifier, particularly actionable. If anything, they resemble a pivot table, fait accompli – broken out, kind of, into their fields and items (and that's a nasty equivocation to be sure) in a kind of done-deal finality. As such one could wonder where the data could be taken next, and I'm not quite prepared to signal a direction. Is there such a thing as a spreadsheet that has nothing, or nothing else, to teach us? Maybe, but only maybe, and the matter isn't wholly captive to a smallness in the numbers that might stunt the analysis. One of the issues here asks after the prospects for aggregation that the data hold out, as they presently stand. One could, for example, perhaps dispatch a pivot table to group the professions and the associated numbers by their respective sizes, but I'm not sure what profit would accumulate thereby. Do university percentages respond to sector largeness or smallness? You'd have to look long and hard before you'd put your investigative standing behind anything other than the null hypothesis here.

What might work is a coding scheme that would align sectors by some shared nature, and let the school numbers fan out accordingly; but that plan too would have to be preceded by some deep thinking.

So here, then, is one of the larger questions: take a pivot table, the product of some concerted, reasonably well-devised intention, and treat it as an initial data set. Where do you take it next?

I think that’s a fair question – at least a reasonably well-devised one.
