Toiling in, and with, the Fields: Restructuring the Data, Part 1

22 Aug

No entomologist am I, but if you have a bee in your bonnet it seems to me you take the hat off – that is, you do something about it. The bee buzzing in and around my haberdashery? You’ve heard it from me before, and now you’re going to hear it again – it’s that curious relation between fields and the items furrowing the fields, or put otherwise, parameters and their values.

Past exposition notwithstanding, the matter could stand a bit of review, although the archivists among you could reopen my September 28, 2012 blog as well for a quick reread. In any case, consider these demo data (and don’t inquire after the all-over-the-place student test scores – they were randomly generated):

con1

To click along with me, download the above data here:

Student grades by subject

Not much to controvert here, you’d allow. But in fact the field status accorded each and every academic subject strikes a blow against spreadsheet efficiency, and for this reason: plow the data into a pivot table, and if you want to calculate a student’s test average, for example, you’ll need to plant each subject into a calculated field – and that’s a bigger-than-necessary, click-happy deal. And ask yourself how this would play were you forced to roll your eyeballs across 50 such fields instead of our half-dozen. Moreover, if you wanted to shift the mathematical operation – say, move from an average to a maximum – you’d either have to edit the calculated field, or calculate a new one.

Restructure the data, however, into something that starts like this

con2

Wherein each subject presents itself as a value in the same, unitary field, and that pivot-tabled average enables itself thusly and simply:

Row Labels: Name

Values: Score (Average)

But yet one could reasonably opine that all that analytical elegance comes with a price, charged to the data entry department. After all, the student/test score matrix comprises 70 cells worth of data (absent the header entries) – ten student names keyed to six scores per. My “improvement”, on the other hand, enjoins the user to enter 180 cells – six test subjects and their associated scores, each wired in turn to a student name. That’s a lot of elbow grease, and merely to lubricate the wheels of a pivot table; and in fact that’s a point well taken, one we need to keep in mind; but just the same, I’ll stand by the idea that the greater analytical good is served by lining the data up in the manner I recommend above. And anyway, if the data are already dubiously arrayed, we want to know what pains we’d need to take in order to overturn that fait accompli, and if those pains are worth it. The question, in other words: what would we have to do in order to get the data as pictured in the first screen shot to look like the data pictured in the second, short of simply re-entering them     ?

Here’s how (understand that we’re just reciting the instructions here, and want to reserve the explanations for another time. You can also read this boffo tip first-hand here, where I came upon it):

  • First, click anywhere among the data.
  • Then click Alt-D and follow by tapping P, thereby blowing the doorknob off one of the great clandestine, closeted, latter-day Excel options, the PivotTable and Pivot Chart Wizard (as the caption attending Alt-D whispers to us, the Alt-D keystroke tandem bores through the time tunnel back to older Excel command sequences).
  • Tick Multiple consolidation ranges, even if our intentions here remain consolidation-free.
  • Click Next and tick “I will create the page fields” (though it appears you can get away here with “Create a single page field for me”, too), an alternative that time-stamps the command’s antiquity. “Page”, after all, is the terminological forebear of “Report Filter”.
  • Select the range of student/test score data and click Add.
  • Click Finish. You should see

con3

a pivot table, its default grand totals set to sum, but don’t worry about that. Now here’s your homework assignment: Look right, at the PivotTable field list:

con4

How many fields do you count? I get three – but remember, we started the table-making process with seven, the student names and their six subjects. Think about it.

No­w double click the grand, grand total – in our case, the value 3812. You’ll see (in part):

con5

And that’s exactly what we want: all the data are columned in the way we want them, with all the test subjects redefined as items, all lowered beneath the same field, and in a real table besides. And now you can do

Row Labels: Student

Values: Value (Average)

And so on. But there’s more to say about all this.

(Note; Many thanks to commenter Andrew Marritt for apprising me about Google (now Open) Refine’s ALT-D, P data-reorienting equivalent. Hope to say more about this in the next post, too.)

2 Responses to “Toiling in, and with, the Fields: Restructuring the Data, Part 1”

  1. Andrew Marritt August 22, 2013 at 10:20 pm #

    Open refine (used to be Google Refine) has an option which does this in a couple of clicks. If’s free and more powerful / easier than Excel at getting data into the right format.

    • Abbott Katz August 22, 2013 at 10:24 pm #

      Thanks for the tip – I need to check it out. Of course, the method described here also takes just a few clicks.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: