I’m left-handed, sort of, predilection enough to qualify myself for a subscription to the Left-Handers Club newsletter, a miscellany of products and disquisitions about the minority group whose members belong to all the other ones, too. (OK; I’m an in-betweener – writes, eats, and dribbles a basketball with his left hand, throws, bowls, and actually shoots a basketball with his right.)
It was a recent piece in the newsletter that raised at least one of my eyebrows (can’t recall if it was the left or right one), and got me to thinking about possible rejoinders to its central claim – namely, that handedness somehow associates itself with birth month. A survey of its left-leaning readers, respondents gathered via self-selection, measured notable left-handed disproportions for August and July, with smaller precipitates stacking in January and February. Moreover, the survey has nicely packaged its data in spreadsheet form, complete with none other than a pivot table to season the analysis. Let’s devote this installment to the workbook itself, after which we hope to think about the results and position a supplementary data alternative to the thesis. You can take grab the workbook with either hand here:
Note first, and for what it’s worth, the Line field numbering the 3000+ survey records in apparent chronological sequence of submission as counted off in neighboring Entry field. Again for what it’s worth note that the sequence is not unrelievedly consecutive; dozens of gaps poke at it throughout, and so the 3148 records top out in the Line field at number 3348. Does any of this matter? Probably not here but it could, depending on where you’d want to take the data. If, for example, you wanted to group the records by Line number in bins of 100 each, you’d pull up short of 100 in nearly every bin. (I’m not sure what you’d do with the Entry data themselves and their five survey submission days flanked by the 9th and 13th of January, but that’s a quibble.)
More than a quibble, though, could be registered over the piecing together of the DATEVALUE expressions in the Date field. We spent some time on DATEVALUE in my February 21, 2013 post, but that introductory note doesn’t speak to the data contingencies placed before us here.
DATEVALUE remixes a collection of (basically) text data into actual, quantity-bearing dates, but it’s quirkier than I would have supposed. For example(s):
=DATEVALUE(“6/Aug/2013”) (as per European date syntax)
Will both yield 41492, the numeric identity of August 6, 2013. Format the container cell in Date mode and the date obtains (one of those aforementioned quirks allows you even to enter Augu and the like in the expression without impairment, so long as the month name invokes at least its first three characters. The names whip up date values even If the attending year and day data are numeric, and not text. See the discussion immediately below in that regard. I told you it was quirky).
The birthdate workbook, however, concatenates, or combines, three cell references instead, e.g. cell H9:
The references signifying Day, Month, and Year respectively (as per the Left Handed organization’s UK-based time protocol). The ampersands concatenate, or join, all the references into a single expression, along with the hyphens sidled by quotes in our case. The result is captured in H9.
But the formula raises some syntactical questions. For one thing, the + sign grazing E9 is superfluous – and so, for that matter, are the quote-surrounded hyphens. Had the formula read
7/25/1958 would have worked itself out just the same. Edit the formula that way if you want to see what I mean. (Note by the way that the hyphens turn into slashes, because the H-column cells had received this format:
in which the slashes rule.)
Now check out the formulas in the J and K columns, each directing a VLOOKUP to a table in the Countries tab. The plan here is to additionally locate each respondent’s country of birth inside what are labelled the Zone and Hemisphere rubrics, and deliver these to the J and K column. Note the interjection in all cases of the FALSE term (or argument, as it’s officially called), e.g. cell J9:
FALSE is a compensatory means for negotiating a lookup table whose first column – the one comprising the values to be looked up (be they text or numeric) – isn’t sorted ascendingly. FALSE means that the lookup is to confine itself precisely to the contents of that first column, and not entertain any value that squeezes between two rungs in the column. Thus you need FALSE in order to work with this lookup table of students and their grades:
Once you sort column one, though, you can drop FALSE from the formula – truly.
Thus if we sort the first column in the data set on the Countries sheet – and then, in the service of additional formulaic understatement name the range there (let’s call it Regions), our lookup statement slims down to something like this (in J9, for instance):
The expressions in the Hemisphere field imitate the above formula, save one departure – the 2 is replaced by a 3, the column number in which Hemisphere info awaits looking up. But are the lookups as we found them “wrong”, then? Not quite – they all issue irreproachably correct results, after all. But formula efficiency is better served by the rewrite.
Next, push over to cell M9:
What’s this formula striving to do? This: reap four-character yields down the M column by concatenating the respective contents of any given row’s I and E cells (in that order). Thus M9 returns 0725 – ultimately =I9&E9 – but because I9 offers but one character, a 7, the formula tacks on a zero (remember that the result is textual and possesses no direct quantitative standing) in order to pad the yield to four characters; and should the day value in E likewise be single-digited, the formula rolls in a zero there, too.
But again, the formula could be pared, first by formatting both the E and I column thusly:
As advertised, you’re looking at a custom number format comprising a two-digit 00 code. That code supplies a 0 alongside any single-digit number, but if the number already features two digits, the format simply leaves the value alone.
Once the E and I values are so formatted the data will take on this appearance:
Then get back to M9 and enter:
And copy down the column. I get, in excerpt:
But that isn’t working, is it – not if we demand four-digit entries in each and every cell. It isn’t working because the & sign, which concatenates values in specified cells, demotes the values into text, and the 00 format only attaches to bona fide values (keep in mind that combining a 7 and a 25 here isn’t an act of addition; rather, it’s stringing the 7 and 25 together). The way out? Try this:
And the 725 acquires numeric status. Then format the M column:
Four zeros this time, because you’re aiming for four-digit numbers.
Got all that? This stuff is likely to be on the mid-term.