Archive | September, 2018

Tracking the Fields in Track and Field: Olympic Athlete Data, Part 3

21 Sep

Call it a footnote, addendum, postscript, or rather call it for what it is – unvarnished forgetfulness – but my longitudinal appraisals of Olympic athlete heights and weights last post walked right pass the most obvious case – basketball.

What other sport, after all, jams its goal ten feet above terra firma, and so casts its net – pun intended, heh heh – for the tall and majestically wing-spanned? Jump on this pivot table, then:

Rows: Year

Columns: Sex

Values: Height (formatted to two decimals)

Weight (format as above)

Slicer: Sport > Basketball

Nothing that women’s basketball took off in 1976 I get, again in excerpt:

oly31

You’ll observe that, dating from the 1976 baseline, the heights of both women and men divulge a five-centimeter rise (about two inches), more or less; but scroll back among the men’s data to the 1936 inception point and the end-to-end differential spikes to almost 18 cm, or about 7.5 inches. Male basketballers have also gotten about 41 pounds heavier across the same interval; measured from the 1976-2016 gender-shared interval, the net average increase for men comes to around 25 pounds, and 11 pounds for women. And for what it’s worth, the average women hoopster nowadays is taller than the typical male competitor in 1936.

And while I was browsing, it occurred to me that at least one other sport likes its entrants long and tall – volleyball, introduced to both genders in 1964. Slice for volleyball and:

oly32

The average 1964-2016 height boost for women and men: a notable, nearly-identical 12-plus centimeters.

Now you’ll recall that Part 1 of this trilogy issued a clarification about the data set; namely the fact that many athletes are recorded multiple times in a given Games, if they competed in a plurality of different events in any year. I allowed then that the reasons for counting the same entrants repeatedly in the various pivot tables I compiled could be defended, and I proceeded from that conceptual springboard.

But that policy could be reframed and executed anew. If in fact unique Olympic appearances of athletes by Games could be shaken out – that is, if we counted but one instance of each athlete per Games – would the age and gender distributions I reported in Part 1 depart materially from the ones I could collate here?

Let’s see. First, we could shear the redundant athlete records from the data set via the stalwart Remove Duplicates routine, selecting the records by their ID and Year fields; that decision should preserve one iteration of an athlete for each Game (if you want to give this revamped data set a good you may want to save the workbook under a new name). Going ahead, that run-through discharged more than 80,000 now-excess records from the set, leaving a mere 187,000 or so behind to submit to a round of pivot tables replicating the batch I minted in Part 1 (in the interests of presentational lucidity I won’t paste the Part 1 tables here; they’re still back in there whence they came, though).

I began with a breakout of average athlete ages by gender and sport:

Rows: Sport

Columns: Sex

Values: Age (average, formatted to two decimals)

I get, this time (in excerpt):

oly33

The differences are very small and indeed in most cases simply not there at all, and for a good reason: most athletes tried their hands and feet at one event per Games. Small differences do attend the Diving averages, because some divers essayed multiple events in the sport (remember Sport defines the general category beneath which the various events are gathered). You can’t see the figures above, but even the age averages for swimming – a sport in which entrants often plunge into several pools – are near-equivalent, at 19.48 and 21.38 for the entire data set and 19.53 and 21.51 for the unique compendium.

Since the numbers appear to match so thoroughgoingly let’s move to but one more comparison, the alignment of age averages by gender and year of Games. Replace Sport with Year and I get now in excerpt:

oly34

You’ll see again that the divergences are largely minute, though we the unique average here for women in the 1932 games – 26.99 – distances itself appreciably from Part 1’s 29.22 (the respective counts for the two tables: 369 women in the full data set, a significantly trimmer 222 in the unique-record rendition).

But the redundant-record entanglement snares us anew if we move to answer perhaps the most straightforward of the questions we could ask of the data: How many medals has each country managed to win, and by which type? (One preparatory briefing before we advance toward our set of totals – you’ll note the outpouring of NAs flooding the Medal column; but here that classic placeholder in lieu of an unavailable datum means, more meaningfully, that the record in question belongs to an athlete who failed to win a medal.)

But the medal question’s defining detail consists of the means by which the country-by-country medal counts are to be totaled. Consider an obvious case: the United States men’s basketball team brought back the gold medal in the 2016 Games – but that victory, for standard reporting purposes, comprises exactly one medal. Yet the Kaggle data set marks the gold attainment for each of the 12 members of the team, thus unleashing a massive, prohibitive overcount of the medal numbers for this and any other sport contested by teams of any size. That sort of capitulation to the data as they stand can’t work, of course, and thus calls for another dial-up of Remove Duplicates (I’m assuming you’re working with the original, full data set), this time selecting for Year, Event (not Sport, each one of which again spans several events), and Medal, because we want to sieve but one of medal dispositions for each event outcome. Implementing the routine rid the data of over 246,000 redundant records this time, per the selection criteria.

Following that massive jettison, we can apply the remaindered records to this pivot table:

Rows: NOC

Columns: Medal

Values: Medal

(Note: NOC, or National Olympic Committee, assigns a three-character abbreviation to each participating nation, the legend to which may be found here. I’ve applied NOC, and not Team, to the table, because the latter parameter also admits, for whatever reason, the names of actual teams that fielded contingents at the Games, e.g. the Bagatelle Polo Club Paris).

In view of the preemptive steps we carried out above, the table should now properly cross-count medal victories by country, e.g. in a pinched excerpt:

oly35

It’s starting to look intelligible, but the article isn’t quite finished just the same. For one thing – and we’ve seen this before – the curious sort priority accorded MAR and MON, or Morocco and Monaco in full, needs to be explained. Those countries are stacked atop the column because Excel reads their codes rather as March and Monday, date data that the pivot table insists on sorting first; and in fact a manual A-Z sort won’t restore them to their proper alphabetical place because they’ll continue to retain what is in effect their mathematical character, and so force themselves again to the head of the list. If you really want Morocco and Monaco to assume their orderly mid-column positions you’ll need to right-click each name and apply the Move option, selecting Down repeatedly until each is bumped into the berth you assign them.

And Move gets to do its thing again for the column entries, because you’ll want Gold – the winners’ precious metal of choice – to station itself at the head of the row, followed by Silver and Bronze. As for NA, I’d filter it out – because Remove Duplicates has preserved but one instance of the non-winners for each and every event and year and that’s misleading of course, because most athletes won’t win any event, and there’s a lot of them.

And if you’re still with me you can sort the values by Grand Totals Largest to Smallest, realizing this hierarchy, in excerpt:

oly36

The United States wins gold for most golds and all medals, and by a margin far more extensive than I would have supposed. URS is a legacy reference to the former Soviet Union, but note the simultaneous tally assembled for Russia.

And I bet you didn’t know that Luxembourg has won four gold medals – including two for Mixed Painting, as many as the United States.

Tracking the Fields in Track and Field: Olympic Athlete Data, Part 2

6 Sep

Have Olympic athletes gotten taller and heavier? Common sense, the hypothesis builder of first, but not last, resort, submits a yes. After all, competitors in other sports seem to have bulked up and out; so why wouldn’t entrants to the global quadrennial enlarge as well?

The conjecture makes sense, common or otherwise, to me, but fortunately more persuasive corroboration is at hand. The Kaggle Olympic athlete data set features height and weight data, thereby adjudging my sense-making a false start.

Nevertheless, a few important caveats need be pronounced upon the data. First, a notably large portion of the height and weight entries – around 60,000 in both cases, or around 22-23% of all athlete records – are missing, and I haven’t paced my way through the data to learn if those ciphers are systematically distributed (and by the way, about 9,500 age records, or 3.5% report no ages either). Second, one must take some care to think about the like-for-like proviso, in view of the turnover of events appearing in this, but not that, Olympic year. There may be little analytical gain, for example, by according a place to the weight and heights of the combined 3,578 aspirants in the Art Competitions last conducted in 1948. One assumes their vital statistics contributed little to their medal fortunes, though one never knows (and since you asked, their average weight came to 165.64 pounds, with a mean height reaching to about 5’9”); but because the demographics for those pastimes simply aren’t there in the later Games their confounding effects must be considered, and probably discarded.

In addition, because boxing necessarily cordons its contestants by weight classes, its 6,047 contestants should be expected to evince little variation along that parameter across Olympics, and they don’t (they haven’t gotten much taller, either).

With those cautions in mind, investigative prudence might justify a concentrated look at those durable sports expected to win a continual place in the Games. We could then proceed to pivot tables buffered by a Sports Slicer that would more afford a cleaner look at the question, one sport at a time.

If we’re satisfied with that recommendation, any pivot table that would spring therefrom sounds like a rather straightforward proposition, but grouping Olympic years by a chosen interval – say a 12-year tranche that would parse three Olympics at a time imposes a small presentational disquiet.

My initial intent, for example, was to group the years into bins of 12 in the interests of averaging heights and weights for three Olympics at a time (but recall that the eventual rethreading of Winter Games into alternative even-numbered years would step up the number for any 12-year stretch). If, then, we move Year into Rows and frame the 12-year grouping, we’ll see:

oly21

Now that bit of consolidation works, but note the disclosure above of some years in which no Games were held (e.g. 1907), and which in fact don’t even appear in the data set. But the pivot table insists on interpolating those years in order to preserve the grouping’s consecutive character, however misleading to the reader. But remember that Row Labels can be edited for display purposes, that is to say you can rewrite the above spans 1896-1904, 1908-1916, etc.

On the other hand, because the data set counts a manageably grand total of 35 modern Olympics, we may be happy to leave the years as we find them in their singular, ungrouped state. If so, we can begin this way:

Rows: Year

Values: Height

Weight (both computing Average and formatted to two decimals)

Columns: Sex

Slicer: Sport

And turn off Grand Totals.

With all those material variables in play – Year, Sex, Height and Weight – the pivot table may tend toward a certain untidiness and hyper-density. Truncating the Average of… labels to something like Avg Height should sharpen its legibility.

Because I’ve delimited my interest to those mainstream sports likely to have been fielded across most or all Olympics, I can aim my first Slicer tick at Gymnastics, mustering these outcomes in excerpt:

oly22

Note, my fellow Americans, that height and weights are expressed in centimeters and kilograms, respectively. (To restate the former in inches you’d need to multiply each entry by .39701, one formulaic way or another. Poundage is expressed by multiplying a weight by 2.20462 (I had long thought the increment was exactly 2.2)). As for the #DIV/0! messages besetting the table that allude to sliced fields containing no data, these can be air-brushed out of sight via the PivotTable Tools > Analyze > Options > Options > Layout & Format tab and leaving the relevant field below blank:

oly23

Returning to the gymnasts, note the decided lightening of women gymnasts from 1948 through 1980, an average weight reduction, so to speak, from 57 to 41.42 kg, or well over 30 pounds. Indeed, women gymnasts were on the average 5.89 kg, or 13 pounds, lighter in the 1980 Games than in the 1976 competition; but the poundage reverts upward again from 1980.

Explaining the variations requires something more than a top-of-the-head, knee-jerk, or any other such anatomically-driven response. Remember for one thing that many women gymnasts are counted multiple times among the data, and so their weights are submitted to the averages as often. Secondly, a survey of the particular gymnastic events in force for any given Games would be in order as well (the Gymnastic heading spreads over a variety of events), as new or abandoned events could have skewed the totals. In addition, the boycott by 66 nations of the 1980 summer Games may have likewise wrenched the average down.

On the other hand, the weight trajectory for male gymnasts is considerably smoother, tossing but one more complication into the mix.

We could next slice for swimming, another Olympic perennial. I get, in excerpt:

oly24

Here we see both men and women getting taller and heavier, with the paired slopes lifting upwards determinedly. That male average weights have burgeoned 16 pounds between 1972 and 2016 and the male-female weight differential has gradually ascended surely means something, though again some deep thinking about conclusions would be in point.

For a look at a winter sport, let’s slice for speed skating, opened to women in 1960:

oly25

Here the fluctuations are gentler, perhaps remarkably so, along with a small but measurable closing of the gender weight differential. You’ll also note that gymnasts tend to be shorter than both swimmers and speed skaters. But swimmers and skaters need to get to their finish lines sooner.