We’ve batted this ball around before, but those hacks were taken on other fields. Still, a recent (UK) Times piece by Daniel Finkelstein on birth order and its association with soccer players’ ascent to the British Premiership league returned the analytical ball to me on a different court – in this case the one earmarked for tennis.
We’ve looked at tennis, too, but with a consideration of country and age-driven breakouts of men’s tennis players – not their birth months. So I booked some time on the tennisabstract site and its current, online-sortable rankings of the male of the species, which you can copy and paste from here.
The rankings seem current indeed, by the way; an ascendant Andy Murray in the pole position attests to their recency. In search of some deep background on the matter, I Googled my way into the menstennisforums site and its earlier discussion of the birth-month-rankings relationship (you need to join the forum; a free enrollment entitles you to limited access to its holdings). In this connection a Taiwanese contributor posted this screenshot of the birth-month-rankings distribution for 2014 player-rankings data:
We see that the birth months of all ranked players skew toward the first half of the year, and rather discernibly, though occupants of the top 100 exhibit a far evener natal distribution across that far smaller sample (if in fact the cohort can be permissibly understood as a sample; a sample of what, after all?). Yet 54% of the top 500 present a first-half birth certificate, as do 55% of top-1000 position holders. The proportion for all 2221 ranked players: 56%. Something, then, seems to be at work. So what about 2016 data?
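To put a rough number on "something seems to be at work", here is a minimal sketch of a one-sample proportion z-test. It assumes births would otherwise be spread evenly across the days of a non-leap year, so January-June (181 days) would claim 181/365 of them, and it takes the quoted percentages at face value. The function name first_half_z is hypothetical.

```python
import math


def first_half_z(share: float, n: int, expected: float = 181 / 365) -> float:
    """One-sample z statistic for an observed first-half birth share.

    Assumes (hypothetically) that births are uniform across the days of a
    non-leap year, so January-June would hold 181/365 of them.
    """
    se = math.sqrt(expected * (1 - expected) / n)  # standard error under the null
    return (share - expected) / se


# The 2014 figures quoted above: 56% of all 2221 ranked players, 54% of the top 500
for label, share, n in [("all ranked", 0.56, 2221), ("top 500", 0.54, 500)]:
    print(f"{label}: z = {first_half_z(share, n):.1f}")
```

The full 2221-player cohort comes out at a z of roughly 6, the top 500 at roughly 2; that is, with the caveat raised above, whether ranked players constitute a sample of anything at all.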
That sounds like a question we could answer. But before we give it a try, a pre-question of sorts could be posed about the exercise: does it pay to bother? If the 2014 data above have been faithfully compiled – and they probably have – would much interpretive gain be realized by another look at the men’s rankings just two years later? With a player cohort exceeding 2000, would statistical sense be served by recounting the birth-month distributions?
Well, they said Clinton would win, too. Distributions change, and testing the data anew – which after all are not wholly coterminous with 2014’s player pool – is worth the try, especially since we’ve budgeted for the project (a bit of blog humor, that was).
So let’s see, starting with this pivot table (note: 13 players have no birth dates to report and are filtered out throughout):
Rows: DOB (grouped by Months only)
Values: DOB (Count, then Show Values As > % Running Total In, with DOB as the base field, the only one undergirding the pivot table. Turn Grand Totals off, too.)
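For readers working outside Excel, the same month count plus running percentage can be sketched in Python. This is a minimal illustration, assuming birth dates arrive as datetime.date values with None for the players lacking one; the function name and the sample dates are made up.

```python
from collections import Counter
from datetime import date
from typing import Optional


def month_running_pct(dobs: list[Optional[date]]) -> list[tuple[int, int, float]]:
    """Mimic the pivot table: count players by birth month, then add a
    running percentage of the total (akin to Excel's % Running Total In)."""
    known = [d for d in dobs if d is not None]  # drop players without a DOB
    counts = Counter(d.month for d in known)
    total = len(known)
    rows, running = [], 0
    for month in range(1, 13):
        running += counts[month]  # Counter returns 0 for months with no births
        rows.append((month, counts[month], 100 * running / total))
    return rows


# Made-up birth dates, purely for illustration
sample = [date(1987, 5, 15), date(1986, 6, 3), date(1997, 1, 20), None, date(1990, 11, 2)]
for month, count, pct in month_running_pct(sample):
    print(month, count, f"{pct:.1f}%")
```

The running percentage at June (the sixth row) is the first-half share the post keeps returning to; for this toy list it is 75%.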
The running totals’ month-by-month accumulation indeed mirrors the 2014 56-44 first/second-half breakout, along with the respective monthly contributions to the whole. No surprises, then – but replication does have its place.
And how do our month distributions compare with the 2014 top 100, 500, and 1000? We can start by dragging Rank into the Columns area and grouping the rankings into bins of 100, retaining the running total effect. Isolating the first bin in the screenshot, I get:
Here, and unlike the 2014 figures, the first/second-half differential breaks 59-41, comporting with the rankings’ overarching tendency, although, again, a universe of 100 players will not mollify a statistician.
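Grouping Rank into bins and reading off each bin's first-half share can likewise be sketched in a few lines. This is an illustration only, assuming each player is a (rank, dob) pair with None for a missing birth date; the function name and the sample rows are hypothetical.

```python
from datetime import date
from typing import Optional


def first_half_share_by_bin(
    players: list[tuple[int, Optional[date]]], width: int = 100
) -> dict[str, tuple[int, float]]:
    """Group (rank, dob) pairs into rank bins of `width` (1-100, 101-200, ...)
    and return {bin label: (players in bin, % born January-June)}."""
    tallies: dict[str, list[int]] = {}
    for rank, dob in players:
        if dob is None:  # skip players with no birth date on file
            continue
        lo = (rank - 1) // width * width + 1  # first rank in this bin
        label = f"{lo}-{lo + width - 1}"
        tally = tallies.setdefault(label, [0, 0])  # [players, first-half births]
        tally[0] += 1
        tally[1] += int(dob.month <= 6)
    return {label: (n, 100 * first / n) for label, (n, first) in tallies.items()}


# Made-up (rank, dob) pairs, purely for illustration
players = [
    (1, date(1987, 5, 15)),
    (2, date(1987, 5, 22)),
    (57, date(1990, 10, 1)),
    (150, date(1995, 2, 2)),
    (160, None),
]
print(first_half_share_by_bin(players, 100))
print(first_half_share_by_bin(players, 500))
```

Changing width to 500 or 1000 reproduces the regroupings that follow; with a width of 1000, a rank such as 2050 lands in the "2001-3000" bin.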
For the birth-month distribution of the top 500, regroup the rankings by that interval:
Pretty much more of the same. Then group by 1000:
The approximate 56-44 weighting runs through the data at each of its granularities; and remember that the third bin, 2001-3000, comprises only 65 players.
Now what if we isolate the contingent from the US? We’ve learned in a previous post about the August birth-month effect that seems to prefigure the career prospects of baseball players from that country. First, because the likely modest US aggregate would sprinkle just a few numbers across the rank bins, I’ll remove Rank from the table, introduce a Slicer for Country and click USA, and restore Grand Totals. I’ll also drag DOB into Values a second time, one instance to convey the straight counts, the other to record that running column percentage. Here I get:
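The Slicer step amounts to a filter applied before the same counting. Here is a minimal sketch, assuming each player is a (country, dob) pair; the function name is hypothetical, and apart from "USA" (the Slicer item clicked above) the country codes in the sample are assumptions.

```python
from collections import Counter
from datetime import date
from itertools import accumulate
from typing import Optional


def country_breakdown(
    players: list[tuple[str, Optional[date]]], country: str
) -> tuple[list[int], list[float], float]:
    """Slicer-style filter on (country, dob) pairs. Returns births per month
    (Jan..Dec) for `country`, the running % of that country's total, and the
    country's share (%) of all players with a known DOB."""
    dated = [(c, d) for c, d in players if d is not None]
    mine = [d for c, d in dated if c == country]
    if not mine:
        raise ValueError(f"no dated players for {country!r}")
    by_month = Counter(d.month for d in mine)
    counts = [by_month[m] for m in range(1, 13)]  # Counter yields 0 for absent months
    running = [100 * r / len(mine) for r in accumulate(counts)]  # running column %
    share = 100 * len(mine) / len(dated)
    return counts, running, share


# Made-up rows, purely for illustration
players = [
    ("USA", date(1990, 10, 5)),
    ("USA", date(1992, 3, 1)),
    ("GBR", date(1987, 5, 15)),
    ("ESP", date(1986, 6, 3)),
    ("USA", None),
]
counts, running, share = country_breakdown(players, "USA")
print(counts, running[5], share)
```

The share value is the same sort of arithmetic that yields 164 of 2087, about 7.9%.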
Note first that only 164 Americans appear among the 2087 ranked players, about 7.9% of the total, even though that share leads all nations. Second, no January-June differential obtains for the US, though the 23 Americans born in October might merit a second look.
But the global birth-month disparity holds, and as such calls for an accounting. Tennis players, after all, are among the most international of sporting populations, the rankings admitting players from 98 countries. The simple but yet-to-be-substantiated hypothesis would maintain that January 1 cut-off dates for age-specific tennis youth programs advantage the relatively older players in each cohort, that is, those born early in the year, but that’s an early surmise. (Note, by the way, that UN birth data by month across the 1967-2015 period reveal no January-June skew.)
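The cut-off hypothesis turns on relative age: under a January 1 cut-off, a January-born child can be almost a full year older than a December-born peer in the same age group. The following is a minimal sketch of that arithmetic. The cut-off date is a parameter precisely because the post has not established what any youth program actually uses; January 1 is only the hypothesized default, and the function name is hypothetical.

```python
from datetime import date


def relative_age_days(dob: date, cutoff_month: int = 1, cutoff_day: int = 1) -> int:
    """Days by which a player is older than the youngest possible member of
    the same youth cohort, assuming cohorts run from one annual cut-off date
    to the day before the next (hypothetical January 1 default)."""
    cutoff_this_year = date(dob.year, cutoff_month, cutoff_day)
    if dob >= cutoff_this_year:
        start = cutoff_this_year
    else:
        start = date(dob.year - 1, cutoff_month, cutoff_day)
    next_start = date(start.year + 1, cutoff_month, cutoff_day)  # next cohort begins here
    return (next_start - dob).days - 1


print(relative_age_days(date(1987, 1, 1)))    # oldest in a January 1 cohort
print(relative_age_days(date(1987, 12, 31)))  # youngest in the same cohort
```

If the hypothesis holds, players with larger relative ages should be overrepresented in the rankings. Under a January 1 cut-off that translates into exactly the first-half skew observed above.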
First conclusion: more work needs to be done here. And while we’re at it, think about Michael Grant, an American ranked 836 and born in 1956, who earned his highest rank of 96 in…1979. Well done, Mr. Grant, I’d say – and he was born in February.
But what about women players? Good question.