Don’t be fooled by last week’s denouement of the World Series; the baseball season never really ends. The looks at the data from what was once billed as the National Pastime just don’t stop, including some looks at the World Series itself, e.g., a survey put together by Joel Pozin of the regular-season winning percentages of Series participants dating back to the first Series in 1903. It’s here:
Regular Season Winning Percentages of World Series Champions vs. Losers: 1903-2017 (Joel Pozin)
The survey is in fact one of three small, public-domain Series datasets Pozin makes available on the collaboration-fostering data.world site (you’ll need to sign up, for free, to access the other two workbooks). Note that the percentage data for 1904 and 1994 aren’t there, because the World Series wasn’t contested in those years. In addition, I’ve appended win-percentage data for the 2017 season to the sheet here.
The other two workbooks recount the Series winner and loser percentages in separate sheets, but they do something else as well: they bare the formulas that return the team winning percentages, formulas that do slightly different math from that performed by Major League Baseball’s number crunchers. A winning percentage, after all, emerges from a rather simple act of division: Wins/(Wins + Losses). But Pozin has taken account of the small class of games that, for whatever reason, ended in a tie, games that baseball officialdom simply bars from the win-loss calculation. Pozin, on the other hand, admits them to the games-played denominator and credits a half-win (.5) to the numerator for each tie. Pozin’s outcomes thus don’t invariably comport with the canonical percentages, though the differences of course aren’t game-changing, so to speak. But differences they are.
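The two ways of computing a winning percentage can be sketched in a few lines of Python. This is a minimal illustration rather than Pozin’s workbook logic; the function names are hypothetical, and the sample record is the 1903 Pittsburgh Pirates’ 91 wins, 49 losses, and one tie from the losers’ sheet.

```python
def official_pct(wins: int, losses: int) -> float:
    """MLB-style winning percentage: tied games are left out entirely."""
    return wins / (wins + losses)


def tie_adjusted_pct(wins: int, losses: int, ties: int) -> float:
    """Pozin-style percentage: each tie counts as a game played and as half a win."""
    return (wins + ties / 2) / (wins + losses + ties)


# 1903 Pittsburgh Pirates: 91 wins, 49 losses, 1 tie
print(f"{official_pct(91, 49):.3f}")         # 0.650
print(f"{tie_adjusted_pct(91, 49, 1):.3f}")  # 0.649
```

A single tie is enough to move the figure by a point in the third decimal.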
Those formulas themselves are interesting, however. On the loser sheet, for example, the 1903 Series runner-up Pittsburgh Pirates won 91 games, lost 49, and tied one, those respective accumulations committed to cells C2:E2 in the losers’ worksheet. The formula in F2 then declares:
=ROUND((C2+E2/2)/SUM(C2:E2),3)
(Note that the sheet featuring Series winners formulates its denominators this way instead, e.g., (C2+D2+E2).) The single tied game recorded in E2 is halved and added to the win figure in C2 to build the formula’s numerator; in addition, rounding the result to three decimals fixes the value in F2 at exactly what we see: .649, or 0.6490000.
But one could opine that the cause of exactitude could have been slightly better served with
=(C2+E2/2)/SUM(C2:E2)
followed by formatting the result to three decimals, thus preserving the quotient’s full precision. The ROUND function forces a substantive pullback in precision, because after applying ROUND the number you see is truly and only .649. But does my nitpick here matter? Maybe.
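The difference between rounding a value and merely displaying it rounded can be sketched in Python. The helper below is a hypothetical approximation of Excel’s ROUND, which rounds halves away from zero (Python’s built-in round() rounds exact halves to even, so the decimal module is used instead):

```python
from decimal import Decimal, ROUND_HALF_UP


def excel_style_round(x: float, digits: int) -> float:
    """Round half away from zero, approximating Excel's ROUND."""
    step = Decimal(10) ** -digits  # Decimal('0.001') when digits == 3
    return float(Decimal(str(x)).quantize(step, rounding=ROUND_HALF_UP))


raw = (91 + 1 / 2) / (91 + 49 + 1)    # full-precision quotient, about 0.648936
stored = excel_style_round(raw, 3)    # what =ROUND(...,3) leaves in the cell

print(f"{raw:.3f}", f"{stored:.3f}")  # 0.649 0.649  (identical on screen)
print(raw == stored)                  # False: only the unrounded value keeps its precision
```

Both values look the same once formatted, but only the unrounded one can still settle a comparison decided in the fourth decimal or beyond.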
And while we’re deliberating about matters of formatting, the winning percentages expressed in the workbooks in their Excel-default 0.649 terms could be made to assume the baseball-standard .649 deportment with a custom number format such as .000.
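A custom number format changes only what is displayed, not what is stored. A rough Python analogue, using a hypothetical helper name, is a string-formatting step that trims the leading zero:

```python
def baseball_format(pct: float) -> str:
    """Display a winning percentage baseball-style: '.649' rather than '0.649'."""
    text = f"{pct:.3f}"
    # Drop the leading zero only; a perfect 1.000 keeps its leading digit.
    return text[1:] if text.startswith("0.") else text


print(baseball_format(0.648936))  # .649
print(baseball_format(1.0))       # 1.000
```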
Now back to the winners and losers in the master sheet I’ve offered for download. A simple inaugural inquiry would have us calculate and compare the average winning percentages of the winners and losers. Rounded off to the usual three decimals, I get .619 and .614 respectively, a difference that is neither great nor surprising. World Series competitors, after all, are the champions of their respective leagues, and so could be regarded as more-or-less equivalently skilled. And while of course only one team can win, the best-of-seven format (in fact four Series were played on a best-of-nine basis) could be judged too brief to identify the truly superior squad.
But additional comparisons may point to other interesting disparities. If we pivot-table the data and group the winning percentages into, say, ten-year tranches:
Rows: Year
Values: ChampWinRatio (Average)
LoserWinRatio (Average)
(Remember that no Series was played in 1904 and 1994, and that the custom format commended above must be reapplied in the pivot table if you want it in play here. In addition, of course, the 2013-2022 tranche, forced by our grouping instruction to span a full ten years, comprises only five years’ worth of data.)
I get:
Note the rather decided scale-down in winning percentages set in motion during the 1973-1982 tranche. Do these smallish but apparently real curtailments hint at a press toward parity among baseball’s teams that dulled the advantage of elite clubs? Perhaps the advent of free agency following the 1975 season, in which teams’ contractual hold on their players was relaxed, played its part in smoothing the distribution of talent.
But another, if distantly related, explanation of the trend could also be proposed. Baseball rolled out a postseason playoff system in 1969, one that now qualifies ten of its 30 teams each season, and that broadened inclusiveness voids any guarantee that the clubs with the best regular-season records will find themselves in the Fall Classic. The 1973 National League champion New York Mets, to call up the extreme example, washed up in the Series with a regular-season winning percentage of .509. But they won the playoff.
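The ten-year grouping the pivot table performs can be approximated outside Excel as well. The sketch below assumes rows of (year, champion percentage, loser percentage) and bins them from 1903, as the pivot grouping does; the function names are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def tranche_label(year: int, start: int = 1903, width: int = 10) -> str:
    """Label a year with its ten-year bin, counting from the first Series in 1903."""
    low = start + (year - start) // width * width
    return f"{low}-{low + width - 1}"


def average_by_tranche(rows):
    """rows: iterable of (year, champ_pct, loser_pct); returns {label: (champ_avg, loser_avg)}."""
    groups = defaultdict(lambda: ([], []))
    for year, champ, loser in rows:
        champs, losers = groups[tranche_label(year)]
        champs.append(champ)
        losers.append(loser)
    # Four-digit labels sort chronologically as plain strings.
    return {label: (mean(cs), mean(ls)) for label, (cs, ls) in sorted(groups.items())}


print(tranche_label(1973))  # 1973-1982
print(tranche_label(2017))  # 2013-2022
```

Years with no Series (1904 and 1994) simply contribute no rows, so their tranches average over nine years, and the 2013-2022 bin over five.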
Now let’s return to my quibble about the deployment of the ROUND function, and my counter-suggestion for simply calculating win percentages without it and formatting the numbers to three decimals instead. Working with Joel Pozin’s rounded figures, we can write an array formula that counts the World Series victors whose regular-season percentage exceeded that of the losers:
{=SUM(IF(C2:C114>E2:E114,1))}
The formula assigns the value 1 to each value in the C column (the one containing Series winners’ percentages) that tops the corresponding value in E, the losers’ column, and then adds all the 1s. (The formula can surely be written in other ways; =SUMPRODUCT(--(C2:C114>E2:E114)), for instance, returns the same count without array entry.) I get 57, meaning that according to Pozin’s percentages a jot more than half of all 113 World Series went to the team with the higher regular-season percentage. Again, not a shocker, but worth demonstrating.
Now if we array-calculate the number of Series winners with the lower of the regular-season winning percentages:
{=SUM(IF(C2:C114<E2:E114,1))}
I get 53, but there are 113 Series to account for here, and 57 plus 53 comes to 110. It turns out, then, that in three Series (those played in 1949, 1958, and 2013) the competing teams appear to have achieved the same regular-season winning percentage.
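The two array formulas each leave the ties uncounted. A three-way tally, sketched here in Python with a hypothetical function name, accounts for every Series, so its counts always sum to the number of rows:

```python
from collections import Counter


def tally_outcomes(champ_pcts, loser_pcts):
    """Count Series won by the team with the higher, lower, or identical regular-season mark."""
    if len(champ_pcts) != len(loser_pcts):
        raise ValueError("each Series needs both a winner's and a loser's percentage")
    tally = Counter()
    for champ, loser in zip(champ_pcts, loser_pcts):
        if champ > loser:
            tally["higher"] += 1
        elif champ < loser:
            tally["lower"] += 1
        else:
            tally["tied"] += 1
    return tally
```

Fed the sheet’s rounded winner and loser columns, it should reproduce the 57/53/3 split described above.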
And for two of those seasons, 1949 and 2013, the winner-loser parity is inarguable: the teams in those years had precisely the same numbers of wins and losses. But if we apply my formulaic alternative, in which the half-win credit for drawn games is retained but the ROUND function is shed for the reasons advanced above, we learn that the 1958 winner, the New York Yankees, played to a .596774 percentage (rounded to six decimals) because they played a tie game that year, while the losing Milwaukee Braves steamed into the Series at .597403. Seen that way, the 1958 Braves are the 54th losing team to best the winner’s regular-season mark.
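The 1958 comparison is easiest to check with exact fractions, which sidestep floating-point noise entirely. The sketch assumes the Yankees went 92-62 with one tie and the Braves 92-62, records chosen because they reproduce the percentages quoted above (92/154 comes to .597403 at six decimals); the helper name is hypothetical:

```python
from fractions import Fraction


def exact_pct(wins: int, losses: int, ties: int = 0) -> Fraction:
    """Tie-adjusted percentage (W + T/2) / (W + L + T), kept as an exact fraction."""
    return Fraction(2 * wins + ties, 2 * (wins + losses + ties))


# Assumed 1958 records: Yankees 92-62 with one tie, Braves 92-62.
yankees = exact_pct(92, 62, 1)
braves = exact_pct(92, 62)

print(round(float(yankees), 3), round(float(braves), 3))  # 0.597 0.597  (a "tie" after rounding)
print(f"{float(yankees):.6f} {float(braves):.6f}")        # 0.596774 0.597403
print(braves > yankees)                                   # True
print(f"{float(braves - yankees):.6f}")                   # 0.000628
```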
The difference here is, of course, hair-splittingly slim. But if your head-of-the-class high school average were .000628 points greater than that of the runner-up, would you be willing to share your valedictorian honors by agreeing to a round-off?