Archive | April, 2019

The Hockey Stick Effect: Wayne Gretzky’s Goals, Part 2

12 Apr

There’s another parameter-in-waiting pacing behind the Wayne Gretzky goal data, one that might be worth dragging in front of the footlights and placing into dialogue with the Date field in column B. National Hockey League seasons bridge two calendar years, generally strapping on their blades in October and unlacing them in April. For example, Gretzky’s last goal – time-stamped March 29, 1999 – belongs to the 1998-1999 season, encouraging us to ask how those yearly pairings might be sprung from the data, because they’re not there yet.

Of course, a catalogue of Gretzky’s season-by-season scoring accumulations is no gnostic secret; that bundle of information has been in orbit in cyberspace for some time (here, for example), and so developing those data won’t presume to teach us something we don’t already know. But the seasonal goal breakdowns could be joined to other, more putatively novel findings awaiting discovery among the data, and so the exercise could be justified.

So here’s my season-welding formula. Pull into next-available-column R, head it Season, and enter in R2:

=IF(MONTH(B2)>=5,YEAR(B2)&"-"&YEAR(B2)+1,YEAR(B2)-1&"-"&YEAR(B2))

We’re looking to concatenate two consecutive years, and so the formula asks if the month of any given entry in B equals or exceeds 5, or May, or falls beneath that value. If the former, the year in B is joined to the following year, via the +1 addendum. If the month precedes May, then the preceding year, operationalized by the -1, is concatenated with the year returned in the B column.
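For anyone who’d like to check that logic away from the worksheet, here is a minimal Python sketch of the same rule. The function name season_label is my own invention for illustration, and the May cutoff simply mirrors the formula above; it is not the only defensible dividing line.

```python
from datetime import date


def season_label(d: date) -> str:
    """Return a 'YYYY-YYYY' season string for a game date.

    Mirrors the worksheet rule: May or later opens a season that ends
    the following year; January through April closes one that began
    the year before.
    """
    if d.month >= 5:
        return f"{d.year}-{d.year + 1}"
    return f"{d.year - 1}-{d.year}"


# The dates of Gretzky's first and last NHL goals, as cited in these posts.
print(season_label(date(1979, 10, 14)))  # 1979-1980
print(season_label(date(1999, 3, 29)))   # 1998-1999
```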

The formulas seemed to work, but as a precision check I rolled out this simple pivot table:

Row: Season

Values: Season (count, of necessity; the data are textual. The values should denote goal total by respective year).

I wound up with this, in excerpt:

Gretz1

Cross-referencing the results with the Gretzky goal data in the above hockey-reference.com link yielded a pleasing equivalence across the years.

Now for some looks in earnest at the data. Starting simply, we can juxtapose Gretzky’s goals scored at home to the ones he netted in away games:

Row: Home/Away

Values: Home/Away (count)

Home/Away (again, % of Column Total)

I get:

Gretz2

We learn that Gretzky scored a palpable majority of his goals at home, but we’d expect as much. As in nearly all team sports, NHL teams enjoy the proverbial home advantage, winning about 55% of the time – a near-equivalence to Gretzky’s ratio. That is, if home teams prevail disproportionately then their goal totals should exhibit a kindred disproportion, kind of. One difference with Gretzky, of course, is that he simply scored more of them.
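The count-plus-percentage pairing in that pivot table can be approximated in a few lines of Python. This is only a sketch: count_and_share is a hypothetical helper name, and the sample values are invented for illustration and are not Gretzky’s actual split.

```python
from collections import Counter


def count_and_share(values):
    """Return {label: (count, percent_of_total)} for a list of labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 2)) for k, n in counts.items()}


# Made-up sample of Home/Away codes, NOT the real data.
sample = ["H", "A", "H", "H", "A"]
print(count_and_share(sample))  # {'H': (3, 60.0), 'A': (2, 40.0)}
```

The same helper would serve any single-field breakdown of this kind, since all it needs is one label per goal.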

And does the distribution of his goals by period pattern randomly? Let’s see:

Rows: Per (for Period)

Values: Per (Count)

Per (% of Column Totals)

I get:

Gretz3

Gretzky’s production appears to mount in games’ later stages (OT stands for the overtime period), but that finding needs to be qualified on a number of counts. We’d need first of all to track Gretzky’s average presence times on the ice; that is, was he deployed more often as games advanced toward their denouements and his valuable self was sent ice-bound at clutch time? And we’d also need to plot Gretzky’s goal timings against the league averages for such things; and while I haven’t seen those data, we can assume they’re out there somewhere.

Next, it occurred to me that a look at the winning percentages of games in which Gretzky scored might prove enlightening, once the task was properly conceived. Remember that, as a consequence of his numerous multi-goal flourishes, Gretzky’s goals scatter across far fewer than 894 games. The idea, then, is to fashion a distinct game count across which the goals were distributed; and that sounds like a call for the Distinct Count operation we’ve encountered elsewhere (here, for example). Once we isolate the actual-game totals – which should be associated uniquely with game dates – our answer should follow.

And this pivot table seems to do the job, enabled again by a tick of the Add this data to the Data Model box:

gretz4

Rows: Result

Values: Date (Distinct Count, % of Column Total)

I get:

gretz5

What have we learned? Apart from the up-front factoid that Gretzky scored in 638 of the 1487 games he played across his NHL career (638 is the numeric Grand Total above, before it was supplanted by the 100% figure in the pivot table; note Gretzky also appeared in 160 games in the World Hockey Association), we don’t know how his when-scoring 64.89% win percentage compares with his teams’ success rate when he didn’t score. I don’t have that information, and don’t know where to track it down. But it too is doubtless available.
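The distinct-count idea itself is easy to sketch in Python: collect the game dates into a set per result, so that a multi-goal night counts once. The function name and the placeholder rows below are assumptions of mine for illustration; none of the sample values come from the actual workbook.

```python
from collections import defaultdict


def distinct_games_by_result(rows):
    """rows: iterable of (game_date, result) pairs, one pair per goal.

    Returns {result: number of distinct game dates}, so a three-goal
    game counts as one game rather than three.
    """
    dates = defaultdict(set)
    for game_date, result in rows:
        dates[result].add(game_date)
    return {result: len(ds) for result, ds in dates.items()}


# Placeholder rows: three goals in game-1, one each in game-2 and game-3.
rows = [
    ("game-1", "W"), ("game-1", "W"), ("game-1", "W"),
    ("game-2", "L"),
    ("game-3", "W"),
]
counts = distinct_games_by_result(rows)
total = sum(counts.values())
shares = {r: round(100 * n / total, 2) for r, n in counts.items()}
print(counts)  # {'W': 2, 'L': 1}
print(shares)  # {'W': 66.67, 'L': 33.33}
```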

For another analytical look-see, we can ask if Gretzky’s goals experienced some differential in the number of contributory assists that prefaced them. That is, players (up to two of them) whose passes to a teammate conduce toward the latter’s goal are rewarded with an assist; and the question could be asked, and answered here, if Gretzky’s assist-per-goal average fluctuated meaningfully. We might seek to know, for example, if during Gretzky’s heyday his improvisatory acumen freed him to score more unaided goals than in his career dotage, when he may have been bidden to rely more concertedly on his mates.

Since two Assist fields, one for each of the two potential per-goal assists, accompany each goal, the simplest way perhaps to initiate our query would be to enter column S, title it something like AssistCount, and enter in S2:

=COUNTA(J2:K2)

And copy down. That insurgent field readies this straightforward pivot table:

Rows: Season

Values: AssistCount (average, formatted to two decimals)

I get:

gretz6

Not much pattern guiding the data, but if you want to group the seasons in, say, five-year bins, remember that because the season entries are purely textual you’ll have to mouse-select five seasons at a time and only then successively click the standard Group Selection command, ticking the collapse button as well if you wish:

gretz7

Even here, then, the variation is minute – strikingly so.
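For the record, the AssistCount field and its per-season average can be sketched in Python as well. The helper names are hypothetical, the rows are invented placeholders, and an empty string or None stands in here for a blank worksheet cell.

```python
from collections import defaultdict


def assist_count(assist1, assist2):
    """Count the non-empty assist cells for one goal (0, 1, or 2)."""
    return sum(1 for name in (assist1, assist2) if name)


def average_assists_by_season(rows):
    """rows: iterable of (season, assist1, assist2). Returns {season: mean}."""
    totals = defaultdict(lambda: [0, 0])  # season -> [assist sum, goal count]
    for season, a1, a2 in rows:
        totals[season][0] += assist_count(a1, a2)
        totals[season][1] += 1
    return {s: round(t / n, 2) for s, (t, n) in totals.items()}


# Placeholder rows, not real goals.
rows = [
    ("Season 1", "Player X", "Player Y"),  # two assists
    ("Season 1", "Player X", ""),          # one assist
    ("Season 2", None, None),              # unassisted
]
print(average_assists_by_season(rows))  # {'Season 1': 1.5, 'Season 2': 0.0}
```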

Now for a last question we could ask about those teammates who were literally Gretzky’s most reliable assistants – that is, the players whose assist counts top their collaborative pairings with the Great One. The problem here is the two-columned distribution of the assist names, one for the first assist on a goal, the other for the (possible) second. I don’t know how a pivot table can return a unique complement of names across two fields simultaneously, preparatory to a count. If you do, get back to me; but in the meantime I turned again to the Get & Transform Data button group in the Data ribbon and moved to unpivot the data set via Power Query, by merging only the assist fields, e.g.:

gretz8

After I selected Assist1 and Assist2 and advanced to Transform > Unpivot Columns and then Home > Close & Load, the result looked like this, in excerpt:

gretz9

And of course you can rename Attribute and Value, say to Assist and Player.
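If the unpivot step feels opaque, this Python sketch shows roughly what it does to the two assist columns: each goal row becomes up to two (slot, player) rows, and blank cells drop out, which is how I understand Power Query’s unpivot to behave. The function name and the placeholder players are mine, purely for illustration.

```python
from collections import Counter


def unpivot_assists(rows):
    """rows: iterable of (assist1, assist2) pairs, one pair per goal.

    Yields one (slot, player) pair per non-empty assist cell.
    """
    for a1, a2 in rows:
        for slot, player in (("Assist1", a1), ("Assist2", a2)):
            if player:
                yield slot, player


# Placeholder rows, not real goals; the last one is unassisted.
rows = [
    ("Player X", "Player Y"),
    ("Player X", None),
    ("Player Y", "Player X"),
    (None, None),
]
long_rows = list(unpivot_assists(rows))

# Total assists per player, highest first.
by_player = Counter(player for _, player in long_rows)
print(by_player.most_common())  # [('Player X', 3), ('Player Y', 2)]

# The same count split by first or second assist.
by_player_and_slot = Counter((player, slot) for slot, player in long_rows)
```

Once the names sit in a single column, a plain count per player does the work that the two-field layout made awkward.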

Once there, this pivot table beckons – after you click Table Tools > Tools > Summarize with PivotTable:

Rows: Player

Values: Player (Count, sort Highest to Lowest)

I got, in excerpt:

gretz10

Nearly 22% of Gretzky’s goals received a helping hand – at least one wrapped around a stick – from his erstwhile Edmonton Oiler and Los Angeles King colleague Jari Kurri, no scoring slouch either with 601 goals of his own – a great many doubtless the beneficiary of a Gretzky assist. Then slip the Assist field beneath Player in Rows and:

gretz11

Now we learn that more than 60% of Kurri’s assists were of the proximate kind; that is, he was the penultimate custodian of the puck, before he shipped it to Gretzky for delivery into the net.

Now that’s how you Kurri favor with the Great One.

 

 

 

The Hockey Stick Effect: Wayne Gretzky’s Goals, Part 1

1 Apr

What is the measure of greatness? How about 894 records, one for each of the goals driven home by the National Hockey League’s Wayne Gretzky, aka the Great One?

That spreadsheet is as large as it gets for NHL scorers, and Tableau ace Ben Jones has infused the goal count with lots of supplementary background about each and every one of the 894, archiving the data for download on the data.world site here.

In fact the workbook makes itself available in both Excel and CSV mode, the latter requiring a text-to-columns parsing that likens it to the former. Either way, a few organizational points need to be entered.

For one thing, you’ll note that what’s called the Rank field in column A numerically ids Gretzky’s goals, in effect sorting them by newest to oldest. That is, Gretzky’s first goal – scored on October 14, 1979 – has received id 894, with the numbers decrementing ahead in time until his final score – tallied almost exactly 20 years ago on March 29, 1999 – has bottomed out with the number 1. It seems to me – and I suspect you’ll share the opinion – that the enumeration should have pulled in the opposite direction, with Gretzky’s last goal more properly checking in at 894. With that determination in mind I reversed the sequence via a standard autofill, entering 894 in cell A2, 893 in A3, and copying down.

You’ll also be struck by the unremittingly monotonic entries in the Scorer field, comprising 894 iterations of the name Wayne Gretzky. We’ve seen this before in other data sets, of course; the field was likely dragged into the data set as an accessory to some generic download protocol. Again, you can either ignore the field or delete it. Either way, you’re not going to use it.

And your curiosity will be stirred anew by the blank column-heading cells idling atop columns D, F, and G. It’s difficult to believe that Ben Jones, who doubtless knows whereof he speaks, would allow these most rudimentary oversights to escape his notice, but alternative explanations notwithstanding, the headings aren’t there and must be supplied.

Column D reports a binary datum – whether a Gretzky goal was scored at his team’s arena or at the rink to which his team traveled for an away game. I’ll thus entitle the field Home/Away and proceed to do something about the data themselves, whose cells remain empty when signifying a home goal and register an @ for “at”, that is, a goal netted at someone else’s arena. A pair of finds and replaces – the first, substituting an H for the blank cells, with the second supplanting the @ signs with a companion, alphabetized A – should sharpen the field’s intelligibility.
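The two finds and replaces amount to a tiny mapping, sketched here in Python. The function name normalize_venue is hypothetical, and I’m assuming that a blank and an @ are the only two values the column ever holds; anything else raises an error rather than being guessed at.

```python
def normalize_venue(cell):
    """Map a raw column D value to 'H' (blank cell) or 'A' ('@')."""
    value = (cell or "").strip()
    if value == "":
        return "H"
    if value == "@":
        return "A"
    raise ValueError(f"unexpected Home/Away value: {cell!r}")


print([normalize_venue(c) for c in ["", "@", None, "@"]])  # ['H', 'A', 'H', 'A']
```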

The headless column F archives game outcomes, i.e. wins, losses, or ties, and so I’ll call the field Result, or something like it. Column G denotes the phase of a game when the goal was scored, either during regulation time or overtime – or so I assumed. But a second thought soon followed on the heels of that hunch, if I may mangle the metaphor: it occurred to me that the Regulation/Overtime opposition simply recalls whether or not the game itself swung into an overtime period, irrespective of the actual times at which Gretzky scored. Could that uncertainty be relieved?

I think so, and I played it this way: first, I named the doubtful field Reg/OT, and ran a find and replace at the G column, substituting Reg for any empty cell therein. I then moved toward a pivot table:

Row Labels: Date (ungrouped, in order to exhibit each date)

Columns: Reg/OT

Values: Date (Count)

What I found is that no game date featured a value for both a regulation and overtime goal, a discovery that goes quite some way toward clinching the second speculation – namely, that the Reg/OT field entries do no more than inform us if the games necessitated an overtime period.

After all, if we confine the analysis momentarily to the games that spilled into overtime, one could most reasonably imagine that a scorer with Gretzky’s gifts would have occasionally lodged a goal in both the regulation and overtime phases of the same game; but the pivot table uncovers no such evidence. For any given date, Gretzky’s score(s) appear in either the OT or the Reg column. Moreover, some of the games – for example, November 27, 1985 – record two overtime goals, a unicorn-like impossibility in a sport in which overtime ends when the first goal is scored. (You’ll note by the way that the overtime-column goals only begin to appear in 1983, when a five-minute overtime period was instituted.)

Thus I’d aver that the Reg/OT field conveys little understanding of Gretzky’s scoring proclivities; all it does is identify games that happened to have extended themselves into overtime, and in which he scored – some time.
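The pivot-table check described above, that no game date carries both a Reg and an OT entry, can also be run as a short Python sketch. The function name and the placeholder rows are my own assumptions; blank cells are read as Reg, in keeping with the find and replace.

```python
from collections import defaultdict


def dates_with_mixed_labels(rows):
    """rows: iterable of (game_date, reg_ot) pairs, one pair per goal.

    Returns the sorted dates that carry more than one distinct Reg/OT
    label. An empty list means the label never varies within a game.
    """
    labels = defaultdict(set)
    for game_date, reg_ot in rows:
        labels[game_date].add(reg_ot or "Reg")  # blank cells read as Reg
    return sorted(d for d, seen in labels.items() if len(seen) > 1)


# Placeholder rows, not real games.
consistent = [("game-1", "Reg"), ("game-1", ""), ("game-2", "OT"), ("game-2", "OT")]
print(dates_with_mixed_labels(consistent))  # []

mixed = consistent + [("game-3", "Reg"), ("game-3", "OT")]
print(dates_with_mixed_labels(mixed))  # ['game-3']
```

An empty list for the real data would match what the pivot table showed: the label describes the game, not the goal.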

The Strength field cites the demographic possibilities under which Gretzky accrued his goals: EV refers to even strength, when both teams’ numeric complements on ice were equal; PP, or power play, to stretches during which the scoring team temporarily outnumbered the other after a player was remanded to the penalty box; and SH, or shorthanded, to the rarest eventuality – when Gretzky scored while his team was outnumbered.

I do not, however, know with certainty what the EN entry in the Other field represents even though I probably should, and I see nothing in Data World’s data dictionary that moves to define it. It may very well stand for end, as in end of game, however; each of its 56 instances is joined to a goal that was scored with fewer than two minutes left in its respective game. EN may then stand for scores achieved after the opposing goalie skated off in a losing cause and was replaced by an offensive player, in order to buttress a desperate try at equalizing the game. Indeed – all 56 of the EN goals were scored in wins by Gretzky’s team.

As a matter of fact, I think I’m right. Filter the Other field for its ENs and look leftward at the Goalie field in L. There’s nothing there.