Making Up For Lost Time: London Tube Lateness Data

22 May

If you haven’t heard it before – and if you’re espying this post in Sydney or Sofia you probably haven’t – you’re hearing it here:  ride the London underground 15 minutes longer than the ride is supposed to take and you get your money back.

In fact, I’d allow that a sizeable crew of Londoners haven’t heard that before either, but it’s true, and if you doubt me look here. Indeed – your intrepid correspondent has collected twice, a pair of lucrative postscripts to some down time I experienced down under.

It’s all true; but how do the powers that be know you’re on the up-and-up when you press your cash-back claim? They know because they’re watching your Oyster card, your ticket through the transit system that means to record your every point of departure and disembarkation, and whether you really were stuck on that Northern Line idling at Tooting Bec on the 17th at 2:12 pm (yes, editor, there is such a stop).

And that means in turn that all the trips of all the descenders into the underground have to be summed and reckoned, with much of that tallying brought to us here:

 Copy of tfl-tube-performance

The dates in the Key Trends sheet kick off in cell A2 from the second reporting period of the 2011-12 financial year, and don’t be flummoxed by the 02 at the head of the cell. That isn’t February, but rather the second of the fiscal year’s 13 28-day reporting intervals, a curious baker’s dozen that I suspect has something to do with 364’s divisibility by 13, and the cooperative absence of any underground service at all on December 25. But the apportionments aren’t quite that neat. In fact the day total for 2012-13 comes to 365 (note the 29 in cell C26, probably a leap-year emendation), while C27 registers a 27-day interval, and C39 and C52 have inherited 30-days stretches.

In any case, Key Trends gives us more to think about than period day distributions – for one thing, that arpeggio of cascading comments, in striking but largely illegible array:


Apart from bidding us to think about how extended textual reflections might fitly reconcile themselves to the data to which they’re annexed, clarity calls for a most simple remedial step – just click the Show All Comments button in the Comments button group (Review tab), and make the comments disappear. To recall any comment in particular, click on its cell and try Show/Hide Comment, in the singular. Why the sheet insisted on its in-your-face comment revelation is a matter to put before Transport for London.

But once you do get to the data you’ll need to run through a number of think-throughs in order to decide what exactly it is you want to learn from them. Consider for starters the Number of Lost Customer Hours entries. These could be divided by the corresponding Excess Journey Times, thus returning the apparent number of passengers experiencing a tardy ride (of course in view of the 15-minute rule most of these won’t be compensated in any event).  You’ll also need to multiply the Lost Hours by 60 so that you’ll be dividing minutes by minutes.  If we command the J column, then, and title it Number Late, we could enter in J2:


(Note: no parentheses required here. Excel’s order of operations of will attend to the math.) That expression gives me 23,609,932.35 or so latecomers, a rather large number to be sure, but one of course that needs be juxtaposed to the total rider figure for any given 28-day period. And to proceed here we need to hazard an educated guess. Transport for London puts the annual tube traffic nowadays at around 1.265 billon riders a year. Giving leap years their due, divide that number by 364.25 (remember December 25 takes the day off) and a day’s worth of tube travel yields an average of 3,472,889.5 passengers, which in turn needs to be multiplied by the Days in Period numbers in C. Simplest tack: enter that average in a free cell and name it daily, then slide into the K column, call the prospective field Pct. Late, and post, in K2:


(This time you do need the parentheses. Note in addition the 3.4 million daily average is a current one, and as such may artificially, if slightly depress lateness percentages for earlier years. The determined researcher could doubtless recover the actual year-specific totals.)

Copy down K and observe the notable flux among the percentages, particularly the spikes in C38 and C41. Those quantum leaps owe their king-sizes to job actions, and here you would profit from consulting the attached explanatory comments.

How you’d choose to evaluate these lateness proportions, which tend to oscillate across the 20-30% band (though the variation is real – observe the 14.88% for the period ending October 13 2012 in row 20), is perhaps a matter for the glass half empty/full disputants. A rough-hewn null hypothesis might predict that as many rides would pull into their rider-intended stops late as early, by some margin. A minute-late ride is late, after all, and so a 25% aggregate lateness measure might not fairly serve as a full-on indictment of the underground service. Still, late is late, once the baseline, expected lengths of journeys are decided.

But another simple but trenchant measure of punctuality offers itself from the data. We could divide the number of customer hours lost by Operated Kilometres (as Transport for London would spell it)counted in the G column; that is, the accumulated distance travelled by all trains in the period. It stands to reason that all things being equal greater distances traveled associate with more lost hours, and because we have all the data we need here – unlike the Pct Late metric that works with the same daily passenger average throughout – this little fraction may be all the more incisive. We could title the L column something like Ratio of Hrs Lost to KM, simply divide the value in F by the one in G, and copy down.

You’ll also note the matrix of lost customer hours in its eponymous tab, an enhanced compilation of data pushing further back in time to the 2003-4 financial year and treated to an off-the-shelf conditional format. Just one question here: the matrix provides lost customer hours for the first reporting period of 2011-12 – even as the listings for that year in the Key Trends sheet only start at period 2.

Is that what they mean by lost customer hours?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: