Search results for 'toronto'

Phone Numbers, Part 2: Toronto 311 Call Data

2 Jan

No analysis of the Toronto 311 call data can press its task without firming its grasp of the meaning of the fields themselves, not all of which can be fairly described as self-evident.

Let’s see.  Calls Offered appears to mean nothing other but calls received, and Calls Answered appears to mean precisely what it says – namely, a count of the calls which 311 operatives actually attended, though this measure does not want for ambiguity (Look here, in particular pages 10 and 11, for some of the definitional issues, along with a battery of critiques of the 311 operation). Exactly why the unanswered calls lurched into the void ain’t, as the song says, exactly clear, particularly in view of the fact that some of these may have been in fact mollified by recorded messages. Again, check the report linked above.) Call Answered %, in any case, simply divides Calls Answered by Calls Offered.

Service Level %, on the other hand, begs for clarification and enlightenment was kindly provided by Open Data’s Keith McDonald, who told me that the field records the “percent of received calls that is answered within the service level goal of 75 seconds”, an understanding that in turn calls for a rumination or two all its own. (Note: I appear to have misapprehended the Service Level %’s formulation, having originally, supposed “that, because the Service Level % often exceeds its companion Calls Answered %, Service Level adopts the less numerous Calls Answered as its denominator, and not Calls Offered.  Thus if anything the Service percentages overstate 311’s responsiveness to its public, by discarding unanswered calls from the fraction. For example, the 100% Service Level boasted on January 2 obviously invokes the 2039 Calls Answered, since another 325 calls on the 2nd were never answered at all. You can’t extol a 100% Service Level for all 2264 calls if some of them simply weren’t responded to, for whatever reason.” However, read Keith McDonald’s comment below.)

Second, note that unlike the Calls Answered %, Service Level % presents itself in percent terms without pointedly identifying its numerator -Calls Answered times Service Level %. If, then, you go on to merely calculate the average of the Service Level percentages as set forth, you run the risk of appointing disproportionate weight to days with relatively few calls. I’d thus open a column between H and I, call it something like Calls Meeting Service Level, and enter in what is now H4:

=G4*E4

and copy it down I, remembering to mint the column in Number format, sans decimals. These quantities can now be properly summed and averaged, etc.., thus honoring their real contribution to any calculated whole.

OK – so how about those service levels, say by Day of Week? We could try this:

Row Labels: Day of Week

Values: Calls Answered

Calls Meeting Service Level (both Sum. Note this field likewise requires a decimal-point reduction, because the round-off we executed earlier won’t extend to the pivot table. Remember that in any case these roundings only format the numbers, without transmuting their actual values.

And here we could cultivate a Calculated Field (see my August 22, 2013 post, for example, for some how-tos), called Service Pct or some such:

                       31121

formatted perhaps in Percentage terms to two decimal places:

31122

Not much variation here, the results baring a response consistency of sorts. Even the quieter Saturdays and Sundays cling to the 80% notch on the dial, goading questions about weekend staffing levels and call intake procedures. Indeed – correlating Calls Answered with Service Level %:

=CORREL(F4:F368,H4:H368)

evaluates to an indifferent -.135, or a very small negative association between numbers of calls and response alacrity, per the 311 guidlines.

But days of the week are interlaced across the year, after all, and as such might be driven by more potent seasonal currents. If we substitute Date for Day of Week in Row Labels, and group the former by Month:

Here the picture changes. Service percentages associate far more tightly with call volume (operationalized here by Calls Answered), at -.55 (I simply ran a CORREL with the 12 Calls Answered and Service Pct cells above); and that makes rather straightforward sense. More calls to answer, longer response times. And if you add Calls Offered to the above table (Sum, again) and correlate that parameter with Service Pct, you’ll get -.716, a pretty decisive resonance. The more calls in toto, the fewer that penetrate the 75-second transom.

What about Calls Abandoned > 30 seconds? According to Keith McDonald, “Abandoned means caller hung up. In high call waiting periods, customers hear the upfront message and decide to hang up [following a wait of at least 30 seconds, apparently. I’m not sure about callers who gave up sooner].” (Note as well that Calls Abandoned + Calls Answered won’t add up to Calls Offered.) Here Calls Abandoned % builds its percentages atop the Calls Offered denominator, and the abandonment numbers seem to range all over the place.  Time for a more systematic look, e.g:

Row Labels: Date (grouped by Month)

Values: Calls Offered (Sum)

Calls Abandoned > 30 sec (Sum)

31124

Correlation between Calls Offered and Calls Abandoned: a ringing .745. Not surprising, perhaps, but when the numbers jibe with common sense – a less-than-ineluctable state of the data – it’s kind of pleasing. And if you slot in Average Speed of Answer (sec) (summarized by Average):

 31125

Correlation between that field and Calls Offered: a mighty .881.

Doesn’t a certain existential charm devolve upon confirmations of the obvious?

Phone Numbers, Part 1: Toronto’s 311 Call Data

26 Dec

Sometimes one catches a spreadsheet in mid-construction; that is, the sheet exudes traces of a preparatory, pre-publication primping that has already driven the data some way from their native, unrefined state, before they submit themselves to the reader.

And so it appears to be with the Toronto’s open data site’s resume of 311 calls placed in 2012, that three-digit sequence serving as the number of choice in many cities for their denizens’ non-emergency requests and notifications. For the Toronto spreadsheet, click here:

 Toronto 311 calls

It’s clear that the sheet has undergone a set of pre-publication tweaks, born both of practical need and some grander data-organizational strategy. For one thing, the calls are merely totalled by day, even though Toronto surely owns the details to each and every one of the calls. On the other hand, an Excel sheet’s 1,048,576 rows simply aren’t fit for the purpose of engaging with the 1,390,412 entreaties put to 311 in 2012, and in any event the data were originally tacked onto in a far-smaller-capacity Google doc sheet. And it was obviously decided in turn that an alternative data vehicle, e. g., a dedicated database, would rather be misapplied here (note: there are other, larger, 311 worksheets on the Toronto site, but these nevertheless report only a fraction of all calls; see, for example, this page).

Note too the sheet’s Day of Week and Weekday fields, both worthy parameters, to be sure. The minimalist in me would have renounced these fields, with the understanding that these data could have been user derived anyway (the former via the WEEKDAY function, for example); but that’s a churlish quibble because Toronto decided to do the work for us, and the fields serve us well. But either way, the data do indeed seem to have been subjected to a prior think-through.

Now before one moves to actually do something with the data – and there are in fact interesting things to be done – we need to perform some supplementary tweaks of our own, starting with the standard column autofit. And because column A contains exactly one data-freighted cell – A1 and its misshapen text-wrapped title – and because as a result that column will feature by default in any pivot table – I’d simply delete the column. And if you sort the Date field from Oldest to Newest you’ll discover that the last six records pull us through the first six days of 2013; and because they do I’d shoot a row between 368 and 369, thus dismissing the 2013s from the dataset.

Then you can start to do what you do best. For a first consideration, try simply breaking out all calls by the Weekday/Weekend binary:

Row Labels: Weekday/Weekend

Values: Calls Offered (Sum)

Calls Offered (again, this time by Show Values as % of Column Total):

3111

 

(Note that the dataset bears no record for January 1, 2012; and 2012 was a 366-day year. And note in addition the 1,368,086 call total has shed a few smidgens from the 1,390,412 figure adduced earlier; that’s because we’ve brushed away the 2013 days.)

I don’t know about you, but I’d call that interesting.  Left to “chance”, weekend calls would be expected to comprise 2/7ths, or about 28.6% of all 311 calls, but the actual weekend slice grabs a far slimmer apportionment. Perhaps fewer operators are standing by on those two days, or perhaps the good citizens of Toronto have decided en masse against troubling their city with their less-then-emergency inquiries away from normal business hours, though 311 maintains a 24/7 ubiquity.  Consider this question your extra-credit homework assignment.

Now exchange Day of Week for Weekday/Weekend in the table. I get:

3112

 

This bit of clarifying nuance exposes Sunday’s mere 4.61% of all calls, and also points up Tuesday’s modal call accumulation. Note as well the slow but vivid Tuesday-Friday diminution.

What about some seasonality? Substitute Date for Day of Week, grouping by Month:

3113

 

 Again, the variation is substantial, with February callers the least vociferous, even controlling for that 29-day month. It’s clear that demand crested during the summer and troughed in winter, even in spite of the snow-fomented complications likely to beset a Canadian city (again, review the 311 home page for a taxonomy of complaint types.) But then again, Toronto, unlike say, London, is snow-ready.

A last look for now. Reinstate Weekend/Weekday into the Row Label area, empty Values, and roll in Average Talk Time (sec) (Summarized by Average, and formatted to say, two decimal points):

3114

We see that, for what it’s worth, weekend calls took up notably less of 311’s time, about 23 seconds less, or about 10.5%; and those seconds add up across the workday. What explains the differential?  Do weekend calls, perhaps concomitant with their sparsity, recount simpler, or at least more easily describable, complaints or requests? That’s a good question, if I do say so myself, and an answer would require a species of per-call data that hasn’t found their way into our spreadsheet.

Be that as it may, I hope to say more in part 2. In the meantime, let me put you on hold.

Getting There: UK Worker-Commute Data

1 Aug

You say you can’t get there from here? You better find a way, my friend, because your job is there and staying here won’t pay the rent; and the UK’s Office for National Statistic’s data on work commutes tell us much about the daily to-ing and fro-ing that takes its labor force to its appointed rounds. Download all of that here, by clicking on the Commuting Patterns link.

http://www.ons.gov.uk/ons/datasets-and-tables/index.html?pageSize=50&sortBy=none&sortDirection=none&newquery=commuting+flows

(If necessary, you’ll want to save the file in xlsx mode. Note there are other official looks at commutation data out there, e.g. here, and the numbers elsewhere may differ. As usual, the methodological fine print is key.)

The workbook discretely devotes its several sheets to incoming and outgoing commuting flows from and to about 380 local education authorities (that’s a UK-specific entity, in London’s case to its 33 boroughs), and for the years 2010 and 2011. The numbers before us were extrapolated from the country’s Annual Population Survey data, and as such are beholden to the confidence intervals the book describes in its Metadata sheet. Moreover (and thanks to the ONS’s Alexa Bradley on this), that sheet brings some significant estimates issues to our attention.

The Inward commuting sheets (Table 6, for example), drape workplace destinations down column A, with the localities from which workers head there named in B. The Outward data flip the parameters, commending Place of Residence to A instead. But it occurred to me that the Inward/Outward numbers could be brought together in a single sheet, preparing us to study each locality comparatively for its I/O movement as a result. With those marching orders in mind I tried this move for the 2011 data: I copied and pasted the Outward flow data in Table 8 to and beneath the Inward rows (Table 6), deleting the newly expendable header row in 9190. I then pushed the A column aside via an Insert, entering an o for outward in A9190 (now home to a usable record, after all) and copying from there down the remainder of the column. Back in A5, the row upholding first Inward record, I entered i and copied down (double-clicking the fill handle will take the copy only as far as the empty cells, and won’t touch all those Os below.), naming the late-coming field Inward/Outward, or something like that. The plan then was to pivot-table the data by locality, proceeding to break these out by the workers emanating from, and steaming toward, each one.

Now we need to think about that data graft. After all – the Inward/Outward sheets aren’t structural equivalents. Columns A and B in the one sheet have been mirror-imaged as B and A in the other; and so my copy-and-paste appears to have admixed residence data with destinations, and vice versa.

The allegation is true, and at the very least seems to expose itself to the charge of bad form. But in fact the ostensible hybridization of the Workplace field is nothing more than, for example, the multiple listing of a student’s name in a data set strung to a nearby column naming various academic subjects and a third field setting forth respective test scores. Don’t be disoriented, then, by the Ins and Outs and their literal semantic opposition. Understood in spreadsheet field-item terms, In and Out are equivalent to Sociology and Physics – different items, and that’s all.

But if you’re not mollified yet humor me and try this pivot table:

Row Labels: Workplace

Column Labels: Inward/Outward

Values: Number of Commuters (Sum)

(I’d turn off Grand Totals here.)

Yes, the Workplace field is rife with Places of Residence data, but again, the Is and Os keep each directional type at arm’s length.

Once put into play, the data draw some most notable, and in at least one case colossal, symmetries. Look at the numbers for the City of London and its minute residential population, for example: An inflow of 616,719, an outflow of 1,303. Broadland, on the other hand, a community in Norwich north of London on the UK’s east coast, “loses” far more commuters than it gains, mustering an I/O split of 32,000 to 60,000. My home authority of Barnet is likewise top-heavy with the outs. Unsurprisingly, larger cities receive more commuters than they give.

And as such, precision might be better served and captured by a simple inward-to-outward ratio, in which a place’s Inward number, for example, could be divided by its Outward. But while the math is simple, making it happen in this pivot table requires a bit of a step back.

What’s needed – though as always I’m prepared to be persuaded by possible alternatives, too – is a Calculated Item, not to be confused with a Calculated Field, an instrument we’ve dusted off on several occasions including here. Because we want to divide a locality’s inward numbers by its outward – that is, fractionate a particular item in a field with another item in the same field – we’re in effect hatching a brand-new item, and one that appears not at all in the source data set.

To start hatching, click in the Inward/Outward column label strip:

io1

Next, click PivotTable Tools > Options > Fields, Items, & Sets > Calculated Item:

 io2

Substitute a field name if you wish for Formula 1 (i.e., In-Out Ratio; you can introduce a legal space here)

Double-click the i, thereby lodging that item name in the Formula space; then enter a / in there, double-click the o, and click OK, seeing to it that each locality’s Inward total gets divided by its Outward companion. You should see, in excerpt:

io3

 

 

Note that I’ve formatted the numbers to two decimal places – that is all the numbers, whether I’ve wanted to or not, and even the Inward and Outward numbers for which the places are utterly dilatory. That’s because the formatting here is field-wide, and since i, o, and In-Out Ratio are items sprouting in the same field, the decimals are indiscriminately applied.

And note the City of London’s In-Out ratio of 473 to 1. A nice place to work, but folks just don’t seem to want to live there.