First principles: before you subject a dataset to your imposing, if caffeinated, spreadsheet acumen you have to actually get the data. But that blitheringly obvious stipulation is, nevertheless, sometimes easier stated than achieved.
It’s true that most open data sites aim to please and affably release their holdings, via a tick of a well-placed and intelligible Download or Export button, or something like it. But there is a particular brand of spreadsheet manqué that issues its data in stages – that is, it mothballs them across several web pages, rather than in one unitary place.
That segmented storage strategy might enhance the data’s readability, or might not, though if nothing else the multipage design spares viewers from scrolling relentlessly down the page. My question about these kinds of datasets is simple: can they be downloaded directly into a single spreadsheet without contortion?
And that question placed itself before me anew a short while ago when I met up with the US Department of State’s travel advisory dataset, brought to my attention by the Far and Wide site. That dataset looks like this, necessarily in part, given its multi-page distribution:
The set’s three fields seem perfectly limpid (save perhaps the Worldwide Caution entry, which appears to commend a global, and not a country-specific advisory); but their data are spread across five pages, each of which must be clicked separately:
Yes; as earlier indicated, it’s one of those kinds of datasets. And again – can I get it all into my spreadsheet without that most unbecoming of resorts: five copy-and-pastes, rife with pinched column widths and text that needs to be unwrapped? (Note by the way that as of this writing New Zealand’s Level 1 assessment in the data set hadn’t changed, having been last updated on November 15 2018. However, an alert for the country has been entered here.)
When assailed by such questions, a hopeful right-click upon the data gives us nothing to lose and something, perhaps, to gain e.g.
The third option from the bottom looks promising. Click there and we’re told that the data before us will find their way into my waiting spreadsheet. But I seem to recall having viewed that Export instruction elsewhere on other sites, with decidedly mixed results.
But why not. I gave it a click and to my surprise observed something actually happening on my blank worksheet, an ellipsis-freighted “External Data_1: Getting Data…” message that deliberated on screen for several minutes before finally giving way to an actual, unitary spreadsheet, e.g.
In other words, the export actually exported. Note that the advisory data in column A appears on site in a stream of hyperlinks that when clicked directs the viewer to a deeper background on the country in question; presumably the export routine thus executes a Paste Values protocol that strips the records of their more exotic contents. On the other hand, the download did introduce the Date Updated field to the spreadsheet in actual, numerically viable date mode.
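For readers who pull such a table by some other route and find the dates arriving as plain text, here is a minimal sketch of what "numerically viable" means in practice. It assumes the site prints dates in a form like "November 15, 2018" (the `fmt` default is my guess, not something confirmed by the page), and the function names are hypothetical. Excel's default 1900 date system stores a date as a count of days, which for any date from March 1900 onward equals the number of days since December 30, 1899.

```python
from datetime import date, datetime

# Day zero for Excel's 1900 date system, valid for dates from March 1900 onward.
EXCEL_EPOCH = date(1899, 12, 30)


def parse_site_date(text, fmt="%B %d, %Y"):
    """Turn a date string into a date object.

    The default format is an assumption about how the site prints its dates.
    """
    return datetime.strptime(text.strip(), fmt).date()


def excel_serial(day):
    """Return the serial number Excel would store for this date."""
    return (day - EXCEL_EPOCH).days


if __name__ == "__main__":
    updated = parse_site_date("November 15, 2018")
    print(updated.isoformat(), excel_serial(updated))
```

Once a date is held as a serial number rather than as text, it can be sorted, subtracted and grouped like any other value.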
So it worked, to my pleasant surprise, though a few significant qualifications of the process need be entered. First, a right-click on the first, Advisory field did not summon the Export to Microsoft Excel option; it was only when I attempted a click over the second or third field that the context menu disclosed the command. Second, and perhaps most importantly, the Export possibility only seems to make itself available when the sites are broached with Internet Explorer. Attempts to coax Export from the menu in the course of perusals conducted in Chrome or Firefox failed, and I am presently unaware of an enabling workaround for either browser, though I am happy to be reeducated on this count.
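One browser-independent possibility, offered as a sketch and not as a recipe I have run against the sites discussed here, is to save each page's HTML and let a short script stitch the tables together. The example below uses only Python's standard library. The class and function names are hypothetical, the sample pages and their country names are invented for illustration, and the code assumes each page holds one simple table with properly closed cell tags and a repeated header row. It would not help with a page that builds its table in the browser via JavaScript, since the rows would not be in the saved HTML at all.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect every <tr> on a page as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Link text inside a cell arrives here too, so hyperlinks become plain text.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_html(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def combine_pages(pages):
    """Merge the rows of several pages, keeping the header row only once."""
    combined = []
    for html in pages:
        rows = rows_from_html(html)
        if combined and rows and rows[0] == combined[0]:
            rows = rows[1:]  # drop the repeated header
        combined.extend(rows)
    return combined


def to_csv(rows):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    header = "<tr><th>Advisory</th><th>Level</th><th>Date Updated</th></tr>"
    page_1 = ("<table>" + header + "<tr><td><a href='/a'>Examplia Travel Advisory</a></td>"
              "<td>Level 1</td><td>2018-11-15</td></tr></table>")
    page_2 = ("<table>" + header + "<tr><td><a href='/b'>Samplestan Travel Advisory</a></td>"
              "<td>Level 2</td><td>2018-10-01</td></tr></table>")
    print(to_csv(combine_pages([page_1, page_2])))
```

The resulting text can be saved with a .csv extension and opened in any spreadsheet program, whichever browser fetched the pages.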
Now of course Excel and Internet Explorer spring from the same shop in Redmond, and so one could be left to draw one’s own conclusions on the matter. Is this what they mean by “seamless integration”? Seamless, but unseemly? I don’t know.
In any case, my success with the export fired up the obvious follow-on question: could the deed be replicated with similar datasets thronging the internet?
In search of an answer, I stopped off at the US News and World Report rankings of law schools and clicked its Table View button, assuming that something like a spreadsheet would eventuate as a consequence. Once in view I again right-clicked the data’s second, Tuition field, revisited Export to Microsoft Excel (remember, I’m back in Internet Explorer), and was delivered this compendium (in excerpt):
While the above tableau and its quiver of alternating blank rows won’t pass Spreadsheet Design 101 (and I’m an easy grader), the “fault”, if I may be so judgmental, lies with the site and not the export, which seems to have captured the data as they appear on site. In short, the export seems to have worked again.
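Should the blank rows get in the way of sorting or pivoting, they are easy enough to sweep out after the fact. Here is a minimal sketch, assuming the exported sheet has been read into a list of rows (each row a list of cell values); the function name is hypothetical and the sample values are placeholders, not the actual rankings.

```python
def drop_blank_rows(rows):
    """Keep only the rows that hold at least one non-blank cell."""
    return [
        row for row in rows
        if any(cell is not None and str(cell).strip() for cell in row)
    ]


if __name__ == "__main__":
    exported = [
        ["School", "Tuition"],
        ["", ""],
        ["Sample Law School", "1000"],
        [None, "   "],
    ]
    for row in drop_blank_rows(exported):
        print(row)
```

Inside Excel itself, sorting the data or filtering out the blanks would achieve much the same thing.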
In the interests of building up a scientifically workable sample I turned next (again in Explorer) to the Times Higher Education world university rankings, which look something like this in situ:
Again, a right-click upon the data ushered Export to Microsoft Excel into view (though the command appeared after clicking either the data’s first or third field, but not the second), but this time the spreadsheet registered nothing but a companionless header row for the rankings. I retried the export numerous times, but met with the identical result on each attempt.
I can’t explain the discrepancy, i.e. why some data sets comply with the export request and others resist. That’s not to say an explanation can’t be adduced, of course, but I’ll have to assign that accounting to a web programmer. Clearly, the kinds of data tropes we’ve reviewed here embody a different genotype from that inherited by the standard open-data-site collections we usually confront here, those designed in large measure to download immediately into spreadsheets. And while it’s true that these web-emplaced data are probably meant to facilitate searches for a particular item and little more, it might be a good idea for their designers to anticipate the prospect that someone out there might wish to analyze the whole lot, to treat the data as a whole accordingly, and to see to it that they find their target in a spreadsheet all the time.