TWDOCS: The Comeback

3 Jul

I still don’t know what Google wants to happen when it asks, or tells, me that I’m Feeling Lucky, but either way I’m not sure if it was luck or an inertially-driven pair of hands that fell upon my crumb-dappled keyboard and tapped twdocs.com into my URL bar – again. I’d been engaged in the Sisyphean pursuit of the incommunicado site for some time, hopefully expounding its error-messaging links for some sign that something else was going to happen next.

And so on and so on until this week, when force of habit yet auto-piloted me back to twdocs’s still-listed address, whereupon the next click… made something actually happen. A spreadsheet, bearing the results of a Twitter keyword search, the very object I had requested vainly from twdocs lo these many months, offered itself to me in the form of a long-lost, happily-attended prompt. Talk about wow factor; viva inertia, I say.

I’m not asking any questions. Twdocs is back, at least for now, streaming near-real time, key-worded tweets your way in their countable, measurable, eminently usable rows. We’ve been to the site and done that, in October and November 2012, before twdocs went on its unexplained hiatus. That’s when I posted a couple of looks at the site, endeavoring, among other things, to monitor US election-day tweets for mentions of Obama and Romney.

The site is indeed back, and while the keyword search instructions set down in my 2012 posts seem unchanged, a few adjustments have worked their way into the user’s marching orders, starting with the maximum tweet-requisition limit of 800, down from the earlier 1200. Does that pullback matter? I suspect it doesn’t, because in any case you’re not going to net every Tweet out there on a trending theme – even with a 1200-tweet max – and so again, it’s tweet velocity instead – the numbers of tweets per some or other unit of time – that’s going to matter.

And something else. Twdocs now reports its date/time-field data this way:

Thu Jul 03 10:27:05 +0000 2014

But in 2012 that date would have looked like this:

Thu, 03 Jul 2014 10:27:05 +0000

The problem isn’t with the Euro to American date format shift, but rather with the +0000, having usurped the 2014’s previous date-hugging pride of placement. Because in both cases above the dates dial into the text format, I had originally called for a Text-to-Columns maneuver that would salvage real, quantified dates from the expressions; and that measure should continue to work, provided you engage the Do not import column (skip) option with all due savvy; but it’s occurred to me that a simpler, perhaps fleeter move could do much the same. Simply select the time data in their column (that should be B) and boot up two rapid-fire find and replaces, the first searching for the three-character day abbreviation in force, and the blank space following those three characters, e.g.

Thu[blank space]

And replace it with nothing.

The blank space is integral here, as it too stands as a bona fide character that, if ignored, would fail to renovate the entries’ text format. Then find the +0000s, and again replace them with nothing. What’s left in the cells now should submit themselves as quantifiable dates and times to your inspection. (Note that if a twdocs keyword search returns results tweeted through several different days, as might typify a keyword that appears but occasionally across the twittersphere and hence takes its time to accumulate, you may have to run the three-character find-and-replace more than once, in order to replace each day.)

Next, my early twdocs posts promoted a means for timing a given keyword’s velocity (again, October 18) that prevailed upon the SEARCH function and a copy-down-the-column device . Here I’ll advertise what I take to be a more parsimonious take on that chore, using as example a current search for the word Wimbledon, a timely reference, I think .

First, you need to calculate the duration across which the tweets were dashed off. Click in a blank cell – say N1, and name it d for duration, or whatever. Then name the newly-quantified date/time field in B t for time (I’m all for parsimony in range naming, too). Range-name the data in Column A – the one storing the actual tweets – text. Then back in N1 enter:

=(MAX(t)-MIN(t))*24

That figures the fraction of an hour in effect within which the tweets were received. Then enter Wimbledon in L4 (the cells have of course been chosen in the interests of choosing something, somewhere). But why am I doing that, you’re sure to wonder. After all, the original search put to twdocs was for that very word; why look for Wimbledon again inside the spreadsheet? Good question – but we do need the additional Wimbledon search, in order to tighten the imprecision of twdocs’ own initial foray for the word. As I observed in the earlier posts, not every record retrieved by the site will necessarily contain the desired term (I told you I wasn’t asking any questions), and so we need to press another search for Wimbledon, this one under our own steam. Thus enter in M4:

=COUNTIF(text,”*”&L4&”*”)/d

Note the syntactical demands COUNTIF makes of us here. We’re looking for a word nested in a larger collocation of words – the battery of all our tweets – and so COUNTIF wants us to string-concatenate the word, in effect here:

*Wimbledon*

(and COUNTIF is case-insensitve; you could lower-case the W).

Once COUNTIF is understood its outcome is divided by d, rounded off say to two decimals, and the instances of Wimbledon per hour are duly cited. I got 7334.40, by the way. Look for Wimbledon now and your number should of course be different.

Once that’s working, you can enter as many supplementary search terms as you wish beneath Wimbledon in the L column, copying the associated formula for each term to develop a per-hour read for each.

Of course this latter strategy draws a subset from a subset. Once we’ve asked twdocs to find every instance of Wimbledon and then go on to interrogate those data for say, Murray or Federer, we’re really in effect hunting for tweets that feature both the words Wimbledon and Murray. But it’s impossible to know how many Murray-laden tweets are out there – the ones, that is, that don’t have Wimbledon as well. Thus if we’re trolling for all Murrays or Federers in the first instance, we’d need to put that question back to twdocs.

And if you want to itemize all the tweets containing both Wimbledon and Murray, after you’ve already called up Wimbledon from twdocs, you can count on COUNTIFS, the multi-criteria extension on COUNTIF. Assuming the search terms Wimbledon and Murray claim cells L4 and L5, for example, you’d enter:

=COUNTIFS(text,”*”&L4&”*”,text,”*”&L5&”*”)/d

That should work. Now will the ballboy please get me my towel?

Comments Leave a Comment
Categories Uncategorized

Search

TWDOCS: The Comeback

Leave a comment Cancel reply

Follow Spreadsheet Journalism via Email

Monthly Posts

My Pivot Table Chapter

Search

TWDOCS: The Comeback

Share this:

Related

Leave a comment Cancel reply

Follow Spreadsheet Journalism via Email

Monthly Posts

My Pivot Table Chapter