Word Counts Refined: Part 2

30 Apr

It was not a week ago that I put the matter before you, in our course of our stone-unturning, spreadsheet-pressed search for keywords:  In light of the methodology we’ve been touting, would it also be possible to uncover multi-word phrases, e.g. “tax relief”, in addition to the singletons “tax” and “relief” we’ve already learned to count?

It seems we can, and the means of choice toward that end is an array formula, a feature I’ve sedulously avoided until now because of its rep for conceptual obtuseness, a not entirely misdirected allegation. And while it’s true that array formulas have their moments –some of them rather hairy – and equally true that I’m not the industry’s go-to guy for array constructions, but you’ll nevertheless profit from making at least a passing acquaintance with them.

Array formulas do the work of multiple formulas, but muscle them into but one expression. In fact, you’ve probably used array formulas already without having been formally introduced to them, because a good many of Excel’s built-in functions buy into the array programme. By way of very good example, consider SUMPRODUCT, which multiplies pairs of values and then, befitting its name, add them all up. Thus if we subject these entries in A4:B8

text21

to SUMPRODUCT, written as follows:

=SUMPRODUCT(A4:A8,B4:B8)

Excel will multiply the pairs A4 by B4, A5 by B5, etc., and cap the process by totalling all those products, turning out in this case 1369. The by-the -book alternative, on the other hand, would of course require the user to compose five multiplication formulas, e.g. =A4*B4, =A5*B5 (positioned say in the C column), and so on, and bring these together with a =SUM(C4:C8); but SUMPRODUCT subsumes all that activity in its solitary expression (and imagine working with 500,000 pairs of values instead).

Thus SUMPRODUCT qualifies as an array formula because it scoots through all those multiplications, each one glancing off the same pair of parentheses and bounding into the same sum calculation. But some tasks call for user-inspired, customized array solutions, and the savvy to build them .

For example, compare these student answers to ten multiple-choice test questions with the correct ones:

text22

These data occupy H6:I16. To deliver an array-derived total of correct answers, I’d enter in a free cell

=SUM(IF(H6:H15,I6:I15,1,0))

Translation: let the formula compare the ten pairs of same-row responses, e.g. H6 with I6, H7 with I7, and so on. Each match delivers a 1, after which all the 1’s are summed (just for the record, the 0 in the expression –signifying the value-if-false contingency of the IF segment – could in fact be omitted, in contradistinction to its indispensability to a standard IF statement. But we’re speaking in Readers Digest terms here, so not to worry. And  note that the above expression was originally mistyped: thanks to Mark Serebriakoff for the error catch).

And when that formula is good to go, don’t put it into play with a reflexive press of the Enter key. At the point of completion, array formulas always require the spreadsheet equivalent of a musical triad instead – Ctrl-Shift-Enter. That chordal touch presumably instructs the formula to do its array thing, which here should return 5. And know that the hallmark squiggly brackets that attend every array formula can’t be merely typed; they properly appear only after Ctrl-Shift-Enter is struck. (For a relatively straightforward introduction to array formulas see this entry.)

Now that our crash course has collided with your medulla let’s reprise the keyword question with which we inaugurated the exercise. In the first instance, we’re looking for a way to count the appearances of a specified two-word phrase in our text.

For the sake of the illustration, I’ve conducted the text-preparation routine I described last week on about 620 words lifted from the Conservative Manifesto, which I’ve pasted them at B2. You can view the set-up here:

 manifesto excerpt

Suppose I’m looking to count incidences of the phrase “We will”, a phrase that seems to proliferate here, and probably not insignificantly; I’ll enter “We will” in D1. And in another free cell I’ll enter, after having remembered the Ctrl-Shift-Enter coda:

{=SUM(IF(B2:B624&” “&B3:B625=D1,1,0))}

Now what is the formula meaning to do? It’s string-concatenating consecutive words in the single-column text range, but doing so again and again, for each pair of words brocaded down the entire column. Note here that the two ranges, or arrays, as they’re officially called, are staggered – B2 is concatenated with B3, B3, concatenates with B4, and so on, thus capturing every stepped pair all the way down the column (the ” ” interposes a space that emulates the phrase as it’s written in D1). But note – and this is rather essential – the arrays are equivalently sized; each one here consists of 623 cells, and that parity is fundamental to the array process.

Thus if any contiguous word pair equals the “We will” in D1 it receives a 1; and once followed all the way through, the array formula brings 16 “We wills” to the surface, and I’d suggest that so broadly distributed and inclusionary a phrase is a hook on which the Conservatives want to hang their hat.  And you can copy the array formula, squiggles and all, the better to mint duplicates that can search for any other phrase(s); and you’ll be pleased to know that the dups will exhibit relative addressing, just like any another expression.

And by way of postscript to Part 1 we can also propose a friendly amendment to the COUNTIF usages we endorsed there. You’ll recall that we nailed together a most basic pivot table last week that would tally the occurrences of every word in the considered text:

Row Labels: Words

Values: Words (Count, of course; the data are textual)

But what had escaped my Starbucks-flooded attention was the most simple but powerful fact that the table itself performs a global COUNTIF; that is, it develops a count for every word in the corpus. And because the all the counts are already there, you need only substitute a simple VLOOKUP that’ll return any particular word count from the pivot table columns.

And you could even just sort the words alphabetically, or just bang out a Ctrl-F. That’s a Find as in Find and Replace; not FIND.

Advertisements

2 Responses to “Word Counts Refined: Part 2”

  1. Mark May 1, 2015 at 10:46 am #

    Nice use of array formula. However I think there is something wrong with your first example formula, or I can’t make it work. The {=SUM(IF(H6:I15,1,0))} seems to be missing something.
    I have duplicated the effect using {=SUM((H6:H15=I6:I15)*1)}. The central part of the formula does the row by row comparison, each resulting in a logical 1 or 0 that is then multiplied by an arithmetical 1 and added to the sum.
    I like array formula, so much action in one short line, but a real challenge to the intellect, when showing people how to use Excel, if I thought they were good enough I would really test them with array formula.

    • Abbott Katz May 1, 2015 at 10:53 am #

      Hi, and you’re absolutely correct. I had omitted one of the arrays. I can correct in it the post and credit you for the observation

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: