US Patents: Inventing an Angle on the Data

12 Sep

Humans are nothing if not a resourceful lot; and one telling record of the human will to innovate continues to be written by the United States Patent Bureau, whose tally of certified inspirations submitted by both American states and other countries makes itself available here:

http://www.uspto.gov/web/offices/ac/ido/oeip/taf/cst_utl.htm

You’ll note the site wants us know that “…a user may import this report file into spreadsheet software such as Microsoft Excel to facilitate printing.” Could be, but we’re not instructed how that import might be carried out, apart from the decidedly public-domain select-copy-and-paste to which I’ve resorted here:

united-states-patent-data

You’ll recognize that the State/Country data in column A has been ordered to wrap its text in awkward places; a double-clicked auto-fit of A in conjunction with a column auto-fit should set things straight, literally. You should also delete the wholly blank B column, because it’s…blank.

Now before you get to work you may want to flit across the aggregated data in rows 2-4. Here we see the striking and determined ascendancy of international patents, surmounting the US total for the first in 2008 and continuing to pull away from the host country’s numbers ever since. That means something, of course, perhaps speaking more to the alacrity of knowledge development worldwide than any becalming of the imagination in the States. After all, patents in the US boomed 45% across the 2008-2015 frame, too.

But while those are most interesting data indeed, they don’t need us to discover them; they’ve already been compiled. We, however, may want to work with the state and country details arrayed below by pivot tabling them, and if we do, a round or two of preliminary decision-making needs to be thought through.

First, you’ll note the two rows separating the US state data (and these include American territories such as Puerto Rico as well as Washington DC, the country’s capitol that doesn’t quite hold state status) from those of the other countries, a segregational practice of which you’ll probably approve. States are states and countries are countries, after all, and any admixture of the two might confuse the analysis – maybe. There might be some profit in comparing patents from California with those issuing from Poland, but that prospect leaves me hesitant, though I suspect there’s a minority view to be propounded here.

But assuming you’re pleased with the status quo you’ll need to copy the header row in 1 to rows 7 and 67, thus topping the two data sets appropriately. Next – if we’re pivot tabling – I’d delete the data in the All Years field, because keeping them in their place will risk a double-count of the patent numbers once the tables do their aggregating thing. Left as is, All Years will be regarded as just another year, and tabulated accordingly.

Next, you could treat each row to a Sparkline line chart, seeing to it that you identify C8:P65 as the data range, thus seeing to it in turn that (what is now) the B column data – bearing pre-2002 sums of patents dating across the 1963-2001 span – have been excluded from the line construction. 39 years’ worth of patents can’t be made to comport with the year-by-year counts that adjoin them, it seems to me. Select the Q8:Q65 location range, and your OK will plot some interesting, and very far from uniform, trajectories down the range, e.g. Delaware’s one-year dive in patent production last year, or South Carolina’s 140% splurge between 2008 and 2015. Overall, the lines certainly describe a cross-state, upward pull in patent yields, but again the anomalies, e.g. Louisiana,

patent1

could be deemed at least slightly provocative. Remember of course that Sparklines’ conformations are drawn relative to their ranges’ values; South Carolna’s spike cannot be matched in absolute terms to California’s, for example, and the very small patent numbers for the US territories need be understood as such, whatever their Sparklines look like.

Those qualifications understood, we could likewise apply Sparklines to the country data. Note that Czechoslovakia’s string of zeroes (following its pre-2002 2121) reflects that country’s reorganization into the Czech Republic and Slovokia, both of which do present substantive patent numbers. The country slopes are remarkably comparable, though some curious investigator might want to do some wondering about Italy’s slump in 2007 and its subsequent bounce back.

And now I’d like to perform a bit of a back-track. Earlier I had reflexively begun to prepare the patent data for some pivot tabling refinements, e.g., by offering that the empty B column should be made to disappear, and by filing the usual brief for a data-set row header. But we – or I – need to ask: given the nature of the rows comprising the data set(s), what might a pivot table really teach us? We need to remind ourselves that pivot tables are, in the first instance, instruments of aggregation, and there doesn’t appear to be much among the state/territory data to aggregate – no recurring, labelled entries there to submit themselves for consolidation, at least not as the data presently stand.

It might be possible, for example, to append a Region field to the country patent data by recalling that B column we had so unblinkingly eliminated, and devising a set of items meant to recur – e.g., Europe, Africa, Oceania, etc. – but that device would impose other problems on the set, those hinted at in my discussion of dataset reconstruction (which perhaps could be reviewed if you’ve decided to pivot table the data anyway) that I’ve cited many times, including now.

So given the data as we’ve received them, and appreciating that rows 1-4 have already done some of the aggregating work for us, the Sparkline strategy might stand as among the most trenchant services we could do for the data – though that superlative leaves me hesitant, too.

But if we’re Sparklining, there’s another subtle point to file away in your presentational rolodex that needs to be entertained. I had earlier advised that the erratic line breakages inflicted on the labels in the A column by the sheet’s Wrap Text effect should be smoothed away by simply turning off the wraps; and in fact that’s a near-imperative move for the Sparklines. That’s because the wrapped text feature by definition heightens the rows in whose text it wraps; and because, and unlike conventional Excel charts, Sparklines are cell-bound, they’re going to be heightened too. And the resulting distortion will skew the relation of the X to the Y axis.

I like that observation. Can I at least copyright it?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: