# A Tree Grows in Paris: Les Arbes d’Alignement, Part 2

6 Jul

Our nature walk through Paris’ 100,000 trees is about to get us somewhere, now that we seem to have completed the formulaic fix that should let us distill the arrondissement numbers from the locales in the Adresse field and free us to break out the tree totals by arrondissement:

=VALUE(MID(C2,FIND(” “,C2)+1,FIND(“E”,C2)-FIND(” “,C2)-1))

That expression is still good as far as it goes, but it has occurred to me since I last went to press that we might want to invite back the 10,000 bits of flora sown in the Bois de Boulogne and Bois de Vincennes, the ones we had separated from the greater pack of tree data because they had no arrondissements to report. Upon reflection, I’d allow we still could and probably should count them, and so by deleting the interposing row separating the forests from the tree we could amend the above formula thusly:

=IF(LEFT(C2,1)=”B”,LEFT(C2,FIND(“-“,C2)-2),VALUE(MID(C2,FIND(” “,C2)+1,FIND(“E”,C2)-FIND(” “,C2)-1)))

This extension to the existing formula poses a condition that asks: If an Adresse starts with the letter B -confirming a Bois adresse – then find the sequential position of its dash somewhere in its cell and subtract 2 from it, thus ordering the LEFT function to retrieve all the text segment exhibiting that length. For example – the “-” appears in the 18th position of cells featuring the Bois de Boulogne; thus we know that a 16-character LEFT extraction will return that name.

Once this new edition is copied down the now-enlarged Arr field we improvised last week our pivot table is duly posed for action:

Row Labels: Arr

Values: Arr (Count)

Arr (again, this time % of Column Total; dispense with Grand Totals)

I get:

Again, I don’t know what pretensions to completeness the census entertains, but we see the Left Bank’s 13th and the posh 16th have set aside the largest spaces for their trees. The smallish totals for the two Bois entries may be wholly complete, but they furnish cause for us to wonder just the same. The figures also need to be calibrated to arrondissement size, or course; the inconsiderable totals for the 1st and 2nd do in fact scale to their pinched areas, for example. And a correlation is available, thanks to Wikipedia’s square-kilometer enumerations of the arrondissements:

Throwing Excel’s CORREL function at all the numbers, I get .927, a massive association suggesting that Paris’ urban designers strove ardently to smooth their tree distributions across the districts as per their size. (Note, however, that the Wikipedia listings pair two area figures for the 12th and 16th, the larger incorporating the Bois de Boulognes and Bois de Vincennes respectively. The above table displays the former, smaller area totals.)

And some causes for wonder abound in the Date Plantation field, too. We’ve already taken stock of the two-thirds of all putative date stamps in there that are in fact inactive text, nullities that I assume mean to tell the analyst that these are trees for which no planting dates are available, though they may also want to let us know that some of these in fact took root prior to 1900, the year before which Excel just can’t compute. The great bulk of these dormancies show the 1700-01-01 pseudo-date, but some others insist on some curious variations, e.g., 1700-03-06. Either way, these are text. There are in addition exactly 100 trees submitting a 1/1/1900 planting date, a real date this time but one that, given the understanding that January 1 1900 is the earliest date with which Excel can work, one needs to ask what these are trying to tell us. We also are told about a tree with a circumference of 165 centimeters ostensibly planted on May 27, 2015, about a month-and-a-half ago. Some potent fertilizer’s in that turf.

But if, in spite of all, you do want to learn something about the planting dates, you first need to do something about all the text entries before you dare to group the dates by year, for example. I’d take over the column to the right of Date Plantation, call it Date or whatever, and enter in what should be I2 (remember our Arr field is in there):

=IF(ISTEXT(H2),1,H2)

And what is this formula doing? It’s telling its cell: if the entry in H2 is indeed text, replace it with a numeric 1. Otherwise, if the H2 is a truly dated, return that number). And why 1?  Because 1 is really January 1, 1900, the first date in Excel’s chronology, the one naturally receiving the very first number in the date/number sequence. Thus every 1700-01-01 will be treated as January 1, 1900, a date we’ll then decide to filter away from any grouping.

Once you’ve copied the formula down. A pivot table along these lines can take shape:

Row Labels: Date

Values: Date (Count, and group by Years alone; filter out the year 1900 and the residual-category <1/1/1900).

You’ll see a bounce in plantings starting with 1987; but given my caveat above I ‘m not sure what that means. The dates require a second – and perhaps even a third – look.

And what about tree heights and their averages, say crosstabbed (remember that term?) by arrondissement?  Here the data are all quantitative (remember that last week I split off all the  ,0 decimals that can’t be read by those insular US regional settings), but about half of them record a tree height of 0 meters – and that can’t be, particularly since many of these possess measurable circumference data. Moreover, one particularly strapping stripling in the Bois de Vincennes stands 881,818 meters tall, which puts the tree’s upper branches somewhere in the Van Allen belt. Ten others register heights in the four-figured meter range, turning the Eiffel Tower into a Lego knock-off by comparison. Verdict, then: these data don’t seem terribly usable.

And the circumference (Circonference) data aren’t much better. I’m thinking of the Platanus orientalis in the 18th arrondissement, and its 120,170 centimeter girth; at more than 12,000 meters around it could fence off the whole district, and tempt the 18th’s residents into declaring their independence from Paris. And about 16,000 trees state no circumference at all. (By the way, if you rest your mouse atop any Geo point in the source data on the Paris Open Data site, the tree associated with that point is promptly reckoned on a pop-up map.)

All of which may compel me to rewrite the brief I filed on behalf of trees in my previous post.  They’re not quite as cooperative as I might have thought. After all, even the most  vain of humans wouldn’t have the nerve to declare a waistline of 0 inches.