# Paris’ Budget Participatif: Voici le Data, Part 2

16 Nov

6,200 discrepancies can get to a guy, particularly the 6,200 uncommonly literate zero-year-olds who seem to have crept past the examiners for the Budget Paticipatif vote and had their pudgy-fingered clicks duly recorded.

OK – that diapered constituency couldn’t have done what we’ve been told they did, but how the misidentifications could be understood makes for one very good question.

Remember first of all that the peculiarly plural character of the Budget vote, in which citizens were perfectly free to vote on up to five projects – militates against a straightforward demographic summary of the actual individuals who took part in the referendum. We’ve seen this problem before, but because the same individuals with the same ids could have, and did, call upon the data one to five times variably weights their presence. And that means, for example, that any intention to learn how many discrete, voting Parisians live in the 18th Arrondissement (district) suffers from an undue complication, because Pierre from the 18th may have voted for 5 projects, even as his neighbor Amelie may have elected to ratify but three. (For a graphically-driven look at individual votes look here.)

That’s one issue, along with those zeros. First, if you peel away the now-blank row 97750 and reunite the zero records with the pack and strike up a pivot table comprising exactly one field:

Row Labels: id

You will have done what Row Labels were designed to do – enumerate unique instances of each id/voter, of which I get 24001. Now roll age into Row Labels, too, sign off on the Tabular form, and stanch its subtotals (these decisions are motivated by Tabular’s collinear effect, which levell super and subordinate labels along the same row. You also don’t really need any Grand Totals here either). The new count: 24028 rows – and that means that 27 voters have something more than dual citizenship – they have dual ages, and I’ve lost the number of my local relativity theorist.

But seriously, folks, that means 27 voters incurred a second age assignment somewhere among their one to five votes. Considered in their numeric relation to all voters, the error-bearers amount to barely, and merely, one-tenth of a percent of the electorate, and statistically speaking that’s not bad. And what that also apparently means is that all the other zero-aged voters – about 1,400, according to my stress- testing of the numbers – received only zeros across all their votes.

And so while the error-trapping yield above is scant, we might as well do something to restore those 27 multi-aged participants to the unitary value they deserve. Start by lifting Age over the great divide and settling it into the Values area, and moving to summarize the numbers therein by Max. That decision puts every voter ID in touch with his/her “largest” age, i.e. it defines the 27 by their non-zero, actual age. Then select the entire pivot table (PivotTable Tools > Options > Actions > Select > Entire PivotTable) and name the whole thing Ages.

Then click back to the source data and in F2, or the free cell hard by the first record and enter

=VLOOKUP(A2, Ages,2,FALSE)

Then copy down F (note: if your computer’s like mine, all that processing take some time). That expression matches each id in the data source with its twin in the pivot table (now operating under that Ages range name); and hence the 27 ids with zeros somewhere among their several records should experience a replacement of  those ciphers with the maximum value associated with those ids – that is, their apparently real age. Then Copy > Paste Special > Values the F results onto the corresponding entries in E (the age field), and delete the VLOOKUPs, their work having been completed. Then sort the data by age, again Largest to Smallest, and the zero-aged cells now debut at row 97792 – a felicitous demotion, as it were. In other words, we’ve refit some of the ages.

Now in fact the row numbers mean we’ve added authentic ages to 42 records, then, even as we earlier counted 27 ids suffering from that zero imputation. The answer – on assumes – is that some of those 27 had more than one zero among their multiple votes. Now you can reinstitute that blank row to send the recalcitrant zero records packing – but the row goes above 97792.

But of course we’re still left with the 455 voters falling, or crawling, somewhere between the ages of one and ten. Are they any more genuine than the zeros? I don’t know, but if you throw a pivot table at the question:

Row Labels: age (grouped in bins of ten years, for example)

Column labels: projets

Values: project (Count)

You’ll see that these petite partisans allotted greatest support to the Rendre la rue aux enfants project (approved, by the way), an idea for cordoning off play streets for the younger set, and the Cultiver dans les Ecoles educational initiative for school-based gardens. A savvy – and self-interested group. The implication, then: perhaps those ages are for real. Just implying, you understand. And as for you, Francois Hollande: keep an eye on them. Don’t say you haven’t been warned.