Using Fusion Tables To Get A Grip On The Big-ish Data of York City Council

My attention was drawn to the City of York Council, who publish their payments to suppliers for 2012 as 'Open Data': a collection of comma separated files ( CSVs ) which you can then import into, god forbid, Excel or Google Spreadsheets.

That's very nice, except the data is split across ten CSV files. They are presumably monthly files ( I wonder where the other ones are? ). Also, I found that the columns weren't regular, meaning that in some files the Amount was in column 7, and in others it was in column 8. There is also a lot of repetitive data in the spreadsheets, making them quite big to work with. All of this makes it difficult to browse and combine the data. It's almost as if they really don't want you to read and understand it.

So I thought I'd share how I coaxed it into something more useful for understanding the data. The City of York Council are of course free to do something like this if they like; it only takes a few minutes. Or they could pay me a consultancy fee to help them and I might make an appearance in next year's CSVs.

Step One - Download and Combine The Spreadsheets

This is tricky. I found the easiest way to do this was to download the files "by hand" and then write a bit of Python code to merge the data I needed into one CSV file. The code is here...

It creates a file called "combined.csv".

You'll notice that I only used three columns of the data and grabbed them by name ( using csv.DictReader ). You can change these values to be different columns if you want.
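For anyone who can't get at the original script, here is a minimal sketch of the same idea, not the code I actually ran. The three column names, the `payments/` folder and the `combine` function are assumptions for illustration, so swap in whatever headers the council's files really use.

```python
import csv
import glob

# Assumed column names; the real headers in the council's files may differ.
WANTED = ["Supplier Name", "Expense Description", "Amount"]


def combine(pattern, out_path, wanted=WANTED):
    """Merge the wanted columns from every CSV matching pattern into one file.

    Returns the number of data rows written. Keep out_path outside the
    folder that pattern matches, or the output would be read back in.
    """
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=wanted)
        writer.writeheader()
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    # Columns are looked up by name, so it doesn't matter
                    # whether Amount is column 7 or column 8 in this file.
                    writer.writerow(
                        {name: (row.get(name) or "").strip() for name in wanted}
                    )
                    rows_written += 1
    return rows_written
```

Calling `combine("payments/*.csv", "combined.csv")` would then write the merged file. Because the lookup is by header name rather than position, the irregular column order stops being a problem, although a file with a misspelt header would silently produce blank cells.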

There is also a wonderful tool called Google Refine which is fantastic for cleaning up slightly duff data. It's often the case that the thing that trips you up is data that you discover to be a bit iffy, and Google Refine helps you to do some very fancy manipulations.

Step Two - Uploading To Google Fusion Table

I could have uploaded this data into a Google Spreadsheet, but spreadsheets have a limit of 400,000 cells. With 30,000 rows and 10 columns you are already at 300,000 cells, so it's easy to start hitting that limit very quickly.

Google Fusion Tables are designed for bigger data collections that are typically more numerical. They are great at summarising that data easily and quickly. They can even do charts of your quick and dirty summarisations. If I'm honest, my abilities in Fusion Tables are poor, but I do seem to be able to muddle through well enough.

Next, upload your data. You can add details about who owns the data and where you got it from along the way.

Once it has uploaded and converted, which can take a while, you can browse your data in its raw format.

Step Three - Summarise Your Data

Then comes the clever bit, which is where you can create a Summary, like this...

... which gives you this.
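If you'd rather check the totals locally, the same kind of summary can be roughed out in Python against combined.csv. This is only a sketch of a "sum and count per category" summary, not what Fusion Tables does internally, and the column names and the `summarise` and `parse_amount` helpers are assumptions for illustration.

```python
import csv
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn strings like '£1,234.50' into a Decimal; return None if unparseable."""
    cleaned = text.replace("£", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def summarise(path, group_by="Expense Description", amount="Amount"):
    """Return (group, total, count) tuples, biggest total first."""
    totals = defaultdict(Decimal)  # Decimal() is zero
    counts = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            value = parse_amount(row.get(amount) or "")
            if value is None:
                continue  # skip blank or malformed amounts
            key = (row.get(group_by) or "").strip()
            totals[key] += value
            counts[key] += 1
    return sorted(
        ((key, totals[key], counts[key]) for key in totals),
        key=lambda item: item[1],
        reverse=True,
    )
```

Something like `for name, total, count in summarise("combined.csv")[:20]: print(name, total, count)` would list the twenty biggest categories. Decimal is used rather than float so that pounds and pence add up exactly.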

You can argue amongst yourselves about whether or not York City Council have deliberately obfuscated their expenses by providing such crappy files in such an unhelpful way. It's usually my default to blame lack of resources, knowledge and general incompetence before corruption, but one good thing is that it is getting easier for everyone, even me, to grab the data that's given and get it into a format where I can at least begin to explore it.

So can you. The data is online here.

And Beyond...

The next stage needs to be about making this data, now easily browseable, more communicative. Most of the items in the list raise more (good) questions than they answer.

Why does CYC spend a million quid a year on software licences? Is that value for money, considering they can barely work Excel?

Were CYC really providing open information, this data would be information that questions could be asked of. There'd be links to background information to explain exactly why nearly £2 million was spent on taxis alone (that one always catches the eye ). Last year when I also made a fusion table of the Council's expenses, @jmalexander1982 happily provided explanations of what the more immediately surprising figures were about.

Of course, York City Council might distill some of this information into infographics or charts that better communicated how well our money was being spent. Ideally, this would be interactive, so that rather than accusing the Council of spin and manipulation we could create our own interpretations and charts of the data.

Fusion Tables are great for summarising data and revealing the headlines, but less good at surfacing the interesting things at the lower end of the scale: the long tail of payments around £1,000. Ideally I'd like to throw in all the directors of the companies listed in the expenses and see what connections popped out ( if any ) and make a York-centric TheyRule. Maybe if I can get a startup grant from the council, I'll do that next year.
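As a starting point for poking at that long tail, here is a rough sketch that pulls out individual payments in a band around £1,000 from combined.csv. The `payments_between` function, the "Amount" column name and the £500 to £1,500 band in the usage note are all assumptions for illustration.

```python
import csv
from decimal import Decimal, InvalidOperation


def payments_between(path, low, high, amount="Amount"):
    """Return rows whose amount falls within [low, high], smallest first."""
    low, high = Decimal(str(low)), Decimal(str(high))
    matches = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            text = (row.get(amount) or "").replace("£", "").replace(",", "").strip()
            try:
                value = Decimal(text)
            except InvalidOperation:
                continue  # skip blank or malformed amounts
            if low <= value <= high:
                matches.append((value, row))
    matches.sort(key=lambda pair: pair[0])
    return [row for _, row in matches]
```

Calling `payments_between("combined.csv", 500, 1500)` would return the individual rows in that band, ready to be eyeballed or joined against some other list by supplier name.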

