The wraps come off data.gov.uk!

October 1st, 2009

The UK’s version of data.gov, ably put together by the Cabinet Office, has just launched in private beta. We got to have a sneak peek, and it’s great!

The site is a blend of the US’s equivalent, data.gov, and Directgov | Innovate. It’s got a listing of available data packages, powered by the Comprehensive Knowledge Archive Network, and user-generated lists of apps and new ideas. This is just right: the data you need, combined with a way to promote the things you make and a place to get ideas if you’ve got itchy typing fingers but lack inspiration.

It’s not perfect. An organised way to browse data sets is conspicuously missing, but that’s coming, along with some other tweaks and twiddlings that’ll improve the site’s usability.

The site is powered by Drupal, with packages catalogued and hosted by CKAN. Meanwhile, data.gov.uk hosts a data store powered by Talis that can scale to 100 billion triples and is hosted on Amazon EC2. The system is federated, so departments can add and control their own data, lots of which is available as RDF, with the remainder downloadable in spreadsheet form.

Speaking of spreadsheets, they’ve even written an app that departments can deploy in-house to convert spreadsheets into RDF (kudos to John Sheridan!) which makes it much easier for departments to produce structured, linked data.
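To give a flavour of what a spreadsheet-to-RDF conversion involves, here is a minimal sketch in Python. It is not the Cabinet Office's app, which we haven't seen the source for: the `http://example.org/` namespace, the column names and the `rows_to_ntriples` helper are all made up for illustration, and it assumes simple column headings with no spaces. Each row becomes a resource, and each non-empty cell becomes one triple in N-Triples form.

```python
import csv
import io

# Hypothetical namespace, chosen purely for illustration.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape the characters that N-Triples string literals require escaping."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, id_column):
    """Turn each non-empty cell of a CSV export into one N-Triples line.

    Assumes column headings are simple names that are safe to use in an IRI.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id names the resource; empty cells carry no data
            predicate = f"<{BASE}def/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Made-up sample data standing in for a departmental spreadsheet.
sample = "id,name,budget\n1,Cabinet Office,100\n2,Treasury,\n"
triples = rows_to_ntriples(sample, "id")
for line in triples:
    print(line)
```

A real converter would also need to map columns onto agreed vocabularies and give values proper datatypes, which is where most of the hard work lies.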

This is all working now, and was put together by the team at the Cabinet Office in the last three months. This is a massive achievement, and it sounds like it’s just the beginning: they have big plans. User submissions for new datasets. Metadata to describe provenance. More data sets on the site. More data as RDF. Organised browsing for packages. Source code releases. The list goes on.

This is such an encouraging thing to see. No expensive procurement exercises for clunky, bespoke sites: instead, we have the right tools for the job, joined together. Simple things that do one job well, combined to form a more complex whole. It’s the Unix philosophy in action.

This is how all Government IT should work.

Our hearty congratulations go out to the team at the Cabinet Office, with special thanks to Richard Stirling for spilling some of the beans. I had lots of questions and nitpicks, and every single one of them was answered reassuringly.

They’ve got a plan, and it’s a good one.

Andrew Stott — the new Director of Digital Engagement

May 13th, 2009

I was slightly bemused when the Cabinet Office announced that it was going to create a new £160k position for the Director of Digital Engagement.

The job seemed like a tall order: a list of requirements that it would be hard for any one person to fulfil, and a very big job to do with very limited resources. It seemed like a strange move to make when creating two positions at £80k apiece would probably still attract very qualified people, and give you more time and knowledge for your money.

Nonetheless, I watched with interest, and now, a tad later than expected, the position has been filled by Andrew Stott. My initial reaction was along the same lines as Emma Mulqueeny’s — more bemusement — but actually, I think Andrew is a good choice. Not who I’d have expected, but good nonetheless. As numerous people have said, he is very qualified, does have a brain the size of a planet, and has lots of experience pushing through the kind of change that we need. More than that, though, he’s practical.

I worked with Andrew briefly in 2008. One of the things we were looking at then was the quasi-XML version of the Civil Service Yearbook, which has lots of useful data in it. As is usually the case, though, it wasn’t proper XML: it was variously broken, inconsistent and badly written. We spent a satisfying ten minutes at the end of the day bemoaning such irritations, and the next morning Andrew showed up at the office having spent all the previous evening writing a bunch of code to take the nasty XML and make it into useful data.

That, I think, is indicative of the man.

Scraping Civil Service Vacancies

February 5th, 2009

Back in July, we were asked to make a prototype system for the Central Office of Information and the Cabinet Office.

For some time, they have wanted to put civil service job vacancies together in one place so people can find them more easily and reuse the data in their own applications, much as we have already done for central government consultations. Because of our experience with consultations, we were asked to make a prototype that uses scraping to gather data about job vacancies. Some fantastic work is underway to make this really easy by embedding RDFa into departmental websites. Our part in this project was to get our hands on some data, check out what departmental websites are doing now and see if scraping could be a useful part of the solution.

We put a prototype together over a couple of months last year — altogether, it took about three weeks of development time — and I’m very happy to say that it’s now been unveiled, and you can play with it. Though the site is live, the data isn’t current: it’s only there as an example. These were all real vacancies once, but they may have been filled by now!

The site is fairly simple. Several departmental websites were scraped to get information about their current vacancies. We took that data, cleaned it up a bit and added it to a database that can be searched. Users can look for jobs by keyword (like ‘assistant’), location (for example, a post code or place name), or all of the above plus salary.
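For anyone curious about what the scrape-then-search steps look like in practice, here is a rough sketch using only Python's standard library. It isn't the prototype's code: the HTML markup (a list of `li class="vacancy"` items), the `VacancyParser` class and the `search` helper are all invented for illustration, and every real departmental site needs its own parsing rules.

```python
from html.parser import HTMLParser


class VacancyParser(HTMLParser):
    """Collect link text and URLs from <li class="vacancy"> items.

    The markup is hypothetical; real pages vary from site to site.
    """

    def __init__(self):
        super().__init__()
        self.vacancies = []
        self._in_item = False
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "vacancy":
            self._in_item = True
        elif tag == "a" and self._in_item:
            self._current = {"url": attrs.get("href", ""), "title": ""}

    def handle_data(self, data):
        # Only gather text while we are inside a vacancy link.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.vacancies.append(self._current)
            self._current = None
        elif tag == "li":
            self._in_item = False


def search(vacancies, keyword):
    """Case-insensitive keyword match against vacancy titles."""
    keyword = keyword.lower()
    return [v for v in vacancies if keyword in v["title"].lower()]


# Made-up page fragment standing in for a departmental vacancies page.
page = """
<ul>
  <li class="vacancy"><a href="/jobs/1">Policy Assistant</a></li>
  <li class="vacancy"><a href="/jobs/2">Senior Statistician</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

parser = VacancyParser()
parser.feed(page)
matches = search(parser.vacancies, "assistant")
print(matches)
```

The fragile part is the parser, not the search: when a department changes its page layout, rules like these break quietly, which is one reason scraped data needs checking by hand.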

Google Maps & Civiscrape Mashup

If we can automatically identify the vacancy’s location, we geocode it using RDFa on the site and GeoRSS in the Atom feed. We did this because it permits users to search for jobs by proximity to a location, and to import the feed into Google Maps and get an insta-mashup of vacancies plotted on a map. Neat!
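As a sketch of how the geocoding and proximity search fit together, the Python below builds Atom entries carrying a GeoRSS Simple point (latitude then longitude, separated by a space) and filters them by great-circle distance. The function names, coordinates and vacancy titles are made up for illustration, and the haversine formula is our choice for the example; the prototype may well calculate distance differently.

```python
import math
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
ET.register_namespace("atom", ATOM)
ET.register_namespace("georss", GEORSS)


def vacancy_entry(title, lat, lon):
    """Build an Atom entry with a GeoRSS Simple point ("lat lon")."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    return entry


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using the haversine formula."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def entries_near(entries, lat, lon, radius_km):
    """Return the entries whose GeoRSS point lies within radius_km."""
    nearby = []
    for entry in entries:
        point = entry.find(f"{{{GEORSS}}}point")
        if point is None or not point.text:
            continue  # vacancies we could not geocode are skipped
        entry_lat, entry_lon = map(float, point.text.split())
        if distance_km(lat, lon, entry_lat, entry_lon) <= radius_km:
            nearby.append(entry)
    return nearby


# Illustrative vacancies placed in central London and Edinburgh.
entries = [
    vacancy_entry("Policy Assistant", 51.5074, -0.1278),
    vacancy_entry("Senior Statistician", 55.9533, -3.1883),
]
nearby = entries_near(entries, 51.4995, -0.1248, radius_km=25)
print(ET.tostring(nearby[0], encoding="unicode"))
```

Because the point travels inside the feed itself, anything that understands GeoRSS can plot the vacancies without knowing anything else about the site.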

We think that the prototype has done rather well. It suffers from the same kinds of problems that systems relying on scraped data generally encounter: occasionally, data is missing, incomplete or in the wrong place. It would need some manual intervention if it were ever to become a real service. Thankfully, the work that’s happening at the moment to produce an RDFa vocabulary to define vacancies means that this approach shouldn’t be needed in the future.

We wrote up some recommendations as a result of doing this project: hopefully, we’ll be able to publish them at some point. We’ll definitely be helping to get departments on board when the time comes for them to start embedding RDFa in their web pages.