Posts Tagged ‘mashing’
ScenicOrNot: want to play with the data? 26th Jun 09
mySociety have added a data dump to ScenicOrNot, the site we built for them a couple of months ago. It’s got the photos and all the votes for each of the 181,300 places that have received 3 or more votes since the site launched.
If you’re one of the many people who had something to say about the voting system that ScenicOrNot uses, we hope you might have some fun playing with the raw data! If you do make something, let us know how you get on…
Scraping Civil Service Vacancies 5th Feb 09
Back in July, we were asked to make a prototype system for the Central Office of Information and the Cabinet Office.
For some time, they have wanted to put civil service job vacancies together in one place so people can find them more easily and reuse the data in their own applications, much as we have already done for central government consultations. Because of our experience with consultations, we were asked to make a prototype that uses scraping to gather data about job vacancies. Some fantastic work is underway to make this really easy by embedding RDFA into departmental websites: our part in this project was to get our hands on some data, check out what departmental websites are doing now and see if scraping could be a useful part of the solution.
We put a prototype together over a couple of months last year — altogether, it took about three weeks of development time — and I’m very happy to say that it’s now been unveiled, and you can play with it. Though the site is live, the data isn’t current: it’s only there as an example. These were all real vacancies once, but they may have been filled by now!
The site is fairly simple. Several departmental websites were scraped to get information about their current vacancies. We took that data, cleaned it up a bit and added it to a database that can be searched. Users can look for jobs by keyword (like ‘assistant’), by location (for example, a postcode or place name), or by any combination of these together with salary.
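To give a flavour of that kind of search, here is a minimal sketch in Python. It is not the prototype's actual code: the `Vacancy` fields, the `search` function and its parameters are all hypothetical, and a real service would query a database rather than filter a list in memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vacancy:
    # Hypothetical fields; the prototype's real schema is not described here.
    title: str
    location: str
    salary: int  # annual salary in pounds


def search(vacancies: List[Vacancy],
           keyword: Optional[str] = None,
           location: Optional[str] = None,
           min_salary: Optional[int] = None) -> List[Vacancy]:
    """Return vacancies matching every criterion that was supplied."""
    results = []
    for v in vacancies:
        # Case-insensitive substring match on the job title.
        if keyword and keyword.lower() not in v.title.lower():
            continue
        # Case-insensitive substring match on the place name or postcode.
        if location and location.lower() not in v.location.lower():
            continue
        # Keep only jobs paying at least the requested salary.
        if min_salary is not None and v.salary < min_salary:
            continue
        results.append(v)
    return results
```

Any criterion left as `None` is simply ignored, so the same function covers keyword-only, location-only and combined searches.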
If we can automatically identify the vacancy’s location, we geocode it and publish the coordinates using RDFA on the site and GeoRSS in the Atom feed. We did this because it permits users to search for jobs by proximity to a location, and to import the feed into Google Maps and get an insta-mashup of vacancies plotted on a map. Neat!
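For the curious, here is a rough sketch of the two ideas in Python: attaching a GeoRSS-Simple point to an Atom entry, and filtering vacancies by distance from a location. The function names and the dictionary keys (`lat`, `lon`) are assumptions made for illustration, and the haversine formula is just one common way to measure distance; the prototype may well have done this differently.

```python
import math
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"


def atom_entry_with_point(title, lat, lon):
    """Build an Atom <entry> carrying a GeoRSS-Simple point ("lat lon")."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # GeoRSS-Simple encodes a point as latitude, a space, then longitude.
    ET.SubElement(entry, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"
    return entry


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using the haversine formula."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def vacancies_near(vacancies, lat, lon, radius_km):
    """Keep vacancies (dicts with hypothetical 'lat'/'lon' keys) within radius_km."""
    return [v for v in vacancies
            if distance_km(lat, lon, v["lat"], v["lon"]) <= radius_km]
```

Because the point travels with each entry in the feed, anything that understands GeoRSS can plot the vacancies without needing to know about the rest of the site.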
We think that the prototype has done rather well. It suffers from the same kinds of problems that systems relying on scraped data generally encounter: occasionally, data is missing, incomplete or in the wrong place. It would need some manual intervention if it were ever to become a real service. Thankfully, the work that’s happening at the moment to produce an RDFA vocabulary to define vacancies means that this approach shouldn’t be needed in the future.
We wrote up some recommendations as a result of doing this project: hopefully, we’ll be able to publish them at some point. We’ll definitely be helping to get departments on board when the time comes for them to start embedding RDFA in their web pages.
ConsultationXML: the mashups have landed 4th Feb 09
People have already started doing interesting things with ConsultationXML. I have to admit — I couldn’t be more pleased!
Richard Goodwin took PDF attachments from the London Gazette, uploaded them to ConsultationXML, got the HTML preview output and fed it into Wordle, and voilà! A Wordle map of the London Gazette’s honours list was born.
Has anyone else done interesting things? Do let us know.


