Posts Tagged ‘jobs’

JobcentreProPlus, tricky geocoding and unreliable datasets 26th Mar 09

One of the problems with working with large datasets — especially when you’re scraping them — is that they don’t always work the way one might think.

We’ve recently had reports that JobcentreProPlus.com turns up jobs that aren’t close to the postcode that the user entered when they started their search. We’ve done a bit of digging, and turned up two problems. Unfortunately, neither is easily fixable.

The first problem is that JobcentrePlus’s website doesn’t expose very good location data. It’s often as little as “Camden Town, London” or “Sevenoaks, Kent”. For this to be useful, we need to convert it to a latitude and longitude, so we can see if it’s near the postcode you enter when you start a search.

This process is called geocoding, and it’s inherently error-prone. There’s often no way to tell the difference between places with similar names. Usually it works well enough, but sometimes it generates an unexpected result: in real terms, you see a search result for a job in Glasgow when you were searching for things in London.
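To make the ambiguity concrete, here’s a minimal Python sketch. It isn’t the geocoder we use: the gazetteer, the `Candidate` type and both functions are invented for illustration, and the coordinates are approximate. It shows how a bare place name can match several towns, and how a geocoder that simply takes the first match can land you in the wrong part of the country.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    region: str
    lat: float
    lon: float


# Tiny illustrative gazetteer (hypothetical data, approximate coordinates).
GAZETTEER = [
    Candidate("Newport", "Wales", 51.59, -3.00),
    Candidate("Newport", "Isle of Wight", 50.70, -1.29),
    Candidate("Newport", "Shropshire", 52.77, -2.38),
    Candidate("Sevenoaks", "Kent", 51.27, 0.19),
]


def candidates(text: str) -> list:
    """Return every gazetteer entry consistent with a 'Place' or 'Place, Region' string."""
    parts = [p.strip().lower() for p in text.split(",")]
    name = parts[0]
    region = parts[1] if len(parts) > 1 else None
    return [
        c for c in GAZETTEER
        if c.name.lower() == name
        and (region is None or c.region.lower() == region)
    ]


def naive_geocode(text: str) -> Optional[Candidate]:
    """Take the first match, as a simple geocoder might; None if nothing matches."""
    matches = candidates(text)
    return matches[0] if matches else None
```

With this toy data, “Newport” on its own has three equally plausible matches, while “Sevenoaks, Kent” has just one. The extra qualifier is what makes the difference, and it’s exactly what the listings often lack.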

There’s not a lot we can do about this. If JobcentrePlus included better geographical information in their listings — like a postcode, or a latitude/longitude — we wouldn’t have to geocode things, which would be a great improvement.

Unfortunately, in this case, it gets more complicated. The second problem is that the JobcentrePlus database (which also drives their service!) doesn’t store good location data. Sometimes the location refers to the address of the Jobcentre shop. Sometimes, it’s the agency advertising the job. Sometimes, it’s the employer’s head office, but not the actual building you’d be working in if you took the job.

In summary: the way we’re forced to gather data introduces errors, and the underlying dataset has quite a few errors to begin with.

Despite this, we still think JobcentreProPlus.com is useful. Most of the time, the job will in fact be near the jobcentre, the employer’s head office or the job agency. That’s why our “distance from postcode” field defaults to 10 miles — we’re confident that that’ll be right, most of the time.
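As a rough illustration of the distance check, here’s a sketch in Python using the haversine formula. It’s not our actual code: the function names, the job dictionaries and the coordinates are assumptions made up for the example, and only the 10 mile default comes from the site.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def jobs_within(jobs, origin, radius_miles=10.0):
    """Keep only the jobs whose geocoded point lies within the radius of origin."""
    lat0, lon0 = origin
    return [
        job for job in jobs
        if distance_miles(lat0, lon0, job["lat"], job["lon"]) <= radius_miles
    ]


# Hypothetical listings with approximate coordinates.
jobs = [
    {"title": "Barista", "place": "Camden Town", "lat": 51.539, "lon": -0.1426},
    {"title": "Barista", "place": "Glasgow", "lat": 55.8642, "lon": -4.2518},
]
central_london = (51.5074, -0.1278)
nearby = jobs_within(jobs, central_london)
```

Here `nearby` keeps the Camden Town job and drops the Glasgow one. Of course, the filter is only as good as the coordinates it’s given: a job that was geocoded to the wrong place will pass or fail the check for the wrong reasons.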

The bottom line is that the quality of our site is completely dependent on the quality of the underlying data. Until that data is better, there’s not much we can do to improve things — but we’re not too worried. From a plain reading of search results, we think we’re doing ok. This search for stuff in London returns mostly stuff that, according to the job ad, is in London.

We think it’s good enough to be useful, and that’s really our only goal.

Rewired State: JobcentreProPlus 8th Mar 09

On Saturday I was at RewiredState. A bunch of geeks got together to build things. We wanted to show government how it’s done!

At the end of the day, we each got two minutes to present what we’d done to each other, and to an assemblage of government types. People did some really cool stuff, from Rob McKinnon & co’s Compani.es, which is the website that Companies House ought to have, to a reimplementation of ActivePlaces. They scraped this multimillion-pound website, got all its data, and then did with it in an afternoon what the site hasn’t managed to do with a massive budget and years of time. Great stuff. Emma Mulqueeny’s written some more about the day, and the other hacks.

Sam Smith and I got together to do a project. Given the current economic malaise, it’s quite important for people to be able to find jobs, and a little birdy turned us on to the fact that the Jobcentre Plus site really isn’t good. In fact, it’s quite painful. To get any jobs out of it at all, you have to fill in four reasonably large forms. Once you have some jobs to look at, you can’t do anything with them. There’s no RSS, you can’t get email alerts for new jobs, and you can’t bookmark jobs you’re interested in, because their URLs don’t work properly. The next time you want to find jobs, you have to go through the whole ordeal again. Bleh.

Our task was to make this better. Sam wrote some scrapers to pull down Jobcentre’s data — which was no mean feat in itself — and I made a website to display it. It’s a bit rough and ready, but it works. You can go to www.jobcentreproplus.com, search for jobs in your area, view them, bookmark them, get email alerts, subscribe in your feed reader and use the API to search and display jobs on your own site. Everything that the real site should do and doesn’t.

We didn’t realise it at the time, but there were prizes for the hacks that the organisers liked the most. Rather surprisingly, given the very high quality of all the other projects, Sam and I won!

We’re really glad that they liked it, and we hope you will too. Have a look, and let us know what you think.

Scraping Civil Service Vacancies 5th Feb 09

Back in July, we were asked to make a prototype system for the Central Office of Information and the Cabinet Office.

For some time, they have wanted to put civil service job vacancies together in one place so people can find them more easily and reuse the data in their own applications, much as we have already done for central government consultations. Because of our experience with consultations, we were asked to make a prototype that uses scraping to gather data about job vacancies. Some fantastic work is underway to make this really easy by embedding RDFA into departmental websites: our part in this project was to get our hands on some data, check out what departmental websites are doing now and see if scraping could be a useful part of the solution.

We put a prototype together over a couple of months last year — altogether, it took about three weeks of development time — and I’m very happy to say that it’s now been unveiled, and you can play with it. Though the site is live, the data isn’t current: it’s only there as an example. These were all real vacancies once, but they may have been filled by now!

The site is fairly simple. Several departmental websites were scraped to get information about their current vacancies. We took that data, cleaned it up a bit and added it to a database that can be searched. Users can look for jobs by keyword (like ‘assistant’), location (for example, a postcode or place name), or all of the above plus salary.

Google Maps & Civiscrape Mashup

If we can automatically identify the vacancy’s location, we geocode it and publish the coordinates using RDFA on the site and GeoRSS in the Atom feed. We did this because it permits users to search for jobs by proximity to a location, and to import the feed into Google Maps and get an insta-mashup of vacancies plotted on a map. Neat!
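For the curious, here’s roughly what tagging a feed entry with GeoRSS looks like, sketched in Python. This is an assumption about the general shape rather than the prototype’s real code: `vacancy_entry` is a hypothetical helper, and the entry is only a fragment (a valid Atom entry also needs an id and an updated timestamp).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"

# Use readable prefixes when the tree is serialised.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("georss", GEORSS_NS)


def vacancy_entry(title, lat, lon):
    """Build a partial Atom entry carrying a GeoRSS-Simple point."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # GeoRSS-Simple encodes a point as "latitude longitude", space separated.
    point = ET.SubElement(entry, f"{{{GEORSS_NS}}}point")
    point.text = f"{lat} {lon}"
    return entry


# Hypothetical vacancy with approximate coordinates.
entry = vacancy_entry("Administrative assistant", 51.5, -0.12)
xml_text = ET.tostring(entry, encoding="unicode")
```

Mapping tools that understand GeoRSS can read the `georss:point` element of each entry and drop a pin at that position, which is what makes the insta-mashup possible.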

We think that the prototype has done rather well. It suffers from the same kinds of problems that systems relying on scraped data generally encounter: occasionally, data is missing, incomplete or in the wrong place. It would need some manual intervention if it were ever to become a real service. Thankfully, the work that’s happening at the moment to produce an RDFA vocabulary to define vacancies means that this approach shouldn’t be needed in the future.

We wrote up some recommendations as a result of doing this project: hopefully, we’ll be able to publish them at some point. We’ll definitely be helping to get departments on board when the time comes for them to start embedding RDFA in their web pages.
