JobcentreProPlus, tricky geocoding and unreliable datasets

One of the problems with working with large datasets, especially when you’re scraping them, is that they don’t always behave the way you might expect.

We’ve recently had reports that JobcentreProPlus.com turns up jobs that aren’t close to the postcode that the user entered when they started their search. We’ve done a bit of digging, and turned up two problems. Unfortunately, neither is easily fixable.

The first problem is that the JobcentrePlus website doesn’t expose very good location data. It’s often as little as “Camden Town, London” or “Sevenoaks, Kent”. For this to be useful, we need to convert it to a latitude and longitude, so we can check whether it’s near the postcode you enter when you start a search.

This process is called geocoding, and it’s inherently error-prone. There’s often no way to tell the difference between places with similar names. Usually it works well enough, but sometimes it’ll produce an unexpected result: in practice, you see a search result for a job in Glasgow when you were searching for jobs in London.
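To show why place names alone are ambiguous, here is a minimal sketch of a lookup against a tiny, hypothetical gazetteer. The `Place` type, the `GAZETTEER` entries and the `geocode` function are all invented for illustration (the coordinates are approximate), and this is not how our geocoder actually works. A bare town name like “Newport” matches more than one place; a county or region narrows it down, and an unknown name matches nothing at all.

```python
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    region: str
    lat: float
    lon: float


# Hypothetical toy gazetteer; coordinates are approximate, for illustration only.
GAZETTEER = [
    Place("Newport", "Wales", 51.584, -2.998),
    Place("Newport", "Isle of Wight", 50.702, -1.288),
    Place("Sevenoaks", "Kent", 51.272, 0.190),
]


def geocode(location: str) -> list[Place]:
    """Return every gazetteer entry matching free text like 'Sevenoaks, Kent'."""
    parts = [p.strip().lower() for p in location.split(",")]
    town = parts[0]
    region = parts[1] if len(parts) > 1 else None
    # With no region given, every town with that name is a candidate.
    return [
        p for p in GAZETTEER
        if p.name.lower() == town and (region is None or p.region.lower() == region)
    ]


print(len(geocode("Newport")))          # two candidates: which one did the ad mean?
print(len(geocode("Sevenoaks, Kent")))  # one candidate
```

A real geocoder faced with several candidates has to pick one, and when it picks the wrong one, a job ends up hundreds of miles from where it really is.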

There’s not a lot we can do about this. If JobcentrePlus included better geographical information in their listings — like a postcode, or a latitude/longitude — we wouldn’t have to geocode things, which would be a great improvement.

Unfortunately, in this case, it gets more complicated. The second problem is that the JobcentrePlus database (which also drives their service!) doesn’t store good location data. Sometimes the location refers to the address of the Jobcentre shop. Sometimes, it’s the agency advertising the job. Sometimes, it’s the employer’s head office, but not the actual building you’d be working in if you took the job.

In summary: the way we’re forced to gather data introduces errors, and the underlying dataset has quite a few errors to begin with.

Despite this, we still think JobcentreProPlus.com is useful. Most of the time, the job will in fact be near the jobcentre, the employer’s head office or the job agency. That’s why our “distance from postcode” field defaults to 10 miles — we’re confident that that’ll be right, most of the time.
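Once both the search postcode and the job location have been turned into coordinates, the “distance from postcode” check can be sketched as a great-circle (haversine) distance compared against a radius. This is an assumption about the general technique, not our actual code: `haversine_miles`, `jobs_near` and the job dictionaries are hypothetical, the search postcode is assumed to be geocoded already, and the coordinates in the usage example are approximate.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def jobs_near(search_lat: float, search_lon: float, jobs: list[dict],
              radius_miles: float = 10.0) -> list[dict]:
    """Keep jobs whose geocoded location lies within radius_miles (default 10)."""
    return [
        job for job in jobs
        if haversine_miles(search_lat, search_lon, job["lat"], job["lon"]) <= radius_miles
    ]


# Hypothetical jobs: one geocoded to Camden Town, one mis-geocoded to Glasgow.
jobs = [
    {"title": "Camden job", "lat": 51.539, "lon": -0.143},
    {"title": "Glasgow job", "lat": 55.864, "lon": -4.252},
]
print([j["title"] for j in jobs_near(51.507, -0.128, jobs)])
```

The generous 10-mile default absorbs the case where the recorded location is a nearby Jobcentre or agency office rather than the workplace itself; it cannot rescue a job that was geocoded to the wrong town entirely.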

The bottom line is that the quality of our site is completely dependent on the quality of the underlying data. Until that data improves, there’s not much we can do to make things better, but we’re not too worried. From a plain reading of search results, we think we’re doing OK. A search for jobs in London, for example, returns mostly results that, according to the job ad, are in London.

We think it’s good enough to be useful, and that’s really our only goal.

2 Responses to “JobcentreProPlus, tricky geocoding and unreliable datasets”

  1. Anonymous

    Harry

    JobCentrePlus gets away with it because they are a monopoly.

    What do you need from us employers to enable Pro Plus to become more reliable? Should we go for a Pro Plus on Twitter and Facebook? After all, I presume we do not really need JCP (the official one) at all if we were being honest. In India, a country much larger and more diverse than ours, I do not remember ever seeing a Job Centre. It must be a western concept.

    As the Chancellor is looking for £15,000,000,000 of efficiency savings, should your tool be one way of getting there?

  2. Anonymous

    Hmm — yes. Can’t argue with that.

    From employers — that’s an interesting question. Given the appropriate tools they could certainly enter jobs themselves (perhaps they already do?) but that wouldn’t guarantee any improvement.

    I think it’s probably a UI problem. If the box on the page was labeled “Postcode of job location” rather than just “location”, I suspect the data would get a lot better quite quickly. The only reason the location field is set to different things is that different people have different ideas about what it should be.

    It would be interesting to know what Fujitsu would charge to make such a change.
