Archive for March, 2009

JobcentreProPlus, tricky geocoding and unreliable datasets 26th Mar 09

One of the problems with working with large datasets — especially when you’re scraping them — is that they don’t always work the way one might think.

We’ve recently had reports that JobcentreProPlus.com turns up jobs that aren’t close to the postcode that the user entered when they started their search. We’ve done a bit of digging, and turned up two problems. Unfortunately, neither is easily fixable.

The first problem is that JobcentrePlus’s website doesn’t expose very good location data. It’s often as little as “Camden Town, London” or “Sevenoaks, Kent”. For this to be useful, we need to convert it to a latitude and longitude, so we can see if it’s near the postcode you enter when you start a search.

This process is called geocoding, and it’s inherently error-prone. There’s often no way to tell the difference between places with similar names. Usually it works well enough, but sometimes it’ll generate an unexpected result: in real terms, you see a search result for a job in Glasgow when you were searching for things in London.
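To make the ambiguity concrete, here is a minimal sketch of a name-based geocoder. It is not the code behind JobcentreProPlus.com: the `GAZETTEER` table, the function names and the coordinates are all hypothetical, and the coordinates are only rough illustrative values. It shows how a lookup on a bare place name can match more than one location, and how a naive "take the first match" rule quietly picks the wrong one.

```python
# Hypothetical toy gazetteer: place name -> candidate (label, latitude, longitude).
# Coordinates are rough, illustrative values, not authoritative data.
GAZETTEER = {
    "newport": [
        ("Newport, Wales", 51.58, -3.00),
        ("Newport, Isle of Wight", 50.70, -1.29),
    ],
    "sevenoaks": [
        ("Sevenoaks, Kent", 51.27, 0.19),
    ],
}


def geocode_candidates(place_name):
    """Return every gazetteer entry whose name matches the text before the first comma."""
    key = place_name.split(",")[0].strip().lower()
    return GAZETTEER.get(key, [])


def naive_geocode(place_name):
    """Pick the first candidate, ignoring any qualifier such as a county."""
    candidates = geocode_candidates(place_name)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Two different places share the name, so the lookup is ambiguous.
    print(geocode_candidates("Newport"))
    # The qualifier is ignored, so this resolves to the Welsh Newport.
    print(naive_geocode("Newport, Isle of Wight"))
```

A real geocoder is far more sophisticated than this, but the underlying problem is the same: when the listing only says “Newport”, there is nothing in the data to say which one was meant.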

There’s not a lot we can do about this. If JobcentrePlus included better geographical information in their listings — like a postcode, or a latitude/longitude — we wouldn’t have to geocode things, which would be a great improvement.

Unfortunately, in this case, it gets more complicated. The second problem is that the JobcentrePlus database (which also drives their service!) doesn’t store good location data. Sometimes the location refers to the address of the Jobcentre shop. Sometimes, it’s the agency advertising the job. Sometimes, it’s the employer’s head office, but not the actual building you’d be working in if you took the job.

In summary: the way we’re forced to gather data introduces errors, and the underlying dataset has quite a few errors to begin with.

Despite this, we still think JobcentreProPlus.com is useful. Most of the time, the job will in fact be near the jobcentre, the employer’s head office or the job agency. That’s why our “distance from postcode” field defaults to 10 miles — we’re confident that that’ll be right, most of the time.
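As a rough illustration of what a “distance from postcode” filter involves, the sketch below uses the haversine formula to keep only jobs within a given radius of a search point. This is an assumption about one reasonable way to do it, not a description of how the site is actually built: the function names, the job records and the approximate coordinates are all made up for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def jobs_within(jobs, origin, radius_miles=10.0):
    """Return the jobs whose geocoded location lies within radius_miles of origin."""
    lat0, lon0 = origin
    return [
        job for job in jobs
        if haversine_miles(lat0, lon0, job["lat"], job["lon"]) <= radius_miles
    ]


if __name__ == "__main__":
    # Hypothetical search point and job records; coordinates are approximate.
    central_london = (51.51, -0.13)
    jobs = [
        {"title": "Barista", "place": "Camden Town, London", "lat": 51.54, "lon": -0.14},
        {"title": "Driver", "place": "Glasgow", "lat": 55.86, "lon": -4.25},
    ]
    for job in jobs_within(jobs, central_london):
        print(job["title"], job["place"])
```

The filter is only as good as the coordinates it is given. If the Glasgow job had been wrongly geocoded to a London location, it would pass the 10 mile test without complaint.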

The bottom line is that the quality of our site is completely dependent on the quality of the underlying data. Until that data is better, there’s not much we can do to improve things — but we’re not too worried. From a plain reading of search results, we think we’re doing ok. This search for stuff in London returns mostly stuff that, according to the job ad, is in London.

We think it’s good enough to be useful, and that’s really our only goal.

The Office for National Statistics and Postcodes 12th Mar 09

Here’s a story from FreeOurData which is, quite frankly, incredible. The Office for National Statistics, in preparing for the next census, has found that the postcode databases offered by the Royal Mail and Ordnance Survey aren’t accurate enough for their purposes. Their solution: to build their own database. This is fair. The postcode database is not amazingly accurate, and ONS have different requirements anyway.

Unfortunately, Royal Mail and Ordnance Survey make good money from selling the postcode databases to other organisations. These datasets are very valuable: you’ve probably made use of them whenever you’ve put your postcode into a website. Royal Mail and Ordnance Survey did not, apparently, like the idea of ONS making another postcode database with which they’d presumably have to compete. So, rather than take that nice dataset and do useful things with it (like giving it back to us taxpayers), the ONS have pledged to build the database, use it for the census, and then destroy it.

Postcode databases are almost a holy grail. Of all the datasets in the country, liberating the postcode database for free reuse would probably create more value than any other. The thought of spending £12m on a new, super-accurate postcode database and then destroying it is wasteful, a huge missed opportunity and, to be frank, completely idiotic.

We implore you: don’t do it.

Rewired State: JobcentreProPlus 8th Mar 09

On Saturday I was at RewiredState. A bunch of geeks got together to build things. We wanted to show government how it’s done!

At the end of the day, we each got two minutes to present what we’d done to each other, and an assemblage of government types. People did some really cool stuff, from Rob McKinnon & co’s Compani.es, which is the website that Companies House ought to have, to a reimplementation of ActivePlaces. They scraped this multimillion pound website, got all their data, and then did with it in an afternoon what the site hasn’t managed to do with a massive budget and years of time. Great stuff. Emma Mulqueeny’s written some more about the day, and the other hacks.

Sam Smith and I got together to do a project. Given the current economic malaise, it’s quite important for people to be able to find jobs, and a little birdy turned us on to the fact that the JobCentre Plus site really isn’t good. In fact, it’s quite painful. To get any jobs out of it at all, you have to fill in 4 reasonably large forms. Once you have some jobs to look at, you can’t do anything with them. There’s no RSS, you can’t get email alerts for new jobs, and you can’t bookmark jobs you’re interested in, because their URLs don’t work properly. The next time you want to find jobs, you have to go through the whole ordeal again. Bleh.

Our task was to make this better. Sam wrote some scrapers to pull down Jobcentre’s data — which was no mean feat in itself — and I made a website to display it. It’s a bit rough and ready, but it works. You can go to www.jobcentreproplus.com, search for jobs in your area, view them, bookmark them, get email alerts, subscribe in your feed reader and use the API to search and display jobs on your own site. Everything that the real site should do and doesn’t.

We didn’t realise it at the time, but there were prizes for the hacks that the organisers liked the most. Rather surprisingly, given the very high quality of all the other projects, Sam and I won!

We’re really glad that they liked it, and we hope you will too. Have a look, and let us know what you think.

The UKGovWeb Teacamp 7th Mar 09

On Thursday, we ran this month’s UKGovWeb Teacamp — a strangely named event that brings together civil servants and contractors working in e-comms and digital engagement with each other, and anyone else who’s interested and wants to come along to talk about government and the web.

This month, Jenny came along to talk to people about monitoring online news stories using free online tools. It turns out that iGoogle, in combination with Google Reader and Google News, can provide a fairly powerful solution for monitoring news. It can pull in stories based on complex searches, display them, and allow them to be shared with a team and flagged for further action. She’s even got it set up to send subject-specific emails containing relevant content to different business areas within the department. Cool stuff.

About 40 people turned up to meet, network and listen to Jenny, which was great. Next month, Tom Steinberg, the Director of MySociety, will be along to talk about something. This will doubtless be very interesting.

We’ll be there — do join us if you can, at Cafe Zest on the top floor of House of Fraser on Victoria St on the 2nd April.

ConsultationXML Update 5th Mar 09

Mark Little kindly reported some bugs in the ConsultationXML distribution. The INSTALL file was missing a couple of salient details:

  • ConsultationXML requires the PHP HTMLTidy extension
  • The pdf directory in ConsultationXML’s root directory needs to be writeable.

We had also unwittingly left some JavaScript in the codebase, which was responsible for displaying the welcome page that you’ll probably have seen in the sandbox. That page is only meant to be part of the demo, not the software itself, so we’ve removed it.

For more information about ConsultationXML, or to download the new version, head over to the Labs page.
