
It’s all very wisdom of the masses, the power of crowds, web 2.0, wikinomics and all that, but your very own website can help digitise books without even trying. Captchas, the little squiggles many of us spend so much of our time typing in, can miraculously help digitise texts.
It’s all handled by the people at Carnegie Mellon University (http://recaptcha.net/). Basically, they provide a plug-in that displays the captcha on your website. It’s a few lines to hook it up and then you’re off. I did it in PHP but it’s available in .NET, Java and more languages than I’ll ever know.
To implement:
- Download the relevant library, i.e. recaptchalib.php
- Sign up to reCAPTCHA for a public and a private key
- Add the code to display the captcha, in PHP below
<?php echo recaptcha_get_html($publickey, $error); ?>
- And to deal with the post back, a bit more code but not rocket surgery.
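# This runs near the top of the page, before the form is displayed,
# so that $error is set by the time recaptcha_get_html() is called.
require_once('recaptchalib.php');
$publickey = "your_public_key";   # placeholder: the keys you got when signing up
$privatekey = "your_private_key"; # placeholder
$error = null;                    # no error until a check fails

if (isset($_POST["submit"]))
{
    $resp = recaptcha_check_answer ($privatekey,
                                    $_SERVER["REMOTE_ADDR"],
                                    $_POST["recaptcha_challenge_field"],
                                    $_POST["recaptcha_response_field"]);
    if ($resp->is_valid)
    {
        # .. do some processing on the valid form
    }
    else
    {
        # set the error code so that we can display it. You could also use
        # die ("reCAPTCHA failed"), but using the error message is
        # more user friendly
        $error = $resp->error;
        $displayForm = true;
    }
}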
I was listening to the leader of the project on a Digital Planet podcast. They use OCR to get about 80% of the words and farm the hardest 20% out to people doing captchas. Apparently they are digitising hundreds of books a day through this. I am genuinely impressed.
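As a rough illustration of that split (a sketch only, not the project's actual code): assume the OCR engine gives each word a confidence score, keep the words it is sure about, and set aside the least certain fifth for people to type in. The words, the scores and the split_hardest() helper below are all made up for the example.

```php
<?php
// Hypothetical OCR output: word => confidence between 0 and 1 (invented values).
$ocrWords = array(
    'It'          => 0.99,
    'was'         => 0.98,
    'the'         => 0.97,
    'best'        => 0.95,
    'of'          => 0.96,
    'times'       => 0.91,
    'worst'       => 0.62,
    'age'         => 0.93,
    'wisdom'      => 0.88,
    'foolishness' => 0.41,
);

// Split the words into those the OCR can be trusted on and the hardest
// share, which would be handed out to people as captchas.
// Returns array($easyWords, $hardWords).
function split_hardest(array $confidences, $hardShare = 0.2)
{
    asort($confidences); // lowest confidence first, word keys preserved
    $hardCount = (int) round(count($confidences) * $hardShare);
    $hard = array_slice($confidences, 0, $hardCount, true);
    $easy = array_slice($confidences, $hardCount, null, true);
    return array(array_keys($easy), array_keys($hard));
}

list($easy, $hard) = split_hardest($ocrWords);

echo "Left to OCR: ", implode(', ', $easy), "\n";
echo "Sent out as captchas: ", implode(', ', $hard), "\n";
```

With these invented scores, the two words the OCR struggled with most ("foolishness" and "worst") are the ones that would end up in front of a human.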
Sadly though, it appears that captchas generally are gradually becoming less effective. But I guess so long as your site is more secure than the next guy’s… (all IT security professionals grind their teeth at this statement. OK, I realise it’s no better than recommending that I hide my website registration under a stone).