KittenAuth is a really good subject for brainstorming about how to get a secure, usable, and accessible system for keeping bots out of public forums. I’ll describe some of my ideas and the stuff people presented in the KittenAuth comments (I can’t find the page any more), and then look for security, usability, and accessibility flaws in the different approaches.
All of the ideas are funded in the wish for an accessible system (in the WCAG / Section 508 sense), so as to include disabled users and plain text browsers, while avoiding manual work on the part of the site owner with users who need to circumvent the barriers in place. Thus we get to the basic requirement: An algorithm for creating problems which can be formulated as plain text, are simple to solve for humans, and difficult to solve for computers.
This is the approach proposed in the previous post regarding KittenAuth: The user will be asked to select the words which match a given keyword. The headline refers to the generalization of this, where you could also ask which words “belong” to the keyword, which are newer, which denote parts of the keyword, or any other relation.
This has a serious flaw: It mandates having a huge collection of relations between words. With the advent of the semantic web and tagging, it could be that creating such collections will be necessary to get some order into the chaos of parsing human generated text. But using a public source would make it simple for spammers to fetch their own and hack around you.
You could try to keep the source secret, but that’s just security by obscurity, and could be cracked by comparing a sequence of output from your site with the available sources.
Another way of mitigating the problem would be to use a changing source, such as inferred relations between tags in e.g. Flickr or del.icio.us. But the relations may be inaccurate if created by software, the software may be acquired by spammers, such sources change slowly, and I’d assume that useful sources which change quickly are rarely of any statistically usable size.
This approach works with the assumption that computers are complete idiots when translating text. Of course, many users are not bilingual, but remember that you can also use dialects! This could be the textual equivalent of prime factorization: Present the user with a single sentence output from a dialect translator, and ask them which is the original sentence. Noo yawk, noooo yawk! :)
Sources for sentences are abundant, and computers are likely to continue being morons in linguistics, so I’d say it’s safe for now. Of course, there’s the usability issue of scratching your head when presented with Jive or Scouse, but put a bit of humor into it, and the medicine goes down.
An elusive report from “an English university”, Cambridge according to some, reports that “it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.”
Could this be used like the translation algorithm? Probably, but it would probably be difficult to generate alternative sentences which would result in the same scrambled text, and still be in that elusive area of clarity where a computer would have problems dismissing it as garbage, and a user will be able to. I’d rather suggest using a long sentence, and ask the user to type the original. I don’t have the time to back this up with some math, but I suspect that brute force finding a gramatically sound result would be something in the order of O(C*N^2), where C is the number of valid words which can be created from the scrambled text, and N is the number of words.
This assumes that the text source is kept secret or sufficiently large (Random Google results or a sentence generator), and that the user is familiar with the language in question. That said, it should be simple to create a multilingual version – Just make sure you keep the text sources apart.
The same as the previous algorithm, but scrambling only the sequence of the words. I believe this would be very difficult to solve for the computer, but the user may also be stumped. Unfortunately, this is probably easier to solve statistically, by using a web search engine to find typical sequences with the words in question.
Bonus idea: Vector image recognition
Bear with me for a second. You have an enormous supply of images from the web, many of which are available for your use as you wish (big dataset requirement). You could use edge detection algorithms to generate a black-and-white sketch-like version of the image (obfuscating the source while retaining the important information). You could then ask the user to match it with some words, some of which could be tags or plain text (headers, captions) from the original page.
Vector images can be changed to ASCII art on textual browsers, and a nice braille relief for blind people. So you could easily include these users. But beware: audio users would have to get an extra hardware devices, and some users may have neither vision nor touch enough to use any of these methods.