Re: Guns don’t kill people, people kill people

This humungous over-simplification of a complex problem (entropy vs. optimism) seems to crop up whenever there is talk about banning something which has both practical and malicious uses. The latest example is the discussion about a stupid, frightening, or just weird proposal to criminalize “mak[ing] network monitoring tools publicly available […]”.

I really have no idea how such issues can keep being used as examples of why guns are not “inherently bad”. I also can’t understand why non-lethal means of self-protection seem to be ignored as viable alternatives. The founding fathers really messed up when they didn’t foresee more humane and efficient means of protection than guns. Using a stun gun or other non-lethal self-defence methods / tools, you

  • avoid being tried for involuntary manslaughter, or worse
  • avoid basically any fatal or permanent injuries in case of accident
  • What, you need another reason?!

I’d love to keep going for a couple hundred paragraphs, if only to get this steam out, but I think my point has been made.

Oh, and if you’re looking for a way to sneak in “If guns are illegal, only criminals will have guns”: The only thing that matters, if both of you have weapons of any kind, is who gets hit first. You are not Bruce Willis, and the “bad guy” is not a fucking terminator! So leave out the heavy artillery, and learn to use a stun gun (if you really need one) quickly.

To simplify or disambiguate, that is the question

In “Creeping featurism and the ratchet effect”, Mark Dominus discusses (among other things) how adding parentheses to an expression to disambiguate the operator precedence is a Bad Thing™. Of course, in the example where there are no operators in the parentheses – next if !($file =~ (/txt/)); – he’s right: (x) will never be clearer than x.

However, I find the argument for avoiding parentheses for disambiguation strange: each parenthesis supposedly makes the code a little harder to read, and proper engineers should instead look up the answer whenever the precedence is ambiguous. This raises the question: which takes more time?

Without parentheses, the developer has to either know the precedence rules by heart (something even Mr. Dominus admits he doesn’t), or consult a table in the documentation. But if Perl treats parentheses like most other languages, I’ll never have to look up that table if I’m using them. Even presuming that you keep the documentation in a nearby browser window while programming, I’d be hard pressed to believe that switching applications, looking up the proper web page, switching back, and applying that piece of information is faster than simply looking at the code and applying that piece of information directly. Even someone who’s only ever programmed in other languages would be able to understand such code without checking the manual.
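To make this concrete, here’s a classic Perl precedence trap (my own example, not one from Mr. Dominus’s article) where the parentheses are clearly worth their pixels:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = $0;    # any file we know exists

    # Precedence-reliant: `||` binds tighter than the list-operator comma,
    # so this parses as  open(my $fh, '<', ($file || die ...))  -- the die
    # only guards against $file being false, not against open() failing.
    open my $fh, '<', $file || die "can't open $file: $!";

    # Parenthesized: the failure check now applies to open() itself, and
    # you don't need the precedence table to see it.
    open(my $fh2, '<', $file) || die "can't open $file: $!";

The first version runs happily right up until open() fails and the die silently never fires; the second fails loudly, as intended.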

Hard disk space is abundant, but developer time is not.

KittenAuth follow-up

KittenAuth is a really good subject for brainstorming about how to get a secure, usable, and accessible system for keeping bots out of public forums. I’ll describe some of my ideas and the stuff people presented in the KittenAuth comments (I can’t find the page any more), and then look for security, usability, and accessibility flaws in the different approaches.

All of the ideas are founded on the wish for an accessible system (in the WCAG / Section 508 sense), so as to include disabled users and plain-text browsers, while sparing the site owner manual work with users who need to circumvent the barriers in place. Thus we get to the basic requirement: an algorithm for creating problems which can be formulated as plain text, are simple for humans to solve, and difficult for computers.

Word relations

This is the approach proposed in the previous post regarding KittenAuth: the user is asked to select the words which match a given keyword. The heading refers to the generalization of this, where you could also ask which words “belong” to the keyword, which are newer, which denote parts of the keyword, or any other relation.
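As a minimal sketch, with a hand-rolled toy relation table standing in for the real collection, the challenge could be generated like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(shuffle);

    # Toy relation table standing in for the huge collection a real
    # system would need.
    my %related = ( fruit => [qw(apple banana pear)] );
    my @decoys  = qw(hammer sonnet carburettor);

    my $keyword = 'fruit';
    my %answer  = map { $_ => 1 } @{ $related{$keyword} };
    my @choices = shuffle(@{ $related{$keyword} }, @decoys);

    print "Select every word related to '$keyword':\n";
    print "  $_\n" for @choices;
    # Server side: pass the user only if their selected set equals %answer.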

This has a serious flaw: it mandates having a huge collection of relations between words. With the advent of the semantic web and tagging, it could be that creating such collections will be necessary to get some order into the chaos of parsing human-generated text. But using a public source would make it simple for spammers to fetch their own copy and work around your check.

You could try to keep the source secret, but that’s just security by obscurity, and could be cracked by comparing a sequence of output from your site with the available sources.

Another way of mitigating the problem would be to use a changing source, such as inferred relations between tags in e.g. Flickr or del.icio.us. But the relations may be inaccurate if created by software, the software may be acquired by spammers, such sources change slowly, and I’d assume that useful sources which change quickly are rarely of any statistically usable size.

Translation

This approach works on the assumption that computers are complete idiots when translating text. Of course, many users are not bilingual, but remember that you can also use dialects! This could be the textual equivalent of prime factorization (easy to compute in one direction, hard to reverse): present the user with a single sentence of output from a dialect translator, and ask them which of several candidates is the original sentence. Noo yawk, noooo yawk! :)

Sources for sentences are abundant, and computers are likely to continue being morons in linguistics, so I’d say it’s safe for now. Of course, there’s the usability issue of scratching your head when presented with Jive or Scouse, but put a bit of humor into it, and the medicine goes down.
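A minimal sketch, with a couple of hard-coded substitutions standing in for a real dialect translator:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(shuffle);

    # Stand-in "translator"; a real deployment would plug in an actual
    # Jive / Scouse / Noo Yawk filter here.
    sub to_fake_brooklyn {
        my ($s) = @_;
        $s =~ s/\bthe\b/da/gi;
        $s =~ s/er\b/ah/g;        # "never" -> "nevah"
        $s =~ s/\byou\b/youse/gi;
        return $s;
    }

    my @sentences = (
        'You should never feed the pigeons after dark.',
        'The dog buried the bone under the porch.',
        'I never said you could borrow the car.',
    );
    my $original = $sentences[ rand @sentences ];

    print 'Which sentence was this made from? "',
          to_fake_brooklyn($original), "\"\n";
    print "  $_\n" for shuffle @sentences;
    # Accept the submission only if the user picks $original.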

Word unscrambling

An elusive report from “an English university”, Cambridge according to some, claims that “it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.”

Could this be used like the translation algorithm? Perhaps, but it would probably be difficult to generate alternative sentences which scramble to the same text while staying in that elusive area of clarity where a computer has trouble dismissing them as garbage but a user does not. I’d rather suggest using a long sentence, and asking the user to type the original. I don’t have the time to back this up with some math, but I suspect that brute-forcing a grammatically sound result would be on the order of O(C*N^2), where C is the number of valid words which can be created from the scrambled text, and N is the number of words.

This assumes that the text source is kept secret or is sufficiently large (random Google results or a sentence generator), and that the user is familiar with the language in question. That said, it should be simple to create a multilingual version – just make sure you keep the text sources apart.
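A minimal sketch of the scrambler and the check:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(shuffle);

    # Keep the first and last letter of each word and shuffle the rest,
    # as in the quote above.
    sub scramble_word {
        my ($word) = @_;
        return $word if length($word) <= 3;
        my ($first, @middle) = split //, $word;
        my $last = pop @middle;
        return $first . join('', shuffle @middle) . $last;
    }

    my $sentence = 'reading scrambled words is surprisingly comfortable';
    my $puzzle   = join ' ', map { scramble_word($_) } split ' ', $sentence;

    print "Type the original sentence: $puzzle\n";
    # Verify by comparing the user's answer (case-insensitively) to $sentence.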

Sentence unscrambling

The same as the previous algorithm, but scrambling only the order of the words. I believe this would be very difficult for a computer to solve, but the user may also be stumped. Unfortunately, it is probably easier to attack statistically, by using a web search engine to find typical sequences of the words in question.
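The scrambling step itself is trivial; all the difficulty is on the solver’s side. A sketch:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(shuffle);

    my $sentence = 'the quick brown fox jumps over the lazy dog';
    my @words    = split ' ', $sentence;

    my @shuffled = shuffle @words;
    # Reshuffle in the unlikely case the original order survived.
    @shuffled = shuffle @words while "@shuffled" eq "@words";

    print "Put the words back in order: @shuffled\n";
    # Accept the answer only if it matches $sentence.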

Bonus idea: Vector image recognition

Bear with me for a second. You have an enormous supply of images from the web, many of which you are free to use as you wish (satisfying the big-dataset requirement). You could use edge detection algorithms to generate a black-and-white, sketch-like version of the image (obfuscating the source while retaining the important information). You could then ask the user to match it with some words, some of which could be tags or plain text (headers, captions) from the original page.
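One way to sketch the obfuscation step, assuming the PerlMagick bindings and a hypothetical source-photo.jpg (tracing the result into an actual vector image is left as a further step):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Image::Magick;    # PerlMagick; assumes ImageMagick is installed

    my $img = Image::Magick->new;
    my $err = $img->Read('source-photo.jpg');    # hypothetical input image
    die $err if $err;

    $img->Quantize(colorspace => 'gray');   # drop the colours
    $img->Edge(radius => 1);                # ImageMagick's edge detector
    $img->Negate();                         # dark lines on white, sketch-style
    $err = $img->Write('challenge.png');
    die $err if $err;
    # Serve challenge.png along with candidate words taken from the
    # source page (tags, headers, captions) plus some decoys.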

Vector images can be turned into ASCII art in textual browsers, and into a nice braille relief for blind people, so you could easily include these users. But beware: audio users would have to get an extra hardware device, and some users may have neither vision nor touch enough to use any of these methods.

Job trends in web development

The job search service Indeed has an interesting “trends” search engine: it visualizes the number of job postings matching your keywords over the last year. Let’s see if there is some interesting information about modern web technologies there…

XHTML vs. HTML

The relation between XHTML and HTML [graph: relative popularity of XHTML and HTML in job offers] could be attributed to a number of factors:

  • XHTML is just not popular yet (1 Google result for every 19 on HTML).
  • The transition from HTML to XHTML is so simple as to be ignored.
  • The terms are confused, and HTML is the most familiar one.
  • XHTML is thought to be the same as HTML, or a subset of it.

The XHTML graph alone [graph: popularity of XHTML in job offers] could give us a hint as to where we stand: at about 1/100 of the “popularity” of HTML, it’s increasing linearly. At the same time, HTML has had an insignificant increase, with a spike in the summer months (it is interesting to note that this spike did not occur for XHTML). XHTML could be poised for exponential growth, taking over for HTML, but only time will tell.

AJAX

This is an interesting graph [graph: popularity of AJAX in job offers]: it grows exponentially, which is likely a result of all the buzz created by Google getting on the Web 2.0 bandwagon. Curiously, the growth rate doesn’t match that of the term “Web 2.0” [graph: relative popularity of AJAX and “Web 2.0” in job offers]. Attempting to match it with other Web 2.0 terms such as “RSS” [graph: relative popularity of AJAX and RSS in job offers], “JavaScript” [graph: relative popularity of AJAX and JavaScript in job offers], and “DOM” [graph: relative popularity of AJAX and DOM in job offers] also failed. The fact that AJAX popularity seems unrelated to both Web 2.0 and JavaScript popularity is interesting, but I’ll leave the creation of predictions from this as an exercise for the readers. :)

CSS

While insignificant when compared to HTML [graph: relative popularity of HTML and CSS in job offers], the popularity of CSS closely follows that of XHTML [graph: relative popularity of XHTML and CSS in job offers]. Based on that and the oodles of best practices out there cheering CSS and XHTML on, I predict the following: when CSS is recognized for its power to reduce bandwidth use and web design costs, it’ll drag XHTML up with it as a means to create semantic markup which can be used with other XML technologies, such as XSLT and RSS / Atom.

Discussion of conclusions

The job search seems to cover only the U.S., so international numbers may be very different. I doubt that they are, however, given how irrelevant borders are on the Web.

The uptake of these terms will be slowed by such factors as how long it takes for the people in charge to notice them, understand their value / potential, and finally find the areas of the business which need those skills.

Naturally, results will be skewed by buzz, large-scale market swings, implicit knowledge (if you know XHTML, you also know HTML), and probably another 101 factors I haven’t thought of. So please take the conclusions with a grain of salt.

My conclusions are often based on a bell-shaped curve of lifetime popularity, according to an article / book I read years ago. I can’t find the source, but it goes something like this:

  1. Approximately linear growth as early adopters are checking it out.
  2. Exponential growth as less tech savvy people catch on; buzz from tech news sources.
  3. Stabilization because of market saturation and / or buzz wearing off.
  4. Exponential decline when made obsolete by other technology.
  5. Approximately linear decline as the technology falls into obscurity.

PS: For some proof that any web service such as Indeed should be taken with a grain of salt, try checking out the result for George Carlin’s seven dirty words [graph: relative popularity of George Carlin’s seven dirty words in job offers] ;)