SVG example code

As I work on reducing the size of the CERN Central Library bookmark and getting away from the messy Inkscape SVG, I’ll post the resulting parts separately.

Library bookmark redesign

Here’s a little hobby project that I’ve been working on at the CERN Central Library. Instead of the familiar blue bookmark with only a title, the idea is to add anything useful to library guests (and even strangers) that will fit inside the space of the bookmark. Hopefully it will give more people an idea of what we can do for them, what they can do on their own, and how to find us, while providing a simple way to send feedback.

Work overview (italicized items are unfinished or undecided):

  • Title
    • Replace “CERN” with the official logo? Doesn’t look very good, and we wouldn’t be respecting the 25% margin rule.
  • Logos:
    • CERN
    • Library?
  • Due date, style:
    • Date stamp
      • Pros: Very fast to mark, usually good readability
      • Cons: Need to get one :)
    • Circle month + date
      • Pros: Fast to mark, medium readability
      • Cons: Computer science-y?
    • Text boxes
      • Pros: Easy to distinguish month from day
      • Cons: Legibility depends on the writer, slow, visual noise
    • Open space
      • Pros: Clean design with open space
      • Cons: Legibility depends on the writer, slow, could be too close to other text
  • Location code 52 1-052
  • Main web page
  • Email address
    • Desk
    • ILL?
  • Phone
  • Fax
  • Opening hours
  • Staffed hours
  • Other links:
  • CERN map:
    • Buildings
    • Main roads
    • Landmarks
      • Logos where possible, text elsewhere
      • Library
      • Restaurants
      • Reception
      • Entrances
      • P1
      • ATLAS
      • Globe
      • Hostels
    • Walking directions inside buildings
    • North arrow
  • Library map:
    • Both floors
    • Desk
    • Shop
    • Computers
      • Mark OS with logos
    • News shelf
    • Number ranges for shelves
    • Outdoor area
    • A/V equipment
    • Theses drawers
    • Reference section
    • Paper cutter
    • Printers
    • Copy machines
    • Return / delivery box
  • Feedback form
  • CC-BY license

TED.com bloat

If you’re a TED.com user, I’m pretty sure you’ve noticed the slow page loads compared to … Well, just about any other site out there. I’ve sent some feedback (below), and I’m hoping you’ll help out as well by suggesting general and specific improvements.

Hello,

While your web site is one of the best content collections I’ve ever come across, the style sheets / scripts are so huge as to require the full attention of a Pentium IV 3 GHz CPU for several seconds for every page displayed. 122 KB of CSS and 259 KB of JavaScript is massive, even today.

As a first fix, I’d suggest using some of the online tools to compress CSS and JavaScript. Also, with 8 years of web development behind me (3 professionally), I’m confident that you can reduce the amount by an order of magnitude without losing the overall look and feel of the site.

Thank you for your time and magnificent content!

PS: I’ve asked for feedback, and I’ll post it here if I receive any.
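
For illustration (this snippet is mine, not part of the mail): a compressor does roughly the following, only far more thoroughly and safely than this naive TypeScript sketch.

    // Naive CSS minifier: strips comments, collapses whitespace, and trims
    // spaces around punctuation. Real compressors do much more, and do it safely.
    function naiveMinifyCss(css: string): string {
      return css
        .replace(/\/\*[\s\S]*?\*\//g, '')  // drop /* ... */ comments
        .replace(/\s+/g, ' ')              // collapse runs of whitespace
        .replace(/\s*([{}:;,])\s*/g, '$1') // no spaces around punctuation
        .replace(/;}/g, '}')               // drop trailing semicolons
        .trim();
    }

    // naiveMinifyCss('body {\n  margin: 0; /* reset */\n}') === 'body{margin:0}'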

Properly formatting <del> and <ins>

Here are some alternative ways to format <del> and <ins>, with explanations; my solution so far is the last one below. I’d love to hear your own improvements.

The most straightforward solution is just putting a space character between the elements in the markup (<del>Speling</del> <ins>Spelling</ins>):
Speling Spelling is hard.

However, this is not semantically correct, since the space is not part of the text: it is neither deleted nor inserted content, it’s only there to push the elements apart.

You could insert a space character using CSS (del + ins:before {content: ' ';}):
Speling Spelling is hard.

Now the markup is semantically correct, but what the user gets is worse: the generated space becomes part of the <ins> element, so both the display and the semantics of the rendered content are wrong.

Inserting a margin between the elements should do the trick (del + ins {margin-left: 0.3em;}):
Speling Spelling is hard. (The gap here is the margin, not a space character.)

This should be semantically correct, but the width of a normal space character depends on the browser implementation and the font, so make sure you check that the 0.3em margin looks right.
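
If you want to try the approaches out, here’s a minimal self-contained page using the margin solution (my own demo, not part of the original examples):

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="utf-8">
      <title>del/ins spacing demo</title>
      <style>
        del + ins { margin-left: 0.3em; } /* the margin approach from above */
      </style>
    </head>
    <body>
      <p><del>Speling</del><ins>Spelling</ins> is hard.</p>
    </body>
    </html>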

KittenAuth follow-up

KittenAuth is a really good subject for brainstorming about how to get a secure, usable, and accessible system for keeping bots out of public forums. I’ll describe some of my ideas and the stuff people presented in the KittenAuth comments (I can’t find the page any more), and then look for security, usability, and accessibility flaws in the different approaches.

All of the ideas are grounded in the wish for an accessible system (in the WCAG / Section 508 sense), so as to include disabled users and plain-text browsers, while sparing the site owner the manual work of assisting users who would otherwise have to circumvent the barriers in place. Thus we get to the basic requirement: an algorithm for creating problems which can be formulated as plain text, are simple for humans to solve, and difficult for computers.
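
To make the comparison concrete, every scheme below can be thought of as producing one small contract; a minimal TypeScript sketch (the names are mine):

    // The shared shape of a plain-text challenge: a prompt that renders
    // anywhere (text browser, screen reader, braille display) plus a checker.
    interface TextChallenge {
      prompt: string;
      check(answer: string): boolean;
    }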

Word relations

This is the approach proposed in the previous post regarding KittenAuth: The user will be asked to select the words which match a given keyword. The headline refers to the generalization of this, where you could also ask which words “belong” to the keyword, which are newer, which denote parts of the keyword, or any other relation.
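
A minimal sketch of the mechanics, with a toy hard-coded relation table (far too small for real use; the names and data are mine):

    // Toy relation table; a real deployment would need the huge collection
    // discussed below, kept away from spammers.
    const related: Record<string, string[]> = {
      animal: ['kitten', 'dog', 'sparrow'],
      tool: ['hammer', 'wrench', 'saw'],
    };

    function makeRelationChallenge(keyword: string, decoys: string[]) {
      const correct = new Set(related[keyword]);
      // Alphabetical order hides which options belong together.
      const options = [...related[keyword], ...decoys].sort();
      return {
        prompt: `Which of these words name a kind of ${keyword}? ${options.join(', ')}`,
        check(answer: string): boolean {
          const picked = answer.split(',').map((w) => w.trim().toLowerCase());
          return picked.length === correct.size && picked.every((w) => correct.has(w));
        },
      };
    }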

This has a serious flaw: it mandates having a huge collection of relations between words. With the advent of the semantic web and tagging, it could be that creating such collections will be necessary to get some order into the chaos of parsing human-generated text. But using a public source would make it simple for spammers to fetch their own copy and work around you.

You could try to keep the source secret, but that’s just security by obscurity, and could be cracked by comparing a sequence of output from your site with the available sources.

Another way of mitigating the problem would be to use a changing source, such as inferred relations between tags in e.g. Flickr or del.icio.us. But the relations may be inaccurate if created by software, the software may be acquired by spammers, such sources change slowly, and I’d assume that useful sources which change quickly are rarely of any statistically usable size.

Translation

This approach works on the assumption that computers are complete idiots when translating text. Of course, many users are not bilingual, but remember that you can also use dialects! This could be the textual equivalent of prime factorization, easy to compute but hard to reverse: present the user with a single sentence output from a dialect translator, and ask them which is the original sentence. Noo yawk, noooo yawk! :)
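
A sketch with a toy “Noo Yawk” filter standing in for a real dialect translator (the substitutions are mine and purely illustrative):

    // Toy stand-in for a real dialect translator (think of the classic jive
    // or chef text filters); a few substitutions are enough to illustrate.
    function toNooYawk(sentence: string): string {
      return sentence
        .replace(/\bnew york\b/gi, 'noo yawk')
        .replace(/\bthe\b/gi, 'da')
        .replace(/er\b/g, 'uh');
    }

    // Show the mangled sentence next to decoys; the user picks the original.
    function makeTranslationChallenge(original: string, decoys: string[]) {
      const options = [original, ...decoys].sort(); // alphabetical order hides the answer
      return {
        prompt: `Which of these reads "${toNooYawk(original)}" in dialect? ${options.join(' / ')}`,
        check: (choice: string) => choice === original,
      };
    }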

Sources for sentences are abundant, and computers are likely to continue being morons in linguistics, so I’d say it’s safe for now. Of course, there’s the usability issue of scratching your head when presented with Jive or Scouse, but put a bit of humor into it, and the medicine goes down.

Word unscrambling

An elusive report from “an English university”, Cambridge according to some, reports that “it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.”

Could this be used like the translation algorithm? Probably, but it would be difficult to generate alternative sentences which result in the same scrambled text while staying in that elusive area of clarity where a computer has trouble dismissing them as garbage but a user can still read them. I’d rather suggest using a long sentence and asking the user to type the original. I don’t have the time to back this up with some math, but I suspect that brute-forcing a grammatically sound result would be something on the order of O(C*N^2), where C is the number of valid words which can be created from the scrambled text, and N is the number of words.
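
Generating the challenge is the easy part; a minimal sketch (punctuation handling is left out):

    // Shuffle each word's interior letters, keeping the first and last in
    // place, as in the quoted text above.
    function scrambleWord(word: string): string {
      if (word.length <= 3) return word; // nothing to shuffle
      const inner = word.slice(1, -1).split('');
      for (let i = inner.length - 1; i > 0; i--) { // Fisher-Yates shuffle
        const j = Math.floor(Math.random() * (i + 1));
        [inner[i], inner[j]] = [inner[j], inner[i]];
      }
      return word[0] + inner.join('') + word[word.length - 1];
    }

    const scrambleSentence = (s: string) => s.split(' ').map(scrambleWord).join(' ');

    // scrambleSentence('the first and last letter is at the right place')
    // might yield 'the fsrit and lsat lteter is at the rghit palce'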

This assumes that the text source is kept secret or is sufficiently large (random Google results or a sentence generator), and that the user is familiar with the language in question. That said, it should be simple to create a multilingual version; just make sure you keep the text sources apart.

Sentence unscrambling

The same as the previous algorithm, but scrambling only the order of the words. I believe this would be very difficult for a computer to solve, but the user may also be stumped. Unfortunately, it is probably easier to attack statistically, by using a web search engine to find typical sequences containing the words in question.
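
The generator is even simpler than the previous one (again a sketch):

    // Shuffle the word order; the user must type the words back in the
    // original order, which is then compared string-for-string.
    function shuffleWords(sentence: string): string {
      const words = sentence.split(' ');
      for (let i = words.length - 1; i > 0; i--) { // Fisher-Yates shuffle
        const j = Math.floor(Math.random() * (i + 1));
        [words[i], words[j]] = [words[j], words[i]];
      }
      return words.join(' ');
    }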

Bonus idea: Vector image recognition

Bear with me for a second. You have an enormous supply of images from the web, many of which are free for you to use as you wish (satisfying the big-dataset requirement). You could use edge detection algorithms to generate a black-and-white, sketch-like version of the image (obfuscating the source while retaining the important information). You could then ask the user to match it with some words, some of which could be tags or plain text (headers, captions) from the original page.
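
For the edge detection step, even a plain Sobel filter would do for a first pass; a sketch over a grayscale bitmap (decoding, thresholding, and vectorization would come on top):

    // Sobel edge detector over a grayscale image stored as a flat array.
    function sobelEdges(gray: Uint8Array, width: number, height: number): Uint8Array {
      const out = new Uint8Array(gray.length);
      const at = (x: number, y: number) => gray[y * width + x];
      for (let y = 1; y < height - 1; y++) {
        for (let x = 1; x < width - 1; x++) {
          const gx = -at(x - 1, y - 1) + at(x + 1, y - 1)
                     - 2 * at(x - 1, y) + 2 * at(x + 1, y)
                     - at(x - 1, y + 1) + at(x + 1, y + 1);
          const gy = -at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1)
                     + at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1);
          // Gradient magnitude, clamped to the 8-bit range.
          out[y * width + x] = Math.min(255, Math.round(Math.hypot(gx, gy)));
        }
      }
      return out;
    }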

Vector images can be converted to ASCII art in text browsers, or to a nice braille relief for blind people, so you could easily include these users. But beware: audio-only users would need an extra hardware device, and some users may have neither enough vision nor touch to use any of these methods.