Software bloat can be a good thing

Remember Mozilla? Not the foundation, but the browser + news reader + email client + kitchen sink that spawned Firefox, the lean and mean browser accused of arriving at just the same kitchen sink stage while taking over the world. I think there’s at least one very important software development lesson (maybe more) to be learned here:

Extensibility and open source are keys to long-term success. In the case of user-developed extensions, open source is simply a guarantee that what users build will not be wasted due to abandonment or outright sabotage by the software “owners”. Extensibility leverages the resources of the users (who outnumber the core developers by orders of magnitude in successful projects) to let them each build what is most useful to themselves. Ideas (and often code) can then be put into the core whenever a new extension has been shown to be particularly successful. At some point the core will probably grow to the stage where all the stitched-together code starts to be a hindrance to new development or efficient use. At that point, a new kernel (with the most fundamental features) can be extracted, while the rest of the code is split into new extensions and applications, depending on how separate they are from the remaining core. And so the cycle begins anew, with the equivalent of Thunderbird, Sunbird and tons of extensions as the offspring.

The difficulty of bug reporting

This can be pretty hard. Say the GNOME screensaver password prompt sometimes won’t accept your password, no matter how many times you try. How do you check that you haven’t mixed up your password with one of the other 500 that you have to remember, if you’re not using a password bookmarklet or a password manager, or if you change passwords every month? (You do use different passwords, don’t you?) How do you make sure that you’re typing all those special characters right, especially since you’re using multiple keyboard layouts? Does the “Leave a message” function show you the same string that would be sent to the password checking routine? Is it possible that some other software is at fault, mangling or interrupting your input? How do you even find out which software you’re typing the password into? Could a buggy keylogger be involved? Is there any debugging output, and where can you find it? Is Caps Lock or Num Lock on (EEE laptops don’t have those handy LEDs)? Are encodings in any way involved? Where should the bug be reported? Should you ask on the official mailing list before submitting a report? If so, what’s the address, and do you have to be a member to post? Is there some alternative forum that is used more? Is the list even active anymore? The bug is still marked as unconfirmed (although I’ve had to killall gnome-screensaver for months), so I guess we’ll have to see…
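
The questions about special characters, keyboard layouts and encodings can at least be narrowed down with a few lines of code. As a rough sketch (the helper name describe_characters is made up, and nothing here touches gnome-screensaver itself), the following lists the code point, UTF-8 bytes and Unicode name of each character in a string, which helps to spot a layout that produced a look-alike character:

```python
import unicodedata


def describe_characters(text):
    """Return one line per character: code point, UTF-8 bytes and name."""
    lines = []
    for char in text:
        code_point = "U+%04X" % ord(char)
        utf8_bytes = " ".join("%02x" % byte for byte in char.encode("utf-8"))
        name = unicodedata.name(char, "<no name>")
        lines.append("%s  %-11s  %s" % (code_point, utf8_bytes, name))
    return lines


# Throwaway string, not a real password: the second character is e-acute.
for line in describe_characters("p\u00e9!"):
    print(line)
```

Typing the same throwaway string under each keyboard layout and comparing the listings shows whether the layouts really produce the same characters.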

How to teach users the command line

Now that computers are firmly in the hands of casual users, is there some way to teach them how to use the command line without throwing them head first into grep? Of course, anyone with the time and inclination (and someone to contact when they get stuck) can learn to use the shell. Also, of course, there’s an entire spectrum from those who would never touch a mouse to those who would never touch the keyboard if they could avoid it, so there must be room for lowering the barrier to entry.

A modest suggestion is to show users, when possible, what is going on behind the scenes, to give them an idea of how tools interact and work. Some software, like TortoiseSVN, Emacs and Hugin, already does this, but those are mostly expert-level tools, and I guess rarely used by anyone reluctant to push the mouse into the farther recesses of the desk. Also, they show a ton of output with not much information about what to do to reproduce it. Since the user is not typing the commands herself, she won’t know which lines are commands and which are their output without a lot of work (or previous knowledge). So another very useful feature would be to emphasize the commands and tone down the output.
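
To make that last idea concrete, here is a minimal sketch, not how any of the tools mentioned actually work. The helper names format_transcript and run_and_show, the "$ " prefix and the choice of ANSI escape codes are all my own assumptions for illustration: the command is printed in bold, and everything it prints is dimmed.

```python
import shlex
import subprocess
import sys

BOLD = "\033[1m"   # ANSI escape: bold text
DIM = "\033[2m"    # ANSI escape: faint text
RESET = "\033[0m"  # ANSI escape: back to normal


def format_transcript(argv, output):
    """Return a transcript with the command emphasized and its output dimmed."""
    command = " ".join(shlex.quote(arg) for arg in argv)
    lines = [BOLD + "$ " + command + RESET]
    lines.extend(DIM + line + RESET for line in output.splitlines())
    return "\n".join(lines)


def run_and_show(argv):
    """Run argv, print the emphasized transcript, return the exit status."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True)
    print(format_transcript(argv, result.stdout))
    return result.returncode


if __name__ == "__main__":
    # Hypothetical example: ask the running Python for its version.
    run_and_show([sys.executable, "--version"])
```

Because the command line is quoted with shlex.quote, the bold line can be pasted straight into a shell, which is exactly the “what do I type to reproduce this” information that the raw output lacks.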

What else could be done?

Overselling Linux

Shortly after starting college I heard about the wonderful strategy / ideology of open source (hey, I’m from the sticks, OK?). As I recall, I’d never seen a machine running GNU/Linux before. Being an over-the-top Windows tweaker (Windows 98 broke completely about every two weeks), all that control sounded like gravy to a fat f***.

So in 1999 I installed Linux for the first time; SuSE 6.3 as I recall. The PC was brand spankin’ new, and I had some trouble setting up the network card. After a few days of emails*, dependency hell, man pages and make menuconfig, I went back to Windows. A friendly voice in the back of my head told me to “Please try again later.”

My next attempt was with Mandrake, which was said to be a lot more user friendly. I was told it would install all my hardware automatically. Sweet! Turns out that was the first time I heard this lie which some users perpetuate even today. I didn’t have the stomach for another dead end, so it was gone after half a day of fiddling.

At this point I’m a happy Linux user (desktops, server and laptop), only switching to Windows for games. It’s a lot more productive than Windows, but remembering my own problems I’m careful when recommending it to others. Don’t lie to newbies. They will not forget it.

Corollary 1: Don’t tell users that all hardware works on Linux (there are lists, however), and then blame the vendor for not supplying an open spec. Users care about working hardware, not excuses (Hi, Apple! Wanna downgrade my iPod for me please?)
Corollary 2: Linux software has lots of bugs. Sure, high profile software like Apache, Emacs and Firefox is close to infallible (not counting third-party extensions), but there are still crashes, wildly incoherent interfaces (look at media players, for instance), outdated or missing documentation (yay FreeBSD), interoperability issues (one format per program and few translators) and more.

* At this point it would be extremely rude not to give eternal thanks to all those who have been willing to aid a stupid (and often angry) newbie to grasp Linux enough to finally switch to it for good a couple years ago. Thank you all!

vCard 3.0 validator and parser

Did you know that even the vCards listed in the official RFC are not valid? It clearly says “The vCard object MUST contain the FN, N and VERSION types.” Still, the example vCards are both clearly missing the N type. As somebody else remarked, releasing a format spec without some reference validator is bound to result in all sorts of invalid implementations.
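
The rule quoted above is easy to check mechanically. As a minimal sketch (a hypothetical missing_required_types function written for this post, which ignores line folding, group prefixes and everything else in the format), the check could look like this:

```python
REQUIRED_TYPES = ("FN", "N", "VERSION")


def missing_required_types(vcard_text):
    """Return the required vCard 3.0 types that do not appear in vcard_text.

    Simplification: every content line is assumed to be unfolded and to look
    like NAME[;parameters]:value, with no group prefix.
    """
    present = set()
    for line in vcard_text.splitlines():
        name_and_params = line.partition(":")[0]
        name = name_and_params.partition(";")[0]
        present.add(name.strip().upper())
    return [name for name in REQUIRED_TYPES if name not in present]


# Made-up card for illustration; it has FN and VERSION but no N.
card = "\r\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    "FN:Jane Doe",
    "END:VCARD",
    "",
])
print(missing_required_types(card))  # prints ['N']
```

A real validator has to do far more than this, but even a check this small would flag a card that lacks the N type.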

After searching for a vCard validator without success, I’ve therefore started my own vCard module in Python. It tries to create an object with all the information from a vCard string, and returns what I hope are useful error and warning messages if there’s anything wrong.

Update: Added file validation – Now you can validate files with several vCards from the command line.

Install / upgrade:
sudo pip install --upgrade vcard

Validate vCard files:
vcard *.vcf

Sort blocks of text in files

Ever had to sort a file alphabetically, only to realize that you’d have to do it manually because every item that needs to be sorted is spread over more than one line? This just happened when I exported my Gmail contacts to vCard, which it turned out were sorted by formatted name (FN) instead of name (N). The result was the following script, which takes two patterns and some input, and returns the sorted output. The example returned by ./sort_blocks.py --help is exactly the code to re-sort Gmail contacts. I’d love to know if you find any bugs or possible improvements to this script. Enjoy:

#! /usr/bin/env python
# -*- coding: utf-8 -*-
## Copyright (C) 2009 CERN.
##
## Sort any multi-line block text
##
## This file is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""sort_blocks.py - Multiline sort of standard input

Default syntax:

./sort_blocks.py -b 'pattern' -s 'pattern' < input_file > result_file

Options:
-h,--help       Print this message
-b,--bp         Block pattern (dotall multiline); used to extract blocks
-s,--sp         Sort pattern (dotall multiline); extracted to sort blocks

Example:

./sort_blocks.py -b 'BEGIN:VCARD.*?END:VCARD\\r\\n' -s '^N:(.*)$' \
< contacts.vcf > contacts2.vcf

Orders vCards in contacts.vcf by name, and puts the results in contacts2.vcf."""

import getopt
import re
import sys

class Usage(Exception):
    """Raise in case of invalid parameters"""
    def __init__(self, msg):
        self.msg = msg

def _sort_key(sort_pattern, text):
    """Return the part of text captured by the first group of sort_pattern,
    or an empty string if the pattern does not match"""
    match = re.search(sort_pattern, text, re.DOTALL | re.MULTILINE)
    if match is None:
        return ''
    return match.group(1)

def split_and_sort(text, block_pattern, sort_pattern):
    """Split into blocks, sort them, and join them up again
    @param text: String of blocks to sort
    @param block_pattern: Regular expression matching one whole block; text
    between matches is dropped
    @param sort_pattern: Gets a subset of each block to sort by"""

    text_blocks = re.findall(block_pattern, text, re.DOTALL | re.MULTILINE)

    # A key function works on both Python 2 and 3, unlike cmp()
    text_blocks.sort(key=lambda block: _sort_key(sort_pattern, block))

    return ''.join(text_blocks)

def main(argv = None):
    """Argument handling"""

    if argv is None:
        argv = sys.argv

    # Defaults
    block_pattern = ''
    sort_pattern = ''

    try:
        try:
            opts, args = getopt.getopt(
                argv[1:],
                'hb:s:',
                ['help', 'bp=', 'sp='])
        except getopt.GetoptError as err:
            raise Usage(err.msg)

        for option, value in opts:
            if option in ('-h', '--help'):
                print(__doc__)
                return 0
            elif option in ('-b', '--bp'):
                block_pattern = value
            elif option in ('-s', '--sp'):
                sort_pattern = value
            else:
                # The original format string had no placeholder for option
                raise Usage('Unhandled option %s' % option)

        if block_pattern == '' or sort_pattern == '' or args:
            raise Usage(__doc__)

        text = sys.stdin.read()

        print(split_and_sort(text, block_pattern, sort_pattern))

    except Usage as err:
        sys.stderr.write(err.msg + '\n')
        return 2

if __name__ == '__main__':
    sys.exit(main())
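
To see the technique in isolation, without the argument handling, here is a minimal self-contained sketch. The function name sort_blocks and the two made-up contacts are assumptions for the example, not part of the script:

```python
import re

FLAGS = re.DOTALL | re.MULTILINE


def sort_blocks(text, block_pattern, sort_pattern):
    """Extract blocks with block_pattern and order them by sort_pattern."""
    def sort_key(block):
        match = re.search(sort_pattern, block, FLAGS)
        # Blocks without a match sort first, under the empty string.
        return match.group(1) if match else ""

    blocks = re.findall(block_pattern, text, FLAGS)
    return "".join(sorted(blocks, key=sort_key))


# Two made-up contacts, ordered by FN (Alice, Bob) rather than N.
contacts = (
    "BEGIN:VCARD\r\nFN:Alice Zimmer\r\nN:Zimmer;Alice\r\nEND:VCARD\r\n"
    "BEGIN:VCARD\r\nFN:Bob Adams\r\nN:Adams;Bob\r\nEND:VCARD\r\n"
)
result = sort_blocks(
    contacts, r"BEGIN:VCARD.*?END:VCARD\r\n", r"^N:(.*)$")
print(result)
```

After sorting, Bob Adams comes before Alice Zimmer, because the key is taken from the N line. One caveat: the block pattern must not contain capturing groups of its own, since re.findall would then return the groups instead of the whole blocks.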