Stop asking your students to write command line UIs

How often have you used a UI like this?


/==========================\
| 1. List files            |
| 2. Show the current time |
| 3. Show Top              |
| 4. Quit                  |
\==========================/

Enter your selection: 

Even if you are a banker, travel agent or medical doctor, I would argue never. These groups are unfortunate enough to still have to use arcane command line interfaces to do unspeakably complex things like recording last week’s hours or reserving a ticket to your home town. But none of these systems are razor-thin wrappers for simple shell tools; they are that way because they are really hard to replace. More importantly, no employer is ever going to ask anyone to make a menu-based command line UI for their shell script. It just doesn’t happen anymore. It is not a valuable skill. ASCII art is recreation, not work. So the time spent fiddling with echo and read is wasted, and could be put to better use.

There are many generally applicable skills you can teach shell newbies:

  • The Unix philosophy: writing programs that do one thing, that work with other programs, and that handle text streams. An hour spent cobbling together a pipeline of grep, cut and a light sprinkling of sed can do data processing which might take a week to write in Python or six months to do in a spreadsheet.
  • On the flip side, they should know the limitations of the shell. Why while read is several orders of magnitude slower than the equivalents in other languages. Why writing secure shell scripts is basically impossible. Or why big shell scripts are a maintenance nightmare compared to other languages.
  • Which tools are available to do what. There are so many useful tools you could probably spend a week full time just touching briefly on each of them. Check out for example BusyBox for a set of generally available tools.
  • Where to look for answers and how to ask good questions.

Firefox add-on to highlight insecure links

Insecure Links Highlighter does what it says on the tin: on any web page, it adds a bright red border around insecure links.

It highlights HTTP and FTP links and (by default) links with event handlers, which may or may not be doing bad things. Useful for security and privacy-oriented users and web devs alike.

Shell scripting dos and don’ts

Shell scripting is like a room full of power tools: handy but dangerous.

Don’t:

  1. Build complex systems. There are just too many ways that external state can affect any piece of shell code. Do you know what your script will do if you change IFS before running it?  What if you give it a file name starting with a dash or containing a newline? How do you recover the state of the system if the script crashed somewhere in the middle? Complex shell script environments invariably end up looking like Rube Goldberg machines of chainsaws and power drills. Use languages and frameworks appropriate for the task.
  2. Expose them to the Internet. Safe input handling is just too damn hard. Unless you’re GreyCat or Stéphane Chazelas.
  3. Use eval. Don’t be evil. There are safer ways to do whatever you’re trying to do.
  4. Write portable code. (By this I mean code which works in multiple shells without change, as opposed to code which can easily be ported to other shells.) Writing portable code means limiting what language features you use and adding complexity to make sure it works the same way in all the supported shells. Because of this, the end result will be more complex and less flexible than the simplest script that supports the shell you have.
  5. Minimise the number of characters. The next maintainer will hate you.
  6. Create interactive menus. Very few tools, such as less and top, only make sense interactively. Use command-line arguments instead, so that your tool is usable both standalone and together with other tools.

Do:

  1. Test everything automatically. This gives you and others the confidence that your script actually works. Bonus: Allows you to modify your code without having to test everything manually. Extra bonus: Experiencing how difficult it is to test shell scripts exhaustively will convince you to never use them for anything complex.
  2. Provide --long-names for every -s -h -o -r -t option. And if you can bear the screams of dogmatic developers, don’t support short options at all. As long as the names make sense this allows people to write readable scripts. Bonus: No wondering whether -n5f0 is two, three or four options.
  3. Use guard statements like the POSIX set -o errexit -o noclobber -o nounset. While there are some caveats to how these work, they can save a whole lot of headache. Bonus: Use -o xtrace to see what the script does in detail.
  4. Add an auto-complete script. The users will be grateful. Bonus: Gives an incentive to keep the structure of your options sane.
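
The advice about command-line arguments and long option names applies in any language, not just the shell. As a rough sketch, here is what a non-interactive replacement for a "1. List files / 2. Show the current time" menu could look like in Python 3 with argparse. The option names --list-files and --show-time are made up for this example.

```python
import argparse
import datetime
import os


def build_parser():
    """Build a parser with long option names only (the names are made up)."""
    parser = argparse.ArgumentParser(
        description="Non-interactive replacement for a menu-driven script.",
        allow_abbrev=False,  # "--list" must not silently mean "--list-files"
    )
    parser.add_argument("--list-files", metavar="DIRECTORY",
                        help="print the names of the files in DIRECTORY")
    parser.add_argument("--show-time", action="store_true",
                        help="print the current time")
    return parser


def run(argv):
    """Return the output lines for the given list of arguments."""
    options = build_parser().parse_args(argv)
    lines = []
    if options.list_files is not None:
        lines.extend(sorted(os.listdir(options.list_files)))
    if options.show_time:
        lines.append(datetime.datetime.now().isoformat())
    return lines


# Example call; a real script would pass sys.argv[1:] instead
print("\n".join(run(["--show-time"])))
```

Because everything is driven by arguments, the same tool can be called from cron, from a pipeline or from another script, and argparse generates --help for free.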

Howto: Timelapse video from photos

It’s amazing what shell tools can do: Flickr accepts HD video (720p, or max 1280×720) up to 30 FPS, so I tried to create one within those limits from the high resolution photos from today’s sunrise. Turns out to be incredibly easy with free tools on Linux:

  1. Resize to 720 pixels height (if your images are still wider than 1280 pixels after this, replace x720 with 1280, without the “x”). Note that mogrify overwrites the files in place, so work on copies: mogrify -resize x720 *
  2. Find the width of the images, and plug that into the following command instead of 1080.
  3. Create the video: mencoder mf://* -mf w=1080:h=720:fps=30:type=jpg -ovc copy -oac copy -o output.avi
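
If you would rather calculate the width for step 2 than look it up, the arithmetic is simple. The sketch below assumes that the resize keeps the aspect ratio and rounds to the nearest pixel, and that all photos have the same dimensions; the helper names and the 3888×2592 photo size are just examples.

```python
def scaled_width(width, height, target_height=720):
    """Width of an image after scaling it to target_height, keeping the aspect ratio."""
    return int(round(width * target_height / float(height)))


def mencoder_arguments(width, height=720, fps=30):
    """Argument list for the mencoder call in step 3, with the real width filled in."""
    return [
        "mencoder", "mf://*",
        "-mf", "w=%d:h=%d:fps=%d:type=jpg" % (width, height, fps),
        "-ovc", "copy", "-oac", "copy",
        "-o", "output.avi",
    ]


# A 3888x2592 photo (3:2 aspect ratio) ends up 1080 pixels wide
width = scaled_width(3888, 2592)
print(width)
print(" ".join(mencoder_arguments(width)))
```

A 4:3 photo would end up 960 pixels wide instead, which is why the 1080 in the command above has to be checked against your own images.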


vCard 3.0 validator and parser

Did you know that even the vCards listed in the official RFC are not valid? It clearly says “The vCard object MUST contain the FN, N and VERSION types.” Still, the example vCards are both clearly missing the N type. As somebody else remarked, releasing a format spec without some reference validator is bound to result in all sorts of invalid implementations.

After searching for a vCard validator without success, I’ve therefore started my own vCard module in Python. It tries to create an object with all the information from a vCard string, and returns what I hope are useful error and warning messages if there’s anything wrong.

Update: Added file validation – Now you can validate files with several vCards from the command line.

Install / upgrade:
sudo pip install --upgrade vcard

Validate vCard files:
vcard *.vcf
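
To illustrate the rule quoted above, here is a minimal sketch of a check for the three required types. It is not how the vcard module works internally, it only looks at type names, and the example contact is made up.

```python
REQUIRED_TYPES = ("FN", "N", "VERSION")


def missing_required_types(vcard_text):
    """Return the required type names which do not occur in vcard_text."""
    found = set()
    for line in vcard_text.splitlines():
        if line.startswith((" ", "\t")):
            continue  # Continuation of a folded line, not a new type
        # The type name ends at the first ";" (parameters) or ":" (value)
        name = line.split(":", 1)[0].split(";", 1)[0]
        # Drop an optional "group." prefix
        name = name.rsplit(".", 1)[-1]
        found.add(name.upper())
    return [name for name in REQUIRED_TYPES if name not in found]


# Made-up vCard which, like the RFC examples, has no N type
EXAMPLE = "BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"
print(missing_required_types(EXAMPLE))
```

A real validator has to do much more than this, such as checking values, parameters and line folding, which is what the error and warning messages of the module are for.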

Sort blocks of text in files

Ever had to sort a file alphabetically, only to realize that you’d have to do it manually because every item that needs to be sorted is spread over more than one line? This just happened when I exported my Gmail contacts to vCard, which it turned out were sorted by formatted name (FN) instead of name (N). The result was the following script, which takes two patterns and some input, and returns the sorted output. The example returned by ./sort_blocks.py --help is exactly the code to re-sort Gmail contacts. I’d love to know if you find any bugs or possible improvements to this script. Enjoy:

#! /usr/bin/env python
# -*- coding: utf-8 -*-
## Copyright (C) 2009 CERN.
##
## Sort any multi-line block text
##
## This file is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""sort_blocks.py - Multiline sort of standard input

Default syntax:

./sort_blocks.py -b 'pattern' -s 'pattern' < input_file > result_file

Options:
-h,--help       Print this message
-b,--bp         Block pattern (dotall multiline); used to extract blocks
-s,--sp         Sort pattern (dotall multiline); extracted to sort blocks

Example:

./sort_blocks.py -b 'BEGIN:VCARD.*?END:VCARD\\r\\n' -s '^N:(.*)$' \
< contacts.vcf > contacts2.vcf

Orders vCards in contacts.vcf by name, and puts the results in contacts2.vcf."""

import getopt
import re
import sys

class Usage(Exception):
    """Raise in case of invalid parameters"""
    def __init__(self, msg):
        self.msg = msg

def _sort_key(sort_pattern, text):
    """Extract the part of a block to sort by; blocks without a match sort first"""
    match = re.search(sort_pattern, text, re.DOTALL | re.MULTILINE)
    if match is None:
        return ''
    return match.group(1)

def split_and_sort(text, block_pattern, sort_pattern):
    """Split into blocks, sort them, and join them up again
    @param text: String of blocks to sort
    @param block_pattern: Regular expression matching a whole block
    @param sort_pattern: Gets a subset of each block to sort by"""

    text_blocks = re.findall(block_pattern, text, re.DOTALL | re.MULTILINE)

    # A key function works in both Python 2 and 3, unlike a cmp function
    text_blocks.sort(key=lambda block: _sort_key(sort_pattern, block))

    return ''.join(text_blocks)

def main(argv = None):
    """Argument handling"""

    if argv is None:
        argv = sys.argv

    # Defaults
    block_pattern = ''
    sort_pattern = ''

    try:
        try:
            opts, args = getopt.getopt(
                argv[1:],
                'hb:s:',
                ['help', 'bp=', 'sp='])
        except getopt.GetoptError as err:
            raise Usage(err.msg)

        for option, value in opts:
            if option in ('-h', '--help'):
                print(__doc__)
                return 0
            elif option in ('-b', '--bp'):
                block_pattern = value
            elif option in ('-s', '--sp'):
                sort_pattern = value
            else:
                raise Usage('Unhandled option %s' % option)

        # Both patterns are required, and there are no positional arguments
        if block_pattern == '' or sort_pattern == '' or args:
            raise Usage(__doc__)

        text = sys.stdin.read()

        print(split_and_sort(text, block_pattern, sort_pattern))

    except Usage as err:
        sys.stderr.write(err.msg + '\n')
        return 2

if __name__ == '__main__':
    sys.exit(main())
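
To see the technique without any files involved, here is a condensed Python 3 sketch of the same idea (extract the blocks with one pattern, sort them by a group from another) run on two made-up contacts. It uses plain \n line endings to keep the example short.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE


def sort_blocks(text, block_pattern, sort_pattern):
    """Extract the blocks matching block_pattern and return them sorted."""
    def sort_key(block):
        match = re.search(sort_pattern, block, FLAGS)
        return match.group(1) if match else ""

    return "".join(sorted(re.findall(block_pattern, text, FLAGS), key=sort_key))


# Made-up contacts, ordered by FN but not by N
CONTACTS = (
    "BEGIN:VCARD\nFN:Alice Zulu\nN:Zulu;Alice\nEND:VCARD\n"
    "BEGIN:VCARD\nFN:Bob Alpha\nN:Alpha;Bob\nEND:VCARD\n"
)
print(sort_blocks(CONTACTS, r"BEGIN:VCARD.*?END:VCARD\n", r"^N:(.*)$"))
```

Note that with the dotall flag the (.*) group runs to the end of the block rather than the end of the line. That is harmless here because the comparison is decided by the start of the captured text, but it is worth knowing if you write your own sort patterns.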

Range arithmetic in Python

The XML 1.0 and 1.1 standards define some ranges of Unicode code points which are valid, and some “compatibility characters” which should not be used. CDS Invenio (a FOSS CMS) already has some code to clean up text to remove invalid characters, but it doesn’t remove the compatibility characters. Using the existing code for HTML 4.01 made the W3C Markup Validation Service complain, so I wanted to exclude the compatibility character ranges from the valid ranges, and get the most concise hexadecimal ranges corresponding to the resulting set to plug into a Python regular expression. Here’s the resulting sloppy and ugly code (I’ll post updated code and/or a link to the source repository if this is included at some point):

# -*- coding: utf-8 -*-
## Copyright (C) 2009 CERN.
##
## This file is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""Creates the minimal set of Unicode character ranges for valid XML 1.0 and 1.1
characters minus the compatibility characters"""

INCLUDE_XML10 = "#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] \
| [#x10000-#x10FFFF]"
EXCLUDE_XML10 = "[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF], \
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], \
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], \
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], \
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], \
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], \
[#x10FFFE-#x10FFFF]"

INCLUDE_XML11 = "[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"
EXCLUDE_XML11 = "[#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F], \
[#xFDD0-#xFDDF], \
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], \
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], \
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], \
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], \
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], \
[#x10FFFE-#x10FFFF]"

def cleanup(value):
    """Prepare string for conversion to hex ranges
    @param value: String with ranges
    @return: String with ranges"""
    # Turn '#x' into '0x' and drop the brackets around the ranges
    return value.replace('#', '0').replace('[', '').replace(']', '')

def list_to_range(value):
    """Convert a list of strings (ranges and not)
    @param value: List of strings corresponding to hexadecimal numbers and
    ranges
    @return: List of numbers"""
    result = []
    for item in value:
        if item.find('-') == -1:
            result.append(int(item, 16))
        else:
            numbers = [int(hex_str, 16) for hex_str in item.split('-')]
            result.extend(range(numbers[0], numbers[1] + 1))
    return result

def range_minus(include_range, exclude_range):
    """Subtract one range from another
    @param include_range: String from http://www.w3.org/TR/xml/#charsets or
    http://www.w3.org/TR/xml11/#charsets
    @param exclude_range: Ditto
    @return: String with hex numbers and ranges"""
    include_range = cleanup(include_range)
    includes = include_range.split(' | ')

    exclude_range = cleanup(exclude_range)
    excludes = exclude_range.split(', ')

    include_numbers = list_to_range(includes)
    exclude_numbers = list_to_range(excludes)

    # Set difference avoids scanning the exclude list for every number
    numbers = set(include_numbers) - set(exclude_numbers)
    # Sets are unordered, so sort to pair each low bound with its own high bound
    lows = sorted(
        number for number
        in numbers
        if number - 1 not in numbers)
    highs = sorted(
        number for number
        in numbers
        if number + 1 not in numbers)

    result = zip(lows, highs)

    result_hex = [
        '\\U%0*X-\\U%0*X' % (8, pair[0], 8, pair[1])
        for pair in result]
    result_hex = [
        text.replace('-' + text[:10], '')
        for text in result_hex] # Single ranges

    result_hex = [
        text.replace('\\U0000', '\\u')
        for text in result_hex] # Shorten where possible

    return '\n'.join(result_hex)

print('XML 1.0:\n' + range_minus(INCLUDE_XML10, EXCLUDE_XML10) + '\n')

print('XML 1.1:\n' + range_minus(INCLUDE_XML11, EXCLUDE_XML11))
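
The script above expands every range into individual code points, which means holding over a million numbers in memory. As an alternative sketch (not the code used to produce the results), the subtraction can also be done directly on (low, high) pairs. The function names are made up, and the example uses only the first few ranges from the XML 1.0 definitions.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (low, high) pairs."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def subtract_ranges(includes, excludes):
    """Remove every excluded (low, high) pair from the included pairs."""
    excludes = merge_ranges(excludes)
    result = []
    for low, high in merge_ranges(includes):
        start = low
        for ex_low, ex_high in excludes:
            if ex_high < start or ex_low > high:
                continue  # No overlap with what is left of this range
            if ex_low > start:
                result.append((start, ex_low - 1))
            start = ex_high + 1
        if start <= high:
            result.append((start, high))
    return result


# The first few XML 1.0 ranges, written as numbers instead of strings
includes = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x20, 0xD7FF)]
excludes = [(0x7F, 0x84), (0x86, 0x9F)]
for low, high in subtract_ranges(includes, excludes):
    print("%04X-%04X" % (low, high))
```

The work then depends on the number of ranges rather than the number of code points, at the cost of a little more bookkeeping.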

In case you just want the results, here you go:

XML 1.0:
\u0009-\u000A
\u000D
\u0020-\u007E
\u0085
\u00A0-\uD7FF
\uE000-\uFDCF
\uFDF0-\uFFFD
\U00010000-\U0001FFFD
\U00020000-\U0002FFFD
\U00030000-\U0003FFFD
\U00040000-\U0004FFFD
\U00050000-\U0005FFFD
\U00060000-\U0006FFFD
\U00070000-\U0007FFFD
\U00080000-\U0008FFFD
\U00090000-\U0009FFFD
\U000A0000-\U000AFFFD
\U000B0000-\U000BFFFD
\U000C0000-\U000CFFFD
\U000D0000-\U000DFFFD
\U000E0000-\U000EFFFD
\U000F0000-\U000FFFFD
\U00100000-\U0010FFFD

XML 1.1:
\u0009-\u000A
\u000D
\u0020-\u007E
\u0085
\u00A0-\uD7FF
\uE000-\uFDCF
\uFDE0-\uFFFD
\U00010000-\U0001FFFD
\U00020000-\U0002FFFD
\U00030000-\U0003FFFD
\U00040000-\U0004FFFD
\U00050000-\U0005FFFD
\U00060000-\U0006FFFD
\U00070000-\U0007FFFD
\U00080000-\U0008FFFD
\U00090000-\U0009FFFD
\U000A0000-\U000AFFFD
\U000B0000-\U000BFFFD
\U000C0000-\U000CFFFD
\U000D0000-\U000DFFFD
\U000E0000-\U000EFFFD
\U000F0000-\U000FFFFD
\U00100000-\U0010FFFD
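
For completeness, this is roughly how the ranges can be plugged into a regular expression. It is a sketch for Python 3 (where the re module understands \U escapes in patterns), not the code that ended up in CDS Invenio, and the names are made up. The ranges are the XML 1.0 results from above.

```python
import re

# The XML 1.0 results for the Basic Multilingual Plane, as (low, high) pairs
VALID_RANGES = [
    (0x9, 0xA), (0xD, 0xD), (0x20, 0x7E), (0x85, 0x85),
    (0xA0, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD),
]
# The sixteen supplementary planes all follow the same pattern in the results
VALID_RANGES += [(plane << 16, (plane << 16) + 0xFFFD) for plane in range(1, 17)]

# A negated character class: matches anything outside the valid ranges
INVALID_CHARACTERS = re.compile(
    "[^%s]" % "".join(
        "\\U%08X-\\U%08X" % (low, high) for low, high in VALID_RANGES))


def remove_invalid_characters(text):
    """Strip everything which is not in one of the valid ranges."""
    return INVALID_CHARACTERS.sub("", text)


print(repr(remove_invalid_characters("a\x00b\x7fc\ufdd0d")))
```

For XML 1.1, the only difference in the results is that the range \uFDF0-\uFFFD starts at \uFDE0 instead.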