Shell scripting dos and don’ts

Shell scripting is like a room full of power tools: handy but dangerous.

Don’t:

  1. Build complex systems. There are just too many ways that external state can affect any piece of shell code. Do you know what your script will do if you change IFS before running it? What if you give it a file name starting with a dash or containing a newline? How do you recover the state of the system if the script crashes somewhere in the middle? Complex shell script environments invariably end up looking like Rube Goldberg machines of chainsaws and power drills. Use languages and frameworks appropriate for the task.
  2. Expose them to the Internet. Safe input handling is just too damn hard. Unless you’re GreyCat or Stéphane Chazelas.
  3. Use eval. Don’t be evil. There are safer ways to do whatever you’re trying to do.
  4. Write portable code. (By this I mean code which works in multiple shells without change, as opposed to code which can easily be ported to other shells.) Writing portable code means limiting what language features you use and adding complexity to make sure it works the same way in all the supported shells. Because of this, the end result will be more complex and less flexible than the simplest script that supports the shell you have.
  5. Minimise the number of characters. The next maintainer will hate you.
  6. Create interactive menus. Only a handful of shell tools, such as less and top, make sense only when used interactively. Use command-line arguments instead, so that your tool is usable both standalone and together with other tools.
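
To make the file name question in point 1 concrete, here is a minimal sketch of one defensive pattern. The function name count_lines is made up for illustration, and this only covers the dash, whitespace and newline cases, not everything that can go wrong:

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the line count of each file given as an
# argument, without tripping over names that start with a dash or contain
# whitespace or newlines.
count_lines() {
    for file in "$@"; do    # "$@" keeps every argument intact, whatever IFS is
        # Prefix relative names with ./ so a leading dash can never be
        # mistaken for an option.
        case $file in
            /*) path=$file ;;
            *)  path=./$file ;;
        esac
        # Reading via a redirection means wc never sees the name at all.
        lines=$(wc -l < "$path") || return 1
        printf '%s: %d\n' "$file" "$lines"
    done
}
```

Calling count_lines -n 'two words' treats both arguments as plain file names in the current directory. Every expansion is quoted, which is the part that is easiest to forget.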

Do:

  1. Test everything automatically. This gives you and others the confidence that your script actually works. Bonus: Allows you to modify your code without having to test everything manually. Extra bonus: Experiencing how difficult it is to test shell scripts exhaustively will convince you to never use them for anything complex.
  2. Provide --long-names for every -s -h -o -r -t option. And if you can bear the screams of dogmatic developers, don’t support short options at all. As long as the names make sense this allows people to write readable scripts. Bonus: No wondering whether -n5f0 is two, three or four options.
  3. Use guard statements like the POSIX set -o errexit -o noclobber -o nounset. While there are some caveats to how these work, they can save a whole lot of headache. Bonus: Use -o xtrace to see what the script does in detail.
  4. Add an auto-complete script. The users will be grateful. Bonus: Gives an incentive to keep the structure of your options sane.
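
Points 2 and 3 can be combined in a small skeleton. This is only a sketch under assumptions: the options --output and --verbose and the function parse_options are invented for the example, and the caveats around errexit still apply.

```shell
#!/bin/sh
# Guard statements: stop on errors, refuse to overwrite files with ">",
# and treat unset variables as errors.
set -o errexit -o noclobber -o nounset

# parse_options ARG...: handle the (hypothetical) long options --output=FILE
# and --verbose. Sets $output and $verbose, and stores in $consumed how many
# arguments were options, so that the caller can shift them away.
parse_options() {
    output=''
    verbose=false
    consumed=0
    while [ "$#" -gt 0 ]; do
        case $1 in
            --output=*) output=${1#--output=} ;;
            --verbose)  verbose=true ;;
            --)         consumed=$((consumed + 1)); return 0 ;;
            -*)         printf 'Unknown option: %s\n' "$1" >&2; return 2 ;;
            *)          return 0 ;;    # first non-option argument
        esac
        shift
        consumed=$((consumed + 1))
    done
}

parse_options "$@"
shift "$consumed"    # what remains in "$@" are the non-option arguments
```

Because every option is spelled out, a call such as script.sh --verbose --output=out.txt -- input.txt reads the same in a terminal as it does in another script. An unknown option makes parse_options return a non-zero status, which errexit turns into an early exit.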

WordPress.com blog daily backup

WordPress.com Blog Export, The Next Generation is now on GitHub! Please go there for any future updates (and more export/backup scripts).

Based on the following documents:

cvs2git2svn

After discovering Ohloh, cleaning up and publishing repositories of yore seemed like a good idea. One of them was established back in the CVS newbie days, and contained lots of external binaries, which is not the kind of thing you want to version control. After using CVS, Subversion and Git (in that order), I saw only one choice: interactive rebase with Git. Also, the software was created while at CERN, so it should continue to be hosted there. And they had started a Subversion service in the meantime, so it was time to upgrade as well.

These instructions should fit for any CERN project, and can easily be modified to fit any repository. The usual warnings apply: YMMV and RTFM.

  1. Set some variables to avoid typing: svn_repo=Repository_name
    svn_user=User_with_edit_access
  2. Install the tools: sudo apt-get install cvs2svn git-core git-svn
  3. Create the cvs2git working directory: cvs2git_wd=$(mktemp -dt cvs2git.XXXXXXXXXX)
  4. Copy the contents of the repository (not a working copy) to the working directory: scp -r $svn_user@lxplus.cern.ch:/afs/cern.ch/project/svn/reps/${svn_repo}/* $cvs2git_wd. Don’t worry if /hooks is not copied; you don’t need it. If you don’t have filesystem access to the repository, you can try cvssuck. Be warned: it’s really slow.
  5. Set cvs2git global options:
    1. zcat /usr/share/doc/cvs2svn/examples/cvs2git-example.options.gz > $cvs2git_wd/cvs2git.options
    2. Modify at least ctx.username and author_transforms in $cvs2git_wd/cvs2git.options.
  6. Make the new Git repository: git_wd=$(mktemp -dt git.XXXXXXXXXX) && git init $git_wd
  7. Convert to Git (repeat for each module):
    1. Modify run_options.set_project in $cvs2git_wd/cvs2git.options
    2. Create Git import files: cd $cvs2git_wd && cvs2git --options=cvs2git.options. If you get any warnings or errors you might have to change the options again.
    3. Import to Git: cd $git_wd && cat $cvs2git_wd/cvs2svn-tmp/git-blob.dat $cvs2git_wd/cvs2svn-tmp/git-dump.dat | git fast-import
  8. Make a backup in case the rest goes hairy.
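  9. If you need to (which was kind of the point of this exercise), do an interactive rebase from the first commit: git rebase -i $(git log --format=%H | tail -1). This lets you edit everything after the first commit; if the first commit itself needs changing, newer Git versions also accept git rebase -i --root.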
  10. git-svn needs at least one commit to be in the Subversion repository: svn_wd=$(mktemp -dt svn.XXXXXXXXXX) && svn co --username $svn_user svn+ssh://${svn_user}@svn.cern.ch/reps/${svn_repo} $svn_wd && cd $svn_wd && touch .temp && svn add .temp && svn ci -m "git-svn dummy commit"
  11. Convert to Subversion:
    1. Prepare git-svn repository: git2svn_wd=$(mktemp -dt git2svn.XXXXXXXXXX) && git svn clone --username $svn_user svn+ssh://${svn_user}@svn.cern.ch/reps/${svn_repo} $git2svn_wd && cd $git2svn_wd
    2. Get Git commits: git fetch $git_wd
    3. Apply Git commits as master branch: git branch tmp $(cut -b-40 .git/FETCH_HEAD) && git tag -am "Last fetch" last tmp && first_commit=$(git log --format=%H | tail -1) && git checkout $first_commit . && git commit -C $first_commit
    4. Apply Git commits: git rebase master tmp && git branch -M tmp master
    5. Check if this works: git svn dcommit --rmdir --find-copies-harder --dry-run
    6. If it does, you’re good to go: git svn dcommit --rmdir --find-copies-harder

If the last step fails, the easiest way to continue is just to remove all commits from the Subversion repository, fix the Git repository, and restart at step 10.
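
Two of the one-liners above are a bit cryptic, so here they are pulled apart as a rough sketch. The function names are invented for illustration, and the second one assumes FETCH_HEAD holds a single line, as it does after fetching one repository without a refspec:

```shell
#!/bin/sh
# first_commit: print the ID of the oldest commit on the current branch.
# git log lists the newest commit first, so the last line is the first commit
# (this assumes a history with a single root, which is what cvs2git produced here).
first_commit() {
    git log --format=%H | tail -n 1
}

# fetched_commit FILE: print the commit ID at the start of a FETCH_HEAD file.
# A SHA-1 commit ID is 40 hexadecimal characters, hence bytes 1 to 40;
# whatever follows on the line (where the commit was fetched from) is dropped.
fetched_commit() {
    cut -b-40 "$1"
}
```

With these, step 11.3 would start with git branch tmp "$(fetched_commit .git/FETCH_HEAD)".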

Tag cloud shell script

As an interesting challenge, I wanted to output a tag cloud (a.k.a. word cloud) for a text file using standard shell tools. The result is surprisingly fast (2 minutes to create the tag cloud for War and Peace), and surprisingly short: as you can see, fewer than 10 lines do anything more complex than echo. The latest version is much more flexible, but the main work is still just some 20 lines (tr -s … and below), and it’s still fast.
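
For readers who want a feel for the technique, the counting part of such a pipeline can be sketched as below. This is not the txt2cloud.sh code, only an illustration with standard tools; the function name word_counts is made up, and it stops at a frequency list rather than producing XHTML.

```shell
#!/bin/sh
# word_counts [N]: read text on stdin and print the N most frequent words
# (default 10) as "count word" lines, most frequent first.
word_counts() {
    tr -cs '[:alpha:]' '\n' |           # every run of non-letters becomes one newline
        tr '[:upper:]' '[:lower:]' |    # fold case so "The" and "the" are one word
        grep -v '^$' |                  # drop the empty line left by a leading separator
        sort | uniq -c |                # count identical words
        awk '{ print $1, $2 }' |        # strip the padding uniq adds before the count
        sort -k1,1nr -k2,2 |            # highest count first, ties in alphabetical order
        head -n "${1:-10}"
}
```

For example, word_counts 50 < foo.txt prints the 50 most common words in foo.txt; a tag cloud script would then map each count to a font size.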

If you do anything fancier with this, I’d be interested to know about it. I’ve got a couple of ideas, but I’m not sure if I’ll ever get around to them:

  • Exclude words from another file
  • Multi-word tags from another file

Example usage:
txt2cloud.sh < foo.txt > foo.xhtml

Update: The code is now on GitHub. Fork away!