Tag cloud shell script

As an interesting challenge, I wanted to output a tag cloud (aka. word cloud) for a text file using standard shell tools. The result is surprisingly fast (2 minutes to create the tag cloud for War and Peace), and surprisingly short: As you can see, less than 10 lines doing anything more complex than echo. The latest version is much more flexible, but the main work is still just some 20 lines (tr -s … and below), and it’s still fast.

If you do anything more fancy with this, I’d be interested to know about it. I’ve got a couple ideas, but I’m not sure if I’ll ever get around to them:

  • Exclude words from another file
  • Multiple word tags from another file

Example usage:
txt2cloud.sh < foo.txt > foo.xhtml

Update: The code is now on GitHub. Fork away!

Advertisements