As an interesting challenge, I wanted to output a tag cloud (a.k.a. word cloud) for a text file using only standard shell tools. The result is surprisingly fast (2 minutes to create the tag cloud for War and Peace) and surprisingly short: fewer than 10 lines do anything more complex than echo. The latest version is much more flexible, but the main work is still done in about 20 lines (tr -s … and below), and it's still fast.
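The counting half of the job can be sketched with the classic word-frequency pipeline. This is not the script from the repository. It is a minimal sketch that assumes a "word" is a run of ASCII letters and that case doesn't matter. The function name count_words is hypothetical, and turning the counts into XHTML is left out.

```shell
#!/bin/sh
# Hypothetical helper (not from txt2cloud.sh): print "COUNT WORD" lines for
# standard input, most frequent first. Assumes a word is a run of A-Z/a-z.
count_words() {
    tr -cs 'A-Za-z' '\n' |      # turn every run of non-letters into one newline
        tr 'A-Z' 'a-z' |        # fold case so "The" and "the" count together
        grep -v '^$' |          # drop the blank line left by leading punctuation
        sort |                  # put identical words next to each other
        uniq -c |               # prefix each distinct word with its count
        sort -k1,1nr -k2,2      # highest count first, ties alphabetically
}
```

For example, count_words < foo.txt | head would list the ten most frequent words. Every stage streams, which is one reason this kind of pipeline copes with a book-length input.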

If you do anything fancier with this, I'd like to hear about it. I've got a couple of ideas, but I'm not sure I'll ever get around to them:

  • Exclude words from another file
  • Multi-word tags from another file
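The first idea could plausibly be handled with grep at the one-word-per-line stage, since -v (invert), -x (match whole lines), -F (fixed strings) and -f FILE (read patterns from a file) are all standard options. A minimal sketch, assuming a hypothetical stop-word file with one lowercase word per line; the function name words_without is also hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: split standard input into lowercase words, one per
# line, and drop every word listed (one per line) in the file named by $1.
words_without() {
    tr -cs 'A-Za-z' '\n' |      # one word per line
        tr 'A-Z' 'a-z' |        # fold case to match a lowercase stop-word list
        grep -v '^$' |          # drop the blank line left by leading punctuation
        grep -vxF -f "$1"       # remove lines that exactly equal a listed word
}
```

Its output could then go through sort | uniq -c as usual. Using -x with -F means "the" is removed but "theory" is kept.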

Example usage:
txt2cloud.sh < foo.txt > foo.xhtml

Update: The code is now on GitHub. Fork away!


2 thoughts on “Tag cloud shell script”

  1. Very cool script. Little typo on line 160: instead of bc -l, I'm sure you meant wc -l and were just seeing if anyone was paying attention. Thanks for sharing.

    • Thanks for the tip, but the line is actually correct. It calculates a logarithm using bc, which needs -l to load its math library (that is where l(), the natural logarithm, is defined). Just make sure you run it as txt2cloud.sh < file, not with a filename at the end. The script reads standard input, so otherwise it waits for input from the terminal and never finishes.

