Tag cloud shell script

As an interesting challenge, I wanted to output a tag cloud (aka. word cloud) for a text file using standard shell tools. The result is surprisingly fast (2 minutes to create the tag cloud for War and Peace), and surprisingly short: As you can see, less than 10 lines doing anything more complex than echo. The latest version is much more flexible, but the main work is still just some 20 lines (tr -s … and below), and it’s still fast.

If you do anything more fancy with this, I’d be interested to know about it. I’ve got a couple ideas, but I’m not sure if I’ll ever get around to them:

  • Exclude words from another file
  • Multiple word tags from another file

Example usage:
txt2cloud.sh < foo.txt > foo.xhtml

Update: The code is now on GitHub. Fork away!

About these ads

2 thoughts on “Tag cloud shell script

  1. Very cool script. Little typo on line 160. Instead of bc -l, I am sure you meant wc -l and were just seeing if any was paying attention. Thanks for sharing.

    • Thanks for the tip, but actually the line is correct. What it does is calculating a logarithm using bc, which needs -l to use the correct math library. Just make sure you run it with txt2cloud.sh < file, not just with a filename at the end, or it’ll never finish.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s