As an interesting challenge, I wanted to output a tag cloud (a.k.a. word cloud) for a text file using standard shell tools. The result is surprisingly fast (2 minutes to create the tag cloud for War and Peace), and surprisingly short: fewer than 10 lines do anything more complex than echo. The latest version is much more flexible, but the main work is still just some 20 lines (tr -s … and below), and it’s still fast.
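To give a feel for why so few lines are needed, here is a minimal sketch of a word-frequency pipeline built from standard tools. It is not the actual txt2cloud.sh: the function names word_counts and cloud_spans, the default of 50 tags, and the 100% to 300% font scaling are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only, not the author's script. Names and sizing are assumptions.

# word_counts: read text on stdin, print "count word" lines,
# most frequent first (ties broken alphabetically).
word_counts() {
    tr -cs '[:alpha:]' '\n' |          # each run of non-letters becomes one newline
        tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" are one word
        grep -v '^$' |                 # drop an empty leading line, if there is one
        sort |                         # bring identical words together
        uniq -c |                      # prefix each distinct word with its count
        awk '{ print $1, $2 }' |       # strip the padding uniq adds before the count
        sort -k1,1nr -k2,2             # highest count first, then by word
}

# cloud_spans: turn the first N "count word" lines (default 50) into spans.
# The first line holds the highest count and maps to 300%; sizes scale
# linearly down towards 100%.
cloud_spans() {
    head -n "${1:-50}" |
        awk 'NR == 1 { max = $1 }
             { printf "<span style=\"font-size: %d%%\">%s</span>\n", 100 + 200 * $1 / max, $2 }'
}
```

With both functions loaded, something like `word_counts < foo.txt | cloud_spans 100` would print one span per tag. Wrapping the spans in a complete XHTML document is left out, and because the words consist only of letters matched by `[:alpha:]`, no escaping is needed here.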
If you do anything fancier with this, I’d be interested to know about it. I’ve got a couple of ideas, but I’m not sure if I’ll ever get around to them:
- Exclude words from another file
- Multiple word tags from another file
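The first idea could probably be handled by a single grep call placed at the point in a pipeline where the text has already been split into one lowercase word per line. This is only a sketch under that assumption; the function name exclude_words and the file name stopwords.txt are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes stdin carries one word per line and that the file
# named by $1 (e.g. a hypothetical stopwords.txt) lists one word per line.

# exclude_words: print only the input lines that do not exactly match
# any line of the exclusion file.
exclude_words() {
    # -v invert the match, -x match whole lines only,
    # -F treat patterns as fixed strings, -f read patterns from a file
    grep -vxF -f "$1"
}
```

Because of -x, an excluded word such as "the" does not also remove "then" or "other". Run it before counting, so that excluded words never take up a slot among the most frequent tags.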
Usage: txt2cloud.sh < foo.txt > foo.xhtml
Update: The code is now on GitHub. Fork away!