Delicious data loss

After many years, it looks like Delicious will never have support for excluding tags from a search, so when I saw that they had added a “Remove Tag” functionality I finally decided to get rid of all the “toread” tags from posts already tagged with “read”. This didn’t quite turn out as expected:

From: Victor Engmark
To: feedback@delicious.com
Content-Type: text/plain; charset=UTF-8

Dear Delicious Support,

For ease of searching I decided today to remove the “toread” tag from all bookmarks I have already read – That is, I went to the toread+read tags page <https://www.delicious.com/engmark/toread+read>, selected all, clicked “Remove Tag”, carefully clicked only on “toread”, and clicked Save. This I repeated for each page until there was only a single page left. During this process I discovered several bugs, here listed according to increasing severity:

When selecting a bookmark the links above (except “Rename Tag” since it’s sneaky and doesn’t actually just rename on the selected items) are “activated”. That is, they turn blue. After clicking “Save” the page would refresh, all the checkboxes would still be ticked, but the links would still be gray. Expected behavior: When bookmarks are selected (before or after an action) the links should be colored. In the case of some bookmarks it was not possible to remove the “toread” tag. I repeatedly tried, and every time the same bookmarks showed up on the page after the refresh. Expected behavior: It should be possible to remove tags from any bookmark. After selecting the bookmarks mentioned in point 2 and clicking “Remove Tag”, the “Remove Tags” dialog only displayed a single tag: “toread”. Expected behavior: This should behave like for other bookmarks, that is, show all the tags for all the bookmarks. When trying to edit individial items from the bookmarks mentioned in point 2, the “Edit Link Info” dialog doesn’t display the same tags as the toread+read tags page. For example the bookmark “Ryan let’s assume that you blend fiction and reality only on special occasions” <http://www.qwantz.com/index.php?comic=1992> has the tags “toread”, “read”, “comic” and “funny” according to the toread+read tags page, but only “RyanNorth” in the “Edit Link Info” dialog. Expected behavior: The “Edit Link Info” dialog should display the same title, URL and tags as the bookmarks list.
Somehow, instead of removing the “toread” tag Delicious removed the “read” tag! Expected behavior: Remove the tag I requested to remove, and only that.

I’ve now lost seven years worth of records of what I’ve read online! Please help me fix the state of my bookmarks. In case you can’t obtain a diff between the bookmarks of yesterday and today I can provide you with one for reference (I do daily XML backups in case of just such a situation). This procedure should be nothing more than a reversal of the tag removals done in the last 48 hours or so.

Posted to my blog
<https://github.com/l0b0/paperless/wiki/Delicious-data-loss> for
reference.

Sincerely, a long-time user.

Long story short, I’d misunderstood how the feature works: Instead of removing those tags which you leave in the tag cloud, it removes all tags which you remove from the tag cloud! I had thought it was a “remove these tags” feature, but actually it was a “keep only these tags” feature.

Restoring from the XML files was out of question (they only support HTML import now), and they were unable to provide a reference for what such an HTML file should look like and whether tags would be replaced completely (messing up the changes I did since this started) or just added to the existing ones. I guess a migration to Pinboard is finally in order.

Edit: Here’s how I ended up migrating to Pinboard, fixing the bookmarks in the process:

  1. Export delicious.html and create a backup of the file.
  2. Get the URLs and tags of the bookmarks in the XML file which used to have a “toread” tag but have changed since the fuck-up:
    git diff bookmarks.xml | \
    grep '^-.*[" ]toread[" ]' | \
    grep -Eo 'href="[^"]+".*tag="[^"]+"' | \
    sed -e 's/href="\([^"]\+\)".* tag="\([^"]\+\)"/\1 \2/' > url_tags.txt
  3. Make the URLs and tags sed-compatible:
    sed -i -e 's/[\/&]/\\&/g' url_tags.txt
  4. Replace spaces between tags with commas (otherwise “foo bar” is imported as “foo_bar”):
    while grep -q ' .* ' url_tags.txt
    do
        sed -i -e 's/\( [^ ]*\) /\1,/g' url_tags.txt
    done
  5. Loop over each of the URL tag pairs, and create a sed script from it:
    while IFS=' ' read -r -u 9 url tags
    do
        echo 's/\(HREF="'"$url"'".* TAGS="\)[^"]*"/\1'"$tags"'"/;' >> replace.sed
    done 9< url_tags.txt
  6. Replace the tags:
    sed -i -f replace.sed delicious.html
  7. Verify the difference between the HTML backup and the changed file:
    meld delicious.html{.backup,}&
  8. Import in Pinboard

Delicious bookmarks search in the shell

fil.tero.us is no more, long live filterous! After much deliberation (and switching to a more productive platform & programming language), I decided to re-implement the Delicious personal bookmarks search page in Python for the *nix shell. The result is more security (I don’t store your bookmarks for you), more uptime, and more opportunity for parsing with shell tools such as less, wc, sort and grep.

  1. Install / upgrade: sudo apt-get install libxslt1-dev && sudo easy_install -U filterous
  2. Download your bookmarks
  3. Usage: filterous --help