Dorward

Advogato Data Recovery

13 January 2005

My account vanished a few days ago. I had been hoping that waiting a few days would see it restored from backup, but then I noticed that I was not alone and that recreating the account would get my data back.

So, here I am with a new account very similar to the old one, but lacking all the certifications.

A couple of days ago, zeenix mentioned he had overwritten his diary (and I would have responded sooner if not for the lost account issue).

The good news is that I archive the HTML pages I have rawdog generate from RSS et al. This means I have 88MB of HTML from all the feeds I subscribe to going back to last June. So, I thought it would be a fairly trivial operation to recover the lost entries with a splash of Perl.

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use HTML::TreeBuilder;
foreach my $file_name (@ARGV) {
        my $tree = HTML::TreeBuilder->new; # empty tree
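        # parse_file returns undef if the file cannot be opened
        unless ($tree->parse_file($file_name)) {
                warn "Could not parse $file_name: $!\n";
                $tree->delete;
                next;
        }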
        my @elements = $tree->find_by_attribute("class", "item feed-6e2302e9");
        # feed-6e2302e9 is the class on items from a feed that
        # a third party scrapes from the Advogato recent log and
        # provides as RSS to a few people.
        # Since it has all Advogato entries in it, we need to
        # parse the HTML to look at the name of the person
        # who wrote each one. Advogato puts that in the title
        # of the entry.
        foreach my $node (@elements) {
                my @title = $node->find_by_tag_name('h3');
                # Skip items that have no h3 title instead of dying
                if (@title && $title[0]->as_text =~ /zeenix/) {
                        print $node->as_HTML;
                }
        }
        $tree->delete;
}

It seemed like a good idea, but it looks like the missing entries never made it to my feedparser, and perhaps not even onto Advogato, so the results weren't great. (2013: Eight years later, I'm moving to a new CMS and don't think that data is important enough to import into it.)