Dorward

On resuming the blog

26 February 2009

So why this burst of activity? It can mostly be attributed to me finally writing a half-decent administration interface for sBuilder.

Previously, publishing an entry required writing the text, running a script to add the file to the CMS, accessing a web interface to add keywords and a title, and then publishing.

Last week, I set up a Tumblr blog so I could write short entries without going to all that effort.

When I reached my second entry I came to the conclusion that there was not much difference between the long posts and the short ones.

Still, I actually had entries written, and at a better rate than ever before. It seemed like a good way to keep up momentum. The trick was merging the entries with my main blog.

As part of Axford (the CMS that will succeed sBuilder and homepage-nu.pl), I wrote some code to suck down my RSS and Atom feeds and stuff them into a database (hear the screams associated with the phrase “special case”, but I digress…), so I decided to reuse this to populate the old system automatically.

I ended up having to throw that code out since Tumblr has issues with its feeds (I’ll file a bug report later). Many people try to generate RSS and Atom using templates instead of XML tools, and this rarely works well. Switching to their JSON API worked around the issue.

The content was easily dealt with, so I just had to figure out URIs and metadata.

The title could be pulled from the feed, and lowercasing it, replacing non-alphanumerics with dashes, then collapsing sequences of multiple dashes gave me a filename. The path is calculated based on the date.
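As a rough illustration of those three steps, here is a minimal sketch in Python (not the language the real code is written in). The function names and the year/month/day path layout are assumptions made for the example; the post only says that the path is derived from the date. Leading and trailing dashes are left in place, since no trimming step is described.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Turn an entry title into a filename using the three steps described."""
    slug = title.lower()                      # 1. lowercase the title
    slug = re.sub(r"[^a-z0-9]", "-", slug)    # 2. non-alphanumerics become dashes
    slug = re.sub(r"-{2,}", "-", slug)        # 3. collapse runs of dashes
    return slug


def entry_path(title: str, published: date) -> str:
    """Build a path from the date (hypothetical YYYY/MM/DD layout) and the slug."""
    return f"{published:%Y/%m/%d}/{slugify(title)}"


if __name__ == "__main__":
    print(entry_path("On resuming the blog", date(2009, 2, 26)))
```

Doing the replacement one character at a time and collapsing afterwards keeps each step trivial to check, at the cost of a second pass over the string.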

Keywords were a little more interesting. I don’t do much with them at the moment, but I have plans for the future. Tumblr has an interface for setting them, but I haven’t used it for the posts I have written so far.

My intention has been to use auto tagging for a while, and this seemed as good a time to start as any. Yahoo! has a web service for extracting terms from text, and a module for using it is available on CPAN. I ended up having to write just one line of code to get the data.

The results are not perfect, but good for a rough pass. Axford will have to let authors clean them up by hand afterwards (at least for some of the code-heavy entries that I write!).

With that done, the only tricky bit was dealing with the various character encodings the different parts of the system expect: for legacy reasons, sBuilder expects Latin-1 input. I shall have to convert everything to UTF-8 for Axford, since the ISO-8859 family of encodings doesn’t belong in the twenty-first century.
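For the curious, the two conversions involved look roughly like this in Python. This is a sketch under assumptions, not the code actually used: the function names are made up for the example, and falling back to numeric character references for characters Latin-1 cannot represent (such as curly quotes) is one possible choice that only makes sense when the content is HTML.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Latin-1 for a legacy consumer.

    Assumption: the content is HTML, so characters outside Latin-1
    can be written as numeric character references instead of failing.
    """
    text = data.decode("utf-8")
    return text.encode("latin-1", errors="xmlcharrefreplace")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8; every Latin-1 byte maps cleanly."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9 \u2019".encode("utf-8")   # e-acute plus a curly apostrophe
    print(utf8_to_latin1(sample))
    print(latin1_to_utf8(b"caf\xe9"))
```

Going from Latin-1 to UTF-8 is lossless, which is part of why moving everything to UTF-8 is the easier direction.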

In conclusion, Tumblr is letting me write blog entries without the tools getting in the way, so you’ll probably be hearing more from me in future.

(The iPhone Tumblr app that lets me draft blog entries while standing on the Tube helps with that too.)