Dorward

Spambots that Drink Coffee

02 February 2006

A long time ago, in a galaxy far, far away… OK, in Reading. I used to work on a big ugly pile of PHP. It didn't do much for my liking of the language, and its advocates have tended to bring the worst out in me ever since.

However, I never got around to unsubscribing from the mailing list, and I still read it from time to time. So when the topic of hiding email addresses from spambots came up, I couldn't resist. (I'm also a big advocate of accessibility, and I have yet to find a good way to hide information from "evil robots" but not from humans.)

One thing led to another and the inevitable "Write it out to the screen with complicated JavaScript" option came up, in this case using Enkoder.

<script language="javascript">
function hiveware_enkoder(){var i,j,x,y,x=
"x=\"783d227a3f24327a343f375e2465383436343835686738383536393837333839663838" +
"38373b383867363936363234386736393839683939343438393b3939383633383937343438" +
"3a363434346735386566383833373434326738393a37393834643835376538343868353866" +
"3337356567343434343834663b3835336438353b3238356564395e24363d387b683f352963" +
"29383d366838713374392a386b383f3b32383d366b363e327a3830366e3867687039693476" +
"396a393d386b332d393f3434382b367d347b672d383f667738703767347567653963377238" +
"67642a372965273429342d347a32303975367738643b7539763674382a656b382e3734352b" +
"662b373d6521347b243d6c3f6778636e2a7a30656a637443762a322b2b3d7a3f7a30757764" +
"7576742a332b3d7b3f29293d6871742a6b3f323d6b3e7a306e677069766a3d6b2d3f342b7d" +
"7b2d3f7a307577647576742a6b2e332b3d216871742a6b3f333d6b3e7a306e677069766a3d" +
"6b2d3f342b7d7b2d3f7a307577647576742a6b2e332b3d217b3f7b307577647576742a6c2b" +
"3d223b793d27273b783d756e6573636170652878293b666f7228693d303b693c782e6c656e" +
"6774683b692b2b297b6a3d782e63686172436f646541742869292d323b6966286a3c333229" +
"6a2b3d39343b792b3d537472696e672e66726f6d43686172436f6465286a297d79\";y='';" +
"for(i=0;i<x.length;i+=2){y+=unescape('%'+x.substr(i,2));}y";
while(x=eval(x));}hiveware_enkoder();
</script>
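To see why a JavaScript engine makes short work of this, it helps to look at what the script does. The first eval() turns the quoted string back into text, one pair of hex digits per character. That text is more JavaScript, and the while(x=eval(x)) loop keeps evaluating each new layer until one produces a falsy value. Decoding the hex shows that the next layer unescape()s another string and shifts every character code down by 2, adding 94 to anything that drops below 32. The sketch below reimplements just those two steps in plain JavaScript, without eval; decodeHexLayer() and decodeShiftLayer() are illustrative names, and layers beyond the second are not reproduced.

```javascript
// Sketch only: the first two Enkoder decoding layers, without eval.

// Layer 1: every two hex digits are one character code. The original
// uses unescape('%' + pair), which gives the same result for valid pairs.
function decodeHexLayer(hex) {
  let out = '';
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

// Layer 2: unescape the string, then shift each character code down by 2,
// wrapping codes that fall below 32 back up by 94.
function decodeShiftLayer(text) {
  const s = unescape(text);
  let out = '';
  for (let i = 0; i < s.length; i++) {
    let j = s.charCodeAt(i) - 2;
    if (j < 32) j += 94;
    out += String.fromCharCode(j);
  }
  return out;
}

console.log(decodeHexLayer('783d22'));  // x="
console.log(decodeShiftLayer('jgnnq')); // hello
```

None of this needs a browser: any engine that can run eval() peels the layers just as easily.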

Oh look, so good it generates HTML 3.2 (note the long-deprecated language attribute).

But in all seriousness, this type of "solution" always has two main issues:

  • Some of the bots out there gathering email addresses can parse JavaScript.
  • There are users out there who have JavaScript turned off, or browsers that don't support it in the first place.

Just saying this wasn't enough though! Evidence was demanded!

Just one problem: I'd never tried parsing JavaScript programmatically before. First stop: CPAN.

I grabbed JavaScript::SpiderMonkey, since I vaguely recalled hearing good things about it, and installed it. The documentation was nice and clear, so I hacked away.

The only issue I had was that since the code was not being executed in a browser, there was no document object. It turned out to be relatively simple to create such an object and add a write() method to it.

#!/usr/bin/perl
use strict;
use warnings;
use JavaScript::SpiderMonkey;

# In a real Evil Spam Bot, this would come from a scraped web page.
# For this proof of concept, read it from the file named on the
# command line (or from STDIN).
my $the_javascript = do { local $/; <> };

# Get a JavaScript interpreter ready.
my $js = JavaScript::SpiderMonkey->new();
$js->init();

# Browsers provide a document object.
# Outside a browser, we have to create one ourselves.
my $document = $js->object_by_path("document");

# We need somewhere for document.write to write to:
# collect every argument passed to it.
my $extracted_html = '';
$js->function_set("write", sub {
        $extracted_html .= join('', @_);
    }, $document);

# Execute the JavaScript, stopping if the engine reports an error
my $rc = $js->eval($the_javascript);
die "JavaScript error: $@\n" unless $rc;

# Output the retrieved HTML
print $extracted_html;

# Release the interpreter
$js->destroy();
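The same approach isn't tied to Perl or SpiderMonkey. As a rough sketch, assuming Node.js and its built-in vm module (extractWrittenHtml() is a hypothetical name), the script can run in a fresh global context whose only addition is a fake document with a write() method, mirroring the Perl callback above:

```javascript
// Sketch only: run a script against a fake document and capture what it writes.
// Assumes Node.js; vm is a built-in module.
const vm = require('vm');

function extractWrittenHtml(script) {
  let html = '';
  // The only "browser" the script gets: a document whose write()
  // appends all of its arguments to our buffer.
  const fakeDocument = {
    write: (...parts) => {
      html += parts.join('');
    },
  };
  // Fresh global context; the timeout (in milliseconds) stops runaway loops.
  vm.runInNewContext(script, { document: fakeDocument }, { timeout: 1000 });
  return html;
}

// Usage: the contents of the <script> element would go here.
console.log(extractWrittenHtml("document.write('<b>', 'hi', '</b>')"));
```

The timeout matters because the Enkoder output loops over eval() until a layer comes back falsy. Bear in mind that Node's own documentation says vm is not a security mechanism, so this is no sandbox for genuinely hostile code.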

Now, this is a simple proof of concept and won't handle every situation. For example, it won't cope with scripts that manipulate the DOM instead of calling document.write. It also just dumps the generated HTML to standard output rather than trying to parse it and find the email address.

That said, it did only take about 15 minutes to write the script (from a "Never used the module before" starting point), and I can't imagine that adding such features would prove that difficult.
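As an illustration of how little work the second missing feature needs, here is a hypothetical findEmailAddresses() in JavaScript. It undoes numeric character references (a common trick, such as &#64; for @) and then applies a deliberately loose address pattern. The pattern is an assumption, nowhere near a full RFC 5322 parser, and it does nothing about the DOM-manipulation case.

```javascript
// Sketch only: find likely email addresses in HTML captured from document.write.
function findEmailAddresses(html) {
  // Undo numeric character references such as &#64; or &#x40; (both "@").
  const decoded = html
    .replace(/&#(\d+);/g, (_, dec) => String.fromCharCode(Number(dec)))
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCharCode(parseInt(hex, 16)));
  // Loose pattern: local part, "@", then a dotted domain.
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  // Deduplicate, since a mailto: link usually repeats its own text.
  return [...new Set(decoded.match(pattern) || [])];
}

console.log(findEmailAddresses(
  '<a href="mailto:someone@example.com">someone&#64;example.com</a>'
));
```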

Finally, a point on morals. Am I helping spammers defeat attempts to hide email addresses from them? I don't think so; the problem is too trivial.