More on XHTML and Search Engines

29 January 2008

A few days ago, I published some results of testing how search engines interact with XHTML documents served as application/xhtml+xml.

This triggered a bit of discussion and raised a few more questions. So, a brief update on what I've been doing.

First, XHTML and Search Engines is a new page on the main part of this site, which puts all of this research in one place, so you don't need to go trawling through blog entries spread over the course of a few months.

Second, a number of years ago I ran a website that used cute and decidedly non-standard file extensions, and Google didn't index it. To see whether this is still a factor, I've added new tests to check whether search engines treat URLs ending in .html differently from those ending in .xhtml.

Third, since Internet Explorer doesn't support XHTML, it would probably be a mistake for search engines to show XHTML documents in their results, but they might be indexing these documents anyway. This would allow the documents to be added to results at the flick of a switch if client-side support became common. In an effort to gather some circumstantial evidence of whether they are doing this, I'm logging the Accept headers sent by search engine crawlers to see what they are asking for.
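For anyone wanting to try something similar, here is a rough sketch of how a logged Accept header could be checked for application/xhtml+xml. It is not the code running on this site; the function names parse_accept and xhtml_quality are made up for illustration, and the parsing is deliberately simplified (it ignores media type parameters other than q).

```python
def parse_accept(header):
    """Split an Accept header value into a {media_type: q} mapping.

    Simplified: only the q parameter is read; a missing q means 1.0.
    """
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs[media_type] = q
    return prefs


def xhtml_quality(header):
    """Return the q value a client assigns to application/xhtml+xml.

    Falls back to the application/* and */* wildcards. A missing header
    (None) is treated as "accepts anything", as HTTP specifies.
    """
    if header is None:
        return 1.0
    prefs = parse_accept(header)
    for candidate in ("application/xhtml+xml", "application/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0
```

Running xhtml_quality over each logged header separates crawlers that explicitly list application/xhtml+xml from those that only send a wildcard such as */*, which says much less about what they can actually handle.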

At the moment, I don't have any publishable results for these tests, but I expect that to change, and I'll make the results available in the next few weeks.