This document is currently an incomplete working draft published for peer preview and review.

What is a Content-type?

When a user agent (such as a web browser) makes a request to a webserver, the server sends a response. This response comes in two parts, a header and a body (which should not be confused with the <head> and <body> of an HTML document). A typical header might look something like this:

HTTP/1.1 200 OK
Date: Sun, 23 May 2004 08:49:32 GMT
Server: Apache
X-Powered-By: PHP/4.1.2
Connection: close
Content-Type: text/html; charset=iso-8859-1

The last line of this example is the Content-type. It tells the user agent what type of data the body will contain: in this case, an HTML document.

The user agent then decides what to do with the data based on this Content-type, for example, to parse it as HTML, apply a style sheet and then display it in the browser window.
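As a rough illustration of the first step a user agent takes, the sketch below reads the Content-type out of the example header shown above. It is only a sketch: it leans on the header parser in Python's standard library, and the function name parse_content_type is my own invention rather than part of any browser.

```python
from email.parser import Parser

# The example response header from above, with the CRLF line endings HTTP uses.
RAW_HEADER = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sun, 23 May 2004 08:49:32 GMT\r\n"
    "Server: Apache\r\n"
    "X-Powered-By: PHP/4.1.2\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html; charset=iso-8859-1\r\n"
)


def parse_content_type(raw_header):
    """Return (media type, charset) from a raw HTTP response header."""
    # The first line is the status line, not a header field, so set it aside.
    _status_line, _, field_lines = raw_header.partition("\r\n")
    fields = Parser().parsestr(field_lines, headersonly=True)
    return fields.get_content_type(), fields.get_content_charset()


media_type, charset = parse_content_type(RAW_HEADER)
print(media_type, charset)  # prints: text/html iso-8859-1
```

A real user agent would go on to pick a parser or a helper application based on the media type, and to decode the body using the charset.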

Some user agents second-guess the author's intentions when they receive a document that doesn't appear to match the specified Content-type. As a result, you should not depend on the behaviour of your web browser to determine the Content-type.

Why do we use Content-types?

Computers store data in many different formats and the format chosen depends greatly on what the data is. Images, for example, might be stored as PNG, JPEG, SVG, TIFF or one of many others. Each format has its own strengths and weaknesses, but whichever one is chosen, the computer has to know what the format is in order to decode the information contained therein.

Computers use various techniques to decide how to handle files, one of the most common being the use of file extensions. In this case, the system looks at the end of the file name and compares the string of characters to a database. This tells the system what piece of software to use to open the file. For example, .html might be configured to be opened with Mozilla, whilst .doc might open using OpenOffice.org.
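The extension lookup described above can be sketched in a few lines. The HANDLERS table and the application_for function are hypothetical stand-ins for whatever database a real desktop keeps; only the two associations mentioned in the text are included.

```python
import os.path

# Hypothetical extension database; a real system keeps a much longer one.
HANDLERS = {
    ".html": "Mozilla",
    ".doc": "OpenOffice.org",
}


def application_for(filename):
    """Return the application configured for a file name, or None if unknown."""
    _, extension = os.path.splitext(filename)
    # Extensions are usually matched without regard to case.
    return HANDLERS.get(extension.lower())


print(application_for("index.html"))
print(application_for("report.doc"))
print(application_for("README"))
```

The last call returns None because the name has no extension at all, which is the weakness of this technique.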

As the above technique is the one many people are used to, it is not immediately obvious to authors that the web works rather differently. On the web there are no filenames; instead we have URIs. A URI might include a filename, for example:

http://dorward.me.uk/www/content-type/index.html

However, it might not:

http://dorward.me.uk/www/content-type/

Another technique therefore has to be used to determine how to treat the data that the user agent receives from the web server. That technique is the Content-type header.
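To see why the file-extension approach breaks down on the web, here is a small sketch that tries to pull an extension out of the two URIs above. The function name extension_of is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def extension_of(uri):
    """Return the extension of the last path segment of a URI, or '' if none."""
    path = urlsplit(uri).path
    return posixpath.splitext(path)[1]


for uri in (
    "http://dorward.me.uk/www/content-type/index.html",
    "http://dorward.me.uk/www/content-type/",
):
    print(repr(extension_of(uri)), uri)
```

The first URI yields '.html', while the second yields an empty string, so the URI alone gives a user agent nothing to go on.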

Which Content-type should I use?

The Content-type should, naturally, reflect the type of content in the body of the HTTP response. This may be different from the content of the file for reasons I will come to shortly.

A few common Content-types are:

Type of data    Content-type
HTML            text/html
XHTML           Special
CSS             text/css
JPEG            image/jpeg
PNG             image/png
GIF             image/gif

A longer list of Content-types can be found at the IANA.

Not all documents sent over HTTP come from files; some are generated on the fly by pieces of software. Notable examples include CGI and PHP scripts. The Content-type should still reflect the body of the HTTP response, not the source. Therefore the output of a typical PHP script has a Content-type of text/html. It is possible to generate other types of content from such scripts, and care must be taken to send the data with the correct Content-type.
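The same principle applies whatever tool generates the response. As an illustration outside PHP and CGI, here is a minimal sketch of a Python WSGI application that builds a style sheet in code and labels it text/css. The names stylesheet_app and show_headers are hypothetical, and the CSS itself is just a placeholder.

```python
def stylesheet_app(environ, start_response):
    """WSGI application whose body is CSS built in code, not read from a file."""
    body = "body {\n  font-family: sans-serif;\n}\n".encode("utf-8")
    headers = [
        # The label describes the body that is sent, not the script that made it.
        ("Content-Type", "text/css; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def show_headers(status, headers):
    """Stand-in for the server's start_response callback: print what was sent."""
    print(status)
    for name, value in headers:
        print(f"{name}: {value}")


# Call the application directly, with an empty environ, to inspect its output.
css = b"".join(stylesheet_app({}, show_headers))
print()
print(css.decode("utf-8"), end="")
```

Calling the application directly like this is only a convenience for inspection; in practice a WSGI server would supply environ and start_response.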

How do I know what Content-type is being sent by my webserver?

Mozilla

Browsers in the Mozilla family tend to provide a means to see the Content-type in the Page Info screen (found under the Tools menu in Firefox).

Opera

Opera displays information about the page, including the Content-type, in a tooltip if the mouse pointer is hovered over the page tab for a few seconds.

HTTP Head

HTTP Head is an online tool that will display the HTTP headers of another HTTP resource.

Lynx

Lynx is a text-mode browser available for a number of platforms including Mac, Linux and Windows. It provides an easy way to view the HTTP headers in a response from a web server: add the -head switch (used here together with -dump, which prints the result to standard output).

david@cyberman david $ lynx -head -dump http://dorward.me.uk/www/content-type/
HTTP/1.1 200 OK
Date: Sun, 23 May 2004 10:10:05 GMT
Server: Apache
X-Powered-By: PHP/4.1.2
Connection: close
Content-Type: text/html; charset=iso-8859-1

cURL

cURL is a command-line tool for transferring files with URL syntax, and can display HTTP headers with the -I switch.

david@cyberman david $ curl -I http://dorward.me.uk/
HTTP/1.1 200 OK
Date: Wed, 02 Jun 2004 19:29:00 GMT
Server: Apache
X-Powered-By: PHP/4.1.2
Content-Type: text/html; charset=iso-8859-1
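
If you would rather check the header from a script than from the command line, the same HEAD request can be made with Python's standard http.client module. This is a sketch only; the function name fetch_content_type and its default port and timeout are my own assumptions.

```python
import http.client


def fetch_content_type(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return the Content-Type header, or None."""
    connection = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD asks for the headers only; the server sends no body.
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.getheader("Content-Type")
    finally:
        connection.close()
```

For instance, fetch_content_type("dorward.me.uk", "/www/content-type/") would ask for the same headers as the Lynx example above; what it returns depends on how the server is configured at the time.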

Telnet

You can also see the HTTP response by issuing the request manually using a telnet client. This technique doesn't actually use the telnet protocol; the client simply opens a connection that allows you to type the HTTP request by hand.

Make sure you connect to the correct TCP/IP port! For web servers this is usually port 80.

david@cyberman david $ telnet dorward.me.uk 80
HEAD /www/content-type/ HTTP/1.1
host: dorward.me.uk

Trying 212.67.207.13...
Connected to dorward.me.uk.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Sun, 23 May 2004 10:13:40 GMT
Server: Apache
X-Powered-By: PHP/4.1.2
Content-Type: text/html; charset=iso-8859-1
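
The same hand-typed request can be sketched with a plain TCP socket, which makes it clear that nothing telnet-specific is involved. The function names are hypothetical, and the Connection: close line is my own addition (it is not in the session above) so that the server closes the connection and the read loop ends.

```python
import socket


def build_head_request(host, path):
    """Return the bytes of a minimal HTTP/1.1 HEAD request."""
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to hang up afterwards
        "",  # the blank line marks the end of the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def head_by_hand(host, path, port=80, timeout=10):
    """Send the request over a plain TCP socket and return the raw header text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

Calling head_by_hand("dorward.me.uk", "/www/content-type/") would send the same two request lines typed into the telnet session, plus the extra Connection header.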

How do I specify which Content-type is sent by my webserver?

This depends on a number of factors, but generally if the content being served is a static file, then the Content-type is decided by the server, and if the content is being generated dynamically, then the Content-type is decided by the tool creating the document.

Apache

Apache can be configured with the AddType directive in the usual configuration files (possibly .htaccess or /etc/apache2/conf/apache2.conf).

AddType text/html .html

CGI

CGI programs can be written in a number of different languages, but the common factor is that the HTTP headers must be output from the program before the content (and separated from the content by a blank line).

In the Perl language, this might be achieved with:

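#!/usr/bin/perl -t
use strict;
use warnings;

# The header comes first; the second \n is the blank line that ends the headers.
print "Content-type: text/html\n\n";

# Everything after the blank line is the body of the response.
print <<'EOF';
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
etc. etc.
EOF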

Or one might choose to use Perl's CGI module. Other languages have their own methods for writing content to standard output.
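
As an example of another language, a CGI program in Python might look roughly like the sketch below. The function name cgi_response and the placeholder page are hypothetical; the point is only the order of output: header, blank line, then body.

```python
#!/usr/bin/env python3
import sys


def cgi_response():
    """Build a complete CGI response: header, blank line, then body."""
    # The second \n produces the blank line that separates header from body.
    header = "Content-type: text/html\n\n"
    body = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
        '   "http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html>\n"
        "<head><title>Example</title></head>\n"
        "<body><p>Placeholder page</p></body>\n"
        "</html>\n"
    )
    return header + body


# The web server reads the program's standard output and relays it to the client.
sys.stdout.write(cgi_response())
```

Building the response in a function before writing it makes it easy to check that the header really does come first.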

PHP

PHP outputs a text/html Content-type by default, so you will rarely have to change it. Some circumstances demand it, though; in these cases you use the header() function.

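<?php
  // Must be called before any output; otherwise PHP sends text/html.
  header('Content-type: text/css');
?>
body {
  font-family: <?php
    // Use the visitor's preferred font if a cookie supplies one.
    if (!empty($_COOKIE['font-family'])) {
      // Cookies are user input: strip anything that could break out of the declaration.
      print preg_replace('/[^A-Za-z0-9 ,-]/', '', $_COOKIE['font-family']);
    } else {
      print "sans-serif";
    }
    ?>
}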

Thanks

I would like to extend my thanks to Adam Sampson, Tina Holmboe, Terje Bless and Bjoern Hoehrmann for their invaluable help in the authoring of this document. Any errors are, of course, my own.

To Do

This is a list of things I wish to add to this document before announcing it as "ready".