Content Negotiation (aka Multi-Format Processing) for CSS and RSS

Content Negotiation or Multi-Format Processing refers to the ability of an HTTP server to use information supplied by a user agent to determine the most appropriate file to send in response to a request. The Apache documentation defines Content Negotiation as the server's ability to [...] choose the best representation of a resource based on the browser-supplied preferences for media type, languages, character set and encoding. Content negotiation is also known as multi-format processing in some web server environments (possibly just those derived from the original CERN base).

Content negotiation is typically used to provide translations of content, though it can be used to negotiate based on any Accept-* header in an HTTP request.

My interest in using content negotiation for CSS and RSS is to serve compressed files to any user agents which can accept that encoding, reverting back to uncompressed files for user agents which don't.

The value in compressing these files, specifically RSS (and Atom) files, is to reduce the overall bandwidth being used, especially when confronted by automated agents which retrieve these files repeatedly over the course of the day.

Content negotiation is one means of reducing bandwidth utilization, as well as improving user experience just a teensy bit by minimizing the size of ever-growing CSS files.

Background:

As an experiment, I started compressing the RSS and Atom files on my personal site some time during the summer of 2004. Periodically a script runs on my site which invokes gzip to compress the RSS or Atom file. This deletes the original file (e.g. a request for /articles/rss.xml would then result in a 404 Not Found error) and creates a new compressed file (rss.xml.gz). At this stage, a request for http://epcostello.net/articles/atom.xml would still result in a 404 error. I needed to turn on content negotiation by adding Options +MultiViews to my .htaccess file for that portion of the site.

Once added, a request for http://epcostello.net/articles/atom.xml causes the server to seek a suitable response. There's no atom.xml file, but the server finds atom.xml.gz and returns that. Generally a good thing, except in the scenario where a user agent cannot handle the gzip encoding.

I've done this with both my RSS/Atom feeds and the CSS style sheet on my personal site. I'm mostly satisfied with the results; however, I've noticed that there are several blog aggregators or readers which don't appear to handle the gzip encoding.

Also, surprisingly, Microsoft Internet Explorer cannot handle a gzip encoded CSS file.

Options +MultiViews is the way to turn on automatic content negotiation under Apache. An alternative is to use a type-map file.

A type-map file allows you to perform content negotiation on a file-by-file basis, while Options +MultiViews applies to the directory and, unless overridden, its sub-directories.

The name of the type-map file is what I'm going to call the external URI of the document you want to negotiate. One minor problem I initially had was that I didn't want to change the various URLs in my documents. So I first tried to have my type-map file end in .xml; however, that ruled out using .xml as an extension for any of the negotiated files. I've since gotten over my resistance to changing the URLs for these files throughout my site, in the interest of actually getting this working.

The remainder of this article is a step by step guide to creating and using type-maps for content negotiation of RSS/Atom files.

Associate an extension with type-map (Apache)

To use a type-map for content negotiation, you have to associate an extension to the type-map handler. You can do this in your server configuration, or in a .htaccess file.

Note: The name can be configured on a server by server basis, and the ability to add handlers may be restricted on your particular server.

Now, one other thing I decided was to avoid the default type-map extension of .var. I wanted to use a scheme where I could indicate the type of the file being negotiated in the URI. So I am using .@extension to indicate both the MIME type of the negotiated file and the fact that a given URL needs to be negotiated using a type-map. Thus, a negotiated RSS feed rss.xml will have a type-map file named rss.@xml. A negotiated CSS style file common.css will have a corresponding type-map common.@css.

This resulted in adding the following to my .htaccess file:

AddHandler type-map @xml @txt @html @css

So, any request for a file ending in .@xml, .@txt, .@html, or .@css will result in the type-map handler being invoked.

The type-map handler will look for a corresponding type-map file in the directory, so we need to create one.

Create a type-map file

The type-map file lists the various options to negotiate across. A stanza consists of a URI: header specifying the file to return, and one or more additional headers which specify characteristics about the file which can be used in negotiation. The accepted headers are:

URI:
Content-type: corresponds to Accept: header in an HTTP request
Content-language: corresponds to Accept-language: header in an HTTP request
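Content-encoding: corresponds to Accept-encoding: header in an HTTP request
Content-length: the size of the file in bytes; Apache uses the actual file size if this is omitted, and only uses size as a late tie-breaker, preferring the smallest variant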
Description: a textual description used in error messages if no variations are suitable

Although the RSS and Atom feeds for this site are negotiated based on the presence and value of the accept-encoding header in an HTTP request, I'm just going to refer to the Atom feed for this example.

The Atom feed is saved as feeds/atom.xml relative to the articles directory of this site. A cron job runs on a regular basis which compresses atom.xml and saves the result to atom.xml.gz. Both atom.xml and atom.xml.gz can be retrieved directly; however, I want to negotiate and ideally return solely the gzip'd file.
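The compression step run by the cron job could look something like the following. This is only a sketch in Python, not the script I actually run; the function name compress_feed is made up for illustration. Unlike a plain gzip invocation, it leaves atom.xml in place so that both files can be served.

```python
import gzip
import shutil
from pathlib import Path


def compress_feed(source):
    """Write '<source>.gz' next to source and leave the original file in place."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    tmp = target.with_name(target.name + ".tmp")
    # Compress into a temporary file first so the server never sees a half-written .gz.
    with source.open("rb") as src, gzip.open(tmp, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    # Rename over any previous atom.xml.gz (atomic on the same filesystem on POSIX).
    tmp.replace(target)
    return target


# Hypothetical usage from a cron-driven script:
# compress_feed("feeds/atom.xml")   # creates feeds/atom.xml.gz
```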

The applicable type-map file is named atom.@xml and consists of:

URI: atom.xml
Content-type: application/atom+xml

URI: atom.xml.gz
Content-type: application/atom+xml
Content-encoding: x-gzip

The first line of the first stanza specifies the filename of the uncompressed file, atom.xml. The next line specifies the content-type, application/atom+xml.

A blank line is required between stanzas.

The second stanza specifies the compressed filename using the URI header, followed again by the content-type. The content-encoding line gives the type-map handler the information it needs to differentiate between this entry and the previous entry in the file.
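To make the stanza structure concrete, here is a toy model in Python of how a type-map like the one above can be read and how a variant might be picked from the Accept-Encoding header. The selection rule is my own simplification and an assumption: Apache's real negotiation algorithm also weighs quality values, media types, languages and more. The names parse_type_map and choose_variant are hypothetical.

```python
TYPE_MAP = """\
URI: atom.xml
Content-type: application/atom+xml

URI: atom.xml.gz
Content-type: application/atom+xml
Content-encoding: x-gzip
"""


def parse_type_map(text):
    """Split type-map text into stanzas; each stanza becomes a dict keyed by lower-cased header name."""
    variants, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                variants.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        variants.append(current)
    return variants


def _normalize(encoding):
    """Treat 'x-gzip' and 'gzip' as the same encoding."""
    encoding = encoding.strip().lower()
    return encoding[2:] if encoding.startswith("x-") else encoding


def choose_variant(variants, accept_encoding=""):
    """Simplified rule (not Apache's algorithm): prefer an encoded variant the
    client lists, otherwise fall back to an unencoded one. q-values are ignored."""
    accepted = {
        _normalize(token.split(";")[0])
        for token in accept_encoding.split(",")
        if token.strip()
    }
    for variant in variants:
        encoding = variant.get("content-encoding")
        if encoding and _normalize(encoding) in accepted:
            return variant["uri"]
    for variant in variants:
        if "content-encoding" not in variant:
            return variant["uri"]
    return None
```

With this model, choose_variant(parse_type_map(TYPE_MAP), "gzip,deflate") picks atom.xml.gz, while a request with no Accept-Encoding header falls back to atom.xml.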

So, what's the point?

Again, this file is saved as atom.@xml on my system. If you request http://artific.com/articles/feeds/atom.@xml with a typical browser you likely will just see the resulting Atom format data. If you bring up the page properties (eg: Tools/Page Info in Firefox), you might notice the smaller size, but otherwise the fact that you received the compressed file instead of the uncompressed file should be transparent.

You could use wget or curl to see the effect of negotiation by specifying whether or not compressed data is accepted (e.g. wget -S --header=Accept-Encoding:\ gzip,deflate http://artific.com/articles/feeds/atom.@xml to request the compressed version; drop the --header option to get the uncompressed version).
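The same check can be scripted. The sketch below uses Python's standard urllib, which does not decompress responses on its own, so the Content-Encoding response header shows which variant the server chose. The helper names build_request and negotiated_encoding are made up for this example, and whether the URL still answers this way is an assumption.

```python
import urllib.request

FEED_URL = "http://artific.com/articles/feeds/atom.@xml"


def build_request(url, accept_gzip):
    """Build a GET request, optionally advertising support for compressed responses."""
    request = urllib.request.Request(url)
    if accept_gzip:
        request.add_header("Accept-Encoding", "gzip,deflate")
    return request


def negotiated_encoding(url, accept_gzip):
    """Return the Content-Encoding response header, or None if the body is not encoded."""
    with urllib.request.urlopen(build_request(url, accept_gzip)) as response:
        return response.headers.get("Content-Encoding")


# Usage (needs network access, so it is left commented out):
# print(negotiated_encoding(FEED_URL, accept_gzip=True))
# print(negotiated_encoding(FEED_URL, accept_gzip=False))
```

If negotiation is working, I would expect the first call to report a gzip encoding and the second to report none.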

The point, basically, is to minimize the bandwidth required for these files.

If you're using PHP you can turn on gzip compression for PHP files, and in theory you could wrap some PHP around your RSS/Atom feeds, your CSS and other static files. I tried that briefly, but feel it's a waste of resources especially since the Apache server can handle this work easily.

Now, there's another option that is much easier than what I've outlined here, and that's to use mod_gzip, eg:

<IfModule mod_gzip.c>
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_can_negotiate yes
mod_gzip_min_http 1001
mod_gzip_temp_dir "/tmp"
mod_gzip_handle_methods GET POST
mod_gzip_item_include file "\.css$"
mod_gzip_item_include file "\.html$"
mod_gzip_item_include file "\.xml$"
</IfModule>

However, I had to rule that out as my hosting provider doesn't include mod_gzip in the standard build.

So, back to what can be done with a typical user setup.

The benefits of using content negotiation to serve compressed static files are that you minimize your bandwidth utilization, you speed up the download to the requesting user agent (more important for your CSS or JavaScript files), and you let the server do most of the work. I don't know if mod_gzip caches compressed files. If it does not, then I imagine there's some minor benefit to using a type-map over mod_gzip, since the server just has to decide which file to serve and serve it, without the additional step of compressing the file on each request.
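For a rough feel of the savings, a few lines of Python can compare the size of a file before and after gzip. The sample feed below is synthetic and deliberately repetitive, so it is an assumption rather than a measurement of my own feeds; real savings depend on the content.

```python
import gzip


def gzip_sizes(data, level=9):
    """Return (original_size, compressed_size) in bytes for the given payload."""
    return len(data), len(gzip.compress(data, compresslevel=level))


# Synthetic stand-in for a feed: 50 near-identical entries (made up for illustration).
entries = "".join(
    "<entry><title>Post %d</title><summary>Lorem ipsum dolor sit amet.</summary></entry>" % i
    for i in range(50)
)
sample_feed = ("<feed>" + entries + "</feed>").encode("utf-8")

original, compressed = gzip_sizes(sample_feed)
print("original: %d bytes, gzip'd: %d bytes" % (original, compressed))
```

To try it on a real file, pass the bytes of your own feed or style sheet to gzip_sizes instead of sample_feed.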

Drawbacks

There are some drawbacks to this approach. It's not totally transparent: you have to specify a filename which will appear odd in comparison to what the user agent is expecting, whether you use my .@xml extension or the default .var extension. There are, unfortunately, user agents in the world which make decisions based on the extension of the requested document. These UAs may mistakenly treat your gzip'd file as text/plain or an octet stream, and the resulting interpretation is unlikely to be what you want.

Another drawback, specific to using this technique with a content management system like MovableType, is that there's a disconnect between the URI of the feed (e.g. http://artific.com/articles/feeds/atom.@xml) and the file maintained by the CMS (in my setup, feeds/atom.xml is the only file known to MT). So, where I've been using:

<link rel="alternate" type="application/atom+xml" href="<$MTLink template="Atom Index"$>" title="Atom Index"/>

to link to my RSS and Atom feeds, I've had to change to and hard code:

<link rel="alternate" type="application/atom+xml" href="<$MTBlogURL$>/feeds/atom.@xml" title="Atom Index"/>

It's a minor thing, but it's the sort of thing which will surface to cause problems for me down the road when I decide to move my feeds somewhere.

Finish up already

Ok, ok. Basically: if you run a site which is expected to receive many visitors, or you have files like the various syndication feeds, or you're concerned about speeding up transmission and minimizing bandwidth utilization, you will want to look at compressing as much data on your site as possible. Use this technique if mod_gzip is unavailable. If you are serving pages up in PHP, look at using either ob_gzhandler or adding

php_flag zlib.output_compression On
php_value zlib.output_compression_level 9

to your .htaccess file (see these articles for my experiences with ob_gzhandler and zlib.output_compression).

One last idea...

It occurred to me as I wrote this that it may be possible to set up a type-map for a generic feed, negotiating not only whether it's compressed or not, but whether it's RSS, Atom, or the feed around the corner.

Posted in MovableType

Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.