A look at IBM's Web Feeds

Via Keith Instone I learned that there's a page of all IBM RSS feeds (well, not all). Actually they appear to be all formally supported product or service feeds, but not individual blogs like On Demand Business by Todd Watson or Doug Tidwell.

So, due to equal parts OCD and procrastination (should be packing for a week away and finishing a project), I took a look at the IBM feeds.

For starters, they're all spread out all over the place. Now, I am not concerned about the employee blogs and feeds off on their own sites, I'm looking strictly at content and feeds served from *.ibm.com sites.

The formats are all over the place, too: some feeds are in RSS, some Atom. Some...well, some you just kind of have to have faith that there's some sort of feed there, because the Content-Type is set to text/html or just "XML".

Free beer

Ok, not beer, but here are some ideas for the corporate types who read this site:

Run your own ping service.

A ping service is something that started with weblogs.com and the Radio Userland product years ago...basically, it's a service that accepts notifications about new posts to blogs. There's a standard format, and the ping service does whatever it wants to do with the data. Technorati, for example, queues up a request to fetch the updated feed and indexes it for their blog search tool. Other services, lately, seem to simply return a Server Error 500, likely due to overload.

Anyway, if you're an organization with many people writing public content on your behalf, it'd be nice to keep abreast of what's out there, both from just a simple “It'd be nice-to-know” basis, as well as from a communications basis (while it may be refreshing to some that employees are allowed to write anything they want, it's probably a nightmare for marketing communications and corporate communications since they'll field the calls about why there's conflicting messages on allegedly official web sites).

So, run a ping service, just for your internal users. You could generate a page showing what's been updated in the last day, week, month, etc. For bonus points make that page available to your customers.
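For the curious, here's a minimal sketch of what an internal ping service could look like, assuming the weblogs.com-style XML-RPC call weblogUpdates.ping(name, url). The PingLog class, the in-memory storage, and the host and port are all made up for illustration; a real service would persist pings and probably authenticate callers.

```python
import time
from xmlrpc.server import SimpleXMLRPCServer


class PingLog:
    """In-memory record of pings (hypothetical; a real service would persist them)."""

    def __init__(self):
        self.last_ping = {}  # site or feed URL -> (name, unix timestamp)

    def ping(self, name, url, now=None):
        """Record a ping and return the conventional reply struct."""
        self.last_ping[url] = (name, time.time() if now is None else now)
        return {"flerror": False, "message": "Thanks for the ping."}

    def updated_since(self, seconds, now=None):
        """Return (name, url) pairs pinged within the last `seconds`, newest first."""
        now = time.time() if now is None else now
        recent = [(ts, name, url)
                  for url, (name, ts) in self.last_ping.items()
                  if now - ts <= seconds]
        return [(name, url) for ts, name, url in sorted(recent, reverse=True)]


def serve(host="localhost", port=8080):
    """Expose the log over XML-RPC (host and port are placeholders)."""
    log = PingLog()
    server = SimpleXMLRPCServer((host, port))
    # Only name and url are exposed to remote callers.
    server.register_function(lambda name, url: log.ping(name, url),
                             "weblogUpdates.ping")
    server.serve_forever()
```

The "what's been updated in the last day" page is then just updated_since(86400) rendered as HTML.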

Create a Feed Registry

This would be simple to do if you'd already created the ping server. If you get pinged by an unknown feed you crawl the feed and add the relevant data to the registry. Otherwise you have a bit more work to do, but again there's multiple benefits to you (establishing a point of control for your public presence) and your customers (I've already discovered another dozen official feeds from IBM for example which aren't on the ibm.com/ibm/syndication site).
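Here's a rough sketch of the "unknown feed shows up, add it to the registry" step, assuming the feed has already been fetched. The function names and the shape of the registry (a plain dict) are my own invention, and the detection only knows about RSS 0.9x/2.0 and Atom 1.0; anything else (RSS 1.0, Atom 0.3) comes back as "unknown".

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace only


def describe_feed(xml_text):
    """Return (format, title) for a feed document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        version = root.get("version", "unknown")
        title = root.findtext("channel/title", default="")
        return ("RSS " + version, title.strip())
    if root.tag == ATOM_NS + "feed":
        title = root.findtext(ATOM_NS + "title", default="")
        return ("Atom", title.strip())
    return ("unknown", "")


def register_if_new(registry, url, xml_text):
    """Add the feed to the registry dict if unseen; return True if it was added."""
    if url in registry:
        return False
    fmt, title = describe_feed(xml_text)
    registry[url] = {"format": fmt, "title": title}
    return True
```

Hook register_if_new up to the ping handler and the registry more or less fills itself in.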

Provide OPML files

If you're using a news aggregator, feed reader, etc. you know what a pain it is to subscribe to feeds. Now, if it's just one feed, it's a minor nuisance. But what if you want to subscribe to all of the Alphaworks feeds, or The Wall Street Journal's feeds? Depending on the tool you might be able to click an XML, RSS, or Atom icon and subscribe. If it's a Feedburner feed you might get the option to subscribe off a button in the stylized presentation of the feed. But usually? Usually you have to go through a multi-step process of getting the URL to the feed, going to your aggregator, pasting the URL in to a field, hitting subscribe, verifying that you want to subscribe, and so forth.

OPML is an XML file format that has become a de facto means of packaging up lists of feeds to subscribe to. Again, depending on your news reader, aggregator, or other weapon of choice, loading the OPML file may or may not save you time. I know Bloglines supports OPML import, though I can't check it just now, oddly because Google's GMail is down, not because of anything Bloglines has done.

So, provide OPML files of your feeds (either all of them, or coherent groups of feeds), because it'll make it easier for your audience to subscribe.
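Generating the file is not much work. A minimal sketch, assuming you already have a list of (title, URL) pairs from somewhere like a registry; build_opml is a name I made up, and the outline attributes follow the usual subscription-list convention (type="rss" is used even for Atom feeds).

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML subscription list.

    `feeds` is a list of (feed_title, feed_url) tuples.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed_title, feed_url in feeds:
        # One <outline> per feed; xmlUrl is what aggregators subscribe to.
        ET.SubElement(body, "outline", type="rss",
                      text=feed_title, xmlUrl=feed_url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Placeholder feeds, not real URLs.
    print(build_opml("Example feeds", [
        ("News", "http://example.com/news.rss"),
        ("Downloads", "http://example.com/downloads.atom"),
    ]))
```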

Verify your feeds

I was really, really, really surprised to get text/html as a content-type for several of the IBM feeds (TEXT/HTML was also served for a bunch of feeds). Many feed consumers will happily accept all sorts of gorp and try to make do with it, just like the way we authored HTML back in the dark ages of 1994-1995. Luckily Web Explorer came along and forced us all to use proper, clean HTML.
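A content-type check is about the cheapest test you can automate. Here's a sketch, with the caveat that the set of "acceptable" types is my own assumption (application/rss+xml is widely used but was never formally registered), and a real check would also run the body through a feed validator.

```python
# Media types treated as acceptable for a feed (an assumption, adjust to taste).
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/xml",
    "text/xml",
}


def check_content_type(header_value):
    """Return (ok, media_type) for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and lowercase the rest,
    # since media types are case-insensitive.
    media_type = (header_value or "").split(";", 1)[0].strip().lower()
    return (media_type in FEED_TYPES, media_type)
```

With this, both text/html and TEXT/HTML get flagged, as does a bare "XML".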

Compress Your Feeds

Use content negotiation or multi-format processing to make compressed feeds available to those applications which can handle the format. It'll reduce your bandwidth significantly.
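In practice your web server or a module for it would handle this, but the logic is simple enough to sketch. This is a simplified take on Accept-Encoding handling (it ignores the "*" wildcard and deflate), and the function names are hypothetical.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Simplified check of an Accept-Encoding header for gzip support."""
    for item in (accept_encoding or "").split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] not in ("gzip", "x-gzip"):
            continue
        # "gzip;q=0" means the client explicitly refuses gzip.
        q = 1.0
        for p in parts[1:]:
            if p.startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def negotiate(body, accept_encoding):
    """Return (headers, payload) for a feed body (bytes)."""
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body
    headers["Content-Length"] = str(len(payload))
    return headers, payload
```

Clients that don't ask for gzip get the plain feed, so nothing breaks for older aggregators.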

To Recap

If your organization is providing various feeds (whether RSS, Atom, or the NBF), consider doing the following:

Create a ping service
Create a feed registry / directory
Provide OPML files to subscribe to groups of feeds
Validate your feeds
Compress your feeds

Your company will benefit by getting a handle on what content is being generated (and it's irrelevant whether the content is informal blog posts or droll press releases), and where it's being posted. Your customers will benefit by having all the feeds in one place. And, dear to my heart, your webmaster team will benefit if this is all automated and they don't have to edit a directory page by hand.

Analysis of the IBM feeds

Now, I don't know whether IBM has a ping service, registry, or directory (ostensibly the page at www.ibm.com/ibm/syndication is a directory, but it's not thorough, and it appears to only point to one feed, even if multiple feeds are available).

Through the magic of curl, awk, sed, and MS Excel (yes, we're multi-platform here), I did a quick analysis of the IBM feeds.

Of 284 feeds pulled off here:

39 redirected to another feed
4 404'd
10 were served as text/html or TEXT/HTML
16 were served as XML (as in, that was the Content-Type)
the remainder were served as application/xml or text/xml
Almost all of the feeds are RSS 2.0, though there's maybe a dozen RSS 0.91
Atom format feeds are available off Developerworks and Alphaworks
None of the feeds appear to be available in compressed (gzip or deflate) format
None of the feeds had valid Expires: headers. 33 had expirations set for 01-Dec-1994, 17 for 1/1/1980, and 5 for 1/1/1970. Furthermore, 5 of the feeds have 'no-cache' set.
231 (82%) of the feeds have a Last-modified header, however 146 (52%) of the feeds reported a Last-modified timestamp equal to the time the document was served. Somehow a random 50% of the feeds were updated late on a Saturday night. Might be possible, but I doubt it.
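
I did the tallying with curl, awk, sed, and Excel, but for anyone who'd rather script it, here's a sketch of the same caching checks in Python, assuming you've already collected the response headers for a feed into a dict. The function names and the wording of the problem messages are mine.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP date header; return None if missing or malformed."""
    if not value:
        return None
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # Treat dates without a zone as UTC so comparisons never mix types.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def audit_headers(headers):
    """Return a list of caching problems found in a dict of response headers."""
    h = {k.lower(): v for k, v in headers.items()}
    served = parse_http_date(h.get("date"))
    expires = parse_http_date(h.get("expires"))
    modified = parse_http_date(h.get("last-modified"))
    problems = []

    if expires is None:
        problems.append("missing or unparseable Expires")
    elif served is not None and expires <= served:
        problems.append("Expires is in the past")

    cache_directives = (h.get("cache-control", "") + "," + h.get("pragma", "")).lower()
    if "no-cache" in cache_directives:
        problems.append("no-cache is set")

    if modified is None:
        problems.append("missing or unparseable Last-Modified")
    elif served is not None and modified == served:
        problems.append("Last-Modified equals the time served")
    return problems
```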

While the content-type issue is a concern, a far bigger problem could lie in wait with the combination of uncacheable feeds (so feed consumers will request the feed over and over again, something I complained about here) and uncompressed feeds... should any of these feeds become popular the targeted server will end up doing a lot of needless work (I'm assuming that the feeds really don't change all that often. If the feed only changes once a day, why not return a 304 to the client if the feed hasn't changed? As-is the servers will serve the feed over and over again).
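
To make the 304 idea concrete, here's a minimal sketch of conditional GET handling on the server side. It's simplified: it compares If-Modified-Since as an exact string rather than parsing the date, it ignores weak ETags, and the respond function and its arguments are invented for the example.

```python
def respond(request_headers, feed_body, etag, last_modified):
    """Return (status, headers, body), honouring conditional GET validators."""
    headers = {"ETag": etag, "Last-Modified": last_modified}
    req = {k.lower(): v for k, v in request_headers.items()}

    if "if-none-match" in req:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in req["if-none-match"].split(",")]
        if etag in tags or "*" in tags:
            return 304, headers, b""
    elif req.get("if-modified-since") == last_modified:
        return 304, headers, b""

    # No validator matched: send the full feed.
    return 200, headers, feed_body
```

A client that sends back the ETag or Last-Modified value it got last time receives an empty 304 instead of the whole feed, which is the point.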


Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.