Twittering

I signed up for twitter sometime around the start of the year. After an initial flurry of twittering I dropped off. I quickly learned to turn off the I/M updates to my phone (I need to add an SMS package before I fall down that hole again). The occasional rush of messages reminded me a bit of Dodgeball, but twitter has more lasting interest for me, if only because Dodgeball seemed mostly useful for meeting up with people who were both in your network and in your city.

Most of my people are spread throughout the U.S. with a few in Europe and Asia. Dodgeball not so useful for that.

So, I twitter (you can follow me here: twitter.com/epc).

I find it most useful through the I/M client, which has been frustratingly down for several days. I work mostly from Windows systems, so Twitterrific is not a solution for me.

Ever the tinkerer I've started toying with writing my own client.

I have nothing to show yet; all I've done so far is play with the twitter API (PS: Twitter? how about adding "API" to the <title> tag on that page?). Here are some notes and observations from a few minutes with the API:

The API does not appear to support gzip (or deflate) encoding of data sent to user agents. There is some sense to this: if the data changes frequently enough, there may be no value in compressing it, especially if the time to compress exceeds the transmission time. This is certainly true for the personal status feeds, which are keyed to an individual. However, twitter might gain a lot by compressing the public timeline feeds, even though they change frequently. If the data transmits faster, that's one less client hogging a connection, and twitter.com definitely feels like it backs up on occasion.
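
As a rough illustration of what compression might buy, here's a quick sketch (the payload below is a fabricated stand-in for the repetitive per-user XML, not a real API response, so the exact ratio will differ in practice):

```python
import gzip

# Hypothetical stand-in for a public-timeline response: the same
# <user> profile block repeated once per status, much as the feed
# repeats it today.
entry = (
    "<user><id>420363</id><name>epc</name>"
    "<screen_name>epc</screen_name>"
    "<location>Brooklyn, NY</location>"
    "<status><text>still playing with the API</text></status></user>"
)
payload = (entry * 20).encode("utf-8")

compressed = gzip.compress(payload)
# Highly repetitive markup compresses extremely well.
print(len(payload), len(compressed))
```

The more the profile boilerplate repeats, the better gzip does, which is exactly the public-timeline case.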

Each response is tagged with an ETag; however, the ETag is useless. In theory you take the ETag from the response and use it in a conditional GET request (sending an If-None-Match header containing the ETag). However, the response includes a relative timestamp, <relative_created_at>, which is (I'm guessing here) computed at the time of the request. This means that even if you write a poorly behaved client which sends If-None-Match every minute, you will almost always get a full document back even if nothing has changed except the relative timestamp. If your client is better behaved and requests every ten or fifteen minutes, you'll definitely get the entire status document back, again even if the only change is the relative timestamp.
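
To see why the relative timestamp defeats caching, consider how a content-derived ETag behaves (a sketch; I don't know how twitter.com actually generates its ETags, and the markup here is abbreviated):

```python
import hashlib
import re

def etag_for(body: str) -> str:
    # A simple content-derived ETag: a hash of the response body.
    return hashlib.md5(body.encode("utf-8")).hexdigest()

# The same status fetched a minute apart; only the relative timestamp moved.
first = ("<status><text>hi</text>"
         "<relative_created_at>11 minutes ago</relative_created_at></status>")
second = ("<status><text>hi</text>"
          "<relative_created_at>12 minutes ago</relative_created_at></status>")

# False: every fetch looks "modified", so If-None-Match never saves a byte.
print(etag_for(first) == etag_for(second))

def strip_relative(body: str) -> str:
    # Remove the volatile field before hashing.
    return re.sub(r"<relative_created_at>.*?</relative_created_at>", "", body)

# True: without the relative timestamp the ETag is stable between fetches.
print(etag_for(strip_relative(first)) == etag_for(strip_relative(second)))
```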

The amount of data returned in a status message is more than necessary in my slightly eccentric opinion. Let's take a look at one element of my current status in XML format:

  <user>
    <id>420363</id>
    <name>epc</name>
    <screen_name>epc</screen_name>
    <location>Brooklyn, NY</location>
    <description>Acerbic, eccentric dog walker and occasional CTO</description>
    <profile_image_url>http://assets1.twitter.com/system/user/profile_image/420363/normal/epc-1970-240x388.jpg?1171961148</profile_image_url>
    <url>http://epcostello.net/</url>
    <status>
      <created_at>Wed Mar 21 03:48:35 +0000 2007</created_at>
      <id>10323021</id>
      <text>still playing with the API x2031</text>
      <relative_created_at>11 minutes ago</relative_created_at>
    </status>
  </user>

You have id, name, screen_name, location, description, profile_image_url, and url, all of which are static across each update (and are repeated multiple times in this file, once for each of my entries).

The actual status has a slightly RFC 822-ish date (in RFC 822 the year would appear before the time). We get a different id, the text of the update, and the relative_created_at timestamp.

I would drop all of the profile-ish cruft, keeping only the id, name and screen_name. The rest of the information (location, description, profile_image_url and url) should be separately retrievable and cacheable through another API (perhaps http://twitter.com/users/id.{json|xml}).

I would also drop the <relative_created_at> so that ETags can give twitter.com some breathing room. There's no reason to make the server perform this computation on each request; let the client gussy up the time (if necessary). I would also change that timestamp to either a Unix timestamp or an ISO 8601 timestamp like 20070321T034835Z or some other variant. For all I know the various time-parsing libraries can handle Wed Mar 21 03:48:35 +0000 2007.
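
For what it's worth, a quick check with Python's standard library (at least in a current Python) shows the existing format does parse, and converting it to ISO 8601 or epoch seconds is trivial once you have a real datetime, using the timestamp from the sample above:

```python
from datetime import datetime

raw = "Wed Mar 21 03:48:35 +0000 2007"
# %z consumes the "+0000" offset; the trailing %Y matches the odd year-last layout.
dt = datetime.strptime(raw, "%a %b %d %H:%M:%S %z %Y")

print(dt.isoformat())       # 2007-03-21T03:48:35+00:00
print(int(dt.timestamp()))  # 1174448915 (Unix epoch seconds)
```

So the format is parsable, but a client shouldn't have to reverse-engineer the strptime pattern; emitting ISO 8601 or an epoch in the first place removes the guesswork.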

I noticed that the APIs set a _twitter_session cookie. It's not clear what this is useful for in the APIs (it doesn't appear to cache your authentication, for example). I would drop it as well unless it has some value to the APIs.

To recap, I would:

  • support gzip / deflate encoding for data returned to user agents
  • remove the individual user profile information from the update stream
  • add an API to retrieve a user's information by id, name, or screen name (any one of the three is fine)
  • change the timestamp to something more easily parsed, possibly a Unix epoch timestamp or ISO 8601 (I concede that I've done absolutely no checking to see whether the standard time libraries can parse the current format)
  • drop the <relative_created_at> from the API feeds
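
Taken together, those changes might leave a status entry looking something like this (a hypothetical sketch of my proposal, not anything Twitter has announced; the profile details would come from the separate user-lookup API instead):

```xml
<user>
  <id>420363</id>
  <name>epc</name>
  <screen_name>epc</screen_name>
  <status>
    <created_at>2007-03-21T03:48:35Z</created_at>
    <id>10323021</id>
    <text>still playing with the API x2031</text>
  </status>
</user>
```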

Now to try writing a twitter client…

Discovered the Twitter Development Group on Google Groups via The Twitter Development Blog.

Copyright 2002–2011 Artific Consulting LLC.

Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.