A collection of informal posts about web and internet technology.
I started playing with twitter in January 2007, well before that year's SXSW breakout. I didn't start playing around with the twitter API until recently, joining the Twitter Development Talk Google Group, writing first my own command line update client (a very cunning combination of bash and curl which until this week totally failed to URL encode parameters. Doh.) and more recently a followers history tool (which isn't public, and won't be until I either get over accepting people's twitter credentials or some sort of third party authentication scheme is rolled out). My other posts about twitter are collected here: Topics/Web Services/Twitter.
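A minimal sketch of the URL-encoding step that the bash and curl client was missing, written in Python for illustration. The form field name `status` and the function name are assumptions, not taken from the twitter API documentation.

```python
from urllib.parse import urlencode


def build_update_body(text):
    """Return a form-encoded request body for a (hypothetical) status update."""
    # urlencode escapes characters such as '&', '=' and spaces that would
    # otherwise be misread as parameter separators.
    return urlencode({"status": text})


print(build_update_body("fish & chips = dinner"))
# prints: status=fish+%26+chips+%3D+dinner
```

Without the encoding step, the `&` in that message would cut the parameter short and everything after it would be treated as a second, bogus parameter.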
I'll just preface the rest of this with:
I have no inside insight, and have no idea what is wrong with twitter other than the general "it's not scaling particularly well, is it?"
So, this is all just supposition.
I just wanted to make that clear.
The biggest problem with twitter? It's free∗.
I know, you're thinking "WTF is that ∗ character there for?" And "there is a cost you bozo, there's a rate limit!"
I concede your point, there is a rate limit, which applies on a per-user basis.
But that's not a cost.
That's not a price.
The per-user limit doesn't cause a user to sit back and think: Ooh, do I really want to tweet "Taking the dogs for a walk to the Telectroscope on Fulton Ferry"?
I'm old school. I think it's the right, the duty of anyone running a web site to protect that site from abusive behavior, whatever that may be. I regularly rant on our nextNY list that people can and should take proactive measures to protect their sites (blocking users, bots, what-not). I don't think that we, as site managers (webmasters, whatever you want to call us these days) have to suck up all of the traffic thrown at a site just because as a general principle we're open to user generated content, APIs into our services, means of extending whatever it is we're providing (and, in theory, profiting from in some way).
So, I think twitter's problems come down to three separate areas which are intersecting:
"Stop! You have to be reviewed." imposes a cost to using the API, and that is a good thing. If you're just screwing around, use the sandbox. If you want to distribute code built on the twitter API, then there's a cost. Whether there's a financial cost or not is up to twitter.
Hey, let's not just slander them, let's take out what chance they have at a successful business model too!), or claimed that it's some sort of public resource. Just chill. Twitter's problems are solvable. Not necessarily easily solvable, but solvable.
.net culture is fickle. The amount of time it would take to rebuild twitter on-the-fly as I've outlined isn't very long, maybe weeks if one starts with a clear hand and blank sheet of paper, months under the current firestorm setup. But the various .net glitterati would just as soon shove twitter under the water as see it succeed.
Imposing a cost to using twitter would have its own cost of course: what's made twitter grow is the API and the complete flexibility to develop applications and mashups against that API. Many of the mashups are only possible when the data and API use are financially free. Many of the clients are only possible when API use is transactionally free (or close enough).
Someone has to pay to use these services. If so many people find them to be useful, why hasn't a successful business model appeared for recouping the cost?
Watching twitter's struggles has been an education for me, not just for the technical issues they face, but also for the network effects of the API usage, and the reaction and criticism they've received as they work through the technical issues.
Followup: Kee Hinckley has written up an excellent post to the twitter development group: An odd request for Twitter - Please stop fixing bugs in the API.
My personal site, epcostello.net, has been up since 2003. Over five years I've redesigned, reorganized, remodeled and removed a lot of the site. When it first launched, my personal blog was epcostello.net/journal, my link and commentary blog was epcostello.net/epicrisis, and yet another blog had long form essays. Since then I've consolidated everything under epcostello.net/epicrisis.
Each time I've remodeled, I've tried to be good and redirect old URLs to the appropriate new URL.
In a few cases, specifically with web feeds, I've turned off the redirects and issue 410 Gone messages instead.
As far as I can tell, few feed readers, slurpers, indexers, what-not remove a feed, ever. They ignore 404 File Not Found, and also appear to ignore 410 (which is as explicit as you can get: "The file existed, now it's gone, it's not expected to return, now go away").
So, while looking through the raw logs for my personal site for May I came across a flurry of hits on epcostello.net/journal/rss.xml from an agent identifying itself as BuzzTracker/1.02.
Now, thanks to the wonder of cheap disk, I can tell you the following:
301 permanently moved redirect on July 25, 2005.
410 on April 17, 2007.
Now, it does not cost me anything, really, to serve this, but this is just one of many user-agents out there that is so poorly written that it continues to fetch a URL it's been told is permanently gone over and over again. And that all adds up to wasted bandwidth and processor use on my part.
I had not heard of BuzzTracker, so I looked around on the site, intending to send feedback asking that the feed be removed from their cache so that they stop requesting it.
The feedback page reads "Page not found" (though it returns a "200" and not a "404").
Further tooling around reveals that the site was bought by Yahoo! in 2007 (a year later it has no Yahoo! branding and not much else to indicate any integration with Yahoo!). So, you get this blog post instead.
Now, I should point out that there are other agents which are hitting the same URL, getting the same 410, and continuing to do so on an hourly and daily basis:
This is just sloppy programming. And I find it really frustrating; it just adds noise and burden to the server side. You might think "Oh, look, it's only one hit every couple of hours" but there's no limit (the 410 is supposed to be that limiting factor, it's an intentional statement on the server administrator's behalf that the resource is gone, gone, gone. Go away. Really. Requesting it again in an hour will not make it return.)
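What a better-behaved agent might do with these responses can be sketched in a few lines of Python. This is a hypothetical subscription list, not how BuzzTracker or any real crawler works; the class name, the `MAX_404S` threshold, and the example.org URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed threshold: how many consecutive 404s to tolerate before giving up.
MAX_404S = 5


@dataclass
class FeedRegistry:
    """Tracks which feed URLs a (hypothetical) crawler should keep polling."""

    # Maps each subscribed URL to its count of consecutive 404 responses.
    urls: dict = field(default_factory=dict)

    def subscribe(self, url):
        self.urls.setdefault(url, 0)

    def record_response(self, url, status, location=None):
        """Update the subscription list from one HTTP response."""
        if url not in self.urls:
            return
        if status == 410:
            # Gone: the server says the resource will not return. Stop asking.
            del self.urls[url]
        elif status == 301 and location:
            # Moved permanently: poll the new URL from now on.
            del self.urls[url]
            self.subscribe(location)
        elif status == 404:
            # Not found may be temporary; give up after repeated failures.
            self.urls[url] += 1
            if self.urls[url] >= MAX_404S:
                del self.urls[url]
        else:
            # Any other response resets the 404 counter.
            self.urls[url] = 0


registry = FeedRegistry()
registry.subscribe("http://example.org/journal/rss.xml")
registry.record_response("http://example.org/journal/rss.xml", 410)
print(registry.urls)  # the 410'd feed is no longer polled
```

The point of the sketch is only that a 410 needs no heuristics at all: one response is enough to drop the URL.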
I follow and occasionally post to the twitter-development list, and I even have a couple of twitter related projects sitting on the side waiting for the launch of OAuth (or something comparable) to access twitter. There are a lot of great ideas there, but there's a lot of dumb programming as well. In as much as twitter has its own stability problems, I wonder (and believe) that many of their problems are caused by just gut-wrenchingly bad programming on the part of some of the tools people are writing against the twitter API. Many don't seem to do any caching at all, they pound away on the API making requests that could be computed on the client side, and instead of backing off they keep retrying the failed command until the account gets locked out for exceeding the API requests limit.
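The two habits described above, caching and backing off, can be sketched roughly as follows in Python. Everything here is hypothetical: `RateLimited`, `fetch_with_backoff`, and `CachedFetcher` are illustrative names, the `fetch` callable stands in for whatever actually talks to the API, and the delays and time-to-live are arbitrary assumptions.

```python
import time


class RateLimited(Exception):
    """Raised by the hypothetical fetch callable when the API refuses a request."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited wait base_delay, then 2x, 4x, ... and retry."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: give up instead of hammering the API
            sleep(base_delay * (2 ** attempt))


class CachedFetcher:
    """Remembers results for ttl seconds so repeat requests never hit the API."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._sleep = sleep
        self._cache = {}  # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no request made
        value = fetch_with_backoff(lambda: self._fetch(key), sleep=self._sleep)
        self._cache[key] = (now, value)
        return value
```

The `clock` and `sleep` parameters exist only so the behavior can be exercised without real waiting; a real client would leave them at their defaults.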
Having been on the wrong end of the Internet firehose many times myself, can I just ask that developers give more than 30 seconds of thought before unleashing some of these nifty gadgets out onto the world, contemplating what the impact will be on the (likely free) services they're beating the crap out of?
It's a new year and time for some dumb data analysis.
The most interesting thing to me this year is that most of the traffic to this site and my other sites (notably epcostello.net) is from automated agents: search engines, random webcrawlers, SEO link injectors.
| 155127 | 200 |
| 64662 | 304 |
| 11400 | 301 |
| 6352 | 302 |
| 5705 | 404 |
| 5308 | 202 |
| 2784 | 401 |
| 1609 | 405 |
| 126 | 400 |
| 63 | 414 |
| 31 | 500 |
| 30 | 403 |
| 3 | 501 |
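A tally like the one above can be produced with a few lines of Python. This sketch assumes Apache-style common or combined log lines, where the status code follows the quoted request line; the sample lines are made up, and this is not necessarily the tooling used to produce these numbers.

```python
import re
from collections import Counter

# In common/combined log format the 3-digit status follows the quoted request.
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_statuses(lines):
    """Return (status, hits) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


sample = [
    '192.0.2.1 - - [01/Jan/2008:00:00:01 -0500] "GET /robots.txt HTTP/1.1" 200 123 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Jan/2008:00:00:02 -0500] "GET /missing HTTP/1.1" 404 321 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [01/Jan/2008:00:00:03 -0500] "GET / HTTP/1.1" 200 4567 "-" "ExampleBot/1.0"',
]
for status, hits in count_statuses(sample):
    print(f"| {hits} | {status} |")
```

The same `Counter` approach works for the other tallies (paths, referrers, hosts) by changing which field the regular expression captures.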
| 4551 | 66.249.73.200 | crawl-66-249-73-200.googlebot.com |
| 2234 | 81.52.143.16 | natcrawlbloc03.net.m1.fti.net |
| 1890 | 81.52.143.15 | natcrawlbloc01.net.m1.fti.net. |
| 1492 | 64.152.34.36 | jfk-lv3-n4.panthercdn.com |
| 1380 | 38.99.203.110 | Panscient_Data_Services.demarc.cogentco.com |
| 991 | 128.194.135.94 | web-crawler.irl.cs.tamu.edu |
| 885 | 216.240.154.103 | |
| 884 | 66.249.73.148 | crawl-66-249-73-148.googlebot.com |
| 800 | 64.92.162.210 | |
| 773 | 72.30.177.225 | wm509310.inktomisearch.com. |
| 21547 | /robots.txt |
| 8030 | /favicon.ico |
| 6801 | /articles/2005/12/27/a_practically_u/ |
| 6062 | /articles/nav-commenters.gif |
| 6055 | /g/Google_logo_transparent.png |
| 5582 | /d/4/js/ajax/ |
| 5290 | /202/2006/06/disabling_trackbacks_in_movabl/ |
| 4445 | / |
| 4217 | /g/feed-icon-16x16.png |
| 3993 | /g/by-sa-3.0-88x31.png |
| 21547 | /robots.txt |
| 6801 | /articles/2005/12/27/a_practically_u/ |
| 5290 | /202/2006/06/disabling_trackbacks_in_movabl/ |
| 4445 | / |
| 2812 | /202/ |
| 2664 | /202/2006/12/google_reader_annoyances/ |
| 2121 | /202/2006/10/social-bookmarking-and-attention/ |
| 1278 | /202/2006/11/bloglines_new_features_playlis/ |
| 912 | /articles/ |
| 890 | /202/2006/07/yet_another_spam_retaliation_t/ |
| 2353 | crawl-66-249-73-200.googlebot.com | [66.249.73.200] | |
| 1404 | natcrawlbloc03.net.m1.fti.net | [81.52.143.16] | |
| 1192 | natcrawlbloc01.net.m1.fti.net | [81.52.143.15] | |
| 770 | wm509310.inktomisearch.com | [72.30.177.225] | |
| 736 | ct501085.crawl.yahoo.net | [74.6.86.230] | |
| 688 | wm509458.inktomisearch.com | [74.6.74.202] | |
| 498 | crawl-66-249-73-148.googlebot.com | [66.249.73.148] | |
| 496 | livebot-65-55-213-74.search.live.com | [65.55.213.74] | |
| 491 | wm508816.inktomisearch.com | [74.6.69.173] | |
| 342 | lj512274.crawl.yahoo.net | [74.6.19.77] | |
| 342 | wm511001.inktomisearch.com | [72.30.252.135] | |
| 327 | natcrawlbloc02.net.s1.fti.net | [193.252.149.15] | |
| 288 | lm502044.crawl.yahoo.net | [72.30.226.173] | |
| 262 | ct501101.crawl.yahoo.net | [74.6.86.207] | |
| 233 | 67.110.56.45.ptr.us.xo.net | [67.110.56.45] | |
| 224 | crawl-66-249-73-132.googlebot.com | [66.249.73.132] | |
| 208 | wm511565.inktomisearch.com | [72.30.226.209] | |
| 206 | c02.entireweb.com | [89.150.197.130] | |
| 199 | wm509426.inktomisearch.com | [74.6.75.46] | |
| 197 | ip67-95-51-86.z51-95-67.customer.algx.net | [67.95.51.86] | |
Last time robots.txt changed: 23 March 2007
| 97488 | "-" |
| 533 | "http://neworder.box.sk/forum.php?page=last&did=multSecurity%20and%20Networking&thread=251392" |
| 244 | "http://www.zenatode.org.uk/ian/internet/hotmail.xhtml" |
| 212 | "http://my.yahoo.com/" |
| 142 | "http://www.google.com/search?hl=en&q=phx.gbl" |
| 122 | "http://www.stumbleupon.com/refer.php?url=http%3A%2F%2Fartific.com%2Farticles%2F2005%2F12%2F27%2Fa_practically_u%2F" |
| 117 | "http://www.google.com/search?q=phx.gbl&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a" |
| 88 | "http://www.google.com/search?hl=en&q=phx.gbl&btnG=Google+Search" |
| 68 | "http://neworder.box.sk/forum.php?did=multSecurity%20and%20Networking&thread=251392" |
| 55 | "http://www.dslreports.com/shownews/DNS-Hacks-Phishing-20-90182" |
Internal referrers and obviously junk referrers have been filtered out.
| 1474 | q=phx.gbl |
| 260 | q=phx.gbl" |
| 122 | q=.phx.gbl |
| 107 | q=phx.gbl%3A1863 |
| 98 | q=phx%2egbl" |
| 76 | q=crawler.bloglines.com |
| 62 | q=phx.gbl+domain |
| 59 | q=gbl+domain |
| 53 | q=gbl+tld |
| 42 | q=%22phx.gbl%22 |
phx.gbl is a pseudo-domain used by Microsoft for a variety of services. I wrote about it in On The Importance of Reverse DNS, which I now realize is still using the previous design system for this site.
| 1474 | q=phx.gbl |
| 260 | q=phx.gbl" |
| 122 | q=.phx.gbl |
| 107 | q=phx.gbl%3A1863 |
| 62 | q=phx.gbl+domain |
| 42 | q=%22phx.gbl%22 |
| 30 | q=.phx.gbl%3A1863 |
| 29 | q=@phx.gbl |
| 26 | q=%40phx.gbl |
| 23 | q=phx.gbl+netstat |
| 22 | q=phx.gbl+1863 |
| 22 | q=.phx.gbl" |
| 19 | q=phx.gbl%3A1863" |
| 18 | q=by2msg2204708.phx.gbl |
| 17 | q=what+is+phx.gbl |
| 15 | q=netstat+phx.gbl |
| 15 | q=by1msg4176104.phx.gbl |
| 14 | q=phx.gbl+msn |
| 13 | q=by2msg2204912.phx.gbl |
| 13 | q=%22phx.gbl%22" |
| 76 | q=crawler.bloglines.com |
| 38 | q=artific |
| 16 | q=infobackground |
| 15 | q=ed+costello |
| 15 | q=207.46.108.36 |
| 15 | q=202+Accepted |
| 13 | q=crawler.bloglines.com" |
| 13 | q=207.46.111.86 |
| 13 | q=202+accepted |
| 11 | q=spam+retaliation |
| 11 | q=InfoBackground |
| 10 | q=google+reader+rename+folder |
| 10 | q=Reverse+DNS |
| 9 | q=kb05474 |
| 9 | q=importance+of+reverse+dns |
| 9 | q=crawler.bloglines.com+ |
| 8 | q=nokia+espionage |
| 8 | q=iab+ad+units |
| 8 | q=hotmail+reverse+dns |
| 7 | q=tvpath.com |
In March 2007 I wrote my own trackback endpoint in PHP, which logs all of the trackback data to a file instead of beating up my Movable Type installation and MySQL database.
Trackbacks Received since 21 March 2007: 10258
Number of Valid Trackbacks: 0
| 447 | 207-234-131-237.ptr.primarydns.com | [207.234.131.237] | |
| 156 | movinglabs.com | [195.242.99.80] | |
| 150 | u15250532.onlinehome-server.com | [74.208.14.63] | |
| 144 | 218.189.232.72.static.reverse.ltdomains.com | [72.232.189.218] | |
| 124 | [206.123.73.15] | [206.123.73.15] | |
| 122 | server.camelotwealthcreation.com | [69.50.210.8] | |
| 113 | giantlogic.net | [208.101.35.52] | |
| 99 | 89-149-195-161.internetserviceteam.com | [89.149.195.161] | |
| 96 | u15251680.onlinehome-server.com | [74.208.14.215] | |
| 95 | 210.219.232.72.static.reverse.ltdomains.com | [72.232.219.210] | |
| 169 | "Tramadol." |
| 151 | "Phentermine." |
| 119 | "Xanax." |
| 94 | "Cialis." |
| 62 | "Lexapro." |
| 56 | "Ephedra." |
| 52 | "Valium." |
| 52 | "Ultram." |
| 47 | "Zoloft." |
| 43 | "Ambien." |
| 42 | "Fioricet." |
| 37 | "Percocet." |
| 37 | "Cheapphentermine." |
| 37 | "Adderall." |
| 34 | "Soma." |
I have the following domains available for sale through Sedo:
Acquire them through Sedo or contact me at sales @ artific.com.
I realized I sort of fell off the blog beat here. I've rebuilt the site (again) using the Yahoo! User Interface Library, which I've been getting to know and use for various sites over the past year. I'm currently at the Defrag 2007 conference in Denver, CO and am enjoying it. It's serving (for me) as an introduction to the intersection of the current crop of social tools and enterprises.
I'll post some notes from Defrag later today.
I just attempted to log into my FeedBurner account and got the following notice (in addition to the login information):
NOTE: Service of FeedBurner publisher accounts will not be interrupted as a result of the acquisition by Google. You will have a 14-day interim period ending June 15, 2007 to opt-out of allowing Google to service your account. If you take no action by June 15, 2007, the rights to your data will transfer from FeedBurner to Google. Opting out will terminate your user agreement with FeedBurner, permanently delete your FeedBurner account, feeds, and all related statistical data and history, and prevent the transfer of your data rights to Google. To opt-out, contact us via accountx@feedburner.com, provide your FeedBurner account Username, and request to have your FeedBurner account deleted. We will contact you at your registered email address to confirm your deletion request before completing it.
While I don't object to the sale or to the company merging into the Google Borg, 14 days seems to be awfully short to give notice to those who don't want to continue using FeedBurner after it becomes part of Google. I have feeds which I ceased advertising years ago, which either return 301 redirects or 410 gone messages and they still get tens of hits per day. If you delete your FeedBurner account you can't redirect the subscribers using that account (unless you either redirect using a 302 or 307 redirect from a URL you control, or you used the feeds.yoursitename.tld service, which you could simply point back to a site you control). With summer vacations and the vagaries of feed updates my guess is that many people or organizations who do opt-out of the FeedBurner-Google migration will lose many readers, who will just get dropped.
FeedBurner should allow, for a limited time, say 60 days, the ability to opt-out in some way and redirect the feed elsewhere.
An interesting exercise would be to track the people who drop their FeedBurner feeds (and accounts), picking out the feeds with the highest traffic volume and registering them for yourself (unless FB has changed to blocking re-registration of a feed URL).
I noticed yesterday that twitter has added Google Adsense ads to the individual status message pages, but only if you are not logged into twitter. Here are two examples:
Twitter have also added a tab to your twitter home page showing "replies" (messages sent to @username).
Update: fixed images and thumbnails.
My wife and I are in the process of moving. In the scheme of things, we're not moving all that far (approximately 660 meters if Google Earth can be trusted). In New York City you only need to move an avenue or two to be in a completely different neighborhood.
So, we're moving, and I want to get various things lined up.
I used the U.S.P.S. online address change form, am forwarding all of our mail to a P.O. Box to "cleanse" our data trail in Corporate America's databanks, and am trying to figure out our broadband solution (the new building has a "T1", we're told, with no limit on the exclamations. That might have been cool in 1998, but we have and use a 7 Mbps DSL line today, which makes a piddling T1 look downright like dialup. I digress.).
I have found it handy to have a stamp with our address on it for the rare times we actually send postal mail, so I went to Staples.com to order a stamp online. I have an account there, which is probably the only reason I thought to go there. After tooling around the site for 30 seconds I got asked if I wanted to go to the Staples Custom Printing Shop. On clicking "ok", I ended up at http://www.staples.marktheworld.com/browsercheck.asp, which is apparently where Staples has outsourced their custom stamp printing to. The browsercheck.asp in the URL should give away what happened next:
We are sorry for the inconvenience. Our site currently supports only Internet Explorer version 4.0 or higher. This is due to the advanced features used in the product customization process.
Come on. I mean, sure, they probably used an ActiveX control written in 1999 to show what the stamp would look like. And MSIE is used by, what, 80% of the worldwide marketplace? And they probably don't want to waste the precious investment in the ActiveX and ASP coding. The net result is that they lost me as a customer. There is no reason, today, to be designing web applications solely for one browser platform. None. I will accept that if you're on a tightly controlled intranet you might consider it, but really there's just no reason for this.
Copyright 2002–2008 Artific Consulting LLC.
Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License.
Please read and understand the license before repurposing content from this site.