p. It's a new year and time for some dumb data analysis.
p. The most interesting thing to me this year is that most of the traffic
to this site and my other sites (notably epcostello.net) comes from
automated agents: search engines, random web crawlers, and SEO link injectors.
h2. HTTP Status Codes:
|155127| 200|
|64662| 304|
|11400| 301|
|6352| 302|
|5705| 404|
|5308| 202|
|2784| 401|
|1609| 405|
|126| 400|
|63| 414|
|31| 500|
|30| 403|
|3| 501|
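p. For anyone who wants to produce a similar tally, here is a minimal sketch in Python. It assumes the access log is in Apache's combined log format, which this post does not actually state; the sample lines and the @count_statuses@ helper are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it, e.g.
#   "GET /robots.txt HTTP/1.1" 200 120
# Assumption: combined log format, where the request is the first quoted field.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def count_statuses(lines):
    """Return a Counter mapping HTTP status code (as a string) to request count."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # silently skip lines that do not look like requests
            counts[match.group("status")] += 1
    return counts


# Hypothetical log lines, not taken from the real log.
sample = [
    '66.249.73.200 - - [01/Jan/2008:00:00:01 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '81.52.143.16 - - [01/Jan/2008:00:00:02 -0800] "GET /202/ HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '81.52.143.16 - - [01/Jan/2008:00:00:03 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
]

# Print the counts in the same |count| status| layout as the table above.
for status, n in count_statuses(sample).most_common():
    print(f"|{n}| {status}|")
```

p. Feeding it a real log is a matter of passing an open file object instead of the @sample@ list.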
h2. Top Ten Hosts
|4551|66.249.73.200|crawl-66-249-73-200.googlebot.com|
|2234|81.52.143.16|natcrawlbloc03.net.m1.fti.net|
|1890|81.52.143.15|natcrawlbloc01.net.m1.fti.net|
|1492|64.152.34.36|jfk-lv3-n4.panthercdn.com|
|1380|38.99.203.110|Panscient_Data_Services.demarc.cogentco.com|
|991|128.194.135.94|web-crawler.irl.cs.tamu.edu|
|885|216.240.154.103| |
|884|66.249.73.148|crawl-66-249-73-148.googlebot.com|
|800|64.92.162.210| |
|773|72.30.177.225|wm509310.inktomisearch.com|
h2. Raw Top Ten Requests
|21547|/robots.txt|
|8030|/favicon.ico|
|6801|/articles/2005/12/27/a_practically_u/|
|6062|/articles/nav-commenters.gif|
|6055|/g/Google_logo_transparent.png|
|5582|/d/4/js/ajax/|
|5290|/202/2006/06/disabling_trackbacks_in_movabl/|
|4445|/|
|4217|/g/feed-icon-16x16.png|
|3993|/g/by-sa-3.0-88x31.png|
h2. Filtered Top Ten Requests
|21547|/robots.txt|
|6801|/articles/2005/12/27/a_practically_u/|
|5290|/202/2006/06/disabling_trackbacks_in_movabl/|
|4445|/|
|2812|/202/|
|2664|/202/2006/12/google_reader_annoyances/|
|2121|/202/2006/10/social-bookmarking-and-attention/|
|1278|/202/2006/11/bloglines_new_features_playlis/|
|912|/articles/|
|890|/202/2006/07/yet_another_spam_retaliation_t/|
h2. Top 20 non-caching requestors of Robots.txt:
|2353|crawl-66-249-73-200.googlebot.com| [66.249.73.200]|
|1404|natcrawlbloc03.net.m1.fti.net| [81.52.143.16]|
|1192|natcrawlbloc01.net.m1.fti.net| [81.52.143.15]|
|770|wm509310.inktomisearch.com| [72.30.177.225]|
|736|ct501085.crawl.yahoo.net| [74.6.86.230]|
|688|wm509458.inktomisearch.com| [74.6.74.202]|
|498|crawl-66-249-73-148.googlebot.com| [66.249.73.148]|
|496|livebot-65-55-213-74.search.live.com|[65.55.213.74]|
|491|wm508816.inktomisearch.com| [74.6.69.173]|
|342|lj512274.crawl.yahoo.net| [74.6.19.77]|
|342|wm511001.inktomisearch.com| [72.30.252.135]|
|327|natcrawlbloc02.net.s1.fti.net| [193.252.149.15]|
|288|lm502044.crawl.yahoo.net| [72.30.226.173]|
|262|ct501101.crawl.yahoo.net| [74.6.86.207]|
|233|67.110.56.45.ptr.us.xo.net| [67.110.56.45]|
|224|crawl-66-249-73-132.googlebot.com| [66.249.73.132]|
|208|wm511565.inktomisearch.com| [72.30.226.209]|
|206|c02.entireweb.com| [89.150.197.130]|
|199|wm509426.inktomisearch.com| [74.6.75.46]|
|197|ip67-95-51-86.z51-95-67.customer.algx.net| [67.95.51.86]|
p. Last time robots.txt changed: 23 March 2007
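p. One way to build a list like this is sketched below. It rests on two assumptions that are mine rather than anything stated above: that "non-caching" means the client received a full @200@ response for @/robots.txt@ instead of a @304@ from a conditional request, and that the log is in combined format with the client host as the first field. The sample lines and the function name are hypothetical.

```python
import re
from collections import Counter

# Assumption: combined log format, client host first, then the quoted request
# and the status code.
LINE_RE = re.compile(
    r'^(?P<host>\S+) .*?"[A-Z]+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def full_robots_fetches(lines):
    """Count, per client host, requests for /robots.txt answered with a full 200."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        if match.group("path") == "/robots.txt" and match.group("status") == "200":
            counts[match.group("host")] += 1
    return counts


# Hypothetical log lines: one client refetches the file in full every time,
# the other revalidates and gets a 304.
sample = [
    'crawler-a.example - - [01/Jan/2008:00:00:01 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "BotA"',
    'crawler-a.example - - [01/Jan/2008:01:00:01 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "BotA"',
    'crawler-b.example - - [01/Jan/2008:00:30:00 -0800] "GET /robots.txt HTTP/1.1" 304 - "-" "BotB"',
    'crawler-b.example - - [01/Jan/2008:00:31:00 -0800] "GET /202/ HTTP/1.1" 200 5120 "-" "BotB"',
]

for host, n in full_robots_fetches(sample).most_common(20):
    print(f"|{n}|{host}|")
```

p. Given that the file has not changed since March 2007, every full fetch counted this way is bandwidth a conditional request could have saved.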
h2. Top Ten Referrers (filtered):
|97488|"-"|
|533|"http://neworder.box.sk/forum.php?page=last&did=multSecurity%20and%20Networking&thread=251392"|
|244|"http://www.zenatode.org.uk/ian/internet/hotmail.xhtml"|
|212|"http://my.yahoo.com/"|
|142|"http://www.google.com/search?hl=en&q=phx.gbl"|
|122|"http://www.stumbleupon.com/refer.php?url=http%3A%2F%2Fartific.com%2Farticles%2F2005%2F12%2F27%2Fa_practically_u%2F"|
|117|"http://www.google.com/search?q=phx.gbl&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a"|
| 88|"http://www.google.com/search?hl=en&q=phx.gbl&btnG=Google+Search"|
| 68|"http://neworder.box.sk/forum.php?did=multSecurity%20and%20Networking&thread=251392"|
| 55|"http://www.dslreports.com/shownews/DNS-Hacks-Phishing-20-90182"|
p. Internal referrers and obviously junk referrers have been filtered out.
h2. Top Ten Raw Search Requests:
|1474|q=phx.gbl|
|260|q=phx.gbl"|
|122|q=.phx.gbl|
|107|q=phx.gbl%3A1863|
|98|q=phx%2egbl"|
|76|q=crawler.bloglines.com|
|62|q=phx.gbl+domain|
|59|q=gbl+domain|
|53|q=gbl+tld|
|42|q=%22phx.gbl%22|
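p. The raw counts above treat @q=phx.gbl@ and @q=phx%2egbl@ as different searches because they are compared as undecoded strings. A minimal sketch of decoding the @q@ parameter with Python's standard @urllib.parse@ before counting follows; the referrer URLs are illustrative, and lowercasing the terms is my own choice, not something done for the tables in this post.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def search_terms(referrers):
    """Count decoded, lowercased q= search terms found in referrer URLs."""
    counts = Counter()
    for ref in referrers:
        # parse_qs percent-decodes values and turns '+' into a space.
        query = parse_qs(urlsplit(ref).query)
        for term in query.get("q", []):
            counts[term.strip().lower()] += 1
    return counts


# Illustrative referrers; "-" stands for a request with no referrer.
sample_referrers = [
    "http://www.google.com/search?hl=en&q=phx.gbl",
    "http://www.google.com/search?hl=en&q=phx%2egbl&btnG=Google+Search",
    "http://www.google.com/search?q=202+Accepted",
    "http://www.google.com/search?q=202+accepted",
    "-",
]

for term, n in search_terms(sample_referrers).most_common():
    print(f"|{n}|{term}|")
```

p. With decoding in place, the encoded and plain spellings of the same search land in one bucket, at the cost of hiding exactly how people typed the query.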
h3. Top 20 phx.gbl searches:
p. phx.gbl is a pseudo-domain used by Microsoft for a variety of services. I wrote about it in "On the Importance of Reverse DNS":http://artific.com/articles/2005/12/27/a_practically_u/, which I now realize is still using the previous design system for this site.
|1474|q=phx.gbl|
|260|q=phx.gbl"|
|122|q=.phx.gbl|
|107|q=phx.gbl%3A1863|
|62|q=phx.gbl+domain|
|42|q=%22phx.gbl%22|
|30|q=.phx.gbl%3A1863|
|29|q=@phx.gbl|
|26|q=%40phx.gbl|
|23|q=phx.gbl+netstat|
|22|q=phx.gbl+1863|
|22|q=.phx.gbl"|
|19|q=phx.gbl%3A1863"|
|18|q=by2msg2204708.phx.gbl|
|17|q=what+is+phx.gbl|
|15|q=netstat+phx.gbl|
|15|q=by1msg4176104.phx.gbl|
|14|q=phx.gbl+msn|
|13|q=by2msg2204912.phx.gbl|
|13|q=%22phx.gbl%22"|
h3. Top 20 Non-phx.gbl searches:
|76|q=crawler.bloglines.com|
|38|q=artific|
|16|q=infobackground|
|15|q=ed+costello|
|15|q=207.46.108.36|
|15|q=202+Accepted|
|13|q=crawler.bloglines.com"|
|13|q=207.46.111.86|
|13|q=202+accepted|
|11|q=spam+retaliation|
|11|q=InfoBackground|
|10|q=google+reader+rename+folder|
|10|q=Reverse+DNS|
|9|q=kb05474|
|9|q=importance+of+reverse+dns|
|9|q=crawler.bloglines.com+|
|8|q=nokia+espionage|
|8|q=iab+ad+units|
|8|q=hotmail+reverse+dns|
|7|q=tvpath.com|
p. In March 2007 I wrote my own trackback endpoint in PHP, which logs all of the trackback data to a file instead of beating up my Movable Type installation and MySQL database.
p. Trackbacks Received since 21 March 2007: 10258
p. Number of Valid Trackbacks: 0
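p. The real endpoint is PHP and is not shown here. As a rough illustration of the idea only, this Python sketch appends each ping to a file as one JSON line and returns the success body defined by the TrackBack specification. The function name, the log format, and the choice to always answer with success are all assumptions of mine.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# Success reply body as defined by the TrackBack specification.
TRACKBACK_OK = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<response>\n<error>0</error>\n</response>\n"
)


def log_trackback(fields, remote_addr, log_path):
    """Append one trackback ping to log_path as a JSON line; return the reply body."""
    record = {
        "received": datetime.now(timezone.utc).isoformat(),
        "remote_addr": remote_addr,
        # Standard TrackBack ping parameters; missing ones are stored as "".
        "title": fields.get("title", ""),
        "url": fields.get("url", ""),
        "excerpt": fields.get("excerpt", ""),
        "blog_name": fields.get("blog_name", ""),
    }
    # Appending to a flat file means no database is touched per ping.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return TRACKBACK_OK


# Hypothetical ping, written to a temporary file for demonstration.
demo_log = os.path.join(tempfile.mkdtemp(), "trackbacks.log")
reply = log_trackback(
    {"title": "Tramadol.", "url": "http://spam.example/"}, "192.0.2.1", demo_log
)
print(reply)
```

p. One JSON object per line keeps the log easy to tally later, for example by title or by source address.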
h2. Top Ten Trackback Sources:
|447|207-234-131-237.ptr.primarydns.com|[207.234.131.237]|
|156|movinglabs.com|[195.242.99.80]|
|150|u15250532.onlinehome-server.com|[74.208.14.63]|
|144|218.189.232.72.static.reverse.ltdomains.com|[72.232.189.218]|
|124|[206.123.73.15]|[206.123.73.15]|
|122|server.camelotwealthcreation.com|[69.50.210.8]|
|113|giantlogic.net|[208.101.35.52]|
|99|89-149-195-161.internetserviceteam.com|[89.149.195.161]|
|96|u15251680.onlinehome-server.com|[74.208.14.215]|
|95|210.219.232.72.static.reverse.ltdomains.com|[72.232.219.210]|
h2. Top Fifteen Trackback Titles:
|169|"Tramadol."|
|151|"Phentermine."|
|119|"Xanax."|
|94|"Cialis."|
|62|"Lexapro."|
|56|"Ephedra."|
|52|"Valium."|
|52|"Ultram."|
|47|"Zoloft."|
|43|"Ambien."|
|42|"Fioricet."|
|37|"Percocet."|
|37|"Cheapphentermine."|
|37|"Adderall."|
|34|"Soma."|