I was creating a cache file for webalizer and analog and settled on the obvious optimization: select only hosts which received a "200" status code in response to a request (I suppose I could add 304s as well, but I'm not sure that would add any value), and then strip out any hosts which made fewer than 10 requests in a 30-day period. This reduced the IP address list from about 3500 entries across the sites I maintain to 470.
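For anyone who wants to try the same filtering, here is a rough sketch of the idea rather than the script I actually used. It assumes the logs are in Common or Combined Log Format, and the names `busy_hosts`, `min_requests` and `window_days` are made up for this illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Common/Combined Log Format: host ident user [time] "request" status bytes ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) ')


def busy_hosts(lines, now, min_requests=10, window_days=30):
    """Return the sorted hosts with at least min_requests 200 responses
    inside the window ending at `now` (a timezone-aware datetime)."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip anything not in the assumed format
        host, stamp, status = match.groups()
        if status != "200":
            continue  # only successful requests count
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if when >= cutoff:
            counts[host] += 1
    return sorted(host for host, n in counts.items() if n >= min_requests)
```

Feeding it the lines of an access log (for example `busy_hosts(open("access.log"), datetime.now(timezone.utc))`, where `access.log` is a placeholder path) gives the short list of addresses worth resolving.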
There appear to be about 140 unique organizations that hit the sites (mainly epcostello.net, frisket.org, and artific.com).
18% of the addresses do not reverse resolve.
Of the 387 addresses which did reverse resolve, 41 (11%) reverse resolve to a hostname which itself does not forward resolve to anything (that is: address 220.127.116.11 reverse resolves to something.example.com, but something.example.com itself does not resolve back to that address).
MSN has a number of hosts which reverse to a phx.gbl top level domain (18.104.22.168 through .118). That is to say, 22.214.171.124 reverse maps to by1sch4041904.phx.gbl, not .msn.com or .msn.net.
I sent a note to the poc for the 64.4.8 network but it appeared to disappear into a black hole.
Or they're intentionally reverse mapping to a nonsense domain.
Oddly, there are another 17 hosts which all reverse resolve to msnbot.msn.com, none of which are on the 64.4.8 network.
That is to say: 17 hosts, across a number of networks and subnets, all reverse resolve to the same hostname, msnbot.msn.com, and this hostname itself does not resolve to anything.
crawler.bloglines.com does not forward resolve to [126.96.36.199], though 188.8.131.52 resolves back to crawler.bloglines.com.
MSN seems to have the largest number of IPs and hostname mismatches or resolution failures.
One of the IBM gateways (I'm guessing in the Southbury, CT data center) reverse resolves to bi01pt1.ct.us.ibm.com, which does not in turn resolve back to 184.108.40.206.
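The reverse-then-forward check behind these numbers can be sketched in a few lines. This is only an illustration of the technique under some assumptions, not the exact tooling I ran: the category names and the `classify` function are invented here, and the forward lookup only considers IPv4 addresses.

```python
import socket


def reverse_lookup(address):
    """Return the PTR hostname for an address, or None if there is none."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses for a hostname (empty list if none)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def classify(address, reverse=reverse_lookup, forward=forward_lookup):
    """Sort an address into one of four buckets based on its DNS records."""
    hostname = reverse(address)
    if hostname is None:
        return "no-ptr"  # the address does not reverse resolve
    addresses = forward(hostname)
    if not addresses:
        return "ptr-without-forward"  # the PTR name resolves to nothing
    if address not in addresses:
        return "mismatch"  # the PTR name resolves, but to other addresses
    return "confirmed"  # reverse and forward lookups agree
```

The `reverse` and `forward` parameters exist so the logic can be exercised with canned data instead of live DNS; running `collections.Counter(classify(a) for a in addresses)` over a list of addresses would give a tally like the one above.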
Nothing earth shattering here...most web sites turn off name resolution these days, doing it only in post-processing, or on a specific basis within an application.
And no one who is remotely sane turns on HostnameLookups double in their server configurations.
Where it does come into play is if you are using hostnames in access control lists.
Unlikely on a totally public site, but if you have a protected area, a semi-private extranet, and you add an Allow from .ibm.com, then anyone who's using that 220.127.116.11 gateway will get bounced, at least from Apache-based servers, since mod_access will perform a double lookup (at least according to the documentation).
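To make the failure mode concrete, here is a rough model of a hostname-based access check with a double lookup. It is my own sketch of the behaviour described above, not Apache's code: `host_allowed` is a made-up function, the two resolver callables are supplied by the caller, and the rule that a domain matches only on whole name components is an assumption.

```python
def host_allowed(address, allowed_domain, reverse, forward):
    """Allow an address only if its PTR name is forward-confirmed and
    falls under allowed_domain (for example ".example.com").

    reverse(address) returns a hostname or None.
    forward(hostname) returns a list of addresses.
    """
    hostname = reverse(address)
    if hostname is None:
        return False  # no PTR record, so nothing to match against
    if address not in forward(hostname):
        return False  # double lookup failed: the name does not map back
    hostname = hostname.lower().rstrip(".")
    domain = allowed_domain.lower().strip(".")
    # Match whole components only, so "badexample.com" is not "example.com".
    return hostname == domain or hostname.endswith("." + domain)
```

A gateway whose PTR name sits under the allowed domain but has no matching forward record fails at the second step, which is exactly the situation the mismatched gateway above would be in.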
I suppose there are other situations where you might use the hostname to allow access for search engine spiders, where otherwise you might require some other form of authentication (e.g.: set up a satisfy any block, add allow *.google.com, *.msn.com, *.yahoo.com and then a check for a cookie with a mix of
I noticed more search hits coming in to this article, as well as some more comments, and thought I'd post this update. I feel confident that the fake .phx.gbl top level domain name is being used by Microsoft/MSN, though I cannot understand why (if you're going to create PTR records, why not make them with your identifiable domain, since anyone can eventually determine who is assigned the address block).
I created the following table of addresses in the 18.104.22.168/255 network with the corresponding PTR records (reverse DNS records); this is a snapshot as of 0347Z on 8 July 2006.
If anyone from Microsoft network operations reads this, would you mind explaining why you're using a fake top level domain for your search engine robot? (It also appears to be used in some MSN/Hotmail mail headers, that's less of a concern or interest to me)
|IP Addresses||Reverse DNS mappings|
crawler.bloglines.com now forward and reverse resolves to
Copyright 2002–2011 Artific Consulting LLC.
Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.