I was creating a cache file for webalizer and analog and came to the obvious optimization: select only hosts which received a "200" status code in response to a request (I suppose I could add 304s as well, but I'm not sure that would add any value), and then strip out any hosts which made fewer than 10 requests in a 30 day period. This reduced the IP address list from about 3500 entries across the sites I maintain to 470.
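As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the logs are in Common Log Format and that the input has already been limited to the 30 day window. The function name `frequent_hosts` and the regular expression are my own for illustration, not anything webalizer or analog provides.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format): host ident user [time] "request" status bytes
# Requests containing an escaped double quote will not match and are skipped.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')

def frequent_hosts(lines, min_requests=10, status="200"):
    """Return the sorted hosts that got at least min_requests responses
    with the given status code."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == status:
            counts[match.group(1)] += 1
    return sorted(host for host, n in counts.items() if n >= min_requests)
```

Feeding it an open log file (for example `frequent_hosts(open("access_log"))`, where the file name is a placeholder) would give the reduced host list to resolve.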
There appear to be about 140 unique organizations that hit the sites (mainly epcostello.net, frisket.org, and artific.com).
18% of the addresses do not reverse resolve.
Of the 387 addresses which did reverse resolve, 41 (11%) reverse resolve to a hostname which itself does not forward resolve back to the original address (that is: address 1.2.3.4 reverse resolves to something.example.com, but something.example.com itself does not resolve back to 1.2.3.4).
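The check described above (reverse lookup, then a forward lookup of the result) can be sketched in Python along these lines. This is only an illustration using the standard `socket` module, not the script I used to produce the numbers. The function names and the three result labels are made up for the example, and the lookups are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None

def forward_lookup(hostname):
    """Return the IPv4 addresses hostname resolves to (possibly empty)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return []

def classify(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Label an address by how its reverse and forward lookups line up."""
    hostname = reverse(ip)
    if hostname is None:
        return "no-ptr"       # address does not reverse resolve
    if ip in forward(hostname):
        return "confirmed"    # the name maps back to the same address
    return "mismatch"         # PTR exists, but the name does not map back
```

Counting the labels over the host list would give the three groups discussed here: no reverse record, a confirmed pair, and a reverse record that does not resolve back.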
MSN has a number of hosts which reverse to a phx.gbl top level domain (64.4.8.113 through .118). That is to say, 64.4.8.113 reverse maps to by1sch4041904.phx.gbl, not .msn.com or .msn.net.
I sent a note to the poc for the 64.4.8 network but it appeared to disappear into a black hole.
Or they're intentionally reverse mapping to a nonsense domain.
Oddly, there are another 17 hosts which all reverse resolve to msnbot.msn.com, none of which are on the 64.4.8 network. That is to say: 17 hosts, across a number of networks and subnets, all reverse resolve to the same hostname, msnbot.msn.com, and this hostname itself does not resolve to anything.
crawler.bloglines.com does not forward resolve to [65.214.39.151], though 65.214.39.151 resolves back to crawler.bloglines.com.
MSN seems to have the largest number of IPs and hostname mismatches or resolution failures.
One of the IBM gateways (I'm guessing in the Southbury, CT data center) reverse resolves [129.33.1.37] to bi01pt1.ct.us.ibm.com, which does not in turn resolve back to 129.33.1.37.
Nothing earth shattering here... most web sites turn off name resolution these days, doing it only in post-processing, or on a specific basis within an application. And no one who is remotely sane turns on HostnameLookups double in their server configurations.
Where it does come into play is if you are using hostnames in access control lists. Unlikely on a totally public site, but if you have a protected area, a semi-private extranet, and you add an Allow from .ibm.com, then anyone who's using that 129.33.1.37 gateway will get bounced, at least from Apache-based servers, since mod_access will perform a double lookup (at least according to the documentation).
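To make the failure mode concrete, here is a small Python approximation of a hostname-based allow rule with a double lookup. It is not Apache's implementation, just my reading of the documented behaviour. `host_allowed`, `reverse` and `forward` are hypothetical names: the two callables stand in for the PTR lookup and the forward lookup.

```python
def host_allowed(ip, allowed_suffix, reverse, forward):
    """Approximate a hostname-based 'Allow from .example.com' check.

    reverse(ip) returns a hostname or None; forward(hostname) returns a
    list of addresses. Both are supplied by the caller (assumptions made
    for this sketch).
    """
    hostname = reverse(ip)
    if hostname is None:
        return False
    # Double lookup: the PTR name must map back to the connecting address,
    # otherwise the name is not trusted at all.
    if ip not in forward(hostname):
        return False
    name = hostname.lower().rstrip(".")
    suffix = allowed_suffix.lower()
    return name.endswith(suffix) or name == suffix.lstrip(".")
```

Under this logic a client whose PTR record ends in the allowed domain is still refused if the name does not resolve back to its address, which is the situation the IBM gateway above is in.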
I suppose there are other situations where you might use the hostname to allow access for search engine spiders, where otherwise you might require some other form of authentication (e.g. set up a satisfy any block, add allow *.google.com, *.msn.com, *.yahoo.com and then a check for a cookie with a mix of Allow and SetEnvIf rules).
I noticed more search hits coming in to this article as well as some more comments and thought I'd post this update. I feel confident that the fake .phx.gbl top level domain name is being used by Microsoft/MSN, though I cannot understand why (if you're going to create PTR records, why not make them with your identifiable domain, since anyone can eventually determine who is assigned the address block).
I created the following table of addresses in the 64.4.8.0/24 network with the corresponding PTR records (reverse DNS records). This is a snapshot as of 0347Z on 8 July 2006.
If anyone from Microsoft network operations reads this, would you mind explaining why you're using a fake top level domain for your search engine robot? (It also appears to be used in some MSN/Hotmail mail headers, but that's of less concern or interest to me.)
IP Addresses | Reverse DNS mappings |
---|---|
[64.4.8.3] | vlan300.bay-6nf-srch-4a.ntwk.msn.net |
[64.4.8.4] | vlan300.bay-6nf-srch-4b.ntwk.msn.net |
[64.4.8.7] | dc1.hotmail.com |
[64.4.8.8] | dc2.hotmail.com |
[64.4.8.14] | oe.hotmail.com |
[64.4.8.110] | by1sch4041901.phx.gbl |
[64.4.8.111] | by1sch4041902.phx.gbl |
[64.4.8.112] | by1sch4041903.phx.gbl |
[64.4.8.113] | by1sch4041904.phx.gbl |
[64.4.8.114] | by1sch4041905.phx.gbl |
[64.4.8.115] | by1sch4041906.phx.gbl |
[64.4.8.116] | by1sch4041907.phx.gbl |
[64.4.8.117] | by1sch4041908.phx.gbl |
[64.4.8.118] | by1sch4041909.phx.gbl |
[64.4.8.119] | by1sch4041910.phx.gbl |
[64.4.8.120] | by1sch4041911.phx.gbl |
[64.4.8.121] | by1sch4041912.phx.gbl |
[64.4.8.122] | by1sch4041913.phx.gbl |
[64.4.8.123] | by1sch4041914.phx.gbl |
[64.4.8.124] | by1sch4041915.phx.gbl |
[64.4.8.125] | by1sch4041916.phx.gbl |
[64.4.8.126] | by1sch4041917.phx.gbl |
[64.4.8.127] | by1sch4041918.phx.gbl |
[64.4.8.128] | by1sch4041919.phx.gbl |
[64.4.8.129] | by1sch4041920.phx.gbl |
[64.4.8.130] | by1sch4040801.phx.gbl |
[64.4.8.131] | by1sch4040802.phx.gbl |
[64.4.8.132] | by1sch4040803.phx.gbl |
[64.4.8.133] | by1sch4040804.phx.gbl |
[64.4.8.134] | by1sch4040805.phx.gbl |
[64.4.8.135] | by1sch4040806.phx.gbl |
[64.4.8.136] | by1sch4040807.phx.gbl |
[64.4.8.137] | by1sch4040808.phx.gbl |
[64.4.8.138] | by1sch4040809.phx.gbl |
[64.4.8.139] | by1sch4040810.phx.gbl |
[64.4.8.140] | by1sch4040811.phx.gbl |
[64.4.8.141] | by1sch4040812.phx.gbl |
[64.4.8.142] | by1sch4040813.phx.gbl |
[64.4.8.143] | by1sch4040814.phx.gbl |
[64.4.8.144] | by1sch4040815.phx.gbl |
[64.4.8.145] | by1sch4040816.phx.gbl |
[64.4.8.146] | by1sch4040817.phx.gbl |
[64.4.8.147] | by1sch4040818.phx.gbl |
[64.4.8.148] | by1sch4040819.phx.gbl |
[64.4.8.149] | by1sch4040820.phx.gbl |
[64.4.8.212] | by1sch40408ms.phx.gbl |
[64.4.8.215] | by1sch40419dg.phx.gbl |
[64.4.8.216] | by1sch40419ms.phx.gbl |
[64.4.8.220] | by1sch40301dg.phx.gbl |
[64.4.8.221] | by1sch40302dg.phx.gbl |
[64.4.8.222] | by1sch40408dg.phx.gbl |
[64.4.8.223] | by1sch40304dg.phx.gbl |
[64.4.8.224] | by1sch40305dg.phx.gbl |
[64.4.8.225] | by1sch40306dg.phx.gbl |
[64.4.8.226] | by1sch40307dg.phx.gbl |
[64.4.8.227] | by1sch40308dg.phx.gbl |
[64.4.8.228] | by1sch40309dg.phx.gbl |
[64.4.8.229] | by1sch40310dg.phx.gbl |
[64.4.8.252] | search.msn-int-tr.com |
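A snapshot like the one above could be regenerated with something along these lines. This is a sketch, not the script I actually used: it assumes the block is the /24 containing the addresses in the table, the names `ptr_or_none`, `ptr_table` and `as_table` are illustrative, and it simply skips addresses with no PTR record.

```python
import ipaddress
import socket

def ptr_or_none(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None

def ptr_table(cidr, reverse=ptr_or_none):
    """Return (ip, hostname) pairs for host addresses in cidr that have a PTR."""
    rows = []
    for addr in ipaddress.ip_network(cidr).hosts():
        name = reverse(str(addr))
        if name is not None:
            rows.append((str(addr), name))
    return rows

def as_table(rows):
    """Format the pairs in the same two-column layout as the table above."""
    lines = ["IP Addresses | Reverse DNS mappings |", "---|---|"]
    lines += [f"[{ip}] | {name} |" for ip, name in rows]
    return "\n".join(lines)
```

Running `print(as_table(ptr_table("64.4.8.0/24")))` today would very likely give different results, since the records reflect a single point in time.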
crawler.bloglines.com now forward and reverse resolves to 65.214.44.29.
Posted in Webmastery
Copyright 2002–2011 Artific Consulting LLC.
Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.