DNSSEC Interlude 3: Cache Wars
DJB responds! Not in full — which I’m sure he’ll do eventually, and which I genuinely look forward to — but to my data regarding the increase in authoritative server load that DNSCurve will cause. Here is a link to his post:
What’s going on is as follows: I argue that, since cache hit rates of 80-99% are seen on intermediate caches, bypassing those caches will by necessity create at least a 5x-100x increase in traffic to authoritative servers. This is because traffic that was once serviced out of cache will now need to transit all the way to the authoritative server. (It’s at least that much, because it appears each cache miss requires multiple queries to service NS records and the like.)
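To make the arithmetic explicit (my own back-of-the-envelope sketch, not anyone's measurement): if a shared cache answers some fraction of queries, only the misses reach the authoritative today, so losing that cache multiplies authoritative load by roughly 1/(1 - hit rate).

# Back-of-the-envelope check of the 5x-100x figure. Assumption: every
# query that used to be a shared-cache hit now travels all the way to
# the authoritative server.
def authoritative_load_multiplier(hit_rate):
    """Queries reaching the authoritative, after vs. before losing the cache."""
    return 1.0 / (1.0 - hit_rate)

for hit_rate in (0.80, 0.93, 0.99):
    print(f"{hit_rate:.0%} hit rate -> {authoritative_load_multiplier(hit_rate):.0f}x authoritative load")
# 80% -> 5x, 93% -> 14x, 99% -> 100x -- and that's before counting the
# extra NS-chasing queries each miss triggers.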
DJB replies: no, DNSCurve allows a local cache on each machine, so the hit rates above won’t actually translate into the full authoritative load increase.
Problem is, there’s already massive local caching going on, and 80-99% is still the hit rate at the intermediates! Most DNS lookups come from web browsers, effectively all of which cache DNS records. Your browser does not repeatedly hammer DNS for every image it retrieves! It is in fact these very DNS caches that one needs to work around in the case of DNS Rebinding attacks (which, by the way, still work against HTTP).
But, even if the browsers weren’t caching, the operating system caches extensively as well. Here’s Windows, displaying its DNS Client cache:
$ ipconfig /displaydns
Windows IP Configuration
Record Name . . . . . : linux.die.net
Record Type . . . . . : 5
Time To Live . . . . : 5678
Data Length . . . . . : 8
Section . . . . . . . : Answer
CNAME Record . . . . : sites.die.net
So, sure, DNSCurve lets you run a cache locally. But that local cache gains you nothing against the status quo, because we’re already caching locally.
Look. I’ve been very clear. I think Curve25519 occupies an unserviced niche in our toolbox, and will let us build some very interesting network protocols. But in the case of DNS, using it as our underlying cryptographic mechanism means we lose the ability to assert the trustworthiness of data to anyone else. That precludes cross-user caching — at least, it precludes it if we want to maintain end-to-end trust, which I’m just not willing to give up.
I can’t imagine DJB considers it optional either.
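To illustrate the distinction I keep drawing, here's a minimal sketch of the two trust models. This is not either protocol's actual wire format: the record contents are made up, it needs the third-party "cryptography" package, and the HMAC is just a stand-in for DNSCurve's real authenticated-encryption box.

# Sketch: why offline signatures permit cross-user caching and channel
# security does not. Names and record contents are illustrative only.
import hashlib, hmac, os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# DNSSEC-style object security: the zone signs the record itself, offline.
zone_key = Ed25519PrivateKey.generate()       # held by the zone, never by any cache
zone_pub = zone_key.public_key()              # published, chained from the parent
rrset = b"www.example.com. 300 IN A 192.0.2.1"
rrsig = zone_key.sign(rrset)                  # the signature travels WITH the data

# A shared cache can hand (rrset, rrsig) to any client, and that client
# verifies against the zone's public key without trusting the cache at all:
zone_pub.verify(rrsig, rrset)                 # raises InvalidSignature if tampered

# DNSCurve-style channel security: authenticity lives in the session.
# (HMAC here stands in for DNSCurve's actual Curve25519/Salsa20 box.)
session_key = os.urandom(32)                  # known only to this resolver/server pair
tag = hmac.new(session_key, rrset, hashlib.sha256).digest()
# Only the party holding session_key can check this tag. If a shared cache
# re-serves the answer, its clients receive bare data with nothing
# end-to-end left to verify -- they simply have to trust the cache.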
It’s probably worth taking a moment to discuss our sources of data. My caching data comes from multiple sources — the CCC network, XS4All, a major North American ISP, one of the largest providers to major ISPs (each of the five 93% records was backing 3-4M users!), and one of the biggest single resolver operators in the world.
DJB’s data comes from a lab experiment.
Now, don’t get me wrong. I’ve done lots of experiments myself, and I’ve drawn serious conclusions from them. (I’ve also been wrong. Like, recently.) But I took a look at the experimental notes that DJB (to his absolute credit!) posted.
So, the 1.15x increase was measured across 140 stub resolvers. Not 140M. 140.
And he didn’t even move all 140 stubs to 140 caches. No, he moved them from 1 cache to 10 caches.
Finally, the experiment was run for all of 15 minutes.
[QUICK EDIT: Also, it’s unclear whether the stub caches themselves were cleared. Probably not, since anything that work-intensive is always bragged about.]
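Why does the population size matter so much? Because shared-cache hit rates climb with the number of users behind the cache. Here's a toy sketch of that effect, under my own assumptions (Zipf-ish name popularity over 100,000 names, TTLs that outlast the window); the numbers it prints are illustrative, not anyone's measurements.

# Toy model: shared-cache hit rate as a function of population size.
# Assumptions are mine, not DJB's experiment or my survey data.
import random

def shared_cache_hit_rate(users, queries_per_user=50, names=100_000):
    weights = [1.0 / rank for rank in range(1, names + 1)]   # Zipf(s=1) popularity
    stream = random.choices(range(names), weights=weights, k=users * queries_per_user)
    cached, hits = set(), 0
    for name in stream:
        if name in cached:
            hits += 1
        else:
            cached.add(name)          # first lookup misses; later ones hit
    return hits / len(stream)

for users in (140, 1_400, 14_000):
    print(f"{users:>6} users -> {shared_cache_hit_rate(users):.0%} shared-cache hit rate")
# The hit rate -- and therefore the cost of losing the shared cache --
# grows with the population, which is why 140 stubs says little about
# a resolver backing millions of users.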
I’m not saying there’s experimental bias here, though the document is a pretty strong position piece for running a local resolver. (The single best piece of data in the doc: There are situations where the cache is 300ms away while the authoritative is 10ms away. I did not know this.)
But I think it’s safe to say DJB’s 1.15x load increase claim is thoroughly debunked.