DNSSEC Interlude 2: DJB@CCC
Short Version: Dan Bernstein delivered a talk at the 27C3 about DNSSEC and his vision for authenticating and encrypting the net. While it is gratifying to see such consensus regarding both the need to fix authentication and encryption, and the usefulness of DNS to implement such a fix, much of his representation of DNSSEC — and his own replacement, DNSCurve — was plainly inaccurate. He attacks a straw man implementation of DNSSEC that must sign records offline, despite all major DNSSEC servers moving to deep automation to eliminate administrator errors, and despite the existence of Phreebird, my online DNSSEC signing proxy specifically designed to avoid the faults he identifies.
DJB complains about NSEC3’s impact on the privacy of domain names, despite the notably weak privacy guarantees on public DNS names themselves, and more importantly, the code in Phreebird that dynamically generates NSEC3 records thus completely defeating GPU hashcracking. He complains about DNSSEC as a DDoS bandwidth amplifier, while failing to mention that the amplification issues are inherited from DNS. I observe his own site, cr.yp.to, to be a 6.4x bandwidth amplifier, and the worldwide network of open recursive servers to be infinitely more exploitable even without DNSSEC. DJB appeared unaware that DNSSEC could be leveraged to offer end to end semantics, or that constructions existed to use secure offline records to authenticate protocols like HTTPS. I discuss implementations of both in Phreebird.
From here, I analyze Curve25519, and find it a remarkably interesting technology with advantages in size and security. His claims regarding instantaneous operation are a bit of an exaggeration though; initial benchmarking puts Curve25519 at about 4-8x the speed of RSA1024. I discuss the impact of DNSCurve on authoritative servers, which DJB claims to be a mere 1.15x increase in traffic. I present data from a wide variety of sources, including the 27C3 network, demonstrating that double and even triple digit traffic increases are in fact likely, particularly to TLDs. I also observe that DNSCurve has unavoidable effects on query latency, server CPU and memory, and key risk management. The argument that DNSSEC doesn’t sign enough, because a signature on .org doesn’t necessarily sign all of wikipedia.org, is shown to be specious, in that any delegated namespace with unsigned children (including in particular DNSCurve) must have this characteristic.
I move on to discussing end to end protocols. CurveCP is seen as interesting and highly useful, particularly if it integrates a lossy mode. However, DJB states that CurveCP will succeed where HTTPS has failed because CurveCP’s cryptographic primitive (Curve25519) is faster. Google is cited as an organization that has not been able to deploy HTTPS because the protocol is too slow. Actual source material from Google is cited, directly refuting DJB’s assertion. It is speculated that the likely cause of Google’s sudden deployment of HTTPS on GMail was an attack by a foreign power, given that the change was deployed 24 hours after disclosure of the attack. Other causes for HTTPS’s relative rarity are cited, including the large set of servers that need to be simultaneously converted, the continuing inability to use HTTPS for virtual hosted sites, and the need for interaction with third party CAs.
We then proceed to explore DJB’s model for key management. Although DJB has a complex system in DNSCurve for delegated key management, he finds himself unable to trust either the root or TLDs. Without these trusted third parties, his proposal devolves to the use of “Nym” URLs like http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com to bootstrap key acquisition for Twitter. He suggests that perhaps we can continue to use at least the TLDs to determine IP addresses, as long as we integrate with an as-yet unknown P2P DNS system as well.
I observe this is essentially a walk of Zooko’s Triangle, and does not represent an effective or credible solution to what we’ve learned is the hardest problem at the intersection of security and cryptography: Key Management. I conclude by pointing out that DNSSEC does indeed contain a coherent, viable approach to key management across organizational boundaries, while this talk — alas — does not.
So, there was a talk at the 27th Chaos Communication Congress about DNSSEC after all! Turns out Dan Bernstein, better known as DJB, is not a fan.
That’s tragic, because I’m fairly convinced he could do some fairly epic things with the technology.
Before I discuss the talk, there are three things I’d like to point out. First, you should see the talk! It’s actually a pretty good summary of a lot of latent assumptions that have been swirling around DNSSEC for years — assumptions, by the way, that have been held as much by defenders as detractors. Here are links:
Second, I have a tremendous amount of respect for Dan Bernstein. It was his fix to the DNS that I spent six months pushing, to the exclusion of a bunch of other fixes which (to be gentle) weren’t going to work. DJB is an excellent cryptographer.
And, man, I’ve been waiting my entire career for Curve25519 to come out.
But security is bigger than cryptography.
The third thing I’d like to point out is that, with DJB’s talk, a somewhat surprising consensus has taken hold between myself, IETF WG’s, and DJB. Essentially, we all agree that:
1) Authentication and encryption on the Internet is broken,
2) Fixing both would be a Big Deal,
3) DNS is how we’re going to pull this off.
Look, we might disagree about the finer details, but that’s a fairly incredible amount of consensus across the engineering spectrum: There’s a problem. We have to fix it. We know what we’re going to use to fix it.
That all identified, let’s discuss the facts.
Well, this document got a bit larger than expected (understatement), so here’s a list of section headings.
DNSSEC’s Problem With Key Rotation Has Been Automated Away
DNSSEC Is Not Necessarily An Offline Signer — In Fact, It Works Better Online!
DNS Leaks Names Even Without NSEC3 Hashes
NSEC3 “White Lies” Entirely Eliminate The NSEC3 Leaking Problem
DNSSEC Amplification is not a DNSSEC bug, but an already existing DNS, UDP, and IP Bug
DNSSEC Does In Fact Offer End To End Resolver Validation — Today
DNSSEC Bootstraps Key Material For Protocols That Desperately Need It — Today
Curve25519 Is Actually Pretty Cool
Limitations of Curve25519
DNSCurve Destroys The Caching Layer. This Matters.
DNSCurve requires the TLDs to use online signing
DNSCurve increases query latency
DNSCurve Also Can’t Sign For Its Delegations
What About CurveCP?
HTTPS Has 99 Problems But Speed Ain’t One
There Is No “On Switch” For HTTPS
HTTPS Certificate Management Is Still A Problem!
The Biggest Problem: Zooko’s Triangle
The Bottom Line: It Really Is All About Key Management
From slide 48 to slide 53, DJB discusses what would appear to be a core limitation of DNSSEC: Its use of offline signing. According to the traditional model for DNSSEC, keys are kept in a vault offline, only pulled out during those rare moments where a zone must change or be resigned. He observes a host of negative side effects, including an inability to manage dynamic domains, increased memory load from larger zones, and difficulties manually rotating signatures and keys.
But these are not limitations to DNSSEC as a protocol. They’re implementation artifacts, no more inherent to DNSSEC than publicfile’s inability to support PHP. (Web servers were not originally designed to support dynamic content, the occasional cgi-bin notwithstanding. So, we wrote better web servers!) Key rotation is the source of some enormous portion of DNSSEC’s historical deployment complexity, and as such pretty much every implementation has automated it, even at the cost of having the “key in a vault”. See production-ready systems by:
So, on a purely factual basis, the implication that DNSSEC creates an “administrative disaster” under conditions of frequent resigning is false. Everybody’s automating. Everybody. But his point that offline signing is problematic for datasets that change with any degree of frequency is in fact quite accurate.
But who says DNSSEC needs to be an offline signer, signing all its records in advance?
Online signing has long been an “ugly duckling” of DNSSEC. In online signing, requests are signed “on demand” — the keys are pulled out right when the response is generated, and the appropriate imprimatur is applied. While this seems scary, keep in mind this is how SSL, SSH, IPSec, and most other crypto protocols function. DJB’s own DNSCurve is an online signer.
PGP/GPG are not online signers. They have dramatic scalability problems, as nobody says aloud but we all know. (To the extent they will be made scalable, they will most likely be retrieving and refreshing key material via DNSSEC.)
Online signing has history with DNSSEC. RFC4470 and RFC4471 discuss the precise ways and means online signing can integrate with the original DNSSEC. Bert Hubert’s been working on PowerDNSSEC for some time, which directly integrates online signing into very large scale hosting.
And then there’s Phreebird. Phreebird is my DNSSEC proxy; I’ve been working on it for some time. Phreebird is an online signer that operates, “bump in the wire” style, in front of existing DNS infrastructure. (This should seem familiar; it’s the deployment model suggested by DJB for DNSCurve.) Got a dynamic signer, that changes up responses according to time of day, load, geolocation, or the color of its mood ring? I’ve been handling that for months. Worried about unpredictable requests? Phreebird isn’t, it signs whatever responses go by. If new responses are sent, new signatures will be generated. Concerned about preloading large zones? Don’t be, Phreebird’s cache can be as big or as little as you want it to be. It preloads nothing.
As for key and signature rotation, Phreebird can be trivially modified to sign everything with a 5 minute signature — in case you’re particularly concerned with replay. Online signing really does make things easier.
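To make the on-demand model concrete, here is a minimal sketch of a signer that signs responses as they go by, keeps a bounded cache, and stamps each signature with a five minute lifetime. This is not Phreebird's code: the `OnlineSigner` class, its parameters, and the keyed hash standing in for a real RSA signature are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from collections import OrderedDict


class OnlineSigner:
    """Toy bump-in-the-wire signer: sign on demand, cache a bounded number of results."""

    def __init__(self, key: bytes, max_entries: int = 1024, validity: int = 300):
        self.key = key
        self.max_entries = max_entries   # cache size is a tunable; nothing is preloaded
        self.validity = validity         # signature lifetime in seconds (5 minutes)
        self.cache = OrderedDict()       # response bytes -> (expiry, signature)

    def _sign(self, response: bytes, expiry: int) -> bytes:
        # Placeholder only: a keyed hash standing in for a real DNSSEC signature.
        msg = expiry.to_bytes(8, "big") + response
        return hmac.new(self.key, msg, hashlib.sha256).digest()

    def sign(self, response: bytes, now=None):
        """Return (expiry, signature), reusing a cached signature while it is valid."""
        now = int(time.time()) if now is None else now
        hit = self.cache.get(response)
        if hit and hit[0] > now:
            self.cache.move_to_end(response)   # still valid: reuse it
            return hit
        expiry = now + self.validity
        entry = (expiry, self._sign(response, expiry))
        self.cache[response] = entry
        self.cache.move_to_end(response)
        while len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)     # evict the least recently used entry
        return entry
```

A response never seen before costs one signing operation; a repeat within the validity window costs a dictionary lookup. Shrinking `validity` narrows the replay window at the price of more signing.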
Phreebird can also deal with NSEC3’s “leaking”. Actually, it does so by default.
In the beginning, DNSSEC’s developers realized they needed a feature called Authoritative Nonexistence: a way of saying, “The record you are looking for does not exist, and I can prove it”. This posed a problem. While there are usually a finite number of names that do exist, the number of non-existent names is effectively infinite. Unique proofs of nonexistence couldn’t be generated for all of them, but the designers really wanted to avoid requiring online signing. (Just because we can sign online, doesn’t mean we should have to sign online.) So they said they’d sign ranges: say, that there were no names between Alice and Bob, and Bob and Charlie, and Charlie and David.
This wasn’t exactly appreciated by Alice, Bob, Charlie, and David — or, at least, the registries that held their domains. So, instead, Alice, Bob, Charlie, and David’s names were all hashed — and then the statement became, there are no names between H1 (the hash of Charlie) and H2 (the hash of Alice). That, in a nutshell, is NSEC3.
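For readers who want to see the hashing step, the sketch below follows the RFC 5155 construction as I understand it: salted, iterated SHA-1 over the lowercase wire-format name, rendered in base32hex. The helper names (`wire_name`, `nsec3_hash`, `nsec3_label`, `covers`) and the empty-salt, zero-iteration defaults are assumptions chosen for brevity, not anything a particular server ships.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto base32hex, which NSEC3 owner names use.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567", b"0123456789ABCDEFGHIJKLMNOPQRSTUV"
)


def wire_name(name: str) -> bytes:
    """Encode a domain name in lowercase DNS wire format (length-prefixed labels)."""
    out = b""
    for label in name.lower().strip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def nsec3_hash(name: str, salt: bytes = b"", iterations: int = 0) -> bytes:
    """Salted, iterated SHA-1 over the wire-format name."""
    digest = hashlib.sha1(wire_name(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return digest


def nsec3_label(digest: bytes) -> str:
    """Render a hash the way it appears as a DNS label: base32hex, lowercase."""
    return base64.b32encode(digest).translate(_B32_TO_B32HEX).decode("ascii").lower()


def covers(h1: bytes, h2: bytes, target: bytes) -> bool:
    """True if target lies strictly between h1 and h2, wrapping past the end of the ring."""
    if h1 < h2:
        return h1 < target < h2
    return target > h1 or target < h2
```

With the hashes of the existing names sorted into a ring, a nonexistent name is "proven" absent by the one adjacent pair for which `covers` returns true.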
DJB observes, correctly, that there’s only so much value one gets from hashing names. He quotes Ruben Niederhagen as being able to crack through 1.7 trillion names a day, for example.
This is imposing, until one realizes that at 1000 queries a second, an attacker can sweep a TLD through about 86,400,000 queries a day. Granted: This is not as fast, but it’s much faster than your stock SSHD brute force, and we see substantial evidence of those all the time.
Consider: Back in 2009, DJB used hashcracking to answer Frederico Neves’ challenge to find domains inside of sec3.br. You can read about this here. DJB stepped up to the challenge, and the domains he (and Tanja Lange) found were:
douglas, pegasus, rafael, security, unbound, while42, zz--zz
It does not take 1.7T hashcracks to find these names. It doesn’t even take 86.4M. Domain names are many things; compliant with password policies is not one of them. Ultimately, the problem with assuming privacy in DNS names is that they’re being published openly to the Internet. What we have here is not a bright line differential, but a matter of degree.
If you want your names completely secret, you have to put them behind the split horizon of a corporate firewall — as many people do. Most CorpNet DNS is private.
However, suppose you want the best of both worlds: Secret names, that are globally valid (at the accepted cost of online brute-forceability). Phreebird has you taken care of — by generating what can be referred to as NSEC3 White Lies. H1 — the hash of Charlie — is a number. It’s perfectly valid to say:
There are no records with a hash between H1-1 and H1+1.
Since there’s only one number between x-1 and x+1 (x itself, or in this case the hash of Charlie), authoritative nonexistence is validated without leaking the actual hash after H1.
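The range arithmetic is small enough to sketch. The function below treats a 160-bit hash as an integer and returns its immediate neighbours, wrapping at the ends of the hash space. The name `white_lie_range` is hypothetical, and the plain SHA-1 of a string in the usage lines is only a stand-in for a real NSEC3 hash; how Phreebird does this internally is not shown here.

```python
import hashlib

HASH_BITS = 160            # SHA-1 output size, as used by NSEC3
MODULUS = 1 << HASH_BITS


def white_lie_range(query_hash: bytes) -> tuple:
    """Return (H-1, H+1) as 20-byte strings, wrapping around the hash space."""
    h = int.from_bytes(query_hash, "big")
    lower = (h - 1) % MODULUS
    upper = (h + 1) % MODULUS
    return lower.to_bytes(20, "big"), upper.to_bytes(20, "big")


# Stand-in digest for illustration; a real signer would use the NSEC3 hash of the name.
h1 = hashlib.sha1(b"charlie").digest()
lower, upper = white_lie_range(h1)
print(lower.hex(), "<", h1.hex(), "<", upper.hex())
```

Because the range is computed per query, it has to be signed online; that is the trade being made for not revealing any neighbouring hashes.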
Probably the most quoted comment of DJB was the following:
“So what does this mean for distributed denial of service amplification, which is the main function of DNSSEC”
Generally, what he’s referring to is that an attacker can:
- Spoof the source of a DNSSEC query as some victim target (32 byte query + 8 byte UDP header + 20 byte IP header = 60 byte packet, raised to 64 bytes due to the minimum transmission unit)
- Make a request that returns a lot of data (say, 2K, creating a 32x amplification)
- GOTO 1
It’s not exactly a complicated attack. However, it’s not a new attack either. Here’s a SANS entry from 2009, discussing attacks going back to 2006. Multi gigabit floods bounced off of DNS servers aren’t some terrifying thing from the future; they’re just part of the headache that is actively managing an Internet with ***holes in it.
There is an interesting game, one that I’ll definitely be writing about later, called “Whose bug is it anyway?”. It’s easy to blame DNSSEC, but there’s a lot of context to consider.
Ultimately, the bug is IP’s, since IP (unlike many other protocols) allows long distance transit of data without a server explicitly agreeing to receive it. IP effectively trusts its applications to “play nice” — shockingly, this was designed in the 80’s. UDP inherits the flaw, since it’s just a thin application wrapper around IP.
But the actual fault lies in DNS itself, as the amplification actually begins in earnest under the realm of simple unencrypted DNS queries. During the last batch of gigabit floods, the root servers were used — their returned values were 330 bytes out for every 64 byte packet in. Consider though the following completely randomly chosen query:
# dig +trace cr.yp.to any
cr.yp.to. 600 IN MX 0 a.mx.cr.yp.to.
cr.yp.to. 600 IN MX 10 b.mx.cr.yp.to.
cr.yp.to. 600 IN A 184.108.40.206
yp.to. 259200 IN NS a.ns.yp.to.
yp.to. 259200 IN NS uz5uu2c7j228ujjccp3ustnfmr4pgcg5ylvt16kmd0qzw7bbjgd5xq.ns.yp.to.
yp.to. 259200 IN NS b.ns.yp.to.
yp.to. 259200 IN NS f.ns.yp.to.
yp.to. 259200 IN NS uz5ftd8vckduy37du64bptk56gb8fg91mm33746r7hfwms2b58zrbv.ns.yp.to.
;; Received 414 bytes from 220.127.116.11#53(f.ns.yp.to) in 32 ms
So, the main function of Dan Bernstein’s website is to provide a 6.4x multiple to all DDoS attacks, I suppose?
I keed, I keed. Actually, the important takeaway is that practically every authoritative server on the Internet provides a not-insubstantial amount of amplification. Taking the top 1000 QuantCast names (minus the .gov stuff, just trust me, they’re their own universe of uber-weird; I wouldn’t judge X.509 on the Federal Bridge CA), we see:
- An average ANY query returns 413.9 bytes (seriously!)
- Almost half (460) return 413 bytes or more
So, the question then is not “what is the absolute amplification factor caused by DNSSEC”, it’s “What is the amplification factor caused by DNSSEC relative to DNS?”
It ain’t 90x, I can tell you that much. Here is a query for www.pir.org ANY, without DNSSEC:
www.pir.org. 300 IN A 18.104.22.168
pir.org. 300 IN NS ns1.sea1.afilias-nst.info.
pir.org. 300 IN NS ns1.mia1.afilias-nst.info.
pir.org. 300 IN NS ns1.ams1.afilias-nst.info.
pir.org. 300 IN NS ns1.yyz1.afilias-nst.info.
;; Received 329 bytes from 22.214.171.124#53(ns1.sea1.afilias-nst.info) in 90 ms
And here is the same query, with DNSSEC:
www.pir.org. 300 IN A 126.96.36.199
www.pir.org. 300 IN RRSIG A 5 3 300 20110118085021 20110104085021 61847 pir.org. n5cv0V0GeWDPfrz4K/CzH9uzMGoPnzEr7MuxPuLUxwrek+922xiS3BJG NfcM9nlbM5GZ5+UPGv668NJ1dx6oKxH8SlR+x3d8gvw2DHdA51Ke3Rjn z+P595ZPB67D9Gh6l61itZOJexwsVNX4CYt6CXTSOhX/1nKzU80PVjiM wg0=
pir.org. 300 IN NS ns1.mia1.afilias-nst.info.
pir.org. 300 IN NS ns1.yyz1.afilias-nst.info.
pir.org. 300 IN NS ns1.ams1.afilias-nst.info.
pir.org. 300 IN NS ns1.sea1.afilias-nst.info.
pir.org. 300 IN RRSIG NS 5 2 300 20110118085021 20110104085021 61847 pir.org. IIn3FUnmotgv6ygxBM8R3IsVv4jShN71j6DLEGxWJzVWQ6xbs5SIS0oL OA1ym3aQ4Y7wWZZIXpFK+/Z+Jnd8OXFsFyLo1yacjTylD94/54h11Irb fydAyESbEqxUBzKILMOhvoAtTJy1gi8ZGezMp1+M4L+RvqfGze+XFAHN N/U=
;; Received 674 bytes from 188.8.131.52#53(ns1.yyz1.afilias-nst.info) in 26 ms
About a 2x increase. Not perfect, but not world ending. Importantly, it’s nowhere close to the actual problem:
DNS comes from 1983, when the load of running around the Internet mapping names to numbers was actually fairly high. As such, it acquired a caching layer — a horde of BIND, Nominum, Unbound, MSDNS, PowerDNS, and other servers that acted as a middle layer between the masses of clients and the authoritative servers of the Internet.
At any given point, there are between three and twelve million IP addresses on the Internet that operate as caching resolvers, and will receive requests from and send arbitrary records to any IP on the Internet.
Arbitrary attacker controlled records.
;; Query time: 5 msec
;; SERVER: 184.108.40.206#53(220.127.116.11)
;; WHEN: Tue Jan 4 11:10:59 2011
;; MSG SIZE rcvd: 3641
That’s a 3.6KB response to a 64 byte request, no DNSSEC required. I’ve been saying this for a while: DNSSEC is just DNS with signatures. Whose bug is it anyway? Well, at least some of those servers are running DJB’s dnscache…
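To put the response sizes quoted in this section side by side, here is a small calculation. The sizes are the ones in the text; the 64 byte request size is the figure used in the attack description, and treating "2K" as 2048 bytes is my assumption.

```python
REQUEST_BYTES = 64  # 32 byte query + 8 byte UDP header + 20 byte IP header, padded to 64

# Response sizes quoted in the text, in bytes.
responses = {
    "hypothetical 2K answer": 2048,
    "root servers (earlier floods)": 330,
    "cr.yp.to ANY": 414,
    "www.pir.org ANY, no DNSSEC": 329,
    "www.pir.org ANY, with DNSSEC": 674,
    "open recursive resolver": 3641,
}


def amplification(response_bytes: int, request_bytes: int = REQUEST_BYTES) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    return response_bytes / request_bytes


for label, size in responses.items():
    print(f"{label:32s} {size:5d} bytes  {amplification(size):5.1f}x")

# DNSSEC's cost relative to plain DNS for the same name.
dnssec_relative = (
    responses["www.pir.org ANY, with DNSSEC"] / responses["www.pir.org ANY, no DNSSEC"]
)
print(f"DNSSEC relative to DNS: {dnssec_relative:.2f}x")
```

The last line is the number that matters for the DNSSEC argument: it compares signed against unsigned answers for the same name, rather than against the size of the request.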
So, what do we do about this? One option is to attempt further engineering: There are interesting tricks that can be run with ICMP to detect the flooding of an unwilling target. We could also have a RBL — realtime blackhole list — of IP addresses that are under attack. This would get around the fact that this attack is trivially distributable.
Another approach is to require connections that appear “sketchy” to upgrade to TCP. There’s support in DNS for this — the TC bit — and it’s been deployed with moderate success by at least one major DNS vendor. There’s some open questions regarding the performance of TCP in DNS, but there’s no question that kernels nowadays are at least capable of being much faster.
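As a rough sketch of the TC-bit approach: a server that distrusts a UDP source can answer with an empty, truncated reply, which tells a legitimate resolver to retry over TCP (where the source address can't be trivially spoofed). The function names below are hypothetical, the code assumes a single uncompressed question, and deciding which sources count as "sketchy" is left out entirely.

```python
import struct

QR = 0x8000      # message is a response
OPCODE = 0x7800  # opcode field mask
TC = 0x0200      # truncated: retry over TCP
RD = 0x0100      # recursion desired


def question_end(msg: bytes, offset: int = 12) -> int:
    """Return the offset just past the first question (name, type, class)."""
    while msg[offset] != 0:
        offset += 1 + msg[offset]   # skip one length-prefixed label
    return offset + 1 + 4           # root label, then 2 byte type and 2 byte class


def truncated_reply(query: bytes) -> bytes:
    """Build an answerless reply with TC=1, echoing the query ID and question."""
    ident, flags = struct.unpack("!2H", query[:4])
    reply_flags = QR | TC | (flags & (OPCODE | RD))
    header = struct.pack("!6H", ident, reply_flags, 1, 0, 0, 0)
    return header + query[12:question_end(query)]
```

The reply is never larger than the query that triggered it, so a spoofed request buys the attacker no amplification.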
It’s an open question what to do here.
Meanwhile, the attackers chuckle. As DJB himself points out, they’ve got access to 2**23 machines — and these aren’t systems that are limited to speaking random obscure dialects of DNS that can be blocked anywhere on path with a simple pattern matching filter. These are actual desktops, with full TCP stacks, that can join IRC channels and be told to flood the financial site of the hour, right down to the URL!
If you’re curious why we haven’t seen more DNS floods, it might just be because HTTP floods work a heck of a lot better.
On Slide 36, DJB claims the following is possible:
Bob views Alice’s web page on his Android phone. Phone asked hotel DNS cache for web server’s address. Eve forged the DNS response! DNS cache checked DNSSEC but the phone didn’t.
This is true as per the old model of DNSSEC, which inherits a little too much from DNS. As per the old model, the only nodes that participate in record validation are full-on DNS servers. Clients, if they happen to be curious whether a name was securely resolved, have to simply trust the “AD” bit attached to a response.
I think I can speak for the entire security community when I say: Aw, hell no.
The correct model of DNSSEC is to push enough of the key material to the client, that it can make its own decisions. (If a client has enough power to run TLS, it has enough power to validate a DNSSEC chain.) In my Domain Key Infrastructure talk, I discuss four ways this can be done. But more importantly, in Phreebird I actually released mechanisms for two of them — chasing, where you start at the bottom of a name and work your way up, and tracing, where you start at the root and work your way down.
Chasing works quite well, and importantly, leverages the local cache. Here is Phreebird’s output from a basic chase command:
|---www.pir.org. (A)
|---pir.org. (DNSKEY keytag: 61847 alg: 5 flags: 256)
|---pir.org. (DNSKEY keytag: 54135 alg: 5 flags: 257)
|---pir.org. (DS keytag: 54135 digest type: 2)
| |---org. (DNSKEY keytag: 1743 alg: 7 flags: 256)
| |---org. (DNSKEY keytag: 21366 alg: 7 flags: 257)
| |---org. (DS keytag: 21366 digest type: 2)
| | |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
| | |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
| |---org. (DS keytag: 21366 digest type: 1)
| |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
| |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
|---pir.org. (DS keytag: 54135 digest type: 1)
|---org. (DNSKEY keytag: 1743 alg: 7 flags: 256)
|---org. (DNSKEY keytag: 21366 alg: 7 flags: 257)
|---org. (DS keytag: 21366 digest type: 2)
| |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
| |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
|---org. (DS keytag: 21366 digest type: 1)
|---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
|---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
Chasing isn’t perfect. One of the things Paul Vixie and I have been talking about is what I refer to as SuperChase, encoded by setting both CD=1 (Checking Disabled) and RD=1 (Recursion Desired). Effectively, there are a decent number of records between www.pir.org and the root. With the advent of sites like OpenDNS and Google DNS, that might represent a decent number of round trips. As a performance optimization, it would be good to eliminate those round trips, by allowing a client to say “please fill your response with as many packets as possible, so I can minimize the number of requests I need to make”.
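For concreteness, this is what setting those two header bits looks like on the wire. The sketch only builds a query with CD=1 and RD=1; SuperChase is a proposal, so how a resolver would respond to that combination is not defined here. The `build_query` helper is hypothetical, and a real DNSSEC-aware query would also carry an EDNS0 OPT record, which is omitted.

```python
import struct

RD = 0x0100  # Recursion Desired
CD = 0x0010  # Checking Disabled


def build_query(name: str, qtype: int, ident: int = 0, flags: int = RD | CD) -> bytes:
    """Encode a single-question DNS query with the given header flags."""
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 0)
    labels = [part.encode("ascii") for part in name.strip(".").split(".") if part]
    qname = b"".join(bytes([len(part)]) + part for part in labels) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, 1)   # class 1 = IN


query = build_query("www.pir.org", qtype=1)   # qtype 1 = A
print(query.hex())
```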
But the idea that anybody is going to use a DNSSEC client stack that doesn’t provide end to end semantics is unimaginable. After all, we’re going to be bootstrapping key material with this.
On page 44, DJB claims that because DNSSEC uses offline signing, the only way it could be used to secure web pages is if those pages were signed with PGP.
What? There’s something like three independent efforts to use DNSSEC to authenticate HTTPS sessions.
Leaving alone the fact that DNSSEC isn’t necessarily an offline signer, it is one of the core constructions in cryptography to intermix an authenticator with otherwise-anonymous encryption to defeat an otherwise trivial man in the middle attack. That authenticator can be almost anything: a password, a private key, even a stored secret from a previous interaction. But this is a trivial, basic construction. It’s how EDH (Ephemeral Diffie-Hellman) works!
Using keys stored in DNS has been attempted for years. Some mechanisms that come to mind:
- SSHFP — SSH Fingerprints in DNS
- CERT — Certificates in DNS
- DKIM — Domain Keys in DNS
All of these have come under withering criticism from the security community, because how can you possibly trust what you get back from these DNS lookups?
With DNSSEC — even with offline-signed DNSSEC — you can. And in fact, that’s been the constant refrain: “We’ll do this now, and eventually DNSSEC will make it safe.” Intermixing offline signers with online signers is perfectly “legal” — it’s isomorphic to receiving a password in a PGP encrypted email, or sending your SSH public key to somebody via PGP.
So, what I’m shipping today is simple:
www.hospital-link.org IN TXT “v=key1 ha=sha1 h=f1d2d2f924e986ac86fdf7b36c94bcdf32beec15”
That’s an offline-signable blob, and it says “If you use HTTPS to connect to www.hospital-link.org, you will be given a certificate with the SHA1 hash of f1d2d2f924e986ac86fdf7b36c94bcdf32beec15”. As long as the ground truth in this blob can be chained to the DNS root, by the client and not some random name server in a coffee shop, all is well (to the extent SHA-1 is safe, anyway).
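On the client side, the check this record enables is short. The sketch below parses the TXT payload and compares its hash to the certificate the server presented. Two assumptions are mine, not the record format's: that the hash covers the DER encoding of the certificate, and that the TXT record has already been validated up to the root before this code runs. The function names are hypothetical.

```python
import hashlib
import hmac


def parse_key_record(txt: str) -> dict:
    """Split a 'v=key1 ha=sha1 h=...' TXT payload into a dict of fields."""
    return dict(field.split("=", 1) for field in txt.split())


def cert_matches(txt: str, cert_der: bytes) -> bool:
    """Check a server certificate against the hash published in DNS."""
    fields = parse_key_record(txt)
    if fields.get("v") != "key1" or fields.get("ha") != "sha1":
        return False   # unknown version or hash algorithm: refuse rather than guess
    actual = hashlib.sha1(cert_der).hexdigest()
    return hmac.compare_digest(actual, fields.get("h", "").lower())
```

In a real client, `cert_der` would come from the TLS handshake, and a mismatch would abort the connection.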
This is not an obscure process. This is a basic construction. Sure, there are questions to be answered, with a number of us fighting over the precise schema. And that’s OK! Let the best code win.
So why isn’t the best code DNSCurve?
DNSCurve is based on something totally awesome: Curve25519. I am not exaggerating when I say, this is something I’ve wanted from the first time I showed up in Singapore for Black Hat Asia, 9 years ago (you can see vague references to it in the man page for lc). Curve25519 essentially lets you do this:
If I have a 32 byte key, and you have a 32 byte key, and we both know each other’s key, we can mix them to create a 32 byte secret.
32 bytes is very small.
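Here is what that mixing looks like in code: a pure-Python sketch of the X25519 function, following the Montgomery ladder described in RFC 7748. It is for illustration only. It is slow, it is not constant time, and it is not DJB's implementation; anything real should use a vetted library.

```python
import os

P = 2**255 - 19                          # field prime
A24 = 121665                             # (486662 - 2) / 4, from the curve equation
BASE_POINT = (9).to_bytes(32, "little")  # standard base point, u = 9


def x25519(scalar: bytes, u_bytes: bytes) -> bytes:
    """Scalar multiplication on Curve25519 via the Montgomery ladder."""
    k = bytearray(scalar)
    k[0] &= 248          # clamp the secret scalar as the spec requires
    k[31] &= 127
    k[31] |= 64
    k = int.from_bytes(k, "little")
    x1 = int.from_bytes(u_bytes, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:   # conditional swap (branching here is why this is not constant time)
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


# Each side has a 32 byte secret and publishes a 32 byte public key.
alice_secret, bob_secret = os.urandom(32), os.urandom(32)
alice_public = x25519(alice_secret, BASE_POINT)
bob_public = x25519(bob_secret, BASE_POINT)

# Each side mixes its own secret with the other's public key: same 32 byte result.
shared_by_alice = x25519(alice_secret, bob_public)
shared_by_bob = x25519(bob_secret, alice_public)
```

Note what comes out: a secret shared by exactly two parties. Nothing here produces a signature that a third party could check.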
What’s more, Curve25519 is fast(er). Here’s comparative benchmarks from a couple of systems:
SYSTEM 1 (Intel Laptop, Cygwin, curve25519-20050915 from DJB):
RSA1024 OpenSSL sign/s: 520.2
RSA1024 OpenSSL verify/s: 10874.0
Curve25519 operations/s: 4131
SYSTEM 2 (Amazon Small VM, curve25519-20050915):
RSA1024 OpenSSL sign/s: 502.6
RSA1024 OpenSSL verify/s: 11689.8
Curve25519 operations/s: 507.25
SYSTEM 3 (Amazon XLarge VM, using AGL’s code here for 64 bit compliance):
RSA1024 OpenSSL sign/s: 1048.4
RSA1024 OpenSSL verify/s: 19695.4
Curve25519 operations/s: 4922.71
While these numbers are a little lower than I remember them — for some reason, I remember 16K/sec — they’re in line with DJB’s comments here: 500M clients a day is about 5787 new sessions a second. So generally, Curve25519 is about 4 to 8 times faster than RSA1024.
It’s worth noting that, while 4x-8x is significant today, it won’t be within a year or two. That’s because by then we’ll have RSA acceleration via GPU. Consider this tech report — with 128 cores, they were able to achieve over 5000 RSA1024 decryption operations per second. Modern NVIDIA cards have over 512 cores, with across the board memory and frequency increases. That would put RSA speed quite a bit beyond software Curve25519. Board cost? $500.
That being said, Curve25519 is quite secure. Even with me bumping RSA up to 1280-bit in Phreebird by default, I don’t know if I reach the putative level of security in this ECC variant.
The most exciting aspect of Curve25519 is its ability to, with relatively little protocol overhead, create secure links between two peers. The biggest problem with Curve25519 is that it seems to get its performance by sacrificing the capacity to sign once and distribute a message to many parties. This is something we’ve been able to do with pretty much every asymmetric primitive thus far — RSA, DH (via DSA), ECC (via ECDSA), etc. I don’t know whether DJB’s intent is to enforce a particular use pattern, or if this is an actual technical limitation. Either way, Alice can’t sign a message and hand it to Bob, and have Bob prove to Charlie that he received that message from Alice.
In DNSCurve, all requests are unique: they’re basically cryptographic blobs encapsulated in DNS names, sent directly to target name servers or tunneled through local servers, where they are uncacheable due to their uniqueness. (Much to my amusement and appreciation, the DNSCurve guys realized the same thing I did: gotta tunnel through TXT if you want to survive.) So one hundred thousand different users at the same ISP, all looking up the same www.cnn.com address, end up issuing unique requests that make their way all the way to CNN’s authoritative servers.
Under the status quo, and under DNSSEC, all those requests would never leave the ISP, and would simply be serviced locally.
It was ultimately this characteristic of DNSCurve, above all others, that caused me to reject the protocol in favor of DNSSEC. It was my estimation that this would cause something like a 100x increase in load on authoritative name servers.
DJB claims to have measured the effect of disabling caching at the ISP layer, and says the increase is little more than 15%, or 1.15x. He declares my speculation, “wild”. OK, that’s fine, I like being challenged! Let’s take a closer look.
All caches have a characteristic known as the hit rate. For every query that comes in, what are the chances that it will require a lookup to a remote authoritative server, vs. being serviceable from the cache? A hit rate of 50% would imply a 2x increase in load. 75% would imply 4x. 87.5%? 8x.
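The relationship is just 1/(1 - hit rate). A minimal sketch, under the simplifying assumption that each cache miss causes exactly one upstream query and that without the cache every query goes upstream:

```python
def load_multiplier(hit_rate: float) -> float:
    """How many times more queries reach authoritative servers if the cache disappears."""
    if not 0 <= hit_rate < 1:
        raise ValueError("hit rate must be in [0, 1)")
    return 1 / (1 - hit_rate)


for rate in (0.50, 0.75, 0.875):
    print(f"{rate:.1%} hit rate -> {load_multiplier(rate):.0f}x load without the cache")
```

The curve is steep at the high end: every halving of the miss rate doubles the load that the cache is absorbing.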
Inverting the math, for the load differential to be just 15%, that would mean only about 13% of queries were being served from local cache (1/(1-0.13) ≈ 1.15). Do we, in fact, see a 13% hitrate? Let’s see what actual operations people say about their name servers:
In percentages is dat 80 tot 85% hits…Daar kom je ook iets boven de 80%.
–XS4ALL, a major Dutch ISP (80-85% numbers)
I’ve attached a short report from a couple of POP’s at a mid-sized (3-4 M subs) ISP. It’s just showing a couple of hours from earlier this fall (I kind of chose it at random).
It’s extremely consistent across a number of different sites that I checked (not the 93%, but 85-90% cache hits): 0.93028, 0.92592, 0.93094, 0.93061, 0.92741
–Major DNS supplier
I don’t have precise cache hit rates for you, but I can give you this little fun tidbit. If you have a 2-tier cache, and the first tier is tiny (only a couple hundred entries), then you’ll handle close to 50% of the queries with the first cache…
Actually, those #s are old. We’re now running a few K first cache, and have a 70%+ hit rate there.
–Major North American ISP
So, essentially, unfiltered consensus hitrate is about 80-90%, which puts the load increase at 5x-10x in general. Quite a bit less than the 100x I was worried about, right? Well, let’s look at one last dataset:
Dec 27 03:30:25 solaria pdns_recursor: stats: 11133 packet cache entries, 99% packet cache hits
Dec 27 04:00:26 solaria pdns_recursor: stats: 17884 packet cache entries, 99% packet cache hits
Dec 27 04:30:28 solaria pdns_recursor: stats: 22522 packet cache entries, 99% packet cache hits
–Network at 27th Chaos Communication Congress
Well, there’s our 100x numbers. At least (possibly more than) 99% of requests to 27C3’s most popular name server were hosted out of cache, rather than spawning an authoritative lookup.
Of course, it’s interesting to know what’s going on. This quote comes from an enormous supplier of DNS resolutions.
Facebook loads resources from dynamically (seemingly) created subdomains. I’d guess for alexa top 10,000 hit rate is 95% or more. But for other popular domains like Facebook, absent a very large cache, hit rate will be exceptionally low. And if you don’t descriminate cache policy by zone, RBLs will eat your entire cache, moving hit rate very very low.
–Absolutely Enormous Resolution Supplier
Here’s where it becomes clear that there are domains that choose to evade caching — they’ve intentionally engineered their systems to emit low TTLs and/or randomized names, so that requestors can get the most up to date records. That’s fine, but those very nodes are directly impacting the hitrate — meaning your average authoritative is really being saved even more load by the existence of the DNS caching layer.
It gets worse: As Paul Wouters points out, there is not a 1-to-1 relationship between cache misses and further queries. He writes:
You’re forgetting the chain reaction. If there is a cache miss, the server has to do much more then 1 query. It has to lookup NS records, A records, perhaps from parents (perhaps cached) etc. One cache miss does not equal one missed cache hit!
To the infrastructure, every local endpoint cache hit means saving a handful of lookups.
That 99% cache hit rate looks even worse now. Looking at the data, 100x or even 200x doesn’t seem quite so wild, at least compared to the fairly obviously incorrect estimation of 1.15x.
Actually, when you get down to it, the greatest recipients of traffic boost are going to be nodes with a large number of popular domains that are on high TTLs, because those are precisely what:
a) Get resolved by multiple parties, and
b) Stay in cache, suppressing further resolution
Yeah, so that looks an awful lot like records hosted at the TLDs. Somehow I don’t think they’re going to deploy DNSCurve anytime soon.
(And for the record, I’m working on figuring out the exact impact of DNSCurve on each of the TLDs. After all, data beats speculation.)
[UPDATE: DJB challenges this section, citing local caching effects. More info here.]
DNSSEC lets you choose: Online signing, with its ease of use and extreme flexibility. Or offline signing, with the ability to keep keying material in very safe places.
A while back, I had the pleasure of being educated in the finer points of DNS key risk management, by Robert Seastrom of Afilias (they run .org). Let’s just say there are places you want to be able to distribute DNS traffic, without actually having “the keys to the kingdom” physically deployed. As Paul Wouters observed:
Do you know how many instances of rootservers there are? Do you really want that key to live in 250+ places at once? Hell no!
DNSSEC lets you choose how widely to distribute your key material. DNSCurve does not.
Yes, Curve25519 is fast. But nothing in computers is “instantaneous”, as DJB repeatedly insisted Curve25519 is at 27C3. A node that’s doing 5500 Curve25519 operations a second is likely capable of 50,000 to 100,000 DNS queries per second. We’re basically eating 9 to 18 queries’ worth of capacity per Curve25519 op, which makes sense, really, since no asymmetric crypto is ever going to be as easy as burping something out on the wire.
DJB gets around this by presuming large scale caching. But while DNSSEC can cache by record — with cache effectiveness at around 70% for just a few thousand entries, according to the field data — DNSCurve has to cache by session. Each peer needs to retain a key, and the key must be looked up.
Random lookups into a 500M-record database (as DJB cites for .com), on the required millisecond scale, aren’t actually all that easy. Not impossible, of course, but messy.
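The difference between the two caching models can be sketched with a toy simulation. This is only an illustration under stated assumptions: no real cryptography or DNS is involved, the names `simulate`, `record_cache` and `session_cache` are made up, and the per-session entry simply stands in for whatever per-peer key state a server would retain.

```python
from typing import Dict, Tuple


def simulate(num_peers: int, num_names: int) -> Tuple[int, int]:
    """Return (record_cache_entries, session_cache_entries).

    Every peer asks for every name once. The record cache is keyed by the
    question, so one entry serves all peers. The session cache is keyed by
    the peer, so it needs one entry per peer regardless of what is asked.
    """
    record_cache: Dict[Tuple[str, str], bytes] = {}
    session_cache: Dict[str, bytes] = {}

    for peer in range(num_peers):
        peer_key = f"peer-pubkey-{peer}"  # placeholder, not a real public key
        for n in range(num_names):
            question = (f"name{n}.example.", "A")
            # Per-record model: a signed answer is reusable by anyone.
            record_cache.setdefault(question, b"signed-answer")
            # Per-session model: state is tied to who is asking.
            session_cache.setdefault(peer_key, b"per-peer-key-state")

    return len(record_cache), len(session_cache)


if __name__ == "__main__":
    records, sessions = simulate(num_peers=10_000, num_names=50)
    print(f"record cache entries:  {records}")
    print(f"session cache entries: {sessions}")
```

In this model the record cache stops growing once the popular names are in it, while the session cache keeps growing with the number of distinct peers, and every incoming query needs a lookup into it.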
This is unavoidable. Caches allow entire trust chains to be aggregated on network links near users. Even though DNSSEC hasn’t yet built out a mechanism for a client to retrieve that chain in a single query, there’s nothing stopping us on a protocol level from really leveraging local caches.
DNSCurve can never use these local caches. Every query must round-trip between the host and each foreign server in the resolution chain — at least, if end to end trust is to be maintained.
(In the interest of fairness, there are modes of validating DNSSEC on endpoints that bypass local caches entirely and go straight to the root. Unbound, or libunbound as used in Phreebird, will do this by default. It’s important that the DNSSEC community be careful about this, or we’ll have the same fault I’m pointing out in DNSCurve.)
DNSCurve Also Can’t Sign For Its Delegations
If there’s one truly strange argument in DJB’s presentation, it’s on Page 39. Here, he complains that in DNSSEC, .org being signed doesn’t sign all the records underneath .org, like wikipedia.org.
Huh? Wikipedia.org is a delegation, an unsigned one at that. That means the only information that .org can sign is either:
1) Information regarding the next key to be used to sign wikipedia.org records, along with the next name server to talk to.
2) Proof there is no next key, and Wikipedia’s records are unsigned
That’s it. Those are the only choices, because Wikipedia’s IP addresses are not actually hosted by the .org server. DNSCurve either implements the exact same thing, or it’s totally undeployable, because without support for unsigned delegations 100% of your children must be signed in order for you to be signed. I can’t imagine DNSCurve has that limitation.
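The two choices above can be written down as a small sketch. In DNSSEC terms they correspond to a DS record for the child versus a signed proof that no DS record exists; the class and function names below are hypothetical and exist only to show that the parent never signs the child's address records in either case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delegation:
    child: str
    nameservers: List[str]
    # Digest of the child's key as held by the parent; None if unsigned.
    child_key_digest: Optional[str] = None


def parent_assertion(d: Delegation) -> str:
    """What the parent zone can sign about a delegated child.

    Either "here is the next key and the next server to ask", or
    "there is no next key, the child's records are unsigned".
    """
    servers = ", ".join(d.nameservers)
    if d.child_key_digest is not None:
        return (f"signed delegation: {d.child} is signed with key "
                f"{d.child_key_digest}; ask {servers}")
    return (f"signed proof of no key: {d.child} is unsigned; "
            f"ask {servers}")


if __name__ == "__main__":
    # Illustrative values only; the digest and server names are invented.
    signed = Delegation("signed-child.org.", ["ns1.child.example."], "ab12cd34")
    unsigned = Delegation("unsigned-child.org.", ["ns1.other.example."])
    print(parent_assertion(signed))
    print(parent_assertion(unsigned))
```

Note that neither branch mentions an IP address for the child's hosts. That data lives on the child's servers, not the parent's.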
I don’t hate CurveCP. In fact, if there isn’t a CurveCP implementation out within a month, I’ll probably write one myself. If I’ve got one complaint about what I’ve heard about it, it’s that it doesn’t have a lossy mode (trickier than you think — replay protection etc).
CurveCP is seemingly an IPsec clone, over which a TCP clone runs. That’s not actually accurate. See, our inability to do small and fast asymmetric cryptography has forced us to do all these strange things to IP, making stateful connections even when we just wanted to fire and forget a packet. With CurveCP, if you know who you’re sending to, you can just fire and forget packets — the overhead, both in terms of effect on CPU and bandwidth, is actually quite affordable.
This is what I wanted for linkcat almost a decade ago! There are some beautiful networking protocols that could be made with Curve25519. CurveCP is but one. The fact that CurveCP would run entirely in userspace is a bonus — controversial as this may be, the data suggests kernel bound APIs (like IPsec) have a lot more trouble in the field than usermode APIs (like TLS/HTTPS).
There is a weird thing with CurveCP, though. DJB is proposing tunneling it over 53/udp. He’s not saying to route it through DNS proper, i.e. shunting all of CurveCP into the weird formats you have to use when proxying off a name server. He just wants data to move over 53/udp, because he thinks it gets through firewalls easier. Now, I haven’t checked into this in at least six years, but back when I was all into tunneling data over DNS, I actually had to tunnel data over DNS, i.e. fully emulate the protocol. Application Layer Gateways for DNS are actually quite common, as they’re a required component of any captive portal. I haven’t seen recent data, but I’d bet a fair amount that random noise over 53/udp will actually be blocked on more networks than random noise over some higher UDP port.
This is not an inherent aspect of CurveCP, though, just an implementation detail.
Unfortunately, my appreciation of CurveCP has to be tempered by the fact that it does not, in fact, solve our problems with HTTPS. DJB seems thoroughly convinced that the reason we don’t have widespread HTTPS is because of performance issues. He goes so far as to cite Google, which displays less imagery to the user if they’re using https://encrypted.google.com.
Source Material is a beautiful thing. According to Adam Langley, who’s actually the TLS engineer at Google:
If there’s one point that we want to communicate to the world, it’s that SSL/TLS is not computationally expensive any more. Ten years ago it might have been true, but it’s just not the case any more. You too can afford to enable HTTPS for your users.
In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.
If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive any more.
Given that Adam is actually the engineer who wrote the generic C implementation of Curve25519, you’d think DJB would listen to his unambiguous, informed guidance. But no, talk after talk, con after con, DJB repeats the assertion that performance is why HTTPS is poorly deployed — and thus, we should throw out everything and move to his faster crypto function.
(He also suggests that by abandoning HTTPS and moving to CurveCP, we will somehow avoid the metaphorical attacker who has the ability to inject one packet, but not many, and doesn’t have the ability to censor traffic. That’s a mighty fine hair to split. For a talk that is rather obsessed with the goofiness of thinking partial defenses are meaningful, this is certainly a creative new security boundary. Also, non-existent.)
I mentioned earlier: Security is larger than cryptography. If you’ll excuse the tangent, it’s important to talk about what’s actually going on.
The average major website is not one website — it is an agglomeration, barely held together with links, scripts, images, ads, APIs, pages, duct tape, and glue.
For HTTPS to work, everything (except, as is occasionally argued, images) must come from a secure source. It all has to be secure, at the exact same time, or HTTPS fails loudly, and appropriately.
At this point, anyone who’s ever worked at a large company understands exactly, to a T, why HTTPS deployment has been so difficult. Coordination across multiple units is tricky even when there’s money to be made. When there isn’t money — when there’s merely defense against customer data being lost — it’s a lot harder.
The following was written in 2007, and was not enough to make Google start encrypting:
‘The goal is to identify the applications being used on the network, but some of these devices can go much further; those from a company like Narus, for instance, can look inside all traffic from a specific IP address, pick out the HTTP traffic, then drill even further down to capture only traffic headed to and from Gmail, and can even reassemble emails as they are typed out by the user.‘
Now, far be it from me to speculate as to what actually moved Google towards HTTPS, but it would be my suspicion that this had something to do with it:
Google is now encrypting all Gmail traffic from its servers to its users in a bid to foil sniffers who sit in cafes, eavesdropping in on traffic passing by, the company announced Wednesday.
The change comes just a day after the company announced it might pull its offices from China after discovering concerted attempts to break into Gmail accounts of human rights activists.
There is a tendency to blame the business guys for things. If only they cared enough! As I’ve said before, the business guys have blown hundreds of millions on failed X.509 deployments. They cared enough. The problems with existing trust distribution systems are (and I think DJB would agree with me here) not just political, but deeply, embarrassingly technical as well.
Once upon a time, I wanted to write a kernel module. This module would quietly and efficiently add a listener on 443/TCP, that was just a mirror of 80/TCP. It would be like stunnel, but at native speeds.
But what certificate would it emit?
See, in HTTP, the client declares the host it thinks it’s talking to, so the server can “morph” to that particular identity. But in HTTPS, the server declares the host it thinks it is, so the client can decide whether to trust it or not.
This has been a problem, a known problem, since the late nineties. They even built a spec, called SNI (Server Name Indication), that allowed the client to “hint” to the server what name it was looking for.
Didn’t matter. There’s still not enough adoption of SNI for servers to vhost based off of it. So, if you want to deploy TLS, you not only have to get everybody, across all your organizations and all your partners and all your vendors and all your clients to “flip the switch”, but you also have to get them to renumber their networks.
Those are three words a network engineer never, ever, ever wants to hear.
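A toy model shows why the ordering matters. This is a sketch only, with no real TLS involved; the function names, certificate labels and addresses are all invented for illustration. In HTTP the server reads the Host header and then picks a site. In TLS the certificate has to go out before any HTTP is read, so without an SNI hint the only thing the server can key on is the address it was reached on.

```python
from typing import Dict, Optional


def pick_http_vhost(host_header: str, vhosts: Dict[str, str]) -> str:
    """HTTP: the client names the site, so one IP can serve many sites."""
    return vhosts[host_header.lower()]


def pick_tls_certificate(local_ip: str,
                         sni: Optional[str],
                         certs_by_name: Dict[str, str],
                         certs_by_ip: Dict[str, str]) -> str:
    """TLS: choose a certificate before seeing any HTTP.

    With an SNI hint the server can choose by name. Without one it can
    only choose by the IP address the connection arrived on.
    """
    if sni is not None and sni.lower() in certs_by_name:
        return certs_by_name[sni.lower()]
    return certs_by_ip[local_ip]


if __name__ == "__main__":
    vhosts = {"shop.example": "/srv/shop", "blog.example": "/srv/blog"}
    certs_by_name = {"shop.example": "cert-shop", "blog.example": "cert-blog"}
    certs_by_ip = {"192.0.2.10": "cert-shop"}  # one IP, so one default cert

    print(pick_http_vhost("blog.example", vhosts))
    print(pick_tls_certificate("192.0.2.10", "blog.example",
                               certs_by_name, certs_by_ip))
    # A client that sends no SNI gets the shop's certificate even when it
    # wanted the blog, unless the blog is moved to its own IP address.
    print(pick_tls_certificate("192.0.2.10", None,
                               certs_by_name, certs_by_ip))
```

The last call is the renumbering problem in miniature: the only fix available on the server side is a separate address per name.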
And we haven’t even mentioned the fact that acquiring certificates is an out-of-organization acquisition, requiring interactions with outsiders for every new service that’s offered. Empirically, this is a fairly big deal. Devices sure don’t struggle to get IP addresses — but the contortions required to get a globally valid certificate into a device shipped to a company are epic and frankly impossible. To say nothing of what happens when one tries to securely host from a CDN (Content Distribution Network)!
DNSSEC, via its planned mechanisms for linking names to certificate hashes, bypasses this entire mess. A hundred domains can CNAME to the same host, with the same certificate, within the CDN. On arrival, the vhosting from the encapsulated HTTP layer will work as normal. My kernel module will work fine.
‘Bout time we fixed that!
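Here is a rough sketch of what linking names to certificate hashes could look like. It rests on several assumptions of mine: the record type name `CERTHASH` is made up (the real mechanism was still being planned), the zone is a plain dictionary standing in for DNSSEC-validated answers, and the hash is SHA-256 over the raw certificate bytes.

```python
import hashlib
import hmac
from typing import Dict, Optional

Zone = Dict[str, Dict[str, str]]


def cert_fingerprint(cert_der: bytes) -> str:
    """Hex SHA-256 digest of the certificate bytes."""
    return hashlib.sha256(cert_der).hexdigest()


def lookup(name: str, rtype: str, zone: Zone, max_hops: int = 8) -> Optional[str]:
    """Resolve `rtype` at `name`, following CNAME aliases.

    The zone contents are assumed to have been validated already.
    """
    for _ in range(max_hops):
        records = zone.get(name, {})
        if rtype in records:
            return records[rtype]
        if "CNAME" not in records:
            return None
        name = records["CNAME"]
    return None


def cert_matches_dns(cert_der: bytes, published_hash: Optional[str]) -> bool:
    """True if the presented certificate matches the hash found in DNS."""
    if published_hash is None:
        return False
    return hmac.compare_digest(cert_fingerprint(cert_der), published_hash.lower())


if __name__ == "__main__":
    cdn_cert = b"placeholder bytes standing in for a DER certificate"
    zone: Zone = {
        "www.shop.example.": {"CNAME": "edge.cdn.example."},
        "www.blog.example.": {"CNAME": "edge.cdn.example."},
        "edge.cdn.example.": {"CERTHASH": cert_fingerprint(cdn_cert)},
    }
    for site in ("www.shop.example.", "www.blog.example."):
        print(site, cert_matches_dns(cdn_cert, lookup(site, "CERTHASH", zone)))
```

Both customer names resolve through their CNAME to the same published hash, so the CDN host can present one certificate for all of them.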
So. When I say security is larger than cryptography, it’s not that I’m saying cryptography is small. I’m saying that security, actual effective security, requires being worried about a heck of a lot more failure modes than can fit into a one hour talk.
I’ve saved my deepest concern for last.
I can’t believe DJB fell for Zooko’s Triangle.
So, I met Zooko about a decade ago. I had no idea of his genius. And yet, there he was, way back when, putting together the core of DJB’s talk at 27C3. Zooko’s Triangle is a description of desirable properties of naming systems, of which only two can be implemented at any one time. They are:
- Secure: Only the actual owner of the name is found.
- Decentralized: There is no “single point of failure” that everyone trusts to provide naming services
- Human Readable: The name is such that humans can read it.
DJB begins by recapitulating Nyms, an idea that keeps coming up through the years:
“Nym” case: URL has a key!
Recognize magic number 123 in http://1238675309.twitter.com and extract key 8675309.
(Technical note: Keys are actually longer than this, but still fit into names.)
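The convention on the slide can be sketched in a few lines. This follows the slide's shortened example only: the magic number 123 and the key 8675309 come from the quote, while the function name and the choice to look only at the leftmost label are my assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

MAGIC = "123"  # magic number from the slide's example


def extract_nym_key(url: str) -> Optional[str]:
    """Return the key embedded in the leftmost host label, or None.

    Assumes the key is whatever follows the magic number in the first
    label of the hostname, as in the slide's shortened example.
    """
    host = urlsplit(url).hostname
    if not host:
        return None
    first_label = host.split(".", 1)[0]
    if first_label.startswith(MAGIC) and len(first_label) > len(MAGIC):
        return first_label[len(MAGIC):]
    return None


if __name__ == "__main__":
    print(extract_nym_key("http://1238675309.twitter.com"))  # 8675309
    print(extract_nym_key("http://www.twitter.com"))         # None
```

The parsing is trivial. The hard part is that the key now has to travel inside every link that points at the site.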
DJB shortens the names in his slides — and admits that he does this! But, man, they’re really ugly:
I actually implemented something very similar in the part of Phreebird that links DNSSEC to the browser lock in your average web browser:
I didn’t invent this approach, though, and neither did DJB. While I was personally inspired by the Self-Certifying File System, this is the “Secure and Decentralized — Not Human Readable” side of Zooko, so this shows up repeatedly.
For Phreebird, this was just a neat trick, something funny and maybe occasionally useful when interoperating with a config file or two. It wasn’t meant to be used as anything serious. Among other things, it has a fundamental UI problem — not simply that the names are hideous (though they are), but that they’ll necessarily be overridden. People think bad security UI is random, like every once in a while the vapors come over a developer and he does something stupid.
No. You can look at OpenSSH and say — keys will change over time, and users will have to have a way to override their cache. You can look at curl and say — sometimes, you just have to download a file from an HTTPS link that has a bad cert. There will be a user experience telling you not one, but two ways to get around certificate checking.
You can look at a browser and say, well, if a link to a page works but has the wrong certificate hash in the sending link, the user is just going to have to be prompted to see if they want to browse anyway.
It is tempting, then, to think bad security UI is inevitable, that there are no designs that could possibly avoid it. But it is not the fact that keys change that forces an alert. It’s that they change without the legitimate administrator having a way to manage the change. IP addresses change all the time, and DNS makes it totally invisible. Everybody just gets the new IPs, with no notification, no warnings, and certainly no prompting.
Random links smeared all over the net will prompt like little square Christmas ornaments. Keys stored in DNS simply won’t.
So DJB says, don’t deploy long names, instead use normal looking domains like www. Then, behind the scenes, using the DNS aliasing feature known as CNAME, link www to the full Nym.
Ignore the fact that there are applications actually using the CNAME data for meaningful things. This is very scary, very dangerous stuff. After all, you can’t just trust your network to tell you the keyed identity of the node you’re attempting to reach. That’s the whole point: you know the identity, the network is untrusted. DJB has something that, if comprehensively deployed all the way to the root, does this. DNSCurve, down from the root, down from com, down to twitter.com, would in fact provide a chain of trust that would allow http://www.twitter.com to CNAME to some huge ugly domain.
Whether or not it’s politically feasible (it isn’t), it is a scheme that could work. It is secure. It is human readable. But it is not decentralized — and DJB is petrified of trusted third parties.
And this, finally, is when it all comes off the rails. DJB suggests that all software could perhaps ship with lists of keys of TLDs — .com, .de, etc. Of course, such lists would have to be updated.
Never did I think I’d see one of the worst ideas from DNSSEC’s pre-root-signed past recycled as a possibility. That this idea was actually called the ITAR (Interim Trust Anchor Repository), and that DJB was advocating ITAR, might be the most surreal experience of 2010 for me.
(Put simply, imagine every device, every browser, every phone FTP’ing updates from some vendor maintained server. It was a terrible idea when DNSSEC suggested it, and even they only did so under extreme duress.)
But it gets worse. For, at the end of it, DJB simply threw up his hands and said:
“Maybe P2P DNS can help.”
Now, I want to be clear (because I screwed this up at first): DJB did not actually suggest retrieving key material from P2P DNS. That’s good, because P2P DNS is the side of Zooko’s Triangle where you get decentralization and human readable names — but no security! Wandering a cloud, asking if anyone knows the trusted key for Twitter, is an unimaginably bad idea.
No, he suggested some sort of split system, where you actually and seriously use URLs like http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com to identify peers, but P2P DNS tells you what IPs to speak to them at.
What? Isn’t security important to you?
After an hour of hearing how bad DNSSEC must be, ending up here is…depressing. In no possible universe are Nymic URLs a good idea for anything but wonky configuration entries. DJB himself was practically apologizing for them. They’re not even a creative idea.
No system can credibly purport to be solving the problems of authentication and encryption for anybody, let alone the whole Internet, without offering a serious answer to the question of Key Management. To be blunt, this is the hard part. New transports and even new crypto are optimizations, possibly even very interesting ones. But they’re not where we’re bleeding.
The problem is that it’s very easy to give a node an IP address that anyone can route to, but very hard to give a node an identity that anybody can verify. Key Management — the creation, deletion, integration, and management of identities within and across organizational boundaries — this is where we need serious solutions.
Nyms are not a serious solution. Neither is some sort of strange half-TLD, half P2PDNS, split identity/routing hack. Solving key management is not actually optional. This is where the foundation of an effective solution must be found.
DNSSEC has a coherent plan for how keys can be managed across organizational boundaries — start at the root, delegate down, and use governance and technical constraints to keep the root honest. It’s worked well so far, or you wouldn’t be reading this blog post right now. There’s no question it’s not a perfect protocol — what is? — but the criticisms coming in from DJB are fairly weak (and unfairly old) in context, and the alternative proposed is sadly just smoke and mirrors without a keying mechanism.