The DNS works. It creates the shared namespace that allows applications to interoperate across LANs, organizations, and even countries. With the advent of DNSSEC, it’s our best opportunity to finally address the authentication flaws that are implicated in over half of all data breaches.
There are efforts afoot to manipulate the DNS on a remarkably large scale. The American PROTECT IP act contains several reasonable and well targeted remedies to copyright infringement. One of these remedies, however, is to leverage the millions of recursive DNS servers that act as accelerators for Internet traffic, and convert them into censors for domain names in an effort to block content.
Filtering DNS traffic will not work, and in its failure, will harm both the security and stability of the Internet at large.
But don’t take just my word for it. I’ve been fairly quiet about PROTECT IP since when it was referred to as COICA, working behind the scenes to address the legislation. A common request has been a detailed white paper, deeply elucidating the technical consequences of Recursive DNS Filtering.
Today, I — along with Steve Crocker, author of the first Internet standard ever, Paul Vixie, creator of many of the DNS standards and long time maintainer of the most popular name server in the world, David Dagon, co-founder of the Botnet fighting firm Damballa, and Danny MacPherson, Chief Security Officer of Verisign — release Security and Other Technical Concerns Raised by the DNS Filtering Requirements in the PROTECT IP Bill.
Thanks go to many, but particularly to the staffers who have been genuinely and consistently curious — on both sides of the aisle — about the purely technical impact of the Recursive DNS Filtering provisions. A stable and secure Internet is not a partisan issue, and it’s gratifying to see this firsthand.
TL,DR: Identifying the responsible party for a bug is hard. VUPEN was able to achieve code execution in Google’s Chrome browser, via Flash and through a special but limited sandbox created for the Adobe code. Chrome ships Flash with Chrome and activates it by default, so there’s no arguing that this a break in Chrome. However, it would represent an incredibly naive metric to say including Flash in Chrome has made it less secure. An enormous portion of the user population installs Flash, alongside Acrobat and Java, and by taking on additional responsibility for their security (in unique and different ways), Google is unambiguously making users safer. In the long run, sandboxes have some fairly severe issues, among which is the penchant for being very slow to repair once broken, if they can even be fixed at all. But in the short run, it’s hard to deny they’ve had an effect, exacerbating the “Hacker Latency” that quietly permeates computer security. Ultimately though, the dam has broken, and we can pretty much expect Flash bearing attackers to drill into Chrome at the next PWN2OWN. Finally, I note that the sort of disclosure executed by VUPEN is far, far too open to both fraudulent abuse and simple error, the latter of which appears to have occurred. We have disclosure for a reason. And then I clear my throat.
[Not citing in this document, as people are not speaking for their companies and I don't want to cause trouble there. Please mail me if I have permission to edit your cites into this post. I'm being a little overly cautious here. This post may mutate a bit.]
VUPEN vs. Google. What a fantastic example of the danger of bad metrics.
In software development, we have an interesting game: “Who’s bug is it, anyway?” Suppose there’s some troublesome interaction between a client and a server. Arguments along the lines of “You shouldn’t sent that message, Client!” vs. “You should be able to handle this message, Server!” are legendary. Security didn’t invent this game, it just raised the stakes (and, perhaps, provides guidance above and beyond ‘what’s cheapest’).
In VUPEN vs. Google, we have one of the more epic “Who’s bug is it, anyway?” events in recent memory.
VUPEN recently announced the achievement of code execution in Google Chrome 11. This has been something of an open challenge — code execution isn’t exactly rare in web browsers, but Chrome has managed to survive despite flaws being found in its underlying HTML5 renderer (WebKit). This has been attributed to Chrome’s sandbox: A protected and restricted environment in which malicous code is assumed to be powerless to access sensitive resources. Google has been quite proud of the Chrome sandbox, and indeed, for the last few PWN2OWN contests run by Tipping Point at CanSecWest, Chrome’s come out without a scratch.
Of course, things are not quite so simple. VUPEN didn’t actually break the Chrome sandbox. They bypassed it entirely. Flash, due to complexity issues, runs outside the primary Chrome sandbox. Now, there is a second sandbox, built just for Flash, but it’s much weaker. VUPEN apparently broke this secondary sandbox. Some have argued that this certainly doesn’t represent a break of the Google sandbox, and is really just another Flash vuln.
Is this Adobe’s bug? Is this Google’s bug? Is VUPEN right? Is Google right?
VUPEN broke Chrome. You really can’t take that from them. Install Chrome, browse to a VUPEN-controlled page, arbitrary code can execute. Sure, the flaw was in Flash. But Chrome ships Flash, as it ships WebKit, SQLite, WebM, or any of a dozen other third party dependencies. Being responsible for your imported libraries is fairly normal — when I found a handful of code execution flaws in Firefox, deriving from their use of Ogg code from xiph.org, the Firefox developers didn’t blink. They shipped it, they exposed it on a zero-click drive-by surface, they assumed responsibility for the quality of that code. When the flaws were discovered, they shipped fixes.
That’s what responsibility means — responding. It doesn’t mean there will be nothing to respond to.
Google is taking on some interesting responsibilities with its embedding of Flash.
Browsers have traditionally not addressed the security of their plugins. When there’s a flaw in Java, we look to Oracle to fix it. When there’s an issue with Acrobat, we look to Adobe to fix it. So it is with Adobe’s Flash. This has come from the fact that:
A) The browsers don’t actually ship the code; rather, users install it after the fact
B) The browsers don’t have the source code to the plugins anyway, so even if they wanted to write a fix, they couldn’t
From the fact that plugins are installed after the fact, one can assign responsibility for security to the user themselves. After all, if somebody’s going to mod their car with an unsafe fuel injection system, it’s not the original manufacturer’s fault if the vehicle goes u in flames. And from the fact that the browser manufacturers are doing nothing but exposing a pluggable interface, one can cleanly assign blame to plugin makers.
That’s all well and good. How’s that working out for us?
For various reasons, not well. Not well enough, anyway. Dino Dai Zovi commented a while back that we needed to be aware of the low hanging fruit that attackers were actually using to break into networks. Java, Acrobat, and Flash remain major attack vectors, being highly complex and incredibly widely deployed. Sure, you could just blame the user some more — but you know what, Jane in HR needs to read that resume. Deal with it. What we’re seeing Google doing is taking steps to assume responsibility over not simply the code they ship, but also the code they expose via their plugin APIs.
That’s something to be commended. They’re — in some cases, quite dramatically — increasing the count of lines of code they’re pushing, not to protect users from code they wrote, but to protect users. Period.
That’s a good thing. That’s bottom line security defeating nice sounding excuses.
The strategies in use here are quite diverse, and frankly unprecedented. For Java, Chrome pivots its user experience around whether your version of Java is fully up to date. If it isn’t, Java stops being zero-click, a dialog drops down, and the user is directed to Oracle to upgrade the plugin. For PDFs, Google just shrugged its shoulders and wrote a new PDF viewer, one constrained enough to operate within their primary sandbox even. It’s incomplete, and will prompt if it happens to see a PDF complex enough to require Acrobat Reader, but look:
1) Users get to view most PDFs without clicking through any dialogs, via Google PDF viewer
2) In those circumstances where Google PDF viewer isn’t enough, users can click through to Acrobat.
3) Zero-click drive-by attacks via Acrobat are eliminated from Chrome
This is how you go from ineffectual complaining about the bulk of Acrobat and the “stupidity” of users installing it, to actually reducing the attack surface and making a difference on defense.
For various reasons, Google couldn’t just rewrite Flash. But they could take it in house, ship it in Chrome, and at minimum take responsibility for keeping the plugin. (Oracle. Adobe. I know you guys. I like you guys. You both need to file bugs against your plugin update teams entitled ‘You have a user interface’. Seriously. Fix or don’t fix, just be quiet about it.)
Well, that’s what Chrome does. It shuts up and patches. And the bottom line is that we know some huge portion of attacks wouldn’t work, if patches were applied. That’s a lot of work. That’s a lot of pain. Chrome is taking responsibility for user behavior; taking these Adobe bugs upon themselves as a responsibility to cover.
Naive metrics would punish this. Naive observers have wrung their hands, saying Google shouldn’t ship any code they can’t guarantee is good. Guess what, nobody can ship code they guarantee is good, and with the enormous numbers of Flash deployments out there, the choice is not between Flash and No Flash, the choice is between New Flash and Old Flash.
A smart metric has to measure not just how many bugs Google had to fix, but how many bugs the user would be exposed to had Google not acted. Otherwise you just motivate people to find excuses why bugs are someone else’s fault, like that great Who’s Bug Is It Anyway on IE vs. Firefox Protocol Handlers.
(Sort of funny, actually, there’s been criticism around people only having Flash so they can access YouTube. Even if that’s true, somebody just spent $100M on an open video codec and gave it away for free. Ahem.)
Of course, it’s not only for updates that Chrome embedding Flash is interesting. There are simply certain performance, reliability, and security (as in memory corruption) improvements you can’t build talking to someone over an API. Sometimes you just need access to the raw code, so you can make changes on both sides of an interface (and retain veto over bad changes).
Also, it gets much more realistic to build a sandbox.
I suppose we need to talk about sandboxes.
It’s somewhat known that I’m not a big fan of sandboxes. Sandboxes are environments designed to prevent an attacker with arbitrary code execution from accessing any interesting resources. I once had a discussion with someone on their reliability:
“How many sandboxes have you tested?”
“How many have you not broken?”
“Do you think this might represent hard data?”
“You’re being defeatist!”
It’s not that sandboxes fail. Lots of code fails — we fix it and move on with our life. It’s that sandboxes bake security all the way into design, which is remarkably effective until the moment the design has an issue. Nothing takes longer to fix than design bugs. No other sorts of bugs threaten to be unfixable. Systrace was broken in 2007, and has never been fixed. (Sorry, Niels.) That just doesn’t happen with integer overflows, or dangling pointers, or use-after-frees, or exception handler issues. But one design bug in a sandbox, in which the time between a buffer being checked and a buffer being used is abused to allow filter bypass, and suddenly the response from responsible developers becomes “we do not recommend using sysjail (or any systrace(4) tools, including systrace(1)”.
Not saying systrace is unfixable. Some very smart people are working very hard to try, and maybe they’ll prove me wrong someday. But you sure don’t see developers so publicly and stridently say “We can’t fix this” often. Or, well, ever. Design bugs are such a mess.
VUPEN did not break the primary Chrome sandbox. They did break the secondary one, though. Nobody’s saying Flash is going into the primary sandbox anytime soon, or that the secondary sandbox is going to be upgraded to the point VUPEN won’t be able to break through. We do expect to see the Flash bug fixed fairly quickly but that’s about it.
The other problem with sandboxes, structurally, is one of attack complexity. People assume finding two bugs is as hard as finding one. The problem is that I only need to find one sandbox break, which I can then leverage again and again against different primary attack surfaces. Put in mathematical terms, additional bug hurdles are adders, not multipliers. The other problem is that for all but the most trivial sandboxes (a good example recently pointed out to me — the one for vsftpd) the degrees of freedom available to an attacker are so much greater post-exploitation, than pre, that for some it would be like jumping the molehill after scaling the mountain.
There are just so many degrees of freedom to secure, and so many dependencies (not least of which, performance) on those degrees of freedom remaining. Ugh.
Funny thing, though. Sandboxes may fail in the long term. But they do also, well, work. As of this writing, generic bypasses for the primary Chrome sandbox have not been publicly disclosed. Chrome did in fact survive the last few Tipping Point PWN2OWNs. What’s going on?
Sure, we (the public research community, or the public community that monitors black hat activity) find bugs. But we don’t find them immediately. Ever looked at how long bugs sit around before they’re finally found? Years are the norm. Decades aren’t unheard of. Of course, the higher profile the target, the quicker people get around to poking at it. Alas, the rub — takes a few years for something to get high profile, usually. We don’t find the bugs when things are originally written — when they’re cheapest to fix. We don’t find them during iterative development, during initial customer release, as they’re ramped up in the marketplace. No, the way things are structured, it has to get so big, so expensive…then it’s obvious.
Hacker Latency. It’s a problem.
Sandboxes are sort of a double whammy when it comes to high hacker latency — they’re only recently widely deployed, and they’re quite different than defenses that came before. We’re only just now seeing tools to go after them. People are learning.
But eventually the dam breaks, as it has with VUPEN’s attack on Chrome. It’s now understood that there’s a partial code execution environment that lives outside the primary Chrome sandbox, and that attacks against it don’t have to contend with the primary sandbox. We can pretty much expect Flash attacks on Chrome at the next PWN2OWN.
One must also give credit where it’s due: VUPEN chose to bypass the primary sandbox, not to break it. And so, I expect, will the next batch of Chrome attackers. That it’s something one even wants to bypass is notable.
Alas, there is one final concern. While in this case it happened to be the case that VUPEN did legitimate work, random unsubstantiated claims of security finds that are only being disclosed to random government entities deserve no credibility whatsoever. The costs of running a fraudulent attack are far too low and the damage is much too high. And it’s not merely a matter of fraud — I don’t get the sense that VUPEN knew there were two different sandboxes. The reality of black boxing is you’re never quite sure what’s going on in there, and so while there was some actionable intelligence from the disclosure (“oh heh, the Chrome sandbox has been bypassed”) it was wrong (“oh heh, a Chrome sandbox has been bypassed”) and incomplete (“Flash exploits are potentially higher priority”).
There’s a reason we have disclosure. (AHEM.)
Couple of interesting things. First, as Chris Evans (@scarybeast on Twitter, and on the Sandbox team at Google) pointed out, Tipping Point has has deliberately excluded plugins from being “in-scope” at PWN2OWN. To some degree, this makes sense — people would just show up with their Java exploit and own all the browsers. However, you know, that just happened in the real world, with one piece of malware not only working across browsers, but across PC and Mac too. Once upon a time, ActiveX (really just COM) was derided for shattering the security model for IE, for whatever the security model of the browser did, there was always some random broken ActiveX object to twiddle.
Now, the broken objects are the major plugins. Write once, own everywhere.
Worse, there’s just no way to say Flash is just a plugin anymore, not when it’s shipped as just another responsible component in Chrome.
But Chris Evans is completely right. You can’t just say Flash is fair game for Chrome (and only Chrome) in the next PWN2OWN. That would deeply penalize Google for keeping Flash up to date, which we know would unambiguously increase risk for Chrome users. The naive metric would be a clear net loss. And nor should Flash be fair for all browsers, because it’s actually useful to know which core browser engines are secure and which aren’t. As bad money chases out good, so does bad code.
So, if Dragos can find a way, we really should have a special plugin league, for the three plugins that have +90% deployments — Flash, Acrobat, and Java. If the data says that they indeed fall quite easier than the browsers that house them, well, that’s good to know.
Another issue is sort of an internal contradiction in the above piece. At one point, I say:
VUPEN chose to bypass the primary sandbox, not to break it. And so, I expect, will the next batch of Chrome attackers. That it’s something one even wants to bypass is notable.
But then I say:
I don’t get the sense that VUPEN knew there were two different sandboxes.
Obviously both can’t be true. It’s impossible to know which is which, without VUPEN cluing us in. But, as Jeremiah Grossman pointed out, VUPEN ain’t talking. In fact, VUPEN hasn’t even confirmed the entire supposition, apparently-but-not-necessarily confirmed by Tavis Ormandy et al, that theirs was a bug in Flash. Just more data pointing to these sorts of games being problematic.
As they say: “If you can’t measure it, you can’t manage it.”
There’s a serious push in the industry right now for security metrics. People really want to know what works — because this ain’t it. But where can we find hard data?
What about fuzzers — the automated, randomized testers that have been so good at finding bugs through sheer brute force?
I had a hypothesis, borne of an optimism that’s probably a bit out of place in our field: I believe, after ten years of pushing for more secure code, software quality has increased across the board — at least in things that have been under active attack. And, in conjunction with Adam Cecchetti and Mike Eddington of Deja Vu Security (and the Peach Project), we developed an experiment to try to show this.
We fuzzed Office and OpenOffice. We fuzzed Acrobat, Foxit, and GhostScript. And we fuzzed them all longitudinally — going back in time, smashing 2003, 2007, and 2010.
175,334 crashes later, we have some…interesting data. Here’s a view of just what happened to Office — and OpenOffice — between 2003 (when the Summer of Worms hit) and 2010.
Things got better! This isn’t the end all and be all of this approach. There’s lots that could be going wrong. But we see similar collapses in PDF crash rates too. Slides follow below, and a paper will probably be assembled at some point. But the most important thing we’re doing is — we’re releasing data! This is easily some of the most fun public data I’ve played with. I’m looking forward to seeing what sort of visualizations come of them. Here’s one I put together, looking at individual crash files and their combined effect on Word/Writer 2003/2007/2010:
I’ve wanted to do a study like this for a very long time, and it’s been pretty amazing finally doing a collaboration with DD and Adam! Thanks also to the vendors — Microsoft, the OpenOffice Team, Adobe, Foxit, and the guys behind GhostScript. Everyone’s been cool.
Please send feedback if you have thoughts. I’d love to post some viz based on this data! And now, without further ado…
- CanSecWest Slides
- Crash Summary (TSV) (MySQL Dump)
- Crash Profile Spreadsheet (XLS) (Google Spreadsheet)
- (Very Rough) Crash Profile Generator
Science! It works. Or, in this case, it hints of an interesting new tool with its own set of downsides
It ain’t security. Can’t say I care. Trying to fix color blindness is most fun side project evar.
[Heh! This blog post segues into a debate in the comments!]
Short Version: Zooko’s Triangle describes traits of a desirable naming system — and constrains us to two of secure, decentralized, or human readable. Aaron Swartz proposes a system intended to meet all three. While his distribution model has fundamental issues with distributed locking and cache coherency, and his use of cryptographic hashcash does not cause a significant asymmetric load for attackers vs. defenders, his use of an append-only log does have interesting properties. Effectively, this system reduces to the SSH security model in which “leaps of faith” are taken when new records are seen — but, unlike SSH, anyone can join the cloud to potentially be the first to provide malicious data, and there’s no way to recover from receiving bad (or retaining stale) key material. I find that this proposal doesn’t represent a break of Zooko’s Triangle, but it does show the construct to be more interesting than expected. Specifically, it appears there’s some room for “float” between the sides — a little less decentralization, a little more security.
Zooko’s Triangle is fairly beautiful — it, rather cleanly, describes that a naming system can offer:
- Secure, Decentralized, Non-Human-Readable names like Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com
- Secure, Centralized, Human-Readable Names like http://www.twitter.com, as declared by trusted third parties (delegations from the DNS root)
- Insecure, Decentralized, Human-Readable Names like http://www.twitter.com, as declared by untrusted third parties (consensus via a P2P cloud)
There’s quite the desire to “square Zooko’s triangle” — to achieve Secure, Decentralized, Human-Readable Names. This is driven by the desire to avoid the vulnerability point that centralization exposes, while neither making obviously impossible demands on human memory nor succumbing to the rampantly manipulatable opinion of a P2P mob.
Aaron Swartz and I have been having a very friendly disagreement about whether this is possible. I promised him that if he wrote up a scheme, I’d evaluate it. So here we are. As always, here’s the source material:
The basic idea is that there’s a list of name/key mappings, almost like a hosts file, but it links human readable names to arbitrary length keys instead of IP addresses. This list, referred to as a scroll, can only be appended to, never deleted from. Appending to the list requires a cryptographic challenge to be completed — not only do you add (name,key), you also add a nonce that causes the resulting hash from the beginning to be zero.
Aaron grants that there might be issues with initially trusting the network. But his sense is, as long as either your first node is trustworthy, or the majority of the nodes you find out about are trustworthy, you’re OK.
It’s a little more complicated than that.
There are many perspectives from which to analyze this system, but the most interesting one I think is the one where we watch it fail — not under attack, but under everyday use.
How does this system absorb changes?
While Aaron’s proposal doesn’t allow existing names to alter their keys — a fatal limitation by some standards, but one we’ll ignore for the moment — it does allow name/key pairs to be added. It also requires that additions happen sequentially. Suppose we have a scroll with 131 name/key pairs, and both Alice and Bob try to add a name without knowing about the other’s attempt. Alice will compute and distribute 131,Alice, and Bob will compute and distribute 131,Bob.
What now? Which scroll should people believe? Alice’s? Bob’s? Both?
Should they update both? How will this work over time?
Aaron says the following:
What happens if two people create a new line at the same time? The debate should be resolved by the creation of the next new line — whichever line is previous in its scroll is the one to trust.
This doesn’t mean anything, alas. The predecessor to both Alice and Bob’s new name is 131. The scroll has forked. Either:
- Some people see Alice, but not Bob
- Some people see Bob, but not Alice
- Somehow, Alice and Bob’s changes are merged into one scroll
- Somehow, the scroll containing Alice and the scroll containing Bob are both distributed
- Both new scrolls are rejected and the network keeps living in a 131 record universe
Those are the choices. We in security did not invent them.
There is a belief in security sometimes that our problems have no analogue in normal computer science. But this right here is a distributed locking problem — many nodes would like to write to the shared dataset, but if multiple nodes touch the same dataset, the entire structure becomes corrupt. The easiest fix of course is to put in a centralized locking agent, one node everyone can “phone home to” to make sure they’re the only node in the world updating the structure’s version.
But then Zooko starts jumping up and down.
(Zooko also starts jumping up and down if the output of the system doesn’t actually work. An insecure system at least needs to be attacked, in order to fail.)
Not that security doesn’t change the game. To paraphrase BASF, “Security didn’t make the fundamental problems in Comp Sci. Security makes the fundamental problems in Comp Sci worse.” Distributed locking was already a mess when we trusted all the nodes. Add a few distrusted nodes, and things immediately become an order of magnitude more difficult. Allow a single node to create an infinite number of false identities — in what is known as a Sybil attack — and any semblence of security is gone.
[Side note: Never heard of a Sybil attack? Here's one of the better stories of one, from a truly devious game called Eve Online.]
There is nothing in Aaron’s proposal that prevents the generation of an infinite number of identities. Arguably, the whole concept of adding a line to the scroll is equivalent to adding an identity, so a constraint here is probably a breaking constraint in and of itself. So when Aaron writes:
To join a network, you need a list of nodes where at least a majority are actually nodes in the network. This doesn’t seem like an overly strenuous requirement.
Without a constraint on creating new nodes, this is in fact a serious hurdle. And keep in mind, “nodes in the network” is relative — I might be connecting to majority legitimate nodes, but are they?
We shouldn’t expect this to be easy. Managing fully distributed locking is already hard. Doing it when all writes to the system must be ordered is already probably impossible. Throw in attackers that can multiply ad infinatum and can dynamically alter their activities to interfere with attempts to insert with specific ordered writes?
Before we discuss the crypto layer, let me give you an example of dynamic alterations. Suppose we organize our nodes into a circle, and have each node pass write access to the next node as listed in their scroll. (New nodes would have to be “pulled into” the cloud.) Would that work?
No, because the first malicious node that got in, could “surround” each real node with an untrusted Sybil. At that point, he’d be consulted after each new name was submitted. Any name he wanted to suppress — well, he’d simply delete that name/key pair, calculate his own, and insert it. By the time the token went around, the name would be long since destroyed.
(Not that a token approach could work, for the simple reality that someone will just lose the token!)
Thus far, we’ve only discussed propagation, and not cryptography. This should seem confusing — after all, the “cool part” of this system is the fact that every new entry added to the scroll needs to pass a computational crypto challenge first. Doesn’t that obviate some of these attacks?
The problem is that cryptography is only useful in cases of significant asymmetric workload between the legitimate user and the malicious attacker. When I have an AES256 symmetric key or an RSA2048 private key, my advantage over an attacker (theoretically) exceeds the computational capacity of the universe.
What about a hash function? No key, just the legitimate user spinning some cycles to come to a hash value of 0. Does the legitimate user have an advantage over the attacker?
Of course not. The attacker can spin the same cycles. Honestly, between GPUs and botnets, the attacker can spin far more cycles, by several orders of magnitude. So, even if you spend an hour burning your CPU to add a name, the bad guy really is in a position to spend but a few seconds cloning your work to hijack the name via the surround attack discussed earlier.
(There are constructions in which an attacker can’t quite distribute the load of cracking the hash — for example, you could encode the number of iterations a hash needed to be run, to reach N zero bits. But then, the one asymmetry that is there — the fact that the zero bits can be quickly verified despite the cost of finding them — would get wiped out, and validation of the scroll would take as long as calculation of the scroll.)
But what about old names? Are they not at least protected? Certainly not by the crypto. Suppose we have a scroll with 10,000 names, and we want to alter name 5,123. The workload to do this is not even 10,000 * time_per_name, because we (necessarily) can reuse the nonces that got us through to name 5,122.
Aaron thinks this isn’t the case:
First, you need to need to calculate a new nonce for the line you want to steal and every subsequent line….It requires having some large multiple of the rest of the network’s combined CPU power.
But adding names does not leverage the combined CPU power of the entire network — it leverages the power of a single node. When the single node wanted to create name 5,123, it didn’t need to burn cycles hashcashing through 0-5122. It just did the math for the next name.
Now, yes, names after 5,123 need to be cracked by the attacker, when they didn’t have to be cracked by the defender. But the attackers have way more single nodes than the defenders do. And with those nodes, the attacker can bash through each single name far, far quicker.
There is a constraint — to calculate the fixed nonce/zerohash for name 6001, name 6000 needs to have completed. So we can’t completely parallelize.
But of course, it’s no work at all to strip content from the scroll, since we can always remove content and get back to 0-hash.
Ah! But there’s a defense against that:
“Each remembers the last scroll it trusted.”
And that’s where things get really interesting.
Complexity is a funny thing — it sneaks up on you. Without this scroll storage requirement, the only difference between nodes in this P2P network are what nodes it happens to stumble into when first connecting. Now, we have to be concerned — which nodes has a node ever connected to? How long has it been since a node has been online? Are any significant names involved between the old scroll and the modern one?
Centralized systems don’t have such complexities, because there’s a ground truth that can be very quickly compared against. You can be a hundred versions out of date, but armed with nothing but a canonical hash you can clear your way past a thousand pretenders to the one node giving out a fresh update.
BitTorrent hangs onto centralization — even in small doses — for a reason. Zooko is a harsh mistress.
Storage creates a very, very slight asymmetry, but one that’s often “all we’ve got”. Essentially, the idea is that the first time a connection is made, it’s a leap of faith. But every time after that, the defender can simply consult his hard drive, while the attacker needs to…what? Match the key material? Corrupt the hard drive? These are in fact much harder actions than the defender’s check of his trusted store.
This is the exact system used by SSH, in lieu of being able to check a distributed trusted store. But while SSH struggles with constant prompting after, inevitably, key material is lost and a name/key mapping must change, Aaron’s proposal simply defines changing a name/key tuple as something that Can’t Happen.
Is that actually fair? There’s an argument that a system that can’t change name/key mappings in the presence of lost keys fails on feature completeness. But we do see systems that have this precise property: PGP Keyservers. You can probably find the hash of the PGP key I had in college somewhere out there. Worse, while I could probably beg and plead my PGP keys offline, nobody could ever wipe keys from these scrolls, as it would break all the zero-hashes that depend on them.
I couldn’t even put a new key in, because if scroll parsers didn’t only return the first name/key pair, then a trivial attack would simply be to write a new mapping and have it added to the set of records returned.
Assume all that. What are the ultimate traits of the system?
The first time a name/key peer is seen, it is retrieved insecurely. From that point forward, no new leap of faith is required or allowed — the trust from before must be leveraged repeatedly. The crypto adds nothing, and the decentralization does little but increase the number of attackers who can submit faulty data for the leap of faith, and fail under the sort of conditions distributed locking systems fail. But the storage does prevent infection after the initial trust upgrade, at the cost of basic flexibility.
I could probably get away with saying — a system that reduces to something this and undeployable, and frankly unstable, can not satisfy Zooko’s Triangle. After all, simply citing the initial trust upgrade, as we upgrade scroll versions from a P2P cloud on the false assumption that an attacker couldn’t maliciously synthesize new records unrecoverably, would be enough.
It’s fair, though, to discuss a few annoying points. Most importantly, aren’t we always upgrading trust? Leaps of faith are taken when hardware is acquired, when software is installed, even when the DNSSEC root key is leveraged. By what basis can I say those leaps are acceptable, but a per-name/key mapping leap represents an unacceptable upgrade?
That’s not a rhetorical question. The best answer I can provide is that the former leaps are predictable, and auditable at manufacture, while per-name leaps are “runtime” and thus unpredictable. But audits are never perfect and I’m a member of a community that constantly finds problems after release.
I am left with the fundamental sense that manufacturer-based leaps of faith are at least scalable, while operational leaps of faith collapse quickly and predictably to accepting bad data.
More interestingly, I am left with the unescapable conclusion that the core security model of Swartz’s proposal is roughly isomorphic to SSH. That’s fine, but it means that if I’m saying Swartz’s idea doesn’t satisfy Zooko due to security, then neither does SSH.
Of course, it shouldn’t. We all know that in the real world, the response by an administrator to encountering a bad key, is to assume the box was paved for some reason and thus to edit known_hosts and move on. That’s obviously insecure behavior, but to anyone who’s administered more than a box or two, that’s life. We’re attempting to fix that, of course, by letting the guy who paves the box update the key via DNSSEC SSHFP records, shifting towards security and away from decentralization. You don’t get to change client admin behavior — or remove client key overrides — without a clear and easy path for the server administrator to keep things functional.
So, yeah. I’m saying SSH isn’t as secure as it will someday be. A day without known_hosts will be a good day indeed.
I’ve had code in OpenSSH for the last decade or so; I hope I’m allowed to say that And if you write something that says “Kaminsky says SSH is Insecure” I will be very unhappy indeed. SSH is doing what it can with the resources it has.
The most interesting conclusion is then that there must be at least some float in Zooko’s Triangle. After all, OpenSSH is moving from just known_hosts to integrating a homegrown PKI/CA system, shifting away from decentralization and towards security. Inconvenient, but it’s what the data seems to say.
Looking forward to what Zooko and Aaron have to say.
DJB responds! Not in full — which I’m sure he’ll do eventually, and which I genuinely look forward to — but to my data regarding the increase in authoritative server load that DNSCurve will cause. Here is a link to his post:
What’s going on is as follows: I argue that, since cache hit rates of 80-99% are seen on intermediate caches, that this will by necessity create at least 5x-100x increases in traffic to authoritative servers. This is because traffic that was once serviced out of cache, will now need to transit all the way to the authoritative server. (It’s at least, because it appears each cache miss requires multiple queries to service NS records and the like.)
DJB replies, no, DNSCurve allows there to be a local cache on each machine, so the hit rates above won’t actually expand out to the full authoritative load increase.
Problem is, there’s already massive local caching going on, and 80-99% is still the hitrate at the intermediates! Most DNS lookups come from web browsers, effectively all of which cache DNS records. Your browser does not repeatedly hammer DNS for every image it retrieves! It is in fact these very DNS caches that one needs to work around in the case of DNS Rebinding attacks (which, by the way, still work against HTTP).
But, even if the browsers weren’t caching, the operating system caches extensively as well. Here’s Windows, displaying its DNS Client cache:
$ ipconfig /displaydns
Windows IP Configuration
Record Name . . . . . : linux.die.net
Record Type . . . . . : 5
Time To Live . . . . : 5678
Data Length . . . . . : 8
Section . . . . . . . : Answer
CNAME Record . . . . : sites.die.net
So, sure, DNSCurve lets you run a cache locally. But that local cache gains you nothing against the status quo, because we’re already caching locally.
Look. I’ve been very clear. I think Curve25519 occupies an unserviced niche in our toolbox, and will let us build some very interesting network protocols. But in the case of DNS, using it as our underlying cryptographic mechanism means we lose the capability to assert the trustworthiness of data to anyone else. That precludes cross-user caching — at least, precludes it if we want to maintain end to end trust, which I’m just not willing to give up.
I can’t imagine DJB considers it optional either.
It’s probably worth taking a moment to discuss our sources of data. My caching data comes from multiple sources — the CCC network, XS4All, a major North American ISP, one of the largest providers to major ISPs (each of the five 93% records was backing 3-4M users!), and one of the biggest single resolver operators in the world.
DJB’s data comes from a lab experiment.
Now, don’t get me wrong. I’ve done lots of experiments myself, and I’ve made serious conclusions on them. (I’ve also been wrong. Like, recently.) But, I took a look at the experimental notes that DJB (to his absolute credit!) posted.
So, the 1.15x increase was relative to 140 stub resolvers. Not 140M. 140.
And he didn’t even move all 140 stubs to 140 caches. No, he moved them from 1 cache to 10 caches.
Finally, the experiment was run for all of 15 minutes.
[QUICK EDIT: Also, unclear if the stub caches themselves were cleared. Probably not, since anything that work intensive is always bragged about.]
I’m not saying there’s experimental bias here, though the document is a pretty strong position piece for running a local resolver. (The single best piece of data in the doc: There are situations where the cache is 300ms away while the authoritative is 10ms away. I did not know this.)
But I think it’s safe to say DJB’s 1.15x load increase claim is thoroughly debunked.
Short Version: Dan Bernstein delivered a talk at the 27C3 about DNSSEC and his vision for authenticating and encrypting the net. While it is gratifying to see such consensus regarding both the need to fix authentication and encryption, and the usefulness of DNS to implement such a fix, much of his representation of DNSSEC — and his own replacement, DNSCurve — was plainly inaccurate. He attacks a straw man implementation of DNSSEC that must sign records offline, despite all major DNSSEC servers moving to deep automation to eliminate administrator errors, and despite the existence of Phreebird, my online DNSSEC signing proxy specifically designed to avoid the faults he identifies.
DJB complains about NSEC3′s impact on the privacy of domain names, despite the notably weak privacy guarantees on public DNS names themselves, and more importantly, the code in Phreebird that dynamically generates NSEC3 records thus completely defeating GPU hashcracking. He complains about DNSSEC as a DDoS bandwidth amplifier, while failing to mention that the amplification issues are inherited from DNS. I observe his own site, cr.yp.to, to be a 6.4x bandwidth amplifier, and the worldwide network of open recursive servers to be infinitely more exploitable even without DNSSEC. DJB appeared unaware that DNSSEC could be leveraged to offer end to end semantics, or that constructions existed to use secure offline records to authenticate protocols like HTTPS. I discuss implementations of both in Phreebird.
From here, I analyze Curve25519, and find it a remarkably interesting technology with advantages in size and security. His claims regarding instantaneous operation are a bit of an exaggeration though; initial benchmarking puts Curve25519 at about 4-8x the speed of RSA1024. I discuss the impact of DNSCurve on authoritative servers, which DJB claims to be a mere 1.15x increase in traffic. I present data from a wide variety of sources, including the 27C3 network, demonstrating that double and even triple digit traffic increases are in fact likely, particularly to TLDs. I also observe that DNSCurve has unavoidable effects on query latency, server CPU and memory, and key risk management. The argument that DNSSEC doesn’t sign enough, because a signature on .org doesn’t necessary sign all of wikipedia.org, is shown to be specious, in that any delegated namespace with unsigned children (including in particular DNSCurve) must have this characteristic.
I move on to discussing end to end protocols. CurveCP is seen as interesting and highly useful, particularly if it integrates a lossy mode. However, DJB states that CurveCP will succeed where HTTPS has failed because CurveCP’s cryptographic primitive (Curve25519) is faster. Google is cited as an organization that has not been able to deploy HTTPS because the protocol is too slow. Actual source material from Google is cited, directly refuting DJB’s assertion. It is speculated that the likely cause of Google’s sudden deployment of HTTPS on GMail was an attack by a foreign power, given that the change was deployed 24 hours after disclosure of the attack. Other causes for HTTPS’s relative rarity are cited, including the large set of servers that need to be simultaneously converted, the continuing inability to use HTTPS for virtual hosted sites, and the need for interaction with third party CA’s.
We then proceed to explore DJB’s model for key management. Although DJB has a complex system in DNSCurve for delegated key management, he finds himself unable to trust either the root or TLDs. Without these trusted third parties, his proposal devolves to the use of “Nym” URLs like http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com to bootstrap key acquisition for Twitter. He suggests that perhaps we can continue to use at least the TLDs to determine IP addresses, as long as we integrate with an as-yet unknown P2P DNS system as well.
I observe this is essentially a walk of Zooko’s Triangle, and does not represent an effective or credible solution to what we’ve learned is the hardest problem at the intersection of security and cryptography: Key Management. I conclude by pointing out that DNSSEC does indeed contain a coherent, viable approach to key management across organizational boundaries, while this talk — alas — does not.
So, there was a talk at the 27th Chaos Communication Congress about DNSSEC after all! Turns out Dan Bernstein, better known as DJB, is not a fan.
That’s tragic, because I’m fairly convinced he could do some fairly epic things with the technology.
Before I discuss the talk, there are three things I’d like to point out. First, you should see the talk! It’s actually a pretty good summary of a lot of latent assumptions that have been swirling around DNSSEC for years — assumptions, by the way, that have been held as much by defenders as detractors. Here are links:
Second, I have a tremendous amount of respect for Dan Bernstein. It was his fix to the DNS that I spent six months pushing, to the exclusion of a bunch of other fixes which (to be gentle) weren’t going to work. DJB is an excellent cryptographer.
And, man, I’ve been waiting my entire career for Curve25519 to come out.
But security is bigger than cryptography.
The third thing I’d like to point out is that, with DJB’s talk, a somewhat surprising consensus has taken hold between myself, IETF WG’s, and DJB. Essentially, we all agree that:
1) Authentication and encryption on the Internet is broken,
2) Fixing both would be a Big Deal,
3) DNS is how we’re going to pull this off.
Look, we might disagree about the finer details, but that’s a fairly incredible amount of consensus across the engineering spectrum: There’s a problem. We have to fix it. We know what we’re going to use to fix it.
That all identified, lets discuss the facts.
Well, this document got a bit larger than expected (understatement), so here’s a list of section headings.
DNSSEC’s Problem With Key Rotation Has Been Automated Away
DNSSEC Is Not Necessarily An Offline Signer — In Fact, It Works Better Online!
DNS Leaks Names Even Without NSEC3 Hashes
NSEC3 “White Lies” Entirely Eliminate The NSEC3 Leaking Problem
DNSSEC Amplification is not a DNSSEC bug, but an already existing DNS, UDP, and IP Bug
DNSSEC Does In Fact Offer End To End Resolver Validation — Today
DNSSEC Bootstraps Key Material For Protocols That Desperately Need It — Today
Curve25519 Is Actually Pretty Cool
Limitations of Curve25519
DNSCurve Destroys The Caching Layer. This Matters.
DNSCurve requires the TLDs to use online signing
DNSCurve increases query latency
DNSCurve Also Can’t Sign For Its Delegations
What About CurveCP?
HTTPS Has 99 Problems But Speed Ain’t One
There Is No “On Switch” For HTTPS
HTTPS Certificate Management Is Still A Problem!
The Biggest Problem: Zooko’s Triangle
The Bottom Line: It Really Is All About Key Management
From slide 48 to slide 53, DJB discusses what would appear to be a core limitation of DNSSEC: Its use of offline signing. According to the traditional model for DNSSEC, keys are kept in a vault offline, only pulled out during those rare moments where a zone must change or be resigned. He observes a host of negative side effects, including an inability to manage dynamic domains, increased memory load from larger zones, and difficulties manually rotating signatures and keys.
But these are not limitations to DNSSEC as a protocol. They’re implementation artifacts, no more inherent to DNSSEC than publicfile‘s inability to support PHP. (Web servers were not originally designed to support dynamic content, the occasional cgi-bin notwithstanding. So, we wrote better web servers!) Key rotation is the source of some enormous portion of DNSSEC’s historical deployment complexity, and as such pretty much every implementation has automated it — even at the cost of having the “key in a vault”. See production-ready systems by:
So, on a purely factual basis, the implication that DNSSEC creates a “administrative disaster” under conditions of frequent resigning is false. Everybody’s automating. Everybody. But his point that offline signing is problematic for datasets that change with any degree of frequency is in fact quite accurate.
But who says DNSSEC needs to be an offline signer, signing all its records in advance?
Online signing has long been an “ugly duckling” of DNSSEC. In online signing, requests are signed “on demand” — the keys are pulled out right when the response is generated, and the appropriate imprimatur is applied. While this seems scary, keep in mind this is how SSL, SSH, IPSec, and most other crypto protocols function. DJB’s own DNSCurve is an online signer.
PGP/GPG are not online signers. They have dramatic scalability problems, as nobody says aloud but we all know. (To the extent they will be made scalable, they will most likely be retrieving and refreshing key material via DNSSEC.)
Online signing has history with DNSSEC. RFC4470 and RFC4471 discuss the precise ways and means online signing can integrate with the original DNSSEC. Bert Hubert’s been working on PowerDNSSEC for some time, which directly integrates online signing into very large scale hosting.
And then there’s Phreebird. Phreebird is my DNSSEC proxy; I’ve been working on it for some time. Phreebird is an online signer that operates, “bump in the wire” style, in front of existing DNS infrastructure. (This should seem familiar; it’s the deployment model suggested by DJB for DNSCurve.) Got a dynamic signer, that changes up responses according to time of day, load, geolocation, or the color of its mood ring? I’ve been handling that for months. Worried about unpredictable requests? Phreebird isn’t, it signs whatever responses go by. If new responses are sent, new signatures will be generated. Concerned about preloading large zones? Don’t be, Phreebird’s cache can be as big or as little as you want it to be. It preloads nothing.
As for key and signature rotation, Phreebird can be trivially modified to sign everything with a 5 minute signature — in case you’re particularly concerned with replay. Online signing really does make things easier.
Phreebird can also deal with NSEC3′s “leaking”. Actually, it does so by default.
In the beginning, DNSSEC’s developers realized they needed a feature called Authoritative Nonexistence — a way of saying, “The record you are looking for does not exist, and I can prove it”. This posed a problem. While there are a usually a finite number of names that do exist, the number of non-existent names is effectively infinite. Unique proofs of nonexistence couldn’t be generated for all of them, but the designers really wanted to prevent requiring online signing. (Just because we can sign online, doesn’t mean we should have to sign online.) So they said they’d sign ranges — say there were no names between Alice and Bob, and Bob and Charlie, and Charlie and David.
This wasn’t exactly appreciated by Alice, Bob, Charlie, and David — or, at least, the registries that held their domains. So, instead, Alice, Bob, Charlie, and David’s names were all hashed — and then the statement became, there are no names between H1 (the hash of Charlie) and H2 (the hash of Alice). That, in a nutshell, is NSEC3.
DJB observes, correctly, that there’s only so much value one gets from hashing names. He quotes Ruben Niederhagen at being able to crack through 1.7 Trillion names a day, for example.
This is imposing, until one realizes that at 1000 queries a second, an attacker can sweep a TLD through about 864,000,000 queries a day. Granted: This is not as fast, but it’s much faster than your stock SSHD brute force, and we see substantial evidence of those all the time.
Consider: Back in 2009, DJB used hashcracking to answer Frederico Neves’ challenge to find domains inside of sec3.br. You can read about this here. DJB stepped up to the challenge, and the domains he (and Tanja Lange) found were:
douglas, pegasus, rafael, security, unbound, while42, zz–zz
It does not take 1.7T hashcracks to find these names. It doesn’t even take 864M. Domain names are many things; compliant with password policies is not one of them. Ultimately, the problem with assuming privacy in DNS names is that they’re being published openly to the Internet. What we have here is not a bright line differential, but a matter of degree.
If you want your names completely secret, you have to put them behind the split horizon of a corporate firewall — as many people do. Most CorpNet DNS is private.
However, suppose you want the best of both worlds: Secret names, that are globally valid (at the accepted cost of online brute-forceability). Phreebird has you taken care of — by generating what can be referred to as NSEC3 White Lies. H1 — the hash of Charlie — is a number. It’s perfectly valid to say:
There are no records with a hash between H1-1 and H1+1.
Since there’s only one number between x-1 and x+1 — X, or in this case, the hash of Charlie — authoritative nonexistence is validated, without leaking the actual hash after H1.
Probably the most quoted comment of DJB was the following:
“So what does this mean for distributed denial of service amplification, which is the main function of DNSSEC”
Generally, what he’s referring to is that an attacker can:
- Spoof the source of a DNSSEC query as some victim target (32 byte packet + 8 byte UDP header + 20 byte IP header = 60 byte header, raised to 64 byte due to Minimum Transmission Unit)
- Make a request that returns a lot of data (say, 2K, creating a 32x amplification)
- GOTO 1
It’s not exactly a complicated attack. However, it’s not a new attack either. Here’s a SANS entry from 2009, discussing attacks going back to 2006. Multi gigabit floods bounced off of DNS servers aren’t some terrifying thing from the future; they’re just part of the headache that is actively managing an Internet with ***holes in it.
There is an interesting game, one that I’ll definitely be writing about later, called “Whose bug is it anyway?”. It’s easy to blame DNSSEC, but there’s a lot of context to consider.
Ultimately, the bug is IP’s, since IP (unlike many other protocols) allows long distance transit of data without a server explicitly agreeing to receive it. IP effectively trusts its applications to “play nice” — shockingly, this was designed in the 80′s. UDP inherits the flaw, since it’s just a thin application wrapper around IP.
But the actual fault lies in DNS itself, as the amplification actually begins in earnest under the realm of simple unencrypted DNS queries. During the last batch of gigabit floods, the root servers were used — their returned values were 330 bytes out for every 64 byte packet in. Consider though the following completely randomly chosen query:
# dig +trace cr.yp.to any
cr.yp.to. 600 IN MX 0 a.mx.cr.yp.to.
cr.yp.to. 600 IN MX 10 b.mx.cr.yp.to.
cr.yp.to. 600 IN A 22.214.171.124
yp.to. 259200 IN NS a.ns.yp.to.
yp.to. 259200 IN NS uz5uu2c7j228ujjccp3ustnfmr4pgcg5ylvt16kmd0qzw7bbjgd5xq.ns.yp.to.
yp.to. 259200 IN NS b.ns.yp.to.
yp.to. 259200 IN NS f.ns.yp.to.
yp.to. 259200 IN NS uz5ftd8vckduy37du64bptk56gb8fg91mm33746r7hfwms2b58zrbv.ns.yp.to.
;; Received 414 bytes from 126.96.36.199#53(f.ns.yp.to) in 32 ms
So, the main function of Dan Bernstein’s website is to provide a 6.4x multiple to all DDoS attacks, I suppose?
I keed, I keed. Actually, the important takeaway is that practically every authoritative server on the Internet provides a not-insubstantial amount of amplification. Taking the top 1000 QuantCast names (minus the .gov stuff, just trust me, they’re their own universe of uber-weird; I wouldn’t judge X.509 on the Federal Bridge CA), we see:
- An average ANY query returns 413.9 bytes (seriously!)
- Almost half (460) return 413 bytes or more
So, the question then is not “what is the absolute amplification factor caused by DNSSEC”, it’s “What is the amplification factor caused by DNSSEC relative to DNS?”
It’s ain’t 90x, I can tell you that much. Here is a query for http://www.pir.org ANY, without DNSSEC:
http://www.pir.org. 300 IN A 188.8.131.52
pir.org. 300 IN NS ns1.sea1.afilias-nst.info.
pir.org. 300 IN NS ns1.mia1.afilias-nst.info.
pir.org. 300 IN NS ns1.ams1.afilias-nst.info.
pir.org. 300 IN NS ns1.yyz1.afilias-nst.info.
;; Received 329 bytes from 184.108.40.206#53(ns1.sea1.afilias-nst.info) in 90 ms
And here is the same query, with DNSSEC:
http://www.pir.org. 300 IN A 220.127.116.11
http://www.pir.org. 300 IN RRSIG A 5 3 300 20110118085021 20110104085021 61847 pir.org. n5cv0V0GeWDPfrz4K/CzH9uzMGoPnzEr7MuxPuLUxwrek+922xiS3BJG NfcM9nlbM5GZ5+UPGv668NJ1dx6oKxH8SlR+x3d8gvw2DHdA51Ke3Rjn z+P595ZPB67D9Gh6l61itZOJexwsVNX4CYt6CXTSOhX/1nKzU80PVjiM wg0=
pir.org. 300 IN NS ns1.mia1.afilias-nst.info.
pir.org. 300 IN NS ns1.yyz1.afilias-nst.info.
pir.org. 300 IN NS ns1.ams1.afilias-nst.info.
pir.org. 300 IN NS ns1.sea1.afilias-nst.info.
pir.org. 300 IN RRSIG NS 5 2 300 20110118085021 20110104085021 61847 pir.org. IIn3FUnmotgv6ygxBM8R3IsVv4jShN71j6DLEGxWJzVWQ6xbs5SIS0oL OA1ym3aQ4Y7wWZZIXpFK+/Z+Jnd8OXFsFyLo1yacjTylD94/54h11Irb fydAyESbEqxUBzKILMOhvoAtTJy1gi8ZGezMp1+M4L+RvqfGze+XFAHN N/U=
;; Received 674 bytes from 18.104.22.168#53(ns1.yyz1.afilias-nst.info) in 26 ms
About a 2x increase. Not perfect, but not world ending. Importantly, it’s nowhere close to the actual problem:
DNS comes from 1983, when the load of running around the Internet mapping names to numbers was actually fairly high. As such, it acquired a caching layer — a horde of BIND, Nominum, Unbound, MSDNS, PowerDNS, and other servers that acted as a middle layer between the masses of clients and the authoritative servers of the Internet.
At any given point, there’s between three and twelve million IP addresses on the Internet that operate as caching resolvers, and will receive requests from and send arbitrary records to any IP on the Internet.
Arbitrary attacker controlled records.
;; Query time: 5 msec
;; SERVER: 22.214.171.124#53(126.96.36.199)
;; WHEN: Tue Jan 4 11:10:59 2011
;; MSG SIZE rcvd: 3641
That’s a 3.6KB response to a 64 byte request, no DNSSEC required. I’ve been saying this for a while: DNSSEC is just DNS with signatures. Whose bug is it anyway? Well, at least some of those servers are running DJB’s dnscache…
So, what do we do about this? One option is to attempt further engineering: There are interesting tricks that can be run with ICMP to detect the flooding of an unwilling target. We could also have a RBL — realtime blackhole list — of IP addresses that are under attack. This would get around the fact that this attack is trivially distributable.
Another approach is to require connections that appear “sketchy” to upgrade to TCP. There’s support in DNS for this — the TC bit — and it’s been deployed with moderate success by at least one major DNS vendor. There’s some open questions regarding the performance of TCP in DNS, but there’s no question that kernels nowadays are at least capable of being much faster.
It’s an open question what to do here.
Meanwhile, the attackers chuckle. As DJB himself points out, they’ve got access to 2**23 machines — and these aren’t systems that are limited to speaking random obscure dialects of DNS that can be blocked anywhere on path with a simple pattern matching filter. These are actual desktops, with full TCP stacks, that can join IRC channels and be told to flood the financial site of the hour, right down to the URL!
If you’re curious why we haven’t seen more DNS floods, it might just be because HTTP floods work a heck of a lot better.
On Slide 36, DJB claims the following is possible:
Bob views Alice’s web page on his Android phone. Phone asked hotel DNS cache for web server’s address. Eve forged the DNS response! DNS cache checked DNSSEC but the phone didn’t.
This is true as per the old model of DNSSEC, which inherits a little too much from DNS. As per the old model, the only nodes that participate in record validation are full-on DNS servers. Clients, if they happen to be curious whether a name was securely resolved, have to simply trust the “AD” bit attached to a response.
I think I can speak for the entire security community when I say: Aw, hell no.
The correct model of DNSSEC is to push enough of the key material to the client, that it can make its own decisions. (If a client has enough power to run TLS, it has enough power to validate a DNSSEC chain.) In my Domain Key Infrastructure talk, I discuss four ways this can be done. But more importantly, in Phreebird I actually released mechanisms for two of them — chasing, where you start at the bottom of a name and work your way up, and tracing, where you start at the root and work your way down.
Chasing works quite well, and importantly, leverages the local cache. Here is Phreebird’s output from a basic chase command:
|---www.pir.org. (A) |---pir.org. (DNSKEY keytag: 61847 alg: 5 flags: 256) |---pir.org. (DNSKEY keytag: 54135 alg: 5 flags: 257) |---pir.org. (DS keytag: 54135 digest type: 2) | |---org. (DNSKEY keytag: 1743 alg: 7 flags: 256) | |---org. (DNSKEY keytag: 21366 alg: 7 flags: 257) | |---org. (DS keytag: 21366 digest type: 2) | | |---. (DNSKEY keytag: 21639 alg: 8 flags: 256) | | |---. (DNSKEY keytag: 19036 alg: 8 flags: 257) | |---org. (DS keytag: 21366 digest type: 1) | |---. (DNSKEY keytag: 21639 alg: 8 flags: 256) | |---. (DNSKEY keytag: 19036 alg: 8 flags: 257) |---pir.org. (DS keytag: 54135 digest type: 1) |---org. (DNSKEY keytag: 1743 alg: 7 flags: 256) |---org. (DNSKEY keytag: 21366 alg: 7 flags: 257) |---org. (DS keytag: 21366 digest type: 2) | |---. (DNSKEY keytag: 21639 alg: 8 flags: 256) | |---. (DNSKEY keytag: 19036 alg: 8 flags: 257) |---org. (DS keytag: 21366 digest type: 1) |---. (DNSKEY keytag: 21639 alg: 8 flags: 256) |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
Chasing isn’t perfect — one of the things Paul Vixie and I have been talking about is what I refer to as SuperChase, encoded by setting both CD=1 (Checking Disabled) and RD=1 (Recursion Desired). Effectively, there are a decent number of records between http://www.pir.org and the root. With the advent of sites like OpenDNS and Google DNS, that might represent a decent number of round trips. As a performance optimization, it would be good to eliminate those round trips, by allowing a client to say “please fill your response with as many packets as possible, so I can minimize the number of requests I need to make”.
But the idea that anybody is going to use a DNSSEC client stack that doesn’t provide end to end semantics is unimaginable. After all, we’re going to be bootstrapping key material with this.
On page 44, DJB claims that because DNSSEC uses offline signing, the only way it could be used to secure web pages is if those pages were signed with PGP.
What? There’s something like three independent efforts to use DNSSEC to authenticate HTTPS sessions.
Leaving alone the fact that DNSSEC isn’t necessarily an offline signer, it is one of the core constructions in cryptography to intermix an authenticator with otherwise-anonymous encryption to defeat an otherwise trivial man in the middle attack. That authenticator can be almost anything — a password, a private key, even a stored secret from a previous interaction. But this is a trivial, basic construction. It’s how EDH (Ephemeral Diffie-Helman) works!
Using keys stored in DNS has been attempted for years. Some mechanisms that come to mind:
- SSHFP — SSH Fingerprints in DNS
- CERT — Certificates in DNS
- DKIM — Domain Keys in DNS
All of these have come under withering criticism from the security community, because how can you possibly trust what you get back from these DNS lookups?
With DNSSEC — even with offline-signed DNSSEC — you can. And in fact, that’s been the constant refrain: “We’ll do this now, and eventually DNSSEC will make it safe.” Intermixing offline signers with online signers is perfectly “legal” — it’s isomorphic to receiving a password in a PGP encrypted email, or sending your SSH public key to somebody via PGP.
So, what I’m shipping today is simple:
http://www.hospital-link.org IN TXT “v=key1 ha=sha1 h=f1d2d2f924e986ac86fdf7b36c94bcdf32beec15″
That’s an offline-signable blob, and it says “If you use HTTPS to connect to http://www.hospital-link.org, you will be given a certificate with the SHA1 hash of f1d2d2f924e986ac86fdf7b36c94bcdf32beec15″. As long as the ground truth in this blob can be chained to the DNS root, by the client and not some random name server in a coffee shop, all is well (to the extent SHA-1 is safe, anyway).
This is not an obscure process. This is a basic construction. Sure, there are questions to be answered, with a number of us fighting over the precise schema. And that’s OK! Let the best code win.
So why isn’t the best code DNSCurve?
DNSCurve is based on something totally awesome: Curve25519. I am not exaggerating when I say, this is something I’ve wanted from the first time I showed up in Singapore for Black Hat Asia, 9 years ago (you can see vague references to it in the man page for lc). Curve25519 essentially lets you do this:
If I have a 32 byte key, and you have a 32 byte key, and we both know eachother’s key, we can mix them to create a 32 byte secret.
32 bytes is very small.
What’s more, Curve25519 is fast(er). Here’s comparative benchmarks from a couple of systems:
SYSTEM 1 (Intel Laptop, Cygwin, curve25519-20050915 from DJB):
RSA1024 OpenSSL sign/s: 520.2
RSA1024 OpenSSL verify/s: 10874.0
Curve25519 operations/s: 4131
SYSTEM 2 (Amazon Small VM, curve25519-20050915):
RSA1024 OpenSSL sign/s: 502.6
RSA1024 OpenSSL verify/s: 11689.8
Curve25519 operations/s: 507.25
SYSTEM 3 (Amazon XLarge VM, using AGL’s code here for 64 bit compliance):
RSA1024 OpenSSL sign/s: 1048.4
RSA1024 OpenSSL verify/s: 19695.4
Curve25519 operations/s: 4922.71
While these numbers are a little lower than I remember them — for some reason, I remember 16K/sec — they’re in line with DJB’s comments here: 500M clients a day is about 5787 new sessions a second. So generally, Curve25519 is about 4 to 8 times faster than RSA1024.
It’s worth noting that, while 4x-8x is significant today, it won’t be within a year or two. That’s because by then we’ll have RSA acceleration via GPU. Consider this tech report — with 128 cores, they were able to achieve over 5000 RSA1024 decryption operations per second. Modern NVIDIA cards have over 512 cores, with across the board memory and frequency increases. That would put RSA speed quite a bit beyond software Curve25519. Board cost? $500.
That being said, Curve25519 is quite secure. Even with me bumping RSA up to 1280bit in Phreebird by default, I don’t know if I reach the putative level of security in this ECC variant.
The most exciting aspect of Curve25519 is its ability to, with relatively little protocol overhead, create secure links between two peers. The biggest problem with Curve25519 is that it seems to get its performance by sacrificing the capacity to sign once and distribute a message to many parties. This is something we’ve been able to do with pretty much every asymmetric primitive thus far — RSA, DH (via DSA), ECC (via ECDSA), etc. I don’t know whether DJB’s intent is to enforce a particular use pattern, or if this is an actual technical limitation. Either way, Alice can’t sign a message and hand it to Bob, and have Bob prove to Charlie that he received that message from Alice.
In DNSCurve, all requests are unique — they’re basically cryptographic blobs encapsulated in DNS names, sent directly to target name servers or tunneled through local servers, where they are uncacheable due to their uniqueness. (Much to my amusement and appreciation, the DNSCurve guys realized the same thing I did — gotta tunnel through TXT if you want to survive.) So one hundred thousand different users at the same ISP, all looking up the same http://www.cnn.com address, end up issuing unique requests that make their way all the way to CNN’s authoritative servers.
Under the status quo, and under DNSSEC, all those requests would never leave the ISP, and would simply be serviced locally.
It was ultimately this characteristic of DNSCurve, above all others, that caused me to reject the protocol in favor of DNSSEC. It was my estimation that this would cause something like a 100x increase in load on authoritative name servers.
DJB claims to have measured the effect of disabling caching at the ISP layer, and says the increase is little more than 15%, or 1.15x. He declares my speculation, “wild”. OK, that’s fine, I like being challenged! Lets take a closer look.
All caches have a characteristic known as the hit rate. For every query that comes in, what’s the chances that it will require a lookup to a remote authoritative server, vs. being servicable from the cache? A hit rate of 50% would imply a 2x increase in load. 75% would imply 4x. 87.5%? 8x.
Inverting the math, for the load differential to be just 15%, that would mean only about 6% of queries were being hosted from local cache. Do we, in fact, see a 6% hitrate? Lets see what actual operations people say about their name servers:
In percentages is dat 80 tot 85% hits…Daar kom je ook iets boven de 80%.
–XS4ALL, a major Dutch ISP (80-85% numbers)
I’ve attached a short report from a couple of POP’s at a mid-sized (3-4 M subs) ISP. It’s just showing a couple of hours from earlier this fall (I kind of chose it at random).
It’s extremely consistent across a number of different sites that I checked (not the 93%, but 85-90% cache hits): 0.93028, 0.92592, 0.93094, 0.93061, 0.92741
–Major DNS supplier
I don’t have precise cache hit rates for you, but I can give you this little fun tidbit. If you have a 2-tier cache, and the first tier is tiny (only a couple hundred entries), then you’ll handle close to 50% of the
queries with the first cache…
Actually, those #s are old. We’re now running a few K first cache, and have a 70%+ hit rate there.
–Major North American ISP
So, essentially, unfiltered consensus hitrate is about 80-90%, which puts the load increase at 5x-10x in general. Quite a bit less than the 100x I was worried about, right? Well, lets look at one last dataset:
Dec 27 03:30:25 solaria pdns_recursor: stats: 11133 packet cache entries, 99% packet cache hits
Dec 27 04:00:26 solaria pdns_recursor: stats: 17884 packet cache entries, 99% packet cache hits
Dec 27 04:30:28 solaria pdns_recursor: stats: 22522 packet cache entries, 99% packet cache hits
–Network at 27th Chaos Communication Congress
Well, there’s our 100x numbers. At least (possibly more than) 99% of requests to 27C3′s most popular name server were hosted out of cache, rather than spawning an authoritative lookup.
Of course, it’s interesting to know what’s going on. This quote comes from an enormous supplier of DNS resolutions.
Facebook loads resources from dynamically (seemingly) created subdomains. I’d guess for alexa top 10,000 hit rate is 95% or more. But for other popular domains like Facebook, absent a very large cache, hit rate will be exceptionally low. And if you don’t descriminate cache policy by zone, RBLs will eat your entire cache, moving hit rate very very low.
–Absolutely Enormous Resolution Supplier
Here’s where it becomes clear that there are domains that choose to evade caching — they’ve intentionally engineered their systems to emit low TTLs and/or randomized names, so that requestors can get the most up to date records. That’s fine, but those very nodes are directly impacting the hitrate — meaning your average authoritative is really being saved even more load by the existence of the DNS caching layer.
It gets worse: As Paul Wouters points out, there is not a 1-to-1 relationship between cache misses and further queries. He writes:
You’re forgetting the chain reaction. If there is a cache miss, the server has to do much more then 1 query. It has to lookup NS records, A records, perhaps from parents (perhaps cached) etc. One cache miss does not equal one missed cache hit!
To the infrastructure, every local endpoint cache hit means saving a handful of lookups.
That 99% cache hit rate looks even worse now. Looking at the data, 100x or even 200x doesn’t seem quite so wild, at least compared to the fairly obviously incorrect estimation of 1.15x.
Actually, when you get down to it, the greatest recipients of traffic boost are going to be nodes with a large number of popular domains that are on high TTLs, because those are precisely what:
a) Get resolved by multiple parties, and
b) Stay in cache, suppression further resolution
Yeah, so that looks an awful lot like records hosted at the TLDs. Somehow I don’t think they’re going to deploy DNSCurve anytime soon.
(And for the record, I’m working on figuring out the exact impact of DNSCurve on each of the TLDs. After all, data beats speculation.)
[UPDATE: DJB challenges this section, citing local caching effects. More info here.]
DNSSEC lets you choose: Online signing, with its ease of use and extreme flexibility. Or offline signing, with the ability to keep keying material in very safe places.
A while back, I had the pleasure of being educated in the finer points of DNS key risk management, by Robert Seastrom of Afilias (they run .org). Lets just say there are places you want to be able to distribute DNS traffic, without actually having “the keys to the kingdom” physically deployed. As Paul Wouters observed:
Do you know how many instances of rootservers there are? Do you really want that key to live in 250+ places at once? Hell no!
DNSSEC lets you choose how widely to distribute your key material. DNSCurve does not.
Yes, Curve25519 is fast. But nothing in computers is “instantaneous”, as DJB repeatedly insisted Curve25519 is at 27C3. A node that’s doing 5500 Curve25519 operations a second is likely capable of 50,000 to 100,000 DNS queries per second. We’re basically eating 5 to 10 qps per Curve25519 op — which makes sense, really, since no asymmetric crypto is ever going to be as easy as burping something out on the wire.
DJB gets around this by presuming large scale caching. But while DNSSEC can cache by record — with cache effectiveness at around 70% for just a few thousand entries, according to the field data — DNSCurve has to cache by session. Each peer needs to retain a key, and the key must be looked up.
Random lookups into a 500M-record database (as DJB cites for .com), on the required millisecond scale, aren’t actually all that easy. Not impossible, of course, but messy.
This is unavoidable. Caches allow entire trust chains to be aggregated on network links near users. Even though DNSSEC hasn’t yet built out a mechanism for a client to retrieve that chain in a single query, there’s nothing stopping us on a protocol level from really leveraging local caches.
DNSCurve can never use these local caches. Every query must round-trip between the host and each foreign server in the resolution chain — at least, if end to end trust is to be maintained.
(In the interest of fairness, there are modes of validating DNSSEC on endpoints that bypass local caches entirely and go straight to the root. Unbound, or libunbound as used in Phreebird, will do this by default. It’s important that the DNSSEC community be careful about this, or we’ll have the same fault I’m pointing out in DNSCurve.)
DNSCurve Also Can’t Sign For Its Delegations
If there’s one truly strange argument in DJB’s presentation, it’s on Page 39. Here, he complains that in DNSSEC, .org being signed doesn’t sign all the records underneath .org, like wikipedia.org.
Huh? Wikipedia.org is a delegation, an unsigned one at that. That means the only information that .org can sign is either:
1) Information regarding the next key to be used to sign wikipedia.org records, along with the next name server to talk to.
2) Proof there is no next key, and Wikipedia’s records are unsigned
That’s it. Those are the only choices, because Wikipedia’s IP addresses are not actually hosted by the .org server. DNSCurve either implements the exact same thing, or it’s totally undeployable, because without support for unsigned delegations 100% of your children must be signed in order for you to be signed. I can’t imagine DNSCurve has that limitation.
I don’t hate CurveCP. In fact, if there isn’t a CurveCP implementation out within a month, I’ll probably write one myself. If I’ve got one complaint about what I’ve heard about it, it’s that it doesn’t have a lossy mode (trickier than you think — replay protection etc).
CurveCP is seemingly an IPsec clone, over which a TCP clone runs. That’s not actually accurate. See, our inability to do small and fast asymmetric cryptography has forced us to do all these strange things to IP, making stateful connections even when we just wanted to fire and forget a packet. With CurveCP, if you know who you’re sending to, you can just fire and forget packets — the overhead, both in terms of effect on CPU and bandwidth, is actually quite affordable.
This is what I wanted for linkcat almost a decade ago! There are some beautiful networking protocols that could be made with Curve25519. CurveCP is but one. The fact that CurveCP would run entirely in userspace is a bonus — controversial as this may be, the data suggests kernel bound APIs (like IPsec) have a lot more trouble in the field than usermode APIs (like TLS/HTTPS).
There is a weird thing with CurveCP, though. DJB is proposing tunneling it over 53/udp. He’s not saying to route it through DNS proper, i.e. shunting all over CurveCP into the weird formats you have to use when proxying off a name server. He just wants data to move over 53/udp, because he thinks it gets through firewalls easier. Now, I haven’t checked into this in at least six years, but back when I was all into tunneling data over DNS, I actually had to tunnel data over DNS, i.e. fully emulate the protocol. Application Layer Gateways for DNS are actually quite common, as they’re a required component of any captive portal. I haven’t seen recent data, but I’d bet a fair amount that random noise over 53/udp will actually be blocked on more networks than random noise over some higher UDP port.
This is not an inherent aspect of CurveCP, though, just an implementation detail.
Unfortunately, my appreciation of CurveCP has to be tempered by the fact that it does not, in fact, solve our problems with HTTPS. DJB seems thoroughly convinced that the reason we don’t have widespread HTTPS is because of performance issues. He goes so far as to cite Google, which displays less imagery to the user if they’re using https://encrypted.google.com.
Source Material is a beautiful thing. According to Adam Langley, who’s actually the TLS engineer at Google:
If there’s one point that we want to communicate to the world, it’s that SSL/TLS is not computationally expensive any more. Ten years ago it might have been true, but it’s just not the case any more. You too can afford to enable HTTPS for your users.
In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.
If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive any more.
Given that Adam is actually the engineer who wrote the generic C implementation of Curve25519, you’d think DJB would listen to his unambiguous, informed guidance. But no, talk after talk, con after con, DJB repeats the assertion that performance is why HTTPS is poorly deployed — and thus, we should throw out everything and move to his faster crypto function.
(He also suggests that by abandoning HTTPS and moving to CurveCP, we will somehow avoid the metaphorical attacker who has the ability to inject one packet, but not many, and doesn’t have the ability to censor traffic. That’s a mighty fine hair to split. For a talk that is rather obsessed with the goofiness of thinking partial defenses are meaningful, this is certainly a creative new security boundary. Also, non-existent.)
I mentioned earlier: Security is larger than cryptography. If you’ll excuse the tangent, it’s important to talk about what’s actually going on.
The average major website is not one website — it is an agglomeration, barely held together with links, scripts, images, ads, APIs, pages, duct tape, and glue.
For HTTPS to work, everything (except, as is occasionally argued, images) must come from a secure source. It all has to be secure, at the exact same time, or HTTPS fails loudly, and appropriately.
At this point, anyone who’s ever worked at a large company understands exactly, to a T, why HTTPS deployment has been so difficult. Coordination across multiple units is tricky even when there’s money to be made. When there isn’t money — when there’s merely defense against customer data being lost — it’s a lot harder.
The following was written in 2007, and was not enough to make Google start encrypting:
‘The goal is to identify the applications being used on the network, but some of these devices can go much further; those from a company like Narus, for instance, can look inside all traffic from a specific IP address, pick out the HTTP traffic, then drill even further down to capture only traffic headed to and from Gmail, and can even reassemble emails as they are typed out by the user.‘
Now, far be it from me to speculate as to what actually moved Google towards HTTPS, but it would be my suspicion that this had something to do with it:
Google is now encrypting all Gmail traffic from its servers to its users in a bid to foil sniffers who sit in cafes, eavesdropping in on traffic passing by, the company announced Wednesday.
The change comes just a day after the company announced it might pull its offices from China after discovering concerted attempts to break into Gmail accounts of human rights activists.
There is a tendency to blame the business guys for things. If only they cared enough! As I’ve said before, the business guys have blown hundreds of millions on failed X.509 deployments. They cared enough. The problems with existing trust distribution systems are (and I think DJB would agree with me here) not just political, but deeply, embarrassingly technical as well.
Once upon a time, I wanted to write a kernel module. This module would quietly and efficiently add a listener on 443/TCP, that was just a mirror of 80/TCP. It would be like stunnel, but at native speeds.
But what certificate would it emit?
See, in HTTP, the client declares the host it thinks it’s talking to, so the server can “morph” to that particular identity. But in HTTPS, the server declares the host it thinks it is, so the client can decide whether to trust it or not.
This has been a problem, a known problem, since the late nineties. They even built a spec, called SNI (Server Name Indication), that allowed the client to “hint” to the server what name it was looking for.
Didn’t matter. There’s still not enough adoption of SNI for servers to vhost based off of it. So, if you want to deploy TLS, you not only have to get everybody, across all your organizations and all your partners and all your vendors and all your clients to “flip the switch”, but you also have to get them to renumber their networks.
Those are three words a network engineer never, ever, ever wants to hear.
And we haven’t even mentioned the fact that acquiring certificates is an out-of-organization acquisition, requiring interactions with outsiders for every new service that’s offered. Empirically, this is a fairly big deal. Devices sure don’t struggle to get IP addresses — but the contortions required to get a globally valid certificate into a device shipped to a company are epic and frankly impossible. To say nothing of what happens when one tries to securely host from a CDN (Content Distribution Network)!
DNSSEC, via its planned mechanisms for linking names to certificate hashes, bypasses this entire mess. A hundred domains can CNAME to the same host, with the same certificate, within the CDN. On arrival, the vhosting from the encapsulated HTTP layer will work as normal. My kernel module will work fine.
‘Bout time we fixed that!
So. When I say security is larger than cryptography — it’s not that I’m saying cryptography is small. I’m saying that security, actual effective security, requires being worried about a heck of alot more failure modes than can fit into a one hour talk.
I’ve saved my deepest concern for last.
I can’t believe DJB fell for Zooko’s Triangle.
So, I met Zooko about a decade ago. I had no idea of his genius. And yet, there he was, way back when, putting together the core of DJB’s talk at 27C3. Zooko’s Triangle is a description of desirable properties of naming systems, of which only two can be implemented at any one time. They are:
- Secure: Only the actual owner of the name is found.
- Decentralized: There is no “single point of failure” that everyone trusts to provide naming services
- Human Readable: The name is such that humans can read it.
DJB begins by recapitulating Nyms, an idea that keeps coming up through the years:
“Nym” case: URL has a key!
Recognize magic number 123 in http://1238675309.twitter.com and extract key 8675309.
(Technical note: Keys are actually longer than this, but still fit into names.)
DJB shortens the names in his slides — and admits that he does this! But, man, they’re really ugly:
I actually implemented something very similar in the part of Phreebird that links DNSSEC to the browser lock in your average web browser:
I didn’t invent this approach, though, and neither did DJB. While I was personally inspired by the Self-Certifying File System, this is the “Secure and Decentralized — Not Human Readable” side of Zooko, so this shows up repeatedly.
For Phreebird, this was just a neat trick, something funny and maybe occasionally useful when interoperating with a config file or two. It wasn’t meant to be used as anything serious. Among other things, it has a fundamental UI problem — not simply that the names are hideous (though they are), but that they’ll necessarily be overridden. People think bad security UI is random, like every once in a while the vapors come over a developer and he does something stupid.
No. You can look at OpenSSH and say — keys will change over time, and users will have to have a way to override their cache. You can look at curl and say — sometimes, you just have to download a file from an HTTPS link that has a bad cert. There will be a user experience telling you not one, but two ways to get around certificate checking.
You can look at a browser and say, well, if a link to a page works but has the wrong certificate hash in the sending link, the user is just going to have to be prompted to see if they want to browse anyway.
It is tempting, then, to think bad security UI is inevitable, that there are no designs that could possibly avoid it. But it is not the fact that keys change that force an alert. It’s that they change, without the legitimate administrator having a way to manage the change. IP addresses change all the time, and DNS makes it totally invisible. Everybody just gets the new IPs — and there’s no notification, no warnings, and certainly no prompting.
Random links smeared all over the net will prompt like little square Christmas ornaments. Keys stored in DNS simply won’t.
So DJB says, don’t deploy long names, instead use normal looking domains like www. Then, behind the scenes, using the DNS aliasing feature known as CNAME, link www to the full Nym.
Ignore the fact that there are applications actually using the CNAME data for meaningful things. This is very scary, very dangerous stuff. After all, you can’t just trust your network to tell you the keyed identity of the node you’re attempting to reach. That’s the whole point — you know the identity, the network is untrusted. DJB has something that — if comprehensively deployed, all the way to the root, does this. DNSCurve, down from the root, down from com, down to twitter.com would in fact provide a chain of trust that would allow http://www.twitter.com to CNAME to some huge ugly domain.
Whether or not it’s politically feasible (it isn’t), it is a scheme that could work. It is secure. It is human readable. But it is not decentralized — and DJB is petrified of trusted third parties.
And this, finally, is when it all comes off the rails. DJB suggests that all software could perhaps ship with lists of keys of TLDs — .com, .de, etc. Of course, such lists would have to be updated.
Never did I think I’d see one of the worst ideas from DNSSEC’s pre-root-signed past recycled as a possibility. That this idea was actually called the ITAR (International Trust Anchor Repository), and that DJB was advocating ITAR, might be the most surreal experience of 2010 for me.
(Put simply, imagine every device, every browser, every phone FTP’ing updates from some vendor maintained server. It was a terrible idea when DNSSEC suggested it, and even they only did so under extreme duress.)
But it gets worse. For, at the end of it, DJB simply through up his hands and said:
“Maybe P2P DNS can help.”
Now, I want to be clear (because I screwed this up at first): DJB did not actually suggest retrieving key material from P2P DNS. That’s good, because P2P DNS is the side of Zooko’s Triangle where you get decentralization and human readable names — but no security! Wandering a cloud, asking if anyone knows the trusted key for Twitter, is an unimaginably bad idea.
No, he suggested some sort of split system, where you actually and seriously use URLs like http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com to identify peers, but P2P DNS tells you what IPs to speak to them at.
What? Isn’t security important to you?
After an hour of hearing how bad DNSSEC must be, ending up here is…depressing. In no possible universe are Nymic URLs a good idea for anything but wonky configuration entries. DJB himself was practically apologizing for them. They’re not even a creative idea.
No system can credibly purport to be solving the problems of authentication and encryption for anybody, let alone the whole Internet, without offering a serious answer to the question of Key Management. To be blunt, this is the hard part. New transports and even new crypto are optimizations, possibly even very interesting ones. But they’re not where we’re bleeding.
The problem is that it’s very easy to give a node an IP address that anyone can route to, but very hard to give a node an identity that anybody can verify. Key Management — the creation, deletion, integration, and management of identities within and across organizational boundaries — this is where we need serious solutions.
Nyms are not a serious solution. Neither is some sort of strange half-TLD, half P2PDNS, split identity/routing hack. Solving key management is not actually optional. This is where the foundation of an effective solution must be found.
DNSSEC has a coherent plan for how keys can be managed across organizational boundaries — start at the root, delegate down, and use governance and technical constraints to keep the root honest. It’s worked well so far, or you wouldn’t be reading this blog post right now. There’s no question it’s not a perfect protocol — what is? — but the criticisms coming in from DJB are fairly weak (and unfairly old) in context, and the alternative proposed is sadly just smoke and mirrors without a keying mechanism.
Short version: DNS over TCP (or HTTP) is almost certainly not faster than DNS over UDP, for any definition of faster. There was some data that supported a throughput interpretation of speed, but that data is not replicating under superior experimental conditions. Thanks to Tom Ptacek for prodding me into re-evaluating my data.
So one of the things I haven’t gotten a chance to write a diary entry about yet, is the fact that when implementing end-to-end DNSSEC, there will be environments in which arbitrary DNS queries just aren’t an option. In such environments, we will need to find a way to tunnel traffic.
Inevitably, this leads us to HTTP, the erstwhile “Universal Tunneling Protocol”.
Now, I don’t want to go ahead and write up this entire concern now. What I do want to do is discuss a particular criticism of this concern — that HTTP, being run over TCP, would necessarily be too slow in order to function as a DNS transport.
I decided to find out.
I am, at my core, an empiricist. It’s my belief that the security community runs a little too much on rumor and nowhere nearly enough on hard, repeatable facts. So, I have a strong bias towards actual empirical results.
One has to be careful, though — as they say, data is not information, information is not knowledge, and knowledge is not wisdom.
So, Phreebird is built on top of libevent, a fairly significant piece of code that makes it much easier to write fast network services. (Libevent essentially abstracts away the complex steps it’s required to make modern kernel networking fast.) Libevent supports UDP, TCP, and HTTP transports fairly elegantly, and even simultaneously, so I built each endpoint into Phreebird.
Then, I ran a simple benchmark. From my Black Hat USA slides:
# DNS over UDP
./queryperf -d target2 -s 188.8.131.52 -l 10
Queries per second: 3278.676726 qps
# DNS over HTTP
ab -c 100 -n 10000 http://184.108.40.206/Rz8BAAABAAAAAAAAA3d3dwNjbm4DY29tAAABAAE=
Requests per second: 3910.13 [#/sec] (mean)
Now, at Black Hat USA, the title of my immediate next slide was “Could be Wrong!”, with the very first bullet point being “Paul Vixie thinks I am”, and the second point being effectively “this should work well enough, especially if we can retrieve entire DNS chains over a single query”. But, aside from a few raised eyebrows, the point didn’t get particularly challenged, and I dropped the skepticism from the talk. I mean, there’s a whole bunch of crazy things going on in the DKI talk, I’ve got more important things to delve deeply into, right?
Turns out there’s a fair number of things wrong with the observation. First off, saying “DNS over HTTP is faster than DNS over UDP” doesn’t specify which definition of faster I’m referring to. There are two. When I say a web server is fast, it’s perfectly reasonable for me to say “it can handle 50,000 queries per second, which is 25% faster than its competition”. That is speed-as-throughput. But it’s totally reasonable also to say “responses from this web server return in 50ms instead of 500ms”, which is speed-as-latency.
Which way is right? Well, I’m not going to go mealy-mouthed here. Speed-as-latency isn’t merely a possible interpretation, it’s the normal interpretation. And of course, since TCP adds a round trip between client and server, whereas UDP does not, TCP couldn’t be faster (modulo hacks that reduced the number of round trips at the application layer, anyway). What I should have been saying was that “DNS servers can push more traffic over HTTP than they can over UDP.”
Except that’s not necessarily correct as well.
Now, I happen to be lucky. I did indeed store the benchmarking data from my July 2010 experiment comparing UDP and HTTP DNS queries. So I can show I wasn’t pulling numbers out of my metaphorical rear. But I didn’t store the source code, which has long since been updated (changing the performance characteristics), and I didn’t try my tests on multiple servers of different performance quality.
I can’t go back in time and get the code back, but I can sure test Phreebird. Here’s what we see on a couple of EC2 XLarges, after I’ve modified Phreebird to return a precompiled response rather than building packets on demand:
# ./queryperf -d bench -s x.x.x.x -l 10
Queries per second: 25395.728961 qps
# ab -c 1000 -n 100000 ‘http://x.x.x.x/.well-known/dns-http?v=1&q=wn0BAAABAAAAAAAABWJlbmNoBG1hcmsAAAEAAQ==’
Requests per second: 7816.82 [#/sec] (mean)
DNS over HTTP here is about 30% the performance of DNS over UDP. Perhaps if I try some smaller boxes?
# ./queryperf -d bench -s pb-a.org -l 10
Queries per second: 7918.464171 qps
# ab -c 1000 -n 100000 ‘http://pb-a.org/.well-known/dns-http?v=1&q=wn0BAAABAAAAAAAABWJlbmNoBG1hcmsAAAEAAQ==’
Requests per second: 3486.53 [#/sec] (mean)
Well, we’re up to 44% the speed, but that’s a far cry from 125%. What’s going on? Not sure, I didn’t store the original data. There’s a couple other things wrong:
- This is a conflated benchmark. I should have a very small, tightly controlled, correctly written test case server and client. Instead, I’m borrowing demonstration code that’s trying to prove where DNSSEC is going, and stock benchmarkers.
- I haven’t checked if Apache Benchmarker is reusing sockets for multiple queries (I don’t think it is). If it is, that would mean it was amortizing setup time across multiple queries, which would skew the data.
- I’m testing between two nodes on the same network (Amazon). I should be testing across a worldwide cluster. Planetlab calls.
- I’m testing between two nodes, period. That means that the embarrassingly parallelizable problem that is opening up a tremendous number of TCP client sockets, is borne by only one kernel instead of some huge number. I’m continuing to repeat this error on the new benchmarks as well.
- This was a seriously counterintuitive result, and as such bore a particular burden to be released with hard, replicative data (including a repro script).
Now, does this mean the findings are worthless? No. First, there’s no intention of moving all DNS over HTTP, just those queries that cannot be serviced via native transport. Slower performance — by any definition — is superior to no performance at all. Second, there was at least one case where DNS over HTTP was 25% faster. I don’t know where it is, or what caused it. Tom Ptacek supposes that the source of HTTP’s advantage was that my UDP code was hobbled (specifically, through some half-complete use of libevent). That’s actually my operating assumption for now.
Finally, and most importantly, as Andy Steingruebl tweets:
Most tests will prob show UDP faster, but that isn’t the point. Goal is to make TCP fast enough, not faster than UDP.
The web is many things, optimally designed for performance is not one of them. There are many metrics by which we determine the appropriate technological solution. Raw cycle-for-cycle performance is a good thing, but it ain’t the only thing (especially considering the number of problems which are very noticeably not CPU-bound).
True as that all is, I want to be clear. A concept was misunderstood, and even my “correct interpretation” wasn’t backed up by a more thorough and correct repetition of the experiment. It was through the criticism of others (specifically, Tom Ptacek) that this came to light. Thanks, Tom.
That’s science, folks. It’s awesome, even when it’s inconvenient!
Several years ago, I had some fun: I streamed live audio, and eventually video, through the DNS.
Heh. I was young, and it worked through pretty much any firewall. (Still does, actually.) It wasn’t meant to be a serious transport though. DNS was not designed to traffic large amounts of data. It’s a bootstrapper.
But then, we do a lot of things with protocols that we weren’t “supposed” to do. Where do we draw the line?
Obviously DNS is not going to become the next great CDN hack (though I had a great trick for that too). But there’s a real question: How much data should we be putting into the DNS?
Somewhere between “only IP addresses, and only a small number”, and “live streaming video”, there’s an appropriate middle ground. Hard to know where exactly that is.
There is a legitimate question of whether anything the size of a certificate should be stored in DNS. Here is the size of Hotmail’s certificate, at the time of writing:
-----BEGIN CERTIFICATE----- MIIHVTCCBj2gAwIBAgIKKykOkAAIAAHL8jANBgkqhkiG9w0BAQUFADCBizETMBEG CgmSJomT8ixkARkWA2NvbTEZMBcGCgmSJomT8ixkARkWCW1pY3Jvc29mdDEUMBIG CgmSJomT8ixkARkWBGNvcnAxFzAVBgoJkiaJk/IsZAEZFgdyZWRtb25kMSowKAYD VQQDEyFNaWNyb3NvZnQgU2VjdXJlIFNlcnZlciBBdXRob3JpdHkwHhcNMTAxMTI0 MTYzNjQ1WhcNMTIxMTIzMTYzNjQ1WjBuMQswCQYDVQQGEwJVUzELMAkGA1UECBMC V0ExEDAOBgNVBAcTB1JlZG1vbmQxEjAQBgNVBAoTCU1pY3Jvc29mdDEUMBIGA1UE CxMLV2luZG93c0xpdmUxFjAUBgNVBAMTDW1haWwubGl2ZS5jb20wggEiMA0GCSqG SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDVCWje/SRDef6Sad95esXcZwyudwZ8ykCZ lCTlXuyl84yUxrh3bzeyEzERtoAUM6ssY2IyQBdauXHO9f+ZEz09mkueh4XmD5JF /NhpxdpPMC562NlfvfH/f+8KzKCPhPlkz4DwYFHsknzHvyiz2CHDNffcCXT+Bnrv G8eEPbXckfEFB/omArae0rrJ+mfo9/TxauxyX0OsKv99d0WO0AyWY2/Bt4G+lSuy nBO7lVSadMK/pAxctE+ZFQM2nq3G4o+L95HeuG4m5NtIJnZ/7dZwc8HuXuMQTluA ZL9iqR8a24oSVCEhTFjm+iIvXAgM+fMrjN4jHHj0Vo/o0xQ1kJLbAgMBAAGjggPV MIID0TALBgNVHQ8EBAMCBLAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMB MHgGCSqGSIb3DQEJDwRrMGkwDgYIKoZIhvcNAwICAgCAMA4GCCqGSIb3DQMEAgIA gDALBglghkgBZQMEASowCwYJYIZIAWUDBAEtMAsGCWCGSAFlAwQBAjALBglghkgB ZQMEAQUwBwYFKw4DAgcwCgYIKoZIhvcNAwcwHQYDVR0OBBYEFNHp09bqrsUO36Nc 1mQgHqt2LeFiMB8GA1UdIwQYMBaAFAhC49tOEWbztQjFQNtVfDNGEYM4MIIBCgYD VR0fBIIBATCB/jCB+6CB+KCB9YZYaHR0cDovL21zY3JsLm1pY3Jvc29mdC5jb20v cGtpL21zY29ycC9jcmwvTWljcm9zb2Z0JTIwU2VjdXJlJTIwU2VydmVyJTIwQXV0 aG9yaXR5KDgpLmNybIZWaHR0cDovL2NybC5taWNyb3NvZnQuY29tL3BraS9tc2Nv cnAvY3JsL01pY3Jvc29mdCUyMFNlY3VyZSUyMFNlcnZlciUyMEF1dGhvcml0eSg4 KS5jcmyGQWh0dHA6Ly9jb3JwcGtpL2NybC9NaWNyb3NvZnQlMjBTZWN1cmUlMjBT ZXJ2ZXIlMjBBdXRob3JpdHkoOCkuY3JsMIG/BggrBgEFBQcBAQSBsjCBrzBeBggr BgEFBQcwAoZSaHR0cDovL3d3dy5taWNyb3NvZnQuY29tL3BraS9tc2NvcnAvTWlj cm9zb2Z0JTIwU2VjdXJlJTIwU2VydmVyJTIwQXV0aG9yaXR5KDgpLmNydDBNBggr BgEFBQcwAoZBaHR0cDovL2NvcnBwa2kvYWlhL01pY3Jvc29mdCUyMFNlY3VyZSUy MFNlcnZlciUyMEF1dGhvcml0eSg4KS5jcnQwPwYJKwYBBAGCNxUHBDIwMAYoKwYB BAGCNxUIg8+JTa3yAoWhnwyC+sp9geH7dIFPg8LthQiOqdKFYwIBZAIBCjAnBgkr BgEEAYI3FQoEGjAYMAoGCCsGAQUFBwMCMAoGCCsGAQUFBwMBMIGuBgNVHREEgaYw gaOCDyoubWFpbC5saXZlLmNvbYINKi5ob3RtYWlsLmNvbYILaG90bWFpbC5jb22C D2hvdG1haWwubXNuLmNvbYINaG90bWFpbC5jby5qcIINaG90bWFpbC5jby51a4IQ aG90bWFpbC5saXZlLmNvbYITd3d3LmhvdG1haWwubXNuLmNvbYINbWFpbC5saXZl LmNvbYIPcGVvcGxlLmxpdmUuY29tMA0GCSqGSIb3DQEBBQUAA4IBAQC9trl32j6J ML00eewSJJ+Jtcg7oObEKiSWvKnwVSmBLCg0bMoSCTv5foF7Rz3WTYeSKR4G72c/ pJ9Tq28IgBLJwCGqUKB8RpzwlFOB8ybNuwtv3jn0YYMq8G+a6hkop1Lg45d0Mwg0 TnNICdNMaHx68Z5TK8i9QV6nkmEIIYQ32HlwVX4eSmEdxLX0LTFTaiyLO6kHEzJg CxW8RKsTBFRVDkZQ4CtxpvSV3OSJEEoHiJ++RiLZYY/1XRafwxqESMn+bGNM7aoE NHJz3Uzu2/rSFQ5v7pmTpJokNcHl8hY1fCFs01PYkoWm0WXKYnDHL4+L46orvsyE GM1PAYpIdTSp -----END CERTIFICATE-----
Compared to the size of an IP address, this is a bit much. There are three things that give me pause about pushing so much in.
First — and, yes, this is a personal bias — every time we try to put this much in DNS, we end up creating a new protocol in which we don’t. There’s a dedicated RRTYPE for certificate storage, called CERT. From what I can tell, the PGP community used CERT, and then migrated away. There’s also the experience we can see in the X.509 realm, where they had many places where certificate chains were declared inline. For various reasons, these in-protocol declarations were farmed out into URLs in which resources could be retrieved as necessary. Inlining became a scalability blocker.
Operational experience is important.
Second, there’s a desire to avoid DNS packets that are too big to fit into UDP frames. UDP, for User Datagram Protocol, is basically a thin application based wrapper around IP itself. There’s no reliability and few features around it — it’s little more than “here’s the app I want to talk to, and here’s a checksum for what I was trying to say” (and the latter is optional). When using UDP, there are two size points to be concerned with.
The first is (roughly) 512 bytes. This is the traditional maximum size of a DNS packet, because it’s the traditional minimum size of a packet an IP network can be guaranteed to transmit without fragmentation en route.
The second is (roughly) 1500 bytes. This is essentially how much you’re able to move over IP (and thus UDP) itself before your packet gets fragmented — generally immediately, because it can’t even get past the local Ethernet card.
There’s a strong desire to avoid IP fragmentation, as it has a host of deleterious effects. If any IP fragment is dropped, the entire packet is lost. So there’s “more swings at the bat” at having a fatal drop. In the modern world of firewalls, fragmentation isn’t supported well as you can’t know whether to pass a packet until the entire thing has been reassembled. I don’t think any sites flat out block fragmented traffic, but it’s certainly not something that’s optimized for.
Finally — and more importantly than anyone admits — fragmented traffic is something of a headache to debug. You have to have reassembly in your decoders, and that can be slow.
However, we’ve already sort of crossed the rubicon here. DNSSEC is many things, but small is not one of them. The increased size of DNSSEC responses is not by any stretch of the imagination fatal, but it does end the era of tiny traffic. This is made doubly true by the reality that, within the next five or so years, we really will need to migrate to keysizes greater than 1024. I’m not sure we can afford 2048, though NIST is fairly adamant about that. But 1024 is definitely the new 512.
That’s not to say that DNSSEC traffic will always fragment at UDP. It doesn’t. But we’ve accepted that DNS packets will get bigger with DNSSEC, and fairly extensive testing has shown that the world does not come to an end.
It is a reasonable thing to point out though , that while use of DNSSEC might lead to fragmentation, use of massive records in the multi-kilobyte range will, every time.
(There’s an amusing thing in DNSSEC which I’ll go so far as to say I’m not happy about. There’s actually a bit, called DO, that says whether the client wants signatures. Theoretically we could use this bit to only send DNSSEC responses to clients that are actually desiring signatures. But I think somebody got worried that architectures would be built to only serve DNSSEC to 0.01% of clients — gotta start somewhere. So now, 80% of DNS requests claim that they’ll validate DNSSEC. This is…annoying.)
Now, there’s of course a better way to handle large DNS records than IP fragmenting: TCP. But TCP has historically been quite slow, both in terms of round trip time (there’s a setup penalty) and in terms of kernel resources (you have to keep sockets open). But a funny thing happened on the way to billions of hits a day.
TCP stopped being so bad.
The setup penalty, it turns out, can be amortized across multiple queries. Did you know that you can run many queries off the same TCP DNS socket, pretty much exactly like HTTP pipelines? It’s true! And as for kernel resources…
TCP stacks are now fast. In fact — and, I swear, nobody was more surprised than me — when I built support for what I called “HTTP Virtual Channel” into Phreebird and LDNS, so that all DNS queries would be tunneled over HTTP, the performance impact wasn’t at all what I thought it would be:
DNS over HTTP is actually faster than DNS over UDP — by a factor of about 25%. Apparently, a decade of optimising web servers has had quite the effect. Of course, this requires a modern approach to TCP serving, but that’s the sort of thing libevent makes pretty straightforward to access. (UPDATE: TCP is reasonably fast, but it’s not UDP fast. See here for some fairly gory details.) And it’s not like DNSSEC isn’t necessitating pretty deep upgrades to our name server infrastructure anyway.
So it’s hard to argue we shouldn’t have large DNS records, just because it’ll make the DNS packets better. That ship has sailed. There is of course the firewalling argument — you can’t depend on TCP, because clients may not support the transport. I have to say, if your client doesn’t support TCP lookup, it’s just not going to be DNSSEC compliant.
Just part of compliance testing. Every DNSSEC validator should in fact be making sure TCP is an available endpoint.
There’s a third argument, and I think it deserves to be aired. Something like 99.9% of users are behind a DNS cache that groups their queries with other users. On average, something like 90% of all queries never actually leave their host networks; instead they are serviced by data already in the cache.
Would it be fair for one domain to be monopolizing that cache?
This isn’t a theoretical argument. More than a few major Top 25 sites have wildcard generated names, and use them. So, when you look in the DNS cache, you indeed see huge amounts of throwaway domains. Usually these domains have a low TTL (Time to Live), meaning they expire from cache quickly anyway. But there are grumbles.
My sense is that if cache monopolization became a really nasty problem, then it would represent an attack against a name server. In other words, if we want to say that it’s a bug for http://www.foo.com to populate the DNS cache to the exclusion of all others, then it’s an attack for http://www.badguy.com to be able to populate said cache to the same exclusion. It hasn’t been a problem, or an attack I think, because it’s probably the least effective denial of service imaginable.
I could see future name servers tracking the popularity of cache entries before determining what to drop. Hasn’t been needed yet, though.
Where this comes down to is — is it alright for DNS to require more resources? Like I wrote earlier, we’ve already decided it’s OK for it to. And frankly, no matter how much we shove into DNS, we’re never getting into the traffic levels that any other interesting service online is touching. Video traffic is…bewilderingly high.
So do we end up with a case for shoving massive records into the DNS, and justifying it because a) DNSSEC does it and b) They’re not that big, relative to say Internet Video? I have to admit, if you’re willing to “damn the torpedos” on IP Fragmentation or upgrading to TCP, the case is there. And ultimately, the joys of a delegated namespace is that — it’s your domain, you can put whatever you want in there.
My personal sense is that while IP fragmentation / TCP upgrading / cache consumption isn’t the end of the world, it’s certainly worth optimizing against. More importantly, operational experience that says “this plan doesn’t scale” shows up all over the place — and if there’s one thing we need to do more of, it’s to identify, respect, and learn from what’s failed before and what we did to fix it.
A lot of protocols end up passing URLs to larger resources, to be retrieved via HTTP, rather than embedding those resources inline. In my next post, we’ll talk about what it might look like to push HTTP URLs into DNS responses. And that, I think, will segue nicely into why I was writing a DNS over HTTP layer in the first place.