On The Flip Side
What was once possible via 32,769 packets is still possible via somewhere between 134,217,728 and 4,294,967,296 packets. Yep. We’ve been saying that for a while now. So has PowerDNS. So has DJBDNS. There’s nothing specific to BIND here, though I think most people understand that.
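Where do those figures come from? Here’s a back-of-the-envelope sketch — my arithmetic, not anyone’s official model. A fixed source port leaves only the 16-bit transaction ID to guess; randomizing the port multiplies that by however many ports the resolver can actually draw from, anywhere from roughly 2^11 to the full 2^16 depending on the implementation and any NAT in the way.

```python
# Rough arithmetic behind the packet counts above. The port-entropy
# range (11 to 16 bits) is my assumption about real-world resolvers,
# not a measured figure.

TXID_SPACE = 2 ** 16  # 65,536 possible DNS transaction IDs

for port_bits in (11, 16):
    combined = TXID_SPACE * (2 ** port_bits)
    print(f"{port_bits} port bits: {combined:,} TXID/port combinations")
```

Run that and the two endpoints fall out: 134,217,728 combinations at the low end, 4,294,967,296 at the high end.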
What’s going on here is a simple question: Which would you rather build secondary layers of defense against? Thousands of packets? Or billions of packets?
Look. We were looking at an attack before the patch that took ten seconds and was relatively invisible. Four billion packets are many things, but subtle is not one of them.
So there’s a reason you’re not hearing anyone saying, “don’t patch.” And there’s a reason we’ve been telling everyone this is just a stopgap — that we’re still in trouble with DNS, just less. But in business, we choose our risks every single day. That’s why it’s called risk management, not risk elimination.
And for the most part, people seem to get it. Even the story I was somewhat worried about when I’d heard about it — John Markoff’s piece in the New York Times — is a remarkably fair treatment of the issue. Back in March, we needed to come up with a solution to this problem that could viably protect as many people as possible in a short period of time. DNSSEC has been in progress for nine years. Asking people to deploy it over the course of a month would not have been a pragmatic approach.
DNSSEC may be the long-term fix. It certainly was not the short-term fix.
But can we do better than source port randomization? Possibly. Now come the wild arguments about what we should really do to fix this issue. That’s fine by me. That was the idea. But nobody should ever think that they have the One True Fix. I’ve been out here arguing against the 65536-to-1 lottery that fixed source port DNS is. That’s not to say I haven’t been analyzing all the other designs too — but I have to prioritize the things that are in the field, putting customers at risk. I think I’ve argued pretty persuasively that there’s no way 65536-to-1 can ever again offer a sufficient level of security. I don’t know if anyone disagrees with that.
I’m looking forward to better options than source port randomization, even if it means we’ll accept the occasional Gig-E local LAN desktop flood attack hitting our 131M-to-1 to 2B-to-1 mitigations.
It’s the call DJB made all those years ago. It was the right call to make.
Now, let’s talk about how entertaining a problem this is going to be to solve, to any degree past what DJB accomplished back in 1999. Note, I’m not saying any final solution won’t use elements of everything that’s about to follow. I’m just saying there are awesomely nasty attacks against everything, and people shouldn’t presume I or others won’t poke sucking chest wounds into seemingly elegant solutions.
First, the universal constraint on every solution is that it must cover the root servers and the TLDs, because they’re almost always a better target for poisoning: their position higher in the DNS hierarchy allows them to pollute any name below them. In other words, you can totally opt foo.com into whatever security system you like, but unless A.GTLD-SERVERS.NET (the server for com) is itself secure — and unless the root servers that tell you where A.GTLD-SERVERS.NET is are also included in the solution — there’s no effective security whatsoever. DNSSEC, minus the DLV hack, suffers this specific issue, and so does everything else. You either need to be backward compatible all the way up the hierarchy, a trait that port randomization and some other solutions have, or you need to push code to them.
It’s not an impossible proposition to get the root and TLD servers to modify their infrastructure. But that’s not the sort of thing you can make happen via a secret meeting at Microsoft 🙂 It’s a definite negative if they have to change anything.
One solution I’ve sort of liked, believe it or not, is the 0x20 proposal. Basically, this idea from (I think) David Dagon and Paul Vixie notices that DNS is case insensitive (lower case is equal to upper case) but case preserving (a response contains the same caps as a request). So, we can get extra bits of entropy by asking for wWw.DOXpaRA.cOM, and ignoring replies containing www.DOXPARA.com, WWW.doxpara.COM, etc. 0x20 has some notable corner cases, though — the shorter the domain, the less security can be guaranteed. This is particularly problematic if you’re attacking the roots or TLDs, especially considering the ability to have almost 100% numeric names, which leave almost no letters available to randomize.
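Here’s a tiny sketch of the 0x20 idea and why short or numeric names get so little out of it. This is my own illustration; real implementations derive the case pattern from a keyed hash rather than fresh randomness, so the resolver can later verify what the reply echoes back.

```python
import random

def encode_0x20(name: str) -> str:
    # Randomize the case of each ASCII letter in the query name.
    # (Sketch only: a real resolver derives this pattern from a keyed
    # hash so it can verify the case echoed back in the reply.)
    return "".join(
        c.upper() if c.isalpha() and random.getrandbits(1) else c.lower()
        for c in name
    )

def extra_entropy_bits(name: str) -> int:
    # One extra bit per letter; digits, dots, and hyphens give nothing.
    return sum(1 for c in name if c.isalpha())

print(encode_0x20("www.doxpara.com"))         # e.g. "wWw.DOXpaRA.cOM"
print(extra_entropy_bits("www.doxpara.com"))  # 13 extra bits
print(extra_entropy_bits("a1.com"))           # only 4 for a short name
```

The numbers make the corner case concrete: www.doxpara.com picks up thirteen extra bits, while a short or mostly-numeric name offers almost none.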
Another path that’s suggested is to double query — “debounce”, as one engineer put it. Debouncing is similar to the “run all DNS traffic over TCP” approach — it seems good, up until the moment you realize you’d kill DNS dead. There’s a certain amount of spare capacity in the DNS system — but it is finite, and it is stressed. There is absolutely not enough to handle a 100% increase in traffic over the course of a month.
Now raised is the possibility of “attack modes” — large scale state transitions during periods where the server recognizes (it’s not exactly subtle) that it’s under attack. These have a lot of potential, except for the reality that they create something of a super amplification attack: An attacker invests a small number of packets to push every name server on the net into attack mode, and the DNS infrastructure implodes.
Those wouldn’t be very good headlines.
That’s not to say there aren’t more targeted mitigations — this is the sort of work Nominum has been doing with their server, to attempt to prevent individual targeted names from being poisoned. The problem here is that as soon as the attack mode isn’t global, it becomes interesting for the attacker to repeatedly migrate out of the small range that’s in “lockdown” and into a new target range.
And there are so many ranges. There’s dozens of variations on the attacks I’m presenting here. For just one example, I may have shown off how to attack www.foo.com via 83.foo.com, but that doesn’t mean it’s not useful to just attack 83.foo.com. The web security model, to varying degrees, trusts arbitrary subdomains as elements of their parents. This was ultimately how I was able to rickroll the Internet at Toorcon.
All the rate limiting approaches have issues with attacks outside the limited range — with the added bonus that somebody’s not getting a reply, a nasty trait that makes attacks on downstream customers of data much more feasible.
Eventually, people realize we could use a better source of entropy — perhaps a prefix on each query name (XQID) or an extra RR (resource record) containing a cookie. Now we need cooperation from the authoritative side of the DNS house. This is tricky, precisely because while it’s one thing to be proud of “+50% of the net has patched”, it’s quite another to say “well you can reach half the domains out there…” The solutions I’ve seen do all have a story for backwards compatibility, storing/caching whether a given name server does or doesn’t support their particular variant.
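As a concrete illustration of the query-name-prefix flavor of this idea — the format below is my own invention for sketch purposes, not the actual XQID wire format — the resolver prepends a random label to the question, and a forged reply now has to guess that label on top of the TXID and the source port.

```python
import secrets

def make_query_name(name: str, bits: int = 64) -> tuple[str, str]:
    # Prepend a random hex label carrying `bits` of extra entropy.
    nonce = secrets.token_hex(bits // 8)  # 64 bits -> 16 hex chars
    return nonce, f"{nonce}.{name}"

def reply_is_valid(nonce: str, echoed_name: str) -> bool:
    # Accept only replies that echo the nonce label back.
    return echoed_name.split(".", 1)[0] == nonce

nonce, qname = make_query_name("www.doxpara.com")
print(reply_is_valid(nonce, qname))  # the genuine reply passes
```

Of course, this only buys anything once the authoritative side actually echoes the label back — which is exactly the cooperation problem just described.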
But again, if the root servers, or the com servers, are not signed up for this system, there’s no incremental benefit to it: The attacker can prevent your nice and secure name server from ever being used by other resolvers in the first place.
And so, we end up at cryptography. DNSSEC is one approach. I don’t think I need to go into the pragmatic, political issues that have dogged it. From an engineering standpoint though, if we don’t have the headroom for TCP, do we really have the headroom for any cryptography? Maybe. DNSSEC is not the only possible trick either. Link-based crypto, either via DTLS (keyed via the existing PKI, using the NS name as the Subject) or some TKEY/TSIG dance, could also work.
So, there’s lots of options. Lots, and lots, and lots of options. But, throughout this entire process of analysis, one thing was very clear:
DJB was right. Almost every attack we find is strongly mitigated by source port randomization. Mitigated, not eliminated, but mitigated just the same. He may not have known how exactly to break BIND or MSDNS in 10 seconds in the real world — frankly, if he did, he’d have told us. But he knew there had to be a way, as Hans Dobbertin knew in 1996 that eventually somebody was going to break MD5. When Wang finally came out with her MD5 collisions in 2004, it wasn’t a surprise — MD5 had been federally decertified for years. But it was still a pretty big deal, since certifications aside, MD5 is everywhere.
Fixed source port DNS was everywhere. Less so now. I’m indescribably amazed and honored by that. That’s a lot of hours by a lot of IT guys we’re looking at here. I’m sure there are some pretty happy pizza shops right now. But let’s be clear — there are bad guys in the field, and they are using this attack in interesting ways. People who are patched are much, much safer than people who are not.
Finally, as important a question as “how should DNS really be fixed” is, I think the real question of the day is “why does DNS matter so much?”. From Halvar Flake’s first post — “What, doesn’t everyone assume their gateway is owned, and thus use SSL/SSH?” — the underlying instability is the continuing assumption that there is a difference between networks that are hostile and networks that are safe. DNS is a great way to exploit that delusion — especially behind firewalls — but SNMPv3 and BGP both enable all the attacks I’ve found here. Even if we go from 32 bits of entropy to 128 bits — even if we deploy DNSSec — we’re still going to deliver email insecurely. We’re still going to have an almost entirely unauthenticated web. We’re still going to ignore SSL certificate errors, and we’re still going to have application after application that can’t autoupdate securely.
That, at the end of the day, is a far larger problem than this particular DNS issue.