This year’s Black Hat and Defcon slides!
Man, it’s nice to be playing with packets again!
People seem to be rather excited (Forbes, Dark Reading, Search Security) about the Neutrality Router I’ve been working on. It’s called N00ter, and in a nutshell, it normalizes your link such that any differences in performance can’t be coming from different servers taking different routes, and have to instead be caused at the ISP. Here’s a summary of what I posted to Slashdot, explaining more succinctly what N00ter is up to.
Say Google is 50ms slower than Bing. Is this because of the ISP, or the routers and myriad server and path differentials between the ISP and Google, vs. the ISP and Bing? Can’t tell, it’s all conflated. We have to normalize the connection between the two sites, to measure if the ISP is using policy to alter QoS. Here’s how we do this with n00ter.
Start with a VPN, that creates an encrypted link from a Client to a broker/concentrator. An IP at the Broker talks plaintext with Google and Bing, who replies to the Broker. The Broker now encrypts the traffic back to the Client.
Policy can’t differentiate Bing traffic from Google traffic, it’s all encrypted.
Now, lets change things up — let’s have the Broker push the response traffic from Google and Bing, completely in the open. In fact, lets have it go so far as to spoof traffic from the original sources, making it look like there isn’t even a Broker in place. There’s just nice clean streams from Google and Bing.
If traffic from the same host, being sent over the same network path, but looking like Google, arrives faster (or slower) than traffic that looks like it came from Bing, then there’s policy differentiating Google from Bing.
Now, what if the policy is only applied to full flows, and not half flows? Well, in this case, we have one session that’s a straight normal download from Bing. Then we have another, where the entire client->server path is tunneled as before, but the Broker immediately emits the tunneled packets to Bing *spoofing the Client’s IP address*. So basically we’re now comparing the speed of a full legitimate flow to Bing, with a half flow. If QoS differs — as it would, if policy is only applied to full flows, then once again the policy is detected.
I call this client->server spoofing mode Roto-N00ter.
There’s more tricks, but this is what N00ter’s up to in a nutshell. It should work for anything IP based — if you want to know if XBox360 traffic routes faster than PS3 traffic, this’ll tell you.
Also, I’ve been doing some interesting things with BitCoin. (Len, we’ll miss you.) A few weeks ago, I gave a talk at Toorcon Seattle on the subject. Here are those slides as well.
Where’s the code? Well, two things are slowing down Paketto Keiretsu 3.0 (do people even remember scanrand?). First, I could use a release manager. I swear, packing stuff up for release is actually harder in many ways than writing the code! I do admit to know TCP rather better than Autoconf.
Secondly, and I know this is going to sound strange — I’m really not out to bust anyone with N00ter. Yes, I know it’s inevitable. But if a noxious filter is quietly removed with the knowledge that it’s presence is going to be provable in a court of law, well, all’s well that ends well, right?
So, give me a week or two. I have to get back from Germany anyway (the Black Ops talk will indeed be presented in a nuke hardened air bunker, overlooking MiG’s on the front lawn. LOL.)
Some of the remedies are problematic. A group of Internet safety experts cautioned that the procedure to redirect Internet traffic from offending Web sites would mimic what hackers do when they take over a domain. If it occurred on a large enough scale it could impair efforts to enhance the safety of the domain name system.
This kind of blocking is unlikely to be very effective. Users could reach offending Web sites simply by writing the numerical I.P. address in the navigator box, rather than the URL. The Web sites could distribute free plug-ins to translate addresses into numbers automatically.
The bill before the Senate is an important step toward making piracy less profitable. But it shouldn’t pass as is. If protecting intellectual property is important, so is protecting the Internet from overzealous enforcement.
TL;DR: Authentication failures are getting us owned. Our standard technology for auth, passwords, fail repeatedly — but their raw simplicity compared to competing solutions drives their continued use. SecurID is the most successful post-password technology, with over 40 million deployed devices. It achieved its success by emulating passwords as closely as possible. This involved generating key material at RSA’s factory, a necessary step that nonetheless created the circumstances that allowed a third party compromise at RSA to affect customers like Lockheed Martin. There are three attack vectors that these hackers are now able to leverage — phishing attacks, compromised nodes, and physical monitoring of traveling workers. SecurID users however are at no greater risk of sharing passwords, a significant source of compromise. I recommend replacing devices in an orderly fashion, possibly while increasing the rotation rate of PINs. I dismiss concerns about source compromise on the grounds that both hardware and software are readily reversed, and anyway we didn’t change operational behavior when Windows or IOS source leaked. RSA’s communications leave a lot to be desired, and even though third party security dependencies are tolerable, it’s unclear (even with claims of a “shiny new HSM”) whether this particular third party security dependency should be tolerated.
Table Of Contents:
There’s an old story, that when famed bank robber Willie Sutton was asked why he robbed banks, he replied: “Because that’s where the money is.” Why would hackers hit RSA Security’s SecurID authenticators?
Because that’s where the money is.
The failure of authentication technologies is one of the three great issues driving the cyber security crisis (the other two being broken code and invulnerable attackers). The data isn’t subtle: According to Verizon Business’ Data Breach Investigative Report, over half of all compromised records were lost to attacks in which a credential breach occurred. Realistically, that means passwords failed us in some way yet again.
By almost any metric, passwords are a failed authentication technology. But there’s one metric in which they win.
Passwords are really simple to implement — and through their simplicity, they’re incredibly robust to real world circumstances. Those silly little strings of text can be entered by users on any device, they can be tunneled inside practically any protocol across effectively any organizational boundary, and even a barely conscious programmer can compare string equality from scratch with no API calls required.
Password handling can be more complex. But they don’t have to be. The same can’t be said about almost any of the technologies that were pegged to replace passwords, like X.509 certificates or biometrics.
It is a hard lesson, but ops — operations — is really the decider of what technologies live or die, because they’re the guys who put their jobs on the line with every deployment. Ops loves simplicity. While “secure in the absence of an attacker” is a joke, “functional in the presence of a legitimate user” is serious business. If you can’t work when you must, you won’t last long enough to not work when you must not.
So passwords, with all their failures, persist in all but the most extreme scenarios. There’s however been one technology that’s been able to get some real world wins against password deployments: RSA SecurIDs. To really understand their recent fall, it’s important to understand the terms of their success. I will attempt to explain that here.
Over forty million RSA SecurID tokens have been sold. By a wide margin, they are the most successful post-password technology of all time. What are they?
Every thirty seconds, RSA SecurID tokens generate a fresh new code for the user to identify himself with. They do require a user to maintain possession of the item, much as he might retain a post-it note with a hard to remember code. And they do require the server to eventually query a backend server for the correctness of the password, much as it might validate against a centralized password repository.
Everything between the user and the server, however, stays the same. The same keys are used to enter the same character sets into the same devices, which use the same forms and communication protocols to move a simple string from client A to server B.
Ops loves things that don’t change.
It gets better, actually. Before, passwords needed to be cycled every so often. Now, because these devices are “self-cycling”, the rigamarole of managing password rotation across a large number of users can be dispensed with (or kept, if desired, by rotating PIN numbers).
Ops loves the opportunity to do less work.
When things get interesting is when you ask: How do these precious new devices get configured? Passwords are fairly straightforward to provision — give the user something temporary and hideous, then pretend they’ll generate something similar under goofy “l33tspeak” constraints. Providing the user with a physical object is in fact slightly more complex, but not unmanageably so. It’s not like new users don’t need to be provisioned with other physical objects, like computers and badges. No, the question is how much work needs to go into configuring the device for a user.
There are no buttons on a SecurID token. That doesn’t just mean less things for a user to screw up and call help desk over, annoying Ops. It also means less work for Ops to do. Tapping configuration data into a hundred stupid little tokens, or worse, trying to explain to a user how to tap a key in, is expensive at best and miserable at worst. So here’s what RSA hit on:
Just put a random key, called a seed, on each token at the factory.
Have the device combine the seed, with time, to create the randomized token output.
Link each seed to a serial number.
Print the serial number on the device.
Send Ops the seeds for the serial numbers in their possession.
Now, all Ops needs to know is that a particular user has the device with a particular serial number, and they can tie the account to the token in possession. It’s as low configuration as it comes.
There are three interesting quirks of this deployment model:
First, it’s not perfect. SecurID tokens are still significantly more complicated to deploy than passwords, even independent of the cost. That they’re the most successful post-password technology does not make them a universal success, by any stretch of the imagination.
Second, the resale value of the SecurID system is impacted — anyone who owned your system previously, owns your systems today because they have the seeds to your authenticators and you can’t change them. This is not a bug (if you’re RSA), it’s a feature.
Third, and more interestingly, users of SecurID tokens don’t actually generate their own credentials. A major portion, the seed for each token, comes from RSA itself. What’s more, these seeds — or a “Master Key” able to regenerate them — were stored at RSA. It is clear that the seeds were lost to hackers unknown.
It seems obvious that storing the seeds to customer devices was a bug. I argue it was a feature as well.
I fly, a lot. Maybe too much. For me, the mark of a good airline is not how it functions when things go right. It’s what they do when thy don’t.
There’s far more room for variation when things go wrong.
Servers die. It happens. There’s not always a backup. That happens too. The loss of an authentication services is about as significant an event as can be, because (by definition) an outage is created across the entire company (or at least across entire userbases).
Is it rougher when a SecurID server dies, than when a password server dies?
Yes, I argue that it is. For while users have within their own memories backups of their passwords, they have no knowledge of the seeds contained within their tokens. They’ve got a token, it’s got a serial number, they’ve got work to do. The clock is ticking. But while replacing passwords is a matter of communication, replacing tokens is a matter of shipping physical items — in quantities large enough that nobody would have them simply lying around.
Ops hates anything that threatens a multi-day outage. That sort of stuff gets you fired.
And so RSA retained the ability to regenerate seed files for all their customers. Perhaps they did this by storing truly randomly generated secrets. Most likely though they combined a master secret with a serial number, creating a unique value that could nonetheless be restored later.
The ability to recover from a failure has been a thorn in the side of many security systems, particularly Full Disk Encryption products. It’s much easier to have no capacity to escrow keys, or to survive bad blocks. But then nobody deploys your code.
Over the many years that SecurID has been a going concern, these sorts of recoveries were undoubtedly far more common than deep compromises of RSA’s deepest secrets.
Things change. RSA finally got broken into, because that’s where the money was. And we’re seeing the consequences of the break being admitted at Lockheed Martin, who recently cycled the credentials of every user in their entire organization.
That ain’t cheap.
It’s important to understand how we got here. There have been some fairly serious accusations of incompetence, and I think they reflect a simple lack of understanding regarding why the SecurID system was built as it was. But where do we go from here? That requires actionable intelligence, which I define as knowing:
What can an attacker do today, that he couldn’t do yesterday, for what class attacker, to what class customer?
The answer seems to be:
An attacker with a username, a PIN, and at least one output from a SecurID token (but possibly a few more), can extend their limited access window indefinitely through the generation of arbitrary token outputs. The attacker must be linked to the hackers who stole the seed material from RSA. The customer must have deployed RSA SecurID.
This extends into three attack scenarios:
1) Phishing. Attackers have learned they can impersonate sites users are willing to log into, and can sometimes get them to provide credentials willingly. (Google recently accused a Chinese group of doing just this to a number of high profile targets in the US.) Before, the loss of a single SecurID login would only allow a one time access to the phished user’s network. Now the attacker can impersonate the user indefinitely.
2) Keylogging. It does happen that attackers acquire remote access to a user’s machine, and are able to monitor its resources and learn any entered passwords. Before, for the attacker to leverage the user’s access to a trusted internal network, he’d have to wait for the legitimate user to log in, and then run “along side” him either from the same machine or from a foreign one. This could increase the likelihood of discovery. Now the attacker can choose to run an attack at the time, and from the machine of his choosing.
3) Physical monitoring. Historically, information security professionals have mostly ignored the class of attacks known as TEMPEST, in which electromagnetic emissions from computing devices leak valuable information. But in the years since TEMPEST attacks were discovered other classes of attacks against keyboards — as complicated as lasers against the back of a screen, and as simple as telephoto lenses, have proven effective. Before, an attacker who managed to pick off a single log in had to use their findings within minutes in order for them to be useful. Now, the attacker can break in at their leisure. Note that there’s also some risk from attackers who merely pick up the serial number from the back of a token, if those attacker can learn or guess usernames and PINs.
There is one attack scenario that is not impacted: Many compromises are linked to shared credentials, i.e. one user giving their password to another willingly, or posting their credentials to a shared server. SecurID does still prevent password sharing of this type, since users aren’t in the class of attacker that has the seeds.
What to do in defense? My personal recommendation would be to implement an orderly replacement of tokens, unless there was some evidence of active attack in which case I’d say to replace tokens aggressively. In the meantime, rotating PINs on an accelerated basis — and looking for attempted logins that try to recycle old PINs — might be a decent short term mitigation.
But is that all that went wrong?
The RSA “situation” has been going on for a couple months now, with no shortage of rumors swirling about what was lost, and no real guidance from RSA on the risk to their customers (at least none outside of NDA). Two other possible scenarios should be discussed.
The first is that it’s possible RSA lost their source code. Nobody cares. We didn’t stop using Windows or IOS when their source code leaked, and we wouldn’t stop using SecurID now even if they went Open Source tomorrow. The reality is that any truly interesting hardware has long since been dunked in sulfuric acid and analyzed from photographs, and any such software has been torn apart in IDA Pro. The fact that attackers had to break in to get the seed(s) reflects the fact that there wasn’t any magic backdoor to be found in what was already released.
The second scenario, brought up by the brilliant Mike Ossman, is that perhaps RSA employees had some crazy theoretical attack they’d discussed among themselves, but had never gone public with. Maybe this attack only applied to older SecurIDs, which had not been replaced because nobody outside had learned of the internal discovery. Under the unbound possibilities of this attack, who knows, maybe that occurred. It’s viable enough to cite in passing here, but there’s no evidence for it one way or another.
But whose fault is that?
It’s impossible to discuss the RSA situation without calling them out for the lack of actionable intelligence provided by the company itself. Effectively, the company went into full panic mode, then delivered guidance that amounted to tying one’s shoelaces.
That was a surprise. Ops hates surprises, especially when they concern whether critical operational components can be trusted or not.
It was doubly bizarre, given that by all signs half the security community guessed the precise impact of the attack within days of the original announcement. With the exception of the need to support recovery operations, most of the above analysis has been discussed ad nauseum.
But confirmations, and hard details, have only been delivered by RSA under NDA.
Why? A moderately cynical bit of speculation might be that RSA could only replace so many tokens at once, and they were afraid they’d simply get swamped if everybody decided their second factor authenticator had been fully broken. Only now, after several months, are they confident they can handle any and all replacement requests. In the wake of the Lockheed Martin incident, they’ve indeed publicly announced they will indeed effect replacements upon requests.
Whatever the cause, there are other tokens besides RSA SecurID, and I expect their sales teams to exploit this situation extensively. (A filing to the SEC that there would be no financial impact from this hack seemed…premature…given the lack of precedent either regarding the scale of attack or the behavior towards the existing customer base.)
Realistically, customers sticking with RSA should probably seek contractual terms that prevents a repeat of this scenario. That means two things. First, actual customers should obligate RSA to tell them of their exposure to events that may affect them. This is actually quite normal — shows up in HIPAA arrangements all the time, for example.
Second, customers should discover the precise nature of their continued exposure to seed loss at RSA HQ, even after their replacement tokens arrive.
ARE THINGS BETTER?
Jeff Moss recently asked the following question: “When I heard RSA had a shiney new half million dollar HSM to store seed files I wondered where had they been stored before?”
HSM’s are rare, so likely on a stock server that got owned.
I think the more interesting question is, what exactly does the HSM do, that is different than what was done before? HSMs generally use asymmetric cryptography to sign or decrypt information. From my understanding, there’s no asymmetric crypto in any SecurID token — it’s all symmetric crypto and hash functions. The best case scenario, then, would be the HSM receiving a serial number, internally combining it with a “master seed”, then returning the token seed to the factory.
That’s cool, but it doesn’t exactly stop a long term attack. Is there rate limiting? Is there alarming on recovery of older serial numbers? Is this in fact a custom HSM backend? If they are, is the new code in fact running inside the physically protected core?
It is also possible the HSM stores a private key that is able to decrypt randomly generated keys, encrypted to a long term public key. In this case, the HSM could be kept offline, only to be brought up in the occasional case of a necessary recovery.
We don’t really know, and ultimately we should.
See, here’s the thing. Some people are angry that vulnerabilities in a third party can affect their organization. The reality is that this is a ship that has long since sailed. Even if you ignore all the supply chain threats in the underlying hardware within an organizaton, there’s an entire world of software that’s constantly being updated, and automatically shipped to high percentages of clients and servers. There is no question that the future of software is, and must be, dynamic updates — if for no other reason that static defenses are hopelessly vulnerable to dynamic threats. The reality is that more and more of these updates are being left in the hands of the original vendors.
Compromise of these third party update chains is actually much more exploitable than even the RSA hit. At least that required partial credentials from the users to be exploited.
But the fact remains, while these other third parties might get targeted and might have their implicit trust exploited, RSA did. The attackers have not, and realistically will not get caught. They know RSA’s network. Can they come back and do it again?
It is one thing to choose to accept the third party risk from RSA’s authenticators. It’s another to have no choice. It’s not really clear whether RSA’s “shiny new HSMs” make the new SecurIDs more secure than the old ones. RSA has smart people, and it’s likely there’s a concrete improvement. But we need a little more than likeliness here.
In the presence of the HSM, the question should be asked: What are attackers unable to do today, that they could do yesterday, for the attackers who broke into RSA before, to RSA now?
The Los Angeles Times has come out in support of our position on recursive DNS filtering!
The main problem with the bill is in its effort to render sites invisible as well as unprofitable. Once a court determines that a site is dedicated to infringing, the measure would require the companies that operate domain-name servers to steer Internet users away from it. This misdirection, however, wouldn’t stop people from going to the site, because it would still be accessible via its underlying numerical address or through overseas domain-name servers.
A group of leading Internet engineers has warned that the bill’s attempt to hide piracy-oriented sites could hurt some legitimate sites because of the way domain names can be shared or have unpredictable mutual dependencies. And by encouraging Web consumers to use foreign or underground servers, the measure could undermine efforts to create a more reliable and fraud-resistant domain-name system. These risks argue for Congress to take a more measured approach to the problem of overseas rogue sites.
The DNS works. It creates the shared namespace that allows applications to interoperate across LANs, organizations, and even countries. With the advent of DNSSEC, it’s our best opportunity to finally address the authentication flaws that are implicated in over half of all data breaches.
There are efforts afoot to manipulate the DNS on a remarkably large scale. The American PROTECT IP act contains several reasonable and well targeted remedies to copyright infringement. One of these remedies, however, is to leverage the millions of recursive DNS servers that act as accelerators for Internet traffic, and convert them into censors for domain names in an effort to block content.
Filtering DNS traffic will not work, and in its failure, will harm both the security and stability of the Internet at large.
But don’t take just my word for it. I’ve been fairly quiet about PROTECT IP since when it was referred to as COICA, working behind the scenes to address the legislation. A common request has been a detailed white paper, deeply elucidating the technical consequences of Recursive DNS Filtering.
Today, I — along with Steve Crocker, author of the first Internet standard ever, Paul Vixie, creator of many of the DNS standards and long time maintainer of the most popular name server in the world, David Dagon, co-founder of the Botnet fighting firm Damballa, and Danny MacPherson, Chief Security Officer of Verisign — release Security and Other Technical Concerns Raised by the DNS Filtering Requirements in the PROTECT IP Bill.
Thanks go to many, but particularly to the staffers who have been genuinely and consistently curious — on both sides of the aisle — about the purely technical impact of the Recursive DNS Filtering provisions. A stable and secure Internet is not a partisan issue, and it’s gratifying to see this firsthand.
TL,DR: Identifying the responsible party for a bug is hard. VUPEN was able to achieve code execution in Google’s Chrome browser, via Flash and through a special but limited sandbox created for the Adobe code. Chrome ships Flash with Chrome and activates it by default, so there’s no arguing that this a break in Chrome. However, it would represent an incredibly naive metric to say including Flash in Chrome has made it less secure. An enormous portion of the user population installs Flash, alongside Acrobat and Java, and by taking on additional responsibility for their security (in unique and different ways), Google is unambiguously making users safer. In the long run, sandboxes have some fairly severe issues, among which is the penchant for being very slow to repair once broken, if they can even be fixed at all. But in the short run, it’s hard to deny they’ve had an effect, exacerbating the “Hacker Latency” that quietly permeates computer security. Ultimately though, the dam has broken, and we can pretty much expect Flash bearing attackers to drill into Chrome at the next PWN2OWN. Finally, I note that the sort of disclosure executed by VUPEN is far, far too open to both fraudulent abuse and simple error, the latter of which appears to have occurred. We have disclosure for a reason. And then I clear my throat.
[Not citing in this document, as people are not speaking for their companies and I don't want to cause trouble there. Please mail me if I have permission to edit your cites into this post. I'm being a little overly cautious here. This post may mutate a bit.]
VUPEN vs. Google. What a fantastic example of the danger of bad metrics.
In software development, we have an interesting game: “Who’s bug is it, anyway?” Suppose there’s some troublesome interaction between a client and a server. Arguments along the lines of “You shouldn’t sent that message, Client!” vs. “You should be able to handle this message, Server!” are legendary. Security didn’t invent this game, it just raised the stakes (and, perhaps, provides guidance above and beyond ‘what’s cheapest’).
In VUPEN vs. Google, we have one of the more epic “Who’s bug is it, anyway?” events in recent memory.
VUPEN recently announced the achievement of code execution in Google Chrome 11. This has been something of an open challenge — code execution isn’t exactly rare in web browsers, but Chrome has managed to survive despite flaws being found in its underlying HTML5 renderer (WebKit). This has been attributed to Chrome’s sandbox: A protected and restricted environment in which malicous code is assumed to be powerless to access sensitive resources. Google has been quite proud of the Chrome sandbox, and indeed, for the last few PWN2OWN contests run by Tipping Point at CanSecWest, Chrome’s come out without a scratch.
Of course, things are not quite so simple. VUPEN didn’t actually break the Chrome sandbox. They bypassed it entirely. Flash, due to complexity issues, runs outside the primary Chrome sandbox. Now, there is a second sandbox, built just for Flash, but it’s much weaker. VUPEN apparently broke this secondary sandbox. Some have argued that this certainly doesn’t represent a break of the Google sandbox, and is really just another Flash vuln.
Is this Adobe’s bug? Is this Google’s bug? Is VUPEN right? Is Google right?
VUPEN broke Chrome. You really can’t take that from them. Install Chrome, browse to a VUPEN-controlled page, arbitrary code can execute. Sure, the flaw was in Flash. But Chrome ships Flash, as it ships WebKit, SQLite, WebM, or any of a dozen other third party dependencies. Being responsible for your imported libraries is fairly normal — when I found a handful of code execution flaws in Firefox, deriving from their use of Ogg code from xiph.org, the Firefox developers didn’t blink. They shipped it, they exposed it on a zero-click drive-by surface, they assumed responsibility for the quality of that code. When the flaws were discovered, they shipped fixes.
That’s what responsibility means — responding. It doesn’t mean there will be nothing to respond to.
Google is taking on some interesting responsibilities with its embedding of Flash.
Browsers have traditionally not addressed the security of their plugins. When there’s a flaw in Java, we look to Oracle to fix it. When there’s an issue with Acrobat, we look to Adobe to fix it. So it is with Adobe’s Flash. This has come from the fact that:
A) The browsers don’t actually ship the code; rather, users install it after the fact
B) The browsers don’t have the source code to the plugins anyway, so even if they wanted to write a fix, they couldn’t
From the fact that plugins are installed after the fact, one can assign responsibility for security to the user themselves. After all, if somebody’s going to mod their car with an unsafe fuel injection system, it’s not the original manufacturer’s fault if the vehicle goes u in flames. And from the fact that the browser manufacturers are doing nothing but exposing a pluggable interface, one can cleanly assign blame to plugin makers.
That’s all well and good. How’s that working out for us?
For various reasons, not well. Not well enough, anyway. Dino Dai Zovi commented a while back that we needed to be aware of the low hanging fruit that attackers were actually using to break into networks. Java, Acrobat, and Flash remain major attack vectors, being highly complex and incredibly widely deployed. Sure, you could just blame the user some more — but you know what, Jane in HR needs to read that resume. Deal with it. What we’re seeing Google doing is taking steps to assume responsibility over not simply the code they ship, but also the code they expose via their plugin APIs.
That’s something to be commended. They’re — in some cases, quite dramatically — increasing the count of lines of code they’re pushing, not to protect users from code they wrote, but to protect users. Period.
That’s a good thing. That’s bottom line security defeating nice sounding excuses.
The strategies in use here are quite diverse, and frankly unprecedented. For Java, Chrome pivots its user experience around whether your version of Java is fully up to date. If it isn’t, Java stops being zero-click, a dialog drops down, and the user is directed to Oracle to upgrade the plugin. For PDFs, Google just shrugged its shoulders and wrote a new PDF viewer, one constrained enough to operate within their primary sandbox even. It’s incomplete, and will prompt if it happens to see a PDF complex enough to require Acrobat Reader, but look:
1) Users get to view most PDFs without clicking through any dialogs, via Google PDF viewer
2) In those circumstances where Google PDF viewer isn’t enough, users can click through to Acrobat.
3) Zero-click drive-by attacks via Acrobat are eliminated from Chrome
This is how you go from ineffectual complaining about the bulk of Acrobat and the “stupidity” of users installing it, to actually reducing the attack surface and making a difference on defense.
For various reasons, Google couldn’t just rewrite Flash. But they could take it in house, ship it in Chrome, and at minimum take responsibility for keeping the plugin. (Oracle. Adobe. I know you guys. I like you guys. You both need to file bugs against your plugin update teams entitled ‘You have a user interface’. Seriously. Fix or don’t fix, just be quiet about it.)
Well, that’s what Chrome does. It shuts up and patches. And the bottom line is that we know some huge portion of attacks wouldn’t work, if patches were applied. That’s a lot of work. That’s a lot of pain. Chrome is taking responsibility for user behavior; taking these Adobe bugs upon themselves as a responsibility to cover.
Naive metrics would punish this. Naive observers have wrung their hands, saying Google shouldn’t ship any code they can’t guarantee is good. Guess what, nobody can ship code they guarantee is good, and with the enormous numbers of Flash deployments out there, the choice is not between Flash and No Flash, the choice is between New Flash and Old Flash.
A smart metric has to measure not just how many bugs Google had to fix, but how many bugs the user would be exposed to had Google not acted. Otherwise you just motivate people to find excuses why bugs are someone else’s fault, like that great Who’s Bug Is It Anyway on IE vs. Firefox Protocol Handlers.
(Sort of funny, actually, there’s been criticism around people only having Flash so they can access YouTube. Even if that’s true, somebody just spent $100M on an open video codec and gave it away for free. Ahem.)
Of course, it’s not only for updates that Chrome embedding Flash is interesting. There are simply certain performance, reliability, and security (as in memory corruption) improvements you can’t build talking to someone over an API. Sometimes you just need access to the raw code, so you can make changes on both sides of an interface (and retain veto over bad changes).
Also, it gets much more realistic to build a sandbox.
I suppose we need to talk about sandboxes.
It’s somewhat known that I’m not a big fan of sandboxes. Sandboxes are environments designed to prevent an attacker with arbitrary code execution from accessing any interesting resources. I once had a discussion with someone on their reliability:
“How many sandboxes have you tested?”
“How many have you not broken?”
“Do you think this might represent hard data?”
“You’re being defeatist!”
It’s not that sandboxes fail. Lots of code fails — we fix it and move on with our life. It’s that sandboxes bake security all the way into design, which is remarkably effective until the moment the design has an issue. Nothing takes longer to fix than design bugs. No other sorts of bugs threaten to be unfixable. Systrace was broken in 2007, and has never been fixed. (Sorry, Niels.) That just doesn’t happen with integer overflows, or dangling pointers, or use-after-frees, or exception handler issues. But one design bug in a sandbox, in which the time between a buffer being checked and a buffer being used is abused to allow filter bypass, and suddenly the response from responsible developers becomes “we do not recommend using sysjail (or any systrace(4) tools, including systrace(1)”.
Not saying systrace is unfixable. Some very smart people are working very hard to try, and maybe they’ll prove me wrong someday. But you sure don’t see developers so publicly and stridently say “We can’t fix this” often. Or, well, ever. Design bugs are such a mess.
VUPEN did not break the primary Chrome sandbox. They did break the secondary one, though. Nobody’s saying Flash is going into the primary sandbox anytime soon, or that the secondary sandbox is going to be upgraded to the point VUPEN won’t be able to break through. We do expect to see the Flash bug fixed fairly quickly but that’s about it.
The other problem with sandboxes, structurally, is one of attack complexity. People assume finding two bugs is as hard as finding one. The problem is that I only need to find one sandbox break, which I can then leverage again and again against different primary attack surfaces. Put in mathematical terms, additional bug hurdles are adders, not multipliers. The other problem is that for all but the most trivial sandboxes (a good example recently pointed out to me — the one for vsftpd) the degrees of freedom available to an attacker are so much greater post-exploitation, than pre, that for some it would be like jumping the molehill after scaling the mountain.
There are just so many degrees of freedom to secure, and so many dependencies (not least of which, performance) on those degrees of freedom remaining. Ugh.
Funny thing, though. Sandboxes may fail in the long term. But they do also, well, work. As of this writing, generic bypasses for the primary Chrome sandbox have not been publicly disclosed. Chrome did in fact survive the last few Tipping Point PWN2OWNs. What’s going on?
Sure, we (the public research community, or the public community that monitors black hat activity) find bugs. But we don’t find them immediately. Ever looked at how long bugs sit around before they’re finally found? Years are the norm. Decades aren’t unheard of. Of course, the higher profile the target, the quicker people get around to poking at it. Alas, the rub — takes a few years for something to get high profile, usually. We don’t find the bugs when things are originally written — when they’re cheapest to fix. We don’t find them during iterative development, during initial customer release, as they’re ramped up in the marketplace. No, the way things are structured, it has to get so big, so expensive…then it’s obvious.
Hacker Latency. It’s a problem.
Sandboxes are sort of a double whammy when it comes to high hacker latency — they’re only recently widely deployed, and they’re quite different than defenses that came before. We’re only just now seeing tools to go after them. People are learning.
But eventually the dam breaks, as it has with VUPEN’s attack on Chrome. It’s now understood that there’s a partial code execution environment that lives outside the primary Chrome sandbox, and that attacks against it don’t have to contend with the primary sandbox. We can pretty much expect Flash attacks on Chrome at the next PWN2OWN.
One must also give credit where it’s due: VUPEN chose to bypass the primary sandbox, not to break it. And so, I expect, will the next batch of Chrome attackers. That it’s something one even wants to bypass is notable.
Alas, there is one final concern. While in this case it happened to be the case that VUPEN did legitimate work, random unsubstantiated claims of security finds that are only being disclosed to random government entities deserve no credibility whatsoever. The costs of running a fraudulent attack are far too low and the damage is much too high. And it’s not merely a matter of fraud — I don’t get the sense that VUPEN knew there were two different sandboxes. The reality of black boxing is you’re never quite sure what’s going on in there, and so while there was some actionable intelligence from the disclosure (“oh heh, the Chrome sandbox has been bypassed”) it was wrong (“oh heh, a Chrome sandbox has been bypassed”) and incomplete (“Flash exploits are potentially higher priority”).
There’s a reason we have disclosure. (AHEM.)
Couple of interesting things. First, as Chris Evans (@scarybeast on Twitter, and on the Sandbox team at Google) pointed out, Tipping Point has has deliberately excluded plugins from being “in-scope” at PWN2OWN. To some degree, this makes sense — people would just show up with their Java exploit and own all the browsers. However, you know, that just happened in the real world, with one piece of malware not only working across browsers, but across PC and Mac too. Once upon a time, ActiveX (really just COM) was derided for shattering the security model for IE, for whatever the security model of the browser did, there was always some random broken ActiveX object to twiddle.
Now, the broken objects are the major plugins. Write once, own everywhere.
Worse, there’s just no way to say Flash is just a plugin anymore, not when it’s shipped as just another responsible component in Chrome.
But Chris Evans is completely right. You can’t just say Flash is fair game for Chrome (and only Chrome) in the next PWN2OWN. That would deeply penalize Google for keeping Flash up to date, which we know would unambiguously increase risk for Chrome users. The naive metric would be a clear net loss. And nor should Flash be fair for all browsers, because it’s actually useful to know which core browser engines are secure and which aren’t. As bad money chases out good, so does bad code.
So, if Dragos can find a way, we really should have a special plugin league, for the three plugins that have +90% deployments — Flash, Acrobat, and Java. If the data says that they indeed fall quite easier than the browsers that house them, well, that’s good to know.
Another issue is sort of an internal contradiction in the above piece. At one point, I say:
VUPEN chose to bypass the primary sandbox, not to break it. And so, I expect, will the next batch of Chrome attackers. That it’s something one even wants to bypass is notable.
But then I say:
I don’t get the sense that VUPEN knew there were two different sandboxes.
Obviously both can’t be true. It’s impossible to know which is which, without VUPEN cluing us in. But, as Jeremiah Grossman pointed out, VUPEN ain’t talking. In fact, VUPEN hasn’t even confirmed the entire supposition, apparently-but-not-necessarily confirmed by Tavis Ormandy et al, that theirs was a bug in Flash. Just more data pointing to these sorts of games being problematic.
As they say: “If you can’t measure it, you can’t manage it.”
There’s a serious push in the industry right now for security metrics. People really want to know what works — because this ain’t it. But where can we find hard data?
What about fuzzers — the automated, randomized testers that have been so good at finding bugs through sheer brute force?
I had a hypothesis, borne of an optimism that’s probably a bit out of place in our field: I believe, after ten years of pushing for more secure code, software quality has increased across the board — at least in things that have been under active attack. And, in conjunction with Adam Cecchetti and Mike Eddington of Deja Vu Security (and the Peach Project), we developed an experiment to try to show this.
We fuzzed Office and OpenOffice. We fuzzed Acrobat, Foxit, and GhostScript. And we fuzzed them all longitudinally — going back in time, smashing 2003, 2007, and 2010.
175,334 crashes later, we have some…interesting data. Here’s a view of just what happened to Office — and OpenOffice — between 2003 (when the Summer of Worms hit) and 2010.
Things got better! This isn’t the end all and be all of this approach. There’s lots that could be going wrong. But we see similar collapses in PDF crash rates too. Slides follow below, and a paper will probably be assembled at some point. But the most important thing we’re doing is — we’re releasing data! This is easily some of the most fun public data I’ve played with. I’m looking forward to seeing what sort of visualizations come of them. Here’s one I put together, looking at individual crash files and their combined effect on Word/Writer 2003/2007/2010:
I’ve wanted to do a study like this for a very long time, and it’s been pretty amazing finally doing a collaboration with DD and Adam! Thanks also to the vendors — Microsoft, the OpenOffice Team, Adobe, Foxit, and the guys behind GhostScript. Everyone’s been cool.
Please send feedback if you have thoughts. I’d love to post some viz based on this data! And now, without further ado…
- CanSecWest Slides
- Crash Summary (TSV) (MySQL Dump)
- Crash Profile Spreadsheet (XLS) (Google Spreadsheet)
- (Very Rough) Crash Profile Generator
Science! It works. Or, in this case, it hints of an interesting new tool with its own set of downsides ;)
It ain’t security. Can’t say I care. Trying to fix color blindness is most fun side project evar.
[Heh! This blog post segues into a debate in the comments!]
Short Version: Zooko’s Triangle describes traits of a desirable naming system — and constrains us to two of secure, decentralized, or human readable. Aaron Swartz proposes a system intended to meet all three. While his distribution model has fundamental issues with distributed locking and cache coherency, and his use of cryptographic hashcash does not cause a significant asymmetric load for attackers vs. defenders, his use of an append-only log does have interesting properties. Effectively, this system reduces to the SSH security model in which “leaps of faith” are taken when new records are seen — but, unlike SSH, anyone can join the cloud to potentially be the first to provide malicious data, and there’s no way to recover from receiving bad (or retaining stale) key material. I find that this proposal doesn’t represent a break of Zooko’s Triangle, but it does show the construct to be more interesting than expected. Specifically, it appears there’s some room for “float” between the sides — a little less decentralization, a little more security.
Zooko’s Triangle is fairly beautiful — it, rather cleanly, describes that a naming system can offer:
- Secure, Decentralized, Non-Human-Readable names like Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com
- Secure, Centralized, Human-Readable Names like http://www.twitter.com, as declared by trusted third parties (delegations from the DNS root)
- Insecure, Decentralized, Human-Readable Names like http://www.twitter.com, as declared by untrusted third parties (consensus via a P2P cloud)
There’s quite the desire to “square Zooko’s triangle” — to achieve Secure, Decentralized, Human-Readable Names. This is driven by the desire to avoid the vulnerability point that centralization exposes, while neither making obviously impossible demands on human memory nor succumbing to the rampantly manipulatable opinion of a P2P mob.
Aaron Swartz and I have been having a very friendly disagreement about whether this is possible. I promised him that if he wrote up a scheme, I’d evaluate it. So here we are. As always, here’s the source material:
The basic idea is that there’s a list of name/key mappings, almost like a hosts file, but it links human readable names to arbitrary length keys instead of IP addresses. This list, referred to as a scroll, can only be appended to, never deleted from. Appending to the list requires a cryptographic challenge to be completed — not only do you add (name,key), you also add a nonce that causes the resulting hash from the beginning to be zero.
Aaron grants that there might be issues with initially trusting the network. But his sense is, as long as either your first node is trustworthy, or the majority of the nodes you find out about are trustworthy, you’re OK.
It’s a little more complicated than that.
There are many perspectives from which to analyze this system, but the most interesting one I think is the one where we watch it fail — not under attack, but under everyday use.
How does this system absorb changes?
While Aaron’s proposal doesn’t allow existing names to alter their keys — a fatal limitation by some standards, but one we’ll ignore for the moment — it does allow name/key pairs to be added. It also requires that additions happen sequentially. Suppose we have a scroll with 131 name/key pairs, and both Alice and Bob try to add a name without knowing about the other’s attempt. Alice will compute and distribute 131,Alice, and Bob will compute and distribute 131,Bob.
What now? Which scroll should people believe? Alice’s? Bob’s? Both?
Should they update both? How will this work over time?
Aaron says the following:
What happens if two people create a new line at the same time? The debate should be resolved by the creation of the next new line — whichever line is previous in its scroll is the one to trust.
This doesn’t mean anything, alas. The predecessor to both Alice and Bob’s new name is 131. The scroll has forked. Either:
- Some people see Alice, but not Bob
- Some people see Bob, but not Alice
- Somehow, Alice and Bob’s changes are merged into one scroll
- Somehow, the scroll containing Alice and the scroll containing Bob are both distributed
- Both new scrolls are rejected and the network keeps living in a 131 record universe
Those are the choices. We in security did not invent them.
There is a belief in security sometimes that our problems have no analogue in normal computer science. But this right here is a distributed locking problem — many nodes would like to write to the shared dataset, but if multiple nodes touch the same dataset, the entire structure becomes corrupt. The easiest fix of course is to put in a centralized locking agent, one node everyone can “phone home to” to make sure they’re the only node in the world updating the structure’s version.
But then Zooko starts jumping up and down.
(Zooko also starts jumping up and down if the output of the system doesn’t actually work. An insecure system at least needs to be attacked, in order to fail.)
Not that security doesn’t change the game. To paraphrase BASF, “Security didn’t make the fundamental problems in Comp Sci. Security makes the fundamental problems in Comp Sci worse.” Distributed locking was already a mess when we trusted all the nodes. Add a few distrusted nodes, and things immediately become an order of magnitude more difficult. Allow a single node to create an infinite number of false identities — in what is known as a Sybil attack — and any semblence of security is gone.
[Side note: Never heard of a Sybil attack? Here's one of the better stories of one, from a truly devious game called Eve Online.]
There is nothing in Aaron’s proposal that prevents the generation of an infinite number of identities. Arguably, the whole concept of adding a line to the scroll is equivalent to adding an identity, so a constraint here is probably a breaking constraint in and of itself. So when Aaron writes:
To join a network, you need a list of nodes where at least a majority are actually nodes in the network. This doesn’t seem like an overly strenuous requirement.
Without a constraint on creating new nodes, this is in fact a serious hurdle. And keep in mind, “nodes in the network” is relative — I might be connecting to majority legitimate nodes, but are they?
We shouldn’t expect this to be easy. Managing fully distributed locking is already hard. Doing it when all writes to the system must be ordered is already probably impossible. Throw in attackers that can multiply ad infinatum and can dynamically alter their activities to interfere with attempts to insert with specific ordered writes?
Before we discuss the crypto layer, let me give you an example of dynamic alterations. Suppose we organize our nodes into a circle, and have each node pass write access to the next node as listed in their scroll. (New nodes would have to be “pulled into” the cloud.) Would that work?
No, because the first malicious node that got in, could “surround” each real node with an untrusted Sybil. At that point, he’d be consulted after each new name was submitted. Any name he wanted to suppress — well, he’d simply delete that name/key pair, calculate his own, and insert it. By the time the token went around, the name would be long since destroyed.
(Not that a token approach could work, for the simple reality that someone will just lose the token!)
Thus far, we’ve only discussed propagation, and not cryptography. This should seem confusing — after all, the “cool part” of this system is the fact that every new entry added to the scroll needs to pass a computational crypto challenge first. Doesn’t that obviate some of these attacks?
The problem is that cryptography is only useful in cases of significant asymmetric workload between the legitimate user and the malicious attacker. When I have an AES256 symmetric key or an RSA2048 private key, my advantage over an attacker (theoretically) exceeds the computational capacity of the universe.
What about a hash function? No key, just the legitimate user spinning some cycles to come to a hash value of 0. Does the legitimate user have an advantage over the attacker?
Of course not. The attacker can spin the same cycles. Honestly, between GPUs and botnets, the attacker can spin far more cycles, by several orders of magnitude. So, even if you spend an hour burning your CPU to add a name, the bad guy really is in a position to spend but a few seconds cloning your work to hijack the name via the surround attack discussed earlier.
(There are constructions in which an attacker can’t quite distribute the load of cracking the hash — for example, you could encode the number of iterations a hash needed to be run, to reach N zero bits. But then, the one asymmetry that is there — the fact that the zero bits can be quickly verified despite the cost of finding them — would get wiped out, and validation of the scroll would take as long as calculation of the scroll.)
But what about old names? Are they not at least protected? Certainly not by the crypto. Suppose we have a scroll with 10,000 names, and we want to alter name 5,123. The workload to do this is not even 10,000 * time_per_name, because we (necessarily) can reuse the nonces that got us through to name 5,122.
Aaron thinks this isn’t the case:
First, you need to need to calculate a new nonce for the line you want to steal and every subsequent line….It requires having some large multiple of the rest of the network’s combined CPU power.
But adding names does not leverage the combined CPU power of the entire network — it leverages the power of a single node. When the single node wanted to create name 5,123, it didn’t need to burn cycles hashcashing through 0-5122. It just did the math for the next name.
Now, yes, names after 5,123 need to be cracked by the attacker, when they didn’t have to be cracked by the defender. But the attackers have way more single nodes than the defenders do. And with those nodes, the attacker can bash through each single name far, far quicker.
There is a constraint — to calculate the fixed nonce/zerohash for name 6001, name 6000 needs to have completed. So we can’t completely parallelize.
But of course, it’s no work at all to strip content from the scroll, since we can always remove content and get back to 0-hash.
Ah! But there’s a defense against that:
“Each remembers the last scroll it trusted.”
And that’s where things get really interesting.
Complexity is a funny thing — it sneaks up on you. Without this scroll storage requirement, the only difference between nodes in this P2P network are what nodes it happens to stumble into when first connecting. Now, we have to be concerned — which nodes has a node ever connected to? How long has it been since a node has been online? Are any significant names involved between the old scroll and the modern one?
Centralized systems don’t have such complexities, because there’s a ground truth that can be very quickly compared against. You can be a hundred versions out of date, but armed with nothing but a canonical hash you can clear your way past a thousand pretenders to the one node giving out a fresh update.
BitTorrent hangs onto centralization — even in small doses — for a reason. Zooko is a harsh mistress.
Storage creates a very, very slight asymmetry, but one that’s often “all we’ve got”. Essentially, the idea is that the first time a connection is made, it’s a leap of faith. But every time after that, the defender can simply consult his hard drive, while the attacker needs to…what? Match the key material? Corrupt the hard drive? These are in fact much harder actions than the defender’s check of his trusted store.
This is the exact system used by SSH, in lieu of being able to check a distributed trusted store. But while SSH struggles with constant prompting after, inevitably, key material is lost and a name/key mapping must change, Aaron’s proposal simply defines changing a name/key tuple as something that Can’t Happen.
Is that actually fair? There’s an argument that a system that can’t change name/key mappings in the presence of lost keys fails on feature completeness. But we do see systems that have this precise property: PGP Keyservers. You can probably find the hash of the PGP key I had in college somewhere out there. Worse, while I could probably beg and plead my PGP keys offline, nobody could ever wipe keys from these scrolls, as it would break all the zero-hashes that depend on them.
I couldn’t even put a new key in, because if scroll parsers didn’t only return the first name/key pair, then a trivial attack would simply be to write a new mapping and have it added to the set of records returned.
Assume all that. What are the ultimate traits of the system?
The first time a name/key peer is seen, it is retrieved insecurely. From that point forward, no new leap of faith is required or allowed — the trust from before must be leveraged repeatedly. The crypto adds nothing, and the decentralization does little but increase the number of attackers who can submit faulty data for the leap of faith, and fail under the sort of conditions distributed locking systems fail. But the storage does prevent infection after the initial trust upgrade, at the cost of basic flexibility.
I could probably get away with saying — a system that reduces to something this and undeployable, and frankly unstable, can not satisfy Zooko’s Triangle. After all, simply citing the initial trust upgrade, as we upgrade scroll versions from a P2P cloud on the false assumption that an attacker couldn’t maliciously synthesize new records unrecoverably, would be enough.
It’s fair, though, to discuss a few annoying points. Most importantly, aren’t we always upgrading trust? Leaps of faith are taken when hardware is acquired, when software is installed, even when the DNSSEC root key is leveraged. By what basis can I say those leaps are acceptable, but a per-name/key mapping leap represents an unacceptable upgrade?
That’s not a rhetorical question. The best answer I can provide is that the former leaps are predictable, and auditable at manufacture, while per-name leaps are “runtime” and thus unpredictable. But audits are never perfect and I’m a member of a community that constantly finds problems after release.
I am left with the fundamental sense that manufacturer-based leaps of faith are at least scalable, while operational leaps of faith collapse quickly and predictably to accepting bad data.
More interestingly, I am left with the unescapable conclusion that the core security model of Swartz’s proposal is roughly isomorphic to SSH. That’s fine, but it means that if I’m saying Swartz’s idea doesn’t satisfy Zooko due to security, then neither does SSH.
Of course, it shouldn’t. We all know that in the real world, the response by an administrator to encountering a bad key, is to assume the box was paved for some reason and thus to edit known_hosts and move on. That’s obviously insecure behavior, but to anyone who’s administered more than a box or two, that’s life. We’re attempting to fix that, of course, by letting the guy who paves the box update the key via DNSSEC SSHFP records, shifting towards security and away from decentralization. You don’t get to change client admin behavior — or remove client key overrides — without a clear and easy path for the server administrator to keep things functional.
So, yeah. I’m saying SSH isn’t as secure as it will someday be. A day without known_hosts will be a good day indeed.
I’ve had code in OpenSSH for the last decade or so; I hope I’m allowed to say that :) And if you write something that says “Kaminsky says SSH is Insecure” I will be very unhappy indeed. SSH is doing what it can with the resources it has.
The most interesting conclusion is then that there must be at least some float in Zooko’s Triangle. After all, OpenSSH is moving from just known_hosts to integrating a homegrown PKI/CA system, shifting away from decentralization and towards security. Inconvenient, but it’s what the data seems to say.
Looking forward to what Zooko and Aaron have to say.