
Be Still My Breaking Heart

Abstract:  Heartbleed wasn’t fun.  It represents us moving from “attacks could happen” to “attacks have happened”, and that’s not necessarily a good thing.  The larger takeaway actually isn’t “This wouldn’t have happened if we didn’t add Ping”; the takeaway is “We can’t even add Ping, how the heck are we going to fix everything else?”.  The answer is that we need to take Matthew Green’s advice, start getting serious about figuring out what software has become Critical Infrastructure to the global economy, and dedicate genuine resources to supporting that code.  It took three years to find Heartbleed.  We have to move towards a model of No More Accidental Finds.

======================

You know, I’d hoped I’d be able to avoid a long form writeup on Heartbleed.  Such things are not meant to be.  I’m going to leave many of the gory technical details to others, but there’s a few angles that haven’t been addressed and really need to be.  So, let’s talk.  What to make of all this noise?

First off, there’s been a subtle shift in the risk calculus around security vulnerabilities.  Before, we used to say:  “A flaw has been discovered.  Fix it, before it’s too late.”  In the case of Heartbleed, the presumption is that it’s already too late, that all information that could be extracted, has been extracted, and that pretty much everyone needs to execute emergency remediation procedures.

It’s a significant change, to assume the worst has already occurred.

It always seems like a good idea in security to emphasize prudence over accuracy, possible risk over evidence of actual attack.  And frankly, this policy has been run by the privacy community for some time now.  Is this a positive shift?  It certainly gives your average consumer an answer to the question, “What am I supposed to do in response to this Internet-ending bug?”  “Well, presume all your passwords leaked and change them!”

I worry, and not merely because “You can’t be too careful” has been far from an entirely pleasant policy in the real world.  We have lots of bugs in software.  Shall we presume every browser flaw not only needs to be patched, but has already been exploited worldwide, and that you should wipe your machine any time one is discovered?  This OpenSSL flaw is pernicious, sure.  We’ve had big flaws before, ones that didn’t just provide read access to remote memory either.  Why the freak-out here?

Because we expected better, here, of all places.

There’s been quite a bit of talk about how we never should have been exposed to Heartbleed at all, because TLS heartbeats aren’t all that important a feature anyway.  Yes, it’s 2014, and the security community is complaining about Ping again.  This is of course pretty rich, given that it seems half of us just spent the last few days pinging the entire Internet to see who’s still exposed to this particular flaw.  We in security sort of have blinders on: if a feature isn’t immediately and obviously useful to us, we don’t see the point.

In general, you don’t want to see a protocol designed by the security community.  It won’t do much.  In return (with the notable and very appreciated exception of Dan Bernstein), the security community doesn’t want to design you a protocol.  It’s pretty miserable work.  Thanks to what I’ll charitably describe as “unbound creativity”, the fast and dumb and unmodifying design of the Internet has given way to a hodgepodge of proxies and routers and “smart” middleboxes that do who knows what.  Protocol design is miserable; nothing is elegant.  Anyone who’s spent a day or two trying to make P2P VoIP work on the modern Internet discovers very quickly why Skype was worth billions.  It worked, no matter what.

Anyway, in an alternate universe TLS heartbeats (with full ping functionality) are a beloved security feature of the protocol, as they’re the key to constant-time, constant-bandwidth tunneling of data over TLS without horrifying application-layer hacks.  As is, they’re tremendously useful for keeping sessions alive, a thing I’d expect hackers with even a mild amount of experience with remote shells to appreciate.  The Internet is moving to long-lived sessions, as all Gmail users can attest.  KeepAlives keep long-lived things working.  SSH has supported protocol-layer KeepAlives forever.
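
OpenSSH, for instance, exposes these keepalives through ordinary configuration.  A minimal illustration (the interval and count values here are arbitrary, not recommendations):

# ssh_config (client side): probe the server inside the encrypted channel
ServerAliveInterval 60      # send a protocol-level keepalive after 60s of silence
ServerAliveCountMax 3       # give up after 3 unanswered probes

# sshd_config (server side): the mirror-image options
ClientAliveInterval 60
ClientAliveCountMax 3

Unlike TCP keepalives, these run inside the protocol itself, which is exactly the role TLS heartbeats were meant to play for TLS.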

The takeaway here is not “If only we hadn’t added ping, this wouldn’t have happened.”  The true lesson is, “If only we hadn’t added anything at all, this wouldn’t have happened.”  In other words, if we can’t even securely implement Ping, how could we ever demand “important” changes?  Those changes tend to be much more fiddly, much more complicated, much riskier.  But if we can’t even securely add this line of code:

if (1 + 2 + payload + 16 > s->s3->rrec.length)
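
For context, here is a deliberately condensed sketch of where that check sits and what it protects.  This is illustrative C, not the verbatim OpenSSL source; the record layout is real, but names like build_heartbeat_response and rec_len are invented for clarity.

/*
 * rec is the decrypted heartbeat record an attacker sent; rec_len is how
 * many bytes actually arrived.  Record layout: 1 byte type, 2 byte payload
 * length, payload, then at least 16 bytes of padding.
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HB_RESPONSE 2
#define PADDING     16

static unsigned char *build_heartbeat_response(const unsigned char *rec,
                                               size_t rec_len,
                                               size_t *out_len)
{
    if (rec_len < 1 + 2 + PADDING)
        return NULL;                               /* runt record, discard */

    unsigned int payload = (rec[1] << 8) | rec[2]; /* attacker-claimed length */
    const unsigned char *pl = rec + 3;             /* start of payload        */

    /*
     * THE FIX: the single line quoted above, adapted to these names.
     * Without it, payload is simply trusted, and the memcpy below reads up
     * to 64KB of whatever sits next to the record on the heap and echoes
     * it back to the attacker.
     */
    if (1 + 2 + payload + PADDING > rec_len)
        return NULL;                               /* silently discard */

    unsigned char *resp = malloc(1 + 2 + payload + PADDING);
    if (resp == NULL)
        return NULL;

    resp[0] = HB_RESPONSE;
    resp[1] = (payload >> 8) & 0xff;
    resp[2] = payload & 0xff;
    memcpy(resp + 3, pl, payload);                 /* echo the payload back */
    /* (the real code also appends random padding here) */

    *out_len = 1 + 2 + payload + PADDING;
    return resp;
}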

I know Neel Mehta.  I really like Neel Mehta.  It shouldn’t take absolute heroism, one of the smartest guys in our community, and three years for somebody to notice a flaw when there’s a straight-up length field in the patch.  And that, I think, is a major and unspoken component of the panic around Heartbleed.  The OpenSSL dev shouldn’t have written this (on New Year’s Eve, at 1 AM, apparently).  His coauthors and release engineers shouldn’t have let it through.  The distros should have noticed.  Somebody should have been watching the till, at least this one particular till, and it seems nobody was.

Nobody publicly, anyway.

If we’re going to fix the Internet, if we’re going to really change things, we’re going to need the freedom to do a lot more dramatic changes than just Ping over TLS.  We have to be able to manage more; we’re failing at less.

There’s a lot of rigamarole around defense in depth, other languages that OpenSSL could be written in, “provable software”, etc.  Everyone, myself included, has some toy that would have fixed this.  But you know, word from the Wall Street Journal is that there have been all of $841 in donations to the OpenSSL project to address this matter.  We are building the most important technologies for the global economy on shockingly underfunded infrastructure.  We are truly living through Code in the Age of Cholera.

Professor Matthew Green of Johns Hopkins University recently commented that he’s been running around telling the world for some time that OpenSSL is Critical Infrastructure.  He’s right.  He really is.  The conclusion is resisted strongly, because you cannot imagine the regulatory hassles that traditionally come with being deemed Critical Infrastructure.  A world where SSL stacks have to be audited and operated against such standards is a world that doesn’t run SSL stacks at all.

And so, finally, we end up with what to learn from Heartbleed.  First, we need a new model of Critical Infrastructure protection, one that dedicates real financial resources to the safety and stability of the code our global economy depends on – without attempting to regulate that code to death.  And second, we need to actually identify that code.

When I said that we expected better of OpenSSL, it’s not merely that there’s some sense that security-driven code should be of higher quality.  (OpenSSL is legendary for being considered a mess, internally.)  It’s that the number of systems that depend on it, and then expose that dependency to the outside world, is considerable.  This is security’s largest contributed dependency, but it’s not necessarily the software ecosystem’s largest dependency.  Many systems, maybe even more, depend on web servers like Apache, nginx, and IIS.  We fear vulnerabilities in libz significantly more than in libbz2, and in libbz2 more than in libxz, because far more servers will decompress untrusted gzip than bzip2, and more bzip2 than xz.  Vulnerabilities are not always in obvious places – people underestimate just how exposed things like libxml and libcurl and libjpeg are.  And as HD Moore showed me some time ago, the embedded space is its own universe of pain, with ’90s bugs covering entire countries.

If we accept that a software dependency becomes Critical Infrastructure at some level of economic dependency, the game becomes identifying those dependencies, and delivering direct technical and even financial support.  What are the one million most important lines of code that are reachable by attackers, and least covered by defenders?  (The browsers, for example, are very reachable by attackers but actually defended pretty zealously – the public FFmpeg is not the FFmpeg in Chrome.)

Note that not all code, even in the same project, is equally exposed.  It’s tempting to say it’s a needle in a haystack.  But I promise you this:  if anybody patches Linux/net/ipv4/tcp_input.c (which handles inbound network traffic for Linux), a hundred alerts fire, and plenty of them go to people nobody would call friendly.  One guy, one night, patched OpenSSL.  Not enough defenders noticed, and it took Neel Mehta to do something.

We fix that, or this happens again.  And again.  And again.

No more accidental finds.  The stakes are just too high.

Categories: Security
  1. April 10, 2014 at 7:30 pm

    It turns out the code wasn’t authored at 11pm GMT on New Year’s Eve; that’s just when the commit was accepted into the core. Who knows whether that means code review happened just prior or not.

  2. April 10, 2014 at 9:09 pm

    The problem is: nobody is willing to pay for this, Dan. There is not sufficient revenue to be made in preventing these bugs, and there is no economic disadvantage in keeping the current level of risk. We’d need to have companies fined severely for every lost customer data record, or for exploits that could have been prevented (“state of the art”-type legislation). Once companies have a risk of $50 per lost record to account against, there is sufficient advantage in paying for security audits instead. And due to the beauty of most relevant code being open source, society as a whole could benefit.

    • Mark Overholser
      April 11, 2014 at 9:42 pm

      Let’s say, for the sake of argument, that companies did pay for code audits and discover flaws like this in the open-source code they were using. Under the model you’re describing, there would definitely be an incentive for them to fix the flaw for themselves. However, I don’t believe there would be a strong enough incentive for them to push their fix (the revelation of and patching of which cost them money) upstream into the public code base. Yes, there is a long-term cost involved in maintaining the patch in-house and having to re-validate and re-apply the patch as newer releases of the public code come out, and that does create some incentive to publicly disclose the results so that the patch would be maintained by the public.

      However, there would also be an incentive to keep the patch private for the very reason that public disclosure would potentially help competitors, and I think that companies in a predominantly capitalistic society would not want to give their competitors the benefit of protections afforded by a patch that they themselves had to pay to discover and fix. Instead, keeping the patch private affords them protection, as well as the potential that their competitors might be exploited (not by them, of course…) and fined.

      In summary, I agree that economic sanctions on companies who “allow” themselves to be breached would create incentives for them to be safer, but I disagree that society would readily reap the benefits by getting more upstream patches and better code. This doesn’t even take into consideration the fact that some companies would simply accept the risk of sanction and do nothing to prevent exploitation. Exploitation and fines aren’t guaranteed, just a potential risk, therefore the associated costs are intangible. The cost of paying someone to look for flaws and fix them is entirely tangible. As a result, I believe that many companies would simply look to spend their money elsewhere where the benefits would be more easily compared to the costs (advertising, public relations, lead generation, improved operational efficiency, etc.).

  3. April 11, 2014 at 12:00 am

    Dan, this piece sums up the problem brilliantly. Thanks for taking the time to write it.

  4. April 11, 2014 at 12:03 am

    Also, I wonder if the dev who wrote that code is in a bar getting drunk? It would make for an interesting scene.

    Barman: “Hey buddy, you look like you’ve had a rough week. What happened?”
    Dev: “I got drunk and broke the Internet”

  5. Ryan
    April 11, 2014 at 8:58 pm

    How about the two big elephants in the room:
    1) Why does adding a small commit to an (apparently) non-security related part of my code endanger my entire application?
    2) Why does running application X on potentially malicious inputs put my entire machine at risk (or even affect unrelated components)?

    Instead of throwing money at stopping never-ending instances of these problems (such as the code reviews that would have prevented Heartbleed and a potential decompression vulnerability), we need to take a step back and determine the best way to tackle these *classes* of problems.

    • April 12, 2014 at 12:29 am

      Anyone communicating with application X is able to benefit from bugs in X, which sounds like (i) not everyone should be privileged to receive from X, and (ii) not all parts of X should have the same privileges to X’s data. Add to this the fact that this particular X is managing data for Y, Z, W and numerous others, none of which should be retained except under stringent confidentiality controls…
      Regrettably, the proposed solutions to these have dysfunctions of their own…

      • Lennie
        April 12, 2014 at 7:21 am

        I don’t know the TLS protocol well enough by heart to say, but would it be possible to do process separation?

        Where you have three processes: one that produces the data, one that handles encryption of that data, and a third that handles the details of the TLS protocol while seeing only the encrypted data.

        My guess would be no, but could something like TLS 1.3 support it?

        • April 12, 2014 at 12:51 pm

          I don’t think the protocol would particularly constrain the structure of the program. We do arguably need a pattern and/or architecture of separation of concerns in security-involved programs…

          • Ryan
            April 12, 2014 at 8:12 pm

            Any architecture would require abstractions, but C is unsuitable for providing these abstractions since it doesn’t enforce them (for example, a bounds violation lets an attacker potentially read unrelated pieces of memory; in some cases it lets him run arbitrary code).

            Lennie’s solution of multi-processing is one step up, since processes do enforce abstraction of separate memory layouts (i.e., by default process A can’t read process B’s memory), so if done correctly, it could have prevented Heartbleed. That said, if your language requires you to isolate every part of your program in a separate process to achieve isolation, the development/run-time overhead is a lot higher (probably unacceptable). At this point, isn’t it time to use a language that enforces abstraction?

            • Lennie
              April 13, 2014 at 12:04 pm

              I guess it’s a similar problem to offloading encryption to hardware. If only the encryption part is offloaded to another process, then at least the private key can never be leaked in a fashion similar to this.

  6. px
    April 11, 2014 at 9:01 pm

    As an eternal optimist, I’m obliged to point out the bright side of all this, which is the fact that we are actually *finding* bugs like these now, rather than having them littered across all of our most trusted software stacks for eternity. I do think that this particular finding came as the result of increased interest, due to bug bounty programs sponsored by Google, Facebook, Microsoft, etc.

    https://hackerone.com/ibb
    http://googleonlinesecurity.blogspot.com/2013/10/going-beyond-vulnerability-rewards.html

    Yes, I’m aware that Neel is a Google employee, and that he donated his entire 15k bounty from IBB to the Freedom of the Press Foundation, so clearly cash wasn’t his primary motivation, but the fact that major companies are supporting these programs definitely adds an extra air of prestige to the findings.

    What we should be doing is encouraging more big companies to join in and start rewarding those who make the Internet more secure for everyone. Hopefully this trend of rewarding researchers continues, or even grows, because as it does, stupid bugs like this will become more and more rare, and the general public can start feeling somewhat safe in their daily use of modern technology.

  7. April 11, 2014 at 11:14 pm

    I will repost my comment to Poul-Henning Kamp’s ACM Queue article, “A Generation Lost in the Bazaar” (http://queue.acm.org/detail.cfm?id=2349257):

    Mr. Kamp, what you wrote needed to be said. Thank you for saying it. Along this line, I have been thinking about a “quality improvement institute, factory and service” that might help. Open source is well established now in industry, and the problems are apparent to all. However, the benefits are seen as even greater. This creates an opportunity to improve existing open source software and future development and maintenance processes. (Not surprisingly, this same opportunity exists for closed source software.) Therefore, there should be a willingness on the part of government and industry to fund quality improvement. (I don’t attempt to specify whether such organization(s) are for-profit or not-for-profit, but their products would need to be open for use without compensation for them to benefit open source developers.) There are several approaches that should be pursued:

    (1) Improved training and education for developers;
    (2) Identification and promulgation of better or best practices (in choices of languages, tools and development processes);
    (3) Identification of existing software most in need of remediation or replacement;
    (4) Funding remediation/replacement of the most important needs;
    (5) Outreach to existing projects with offers to help adopt better practices;
    (6) Outreach to academia to encourage training in better practices;
    (7) Advice for starting new projects following best practices appropriate for the project size;
    (8) Funding development of tools to support better or best practices;
    (9) Review of existing tools and offering constructive suggestions for improvement, perhaps with funding.

    I would be happy to elaborate on these ideas with anyone interested.

  8. kune
    April 11, 2014 at 11:23 pm

    This whole situation reminds me of the poor fire protection during the Middle Ages. Strasbourg burned down eight times in the 14th century. Later, regulations were written but not enforced, a factor in the Great Fire of London in 1666, which destroyed the homes of 70,000 of the city’s 80,000 inhabitants.

    The issue is that there are industries that know how to write secure code. Companies that provide the software-controlled brakes for modern cars have processes to prevent bugs that would cost them millions if their customers had to recall cars to fix those bugs. Writing code in those environments is not fun, but why should the production of software be fun if several hundred million people depend on it?

    However, my expectation is that the IT/Cyber industry needs to be burned or bled several times more before governments will outlaw the use of memcpy in protocol-level functions and enforce such rules. Access to packet content should always happen through functions operating on packet buffer abstractions. This requires, however, proper documentation of the internal APIs.
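
    A minimal sketch of what such a packet-buffer abstraction might look like (illustrative only; pkt_view and pkt_copy are invented names, not an existing API): every read goes through a cursor that knows how many bytes actually arrived, so a lying length field becomes an error instead of an over-read.

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        const uint8_t *data;   /* start of the received packet */
        size_t         len;    /* bytes actually received      */
        size_t         pos;    /* read cursor, starts at 0     */
    } pkt_view;                /* initialize as { data, len, 0 } */

    /* Copy n bytes out of the packet; fail if fewer than n remain. */
    static int pkt_copy(pkt_view *p, void *dst, size_t n)
    {
        if (n > p->len - p->pos)
            return -1;         /* claimed length exceeds what was received */
        memcpy(dst, p->data + p->pos, n);
        p->pos += n;
        return 0;
    }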

    • April 11, 2014 at 11:31 pm

      My understanding is that the Toyota brake failures were caused by a stack overflow virulent enough to occur w/o any external influence (i.e. network attackers).

    • April 12, 2014 at 12:34 am

      Just as a side comment, I took a safety-critical realtime course at York University some years ago. Intellectually hard, but eminently doable.

  9. Steve
    April 12, 2014 at 12:12 am

    Dan you are brilliant!!! Please come back to defcon!!!

  10. Gilbert
    April 12, 2014 at 12:58 pm

    Doing complete code reviews for security takes time and is hard. The OpenBSD developers did one, years ago. They took their code base and did huge code reviews. By doing so, they found countless “bugs” and soon realized that most bugs could have become security holes. Not only are good programming and code review at commit time important (using automated tools there to look for possible buffer overflows is good), but you need to check the whole code base too.

    The OpenBSD experience shows this gives good results and rewards, but it takes a long time to do and requires experienced people. And you must do it when well awake and attentive. If you do this for too long, you will have bugs staring you right in the eyeballs and still won’t see them.

    Using tools to look for vulnerabilities is perhaps a start?

  11. DK
    April 12, 2014 at 3:23 pm

    Do you think certifiable software (i.e. software that can be mathematically proven to be bug-free) will ever, pragmatically speaking, be a part of the solution? E.g., the Flint project at Yale.

    • April 12, 2014 at 7:23 pm

      The problem is that it’s deeply tempting to define your problem so as to make the math tractable, while leaving the actual delivered security unknown. You can very correctly do the wrong thing.

  12. Rilind Shamku
    April 12, 2014 at 7:01 pm

    This is shocking! A very trivial missing length check can cause all this mayhem. Anyone who’s read a few C security books would have been able to avoid this fundamental error. This is not a C issue; it is definitely a human error. C was designed for efficiency and to give the programmer more power, especially around memory allocation. I cannot believe that a project as crucial to the Internet as OpenSSL would let this fly by without checks. I understand that the project is underfunded (ridiculous; obviously the giant Internet companies don’t care about investing in security), etc… but still, a simple source code analysis session would have been able to find this coding error. This leads me to believe that there might be more trivial errors like this one in the code. So with all that said, I believe it is crucial that we spend as much time developing source code analysis tools to check our code as we spend developing the code itself. Source code analysis is the first step of any code review. I hope this is the only major vulnerability in OpenSSL.

  13. Craig
    April 14, 2014 at 4:18 am

    According to comments all over the Internet, Heartbleed could have been prevented with: more eyeballs, better code reviews, banning, more auditing, and/or automated code analysis to save us from ourselves. To this discussion let me add two words I rarely see mentioned: unit tests.

    Most useful, long-lived code is simply too complex for humans to find critical problems in just by eyeballing it. Imagine, if there were a testing framework already in place, how much more likely it would have been for someone to ask, “What happens if these two numbers, which are supposed to be the same, were actually different?” Although they cannot guarantee a better outcome, unit tests establish a framework for testing failure hypotheses and catching regressions.
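
    Even a tiny, framework-free test of the length invariant would have forced exactly that question. A minimal sketch (heartbeat_length_ok is an invented helper standing in for the real check, not OpenSSL’s API):

    #include <assert.h>
    #include <stddef.h>

    /* The property under test, lifted from the patch: a heartbeat record must
       be big enough to hold type (1) + length field (2) + claimed payload +
       minimum padding (16). */
    static int heartbeat_length_ok(size_t claimed_payload, size_t record_len)
    {
        return 1 + 2 + claimed_payload + 16 <= record_len;
    }

    int main(void)
    {
        assert(heartbeat_length_ok(4, 1 + 2 + 4 + 16));   /* honest request     */
        assert(!heartbeat_length_ok(65535, 23));          /* the Heartbleed lie */
        assert(heartbeat_length_ok(0, 19));               /* zero-length edge   */
        return 0;
    }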

    I will note that the original code submission did not include tests. Neither did the fix. And overall, OpenSSL appears not well structured for unit testing, so it’s not right to blame the patch authors for not rearchitecting the entire library just to submit their changes. Instead, the responsibility is on the primary maintainers to cultivate testable code, and require tests with every bug fix or new feature. None of us is looking forward to the next critical bug in OpenSSL or elsewhere in OpenSourceLandia. I don’t know where it will be, but I can tell you how to place your bets: it will be in code that is complex, in widespread use, and lacks a solid test suite.

    Writing tests is not sexy, and most devs don’t want to do it. Their code works, so why write redundant code to prove that it works?! Besides, unit tests are not shipped. Etc. The point is that unit tests are a gift to the future engineer (maybe yourself) which helps prevent your work from being broken by someone who doesn’t understand or remember all the nuances.

    I am not a test engineer. However, I am a software engineer who has come to strongly appreciate the value of unit tests, and for designing code which can be unit tested.

