
Fuzzmarking: Towards Hard Security Metrics For Software Quality?

As they say: “If you can’t measure it, you can’t manage it.”

There’s a serious push in the industry right now for security metrics. People really want to know what works, because whatever we’re doing now sure ain’t it. But where can we find hard data?

What about fuzzers — the automated, randomized testers that have been so good at finding bugs through sheer brute force?
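For anyone who hasn’t played with one: a dumb mutation fuzzer just takes known-good input files, corrupts them at random, and feeds the results to the target until something falls over. Here’s a minimal Python sketch of that idea; the actual study used the Peach framework, and the file names below are placeholders, not anything from our setup.

import random

def bitflip_mutate(data: bytes, flips: int = 8) -> bytes:
    """Return a copy of `data` with a handful of randomly chosen bits flipped."""
    buf = bytearray(data)
    for _ in range(flips):
        pos = random.randrange(len(buf))        # pick a random byte offset
        buf[pos] ^= 1 << random.randrange(8)    # flip one random bit in that byte
    return bytes(buf)

# Take a known-good seed document and emit a batch of corrupted variants.
with open("seed.doc", "rb") as f:               # "seed.doc" is a placeholder name
    seed = f.read()

for i in range(100):
    with open("fuzzed_%03d.doc" % i, "wb") as out:
        out.write(bitflip_mutate(seed))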

I had a hypothesis, born of an optimism that’s probably a bit out of place in our field: I believe that, after ten years of pushing for more secure code, software quality has increased across the board, at least in the things that have been under active attack. And, in conjunction with Adam Cecchetti and Mike Eddington of Deja Vu Security (and the Peach Project), we developed an experiment to try to show this.

We fuzzed Office and OpenOffice. We fuzzed Acrobat, Foxit, and GhostScript. And we fuzzed them all longitudinally, going back in time and smashing the 2003, 2007, and 2010 releases.
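Schematically, each run boils down to the loop below: hand every mutated document to every version of the target, and count what crashes. This is only a sketch, not the actual rig; the real runs were driven by Peach with proper crash monitors attached, and the binary paths and file names here are made up for illustration.

import subprocess
from collections import Counter

# Hypothetical targets: one binary per suite/version under test.
TARGETS = {
    ("word", "2003"): r"C:\targets\office2003\WINWORD.EXE",
    ("word", "2007"): r"C:\targets\office2007\WINWORD.EXE",
    ("word", "2010"): r"C:\targets\office2010\WINWORD.EXE",
}

def crashed(binary, doc, timeout=30):
    """Open doc with binary; treat a nonzero exit before the timeout as a crash.
    A real harness attaches a debugger and kills hung or healthy instances."""
    try:
        return subprocess.run([binary, doc], timeout=timeout).returncode != 0
    except subprocess.TimeoutExpired:
        return False  # still running at the deadline: not a crash, for this sketch

crash_counts = Counter()
fuzzed_docs = ["fuzzed_%03d.doc" % i for i in range(100)]

for (app, version), binary in TARGETS.items():
    for doc in fuzzed_docs:
        if crashed(binary, doc):
            crash_counts[(app, version)] += 1

for (app, version), n in sorted(crash_counts.items()):
    print("%s %s: %d crashing files" % (app, version, n))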

175,334 crashes later, we have some…interesting data.  Here’s a view of just what happened to Office — and OpenOffice — between 2003 (when the Summer of Worms hit) and 2010.

Things got better! This isn’t the be-all and end-all of this approach; there’s plenty that could be going wrong. But we see similar collapses in PDF crash rates too. Slides follow below, and a paper will probably be assembled at some point. The most important thing we’re doing, though, is releasing the data! This is easily some of the most fun public data I’ve played with, and I’m looking forward to seeing what sort of visualizations come of it. Here’s one I put together, looking at individual crash files and their combined effect on Word/Writer 2003/2007/2010:
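And if you want to slice the released data yourself, the tallying behind charts like that one is simple. Here’s a rough Python sketch; the file name and the column names (“app”, “version”, “crashed”) are placeholders for whatever layout the released files actually use.

import csv
from collections import defaultdict

# Count runs and crashes per (application, version) pair.
totals = defaultdict(int)
crashes = defaultdict(int)

with open("fuzzmarking_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        key = (row["app"], row["version"])
        totals[key] += 1
        if row["crashed"] == "1":
            crashes[key] += 1

for app, version in sorted(totals):
    key = (app, version)
    print("%s %s: %d/%d runs crashed (%.1f%%)"
          % (app, version, crashes[key], totals[key], 100.0 * crashes[key] / totals[key]))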

I’ve wanted to do a study like this for a very long time, and it’s been pretty amazing to finally collaborate with DD and Adam! Thanks also to the vendors: Microsoft, the OpenOffice Team, Adobe, Foxit, and the guys behind GhostScript. Everyone’s been cool.

Please send feedback if you have thoughts.  I’d love to post some viz based on this data!  And now, without further ado…

Science!  It works.  Or, in this case, it hints at an interesting new tool with its own set of downsides 😉

Categories: Security
  1. pepe
    March 13, 2011 at 9:50 am

    E. Rescorla, “Is Finding Security Holes a Good Idea?”, IEEE Security and Privacy Magazine, Vol. 3, No. 1, January 2005, pp. 14–19. doi:10.1109/MSP.2005.17

  2. Gerard
    April 21, 2011 at 11:30 am

    I am missing something here: you have been testing both office packages against the same document formats – why not against the respective native formats?

    Especially when you are contemplating updating an office suite as opposed to switching suites, the question may be: do I simply put money on the counter and thereby get more stability at work than I would if I spent time and effort converting my current documents into the native format of the other suite?

    Another thing I miss: you specify the exact MS suite versions (2003, 2007 and 2010), but I seem to be unable to find which OpenOffice or StarOffice version you pitted against each of these. Release dates obviously differ too, as do the possible vulnerabilities as you patch the respective versions. So in essence: if you wish to track ten years of software development, looking at all available versions and patch levels of both packages between 2001 and 2011 becomes mandatory.

    An aspect that your test obviously cannot cater for: in an average office setting you may have documents in various native formats of the respective package. A PowerPoint presentation created just last week and an old budget plan dating back to 2002 will thus be in “different” formats. How do the respective “modern” versions of the office suites handle “old” files in terms of stability?

    Where would I gain more stability: by converting all “old” files into the newest file format of the latest office suite, or by going through the hassle of converting them into the newest file format of the respective other suite?

    Obviously, that blows up the amount of data and tests required considerably, but it also gives you a much more accurate comparison between MS products and those of “other” development teams.

    Finally (since any application usually has to run on top of an operating system), a look at the impact of an update of the underlying operating system is also something to muse about: how does MS Office stability improve when looking at the respective OS of that era, and how does that compare to a totally different setup, say OpenOffice on Linux?

    If you are truly looking for metrics that would both quantify and qualify the improvements in software quality over the years, then I’m afraid you’d have to do more comprehensive tests, especially ones that contain far fewer “unknown” factors, so as to be reproducible (and as such provide you with more reliable results).

    Or, from the current set of data, could you rule out that the next 100,000 documents (or, for that matter, feeding the same 100,000 documents into the respective office packages in a different sequence) would lead to different results?

    Even simple variations would, in my opinion, very quickly change the results: imagine “old” MS Office XP format files written by a much more modern version of MS Office as opposed to the “original” old Office version, correlated with files written in that “old” format by a current version of OpenOffice, or for that matter by the equally old version of OpenOffice attempting the same.

    As vulnerabilities and simple errors will have been sorted out in the more recent versions, I would suspect that files in the “old” format written by the newer versions of the office suite would also result in considerably fewer problems, with or without “flipping” single bits or complete bytes in them. Then again, that’s just a hunch; only thorough testing (as required when trying to make a statement about the alleged improvement of the quality of certain software) could confirm or refute it.

    The tests conducted so far seem to point in the right direction, but for that data to be really usable for a definite statement about software quality improvements over the years, much more testing should be done.

    At least in my opinion.

    Kind regards

    Gerard

