Home > Security > DakaRand 1.0: Revisiting Clock Drift For Entropy Generation

DakaRand 1.0: Revisiting Clock Drift For Entropy Generation

“The generation of random numbers is too important to be left to chance.”
Robert R. Coveyou

“One out of 200 RSA keys in the field were badly generated as a result of standard dogma.  There’s a chance this might fail less.”
–Me

[Note:  There are times I write things with CIO’s in mind.  This is not one of those times.]

So, I’ve been playing with userspace random number generation, as per Matt Blaze and D.P. Mitchell’s TrueRand from 1996.  (Important:  Matt Blaze has essentially disowned this approach, and seems to be honestly horrified that I’m revisiting it.)  The basic concept is that any system with two clocks has a hardware number generator, since clocks jitter relative to one another based on physical properties, particularly when one is operating on a slow scale (like, say, a human hitting a keyboard) while another is operating on a fast scale (like a CPU counter cycling at nanosecond speeds).  Different tolerances on clocks mean more opportunities for unmodelable noise to enter the system.  And since the core lie of your computer is that it’s just one computer, as opposed to a small network of independent nodes running on their own time, there should be no shortage of bits to mine.

At least, that’s the theory.

As announced at Defcon 20 / Black Hat, here’s DakaRand 1.0.  Let me be the first to say, I don’t know that this works.  Let me also be the first to say, I don’t know that it doesn’t.  DakaRand is a collection of modes that tries to convert the difference between clocks into enough entropy that, whether or not it survives academic attack, would certainly force me (as an actual guy who breaks stuff) to go attack something else.

A proper post on DakaRand is reserved, I think, for when we have some idea that it actually works.  Details can be seen in the slides for the aforementioned talk; what I’d like to focus on now is recommendations for trying to break this code.  The short version:

1) Download DakaRand, untar, and run “sh build.sh”.
2) Run dakarand -v -d out.bin -m [0-8]
3) Predict out.bin, bit for bit, in less than 2^128 work effort, on practically any platform you desire with almost any level of active manipulation you wish to insert.

The slightly longer version:

  1. DakaRand essentially tries to force the attacker into having no better attack than brute force, and then tries to make that work effort at least 2^128.  As such, the code is split into generators that acquire bits, and then a masking sequence of SHA-256, Scrypt, and AES-256-CTR that expands those bits into however much is requested.  (In the wake of Argyros and Kiayias’s excellent and underreported “I Forgot Your Password:  Randomness Attacks Against PHP Applications“, I think it’s time to deprecate all RNG’s with invertable output.  At the point you’re asking whether an RNG should be predictable based on its history, you’ve already lost.)  The upshot of this is that the actual target for a break is not the direct output of DakaRand, but the input to the masking sequence.  Your goal is to show that you can predict this particular stream, with perfect accuracy, at less than 2^128 work effort.  Unless you think you can glean interesting information from the masking sequence (in which case, you have more interesting things to attack than my RNG), you’re stuck trying to design a model of the underlying clock jitter.
  2. There are nine generators in this initial release of DakaRand.  Seriously, they can’t all work.
  3. You control the platform.  Seriously — embedded, desktop, server, VM, whatever — it’s fair game.  About the only constraint that I’ll add is that the device has to be powerful enough to run Linux.  Microcontrollers are about the only things in the world that do play the nanosecond accuracy game, so I’m much less confident against those.  But, against anything ARM or larger, real time operation is simply not a thing you get for free, and even when you pay dearly for it you’re still operating within tolerances far larger than DakaRand needs to mine a bit.  (Systems that are basically cycle-for-cycle emulators don’t count.  Put Bochs and your favorite ICE away.  Nice try though!)
  4. You seriously control the platform.  I’ve got no problem with you remotely spiking the CPU to 100%, sending arbitrary network traffic at whatever times you like, and so on.  The one constraint is that you can’t already have root — so, no physical access, and no injecting yourself into my gathering process.  It’s something of a special case if you’ve got non-root local code execution.  I’d be interested in such a break, but multitenancy is a lie and there’s just so many  interprocess leaks (like this if-it’s-so-obvious-why-didn’t-you-do-it example of cross-VM communication).
  5. Virtual machines get special rules:  You’re allowed to suspend/restore right up to the execution of DakaRand.  That is the point of atomicity.
  6. The code’s a bit hinky, what with globals and a horde of dependencies.  If you’d like to test on a platform that you just can’t get DakaRand to build on, that makes things more interesting, not less.  Email me.
  7. All data generated is mixed into the hash, but bits are “counted” when Von Neumann debiasing works.  Basically, generators return integers between 0 and 2^32-1.  Every integer is mixed into the keying hash (thus, you having to predict out.bin bit for bit).  However, each integer is also measured for the number of 1’s it contains.  An even number yields a 0; an odd number, a 1.  Bits are only counted when two sequential numbers have either a 10 or a 01, and as long as there’s less than 256 bits counted, the generator will continue to be called.  So your attack needs to model the absolute integers returned (which isn’t so bad) and the amount of generator calls it takes for a Von Neumann transition to occur and whether the transition is a 01 or a 10 (since I put that value into the hash too).
  8. I’ve got a default “gap” between generator probes of just 1000us — a millisecond.  This is probably not enough for all platforms — my assumption is that, if anything has to change, that this has to become somewhat dynamic.

Have fun!  Remember, “it might fail somehow somewhere” just got trumped by “it actually did fail all over the place”, so how about we investigate a thing or two that we’re not so sure in advance will actually work?

(Side note:  Couple other projects in this space:  Twuewand, from Ryan Finnie, has the chutzpah to be pure Perl.  And of course, Timer Entropyd, from Folkert Van Heusden.  Also, my recommendation to kernel developers is to do what I’m hearing they’re up to anyway, which is to monitor all the interrupts that hit the system on a nanosecond timescale.  Yep, that’s probably more than enough.)

Categories: Security
  1. August 15, 2012 at 11:15 pm

    Build error on Mac OS X 10.8 (Xcode 4.5-DP3). Also Macports tools installed on system in /opt/local

    dakarand.c: In function ‘measure_usleep_with_clock_monotonic’:
    dakarand.c:66: error: ‘CLOCK_MONOTONIC’ undeclared (first use in this function)
    dakarand.c:66: error: (Each undeclared identifier is reported only once
    dakarand.c:66: error: for each function it appears in.)
    dakarand.c: In function ‘measure_usleep_with_clock_realtime’:
    dakarand.c:78: error: ‘CLOCK_REALTIME’ undeclared (first use in this function)

  2. jpgoldberg
    August 16, 2012 at 2:10 am

    I’ve got patches that appear to work on OS X. I haven’t tested whether it breaks anything elsewhere.

    And these are ugly! On OS X, it is required to set up a handle to talk to the various clocks, so I did some awful stuff to try to make sure to only set up the handles once.

    https://www.dropbox.com/sh/lkhuja9kdsnj7mk/S3YKFLKJWW

    • jpgoldberg
      August 16, 2012 at 1:28 pm

      I had the monotonic and the realtime clocks backwards in the previous patch. As of a few seconds ago, the new patch gets it right (at least according to what I could find in what passes for Darwin documentation). I’ve cited my sources in the comments.

  3. lkafle
    August 16, 2012 at 3:41 am

    Reblogged this on lava kafle kathmandu nepal.

  4. Bill Stewart
    August 16, 2012 at 11:00 pm

    Virtual Machines are one of the really critical targets for RNGs these days, particularly because they don’t have actual spinning metal disks with chaotic airflow or other direct sources of hardware access like their own sound cards or clock chips. (In many cases they don’t always even know what time it is for a while, because they have to wait for ntp to run after they wake up.) Even mobile phone apps are starting to run on VMs these days.

    At some point there should be some standardized interface for the VM to access randomness feeds from the host, and a way to tell whether the host is actually providing it.

    • August 16, 2012 at 11:32 pm

      VM Hosts should be exposing a random device to their guests, even with the DakaRand approach.

      That being said, much to my surprise DakaRand seems to work just fine under a VM. It seems hypervisors are not providing nanosecond-scale real time operation.

      At least, that’s how things appear. I’m hoping people really try to attack this approach!

  5. anon
    August 19, 2012 at 3:00 am

    hey Bill, the standardized interface for entropy in virtual machines is virtio_rng. press all your VMM/hypervisor vendors to support it!

    • August 19, 2012 at 3:34 am

      Agreed. Even if Dakarand works, it should be run in an environment with a maximum number of timeslices (the host).

  6. noopfoop
    August 19, 2012 at 5:05 am

    Did you have a look at havaged? http://www.issihosts.com/haveged/
    Looks like they both could profit from each other – If DakaRrand is working.

  7. jpgoldberg
    August 19, 2012 at 6:11 pm

    One thing I haven’t done on OS X is check to see what clock resolution DakaRand is getting. I really know nothing about breaking RNGs, but the data I’m getting out has a very high number of 0x00 bytes in it. (This is just eyeballing it, no statistical tests at all). This seems odd to me because even if the clock resolution is bad, the hashing should at least mask that to some degree.

    So I suspect that there is something else wrong when this gets built on OS X. I’ve got a FreeBSD system around, and I’ll test on that, and I’ll get some Linux under a VM. But again, I can’t really be a serious Adversary because I just don’t have skills in this area.

    I also know little about chip development and deployment, but I wonder whether other architectures will follow Intel’s Ivy Bridge in putting an RNG in the chip.

    • August 19, 2012 at 8:21 pm

      Are you seeing the 0x00’s in -d out.bin output, or in the “real” output?

      I’m sure some chips will slowly get RNGs, but you know, the high end keeps getting higher. The low end never goes away.

      • jpgoldberg
        August 19, 2012 at 10:46 pm

        I’m seeing them in out.bin. It seems that I would confused about what should be the “real” output.

        Point noted about the low end chips.

        • August 19, 2012 at 10:51 pm

          Yes, it is expected that out.bin will have imperfect entropy. What’s the file size of out.bin, and then a gzipped out.bin.gz?

          out.bin is the output before masking. If you just encrypt 01, 02, 03, 04, 05… it’ll still be 100% entropic (appearing) after masking. So measuring/attacking post mask isn’t too helpful.

  8. August 19, 2012 at 6:51 pm

    Regarding the virtual machines that require entropy data: take a look at http://vanheusden.com/entropybroker/
    This program runs timer/audio/video_entropyd, EGD source or even “a real” RNG, etc on systems capable of them. They then send the data to a central point; the entropy broker which then distributes the data to systems that require entropy data.

    • August 19, 2012 at 8:20 pm

      Yes, you’re in the last paragraph of the post 🙂 I’m not saying I’m the first to build a clock drift entropy engine, just that it’s time the approach got serious attention / possible “on by default” status.

  9. August 20, 2012 at 11:28 pm

    I am the upstream for haveged, aka the Rodney Dangerfield of userspace RNGs. I would like to point out the the mechanism that haveged uses, processor flutter, is likely to be a better source of entropy than clock drift because processor execution is the cumulative result of the many independent hardware clocks inherent in a modern processor architecture. Think of all the asynchronous hardware processes that happen in a recovery from a branch predictor failure, a cache miss, or a page fault. In any real processor, you can observe these effects in userspace w/o a terribly accurate clock. Of course, this is all too complicated to explain by any simple model but the secret is to adopt a pragmatic approach, not sweat the details, and concentrate on the output. The latest haveged incorporates configurable init and/or run time testing to settle the question of whether this works or not. Haveged uses the AIS-31 test suite commonly used to certify smart cards. I certainly agree something like haveged should not be something you need to add to your system to achieve a secure system, but in the absence of that it is a practical solution to a common problem.

    • August 20, 2012 at 11:37 pm

      Ah yes! My apologies for missing Haveged in my research. It indeed looks quite nice. One concern I have is that while there is asynchrony inside a processor, it necessarily has to be kept under control (or more accurately, designers *are* actually taking into account nanosecond differentials and distributing a unified clock pulse). What are the core queries you’re executing/timing to get your bits?

      • August 21, 2012 at 12:03 am

        The core queries are are 8 32-bit clock reads during the generation of 8 32-bit words of output (i.e. the delta between clock readings is determined by how fast the algorithm executes). The process generating the output is constructed to fill the instruction and data level 1 caches and includes a sequence that exercises the branch predictors. The output uses an intermediate walk table (aka the exercise for the level 1 data cache) to XOR the clock readings with the contents of the intermediate table which is XORed with the the output value. A couple of linear-congruence type selectors determine which intermediate values are mixed with the output. This approximates the mixing of independent entropy sources, which is important because the actual entorpy measure of the clock source alone seems to be ~6bits/byte. I do not have any claim to the creation of the HAVEGE algorithm, but even after working with it for 3+years, it still seems like pure genius.

  10. August 20, 2012 at 11:36 pm

    Link should be http://www.issihosts.com

  11. September 2, 2012 at 3:06 pm

    Dan: I just released a new version of entropy broker. It has numerous fixes and enhancements.
    http://vanheusden.com/entropybroker/

  1. May 20, 2015 at 7:58 pm

Leave a comment