DakaRand 1.0: Revisiting Clock Drift For Entropy Generation

Home > Security > DakaRand 1.0: Revisiting Clock Drift For Entropy Generation

DakaRand 1.0: Revisiting Clock Drift For Entropy Generation

August 15, 2012 Dan Kaminsky Leave a comment Go to comments

“The generation of random numbers is too important to be left to chance.”
—Robert R. Coveyou

“One out of 200 RSA keys in the field were badly generated as a result of standard dogma. There’s a chance this might fail less.”
–Me

[Note: There are times I write things with CIO’s in mind. This is not one of those times.]

So, I’ve been playing with userspace random number generation, as per Matt Blaze and D.P. Mitchell’s TrueRand from 1996. (Important: Matt Blaze has essentially disowned this approach, and seems to be honestly horrified that I’m revisiting it.) The basic concept is that any system with two clocks has a hardware number generator, since clocks jitter relative to one another based on physical properties, particularly when one is operating on a slow scale (like, say, a human hitting a keyboard) while another is operating on a fast scale (like a CPU counter cycling at nanosecond speeds). Different tolerances on clocks mean more opportunities for unmodelable noise to enter the system. And since the core lie of your computer is that it’s just one computer, as opposed to a small network of independent nodes running on their own time, there should be no shortage of bits to mine.

At least, that’s the theory.

As announced at Defcon 20 / Black Hat, here’s DakaRand 1.0. Let me be the first to say, I don’t know that this works. Let me also be the first to say, I don’t know that it doesn’t. DakaRand is a collection of modes that tries to convert the difference between clocks into enough entropy that, whether or not it survives academic attack, would certainly force me (as an actual guy who breaks stuff) to go attack something else.

A proper post on DakaRand is reserved, I think, for when we have some idea that it actually works. Details can be seen in the slides for the aforementioned talk; what I’d like to focus on now is recommendations for trying to break this code. The short version:

1) Download DakaRand, untar, and run “sh build.sh”.
2) Run dakarand -v -d out.bin -m [0-8]
3) Predict out.bin, bit for bit, in less than 2^128 work effort, on practically any platform you desire with almost any level of active manipulation you wish to insert.

The slightly longer version:

DakaRand essentially tries to force the attacker into having no better attack than brute force, and then tries to make that work effort at least 2^128. As such, the code is split into generators that acquire bits, and then a masking sequence of SHA-256, Scrypt, and AES-256-CTR that expands those bits into however much is requested. (In the wake of Argyros and Kiayias’s excellent and underreported “I Forgot Your Password: Randomness Attacks Against PHP Applications“, I think it’s time to deprecate all RNG’s with invertable output. At the point you’re asking whether an RNG should be predictable based on its history, you’ve already lost.) The upshot of this is that the actual target for a break is not the direct output of DakaRand, but the input to the masking sequence. Your goal is to show that you can predict this particular stream, with perfect accuracy, at less than 2^128 work effort. Unless you think you can glean interesting information from the masking sequence (in which case, you have more interesting things to attack than my RNG), you’re stuck trying to design a model of the underlying clock jitter.
There are nine generators in this initial release of DakaRand. Seriously, they can’t all work.
You control the platform. Seriously — embedded, desktop, server, VM, whatever — it’s fair game. About the only constraint that I’ll add is that the device has to be powerful enough to run Linux. Microcontrollers are about the only things in the world that do play the nanosecond accuracy game, so I’m much less confident against those. But, against anything ARM or larger, real time operation is simply not a thing you get for free, and even when you pay dearly for it you’re still operating within tolerances far larger than DakaRand needs to mine a bit. (Systems that are basically cycle-for-cycle emulators don’t count. Put Bochs and your favorite ICE away. Nice try though!)
You seriously control the platform. I’ve got no problem with you remotely spiking the CPU to 100%, sending arbitrary network traffic at whatever times you like, and so on. The one constraint is that you can’t already have root — so, no physical access, and no injecting yourself into my gathering process. It’s something of a special case if you’ve got non-root local code execution. I’d be interested in such a break, but multitenancy is a lie and there’s just so many interprocess leaks (like this if-it’s-so-obvious-why-didn’t-you-do-it example of cross-VM communication).
Virtual machines get special rules: You’re allowed to suspend/restore right up to the execution of DakaRand. That is the point of atomicity.
The code’s a bit hinky, what with globals and a horde of dependencies. If you’d like to test on a platform that you just can’t get DakaRand to build on, that makes things more interesting, not less. Email me.
All data generated is mixed into the hash, but bits are “counted” when Von Neumann debiasing works. Basically, generators return integers between 0 and 2^32-1. Every integer is mixed into the keying hash (thus, you having to predict out.bin bit for bit). However, each integer is also measured for the number of 1’s it contains. An even number yields a 0; an odd number, a 1. Bits are only counted when two sequential numbers have either a 10 or a 01, and as long as there’s less than 256 bits counted, the generator will continue to be called. So your attack needs to model the absolute integers returned (which isn’t so bad) and the amount of generator calls it takes for a Von Neumann transition to occur and whether the transition is a 01 or a 10 (since I put that value into the hash too).
I’ve got a default “gap” between generator probes of just 1000us — a millisecond. This is probably not enough for all platforms — my assumption is that, if anything has to change, that this has to become somewhat dynamic.

Have fun! Remember, “it might fail somehow somewhere” just got trumped by “it actually did fail all over the place”, so how about we investigate a thing or two that we’re not so sure in advance will actually work?

(Side note: Couple other projects in this space: Twuewand, from Ryan Finnie, has the chutzpah to be pure Perl. And of course, Timer Entropyd, from Folkert Van Heusden. Also, my recommendation to kernel developers is to do what I’m hearing they’re up to anyway, which is to monitor all the interrupts that hit the system on a nanosecond timescale. Yep, that’s probably more than enough.)

Categories: Security

Comments (21) Trackbacks (1) Leave a comment Trackback

jpgoldberg

August 15, 2012 at 11:15 pm

Reply

Build error on Mac OS X 10.8 (Xcode 4.5-DP3). Also Macports tools installed on system in /opt/local

dakarand.c: In function ‘measure_usleep_with_clock_monotonic’:
dakarand.c:66: error: ‘CLOCK_MONOTONIC’ undeclared (first use in this function)
dakarand.c:66: error: (Each undeclared identifier is reported only once
dakarand.c:66: error: for each function it appears in.)
dakarand.c: In function ‘measure_usleep_with_clock_realtime’:
dakarand.c:78: error: ‘CLOCK_REALTIME’ undeclared (first use in this function)
- Dan Kaminsky
  
  August 15, 2012 at 11:53 pm
  
  Reply
  
  Interesting. Apparently there’s enough information to make things work on OSX here:
  
  http://stackoverflow.com/questions/5167269/clock-gettime-alternative-in-mac-os-x
  
  I don’t have an OSX box handy. A patch would:
  
  1) Disable mode 4, and make mode 5 default on OSX
  2) Turn CLOCK_REALTIME calls into a macro that is defined by whether the code is being compiled on OSX or not
jpgoldberg

August 16, 2012 at 2:10 am

Reply

I’ve got patches that appear to work on OS X. I haven’t tested whether it breaks anything elsewhere.

And these are ugly! On OS X, it is required to set up a handle to talk to the various clocks, so I did some awful stuff to try to make sure to only set up the handles once.

https://www.dropbox.com/sh/lkhuja9kdsnj7mk/S3YKFLKJWW
- jpgoldberg
  
  August 16, 2012 at 1:28 pm
  
  Reply
  
  I had the monotonic and the realtime clocks backwards in the previous patch. As of a few seconds ago, the new patch gets it right (at least according to what I could find in what passes for Darwin documentation). I’ve cited my sources in the comments.
lkafle

August 16, 2012 at 3:41 am

Reply

Reblogged this on lava kafle kathmandu nepal.
Bill Stewart

August 16, 2012 at 11:00 pm

Reply

Virtual Machines are one of the really critical targets for RNGs these days, particularly because they don’t have actual spinning metal disks with chaotic airflow or other direct sources of hardware access like their own sound cards or clock chips. (In many cases they don’t always even know what time it is for a while, because they have to wait for ntp to run after they wake up.) Even mobile phone apps are starting to run on VMs these days.

At some point there should be some standardized interface for the VM to access randomness feeds from the host, and a way to tell whether the host is actually providing it.
- Dan Kaminsky
  
  August 16, 2012 at 11:32 pm
  
  Reply
  
  VM Hosts should be exposing a random device to their guests, even with the DakaRand approach.
  
  That being said, much to my surprise DakaRand seems to work just fine under a VM. It seems hypervisors are not providing nanosecond-scale real time operation.
  
  At least, that’s how things appear. I’m hoping people really try to attack this approach!
anon

August 19, 2012 at 3:00 am

Reply

hey Bill, the standardized interface for entropy in virtual machines is virtio_rng. press all your VMM/hypervisor vendors to support it!
- Dan Kaminsky
  
  August 19, 2012 at 3:34 am
  
  Reply
  
  Agreed. Even if Dakarand works, it should be run in an environment with a maximum number of timeslices (the host).
noopfoop

August 19, 2012 at 5:05 am

Reply

Did you have a look at havaged? http://www.issihosts.com/haveged/
Looks like they both could profit from each other – If DakaRrand is working.
jpgoldberg

August 19, 2012 at 6:11 pm

Reply

One thing I haven’t done on OS X is check to see what clock resolution DakaRand is getting. I really know nothing about breaking RNGs, but the data I’m getting out has a very high number of 0x00 bytes in it. (This is just eyeballing it, no statistical tests at all). This seems odd to me because even if the clock resolution is bad, the hashing should at least mask that to some degree.

So I suspect that there is something else wrong when this gets built on OS X. I’ve got a FreeBSD system around, and I’ll test on that, and I’ll get some Linux under a VM. But again, I can’t really be a serious Adversary because I just don’t have skills in this area.

I also know little about chip development and deployment, but I wonder whether other architectures will follow Intel’s Ivy Bridge in putting an RNG in the chip.
- Dan Kaminsky
  
  August 19, 2012 at 8:21 pm
  
  Reply
  
  Are you seeing the 0x00’s in -d out.bin output, or in the “real” output?
  
  I’m sure some chips will slowly get RNGs, but you know, the high end keeps getting higher. The low end never goes away.
  - jpgoldberg
    
    August 19, 2012 at 10:46 pm
    
    Reply
    
    I’m seeing them in out.bin. It seems that I would confused about what should be the “real” output.
    
    Point noted about the low end chips.
    - Dan Kaminsky
      
      August 19, 2012 at 10:51 pm
      
      Reply
      
      Yes, it is expected that out.bin will have imperfect entropy. What’s the file size of out.bin, and then a gzipped out.bin.gz?
      
      out.bin is the output before masking. If you just encrypt 01, 02, 03, 04, 05… it’ll still be 100% entropic (appearing) after masking. So measuring/attacking post mask isn’t too helpful.
Folkert Van Heusden

August 19, 2012 at 6:51 pm

Reply

Regarding the virtual machines that require entropy data: take a look at http://vanheusden.com/entropybroker/
This program runs timer/audio/video_entropyd, EGD source or even “a real” RNG, etc on systems capable of them. They then send the data to a central point; the entropy broker which then distributes the data to systems that require entropy data.
- Dan Kaminsky
  
  August 19, 2012 at 8:20 pm
  
  Reply
  
  Yes, you’re in the last paragraph of the post 🙂 I’m not saying I’m the first to build a clock drift entropy engine, just that it’s time the approach got serious attention / possible “on by default” status.
GWuertz

August 20, 2012 at 11:28 pm

Reply

I am the upstream for haveged, aka the Rodney Dangerfield of userspace RNGs. I would like to point out the the mechanism that haveged uses, processor flutter, is likely to be a better source of entropy than clock drift because processor execution is the cumulative result of the many independent hardware clocks inherent in a modern processor architecture. Think of all the asynchronous hardware processes that happen in a recovery from a branch predictor failure, a cache miss, or a page fault. In any real processor, you can observe these effects in userspace w/o a terribly accurate clock. Of course, this is all too complicated to explain by any simple model but the secret is to adopt a pragmatic approach, not sweat the details, and concentrate on the output. The latest haveged incorporates configurable init and/or run time testing to settle the question of whether this works or not. Haveged uses the AIS-31 test suite commonly used to certify smart cards. I certainly agree something like haveged should not be something you need to add to your system to achieve a secure system, but in the absence of that it is a practical solution to a common problem.
- Dan Kaminsky
  
  August 20, 2012 at 11:37 pm
  
  Reply
  
  Ah yes! My apologies for missing Haveged in my research. It indeed looks quite nice. One concern I have is that while there is asynchrony inside a processor, it necessarily has to be kept under control (or more accurately, designers *are* actually taking into account nanosecond differentials and distributing a unified clock pulse). What are the core queries you’re executing/timing to get your bits?
  - GWuertz
    
    August 21, 2012 at 12:03 am
    
    Reply
    
    The core queries are are 8 32-bit clock reads during the generation of 8 32-bit words of output (i.e. the delta between clock readings is determined by how fast the algorithm executes). The process generating the output is constructed to fill the instruction and data level 1 caches and includes a sequence that exercises the branch predictors. The output uses an intermediate walk table (aka the exercise for the level 1 data cache) to XOR the clock readings with the contents of the intermediate table which is XORed with the the output value. A couple of linear-congruence type selectors determine which intermediate values are mixed with the output. This approximates the mixing of independent entropy sources, which is important because the actual entorpy measure of the clock source alone seems to be ~6bits/byte. I do not have any claim to the creation of the HAVEGE algorithm, but even after working with it for 3+years, it still seems like pure genius.
GWuertz

August 20, 2012 at 11:36 pm

Reply

Link should be http://www.issihosts.com
Folkert Van Heusden

September 2, 2012 at 3:06 pm

Reply

Dan: I just released a new version of entropy broker. It has numerous fixes and enhancements.
http://vanheusden.com/entropybroker/

May 20, 2015 at 7:58 pm

Offline RSA strong prime test similar to Phuctor? | DL-UAT

Security Talks

2014

Yet Another Dan Kaminsky Talk: Hard Drive Operating Systems, Storage XOR Execution, Secure Random By Default, Cryptomnemonics, Ending Use After Free in Browsers, Fast Spoofed DDoS Tracing, NSA Crypto Fallout
Slides
2012

Black Ops: Practical System-Wide Timing Attack Defense, Real World Entropy Generation For Devices, Safe String Interpolation, Image Loads For Censorship Detection, Certificate Extraction w/ Flash Sockets, Stateless TCP Sockets
Slides
2011

Black Ops of TCP/IP 2011: Bitcoin Cloud Deanon/Data Embedding, External Interface UPNP, TCP SEQ# Attacks Revisted, Generic Password to Asymmetric Key Generation, Net Neutrality Validation
Slides
2010

Introducing The Domain Key Infrastructure:
Zero Configuration DNSSEC Serving, End-To-End Client Integration w/ UI Via OpenSSL and Secure Proxies, Federated OpenSSH, DNS over HTTP/X.509, Self-Securing URLs, Secure Scalable Email (Finally!)
Slides
Code (Phreebird Suite)
Black Hat USA Slides

Interpolique:
Where's The Safety in Type Safety?, Preventing Injection Attacks (XSS/SQL) With String Safety, Why Ease Of Use Matters, Automatic Query Parameterization, How LISP Was Right About Dynamic Scope, Dynamic DOM Manipulation For Secure Integration of Untrusted HTML
Slides Audio
Code

Realism in Web Defense:
Why Security Fails, What's Wrong With Session Management On The Web, The Failure Of Referrer Checking, Interpreter Suicide, Towards a Real Session Context, Treelocking, The Beginnings of Interpolique
Slides
2009

Staring Into The Abyss:
Middleware Fingerprinting, Firewall Rule Bypass, Internal Address Disclosure, Same Origin Attacks Against Proxied Hosts, TCP NAT2NAT via Active FTP And TCP Spoofing
Slides Paper

Black Ops Of PKI:
Structural Weaknesses of X.509, Architectural Advantages of DNSSEC, ASN.1 Confusion, Null Terminator Attacks Against Certificates
Slides Video
Financial Cryptography Paper
2008

It's The End Of The Cache As We Know It:
DNS Server+Client Cache Poisoning, Issues with SSL, Breaking “Forgot My Password” Systems, Attacking Autoupdaters and Unhardened Parsers, Rerouting Internal Traffic
Black Hat Slides
BH Fed Slides (Adds Drupal, DNSSEC)
Video Audio
"Illustrated Guide To The Kaminsky Bug"
Sarah on DNS

Ad Injection Gone Wild:
Subdomain NXDOMAIN injection for Universal Cross Site Scripting
Slides
2007

Design Reviewing The Web:
DNS Rebinding, VPN to the Browser, Provider Hostility Detection, Audio CAPTCHA Analysis
Slides Video
2006

Pattern Recognition:
Net Neutrality Violation Detection, Large Scale SSL Scanning, Securing Online Banking, Cryptomnemonics, Context Free Grammar Fuzzing, Security Dotplots
Slides
Weaponizing Noam Chomsky, or Hacking with Pattern Languages:
The Nymic Domain, XML Trees For Automatically Extracted Grammar, Syntax Highlighting for Compression Depth, Live Discovered Grammar Rendering, "CFG9000" Context Free Grammar Fuzzer, Dotplots for Format Identification and Fuzzer Guidance, Tilt Shift Dotplots, Visual Bindiff
Slides Video Code
2005:

Black Ops of TCP/IP 2005.5:
Worldwide DNS Scans, Temporal IDS Evasion, the Sony Rootkit, MD5 Conflation of Web Pages
Slides Video
2004:

MD5 To Be Considered Harmful Someday:
Applied Attacks Against Simple Collisions Via Malicious Appendage, Executable Confusion, Auditor Bypass, Bit Commitment Shirking, HMAC Implications, Collision Steganography, P2P Attacks Against Kazaa Hash
Slides Paper
Code (Confoo)
Code (Stripwire)

Black Ops of DNS:
Tunneling Audio, Video, and SSH over DNS
Slides Audio
Code (OzymanDNS 0.1)
Code (OzymanDNS 0.1 for Windows)
2003:

Stack Black Ops:
Generic ActiveX, SQL for Large Network Scans, Bandwidth Brokering, SSL for IDS’s
Slides Audio
Code (Paketto Keiretsu 2.00pre5)
2002:

Black Ops of TCP/IP:
High Speed Scanning, Parasitic Traceroute, TCP NAT2NAT
Slides Audio 1 Audio 2
Code (Paketto Keiretsu 1.01)
2001:

Gateway Cryptography:
SSH Dynamic Forwarding, Securing Meet-In-The-Middle, PPTP over SSH
Slides Audio
SSH Cheat Sheet

Dan Kaminsky's Blog