Home > Security > The Little MAC Attack

The Little MAC Attack

THIS IS NOT A BREAK OF HMAC.  THIS IS NOT A BREAK OF HMAC.  THIS IS NOT A BREAK OF HMAC.

That being said:

Let bz=blocksize(h), k=a[0:bz]^(0x36*bz): If h(a)==h(b) and a[0:bz]==b[0:bz], then hmac(k, a[bz:])==hmac(k, b[bz:])

Given a hash function with a collision and a key either known or controlled by an attacker, it’s trivially possible to generate a HMAC collision.  The slightly less quick and dirty steps are:

1) Start with a file at least 64 bytes long
2) Generate a collision that can append to that file.
3) XOR the first 64 bytes of your file with 0x36’s.  Make that your HMAC key.
4) Concatenate the rest of your file with your colliding blocks.  Make those your HMAC messages.
5) HMAC(key, msg1+anything) will equal HMAC(key, msg2+anything)

Here’s the quick demo:

>>> f
‘ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff”Fresh Prince Of Bel-Air (Theme Song)”\n\nNow, this is a story all about how\nMy life got flipped-turned upside down\nAnd I\’d like to take a minute\nJust sit right there\nI\’ll tell you how I became the prince of a town called Bel Air\n\nIn west Philadelphia born a’
>>> key=f[0:64]
>>> key
‘ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff’
>>> trans_36 = bytearray((x ^ 0x36) for x in range(256))
>>> key=key.translate(trans_36)
>>> key
‘PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP’
>>> msg=f[64:]
>>> collision0==collision1 # calculated w/ md5coll, thanks Stach!
False
>>> hmac.new(key, msg+collision0).hexdigest()
‘917e8efe9a5c692e50b86e9581c509f3’
>>> hmac.new(key, msg+collision1).hexdigest()
‘917e8efe9a5c692e50b86e9581c509f3’
>>> hmac.new(key, msg+collision0+”can also append arbitrary content”).hexdigest()
‘b8a879d7cc7a89ed59d0768fdfec3351’
>>> hmac.new(key, msg+collision1+”can also append arbitrary content”).hexdigest()
‘b8a879d7cc7a89ed59d0768fdfec3351’

How cool is that???

We might have different definitions of cool.  Particularly since — to aggressively not bury the lede — there really shouldn’t be any security impact here.  HMAC depends on a secret.  Obviously if the attacker knows the secret it’s not a secret!  And in what universe would it be HMAC’s responsibility to provide collision resistance for its contained hash?  So this is in no way a break of HMAC (at least in any sane use of the construction, though sadly sanity is not in fact universal).

And yet.  This is novel — I believe Little MAC the first applied “attack” against HMAC, in any form.  And importantly, it’s simple and elegant, something that can be explained.  And it’s fun!  Remember when we did things for fun?  So, let’s talk about this Little MAC attack and how it works.

====

We’ll start with the basics.  Hash functions exist to take fingerprints of data, like so:

$ python
 Python 2.7.8 (default, Jul 25 2014, 14:04:36)
 [GCC 4.8.3] on cygwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from md5 import md5
 >>> md5('1').hexdigest()  # hash of "1"
 'c4ca4238a0b923820dcc509a6f75849b'
 >>> md5('2').hexdigest() # hash of "2"
 'c81e728d9d4c2f636f067f89cc14862c'

The files can be of practically any size, like people can, but their fingerprints (or “hashes”) end up the same size (128 bits, or 32 hexadecimal characters for the MD5 hash).  It’s supposed to be unrealistic for anyone to make two files with the same hash.  This allows all sorts of useful security properties, like it being safe to retrieve a file over an insecure channel because you know the hash it’s supposed to end up with.

Supposed to.  Weasel words if there ever were.  In 2004, Xiaoyun Wang of Shandong University showed MD5 doing the thing it really wasn’t supposed to do (that, to be fair, Hans Dobbertin made clear was inevitable back in ’96):

>>> from md5 import md5
 >>> from binascii import unhexlify
 >>> a = """
 d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
 ... d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
 ... 55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
 ... d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
 ... e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
 ... """
 >>> b = """
 ... d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
 ... 55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
 ... d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
 ... e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
 ... """
 >>> a=unhexlify(a.replace(" ","").replace("\n","")) # translate from hex to binary
 >>> b=unhexlify(b.replace(" ","").replace("\n",""))
 >>> a==b
 False
 >>> md5(a).hexdigest()
 '79054025255fb1a26e4bc422aef54eb4'
 >>> md5(b).hexdigest()
 '79054025255fb1a26e4bc422aef54eb4'
 >>> md5(a).hexdigest()==md5(b).hexdigest()
 True

And that is how you get a standing ovation at Crypto 2004.  Over the next few years, various people (myself included) showed how to extend this result to compromise real world systems.  Probably the best work was by Stevens and Sotirov et al with their work actually compromising the processes of a real world Certificate Authority, acquiring extensive powers over the web using a very elegant MD5 attack.  And so, slowly but surely, industry has learned not to use MD5.

But what about HMAC-MD5?

===

HMAC — Hashed Message Authentication Codes — solve a slightly different problem than pure hashes.  Let’s say Bob is validating that some data matches some hash.  Why should he trust that hash?  Maybe it didn’t come from Alice.  Maybe it came from everybody’s favorite hacker, Mallory.  If only Alice and Bob had some secret they could use, to “mix in” with that hash. so that Alice had enough information to provide a “keyed hash” but Mallory did not.

There are lots of ways of doing this, many of which are entertainingly ill-advised.  One way that is not, is HMAC, generally considered the standard construction (method of putting cryptographic primitives together in a way that does something useful, hopefully securely) for keyed hashes.  The first thing to note about HMAC is, at least at first, it does not seem to suffer from the Wang collision:

>>> import hmac
 >>> hmac.new("",a).hexdigest()
 'ad0b4561611f06292377ea66c7867db5'
 >>> hmac.new("",b).hexdigest()
 'c5c472f9a5ecf134c93aa09169cb8756'

So, blank keys mixed with Wang’s ‘vectors’ do not collide.  This research comes out of some conversations with Marsh Ray, who commented that HMAC with known keys derives to MD5.  It’s not that easy :).  You have to jump through at least a few hoops, that require looking inside of HMAC itself.

So what is HMAC?

Basically, it’s double hashing with a key and some cleverness, optimized for speed (for example, the data isn’t hashed twice, which would be slow, instead the outer hash runs across the results of the inner hash.)  Since the attacker isn’t supposed to know the key, they can’t generate new keyed hashes that are valid.  (They can sure replay old keyed hashes, but you know, that’s some other layer’s problem.)

Mathematically, HMAC is hash(key XOR 5c5c5c… + hash(key XOR 363636… + msg), with the size of the key being the blocksize of the hash (specifically, how many bytes it operates on at a time, generally 64).  HMAC-MD5 means the hash is MD5, unsurprisingly.  HMAC represented in Python looks like:

trans_5C = bytearray((x ^ 0x5c) for x in range(256))
trans_36 = bytearray((x ^ 0x36) for x in range(256))
blocksize = md5().block_size # 64
def hmac_md5(key, msg):

   if len(key) > blocksize:
   key = md5(key).digest()
   key = key + bytearray(blocksize - len(key))
   o_key_pad = key.translate(trans_5C)
   i_key_pad = key.translate(trans_36)
   return md5(o_key_pad + md5(i_key_pad + msg).digest())

Now, I see this, and I go — oh!  There’s two hashes, and only one of them actually sees the full message!  If I can get the two inside hashes (two messages, two inside hashes) to return the same value — thus destroying any information about differences between the messages — who cares about the outer hash?

You know, nobody blogs about when they get something wrong.   That’s me, being totally wrong.

MD5 collisions actually depend on initial conditions.  Even if MD5(a)==MD5(b), MD5(x+a) != MD5(x+b).  (!= is nerd for Not Equal.)  And even with a blank HMAC key, x starts out 64 bytes of 0’s, XOR’ed with 36, leaving a first MD5 block of 36363636…

Not what Wang’s collision expected to deal with — it thinks it’s going to be at the beginning of MD5 processing, not one block in.  So I think, ah!  I know!  MD5 wants to get the Wang vectors unmodified.  It doesn’t care if the bytes come in via one blob (a) or two blobs (x+a) or whatever, it’s just a stream of bytes to MD5.  So let’s split those vectors into the “key” component and the “msg” component — 64 bytes, and whatever’s left.  Now, of course HMAC is going to XOR that first 64 bytes with 36’s, but you know, XOR is reversible.  We could just XOR them first, and then HMAC will undo the damage, like this:

>>> 1 ^ 36
 37
 >>> 2 ^ 36
 38
 >>> 3 ^ 36
 39
 >>> 37 ^ 36
 1
 >>> 38 ^ 36
 2
 >>> 39 ^ 36
 3

This should at least take care of the Inner hashing, and indeed, it does:

>>> akey=a[0:64] # split the first 64 bytes...
 >>> bkey=b[0:64]
 >>> amsg=a[64:] # from the rest...
 >>> bmsg=b[64:]
 >>> akey_xored=akey.translate(trans_36) # 'damage' the keys
 >>> bkey_xored=bkey.translate(trans_36)
 >>> md5(akey_xored.translate(trans_36) + amsg).hexdigest() # let HMAC reverse the damage
 '79054025255fb1a26e4bc422aef54eb4'
 >>> md5(bkey_xored.translate(trans_36) + bmsg).hexdigest()
 '79054025255fb1a26e4bc422aef54eb4'

Huzzah!  We’ve gotten collisions in the inner hash of HMAC-MD5, using nothing but decade old test vectors.  So, we’re done, right?

>>> hmac_md5(akey_xored, amsg).hexdigest()
 'd9a57219479c530d0ae6d1a699a8e8b0'
 >>> hmac_md5(bkey_xored, bmsg).hexdigest()
 '87be91e4dfe3538589b965a912872432'

It…doesn’t work.  HMAC knows we’re up to something.  Well, of course.  While HMAC doesn’t run over your data twice, it sure does run over your keys twice, with two different XORs even.  Remember, those keys come from the first 64 bytes of two files that are not identical.  So HMAC sees the different data in the inner hash (where it’s compensated for), and in the outer hash too (where it’s not — can’t XOR defend yourself against both 36 and 5C, and this isn’t an accident).

Now, could we simultaneously generate a collision dealing with both 36 and 5C XOR masks?

Maybe?  It’s possible, but we certainly have no idea how to do it right now.  Such attacks are known as Related Key attacks, and they’re pretty rare.  There’s no established research I’m aware of here, in the context of MD5.  So I guess HMAC wins?

Ha, no.  We just need to be a little more creative.

===

Funny crypto story.  When Xiaoyun Wang first announced her collisions to the world, they actually didn’t work, and she took a bit of heat.  “Oh”, she apologized.  “I misread a few of the numbers in the MD5 specification.  Here’s the correct collisions.”

That was a few hours later, thus demonstrating her attack could be executed in hours and not, say, years.

>>> Xiaoyun_Wang=="Baller"
 True
 >>> Xiaoyun_Wang_Wishes_She_Was_A_Little_Bit_Taller
 False

Let’s talk about exactly how MD5 actually works, and what it means to generate two files with the same hash.  MD5 uses what’s called a a Merkle-Damgard construction, meaning it:

1) Starts with some initial values, A, B, C, D
2) Takes 64 bytes from a message
3) Uses those 64 bytes to shuffle those values in various interesting ways
4) End up with a new A, B, C, D
5) Go to Step 2 until a) Message is complete and b) 8 extra bytes have been included describing how many bytes were hashed (“MD-Hardening”)
6) Return A, B, C, and D as a series of bytes representing the hash of the data

There are two interesting elements from this design.  First, every 64 bytes, there’s a new A, B, C, and D, and they’re supposed to represent whatever might have been different in what came before.  If there’s ever a collision — if bytes 0 through 127 collide with some other bytes 0 through 127 — anything tacked on will have the information about the difference destroyed.  So, you can have something like:

>>> md5(a).hexdigest()==md5(b).hexdigest()
 True
 >>> ping = "// " >>> md5(a+ping).hexdigest()==md5(b+ping).hexdigest() True

More interestingly, and more critically, collisions don’t have to start from the first block.  Wang’s corrected ones did.  They came from these “Initialization Vectors”, as extracted from this Pure Python implementation:

# Load magic initialization constants.
 self.A = 0x67452301L
 self.B = 0xefcdab89L
 self.C = 0x98badcfeL
 self.D = 0x10325476L

But as you remember, she had just as easy a time colliding against basically completely incorrect vectors too.  The colliders don’t really care, they just need to know what A, B, C, and D to start with.  Since those values are changed by whatever blocks you want, you can start your collision at any block you want.

So that’s the key to getting collisions in HMAC:  Yeah, it can detect differences in the first block.  So don’t collide in the first block.  Collide in the second, or wherever you want.  How do you do that?  Glad you asked!

First, you need some data to make a collision with.  I’m feeling fresh.

>>> f
‘ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff”Fresh Prince Of Bel-Air (Theme Song)”\n\nNow, this is a story all about how\nMy life got flipped-turned upside down\nAnd I\’d like to take a minute\nJust sit right there\nI\’ll tell you how I became the prince of a town called Bel Air\n\nIn west Philadelphia born a’

Second, we’ve slightly modified that Pure Python MD5 library to emit intermediate states, with this line:

print int(A), int(B), int(C), int(D)

Now, when we use puremd5, we see:

>>> import puremd5
 >>> puremd5.md5(f).hexdigest()
1245699185 1715851451 2228545190 3605688132
3587913948 1041697412 534724617 3149578532
2537882128 3367935924 570866200 4260972837
4196602073 370791185 3701485921 977618867
2646787607 1303232542 1363560139 1938082240
2027711712 2342103113 1309181583 4220729378
'e06cdc7849a8998b8f86084e223893fb'

Funny thing.  That’s a 320 byte file, with a 64 bit block size, but there’s six intermediate states.  What gives?  Well, again, MD5 needs another 8 bytes to express that it’s hashed 320 bytes.  So we get a sixth round, consisting exclusively of the contribution from this length encoding.

We’re going to skip that round, and take the intermediate values after our actual Fresh Prince lyrics — so, round five, where A == 2646787607.  What we want is two data sets that, when appended to the data that generates these intermediate values, collides.  To do this, we’re going to use Patrick Stach’s md5coll, like so:

$ ./md5coll.exe 2646787607 1303232542 1363560139 1938082240
block #1 done
block #2 done
unsigned int m0[32] = {
0x7f774ef0, 0xd082709d, 0x0c2e2dc4, 0xad45fdb8, 
0x60245a90, 0xa45e0fe4, 0x9709a7ba, 0xd9f19db7, 
0x0533ed55, 0xbff4a606, 0x7dcec2c3, 0x4ff04d6c, 
0x2e27c907, 0x66dac8f3, 0x8ab60422, 0xbff4808d, 
0xa31a0e9c, 0x9231c038, 0x0bf22863, 0x98847dd4, 
0x1678780b, 0x1cc891c5, 0x466a894a, 0x0076194a, 
0xa336e005, 0x87b55e66, 0xa4265f42, 0x560ee046, 
0x36636c97, 0x2a5fc139, 0x90b895d1, 0x38d03d76, 
};

unsigned int m1[32] = {
0x7f774ef0, 0xd082709d, 0x0c2e2dc4, 0xad45fdb8, 
0xe0245a90, 0xa45e0fe4, 0x9709a7ba, 0xd9f19db7, 
0x0533ed55, 0xbff4a606, 0x7dcec2c3, 0x4ff0cd6c, 
0x2e27c907, 0x66dac8f3, 0x0ab60422, 0xbff4808d, 
0xa31a0e9c, 0x9231c038, 0x0bf22863, 0x98847dd4, 
0x9678780b, 0x1cc891c5, 0x466a894a, 0x0076194a, 
0xa336e005, 0x87b55e66, 0xa4265f42, 0x560e6046, 
0x36636c97, 0x2a5fc139, 0x10b895d1, 0x38d03d76, 
};

In simple terms, if we put these bytes after our Fresh Prince lyrics, md5coll says the results will collide.  Let’s see this in md5 first:

>>> m0 = (0x7f774ef0, 0xd082709d, 0x0c2e2dc4, 0xad45fdb8,
... 0x60245a90, 0xa45e0fe4, 0x9709a7ba, 0xd9f19db7,
... 0x0533ed55, 0xbff4a606, 0x7dcec2c3, 0x4ff04d6c,
... 0x2e27c907, 0x66dac8f3, 0x8ab60422, 0xbff4808d,
... 0xa31a0e9c, 0x9231c038, 0x0bf22863, 0x98847dd4,
... 0x1678780b, 0x1cc891c5, 0x466a894a, 0x0076194a,
... 0xa336e005, 0x87b55e66, 0xa4265f42, 0x560ee046,
... 0x36636c97, 0x2a5fc139, 0x90b895d1, 0x38d03d76, )
>>> m1 = (0x7f774ef0, 0xd082709d, 0x0c2e2dc4, 0xad45fdb8,
... 0xe0245a90, 0xa45e0fe4, 0x9709a7ba, 0xd9f19db7,
... 0x0533ed55, 0xbff4a606, 0x7dcec2c3, 0x4ff0cd6c,
... 0x2e27c907, 0x66dac8f3, 0x0ab60422, 0xbff4808d,
... 0xa31a0e9c, 0x9231c038, 0x0bf22863, 0x98847dd4,
... 0x9678780b, 0x1cc891c5, 0x466a894a, 0x0076194a,
... 0xa336e005, 0x87b55e66, 0xa4265f42, 0x560e6046,
... 0x36636c97, 0x2a5fc139, 0x10b895d1, 0x38d03d76, )
>>> packed_m0 = struct.pack("<32I", *m0) >>> packed_m1 = struct.pack("<32I", *m1) >>> import md5
>>> md5.md5(f+packed_m0).hexdigest()
'ddebd8f9209fc7311687d7f9ff61a869'
>>> md5.md5(f+packed_m1).hexdigest()
'ddebd8f9209fc7311687d7f9ff61a869'

Does this work for HMAC?

>>> import hmac
>>> f0 = f+packed_m0
>>> f1 = f+packed_m1
>>> hmac.new(f0[0:64], f0[64:]).hexdigest()
'd372df68e47fd619089c2c795144e572'
>>> hmac.new(f1[0:64], f1[64:]).hexdigest()
'4f94497f947e424e378d30853bd6e421'

No, not yet.  Still need to compensate for the inside hash, and split our colliding message across a key (first 64 bytes) and a message (the rest).

>>> trans_36 = bytearray((x ^ 0x36) for x in range(256))
>>> f0key.translate(trans_36)
'PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP'
>>> f0key=f0key.translate(trans_36)
>>> f1key=f1key.translate(trans_36)
>>> f0key==f1key
True
>>> f0msg=f0[64:]
>>> f1msg=f0[64:]
>>> hmac.new(f0key, f0msg).hexdigest()
'917e8efe9a5c692e50b86e9581c509f3'
>>> hmac.new(f1key, f1msg).hexdigest()
'917e8efe9a5c692e50b86e9581c509f3'

Tadah!  Inside hash is mollified by XOR’ing with 36’s, and outside hash is happy because actually those two keys are identical.  The bytes that are different, yet still collide as per MD5, are only visible to the inside hash.  By the time the outside hash gets around to running, it’s already too late.  And of course we can in fact append arbitrary content, as such:

>>> hmac.new(f0key, f0msg+"hello world").hexdigest()
'7bad230645867d9802829db78289878c'
>>> hmac.new(f1key, f1msg+"hello world").hexdigest()
'7bad230645867d9802829db78289878c'

Presumably Steven’s HashClash / Chosen Prefix attacks will work just as well inside HMAC. And since this is a generic chosen key attack (there are no constraints on your key), it’s also a known key attack — choose the key you know but don’t control. Take the known key, XOR with 0x36’s, make that the prefix to your message, generate your collision. Simple.

Hope you enjoyed!  This almost certainly doesn’t have any security impact, but I’m happy(ish) to be proved wrong.

Categories: Security
  1. David Leon Gil
    May 9, 2015 at 7:13 pm

    References:

    “Another Look at Security Theorems for 1-Key Nested MACs.” https://eprint.iacr.org/2013/248.pdf

    “Another look at HMAC.” https://eprint.iacr.org/2012/074.pdf

    • May 10, 2015 at 12:45 am

      Interesting papers, and they’re getting there, but they didn’t get to the O(1) conversion of hash collision to HMAC collision. Worth referencing though, certainly!

  2. pwnhome
    May 11, 2015 at 11:51 am

    This would seem to have implications when HMAC-MD5 is used as a commitment scheme, no?

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: