How The WPS Bug Came To Be, And How Ugly It Actually Is
FINDING: WPS was designed to be secure against a malicious access point. It isn’t: A malicious AP recovers the entire PIN after only two protocol exchanges. This server to client attack is substantially easier than the client to server attack, with the important caveat that the latter does not require a client to be “lured”.
SUMMARY: Stefan Viehböck’s WPS vuln was one of the bigger flaws found in all of 2011, and should be understood as the result of an attempt to improve the usability of a security technology in the face of severe and ongoing deficiencies in our ability to handle key management (something we’re fixing with DNSSEC). WPS seems to have started as a mechanism to integrate UI-less consumer electronics into the Wi-Fi fold; it grew to be a primary mechanism for PCs to link up. WPS attempted to use an online “proof of possession” process to mutually authenticate and securely configure clients for APs, including the provisioning of WEP/WPA/WPA2 secrets. In what appears to have been a mechanism to prevent malicious “evil twin” APs from extracting PINs from rerouted clients, the already small PIN was split in half. Stefan noted that this made brute force efforts much easier, as the first half could be guessed independently of the second half. While rate limiting at the AP server is an obvious defense, it turns out to be the only defense, as both not continuing the protocol, and providing a fake message, are trivially detectable signs that the half-PIN was guessed incorrectly. Meanwhile, providing the legitimate response even in the face of an incorrect PIN guess actually leaks enough data for the attacker to brute force the PIN. I further note that, even for the purposes of mutual authentication, the split-PIN is problematic. Servers can immediately offline brute force the first half of the PIN from the M4 message, and after resetting the protocol, they can generate the appropriate messages to allow them to offline brute force the second half of the PIN. To conclude, I discuss the vagaries of just how much needs to be patched, and thus how long we need to wait, before this vulnerability can actually be addressed server side.
Finally, I note that we can directly trace the existence of this vulnerability to the unclear patent state of Secure Remote Passwords (SRP), which really is the canonical way to do Password Authenticated Key Exchange (PAKE).
Note: I really should be writing about DNSSEC right now. What’s interesting is, in a very weird way, I actually am.
Security is not usually a field with good news. Generally, all our discussions center around the latest compromise, the newest vulnerabilities, or even the old bugs that stubbornly refuse to disappear despite our “best practices”. I’m somewhat strange, in that I’m the ornery optimist that’s been convinced we can, must, and will (three different things) fix this broken and critical system we call the Internet.
That being said — fixing the Net isn’t going to be easy, and it’s important people understand why. And so, rather than jump right into DNSSEC, I’m going to write now about what it looks like when even an optimist gets backed into a corner.
Let’s talk about what I think has to be considered one of the biggest finds of 2011, if not the biggest: Stefan Viehböck’s discovery of fundamental weaknesses in Wi-Fi Protected Setup, or WPS. Largest number of affected servers, largest number of affected clients, out of some of the most battle-hardened engineers in all of technology. Also, hardest to patch bug in recent memory. Maybe ever. (Craig Heffner also found the bug, as did some uncountable number of unnamed parties over the years. None of us are likely to be the first to find a bug…we just have the choice to be the last. Craig did release the nicely polished Reaver for us though, which was greatly appreciated, and so I’m happy to cite his independent discovery.)
It wasn’t supposed to be this way. If any industry group had suffered the slings and arrows of the security community, it was the Wi-Fi Alliance. WEP was essentially a live-fire testing range for all the possible things that could go wrong when using RC4 as a stream cipher. It took a while, but we eventually ended up with the fairly reasonable WPA2 standard. Given either a password, or a certificate, WPA2 grants you a fairly solid encrypted link to a network.
That’s quite the given — for right there, staring us in the face once more, was the key management problem.
(Yes, the very same problem we ultimately require DNSSEC to solve. What follows is what happens when engineers don’t have the right tools to fix something, but try anyway. Please forgive me for not using the word passphrase; when I actually see such phrases in the wild we can talk.)
So it should be an obvious statement, but a system that is not usable, is not actually used. But wireless encryption in general, and WPA2 in particular, actually has been achieving a remarkable level of adoption. I’ve already written about the deployment numbers; suffice it to say, this is one of the brighter spots in real world cryptography. (The only things larger are the telnet->ssh switch, and maybe the widespread adoption of SSL by Top 100 sites in the wake of Firesheep and state level packet chicanery.)
So, if this WPA2 technology is actually being used, something had to be making it usable. That something, it turned out, was an entirely separate configuration layer: WPS, for WiFi Protected Setup. WPS was pretty clearly originally designed as a mechanism for consumer electronic devices to link to access points. This makes a lot of sense — there’s not much in the way of user interface on these devices, but there’s always room to paste a sticker. Not to mention that the alternative, Bluetooth, is fairly complex and heavy, particularly if all you want is just IP to a box.
However, a good quarter to half of all the access points exposing encrypted access today expose WPS not for random consumer devices, but for PCs themselves. After all, home users rather obviously didn’t have an Enterprise Certificate Authority around to issue them a certificate, and those passwords that WPA2 wanted are (seriously, legitimately, empirically) hard to remember.
Not to mention, how does the user deal with WPA2 passwords during “onboarding”, meaning they just pulled the device out of the box?
WPS cleaned all this up. In its most common deployment concept, “Label”, it envisioned a second password. This one would be unique to the device and stickered to the back of it. A special exchange, independent of WPA2 or WPA or even WEP, would execute. This exchange would not only authenticate the client to the server, but also the server to the client. By this mechanism, the designers of WPS would defend against so-called “evil twin” attacks, in which a malicious access point present during initial setup would pair with the client instead of the real AP. The password would be simplified: a PIN, 8 digits, with 1 digit being a check so the user could be quickly informed they’d typed it wrong.
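To make the check digit concrete, here is a minimal sketch. It assumes the weighted mod-10 checksum found in open-source WPS implementations; the function names are mine, not anything from the spec.

```python
def wps_checksum_digit(first_seven: int) -> int:
    """Compute the eighth (check) digit for the seven-digit body of a PIN."""
    accum = 0
    while first_seven:
        accum += 3 * (first_seven % 10)  # 1st, 3rd, 5th... digit from the right weighs 3
        first_seven //= 10
        accum += first_seven % 10        # 2nd, 4th, 6th... digit from the right weighs 1
        first_seven //= 10
    return (10 - accum % 10) % 10


def full_pin(first_seven: int) -> str:
    """Seven free digits plus the derived check digit."""
    return f"{first_seven:07d}{wps_checksum_digit(first_seven)}"


def looks_valid(pin: str) -> bool:
    """Cheap local typo check: no network traffic, no secret involved."""
    if len(pin) != 8 or not (pin.isascii() and pin.isdigit()):
        return False
    return full_pin(int(pin[:7])) == pin
```

Under that assumption, full_pin(1234567) gives "12345670". Note what this means for an attacker: the eighth digit carries no secret at all, so only seven digits ever need to be guessed.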
How could this possibly be safe? Well, one of the finer points of password security is that there is a difference between “online” and “offline” attackers. An online attacker has to interact with the defender on each password attempt — say, entering a password into a web form. Even if a server doesn’t lock you out after a certain number of failures (itself a mechanism of attack) only so many password attempts per second can be reasonably attempted. Such limits do not apply to the offline attacker, who perhaps is staring at a website’s backend database filled with hashes of many users’ passwords. This attacker can attempt as many passwords as he has computational resources to evaluate.
So there does exist a game, in which the defender forces the attacker to interact with him every time a password is attempted. This game can be made quite slow, to the point where as long as the password itself isn’t something obvious (read: chosen by the user), the attacker’s not getting in anytime soon. That was the path WPS was on, and this is what it looked like, as per the Microsoft document that Stefan Viehböck borrowed from in his WPS paper:
There’s a lot going on here, because there’s a lot of different attacks this protocol is trying to defend against. It wants to suppress replay, it wants to mutually authenticate, it wants to create an encrypted channel between the two parties, and it really wants to be invulnerable to an offline attack (remember, there’s only 10^7, or around 23 bits of entropy in the seven digit password). All of these desires combine to make a spec that hides its fatal, and more importantly, unfixable weakness.
Let’s simplify. Assume for a moment that there is in fact a secure channel between the client and server. We’re also going to change the word “Registrar” to “Client” and “Enrollee” to “Server”. (Yes. This is backwards. I blame some EE somewhere.) The protocol now becomes something vaguely like:
1. Server -> Client: Hash(Secret1 || PIN)
2. Client -> Server: Hash(Secret2 || PIN), Secret2
3. Server -> Client: Secret1
The right mental image to have, is one of “lockboxes”: Neither party wants to give the other the PIN, so they give each other hashes of the PIN combined with long secrets. Then they give each other the long secrets. Now, even without sending the PINs, they can each combine the secrets they’ve received with the secret PINs they already know, see that the hashes computed are equal to the hashes sent, and thus know that the other has knowledge of the PIN. The transmission of Secrets “unlocks” the ability for the Hash to prove possession of the PIN.
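The lockbox exchange can be sketched in a few lines. This is a toy model of the simplified three-message version only: it uses plain SHA-256 for Hash and 128-bit random secrets, and every name in it is made up for illustration. Real WPS wraps far more machinery around this.

```python
import hashlib
import hmac
import secrets


def lockbox(secret: bytes, pin: str) -> bytes:
    """Hash(Secret || PIN): commits to the PIN without revealing it."""
    return hashlib.sha256(secret + pin.encode()).digest()


def opens(box: bytes, secret: bytes, pin: str) -> bool:
    """Check a received lockbox against the revealed secret and our own PIN."""
    return hmac.compare_digest(box, lockbox(secret, pin))


def run_exchange(server_pin: str, client_pin: str) -> bool:
    """Return True only if both sides prove knowledge of the same PIN."""
    secret1 = secrets.token_bytes(16)  # server's long secret
    secret2 = secrets.token_bytes(16)  # client's long secret

    msg1 = lockbox(secret1, server_pin)             # 1. Server -> Client
    msg2 = (lockbox(secret2, client_pin), secret2)  # 2. Client -> Server

    # Server opens the client's lockbox; it reveals Secret1 only on success.
    if not opens(msg2[0], msg2[1], server_pin):
        return False
    msg3 = secret1                                  # 3. Server -> Client

    # Client opens the server's lockbox with the revealed Secret1.
    return opens(msg1, msg3, client_pin)
```

With matching PINs, run_exchange returns True; with different PINs the server never sends Secret1 and it returns False. Hold on to that second case, because the server's silence is itself a message.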
Mutual authentication achieved, great job everyone.
Not quite. What Stefan figured out is that WPS doesn’t actually operate across the entire PIN directly; instead, it splits things up more like:
1. Server -> Client: Hash(Secret1 || first_half_of_PIN)
2. Client -> Server: Hash(Secret2 || first_half_of_PIN), Secret2
3. Server -> Client: Secret1
First_half_of_PIN is only four digits, ranging from 0000 to 9999. A client sending Hash(Secret2 || first_half_of_PIN) may actually have no idea what the real first half is. It might just be trying them all…but whenever they get it right, Server’s going to send Client Secret1! It’s a PIN oracle! Whatever can the server do?
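Here is roughly what that oracle looks like from the attacker's chair. This is a sketch against a toy server with no rate limiting; SplitPinServer and guess_first_half are hypothetical names, and plain SHA-256 stands in for the real keyed hash.

```python
import hashlib
import secrets


class SplitPinServer:
    """Toy AP: reveals Secret1 only when the client's first half is right."""

    def __init__(self, pin: str):
        self.first_half = pin[:4]
        self.secret1 = secrets.token_bytes(16)
        self.attempts = 0

    def try_first_half(self, box: bytes, secret2: bytes):
        """Message 2 arrives; answer with Secret1, or with silence (None)."""
        self.attempts += 1
        expected = hashlib.sha256(secret2 + self.first_half.encode()).digest()
        return self.secret1 if box == expected else None


def guess_first_half(server: SplitPinServer) -> str:
    """Online brute force: the server's answer tells us when we are right."""
    for n in range(10_000):
        guess = f"{n:04d}"
        secret2 = secrets.token_bytes(16)
        box = hashlib.sha256(secret2 + guess.encode()).digest()
        if server.try_first_half(box, secret2) is not None:
            return guess
    raise RuntimeError("no guess accepted")
```

Against SplitPinServer("48151623") this returns "4815" after 4,816 attempts. Since the eighth digit is a checksum, the second half has only three free digits, so the same trick there costs at most another 1,000 guesses: 10,000 + 1,000 = 11,000 in the worst case, instead of the 10,000,000 an unsplit PIN would demand.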
The obvious answer is that the server could rate limit the client. That was the recommended path, but most APs didn’t implement that. (Well, didn’t implement it intentionally. Apparently it’s really easy to knock over wireless access points by just flooding this WPS endpoint. It’s amazing the degree to which global Internet connectivity depends on code sold at $40/box.)
What’s not obvious is the degree to which we really have no other degrees of freedom but rate limiting. My initial assumption, upon hearing of this bug, was that we could “fake” messages — that it would be possible to leave an attacker blind to whether they’d guessed half the PIN.
Thanks to mutual authentication, no such freedom exists. Here’s the corner we’re in:
If we do as the protocol specifies — refuse to send Secret1, because the client’s brute forcing us — then the “client” can differentiate a correct guess of the first four digits from an incorrect guess. That is the basic vulnerability.
If we lie — send, say, Secret_fake — ah, now the client simply compares Hash(Secret1 || first_half_of_PIN) from the first message with this new Hash(Secret_fake || first_half_of_PIN) and it’s clear the response is a lie and the first_half_of_PIN was incorrectly guessed.
And finally, if we tell the truth (if we send the client the real Secret1, despite their incorrect guess of first_half_of_PIN), then look what he’s sitting on: Hash(Secret1 || first_half_of_PIN) and Secret1. Now he can run an offline attack against first_half_of_PIN.
It takes very little time for a computer to run through 10,000 possibilities.
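The last two cases are really the same loop. A minimal sketch, again with SHA-256 standing in for Hash and with names of my own choosing: given the lockbox from the first message and whatever “Secret1” the server chose to send, try all 10,000 halves.

```python
import hashlib


def h(secret: bytes, half_pin: str) -> bytes:
    """Hash(Secret || half_of_PIN), the lockbox from the simplified protocol."""
    return hashlib.sha256(secret + half_pin.encode()).digest()


def offline_first_half(box: bytes, revealed_secret: bytes):
    """Return the half-PIN inside the box, or None if the secret does not fit.

    No interaction with the server happens in here, so no rate limit applies.
    """
    for n in range(10_000):
        guess = f"{n:04d}"
        if h(revealed_secret, guess) == box:
            return guess
    return None  # no half-PIN opens the box: the "secret" was a fake
```

If the server tells the truth, offline_first_half(box, secret1) hands back the first half outright. If the server lies, the function returns None, which tells the attacker the reply was fake and therefore that the guess was wrong. Either way the attacker learns something.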
So. Can’t shut up, can’t lie, and absolutely cannot tell the truth. Ouch.
Aside from rate limiting, can we fix the protocol? It’s surprisingly difficult. Let’s remove a bit more simplification, and add second_half_of_PIN:
[M3] 1. Server -> Client: Hash(Secret1 || first_half_of_PIN), Hash(Secret3 || second_half_of_PIN)
[M4] 2. Client -> Server: Hash(Secret2 || first_half_of_PIN), Hash(Secret4 || second_half_of_PIN), Secret2
[M5] 3. Server -> Client: Secret1
[M6] 4. Client -> Server: Secret4
[M7] 5. Server -> Client: Secret3
So, in this model, the server provides information about the entire PIN, but with Secret1 and Secret3 (really, E-S1 and E-S2) being 128 random bits, the PIN cannot be brute forced. Then the client provides the same mishmash of information about the PIN, now with Secret2 (R-S1) and Secret4 (R-S2). But immediately the client “unlocks” the first half of the PIN to the server, who replies by “unlocking” that same half. Only when the server “unlocks” the first half to the client, does the client “unlock” the second half of the PIN to the server and vice versa.
It’s always a bit tricky to divine what people were thinking when they made a protocol. Believe me, DNSSEC is way more useful than even its designers intended. But it’s pretty clear the designers were trying to make sure that a malicious server never got its hands on the full PIN from the client. That seems to be the thinking here: Secret4, unlocking second_half_of_PIN from the client to the server, is only sent if the server actually knew first_half_of_PIN back in that first (well, really M3) message.
For such a damaging flaw, we sure didn’t get much in return. Problem is, a malicious “evil twin” server is basically handed first_half_of_PIN right there in the second, M4 message. He’s got Hash(Secret2||first_half_of_PIN) and Secret2. That’s 10,000 computations away from first_half_of_PIN.
Technically, it’s too late! He can’t go back in time and fix M3 to now contain the correct first_half_of_PIN. But, in an error usually only seen among economists and freshman psychology students, there is an assumption that the protocol is “one-shot”: that the server can’t just tell the client that something went wrong, and that the client should then start from scratch.
So, in the “evil twin” attack, the attacker resets the session after brute forcing the correct first_half_of_PIN. This allows him to send a correct Hash(Secret1||first_half_of_PIN) in the next iteration of M3, which causes the Secret1 in M5 to be accepted, which causes Secret4 in M6 to be sent to the server thus unlocking second_half_of_PIN.
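The two-session attack can be sketched end to end. As before, this is a toy under stated assumptions: SHA-256 in place of the real keyed hash, no encrypted channel, and an honest Client class I made up that simply follows the M3 through M6 steps above. The second lockbox of M3 is left out, since the client cannot check it before M7 anyway.

```python
import hashlib
import secrets


def h(secret: bytes, half: str) -> bytes:
    return hashlib.sha256(secret + half.encode()).digest()


def brute(box: bytes, secret: bytes, digits: int) -> str:
    """Offline search for the half-PIN inside an opened lockbox."""
    for n in range(10 ** digits):
        guess = f"{n:0{digits}d}"
        if h(secret, guess) == box:
            return guess
    raise ValueError("secret does not open this lockbox")


class Client:
    """Toy honest client (the WPS 'Registrar') that knows the real PIN."""

    def __init__(self, pin: str):
        self.first, self.second = pin[:4], pin[4:]

    def m4(self, m3_first_box: bytes):
        """Receive M3, answer with both lockboxes plus Secret2."""
        self.m3_first_box = m3_first_box
        self.s2, self.s4 = secrets.token_bytes(16), secrets.token_bytes(16)
        return h(self.s2, self.first), h(self.s4, self.second), self.s2

    def m6(self, secret1: bytes):
        """Receive M5; release Secret4 only if the server's M3 box opens."""
        return self.s4 if h(secret1, self.first) == self.m3_first_box else None


def evil_twin(client: Client) -> str:
    # Session 1: M3 holds a junk guess; M4 alone leaks the first half.
    s1 = secrets.token_bytes(16)
    box1, _box2, s2 = client.m4(h(s1, "0000"))
    first = brute(box1, s2, 4)

    # Session 2, after a reset: M3 is now correct, so M5 is accepted.
    s1 = secrets.token_bytes(16)
    _box1, box2, _s2 = client.m4(h(s1, first))
    s4 = client.m6(s1)
    assert s4 is not None  # the client believed us and sent M6
    return first + brute(box2, s4, 4)
```

evil_twin(Client("48151623")) returns "48151623": two exchanges, twenty thousand hashes at most, and the whole PIN is gone.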
Again, it’s possible I’m missing the “real reason” for splitting the PIN. But if this was it, not only did the split make it much easier for a malicious client to attack an AP server, but it also made it absolutely trivial for a malicious AP server to attack a client. There are rumors of this pattern elsewhere, and I’m now rather concerned.
(To be entirely honest, I was not intending to find another WPS attack. I was just trying to document Stefan’s work.) Ouch.
Fixing WPS is going to be a mess. There’s a pile of devices out there that simply expect to be able to execute the above protocol in order to discover their wireless configuration, including encryption keys. Even when a more advanced protocol becomes available, preventing the downgrade attack in which an attacker simply claims to be an unpatched client is brutal. Make no mistake, the patch treadmill developed over the last decade is impressive, and hundreds of millions of devices receive updates on a regular basis.
There’s billions that don’t. Those billions include $40 access points, mobile phones, even large numbers of desktops and laptops. When do you get to turn off an insecure but critical function required for basic connectivity?
Good question. Hard, hard answer. Aside from rate limiting defenses, this flaw is going to be around, and enabled by default, for quite some time. Years, at least.
To some extent, that’s the reality of security. It’s the long game, or none at all. Accepting this is a ray of hope. We can talk about problems that will take years to solve, because we know that’s the only scale in which change can occur. The Wi-Fi Alliance deserves credit for playing this game on the long scale. This is a lost battle, but at least this is a group of engineers that have been here before.
Now, what should have been built? Once again, we should have used SRP, and didn’t. SRP is unambiguously the right way to solve the PAKE (Password Authenticated Key Exchange) problem, quite a bit more even than my gratuitous (if entertaining) work with Phidelius which somewhat suffers from offline vulnerability. Our inability to migrate to SRP is usually ascribed to some unclear status relating to patents; that our legal system has been unable to declare SRP covered or not pretty much led to the obviously correct technology not being used.
Somebody should start lobbying or something.