Spelunking the Triangle: Exploring Aaron Swartz’s Take On Zooko’s Triangle
[Heh! This blog post segues into a debate in the comments!]
Short Version: Zooko’s Triangle describes traits of a desirable naming system — and constrains us to two of secure, decentralized, or human readable. Aaron Swartz proposes a system intended to meet all three. While his distribution model has fundamental issues with distributed locking and cache coherency, and his use of cryptographic hashcash does not cause a significant asymmetric load for attackers vs. defenders, his use of an append-only log does have interesting properties. Effectively, this system reduces to the SSH security model in which “leaps of faith” are taken when new records are seen — but, unlike SSH, anyone can join the cloud to potentially be the first to provide malicious data, and there’s no way to recover from receiving bad (or retaining stale) key material. I find that this proposal doesn’t represent a break of Zooko’s Triangle, but it does show the construct to be more interesting than expected. Specifically, it appears there’s some room for “float” between the sides — a little less decentralization, a little more security.
Zooko’s Triangle is fairly beautiful — it, rather cleanly, describes that a naming system can offer:
- Secure, Decentralized, Non-Human-Readable names like Z0z9dTWfhtGbQ4RoZ08e62lfUA5Db6Vk3Po3pP9Z8tM.twitter.com
- Secure, Centralized, Human-Readable Names like http://www.twitter.com, as declared by trusted third parties (delegations from the DNS root)
- Insecure, Decentralized, Human-Readable Names like http://www.twitter.com, as declared by untrusted third parties (consensus via a P2P cloud)
There’s quite the desire to “square Zooko’s triangle” — to achieve Secure, Decentralized, Human-Readable Names. This is driven by the desire to avoid the vulnerability point that centralization exposes, while neither making obviously impossible demands on human memory nor succumbing to the rampantly manipulatable opinion of a P2P mob.
Aaron Swartz and I have been having a very friendly disagreement about whether this is possible. I promised him that if he wrote up a scheme, I’d evaluate it. So here we are. As always, here’s the source material:
The basic idea is that there’s a list of name/key mappings, almost like a hosts file, but it links human readable names to arbitrary length keys instead of IP addresses. This list, referred to as a scroll, can only be appended to, never deleted from. Appending to the list requires a cryptographic challenge to be completed — not only do you add (name,key), you also add a nonce that causes the resulting hash from the beginning to be zero.
Aaron grants that there might be issues with initially trusting the network. But his sense is, as long as either your first node is trustworthy, or the majority of the nodes you find out about are trustworthy, you’re OK.
It’s a little more complicated than that.
There are many perspectives from which to analyze this system, but the most interesting one I think is the one where we watch it fail — not under attack, but under everyday use.
How does this system absorb changes?
While Aaron’s proposal doesn’t allow existing names to alter their keys — a fatal limitation by some standards, but one we’ll ignore for the moment — it does allow name/key pairs to be added. It also requires that additions happen sequentially. Suppose we have a scroll with 131 name/key pairs, and both Alice and Bob try to add a name without knowing about the other’s attempt. Alice will compute and distribute 131,Alice, and Bob will compute and distribute 131,Bob.
What now? Which scroll should people believe? Alice’s? Bob’s? Both?
Should they update both? How will this work over time?
Aaron says the following:
What happens if two people create a new line at the same time? The debate should be resolved by the creation of the next new line — whichever line is previous in its scroll is the one to trust.
This doesn’t mean anything, alas. The predecessor to both Alice and Bob’s new name is 131. The scroll has forked. Either:
- Some people see Alice, but not Bob
- Some people see Bob, but not Alice
- Somehow, Alice and Bob’s changes are merged into one scroll
- Somehow, the scroll containing Alice and the scroll containing Bob are both distributed
- Both new scrolls are rejected and the network keeps living in a 131 record universe
Those are the choices. We in security did not invent them.
There is a belief in security sometimes that our problems have no analogue in normal computer science. But this right here is a distributed locking problem — many nodes would like to write to the shared dataset, but if multiple nodes touch the same dataset, the entire structure becomes corrupt. The easiest fix of course is to put in a centralized locking agent, one node everyone can “phone home to” to make sure they’re the only node in the world updating the structure’s version.
But then Zooko starts jumping up and down.
(Zooko also starts jumping up and down if the output of the system doesn’t actually work. An insecure system at least needs to be attacked, in order to fail.)
Not that security doesn’t change the game. To paraphrase BASF, “Security didn’t make the fundamental problems in Comp Sci. Security makes the fundamental problems in Comp Sci worse.” Distributed locking was already a mess when we trusted all the nodes. Add a few distrusted nodes, and things immediately become an order of magnitude more difficult. Allow a single node to create an infinite number of false identities — in what is known as a Sybil attack — and any semblence of security is gone.
[Side note: Never heard of a Sybil attack? Here's one of the better stories of one, from a truly devious game called Eve Online.]
There is nothing in Aaron’s proposal that prevents the generation of an infinite number of identities. Arguably, the whole concept of adding a line to the scroll is equivalent to adding an identity, so a constraint here is probably a breaking constraint in and of itself. So when Aaron writes:
To join a network, you need a list of nodes where at least a majority are actually nodes in the network. This doesn’t seem like an overly strenuous requirement.
Without a constraint on creating new nodes, this is in fact a serious hurdle. And keep in mind, “nodes in the network” is relative — I might be connecting to majority legitimate nodes, but are they?
We shouldn’t expect this to be easy. Managing fully distributed locking is already hard. Doing it when all writes to the system must be ordered is already probably impossible. Throw in attackers that can multiply ad infinatum and can dynamically alter their activities to interfere with attempts to insert with specific ordered writes?
Before we discuss the crypto layer, let me give you an example of dynamic alterations. Suppose we organize our nodes into a circle, and have each node pass write access to the next node as listed in their scroll. (New nodes would have to be “pulled into” the cloud.) Would that work?
No, because the first malicious node that got in, could “surround” each real node with an untrusted Sybil. At that point, he’d be consulted after each new name was submitted. Any name he wanted to suppress — well, he’d simply delete that name/key pair, calculate his own, and insert it. By the time the token went around, the name would be long since destroyed.
(Not that a token approach could work, for the simple reality that someone will just lose the token!)
Thus far, we’ve only discussed propagation, and not cryptography. This should seem confusing — after all, the “cool part” of this system is the fact that every new entry added to the scroll needs to pass a computational crypto challenge first. Doesn’t that obviate some of these attacks?
The problem is that cryptography is only useful in cases of significant asymmetric workload between the legitimate user and the malicious attacker. When I have an AES256 symmetric key or an RSA2048 private key, my advantage over an attacker (theoretically) exceeds the computational capacity of the universe.
What about a hash function? No key, just the legitimate user spinning some cycles to come to a hash value of 0. Does the legitimate user have an advantage over the attacker?
Of course not. The attacker can spin the same cycles. Honestly, between GPUs and botnets, the attacker can spin far more cycles, by several orders of magnitude. So, even if you spend an hour burning your CPU to add a name, the bad guy really is in a position to spend but a few seconds cloning your work to hijack the name via the surround attack discussed earlier.
(There are constructions in which an attacker can’t quite distribute the load of cracking the hash — for example, you could encode the number of iterations a hash needed to be run, to reach N zero bits. But then, the one asymmetry that is there — the fact that the zero bits can be quickly verified despite the cost of finding them — would get wiped out, and validation of the scroll would take as long as calculation of the scroll.)
But what about old names? Are they not at least protected? Certainly not by the crypto. Suppose we have a scroll with 10,000 names, and we want to alter name 5,123. The workload to do this is not even 10,000 * time_per_name, because we (necessarily) can reuse the nonces that got us through to name 5,122.
Aaron thinks this isn’t the case:
First, you need to need to calculate a new nonce for the line you want to steal and every subsequent line….It requires having some large multiple of the rest of the network’s combined CPU power.
But adding names does not leverage the combined CPU power of the entire network — it leverages the power of a single node. When the single node wanted to create name 5,123, it didn’t need to burn cycles hashcashing through 0-5122. It just did the math for the next name.
Now, yes, names after 5,123 need to be cracked by the attacker, when they didn’t have to be cracked by the defender. But the attackers have way more single nodes than the defenders do. And with those nodes, the attacker can bash through each single name far, far quicker.
There is a constraint — to calculate the fixed nonce/zerohash for name 6001, name 6000 needs to have completed. So we can’t completely parallelize.
But of course, it’s no work at all to strip content from the scroll, since we can always remove content and get back to 0-hash.
Ah! But there’s a defense against that:
“Each remembers the last scroll it trusted.”
And that’s where things get really interesting.
Complexity is a funny thing — it sneaks up on you. Without this scroll storage requirement, the only difference between nodes in this P2P network are what nodes it happens to stumble into when first connecting. Now, we have to be concerned — which nodes has a node ever connected to? How long has it been since a node has been online? Are any significant names involved between the old scroll and the modern one?
Centralized systems don’t have such complexities, because there’s a ground truth that can be very quickly compared against. You can be a hundred versions out of date, but armed with nothing but a canonical hash you can clear your way past a thousand pretenders to the one node giving out a fresh update.
BitTorrent hangs onto centralization — even in small doses — for a reason. Zooko is a harsh mistress.
Storage creates a very, very slight asymmetry, but one that’s often “all we’ve got”. Essentially, the idea is that the first time a connection is made, it’s a leap of faith. But every time after that, the defender can simply consult his hard drive, while the attacker needs to…what? Match the key material? Corrupt the hard drive? These are in fact much harder actions than the defender’s check of his trusted store.
This is the exact system used by SSH, in lieu of being able to check a distributed trusted store. But while SSH struggles with constant prompting after, inevitably, key material is lost and a name/key mapping must change, Aaron’s proposal simply defines changing a name/key tuple as something that Can’t Happen.
Is that actually fair? There’s an argument that a system that can’t change name/key mappings in the presence of lost keys fails on feature completeness. But we do see systems that have this precise property: PGP Keyservers. You can probably find the hash of the PGP key I had in college somewhere out there. Worse, while I could probably beg and plead my PGP keys offline, nobody could ever wipe keys from these scrolls, as it would break all the zero-hashes that depend on them.
I couldn’t even put a new key in, because if scroll parsers didn’t only return the first name/key pair, then a trivial attack would simply be to write a new mapping and have it added to the set of records returned.
Assume all that. What are the ultimate traits of the system?
The first time a name/key peer is seen, it is retrieved insecurely. From that point forward, no new leap of faith is required or allowed — the trust from before must be leveraged repeatedly. The crypto adds nothing, and the decentralization does little but increase the number of attackers who can submit faulty data for the leap of faith, and fail under the sort of conditions distributed locking systems fail. But the storage does prevent infection after the initial trust upgrade, at the cost of basic flexibility.
I could probably get away with saying — a system that reduces to something this and undeployable, and frankly unstable, can not satisfy Zooko’s Triangle. After all, simply citing the initial trust upgrade, as we upgrade scroll versions from a P2P cloud on the false assumption that an attacker couldn’t maliciously synthesize new records unrecoverably, would be enough.
It’s fair, though, to discuss a few annoying points. Most importantly, aren’t we always upgrading trust? Leaps of faith are taken when hardware is acquired, when software is installed, even when the DNSSEC root key is leveraged. By what basis can I say those leaps are acceptable, but a per-name/key mapping leap represents an unacceptable upgrade?
That’s not a rhetorical question. The best answer I can provide is that the former leaps are predictable, and auditable at manufacture, while per-name leaps are “runtime” and thus unpredictable. But audits are never perfect and I’m a member of a community that constantly finds problems after release.
I am left with the fundamental sense that manufacturer-based leaps of faith are at least scalable, while operational leaps of faith collapse quickly and predictably to accepting bad data.
More interestingly, I am left with the unescapable conclusion that the core security model of Swartz’s proposal is roughly isomorphic to SSH. That’s fine, but it means that if I’m saying Swartz’s idea doesn’t satisfy Zooko due to security, then neither does SSH.
Of course, it shouldn’t. We all know that in the real world, the response by an administrator to encountering a bad key, is to assume the box was paved for some reason and thus to edit known_hosts and move on. That’s obviously insecure behavior, but to anyone who’s administered more than a box or two, that’s life. We’re attempting to fix that, of course, by letting the guy who paves the box update the key via DNSSEC SSHFP records, shifting towards security and away from decentralization. You don’t get to change client admin behavior — or remove client key overrides — without a clear and easy path for the server administrator to keep things functional.
So, yeah. I’m saying SSH isn’t as secure as it will someday be. A day without known_hosts will be a good day indeed.
I’ve had code in OpenSSH for the last decade or so; I hope I’m allowed to say that :) And if you write something that says “Kaminsky says SSH is Insecure” I will be very unhappy indeed. SSH is doing what it can with the resources it has.
The most interesting conclusion is then that there must be at least some float in Zooko’s Triangle. After all, OpenSSH is moving from just known_hosts to integrating a homegrown PKI/CA system, shifting away from decentralization and towards security. Inconvenient, but it’s what the data seems to say.
Looking forward to what Zooko and Aaron have to say.