Why Blockchains Don’t Suck, and the Perils of Distributed Databases
Even casual observers of the cryptocurrency space can see that the massive hype cycle in 2017 has subsided quite a bit and we’re in a down market this year. This bearish sentiment isn’t isolated to just cryptocurrencies. It’s extended to the underlying blockchain technology itself. People are realizing it’s not the technological panacea it was purported to be. It’s not going to take down governments overnight. It’s not going to swiftly enable robot proxies of ourselves to duke it out with our robot adversaries. Turns out, we can’t even effectively trade artwork using the blockchain yet.
So naturally, as humans tend to do, we’ve swung the pendulum way too far the other way. Many crypto-influencers are castigating the blockchain every chance they get. The same people who touted its merits and declared anyone who didn’t jump on the bandwagon would be left in the dust are now hurling pejoratives at almost every new blockchain project they come across.
The flavor of the week in blockchain criticism basically sounds like this:
Blockchains suck. They’re just slow distributed databases. Distributed databases have been around for decades. Just use X instead.
Where X is one of PostgreSQL, MySQL, MongoDB or any other widely used SQL or NoSQL database. If one of these databases has partitioning features (i.e. distribution or decentralization) then the speaker feels even more confident in slamming blockchains.
We get why the critics are doing this. In some ways they have a point. It's also an effective way to sound technically qualified and credible when speaking to an audience of non-developers. Even when speaking to developers, a simple statement like this seems smart and authoritative. Here is an example, a tweet from Miko Matsumura:
Miko is a pretty well known figure in the crypto space. We’re not sure how much software development he does these days but we’ll assume he’s a fairly technical guy. We also understand he only has so many characters on Twitter to make a point. But we’ll explain why his last statement is a gross oversimplification. At the very least, even if he’s right for your use case, we’ll present some other important considerations when comparing databases (both distributed and centralized) to blockchains.
Let’s first address the obvious advantages of centralized, traditional databases. Vitalik Buterin gave a nice talk on how much faster and cheaper writes and storage are when you don’t have to decentralize things. Here are the highlights:
- Amazon EC2 is $0.04 per hour. The Ethereum world computer is $13.4 per 200 ms. Ethereum is over 1,000,000x more expensive.
- A 250GB solid state drive is $79.99. Storage on the Ethereum world computer is $84,000,000 for 250GB. Again, over 1,000,000x more expensive.
Blockchains are slow and more expensive. So what do they do well?
- Persistence and robustness. Data is distributed across nodes so if a single node fails, the network stays alive.
- Fraud and censorship resistance. Because chained hashes and consensus results can be readily verified, a malicious party can’t easily mutate or obfuscate data without a network majority.
This is where people who haven’t actually done much work with distributed databases make their mistakes when they say “blockchains are stupid”. The most common things you hear are:
- Blockchains are stupid because you can already get decentralization with replication and sharding in MongoDB clusters or the Citus extension for Postgres.
- Blockchains are stupid because chained hashes are nothing new. Git uses them. Proof of Work isn’t new either. It’s simply a rate limiting algorithm that’s been around forever. Just apply both to a distributed database.
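To make the second point concrete, here is a minimal, illustrative sketch of both ideas — blocks whose hashes chain together, plus a toy proof-of-work that brute-forces a nonce. (This is a teaching toy, not how any production chain is implemented.)

```python
import hashlib

def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = f"{prev_hash}|{data}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()

def mine(prev_hash: str, data: str, difficulty: int = 2):
    """Brute-force a nonce until the hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        h = block_hash(prev_hash, data, nonce)
        if h.startswith("0" * difficulty):
            return nonce, h
        nonce += 1

# Build a tiny two-block chain.
genesis_nonce, genesis_hash = mine("0" * 64, "genesis")
nonce2, hash2 = mine(genesis_hash, "alice pays bob 5")

# Tampering with block 1's data invalidates its hash, and therefore
# every block that chained off it.
tampered = block_hash("0" * 64, "genesis (edited)", genesis_nonce)
assert tampered != genesis_hash
```

The point the critics make is fair as far as it goes: neither primitive is novel on its own. The hard part, as the rest of this post argues, is everything around them.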
In theory this sounds great. If distributed databases could do distribution as well as blockchains do, then yes, blockchains would probably be obviated.
However, what no one talks about is that distributed databases are really hard to get right. Very few companies have done it well. In the next couple sections, we’ll explain why.
Distributed Databases Are Not Simple. So Stop Oversimplifying Them.
Hey I just met you
The network’s laggy
But here’s my data
So store it maybe
Kyle Kingsbury, Carly Rae Jepsen and the Perils of Network Partitions (2013)
Working with distributed systems is fundamentally different from working with single computers, and the main difference is in the number of things that go wrong. We deliberately didn’t say *potentially* go wrong, because as every experienced system administrator will tell you, everything that can go wrong, will go wrong.
Distributed databases, by definition, mean that we are operating on multiple nodes. The operational correctness of our system has to deal with the messy reality of the physical world: long-lived network partitions, switch failures, power outages, HVAC, etc. As a thought exercise, let’s consider deploying a geographically distributed database over the internet, where network delays and connection timeouts are common. The internet and most internal datacenter networks (often Ethernet) are asynchronous packet networks. On an asynchronous network, a node may send a message to another node, but the network gives no guarantees as to when it will arrive, or whether it will arrive at all. If you send a request and never receive a response, any of the following may have happened:
- the message might be lost
- the message might wait in a queue
- the remote node might crash
- the remote node might experience a temporary pause
- the remote node might receive the message but the network might lose the response
- the remote node might just take a while to process and respond to the message
Essentially, when we don’t receive a response to a message, it’s very hard to tell why.
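This ambiguity is easy to demonstrate with a small simulation — a toy in-process "network" rather than a real RPC stack, with behaviors we invent for the example. Several very different server-side failures produce exactly the same observation on the caller's side:

```python
import queue
import threading
import time

def rpc_call(server_behavior: str, timeout: float = 0.05) -> str:
    """Simulate a request/response over an asynchronous network."""
    response = queue.Queue()

    def server():
        if server_behavior == "request_lost":
            return                      # the message never arrived
        if server_behavior == "crash":
            return                      # the node died before replying
        if server_behavior == "response_lost":
            _ = "ok"                    # processed, but the reply was dropped
            return
        if server_behavior == "slow":
            time.sleep(0.2)             # the reply arrives after our deadline
        response.put("ok")

    threading.Thread(target=server, daemon=True).start()
    try:
        return response.get(timeout=timeout)
    except queue.Empty:
        return "timeout"

# Four very different failures, one indistinguishable observation.
for behavior in ["request_lost", "crash", "response_lost", "slow"]:
    assert rpc_call(behavior) == "timeout"
assert rpc_call("healthy") == "ok"
```

From the caller's perspective every failure mode collapses into the same signal: silence.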
So what happens if the nodes of our distributed database lose messages? How bad is this? The answer depends on the use case of our application. Are we guaranteeing 100% consistency of our data and full auditability to our end users, or is it acceptable that some of the data is lost? Many of the distributed stores touted as “highly scalable” do not offer guarantees of data consistency. Some modes have no guarantees at all and some are eventually consistent. With concurrent writes, usually one of the values wins, and a third party querying the data in the interim sees one of the values, but not all of them. Similarly, if multiple values are modified in a transaction, an observer can sometimes see a random mix of values from the two separate sessions. Some systems are even designed to reject (or simply ignore) values once the write throughput goes above a certain limit (e.g. stream processing with AWS Kinesis).

When building analytics systems, it’s fine to lose a couple data points out of millions. But when building systems that offer strong guarantees around data consistency, linearizability, ordering and causality, we have to guarantee the fidelity of every single data point, even if it’s changed concurrently by multiple users.
We’ll focus our argument that distributed databases aren’t magic bullets on applications that require strong consistency guarantees. Bitcoin in particular is the quintessential application where consistency guarantees are crucial, e.g. losing track of a single Bitcoin transaction is a really big deal. The same consistency guarantees are critical for other cryptocurrencies.
Consistency Guarantees and Consensus
Given the fact that communication between distributed database nodes can be unreliable, intermittent, or perhaps non-existent, our application using a distributed database (or our system administrator) has to be able to deal with data conflicts. Being able to deal with conflicts is the fundamental difference between standard distributed databases and blockchains. Conflict resolution, transaction verification and transaction auditability, i.e. consensus, is a core function of blockchains and has been built into them at a base level.
So what kind of databases are out there and why are they worse at achieving consensus?
The first class of data stores that come to mind are relational (SQL) databases. They do support transactions and they claim atomicity of transactions, which one would think would make them good choices for implementing applications with strong consistency guarantees. A popular comment related to SQL databases is, “use an ACID database if you’re handling financial data”.
But that really misses a large chunk of what we want out of consistency guarantees. Many popular relational database systems (which are usually considered “ACID”) use weak isolation levels, and those can cause all sorts of inconsistencies under concurrent writes. For example, if we submit multiple parallel value-decrement transactions (e.g. on financial account balances) under a weak isolation level, we can easily end up with a negative balance. And these weak isolation levels (usually the default setting) are what make relational databases scale well.
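Here is that negative-balance anomaly spelled out as a plain Python simulation of the interleaving — a sketch of the unlucky schedule, not an actual database session:

```python
# Two withdrawals under a weak isolation level: each transaction reads the
# balance, runs its overdraft check, then issues a decrement. With an
# unlucky interleaving, both checks pass against the same stale read.
balance = {"acct": 100}

# Transaction 1 and transaction 2 both check the balance first
# (under weak isolation, nothing serializes these reads).
check1 = balance["acct"] >= 80
check2 = balance["acct"] >= 80
assert check1 and check2        # both overdraft checks pass on the stale value

# ...then each issues its decrement, and the database happily applies both.
balance["acct"] -= 80
balance["acct"] -= 80

assert balance["acct"] == -60   # overdrawn: the invariant was never enforced
```

Under serializable isolation the second transaction would have been forced to re-run its check against the post-withdrawal balance and would have been rejected.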
The fully consistent alternative is serializable isolation (transactions behave as if they were executed one at a time), ideally in the form of SSI (serializable snapshot isolation), which was first implemented in PostgreSQL 9.1 and is used in FoundationDB. Another alternative is optimistic locking (typically implemented at the application level), which scales well, but only if you don’t expect frequent conflicts. But how realistic is that assumption? In any retail store on Black Friday, how often does the merchant’s balance have to change?
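Optimistic locking is straightforward to sketch at the application level, assuming a version counter the application manages itself (the names here are illustrative, not from any particular library):

```python
class StaleWrite(Exception):
    """Raised when another writer updated the row after we read it."""
    pass

# A row with an application-managed version counter.
row = {"balance": 100, "version": 1}

def update_balance(expected_version: int, new_balance: int) -> None:
    """Compare-and-set: the write only succeeds if no one else got there first."""
    if row["version"] != expected_version:
        raise StaleWrite("row changed since we read it; retry the transaction")
    row["balance"] = new_balance
    row["version"] += 1

# Two clients read version 1...
v = row["version"]
update_balance(v, 20)       # first writer wins
retried = False
try:
    update_balance(v, 50)   # second writer is rejected and must re-read
except StaleWrite:
    retried = True

assert row["balance"] == 20 and row["version"] == 2 and retried
```

Every conflict costs a full retry of the transaction, which is exactly why this approach falls over when conflicts are the common case rather than the exception.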
Let’s now look at what happens with distributed databases if one or more of their nodes fail.
First, let’s cover some of the standard ways traditional SQL databases handle distribution. Given that we want a truly distributed database, we assume that the nodes are in different locations and don’t use specialized storage or networks (like shared disks, e.g. NAS). File system replication (e.g. DRBD) can be used to implement fault tolerance, but any network partition will break the replication link immediately, very quickly producing “split brain” situations, which require constant manual diffing of the file systems to resolve conflicts. This leaves us with the standard master-slave/standby model in traditional SQL databases. In this model, we have one node to which we write (i.e. the master), and binary log replication keeps the other instances up to date. In the event of a master failure (stepdown), one of the replicas needs to take over. So far, so good, but how does the failover to the newly elected master happen? Is it manual (performed by humans) or automatic?
If the failover is done manually by a human (which is the safest in these kinds of setups) then we’ll obviously have downtime. Someone has to be paged and woken up. They will have to assess the situation and decide to promote a specific replica to a master.
However, if the failover is automatic, there’s a big chance things will go horribly wrong. Say some automated logic uses a timeout to decide that the current master is dead. What if the timeout fires because of a network partition between the automated logic and the master? What if the master comes back right after the timeout because it was just running a very taxing query that consumed all of its CPU and needed some time to accept new connections? We can end up with an inconsistent database very easily: some writes land on one server and not the other. This is exactly what happened to GitHub in 2012.
Okay, what about the multi-master (leader) replication model for SQL (e.g. Tungsten Replicator for MySQL, BDR for PostgreSQL, GoldenGate for Oracle)? While these systems can achieve cross datacenter replication (i.e. a truly distributed model) the same data may be concurrently modified in multiple datacenters and the application using the database must be able to resolve the conflicts. Alright, no big deal, we’ll resolve the conflict with the most recent value, except wait, even if we do store timestamps with our data, how much can we trust our system clock? Yes, we have NTP to synchronize the clock, but at any given moment the machine clock can be off by tens of seconds or even minutes. This is why Google uses atomic clocks in their datacenters for their Spanner database.
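Here is a toy illustration of why last-write-wins plus skewed clocks silently loses data (the timestamps and skew are invented for the example):

```python
# Last-write-wins conflict resolution with skewed clocks.
# Node A's clock runs 30 seconds fast. Node B applies a write that is
# genuinely later in real time, but its honest timestamp loses to A's
# inflated one.

def lww_merge(writes):
    """Keep the value whose timestamp is highest (last-write-wins)."""
    return max(writes, key=lambda w: w["ts"])["value"]

real_time = 1_000_000.0
write_a = {"value": "old", "ts": real_time + 30.0}  # node A, clock 30s fast
write_b = {"value": "new", "ts": real_time + 5.0}   # node B, 5s later in reality

# The genuinely newer write is silently discarded.
assert lww_merge([write_a, write_b]) == "old"
```

No amount of clever merge code fixes this after the fact; the only real remedies are bounded-error clocks (Spanner's approach) or conflict resolution that doesn't depend on wall time at all.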
What about NoSQL distributed databases, for example MongoDB? It supports sharding and implements a consensus protocol for leader election. Sounds like it would work, right? Not so fast. It uses something called “eventual consistency”, which is exactly what it sounds like, and it’s a whole ’nother can of worms. Under eventual consistency, a write can be acknowledged but never durably applied, for example if the acknowledging node goes down before the write replicates. MongoDB also offers “linearizable reads”, where a read is only fulfilled after the database confirms that a majority of nodes agree on the value. This at least ensures everyone gets back the same value, although at a performance cost.
The path to a consistent distributed database has been a rocky one for MongoDB, and it could be argued they’ve done the most work of anyone. It appears that MongoDB versions 3.2 and 3.4 finally implemented a working consensus protocol. However, there’s still one problem: they don’t allow multi-document transactions (this may have changed with MongoDB 4.0). What this means is that if we want to execute a simple withdrawal and deposit across accounts in one go, we are out of luck.
So we’ve ragged on distributed databases enough. What’s the path forward?
Light at the end of the tunnel
The good news for distributed databases is that there are some newer solutions that are very good at distributed availability and consistency. AWS DynamoDB, Google Spanner, Azure CosmosDB are a few examples.
Before we proclaim these solutions are blockchain killers, guess what DynamoDB uses for replica synchronization to figure out the correct version of an object? Merkle trees! Exactly what we see in blockchains!
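For the curious, a Merkle root is simple to compute. Here is a minimal sketch, using Bitcoin-style duplication of the last node when a level has an odd count — real systems (including Dynamo) differ in details:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Pairwise-hash the leaves upward until a single root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"alice:5", b"bob:3", b"carol:9"]
root = merkle_root(records)

# Two replicas can compare a single 32-byte root instead of every record;
# any divergence anywhere in the data changes the root.
assert merkle_root(records) == root
assert merkle_root([b"alice:6", b"bob:3", b"carol:9"]) != root
```

When roots differ, the replicas walk down the tree comparing subtree hashes, so they only transfer the records that actually diverged — which is exactly why both Dynamo and blockchains reach for the same structure.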
Going back to the beginning of this post: while there are consistent, durable, linearizable distributed databases that do transactions well, you can’t “just use Oracle or PostgreSQL”. Designing, running and maintaining systems like these requires the engineering talent of Google, Amazon and Microsoft, and even for them it’s not easy. People often criticize blockchains as expensive to run and operate, but don’t consider the cost of running one of the aforementioned alternatives. These services are not free either. Next, add the operational and mental overhead of keeping a change log of all previous writes for audit purposes. And finally, what is the cost of improperly dealing with a malicious node (a threat these distributed options don’t address at all)? You can see how complicated this gets, and why you can’t just do a 1:1 comparison between blockchains and traditional databases.
How are blockchains better?
Blockchains were built for transaction consistency in a datastore, allowing for auditability, atomicity, linearizability, durability and, most importantly, data integrity. From a data systems standpoint, blockchain technologies offer some very interesting ideas. They serve a similar purpose to distributed databases, but with data models and transaction mechanisms in which different replicas can be hosted by mutually untrusting organizations. The replicas continually check each other’s integrity and use a consensus protocol to agree on the transactions that should be executed. For cryptographic auditing, blockchains rely on Merkle trees.
To achieve consistency, blockchains follow the order-execute architecture. This means that the blockchain network orders transactions first, using a consensus protocol, and then executes them in the same order on all peers sequentially. This approach is what typically causes the performance issues with many blockchains. The execution is serial (much like serializable transactions, the only fully consistent relational database mode) and the consensus protocol (e.g. Proof-of-Work, used in many popular blockchains) can be slow and computationally wasteful.
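In miniature, order-execute looks like this: assume consensus has already produced an agreed transaction log (the consensus step itself is elided here), and every peer simply replays it serially:

```python
# Order-execute in miniature: once every peer agrees on the transaction
# order, deterministic serial execution guarantees identical state on
# all replicas.

def execute_in_order(ordered_txns, state=None):
    """Apply agreed-upon transactions one at a time, in consensus order."""
    state = dict(state or {})
    for op, acct, amount in ordered_txns:
        if op == "credit":
            state[acct] = state.get(acct, 0) + amount
        elif op == "debit":
            state[acct] = state.get(acct, 0) - amount
    return state

agreed_order = [
    ("credit", "alice", 10),
    ("debit", "alice", 4),
    ("credit", "bob", 7),
]

# Every peer replays the same log and converges on the same state.
peer1 = execute_in_order(agreed_order)
peer2 = execute_in_order(agreed_order)
assert peer1 == peer2 == {"alice": 6, "bob": 7}
```

The determinism is the whole trick: agreement on the *order* of inputs, plus deterministic execution, yields agreement on the *state* — at the cost of executing everything serially.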
There are alternatives to the order-execute model. For example, Hyperledger’s execute-order-validate architecture can achieve much higher transaction throughput and allows the consensus protocol to be swapped for less computationally intensive algorithms, such as Proof-of-Stake or Byzantine fault tolerant ordering. Setting up Hyperledger involves connecting multiple nodes with services like Kafka and ZooKeeper, which are battle-tested components for strong event ordering and node consensus. Hyperledger’s performance does degrade as more geographically distributed nodes are added, but a lot of that can be countered with the optional node gossip protocol. Similarly, Corda is moving towards parallel processing through heavy use of inter-node gossip protocols to achieve higher transaction throughput. It can also achieve higher throughput by replacing the consensus protocol with Raft, which significantly reduces the computing power needed for node consensus (Raft is a very popular consensus algorithm, notably implemented by etcd).
The aim of this post was to present a counterpoint to the mid-2018 trend of slamming blockchain technologies. It’s really easy for a crypto-influencer to sound smart by saying “you don’t need a blockchain. Just use a distributed database instead”.
We believe people like Miko Matsumura are well intentioned. They have been in this space for a long time and are understandably fatigued by every layman and his dog starting a new cryptocurrency and asking for funding. However, we need to be cautious when stating that everyone should just use PostgreSQL. There are legitimate places to use the blockchain, particularly when you have many trustless parties, as Miko pointed out himself.
The key is to understand the relative strengths of blockchains, centralized databases and distributed databases and apply them to your specific use case.
Don’t just use a blockchain because it’s the cool thing to do. But more importantly, don’t just use a distributed database because it’s the cooler thing to do.