A Comparison of Scalable Database Isolation Levels

It is very difficult to find accurate information about the correctness and isolation levels offered by modern distributed databases, and the operational conditions required to achieve them. Developers use different terms for the same thing, the meaning of terms varies or is ambiguous, and sometimes vendors themselves do not actually know.

At Fauna, we care a lot about accurately describing which guarantees different systems actually provide. This is our effort to centralize a description of which database does what, based on publicly available information (documentation, source code, third-party analyses, and developers' comments). For consistency’s sake, we will use the terminology from Kyle Kingsbury’s explanation on the Jepsen site. The chart is ranked by the maximum multi-partition isolation level offered.

The data is based on statements about isolation levels from vendor documentation, white papers, and developer commentary, exclusive of aspirational marketing statements. We have tried to be neutral in the characterization of the various systems' architectural properties. Whether the system implementations uphold these guarantees is addressed elsewhere. If you haven't already, please see FaunaDB's own Jepsen results for confirmation that FaunaDB upholds its guarantees.

Before we BEGIN

In discussing transactional isolation, we frequently encounter the "worse is better" argument, which essentially goes:

This database does what it does
Implementing better isolation in the database is impossible or has unacceptable tradeoffs
Implementing better isolation in the application is simple and useful

This argument also goes by "it's not a bug, it's a feature".

The pretense of low maximum isolation levels, eventual consistency, or CRDTs is that application developers are ready and willing to work through every failure and recovery condition of their distributed dataflow. But in practice, moving beyond “works on my machine” verification of correctness requires an extraordinary level of investment that product teams simply will not make.

In my experience, the implications of different isolation levels in practice are very subtle, and pushing the burden to the application developers (especially when there are a lot of distinct applications, like in a microservices architecture) is tremendously detrimental to productivity. And although tunable consistency increases flexibility, it cannot be used to paper over an isolation level that is fundamentally too weak to compose effectively.

After all, /dev/null is serializable, but not very useful as a database.

Distributed Databases

Distributed databases present a unified topology and do not require operator management of replication, although some, like the Percolator systems, do require management of special nodes.

| Database | Maximum isolation level | Default isolation level | Minimum isolation level | Consensus architecture | Limitations |
| --- | --- | --- | --- | --- | --- |
| FaunaDB | Strict serializability | Strict serializability for transactions with writes. Snapshot for read-only transactions. | Snapshot | Calvin with optimistic concurrency control | Writes must coordinate on local log leaders. Reads can be served from any replica. |
| Google Cloud Spanner | Strict serializability (called "external consistency") | Strict serializability | Snapshot (called “bounded staleness”) | Spanner | Writes must coordinate on partition leaders which may be remote. Reads can be served from any replica. |
| FoundationDB | Sequential | Sequential | Snapshot | Modified Percolator | All queries must coordinate on the timestamp oracle. Lock nodes are distinct from data nodes. FoundationDB white papers also claim strict serializability, but sequentially consistent partitions cannot be made strict; only linearizable partitions can. |
| [name missing] | Serializable | Serializable | Serializable | Modified Spanner | All queries must coordinate on the partition leaders for their respective keys. Transactions with shared keys are mutually serializable, but transactions with disjoint keys can suffer “causal reversal”. Isolation is violated under clock skew. |
| [name missing] | Snapshot | Snapshot | Snapshot | Spanner | Isolation is violated under clock skew. |
| [name missing] | Repeatable read | Repeatable read | Repeatable read | Percolator | All queries must coordinate on the timestamp oracle. |
| DynamoDB | Single partition linearizability | Read committed | Read committed | Paxos | Multi-partition two-phase commit offers limited serializability support. Multi-partition transactions limited to 10 primary keys with explicit read dependencies. Indexes are not serializable. Isolation is violated if there are non-transactional queries to the same keys or if global tables are used. |
| [name missing] | Single partition linearizability | Linearizable for single-region, snapshot for multi-region | Read uncommitted | Paxos | Multi-partition transactions are not supported. |
| [name missing] | Single partition linearizability | Read uncommitted (aka “eventual consistency”) | Read uncommitted | Single-decree Paxos | Multi-partition transactions are not supported. Isolation is violated if there are non-transactional queries to the same keys, or if global secondary indexes are used. |
| [name missing] | Session causality | Read uncommitted | Read uncommitted | Sharded, semi-synchronous replication with automated failover | Multi-partition transactions are not supported. Isolation is violated during partitions and shard leader election. |
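To make the “causal reversal” limitation above concrete, here is a minimal toy sketch. It assumes commit timestamps are taken from each node's local clock and that two transactions on disjoint keys never coordinate with each other. The `Node` class, `snapshot_read` function, and the offsets are hypothetical illustrations, not any listed database's actual protocol.

```python
# Toy model of "causal reversal". Assumption: each node stamps commits with
# its own (skewed) local clock, and disjoint-key transactions do not coordinate.
# All names here are hypothetical.

class Node:
    def __init__(self, clock_offset):
        self.clock_offset = clock_offset  # skew relative to true time

    def commit(self, true_time, log, key, value):
        ts = true_time + self.clock_offset  # timestamp from the skewed clock
        log.append((ts, key, value))
        return ts


def snapshot_read(log, read_ts):
    """Return the latest value per key among writes with timestamp <= read_ts."""
    view = {}
    for ts, key, value in sorted(log):
        if ts <= read_ts:
            view[key] = value
    return view


log = []
fast = Node(clock_offset=+5)  # clock runs ahead
slow = Node(clock_offset=-5)  # clock runs behind

# In real time, T1 finishes (t=100) before T2 even starts (t=102).
ts1 = fast.commit(true_time=100, log=log, key="a", value="T1")
ts2 = slow.commit(true_time=102, log=log, key="b", value="T2")

# A snapshot taken between the two timestamps sees T2's write but not T1's.
view = snapshot_read(log, read_ts=100)
print(ts1, ts2, view)  # 105 97 {'b': 'T2'}
```

The resulting timestamp order is still a valid serial order, but it contradicts the real-time order in which the transactions ran. That gap is the difference between serializability and strict serializability.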

Replicated Databases

Replicated databases require operator management of primaries and secondaries and the associated replication links. Asynchronous replication can improve availability and scale read capacity, but does not offer any distributed consistency guarantees. Semi-synchronous replication further improves availability, but does not improve distributed isolation.

This is the traditional RDBMS scale-out model.

| Database | Maximum isolation level | Default isolation level | Minimum isolation level | Replication architecture | Limitations |
| --- | --- | --- | --- | --- | --- |
| Oracle | Snapshot | Snapshot | Read committed | Asynchronous replication | Oracle's SERIALIZABLE isolation is not serializable, but is actually snapshot isolation with write conflict detection. This allows write skew anomalies. |
| MySQL | Serializable, primary node only | Repeatable read, primary node only | Read uncommitted | Semi-synchronous replication | |
| PostgreSQL | Serializable, primary node only | Read committed | Read committed | Semi-synchronous replication | |
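The Oracle entry above notes that snapshot isolation with write conflict detection still allows write skew. The following toy sketch shows why: a conflict check that only looks at written keys cannot see that two transactions each read what the other wrote. `SnapshotStore`, `Txn`, and the on-call scenario are hypothetical and do not represent any vendor's implementation.

```python
# Toy multi-version store illustrating write skew under snapshot isolation.
# All names are hypothetical; this is a sketch, not a real database engine.

class WriteConflict(Exception):
    pass


class SnapshotStore:
    def __init__(self, initial):
        self.data = dict(initial)               # latest committed values
        self.version = {k: 0 for k in initial}  # commit counter at last write
        self.clock = 0                          # logical commit counter

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start = store.clock
        self.snapshot = dict(store.data)  # all reads come from this snapshot
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort only if a key we WROTE changed since start.
        for key in self.writes:
            if self.store.version[key] > self.start:
                raise WriteConflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.clock


def go_off_call(txn, me, other):
    # Application invariant: at least one person must stay on call.
    if txn.read(other):
        txn.write(me, False)


store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
go_off_call(t1, "alice", "bob")
go_off_call(t2, "bob", "alice")
t1.commit()
t2.commit()        # no conflict detected: the write sets are disjoint
print(store.data)  # {'alice': False, 'bob': False}
```

Both transactions commit and the invariant is broken, even though either one alone would have preserved it. A serializable system would have to abort or reorder one of them.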

Conclusion

A good way to think about isolation is in terms of the breadth of potential anomalies. The lower the isolation level, the more types of anomalies can occur, and the harder it is to reason about application behavior both at steady-state and under faults. At Fauna, we encourage you to think critically about whether your current databases really guarantee the level of transactional isolation you need.

If you enjoyed this topic and want to work on systems and challenges just like this, Fauna is hiring!
