The YugaByte Database Blog

Thoughts on open source, cloud native and distributed databases

A Primer on ACID Transactions: The Basics Every Cloud App Developer Must Know

ACID transactions were a big deal when first introduced formally in the 1980s in monolithic SQL databases such as Oracle and IBM DB2. Popular distributed NoSQL databases of the past decade, including Amazon DynamoDB and Apache Cassandra, initially focused on “big data” use cases that did not require such guarantees and hence avoided implementing them altogether. However, ACID transactions have made a strong comeback in the last 2 years with the launch of next-generation distributed databases that have built-in support for them.

This post serves as a primer on ACID transactions for app developers building distributed apps in the cloud. It highlights:

  • Why ACID transactions remain a fundamental need for cloud apps
  • What’s necessary to implement them in databases
  • Why monolithic databases have them but 1st generation distributed databases don’t

Defining a Transaction

A transaction represents a unit of work performed within a database. It is often composed of multiple operations.

The example below shows a transaction at a bank with 4 operations that transfer $100 from Alice’s account at one branch to Bob’s account at another branch.

Source: Get To Know PostgreSQL
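
In code, a transfer like this might look as follows. This is a minimal sketch using Python’s built-in sqlite3 module, not a reproduction of the figure: the table names, branch names, starting balances, and the choice of which 4 updates make up the transfer are all assumptions made for illustration.

```python
import sqlite3

# In-memory database; isolation_level=None lets us issue BEGIN/COMMIT ourselves.
# All table names, branch names, and balances below are illustrative assumptions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE accounts (name TEXT PRIMARY KEY, branch TEXT NOT NULL, balance INTEGER NOT NULL);
    CREATE TABLE branches (name TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    INSERT INTO accounts VALUES ('Alice', 'Downtown', 500), ('Bob', 'Uptown', 200);
    INSERT INTO branches VALUES ('Downtown', 10000), ('Uptown', 8000);
""")

def transfer(conn, sender, receiver, amount):
    """Apply 4 updates (two accounts, two branch totals) as one transaction."""
    conn.execute("BEGIN")
    try:
        # 1. Debit the sender's account.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, sender))
        # 2. Debit the sender's branch total.
        conn.execute("UPDATE branches SET balance = balance - ? "
                     "WHERE name = (SELECT branch FROM accounts WHERE name = ?)",
                     (amount, sender))
        # 3. Credit the receiver's account.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, receiver))
        # 4. Credit the receiver's branch total.
        conn.execute("UPDATE branches SET balance = balance + ? "
                     "WHERE name = (SELECT branch FROM accounts WHERE name = ?)",
                     (amount, receiver))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo every update made so far
        raise

transfer(conn, "Alice", "Bob", 100)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
branch_totals = dict(conn.execute("SELECT name, balance FROM branches"))
print(balances, branch_totals)
```

Either all 4 updates become visible together at COMMIT, or none of them do.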

Defining ACID

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of system crashes, power failures and other errors.

Atomic

Guarantees that all operations in a transaction are treated as a single “unit”, which either succeeds completely or fails completely.
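
One way to see “fails completely” in action is to make the second step of a transaction fail and check that the first step is undone too. The sketch below assumes a hypothetical accounts table in SQLite with a rule that balances cannot go negative; it illustrates the property and says nothing about how any particular database implements it.

```python
import sqlite3

# Hypothetical schema: balances may never go below zero.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts VALUES ('Alice', 50), ('Bob', 200);
""")

def transfer(conn, sender, receiver, amount):
    """Return True if the whole transfer committed, False if it was rolled back."""
    conn.execute("BEGIN")
    try:
        # The credit runs first on purpose, so a later failure leaves
        # a partial change behind that the rollback has to undo.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, receiver))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, sender))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # discards the credit that already succeeded
        return False

# Alice only has 50, so the debit violates the CHECK and the transfer aborts.
committed = transfer(conn, "Alice", "Bob", 100)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(committed, balances)
```

Bob’s balance is unchanged afterwards even though his credit had already been applied inside the transaction.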

Consistent

Ensures that a transaction can only bring the database from one valid state to another by preventing data corruption.

Isolated

Determines how and when changes made by one transaction become visible to other transactions. Serializable and Snapshot Isolation are the 2 strictest isolation levels.

Durable

Ensures that the results of the transaction are permanently stored in the system. The modifications must persist even in case of power loss or system failures.

ACID’s Consistency vs. CAP’s Consistency

The CAP Theorem, first presented in 2000, made it easier for engineers to reason about distributed systems.

Distributed systems must choose between Consistency and 100% Availability in the presence of network Partitions.

Unfortunately, the use of “Consistency” in both ACID and CAP has led to confusion among developers. Consistency in CAP is a more fundamental concept: it refers to the guarantee that all members of a distributed system have a shared understanding of the value of a single data element from a read standpoint. This guarantee is also referred to as strong consistency or linearizability. In fact, CAP’s consistency is better compared against ACID’s isolation levels since both deal with the values that read operations are allowed to see.

On the other hand, ACID’s consistency refers to data integrity guarantees that ensure the transition of the entire database from one valid state to another. Such a transition involves strict enforcement of integrity constraints such as data type adherence, null checks, relationships and more. Given that a single ACID transaction can touch multiple data elements whereas CAP’s consistency refers to a single data element, ACID transactions are a stronger guarantee than CAP’s consistency.
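
The kinds of integrity constraints behind ACID’s consistency can be tried out in a few lines. The sketch below uses SQLite through Python’s sqlite3 module with a hypothetical schema: a null check, a data type check, and a relationship (foreign key). The column names and sample rows are assumptions, and other databases express these rules with their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE branches (name TEXT PRIMARY KEY);
    CREATE TABLE accounts (
        owner   TEXT NOT NULL,                                   -- null check
        branch  TEXT NOT NULL REFERENCES branches(name),         -- relationship
        balance INTEGER NOT NULL CHECK (typeof(balance) = 'integer')  -- data type
    );
    INSERT INTO branches VALUES ('Downtown');
""")

def try_insert(owner, branch, balance):
    """Attempt one insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO accounts VALUES (?, ?, ?)", (owner, branch, balance))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert("Alice", "Downtown", 500),     # valid row
    try_insert(None, "Downtown", 500),        # violates the null check
    try_insert("Bob", "Nowhere", 500),        # branch does not exist
    try_insert("Carol", "Downtown", "lots"),  # balance is not an integer
]
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
for line in results:
    print(line)
```

Only the first row is stored; the database refuses each of the other three, so it never leaves a valid state.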

The Benefits of ACID Transactions

1. Absolute Data Integrity and Safety

Avoiding lost updates, dirty reads and stale reads, and enforcing app-specific integrity constraints, are critical concerns for app developers. This is especially true when building user-facing apps in verticals such as financial services, retail and SaaS. Solving these concerns directly at the database layer, using the consistency provided by ACID transactions, is much simpler than solving them in application code.

2. Simplified Concurrency Control

Concurrent access to shared resources such as retail inventory, bank balances and gaming leaderboards is unavoidable. Isolation in ACID transactions comes to the rescue of app developers. E.g. when a database guarantees transactions with serializable isolation, developers can treat each transaction as if it were executed sequentially, even though it may actually be executed concurrently. Onerous reasoning about potential conflicts between operations from separate transactions is obviated altogether.
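
To make the point concrete, here is a small, deterministic simulation of the classic lost update problem. It is plain Python with no database involved; the inventory dictionary and the two “buyers” are hypothetical. The first function interleaves two read-modify-write sequences by hand, and the second runs them one after the other, which is the outcome serializable isolation guarantees.

```python
def interleaved_purchases(stock):
    """Both buyers read before either writes, so one decrement is lost."""
    seen_by_a = stock["widgets"]      # buyer A reads 10
    seen_by_b = stock["widgets"]      # buyer B also reads 10
    stock["widgets"] = seen_by_a - 1  # A writes 9
    stock["widgets"] = seen_by_b - 1  # B overwrites with 9; A's update is lost
    return stock["widgets"]

def serial_purchases(stock):
    """Each buyer's read and write complete before the next buyer starts."""
    for _buyer in ("A", "B"):
        seen = stock["widgets"]
        stock["widgets"] = seen - 1
    return stock["widgets"]

after_interleaving = interleaved_purchases({"widgets": 10})
after_serial = serial_purchases({"widgets": 10})
print(after_interleaving, after_serial)  # 9 8
```

Two widgets were sold in both runs, but only the serial run records both sales.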

3. Intuitive Data Access Logic

ACID compliant databases usually allow complex schema modeling and offer native support for multi-step data manipulation operations, such as keeping secondary indexes consistent with the data they index. Business logic can now be represented more directly in the application code.

4. Future-Proofing Database Needs

Durability is rarely up for debate in databases where stable persistence is a must-have. Hence our view is that “in-memory” only systems should not even be considered databases! However, there is always an urge on the part of developers to trade off Atomicity, Consistency, Isolation, or some combination of them in return for higher performance in distributed databases. While these tradeoffs are sometimes easier to justify in the short run, the loss of flexibility in the long run comes at a heavy cost. Competitive advantage in business comes from the ability to enhance apps fast. E.g. an internal, dashboards-only, non-transactional app can be transformed into a customer-facing transactional app in minimal time only if the original database was future-proofed for such a change.

What’s Needed For Implementing ACID?

For any monolithic or distributed database to implement ACID transactions, there are 4 foundational aspects that need to be designed and developed.

Provisional Updates (Atomicity)

Transactions involve multiple operations across multiple rows. Given the need to treat all these operations as a single unit, some form of provisional update in a temporary space is needed first followed by a commit.
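
The idea of a provisional update followed by a commit can be sketched with a toy in-memory store. The class and method names below are hypothetical and the sketch ignores crashes and concurrency; a real database also has to make the commit step itself atomic, for example by writing a single commit record.

```python
class ProvisionalStore:
    """Toy key-value store: committed data plus per-transaction staging areas."""

    def __init__(self, data=None):
        self.committed = dict(data or {})

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}  # provisional updates, invisible to everyone else

    def read(self, key):
        # A transaction sees its own provisional writes first.
        if key in self.pending:
            return self.pending[key]
        return self.store.committed[key]

    def write(self, key, value):
        self.pending[key] = value  # staged only; committed data is untouched

    def commit(self):
        # Publish all staged updates together, then clear the staging area.
        self.store.committed.update(self.pending)
        self.pending = {}

    def abort(self):
        self.pending = {}  # discard everything that was staged


store = ProvisionalStore({"alice": 500, "bob": 200})

txn = store.begin()
txn.write("alice", txn.read("alice") - 100)
visible_mid_transaction = dict(store.committed)  # still the old values
txn.write("bob", txn.read("bob") + 100)
txn.commit()
after_commit = dict(store.committed)

aborted = store.begin()
aborted.write("alice", 0)
aborted.abort()
after_abort = dict(store.committed)
print(visible_mid_transaction, after_commit, after_abort)
```

Nothing outside the transaction can observe the half-finished transfer, and an aborted transaction leaves no trace.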

Strongly Consistent Core (Consistency)

A strongly consistent core is the basis for achieving ACID guarantees in a transaction involving only a single operation on a single row. The additional data integrity constraints needed to achieve full ACID (with multiple operations across multiple rows) are built on top of this core.

Transaction Ordering (Isolation)

For a database to support the strictest serializable isolation level, a mechanism such as globally ordered timestamps is needed to sequentially arrange all the transactions. On the other hand, the snapshot isolation level relies on a partial ordering where sub-operations in one transaction may interleave with those from other transactions. The benefit is lower latency and higher throughput than the serializable level while continuing to detect write-write conflicts.
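
Write-write conflict detection under snapshot isolation can be sketched with a toy multi-version store. The sketch below assumes a single logical clock and a “first committer wins” rule, which is one common way to detect such conflicts; the class names are hypothetical and this is not a description of how any specific database does it.

```python
class ConflictError(Exception):
    """Raised when another transaction committed a write to the same key first."""


class SnapshotStore:
    def __init__(self, data):
        self.clock = 0  # logical commit timestamp
        # key -> list of (commit_ts, value) in commit order
        self.versions = {key: [(0, value)] for key, value in data.items()}

    def begin(self):
        return SnapshotTxn(self, start_ts=self.clock)

    def read_at(self, key, ts):
        # Newest version that was committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions[key]):
            if commit_ts <= ts:
                return value
        raise KeyError(key)


class SnapshotTxn:
    def __init__(self, store, start_ts):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        # First committer wins: abort if any key we wrote has a version
        # newer than our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise ConflictError(f"write-write conflict on {key!r}")
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))


store = SnapshotStore({"stock": 10})
t1, t2, reader = store.begin(), store.begin(), store.begin()
t1.write("stock", t1.read("stock") - 1)
t2.write("stock", t2.read("stock") - 1)
t1.commit()
old_view = reader.read("stock")  # reader still sees its snapshot: 10
try:
    t2.commit()
    conflict_detected = False
except ConflictError:
    conflict_detected = True
latest = store.begin().read("stock")
print(old_view, conflict_detected, latest)
```

The two writers interleave freely, yet the second one to commit is rejected rather than silently overwriting the first.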

Persistent Storage (Durability)

Compared to the other 3 properties, durability is the easiest to achieve: simply use a storage engine that stores data on an underlying persistent device such as an HDD or SSD. Our post A Busy Developer’s Guide to Database Storage Engines explains how various types of storage engines work and the specific workload patterns they are usually optimized for.
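
At its simplest, durability means a committed change has reached the persistent device before the commit is acknowledged. The sketch below shows that idea with an append-only log file and an explicit fsync. The file name and record format are assumptions, and real storage engines add much more on top (checksums, batching, compaction).

```python
import json
import os
import tempfile

def append_durably(path, record):
    """Append one record and return only after it has been flushed to the device."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # move data from Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write its cache to the device

def recover(path):
    """Rebuild the list of committed records, as a restart after a crash would."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]

# Hypothetical log location in a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
append_durably(log_path, {"txn": 1, "alice": 400, "bob": 300})
recovered = recover(log_path)
print(recovered)
```

Reading the log back from disk stands in for a restart: anything acknowledged before the crash is still there.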

Why Did ACID Become Optional?

ACID compliance was taken for granted in monolithic databases of the past. This is because the monolithic database server can make provisional updates, is strongly consistent by default, runs on persistent disks and, most importantly, can act as a single source of truth for the ordering of concurrent transactions.

As databases became distributed and NoSQL starting in the late 2000s, the first order of business was to decide which side of the CAP Theorem tradeoff to favor. The default choice was Availability over Consistency, given the focus on big data workloads that don’t require absolute correctness. The net result was that the foundation necessary for the Consistency in ACID was compromised. It became easier to give up on Atomicity and Isolation thereafter. Each node came with its own database server that had control over its own subset of data in the overall cluster. Making provisional updates across multiple such nodes, and doing so with a well-defined ordering, was deemed complex and unnecessary. At the same time, transactional data volumes were low enough to be handled by vertically scaling monolithic ACID compliant databases. The lack of ACID in distributed databases did not hurt traditional enterprises until recently.

Summary

ACID transactions are a fundamental feature of operational databases. They help enterprises simultaneously gain customer data integrity and app development agility. Implementing ACID transactions in a database requires significant systems engineering effort, especially when the database is distributed across multiple nodes. In a follow-up post, we will dive deeper into the challenges involved and see how next-generation distributed databases such as YugaByte DB are solving those challenges. Meanwhile, you can see YugaByte DB’s distributed transactions in action using a local cluster.

Sid Choudhury

VP Product