Unlock the Power of ACID Properties in Databases: A Comprehensive Guide for Developers

One area of database design that is incredibly powerful but seldom understood by the developer community is the ACID features of databases. The ability to truly understand ACID features enables developers to make informed decisions and build fast, secure apps on top of databases.

Understanding the ACID guarantees that a database offers frequently aids in building solid business logic at the application layer.

ACID is an acronym that stands for four very important properties that database systems offer - Atomicity, Consistency, Isolation, and Durability!

We shall go deeply into each aspect of the ACID properties in this article. We'll also examine what ACID properties are, what they can do for us as developers, and some of their shortcomings.

Along with discussing the major advantages of utilizing databases that support ACID features, we will also examine the ACID guarantees that various database providers offer. Before understanding the components of ACID we will understand what a database transaction is.

Database Transaction

Transactions are the unit of work in any database. They are either performed completely or not at all. Each database transaction can be made up of one or more SQL queries that are logically related in some manner.

Example: Let us try to understand transactions from the perspective of a banking application. Say a customer A wants to transfer $100 from his/her account to customer B’s account. This business operation can be broken into three database operations -

Check if customer A has the required balance in his/her account.
Deduct $100 from customer A’s account.
Increase customer B’s account by $100.

We want this business logic to either proceed successfully or not proceed at all. We should not have a case where money is deducted from customer A’s account and is not credited to customer B’s account.

Thus, we would want to perform the above three steps in a single database transaction. This would ensure that either all three steps would pass or none.

It is important to understand that everything that we perform in databases occurs in a transaction.

Transaction Lifespan

A transaction has the following lifespan.

BEGIN - This, when executed, indicates the beginning of a transaction.

COMMIT - This, when executed, indicates the completion of a transaction. Once a transaction is committed successfully, it is assumed that all the steps in the transaction have passed successfully.

ROLLBACK - This, when executed, indicates that whatever work has been performed in the current transaction should be discarded, and the transaction should be closed off.

Following is an example of a database transaction in postgres.

BEGIN; -- Indicates beginning of a database transaction
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE CUSTOMER_ID = 'A';
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE CUSTOMER_ID = 'B';
COMMIT; -- Transaction is commited

A for Atomicity

A database that guarantees its transactions to be atomic means that either the transaction would be executed as a whole or not at all. It would never happen that a transaction is executed partially while the other part fails.

This means that if any single query in a database transaction fails, all the other queries that were executed before it would be rolled back, and the transaction would not have any effect on the database.

Atomicity of transactions also means that if a database crashes before a transaction could be committed, the transaction would not have any effect on the database. In other words, database crashes would cause uncommitted transactions to roll back.

Let us continue with our example from the previous section. Let us assume that our database does not provide atomicity guarantees. Now, consider running this transaction.

BEGIN; -- Indicates beginning of a database transaction
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE CUSTOMER_ID = 'A';
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE CUSTOMER_ID = 'B';
COMMIT; -- Transaction is commited

Say we executed the first two commands of this transaction. Thus, while A’s account has been debited, money has not been credited to B’s account yet. Suppose at this particular moment, the database crashes.

Since our database did not provide atomicity guarantees, once our database comes back up, we would see money debited from A’s account but not credited to B’s account. And, THIS IS A DISASTER. Our money just disappeared in thin air.

It is thus very important for databases to have atomicity guarantees. It is, however, not a necessary requirement. It is totally possible for a database to not provide atomicity guarantees in exchange for some other guarantees which might be useful for certain applications.

C for Consistency

Consistency in a database refers to the requirements that data in a database can only be changed according to ways defined by various constraints set forth.

Databases that provide consistency guarantees would ensure that any programming error would not lead to data corruption based on rules defined in the database.

Foreign Keys are an example of maintaining consistency in a SQL database. Another example of consistency can be unique constraints on columns.

Database consistency does not only mean consistency in data. It also means consistency in reads.

The idea of consistency in readings states that the same values will be returned after several reads of the same material. To prevent data loss or corruption in a distributed system, this is crucial.

Strong consistency is a type of consistency that is used by most databases to guarantee that all reads always provide the most recent information. This indicates that all subsequent reads will return the value written once a write operation has been committed.

Consistency in reads becomes more relevant when we have several replicas of the database. While a user can update the data in one replica, it often takes some time for that change to propagate to other replicas.

This means that during the time in which other replicas catch up, reads made from them can show outdated data. In such cases, we often rely on the eventual consistency of the system as a whole.

I for Isolation

What happens when different transactions are executed concurrently? Does a transaction see changes made in other transactions?

These and many other such questions fall under the domain of transaction Isolation. Transaction isolation defines how various concurrent transaction behaves. But before talking about transaction isolation, let us talk about read phenomenons.

Read Phenomenons

A transaction running concurrently with another transaction can have some interference. These interference results in read phenomenons.

Dirty Reads - It occurs when a transaction reads data that was written concurrently in another transaction that has been not yet committed. We have no idea when that second transaction will be committed or rolled back, which is quite bad.

Therefore, in the event of a rollback, we might use inaccurate data in the first transaction.

Non-Repeatable Reads - This is experienced when a transaction reads the same record twice and sees different values because the row was modified by another transaction committed after the first read.

Phantom Reads - It is a similar phenomenon to non-repeatable reads, but it affects queries that search for multiple rows rather than just one.

The same query is re-executed in this case, but a different set of rows is returned as a result of changes made by other recently-committed transactions, such as inserting new rows or deleting existing rows that happen to satisfy the search condition of the current transaction's query.

Lost Updates - This phenomenon occurs when an update made by a transaction is overridden by another transaction that is committed before the first transaction committed.

Isolation Levels

To tackle these read phenomena, databases provide various levels of transaction isolations that can be configured by database users.

Read uncommitted - In this level, there is no isolation because any change from the outside is visible to the transaction, whether committed or not.

Read committed - At this level, each query in a transaction only sees committed changes from other transactions.

Repeatable Read - In this level, transactions will make sure that when a query reads a row, that row will remain unchanged while it’s running.

Snapshot - Each query in a transaction only sees changes that have been committed since the transaction began. It's like a snapshot of the database at that time.

Serializable - At this level, there is no concurrency. All the transactions run sequentially. Thus, there are no read phenomena.

The possibility of read phenomena at different isolation levels is well described in the following table from Wikipedia.

D for Durability

A database that provides transaction durability guarantees implies that any transaction committed by the database is stored durably on non-volatile storage.

For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.

Durability is achieved by storing a log of changes made to the database in a non-volatile store before acknowledging commits. Thus, a transaction is deemed committed only after it is entered in the log.

Since writing logs to a non-volatile storage is often expensive, databases often compress these logs before persisting them.

Conclusion

Thank you for reaching the end of this article on ACID properties of databases.

I hope that this article has helped you gain a better understanding of ACID and its importance in the development process. With this knowledge, developers can make informed decisions and ensure the reliability of their databases.

I thank you for taking the time to read this article and hope that you will follow me on Twitter for more such articles.

We have more articles in the pipeline that will cover a wide range of topics related to databases and development. Until then, take care, and thanks again for reading this article!