I’ve recently been working on a new project that needs to keep values strongly consistent across concurrent accesses. To get a better feel for the problem, think of things like:

  • Keeping your bank account balance always right
  • Correctly handling the number of products on a shelf
  • The number of likes or views of a video or article

You might be wondering what is so complex about managing these problems. To answer that, let me present The Lost Update Problem.

The Lost Update Problem

This problem is fairly well known in the computer science field and is about, just like the name says, losing an update due to concurrent access. Picture two sessions selling items from the same stock of 10 units at the same time, as in the walkthrough below: the final number of items available in stock ends up as 5 when it should be 2.
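
To make the timeline concrete, here is a minimal sketch of two sessions racing on the same row. It assumes the same products table with a units column used in the examples further down, starting at 10 units:

-- Product 1 starts with 10 units in stock.
-- Session A reads the stock, planning to sell 5:
SELECT units FROM products WHERE id = 1;   -- returns 10
-- Session B reads the stock at the same time, planning to sell 3:
SELECT units FROM products WHERE id = 1;   -- also returns 10
-- Session B writes first:
UPDATE products SET units = 7 WHERE id = 1;   -- 10 - 3
-- Session A writes last, silently overwriting Session B's update:
UPDATE products SET units = 5 WHERE id = 1;   -- 10 - 5
-- Final value: 5 units, when the two sales together should have left 2.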

To solve this issue we need to think about locking, which can be defined as

“A mechanism to synchronize access by multiple users to the same piece of data at the same time”.

We can easily overcome this problem with database locking, but it gets a little trickier when we’re talking about distributed systems.

Database locking

A note about Isolation Levels

Isolation determines how and when the changes made by one transaction become visible to other users and systems. It’s very important to say that, on its own, this doesn’t solve the issue described in this article.

Isolation level  | Dirty reads | Non-repeatable reads | Phantom reads
Read Uncommitted | YES         | YES                  | YES
Read Committed   | NOPE        | YES                  | YES
Repeatable Read  | NOPE        | NOPE                 | YES
Serializable     | NOPE        | NOPE                 | NOPE

Changing the default from Repeatable Read just makes problems harder to track down and will introduce you to a whole new world of inconsistencies.

Even if you allow dirty reads, there will still be moments where data access isn’t synchronized amongst concurrent transactions. Tip: don’t change this!
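
If you just want to confirm which level your session is running with, rather than change it, most databases expose it; in MySQL, for example:

SELECT @@transaction_isolation;   -- e.g. REPEATABLE-READ, the InnoDB default
-- Standard SQL also lets you change it per transaction, but as noted above, resist the urge:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;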

Pessimistic and optimistic locking

The easiest way to solve The Lost Update Problem is simply to lock the row you want to update for writing before actually updating it (pessimistic locking). Most databases support the following statement:

SELECT * FROM products WHERE id = 1 FOR UPDATE;

That means other sessions can still read the row, but cannot modify it until your transaction commits. So we can easily solve concurrent access by… not having concurrent access! Databases can handle most of the issues companies of any size will face and can work under a heavy load of transactions, but sometimes this approach can lead to performance issues.
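
Putting the whole pessimistic flow together, a transaction would look roughly like this (a sketch using the same products table; the exact syntax may vary slightly between databases):

BEGIN;
-- Lock the row: other sessions trying to update or lock it will block here until we commit
SELECT units FROM products WHERE id = 1 FOR UPDATE;   -- returns 10
-- Compute the new value from the freshly read, locked data (selling 5 items)
UPDATE products SET units = 5 WHERE id = 1;
COMMIT;   -- releases the lock; blocked sessions now see units = 5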

Optimistic locking assumes that although conflicts are possible, they will be very rare. It allows fast performance and high concurrency, at the cost of occasionally refusing to write data that was initially accepted but turns out, at the last second, to conflict with another user’s changes.

As we know, the UPDATE statement already locks the row for writing, so only one update to it runs at a time. This way we can simply add a check for the old value while updating it:

UPDATE products SET units = 5 WHERE id = 1 AND units = 10;

Other concurrent transactions will not satisfy the WHERE clause and will update zero rows. If that happens, you’ll need to roll back and inform the user that the UPDATE failed, or simply run the entire transaction again.
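
In practice the optimistic flow is a plain read followed by a conditional write, with the application checking the affected-row count (a sketch; the retry loop itself lives in your application code):

-- 1. Read the current value without taking any lock
SELECT units FROM products WHERE id = 1;   -- suppose it returns 10
-- 2. Write the new value only if nobody changed it in the meantime
UPDATE products SET units = 5 WHERE id = 1 AND units = 10;
-- 3. Check the affected-row count the database reports:
--    1 row  -> the update succeeded
--    0 rows -> another transaction got there first: roll back and retry, or report the failure

A common variation is a dedicated version column that is incremented on every write, so the conflict check doesn’t depend on the business value itself.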

Distributed Systems

Let me introduce to you some new problems:

  • We’re not using a database that natively supports locking or transactions
  • We’re working with distributed systems that need to lock access to a shared resource

First of all, let me remind you of the first law of distributed objects: don’t distribute your objects (thanks, Martin Fowler). If you can avoid going distributed, do it. Going distributed means solving many more problems than the ones listed above, such as failure handling (distributed rollback), data consistency and replication, fault tolerance, resiliency, throttling/rate limiting, retries and so on…

External locking

If you’re going into it, then you’ll need external locking. Here are some articles and tools that can help you in this new adventure:

Apache Zookeeper

Software like Apache Hadoop (distributed data processing), Apache Kafka (a distributed streaming platform) and Apache Mesos (a distributed systems kernel) use Zookeeper for distributed coordination and synchronization.

Distributed locks with Redis

Redis is an in-memory database with lots of ready-to-use data structures and algorithms. This article explains an algorithm called Redlock that implements a DLM (Distributed Lock Manager). You can also use a single-node Redis with the SETNX command (though this is discouraged in favor of Redlock).

How to coordinate distributed work with MySQL’s GET_LOCK

A very good article by Baron Schwartz, co-author of High Performance MySQL (published by O’Reilly Media), about a MySQL function called GET_LOCK that acquires a lock on a user-defined string. Of course, you should not use this to lock rows (MySQL already does that very well) but rather to lock external resources.
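
As a rough idea of how it looks (the lock name here is just an arbitrary, application-defined string made up for illustration):

SELECT GET_LOCK('refresh-product-cache', 10);   -- wait up to 10 seconds; returns 1 if the lock was acquired
-- ... do the work that only one process should be doing at a time ...
SELECT RELEASE_LOCK('refresh-product-cache');   -- returns 1 if this session held the lock and released it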

How to do distributed locking

A very well-written article by Martin Kleppmann, the author of Designing Data-Intensive Applications (also published by O’Reilly Media), about the usefulness of locking, some considerations on the Redlock algorithm, and fencing tokens.

Distributed data stores

CAP Theorem

As discussed above, there are several ways of doing locking and avoiding inconsistencies. RDBMSs are well known as great tools for this job, and they are what is called a CP system, i.e., they provide an environment with Consistency plus Partition tolerance.

The CAP theorem states that in the presence of a network partition, one has to choose between consistency and availability:

  • Consistency: every read receives the most recent write or an error
  • Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write
  • Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Distributed data stores are more focused on delivering availability over consistency and have some features that can help us deal with concurrent access in a different way.

Conflict resolution and eventual consistency

Performance and availability come at a cost. You can have multiple updates to the same resource on different nodes of your cluster. When that happens you can end up with two different versions of the data (also known as siblings), which may (or may not) eventually converge.

Some ways of resolving conflicts are:

  • Last write wins, based on timestamps
  • Vector clocks
  • Dotted version vectors

The last two items are mechanisms for tracking object updates in terms of logical time rather than chronological time (as with timestamps), enabling better conflict resolution. But if a conflict can’t be resolved automatically, it’s the application’s responsibility to deal with it.

The problem proposed at the beginning of this article, the number of likes or views of a video or article, can be easily solved by CRDTs. They are data structures that can be updated independently and concurrently without coordination, and where it is always mathematically possible to resolve any inconsistencies that might result. Riak Data Types are a good example of this.

The point here is:

You can design your system to understand eventual consistency and conflict resolution. Some problems can be easily modeled and resolved without explicit locking.

Conclusion

Understand your needs. Relational databases are a long-standing technology and a very reliable way of handling all of these problems. On the other side, usually at a higher cost but with higher performance, availability and so on, are distributed systems and distributed data stores, which handle these problems in a very different way. Keep studying!

References