Hands-On Microservices with Kubernetes

Applying the saga pattern to microservices

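Relational databases can provide ACID guarantees for distributed systems through algorithms such as two-phase commit, combined with control over all the data. The two-phase commit algorithm works in two phases: in the prepare phase, every participant votes on whether it can commit its part of the transaction, and in the commit phase, the coordinator tells all participants to commit if every vote was yes, or to abort otherwise. However, this approach requires the services that participate in the distributed transaction to share the same database. That doesn't work for microservices that manage their own databases.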
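To make the two phases concrete, here is a minimal in-memory sketch in Python. The `Participant` class and the `two_phase_commit` function are hypothetical names used only for illustration, and the sketch deliberately leaves out the durable logging, timeouts, and coordinator-failure recovery that a real implementation needs.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical resource manager taking part in two-phase commit."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: tentatively do the work and vote yes (True) or no (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): every vote was yes, so everyone commits.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one vote was no, so everyone aborts.
    for p in participants:
        p.abort()
    return False
```

Note that all participants end up in the same state: either every one commits or every one aborts. That all-or-nothing outcome is what requires a single coordinator with authority over all the data.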

Enter the saga pattern. The basic idea of the saga pattern is that there is centralized management of the operations across all the microservices and that, for each operation, there is a compensating operation that will be executed if, for some reason, the entire transaction can't be completed. This achieves the atomicity property of ACID. However, the changes on each microservice are visible immediately, not only at the end of the entire distributed transaction. This violates the consistency and isolation properties. This is not a problem if you design your system as AP (available and partition-tolerant, in CAP theorem terms), also known as eventually consistent. However, it requires your code to be aware of this and to be able to work with data that may be partially inconsistent or stale. In many cases, this is an acceptable compromise.

How does a saga work? A saga is a set of operations and corresponding compensating operations on microservices. When an operation fails, its compensating operation and the compensating operations of all the previous operations are called in reverse order to roll back the entire state of the system.
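The following Python sketch shows that control flow. `SagaStep` and `run_saga` are hypothetical names, and the operations are plain in-process callables standing in for calls to microservices. Following the description above, the sketch also calls the failed step's own compensation, so it assumes that every compensating operation is safe to call even when its operation had no effect.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    """Hypothetical pairing of an operation with its compensating operation."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: list[SagaStep]) -> bool:
    attempted: list[SagaStep] = []
    for step in steps:
        attempted.append(step)
        try:
            step.action()
        except Exception:
            # Compensate the failed step and all previous steps, newest first.
            for done in reversed(attempted):
                done.compensate()
            return False
    return True
```

For example, if reserving inventory and charging the payment succeed but scheduling shipping fails, the compensations run as cancel shipping, refund, then release inventory. Until that rollback finishes, other services can observe the reserved inventory and the charge, which is exactly the isolation trade-off described earlier.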

Sagas are not trivial to implement because the compensating operations might fail too. In general, the transient state must be persisted and explicitly marked as transient, and enough metadata must be stored to enable a reliable rollback. A good practice is to have an out-of-band process run frequently to clean up failed sagas that didn't manage to complete all their compensating operations in real time.
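One way to meet these requirements is sketched below with Python's built-in `sqlite3` module. The table layout, the status values (`done`, `needs_compensation`, `compensated`), the compensation registry, and the function names are all assumptions made for illustration. Each completed step is persisted together with the name of its compensating operation and the data that operation needs. When a saga fails, its steps are marked as awaiting compensation, and `cleanup` plays the role of the out-of-band process that retries whatever could not be undone in real time.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical registry: compensation name -> function taking the step payload.
Registry = dict[str, Callable[[dict], None]]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the saga log; ':memory:' is only for the demo, use a durable DB."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga_steps ("
        "saga_id TEXT, seq INTEGER, compensation TEXT, payload TEXT, "
        "status TEXT, PRIMARY KEY (saga_id, seq))"
    )
    return conn


def record_step(conn, saga_id: str, seq: int, compensation: str, payload: dict) -> None:
    # Persist each completed step with the metadata needed to undo it later.
    with conn:
        conn.execute(
            "INSERT INTO saga_steps VALUES (?, ?, ?, ?, 'done')",
            (saga_id, seq, compensation, json.dumps(payload)),
        )


def mark_failed(conn, saga_id: str) -> None:
    # Flag the saga's completed steps as transient state awaiting rollback.
    with conn:
        conn.execute(
            "UPDATE saga_steps SET status = 'needs_compensation' "
            "WHERE saga_id = ? AND status = 'done'",
            (saga_id,),
        )


def compensate_saga(conn, saga_id: str, registry: Registry) -> bool:
    """Run pending compensations newest first; stop at the first failure."""
    rows = conn.execute(
        "SELECT seq, compensation, payload FROM saga_steps "
        "WHERE saga_id = ? AND status = 'needs_compensation' ORDER BY seq DESC",
        (saga_id,),
    ).fetchall()
    for seq, name, payload in rows:
        try:
            registry[name](json.loads(payload))
        except Exception:
            return False  # leave the remaining steps for the next cleanup run
        with conn:
            conn.execute(
                "UPDATE saga_steps SET status = 'compensated' "
                "WHERE saga_id = ? AND seq = ?",
                (saga_id, seq),
            )
    return True


def cleanup(conn, registry: Registry) -> list[str]:
    """Out-of-band sweep: retry every saga that still has pending compensations."""
    saga_ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT saga_id FROM saga_steps "
            "WHERE status = 'needs_compensation'"
        )
    ]
    # Return the IDs of sagas that are now fully compensated.
    return [s for s in saga_ids if compensate_saga(conn, s, registry)]
```

Compensation stops at the first failure so that, on the next run, the remaining steps are still undone in reverse order. In a Kubernetes deployment, `cleanup` could be triggered periodically, for example by a CronJob, against a durable database rather than `:memory:`.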

A good way to think about sagas is as workflows. Workflows are cool because they enable long-running processes that can involve humans, not just software.