Apache Ignite Quick Start Guide
上QQ阅读APP看书,第一时间看更新

CAP theorem and Apache Ignite

Apache Ignite supports distributed transactional cache operations, and at the same time it is highly available. Supporting both ACID transactions and high availability is a big ask for any distributed data store. Distributed data stores follow the CAP theorem. Computer scientist Eric Brewer proposed the CAP theorem, and it says that a distributed data store cannot offer more than two of the following three capabilities:

  • Consistency: You will always get the latest and greatest data. Suppose you have two nodes, A and B, and someone is updating a document/record in node B and you are reading that same record from node A. You should get the latest update made to the record in node B.
  • Availability: You should always get a response; it may not be the latest data, but it should not throw an error.
  • Partition tolerance: This means that if you remove the network connection between the nodes (A and B, in our case), the system should still operate.

Distributed data stores cannot escape from network failures; CAP theorem states that in the case of network partitions, a node can either be consistent or available.

What does that mean? In our example, we have two nodes, A and B. Suppose someone chops the network cable and disconnects A and B. Someone may make changes to node B after that network partitioning. If you are connected to node A, you can get old data or A may stop responding to your queries. If A returns you data, that means your distributed system is available but not consistent as you are getting stale data, which is not consistent. If A stops responding to you, that means your data store system is consistent but not available, as it is not giving you any stale data and at the same time has stopped responding to you.

Any distributed network data store can either be CA, CP, or AP. Let's explore the characteristics of each system:

  • CA system: Here, the data store supports consistency and availability but doesn't support network partitioning, so this architecture is not scalable. To provide ACID transactions and high availability, it sacrifices network partitioning. All RDBM data stores are CA systems.
  • CP system: This system architecture supports consistency and partition tolerance, but sacrifices availability. In the case of network outage, some nodes stop responding to queries to maintain consistency. Some data is lost.
  • AP system: The distributed data store is always available and partitioned. This system architecture sacrifices consistency in the case of network partitioning. 

The following diagram depicts the CAP abilities:

Apache Ignite is a distributed data store, so it must adhere to the CAP theorem and support any two of the following three capabilities: C, A, or P. Apache Ignite supports ACID-compliant distributed transactions with partition tolerance and also offers high availability when the network is partitioned, so it's a CP and AP, but not CA, as it is scalable and supports network partitioning. But how in the world can a distributed network data store support both consistency and availability during a network outage? It must adhere to CAP and support either availability or consistency. The catch is that the Apache Ignite cache API supports two cache modes for cache operations:

  • Atomic mode 
  • Transactional mode

In atomic mode, Apache Ignite supports AP, high availability, and sacrifices consistency when the network is partitioned. In transactional mode, Apache Ignite supports consistency (ACID-compliant cache operations) and sacrifices high availability.

The following diagram explains the Apache Ignite cache modes:

The consistency or transactional behavior is achieved through two-phase commit and locking. We will write code to verify both cache modes. Later in this chapter, we will explore optimistic and pessimistic locking, isolation levels, and two-phase commit in detail.

Next, we are going explore the cluster topology of Apache Ignite.