Resulting context
The primary benefit of this solution is that proper data-level bulkheads are achieved in a realistic and cost-effective manner. The shared responsibility model of the cloud allows teams to delegate the undifferentiated tasks of database management to the cloud provider and benefit from the cloud's economies of scale. The learning curve to get up and running with these value-added cloud services is short, which enables teams to focus on the value proposition of their components. Teams take full control of their stack and provision the exact resources needed for their components. This isolation makes components responsive, because they are not competing for database resources and those resources are optimized for their specific workloads; elastic, because the load of the system is spread across many database instances; and resilient, because database failures are contained within a component.
One drawback of this solution is that these services tend to be the source of regional outages. As discussed in Chapter 2, The Anatomy of Cloud Native Systems, these outages do not happen frequently, but when they do, they can have a dramatic impact on systems that are not prepared for their eventuality. However, these outages are contained to a single region, and systems that run in multiple regions are resilient to them. We have also discussed that mature cloud-native systems are multi-regional, which provides a more responsive experience for regional users, in addition to active-active redundancy to withstand these outages. To facilitate multi-regional deployments, cloud-native databases provide regional replication features. These features are becoming increasingly turnkey, thanks to competition between cloud providers.
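To make the replication discussion concrete, the following is a minimal sketch of adding a replica region to a DynamoDB global table with the AWS SDK; the orders table name and the regions are illustrative assumptions, and the table's stream is assumed to already be enabled.

```typescript
// A minimal sketch, assuming a hypothetical 'orders' table with streams
// already enabled, that adds a replica region using the global tables feature.
import { DynamoDBClient, UpdateTableCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({ region: 'us-east-1' });

export const addReplica = async () => {
  // Request a replica of the table in eu-west-1; DynamoDB then keeps the
  // regional copies in sync asynchronously.
  await client.send(new UpdateTableCommand({
    TableName: 'orders',
    ReplicaUpdates: [
      { Create: { RegionName: 'eu-west-1' } },
    ],
  }));
};
```

With the replica in place, each region serves reads and writes locally, and the component can fail over between regions when a regional outage does occur.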
Change data capture (CDC) is one of the most important features provided by these databases. As we will see in the Event Sourcing pattern, this feature is critical in implementing transactionally sound eventual consistency across components. In the example ahead, we will see that change data capture is also instrumental in implementing intra-component asynchronous processing flows. Life cycle management features are complementary to change data capture. For example, in addition to keeping a database lean, a time-to-live (TTL) feature will result in a delete event that can be leveraged to drive valuable time-based, event-driven processing logic, over and above just propagating data through its life cycle stages. Versioning and archiving features are also typical of blob storage services.
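As a concrete illustration, here is a minimal sketch of a stream handler, assuming a DynamoDB table with streams and time-to-live enabled; it distinguishes TTL expirations from ordinary changes so that each can drive its own processing logic. The handler name and the actions taken are illustrative.

```typescript
// A minimal sketch of a change data capture handler for a DynamoDB stream.
import { DynamoDBStreamEvent } from 'aws-lambda';

export const handler = async (event: DynamoDBStreamEvent) => {
  for (const record of event.Records) {
    // TTL expirations arrive as REMOVE records performed by the DynamoDB service
    if (
      record.eventName === 'REMOVE' &&
      record.userIdentity?.principalId === 'dynamodb.amazonaws.com'
    ) {
      // e.g. archive the expired item or publish an 'item-expired' event
      console.log('expired', record.dynamodb?.OldImage);
    } else {
      // regular INSERT/MODIFY/REMOVE changes propagate downstream as usual
      console.log('changed', record.eventName, record.dynamodb?.Keys);
    }
  }
};
```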
Query limitations are a perceived drawback of this solution. We are not able to join data across components because they are in different databases. Now, even within a component, each table is isolated and cannot be joined. Plus, the query APIs of these databases tend to lack some of the features we are accustomed to with SQL. The new trend towards multi-model databases is addressing this last issue. However, in general, this perceived drawback is rooted in our expectations set by monolithic general-purpose databases. We are accustomed to asking these general-purpose databases to do more work than we should, which ultimately resulted in inefficient utilization of scarce resources. Traditionally, we have isolated OLTP databases from OLAP databases for this exact reason. Their performance and tuning characteristics are orthogonal, thus they need to be isolated.
The disposable nature of cloud infrastructure, and specifically cloud-native databases, enables us to take this isolation to the most fine-grained level. This is where the Event Sourcing and CQRS patterns come into play. In those patterns, we will be discussing how they come together with the Event Streaming pattern and eventual consistency to create materialized views that not only overcome this perceived query limitation through the pre-calculation of joins, but actually result in a more responsive, resilient, and elastic solution. I mentioned previously that this is where much of the rewiring of our engineering brains is needed.
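As a small preview of those patterns, the following sketch shows a hypothetical listener that materializes a pre-calculated join into a component-owned view table as events arrive; the event shapes and the order-view table name are illustrative assumptions, not a prescribed design.

```typescript
// A minimal sketch of a materialized view listener that pre-calculates a join
// at write time, so queries never need one.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// hypothetical event shapes for illustration
interface DomainEvent {
  type: 'order-placed' | 'customer-updated';
  order?: { id: string; customerId: string; total: number };
  customer?: { id: string; name: string };
}

export const onEvent = async (event: DomainEvent) => {
  if (event.type === 'order-placed' && event.order) {
    // upsert the order side of the denormalized view row
    await doc.send(new UpdateCommand({
      TableName: 'order-view',
      Key: { id: event.order.id },
      UpdateExpression: 'SET customerId = :c, total = :t',
      ExpressionAttributeValues: {
        ':c': event.order.customerId,
        ':t': event.order.total,
      },
    }));
  }
  // customer-updated events would similarly refresh the denormalized
  // customer fields on the affected view rows
};
```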
Cloud-native databases do not hide many of the details that were encapsulated in relational databases. This is both a good thing and a bad thing. It is bad in that you need to be aware of these details and account for them, but it is good in that these details are easily within your control to optimize your solution. For example, it is a surprise at first that we need to account for all the various details of indexing and that indexes are explicitly priced. However, there has always been an implicit price to indexes that was typically ignored at our own peril. Hot shards are another topic of concern that we will address in the example ahead, but essentially you must take care in the modeling of your partition hash keys. Cloud-native databases free us from the limitations of database connection pools, but they introduce capacity throttling. In the Stream Circuit Breaker pattern, we will discuss throttling, retries, and related concerns. For now, know that some of this logic is handled within the cloud provider SDKs. And while database monitoring is nothing new, the focus will be placed on throttling statistics to determine if and when the capacity settings and auto-scaling policies need to be adjusted.
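To ground a couple of these details, the following sketch shows one way to spread a popular hash key across shards and to lean on the SDK's built-in retry support for throttling; the metrics table name, the key scheme, and the retry settings are illustrative assumptions.

```typescript
// A minimal sketch of composing a partition key to avoid hot shards and
// tuning SDK retries to ride through capacity throttling.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';

// adaptive retry mode backs off automatically when requests are throttled
const doc = DynamoDBDocumentClient.from(
  new DynamoDBClient({ retryMode: 'adaptive', maxAttempts: 8 }),
);

export const recordMetric = async (metricName: string, value: number) => {
  // suffix the hash key so writes for a popular metric fan out across partitions
  const shard = Math.floor(Math.random() * 10);
  await doc.send(new PutCommand({
    TableName: 'metrics',
    Item: {
      pk: `${metricName}#${shard}`,
      sk: new Date().toISOString(),
      value,
    },
  }));
};
```

Reads against such a key scheme must query each shard suffix and merge the results, which is the deliberate trade-off for spreading the write load.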
And finally, the most commonly perceived drawback of this solution, vendor lock-in, brings us around full circle to the primary benefit of the solution. In Chapter 1, Understanding Cloud Native Concepts, we discussed how vendor lock-in is rooted in monolithic thinking and that we need to embrace disposable architecture, which is afforded to us by disposable infrastructure and bounded isolated components. The primary drawback of other solutions is that they are difficult and costly to manage, which ultimately increases the time to market and drives us to a shared database model that eliminates proper bulkheads. Their complexity and learning curve also lead to their own form of vendor lock-in. Instead, the cloud-native solution empowers self-sufficient, full-stack teams to embrace disposable architecture, which accelerates time to market. Once a component has proven valuable, and if its implementation is lacking, the team can revisit its design decisions with the confidence that the previous decision was the cost of the information that led to their current level of understanding. In the Data Lake pattern, we will discuss how events can be replayed to seed the data in new and improved implementations of components and thus help alleviate concerns over disposable architecture.
Each cloud provider has its own implementation of the most common database types. AWS has DynamoDB for key-value and document storage, S3 for blob storage, and both Solr and Elasticsearch are options for implementing search engines. Azure has Cosmos DB for key-value, document, and graph storage, Azure Blob Storage, and Azure Search. Cloud providers compete on these offerings, thus a niche database offering can be the impetus for welcoming polyglot cloud. We can expect this competition to drive a great deal of innovation in the coming years.