Machine Learning with Apache Spark Quick Start Guide
上QQ阅读APP看书,第一时间看更新

NoSQL databases

Relational Database Management Systems, such as Microsoft SQL Server, PostgreSQL, and MySQL, allow us to model our data in a structured manner across tables that represent the entities found in our data model that may be identified by primary keys and linked to other entities via foreign keys. These tables are pre-defined, with a schema consisting of columns of various data types that represent the attributes of the entity in question.

For example, Figure 1.5 describes a very simple relational schema that could be utilized by an e-commerce website:

Figure 1.5: A simple relational database model for an e-commerce website

NoSQL, on the other hand, simply refers to the class of databases where data is not modeled in a conventional relational manner. So, if data is not modeled relationally, how is it modeled in NoSQL databases? The answer is that there are various types of NoSQL databases depending on the use case and business application in question. These various types are summarized in the following sub-sections. 

It is a common, but mistaken, assumption that NoSQL is a synonym for distributed databases. In fact, there is an ever increasing list of RDBMS vendors whose products are designed to be scalable and distributed to accommodate huge volumes of structured data. The reason that this mistaken assumption arose is because it is often the case in real-world implementations that NoSQL databases are used to persist huge amounts of structured, semi-structured, and unstructured data in a distributed manner, hence the reason they have become synonymous with distributed databases. However, like relational databases, NoSQL databases are designed to work even on a single machine. It is the way that data is modeled that distinguishes relational, or SQL, databases from NoSQL databases.