Machine Learning with Apache Spark Quick Start Guide
上QQ阅读APP看书,第一时间看更新

Graph databases

Graph databases, such as Neo4j and OrientDB, model data as a collection of vertices (also called nodes) linked together by one or more edges (also called relationships or links). In real-world graph implementations, vertices are often used to represent real-world entities such as individuals, organizations, vehicles, and addresses. Edges are then used to represent relationships between vertices.

Both vertices and edges can have an arbitrary number of key-value pairs, called properties, associated with them. For example, properties associated with an individual vertex may include a name and date of birth. Properties associated with an edge linking an individual vertex with another individual vertex may include the nature and length of the personal relationship. The collection of vertices, edges, and properties together form a data structure called a property graph. Figure 1.6 illustrates a simple property graph representing a small social network:

Figure 1.6: A simple property graph

Graph databases are employed in a wide variety of scenarios where the emphasis is on analyzing the relationships between objects rather than just the object's data attributes themselves. Common use cases include social network analysis, fraud detection, combating serious organized crime, customer recommendation systems, complex reasoning, pattern recognition, blockchain analysis, cyber security, and network intrusion detection.

Apache TinkerPop is an example of a graph computing framework that provides a layer of abstraction between the graph data model and the underlying mechanisms used to store and process graphs. For example, Apache TinkerPop can be used in conjunction with an underlying Apache Cassandra or Apache HBase database to store huge distributed graphs containing billions of vertices and edges partitioned across a cluster. A graph traversal language called Gremlin, a component of the Apache TinkerPop framework, can then be used to traverse and analyze the distributed graph using one of the Gremlin language variants including Gremlin Java, Gremlin Python, Gremlin Scala, and Gremlin JavaScript. To learn more about the Apache TinkerPop framework, please visit http://tinkerpop.apache.org/.