Speed layer - near-real-time data processing
This layer performs near-real-time processing on the data received from the ingestion layer. To meet that expectation, the processing must be fast and efficient, designed for high-concurrency scenarios, and able to work with an eventually consistent outcome. Many factors contribute to making this layer fast; they are discussed in detail later in this book. Broadly, the specifications for such a layer can be summarized as follows:
- Must support fast operations on the specific data streams it ingests.
- Must produce a data model suited to near-real-time processing needs. All long-running processes must be delegated to batch mode.
- Must be backed by fast access and storage layers so that events waiting to be processed do not pile up into a backlog.
- Must, like the batch process, be decoupled from the ingestion layer.
- Must produce its output model in a form that can be merged with the batch-processed dataset to provide enriched enterprise data.
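The specifications above can be illustrated with a minimal Python sketch. It is not a design taken from this book: the `SpeedLayer` class, the `merge_views` function, and the event shape (`key`, `value`) are hypothetical names chosen for illustration, and an in-memory `queue.Queue` stands in for whatever messaging layer actually decouples ingestion from processing. The sketch assumes the near-real-time model is a simple running total per key, which is one of many possible models.

```python
from collections import Counter
from queue import Empty, Queue


class SpeedLayer:
    """Hypothetical speed layer keeping a running total per key for recent events."""

    def __init__(self, buffer: Queue):
        # The buffer decouples this layer from the ingestion layer:
        # ingestion only puts events, the speed layer only takes them.
        self.buffer = buffer
        # In-memory store standing in for a fast access/storage layer.
        self.realtime_view = Counter()

    def drain(self) -> int:
        """Process every event currently buffered and return how many were handled."""
        processed = 0
        while True:
            try:
                event = self.buffer.get_nowait()
            except Empty:
                return processed
            # Cheap incremental update; anything long-running belongs in batch mode.
            self.realtime_view[event["key"]] += event["value"]
            processed += 1


def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Combine batch totals with speed-layer totals that the batch run has not yet seen."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds values key by key
    return dict(merged)


if __name__ == "__main__":
    buffer = Queue()
    for event in (
        {"key": "page_a", "value": 1},
        {"key": "page_b", "value": 2},
        {"key": "page_a", "value": 3},
    ):
        buffer.put(event)  # the ingestion side

    speed = SpeedLayer(buffer)
    speed.drain()

    batch_view = {"page_a": 100, "page_c": 7}  # assumed output of an earlier batch run
    print(merge_views(batch_view, speed.realtime_view))
```

Keeping the per-event work to a single incremental update is what lets the layer keep up with the stream, and shaping the real-time view with the same keys as the batch view is what makes the final merge a simple key-wise addition.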