Machine Learning with Apache Spark Quick Start Guide
上QQ阅读APP看书,第一时间看更新

Ingestion layer

The ingestion layer is responsible for consuming and thereafter moving the source data, no matter what their format and frequency, to either a persistent data store, or directly to a downstream data processing engine. The ingestion layer should be capable of supporting both batch data and stream-based event data. Examples of open source technologies used to implement the ingestion layer include the following:

  • Apache Sqoop
  • Apache Kafka
  • Apache Flume