Machine Learning with Apache Spark Quick Start Guide

上QQ阅读APP看书，第一时间看更新

Stage

If the Spark job and, hence, the action that resulted in the launching of that job, involves the shuffling of data (that is, the redistribution of data), then that job is broken down into stages. A new stage begins when network communication is required between the worker nodes. An individual stage is therefore defined as a collection of tasks processed by an individual executor with no dependency on other executors.