Python Machine Learning By Example
上QQ阅读APP看书,第一时间看更新

Bagging

Bootstrap aggregating or bagging is an algorithm introduced by Leo Breiman in 1994, which applies bootstrapping to machine learning problems. Bootstrapping is a statistical procedure that creates datasets from existing data by sampling with replacement. Bootstrapping can be used to analyze the possible values that arithmetic mean, variance, or other quantity can assume.

The algorithm aims to reduce the chance of overfitting with the following steps:

  1. We generate new training sets from input train data by sampling with replacement
  2. For each generated training set, we fit a new model
  3. We combine the results of the models by averaging or majority voting

The following diagram illustrates the steps for bagging, using classification as an example:

We'll explore how to employ bagging mainly in Chapter 6, Predicting Online Ads Click-Through with Tree-Based Algorithms.