Advanced Machine Learning with R

Elastic net

The power of elastic net is that it performs feature selection, which ridge regression does not, and it groups correlated features, which LASSO fails to do. Recall that LASSO tends to select one feature from a group of correlated ones and ignore the rest. Elastic net achieves this by including a mixing parameter, alpha, in conjunction with lambda. Alpha lies between 0 and 1 and, as before, lambda regulates the size of the penalty. Note that an alpha of 0 is equivalent to ridge regression and an alpha of 1 is equivalent to LASSO. Essentially, we're blending the L1 and L2 penalties by including a second tuning parameter alongside a quadratic (squared) term of the beta coefficients. We end up with the goal of minimizing $\left(\mathrm{RSS} + \lambda\left[(1-\alpha)\frac{\sum_j |\beta_j|^2}{2} + \alpha \sum_j |\beta_j|\right]\right)/N$.
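To make the role of alpha concrete, here is a minimal base R sketch of the penalty term alone, that is, the bracketed expression multiplied by lambda. The function name elastic_net_penalty and the coefficient values are hypothetical, chosen only for illustration; they are not part of any package or of the dataset used later.

```r
# Hypothetical helper (not from any package): the elastic net penalty
# lambda * [(1 - alpha) * sum(beta^2) / 2 + alpha * sum(|beta|)].
elastic_net_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  l2 <- sum(beta^2) / 2  # ridge (L2) part, quadratic in the coefficients
  l1 <- sum(abs(beta))   # LASSO (L1) part
  lambda * ((1 - alpha) * l2 + alpha * l1)
}

# Made-up coefficients and lambda, for illustration only
beta <- c(2, -1, 0.5)
lambda <- 0.1

ridge_pen <- elastic_net_penalty(beta, lambda, alpha = 0)    # pure L2 penalty
lasso_pen <- elastic_net_penalty(beta, lambda, alpha = 1)    # pure L1 penalty
blend_pen <- elastic_net_penalty(beta, lambda, alpha = 0.5)  # equal mix of both

print(c(ridge = ridge_pen, lasso = lasso_pen, blend = blend_pen))
```

With alpha set to 0 only the squared term survives, and with alpha set to 1 only the absolute-value term survives, which is why those two settings reduce to ridge regression and LASSO respectively. In practice you would not compute this by hand: the glmnet() function in the glmnet package takes alpha and lambda arguments that play these roles.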

Let's put these techniques to the test. We'll utilize a dataset I created to demonstrate the methods. In the next section, I'll discuss how I created the dataset with a few predictive features and some noise features, including those with high correlation. I recommend that, once you feel comfortable with this chapter's content, you go back and apply these techniques to the data examined in the prior two chapters and compare their performance.