Machine Learning Algorithms(Second Edition)

Whitening

The StandardScaler class operates in a feature-wise fashion; sometimes, however, it's useful to transform the whole dataset so as to force it to have an identity covariance matrix, C' = I (to improve the performance of many algorithms that are sensitive to correlations between the components).

The goal is to find a transformation matrix A (called the whitening matrix) so that the new dataset X' = XA^T has an identity covariance matrix C' (we are assuming that X is zero-centered or, alternatively, that it has zero mean). The procedure is quite simple (it can be found in Mastering Machine Learning Algorithms, Bonaccorso G., Packt Publishing, 2018), but it requires some linear algebra manipulations, so in this context we directly provide the final result. Starting from C' ∝ X'^T X' = A X^T X A^T, it's possible to exploit the singular value decomposition (SVD) (see the section on PCA) of X^T X:

X^T X = V Ω V^T

Ω is a diagonal matrix containing the eigenvalues of X^T X, while the columns of V are the corresponding eigenvectors (the reader who is not familiar with this concept can simply consider this technique as a way to factorize a matrix). Considering the last equation, we obtain the whitening matrix:

A = Ω^(-1/2) V^T

This is the whitening matrix for the dataset X. As Ω is diagonal, the square root of its inverse is simply the diagonal matrix of the reciprocal square roots of the eigenvalues:

Ω^(-1/2) = diag(1/√ω_1, 1/√ω_2, ..., 1/√ω_m)

The Python code to perform this operation (based on the SVD provided by NumPy) is as follows:

import numpy as np


def zero_center(X):
    # Subtract the per-feature mean so that the dataset has zero mean
    return X - np.mean(X, axis=0)


def whiten(X, correct=True):
    Xc = zero_center(X)

    # SVD of the zero-centered dataset; the singular values L are the
    # square roots of the eigenvalues of Xc^T Xc
    _, L, V = np.linalg.svd(Xc, full_matrices=False)

    # W = V Ω^(-1/2) is the transpose of the whitening matrix A
    W = np.dot(V.T, np.diag(1.0 / L))

    # The sqrt(n) correction rescales the result so that the sample
    # covariance matrix is approximately the identity
    return np.dot(Xc, W) * (np.sqrt(X.shape[0]) if correct else 1.0)
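To verify the behavior, here is a minimal self-contained sketch (the bivariate Gaussian parameters, the seed, and the sample size are arbitrary choices) that generates a dataset with a non-diagonal covariance matrix and whitens it:

```python
import numpy as np

np.random.seed(1000)

# Hypothetical dataset: a bivariate Gaussian with strongly correlated components
mean = [0.0, 0.0]
cov = [[2.0, 1.5],
       [1.5, 2.0]]
X = np.random.multivariate_normal(mean, cov, size=1000)


def zero_center(X):
    return X - np.mean(X, axis=0)


def whiten(X, correct=True):
    Xc = zero_center(X)
    _, L, V = np.linalg.svd(Xc, full_matrices=False)
    W = np.dot(V.T, np.diag(1.0 / L))
    return np.dot(Xc, W) * (np.sqrt(X.shape[0]) if correct else 1.0)


Xw = whiten(X)

# After whitening, the off-diagonal terms collapse toward zero
# and the variances toward one
print(np.round(np.cov(X.T), 3))
print(np.round(np.cov(Xw.T), 3))
```

Note that the sample covariance is the identity only up to a factor of n/(n - 1), which is why the diagonal entries are slightly above one for finite datasets.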

In the following graph, there's an example based on the original dataset with a non-diagonal covariance matrix:

Original dataset (left), and the whitened one (right)

As it's possible to see, the whitened dataset is symmetric, and its covariance matrix is approximately the identity (Xw is the output of the transformation):

import numpy as np

print(np.cov(Xw.T))

[[1.00100100e+00 5.26327952e-16]
 [5.26327952e-16 1.00100100e+00]]

To better understand the impact, I invite the reader to test this function with other datasets, comparing the performance of the same algorithms with and without whitening. It's important to remember that the whitening procedure works on the whole dataset, so it may be unusable when the training process is performed online. However, in the majority of cases, it can be employed without restrictions and can provide a concrete benefit in terms of both training speed and accuracy.
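One concrete way to see why whitening can speed up training (a sketch; the dataset is an arbitrary, strongly correlated Gaussian) is to compare the condition number of the covariance matrix before and after the transformation, since gradient-based optimizers tend to converge faster when it is close to 1:

```python
import numpy as np

np.random.seed(1000)

# Hypothetical strongly correlated dataset
X = np.random.multivariate_normal([0.0, 0.0],
                                  [[2.0, 1.9],
                                   [1.9, 2.0]], size=500)


def whiten(X):
    Xc = X - np.mean(X, axis=0)
    _, L, V = np.linalg.svd(Xc, full_matrices=False)
    return np.dot(Xc, np.dot(V.T, np.diag(1.0 / L))) * np.sqrt(X.shape[0])


Xw = whiten(X)

# Condition number of the covariance matrix: large before whitening
# (ill-conditioned), approximately 1 afterwards
print(np.linalg.cond(np.cov(X.T)))
print(np.linalg.cond(np.cov(Xw.T)))
```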