
How it works...
The centering and scaling function is extremely simple: for each feature, it subtracts the mean and divides by the standard deviation.
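This is easy to verify by hand. The following is a minimal sketch, using a small toy matrix in place of the X from the recipe, showing that preprocessing.scale is equivalent to subtracting each column's mean and dividing by its (population) standard deviation:

```python
import numpy as np
from sklearn import preprocessing

# Toy feature matrix standing in for X from the recipe.
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

# Subtract the column mean, divide by the column standard deviation.
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)
scaled = preprocessing.scale(X_toy)

print(np.allclose(manual, scaled))  # True
```

Note that scikit-learn uses the population standard deviation (ddof=0), which matches NumPy's default.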
Pictorially and with pandas, the third feature looks as follows before the transformation:
pd.Series(X[:,2]).hist(bins=50)

This is what it looks like afterward:
pd.Series(preprocessing.scale(X[:, 2])).hist(bins=50)

The shape of the distribution is unchanged; only the scale on the x axis has changed.
In addition to the function, there is also a centering and scaling class, which is particularly useful in conjunction with pipelines (covered later). Unlike the one-off function, the class persists the fitted mean and standard deviation, so the same scaling can be reapplied later:
my_scaler = preprocessing.StandardScaler()
my_scaler.fit(X[:, :3])
my_scaler.transform(X[:, :3]).mean(axis=0)
array([ 6.34099712e-17, -6.34319123e-16, -2.68291099e-15])
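That persistence is the point of the fit/transform split: the scaler stores the training statistics and applies them unchanged to any later data. A small sketch, using hypothetical train and new arrays rather than the recipe's X:

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical "training" batch; its mean and std are stored on the scaler.
train = np.array([[0.0], [2.0], [4.0]])
new = np.array([[2.0], [6.0]])

my_scaler = preprocessing.StandardScaler()
my_scaler.fit(train)

# New data is shifted and scaled using the *training* statistics,
# not its own mean and standard deviation.
print(my_scaler.transform(new))
```

Here 2.0 maps to 0 because it equals the training mean, while 6.0 maps to a positive value determined by the training standard deviation.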
Scaling features to a mean of zero and a standard deviation of one isn't the only useful kind of scaling.
The preprocessing module also contains a MinMaxScaler class, which scales the data to within a given range:
my_minmax_scaler = preprocessing.MinMaxScaler()
my_minmax_scaler.fit(X[:, :3])
my_minmax_scaler.transform(X[:, :3]).max(axis=0)
array([ 1., 1., 1.])
my_minmax_scaler.transform(X[:, :3]).min(axis=0)
array([ 0., 0., 0.])
It's very simple to change the minimum and maximum values of the MinMaxScaler class from its defaults of 0 and 1, respectively:
my_odd_scaler = preprocessing.MinMaxScaler(feature_range=(-3.14, 3.14))
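The custom range is easy to verify: after fitting, each column's minimum and maximum land exactly on the requested endpoints. A quick check with a small made-up matrix:

```python
import numpy as np
from sklearn import preprocessing

X_part = np.array([[1.0, 5.0],
                   [2.0, 7.0],
                   [3.0, 9.0]])

my_odd_scaler = preprocessing.MinMaxScaler(feature_range=(-3.14, 3.14))
out = my_odd_scaler.fit_transform(X_part)

print(out.min(axis=0))  # [-3.14 -3.14]
print(out.max(axis=0))  # [ 3.14  3.14]
```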
Another option is normalization, which scales each sample (row) to have a length of one (the L2 norm by default). This is different from the previous methods, which scaled the features (columns). Normalization is illustrated in the following command:
normalized_X = preprocessing.normalize(X[:, :3])
If it's not apparent why this is useful, consider the Euclidean distance (often used as a proxy for similarity) between three samples: one with the values (1, 1, 0), another with (3, 3, 0), and a third with (1, -1, 0).
The distance between the first and third vector is less than the distance between the first and second although the first and third are orthogonal, whereas the first and second only differ by a scalar factor of three. Since distances are often used as measures of similarity, not normalizing the data first can be misleading.
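The effect is easy to check numerically. The sketch below computes the pairwise distances for those three samples before and after normalization: the raw distances favor the orthogonal pair, while after normalization the parallel pair coincides and the orthogonal pair stays apart:

```python
import numpy as np
from sklearn import preprocessing

v = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [1.0, -1.0, 0.0]])

# Raw Euclidean distances: the orthogonal pair (rows 0 and 2)
# is closer than the parallel pair (rows 0 and 1).
d12 = np.linalg.norm(v[0] - v[1])   # sqrt(8) ~ 2.83
d13 = np.linalg.norm(v[0] - v[2])   # 2.0

# After unit-norm scaling, rows 0 and 1 become the same vector.
u = preprocessing.normalize(v)
nd12 = np.linalg.norm(u[0] - u[1])  # 0.0
nd13 = np.linalg.norm(u[0] - u[2])  # sqrt(2) ~ 1.41

print(d12, d13, nd12, nd13)
```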
From an alternative perspective, try the following syntax:
(normalized_X * normalized_X).sum(axis=1)
array([ 1.,  1.,  1., ...,  1.,  1.,  1.])
All the rows are normalized: each is a vector of length one. In three dimensions, all normalized vectors lie on the surface of a sphere centered at the origin. The only information left is the direction of each vector, because normalizing divides a vector by its own length. Remember, though, that this operation fixes the origin at (0, 0, 0) and treats every row of the array as a vector relative to that origin.