Visualization is a good first step
Datasets later in the book will grow to thousands of features. With only four features in our starting example, we can easily plot all six two-dimensional projections on a single page and build intuitions that can then be extended to large datasets with many more features. As we saw in Chapter 3, Regression, visualizations are excellent in the initial exploratory phase of the analysis: they let you learn the general features of your problem and catch, early on, problems that occurred during data collection.
Each subplot in the following plot shows all points projected onto two of the dimensions. The outlying group (triangles) consists of the Iris Setosa plants, while the Iris Versicolor plants are in the center (circles) and the Iris Virginica plants are plotted with x marks. We can see that there are two large groups: one of Iris Setosa and another that is a mixture of Iris Versicolor and Iris Virginica:
Here is the code to load the dataset (you can find the plotting code in the online repository):
from sklearn.datasets import load_iris
data = load_iris()
features = data.data
feature_names = data.feature_names
target = data.target
target_names = data.target_names
labels = target_names[target]
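The plotting code itself lives in the online repository and is not reproduced here. As a rough idea of what such a plot involves, the following is a minimal sketch, not the repository's version. It assumes matplotlib is installed; the 2-by-3 grid layout, the colors, and the output filename `iris_projections.png` are illustrative choices, while the markers follow the description above (triangles, circles, and x marks).

```python
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # assumption: render to a file; drop this line for an interactive window
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

data = load_iris()
features = data.data
feature_names = data.feature_names
target = data.target
target_names = data.target_names

# Four features give six unordered pairs, one per subplot
pairs = list(combinations(range(features.shape[1]), 2))

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

# One (marker, color) pair per class: Setosa, Versicolor, Virginica.
# The colors are arbitrary; the markers match the text.
styles = [(">", "r"), ("o", "g"), ("x", "b")]

for ax, (i, j) in zip(axes.flat, pairs):
    for t, (marker, color) in enumerate(styles):
        mask = target == t  # select the rows belonging to class t
        ax.scatter(features[mask, i], features[mask, j],
                   marker=marker, color=color, label=target_names[t])
    ax.set_xlabel(feature_names[i])
    ax.set_ylabel(feature_names[j])

axes.flat[0].legend()
fig.tight_layout()
fig.savefig("iris_projections.png")  # hypothetical output filename
```

Drawing each class with its own `scatter` call keeps the marker assignment explicit and lets the legend pick up the class names directly.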