scikit-learn Cookbook(Second Edition)
上QQ阅读APP看书,第一时间看更新

There's more...

To summarize, we will do all the steps with a different algorithm, logistic regression:

  1. First, import LogisticRegression:
from sklearn.linear_model import LogisticRegression
  1. Then write a program with the modeling steps:
    1. Split the data into training and testing sets.
    2. Fit the logistic regression model.
    3. Predict using the test observations.
    4. Measure the accuracy of the predictions with y_test versus y_pred:
import matplotlib.pyplot as plt
from sklearn import datasets

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = iris.data[:, :2] #load the iris data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

#train the model
clf = LogisticRegression(random_state = 1)
clf.fit(X_train, y_train)

#predict with Logistic Regression
y_pred = clf.predict(X_test)

#examine the model accuracy
accuracy_score(y_test,y_pred)

0.60526315789473684

This number is lower; yet we cannot make any conclusions comparing the two models, SVC and logistic regression classification. We cannot compare them, because we were not supposed to look at the test set for our model. If we made a choice between SVC and logistic regression, the choice would be part of our model as well, so the test set cannot be involved in the choice. Cross-validation, which we will look at next, is a way to choose between models.