Understanding the testing metrics
In this section, we will look at two approaches for evaluating the performance of the trained ML model: a testing metric and visualization. They are as follows:
- The default testing metric
- The visualization approach
The default testing metric
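We are using the default score API of scikit-learn to check how well the ML model is performing. For a regression model such as the one in this application, the score function returns the coefficient of determination, R2 (R squared), which is defined by the following equation:
$$R^2 = 1 - \frac{u}{v}$$
Here, u indicates the residual sum of squares. The equation for u is as follows:
$$u = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $y_i$ is the true label of the i-th test sample, $\hat{y}_i$ is the model's prediction for that sample, and n is the number of test samples.
The variable v indicates the total sum of squares. The equation for v is as follows:
$$v = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
where $\bar{y}$ is the mean of the true labels.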
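To make these definitions concrete, the following minimal sketch computes R2 directly as 1 - u/v, with u = Σ(y_i - ŷ_i)² and v = Σ(y_i - ȳ)², and checks the result against scikit-learn's r2_score function. The helper name r_squared is hypothetical, and the labels and predictions are toy values chosen for illustration, not data from this application.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - u/v from the residual and total sums of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    v = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - u / v


# Toy labels and predictions (illustrative values only)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

manual = r_squared(y_true, y_pred)
library = r2_score(y_true, y_pred)
print(round(manual, 4), round(library, 4))  # 0.9486 0.9486
```

Here u = 1.5 and v = 29.1875, so R2 = 1 - 1.5/29.1875 ≈ 0.9486 by both routes.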
The best possible score is 1.0. The score can also be negative, because a model can be arbitrarily worse than simply predicting the mean. A constant model that always predicts the expected (mean) value of the label y, disregarding the input features, will get an R2 score of 0.0.
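The boundary cases described above can be checked numerically. This sketch uses a small hypothetical helper, r_squared, that implements 1 - u/v, and toy labels (not the application's data). The "poor model" simply returns the labels in reversed order, a choice made only to produce a negative score.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - u/v (residual sum of squares over total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = np.sum((y_true - y_pred) ** 2)
    v = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - u / v


y_true = np.array([3.0, -0.5, 2.0, 7.0])

# Constant model: always predicts the mean label, ignoring any features
constant_pred = np.full_like(y_true, y_true.mean())

# Deliberately poor model: the true labels in reversed order
poor_pred = y_true[::-1]

print(r_squared(y_true, constant_pred))       # 0.0
print(round(r_squared(y_true, poor_pred), 4))  # -0.5246
```

For the constant model u equals v, so the score is exactly 0.0; the reversed predictions give u = 44.5, which exceeds v = 29.1875 and pushes the score below zero.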
To obtain the score, we just need to call the score function on the test data. The code for testing is the same as that in the Testing the baseline model section. Now let's look at another testing approach that is helpful for comparing the model's output with the true test labels. So, let's check that out!
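As a rough illustration of calling the score function, the sketch below fits a model on synthetic data and evaluates it on a held-out split. The synthetic dataset and the choice of LinearRegression are assumptions made for this example, not the baseline model used in this application. For any scikit-learn regressor, score(X, y) returns the R2 of the model's predictions on X against y, so the result matches r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (hypothetical stand-in for the real dataset)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# LinearRegression is an assumed stand-in for the trained model
model = LinearRegression().fit(X_train, y_train)

# For regressors, score() returns R^2 on the data passed in
test_score = model.score(X_test, y_test)
print(f"R^2 on the test set: {test_score:.3f}")

# The same value, computed explicitly from the predictions
assert np.isclose(test_score, r2_score(y_test, model.predict(X_test)))
```

Passing the test split (rather than the training split) to score is what makes this a test-set evaluation.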
The visualization approach
In this section, we will explore an effective and intuitive approach: visualizing the predicted output against the real output. Because such graphs are easy to read, they show at a glance where the predictions deviate from the true values, which helps you decide on the next steps for improving the model.
In this application, we will plot the actual prices from the testing dataset together with the predicted prices for the testing dataset; comparing the two shows how good or bad the predictions are. You will find the code and the graph for this process in the next section, named Testing the baseline model.
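The following is a minimal sketch of the visualization approach, assuming matplotlib is available. It is not the code from the Testing the baseline model section: the function name plot_actual_vs_predicted and the price values are hypothetical. It draws the actual and predicted test-set prices on the same axes so that gaps between the two lines stand out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(actual, predicted, title="Actual vs. predicted prices"):
    """Plot true and predicted test-set values on the same axes."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(actual, label="Actual price")
    ax.plot(predicted, label="Predicted price", linestyle="--")
    ax.set_xlabel("Test sample index")
    ax.set_ylabel("Price")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Hypothetical example values; in practice use the test labels and predictions
actual = np.array([101.2, 102.5, 101.9, 103.4, 104.0])
predicted = np.array([100.8, 102.0, 102.6, 103.1, 104.5])

fig, ax = plot_actual_vs_predicted(actual, predicted)
```

In practice you would pass the true test labels and the model's predictions for the test set, then call fig.savefig("actual_vs_predicted.png") or plt.show() (with an interactive backend) to view the chart.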