The Deep Learning Workshop

Keras as a High-Level API

In TensorFlow 1.0, there were several APIs, such as Estimator, Contrib, and layers. In TensorFlow 2.0, Keras is very tightly integrated with TensorFlow, and it provides a high-level API that is user-friendly, modular, composable, and easy to extend in order to build and train deep learning models. This also makes developing code for neural networks much easier. Let's see how it works.
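
As a quick preview, here is a minimal sketch (using randomly generated placeholder data rather than the dataset from the exercises below) of how a model can be defined, compiled, and trained in just a few lines:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder data: 100 samples, 4 features each, with a binary label
x = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=(100, 1))

# Define, compile, and train a small model with the Keras API
model = Sequential()
model.add(Dense(units=8, input_dim=4, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', \
              metrics=['accuracy'])
model.fit(x, y, epochs=5)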

Exercise 2.05: Binary Classification Using Keras

In this exercise, we will implement a very simple binary classifier with a single neuron using the Keras API. We will use the same data.csv file that we used in Exercise 2.02, Perceptron as a Binary Classifier:

Note

The dataset can be downloaded from GitHub by accessing the following GitHub link: https://packt.live/2BVtxIf.

  1. Import the required libraries:

    import tensorflow as tf

    import pandas as pd

    import matplotlib.pyplot as plt

    %matplotlib inline

    # Import Keras libraries

    from tensorflow.keras.models import Sequential

    from tensorflow.keras.layers import Dense

    In the code, Sequential is the type of Keras model that we will be using because it is very easy to add layers to it. Dense is the type of layer that will be added. These are the regular neural network layers as opposed to the convolutional layers or pooling layers that will be used later on.

  2. Import the data:

    df = pd.read_csv('data.csv')

  3. Inspect the data:

    df.head()

    The following will be the output:

    Figure 2.20: Contents of the DataFrame

  4. Visualize the data using a scatter plot:

    plt.scatter(df[df['label'] == 0]['x1'], \

                df[df['label'] == 0]['x2'], marker='*')

    plt.scatter(df[df['label'] == 1]['x1'], \

                df[df['label'] == 1]['x2'], marker='<')

    The resulting plot is as follows, with the x-axis denoting x1 values and the y-axis denoting x2 values:

    Figure 2.21: Scatter plot of the data

  5. Prepare the data by separating the features and the labels as NumPy arrays:

    x_input = df[['x1','x2']].values

    y_label = df[['label']].values

  6. Create a neural network model consisting of a single layer with a neuron and a sigmoid activation function:

    model = Sequential()

    model.add(Dense(units=1, input_dim=2, activation='sigmoid'))

    The parameters in model.add(Dense()) are as follows: units is the number of neurons in the layer; input_dim is the number of input features, which in this case is 2; and activation is the activation function, which here is sigmoid.

  7. Once the model is created, we use the compile method to pass the additional parameters that are needed for training, such as the type of the optimizer, the loss function, and so on:

    model.compile(optimizer='adam', \

                  loss='binary_crossentropy',\

                  metrics=['accuracy'])

    In this case, we are using the adam optimizer, which is an enhanced version of the gradient descent optimizer, and the loss function is binary_crossentropy, since this is a binary classifier.

    The metrics parameter is almost always set to ['accuracy'], which makes Keras report the accuracy alongside the loss during both training and evaluation.

  8. The model is now ready to be trained. However, it is a good idea to check the configuration of the model by using the summary function:

    model.summary()

    The output will be as follows:

    Figure 2.22: Summary of the sequential model

  9. Train the model by calling the fit() method:

    model.fit(x_input, y_label, epochs=1000)

    It takes the features and labels as the data parameters along with the number of epochs, which in this case is 1000. The model will start training and will continuously provide the status as shown here:

    Figure 2.23: Model training logs using Keras

  10. We will evaluate our model using Keras's evaluate functionality:

    model.evaluate(x_input, y_label)

    The output is as follows:

    21/21 [==============================] - 0s 611us/sample - loss: 0.2442 - accuracy: 1.0000

    [0.24421504139900208, 1.0]

    As you can see, our Keras model is able to train well, as our accuracy is 100%.

    Note

    To access the source code for this specific section, please refer to https://packt.live/2ZVV1VY.

    You can also run this example online at https://packt.live/38CzhTc. You must execute the entire Notebook in order to get the desired result.

In this exercise, we have learned how to build a perceptron using Keras. As you have seen, Keras makes the code more modular and more readable, and the parameters easier to tweak. In the next section, we will see how to build a multilayer or deep neural network using Keras.
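
For example, here is a sketch of one possible tweak, assuming the model, x_input, and y_label from Exercise 2.05 are still in scope: swapping the Adam optimizer for plain stochastic gradient descent with a custom learning rate only requires changing the compile step before training again.

from tensorflow.keras.optimizers import SGD

# Recompile the same model with a different optimizer and learning rate
model.compile(optimizer=SGD(learning_rate=0.01), \
              loss='binary_crossentropy', \
              metrics=['accuracy'])
model.fit(x_input, y_label, epochs=1000)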

Multilayer Neural Network or Deep Neural Network

In the previous example, we developed a single-layer neural network, often referred to as a shallow neural network. A diagram of this follows:

Figure 2.24: Shallow neural network

One layer of neurons is not sufficient to solve more complex problems, such as face recognition or object detection. You need to stack up multiple layers. This is often referred to as creating a deep neural network. A diagram of this follows:

Figure 2.25: Deep neural network

Before we jump into the code, let's try to understand how this works. Input data is fed to the neurons in the first layer. It must be noted that every input is fed to every neuron in the first layer, and every neuron has one output. The output from each neuron in the first layer is fed to every neuron in the second layer. The output of each neuron in the second layer is fed to every neuron in the third layer, and so on.

That is why this kind of network is also referred to as a dense neural network or a fully connected neural network. There are other types of neural networks that are wired differently, such as CNNs, but we will discuss those in the next chapter. There is no set rule about the number of neurons in each layer. This is usually determined by trial and error in a process known as hyperparameter tuning (which we'll learn about later in the chapter). However, when it comes to the number of neurons in the last layer, there are some restrictions. The configuration of the last layer is determined as follows:

Figure 2.26: Last layer configuration
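
To make the idea that every output of one layer feeds every neuron of the next more concrete, here is a small sketch, with layer sizes chosen purely for illustration, of what each fully connected layer computes: a matrix multiplication of its inputs with a weight matrix, plus a bias, passed through an activation function:

import tensorflow as tf

# A batch of 4 samples, each with 2 input features
x = tf.random.uniform((4, 2))

# First layer: 2 inputs -> 3 neurons; every input feeds every neuron
w1 = tf.random.uniform((2, 3))
b1 = tf.zeros((3,))
layer1_out = tf.nn.relu(tf.matmul(x, w1) + b1)    # shape (4, 3)

# Second layer: every output of layer 1 feeds each of its 2 neurons
w2 = tf.random.uniform((3, 2))
b2 = tf.zeros((2,))
layer2_out = tf.nn.relu(tf.matmul(layer1_out, w2) + b2)    # shape (4, 2)

tf.print(tf.shape(layer2_out))    # [4 2]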

ReLU Activation Function

One last thing to do before we implement the code for deep neural networks is learn about the ReLU activation function. This is one of the most popular activation functions used in multilayer neural networks.

ReLU is short for Rectified Linear Unit. The output of the ReLU function is always non-negative (greater than or equal to 0):

Figure 2.27: ReLU activation function

The mathematical expression for ReLU is f(x) = max(0, x), as shown here:

Figure 2.28: ReLU activation function

Networks that use ReLU typically converge much more quickly than those that use the sigmoid activation function, which is why it is by far the most widely used activation function. ReLU is used in almost every deep neural network, in all the layers except the last one, where either sigmoid or Softmax is used.

The ReLU activation function is provided by TensorFlow out of the box. To see how it is implemented, let's give some sample input values to a ReLU function and see the output:

values = tf.Variable([1.0, -2., 0., 0.3, -1.5], dtype=tf.float32)

output = tf.nn.relu(values)

tf.print(output)

The output is as follows:

[1 0 0 0.3 0]

As you can see, all the positive values are retained, and the negative values are suppressed to zero. Let's use this ReLU activation function in the next exercise to do a multilayer binary classification task.

Exercise 2.06: Multilayer Binary Classifier

In this exercise, we will implement a multilayer binary classifier using the data.csv file that we used in Exercise 2.02, Perceptron as a Binary Classifier.

We will build a binary classifier with a deep neural network of the following configuration: an input layer with 2 nodes, 2 hidden layers (the first with 50 neurons and the second with 20 neurons), and, lastly, an output layer with a single neuron that makes the final binary prediction:

Note

The dataset can be downloaded from GitHub using the following link: https://packt.live/2BVtxIf.

  1. Import the required libraries and packages:

    import tensorflow as tf

    import pandas as pd

    import matplotlib.pyplot as plt

    %matplotlib inline

    # Import Keras libraries

    from tensorflow.keras.models import Sequential

    from tensorflow.keras.layers import Dense

  2. Import and inspect the data:

    df = pd.read_csv('data.csv')

    df.head()

    The output is as follows:

    Figure 2.29: The first five rows of the data

  3. Visualize the data using a scatter plot:

    plt.scatter(df[df['label'] == 0]['x1'], \

                df[df['label'] == 0]['x2'], marker='*')

    plt.scatter(df[df['label'] == 1]['x1'], \

                df[df['label'] == 1]['x2'], marker='<')

    The resulting output is as follows, with the x-axis showing x1 values and the y-axis showing x2 values:

    Figure 2.30: Scatter plot for given data

  4. Prepare the data by separating the features and the labels as NumPy arrays:

    x_input = df[['x1','x2']].values

    y_label = df[['label']].values

  5. Build the Sequential model:

    model = Sequential()

    model.add(Dense(units=50, input_dim=2, activation='relu'))

    model.add(Dense(units=20, activation='relu'))

    model.add(Dense(units=1, activation='sigmoid'))

    Here are a couple of points to consider. We provide the input details only for the first layer and use the ReLU activation function for all the intermediate layers, as discussed earlier. The last layer has just one neuron with a sigmoid activation function, as required for a binary classifier.

  6. Provide the training parameters using the compile method:

    model.compile(optimizer='adam', \

                  loss='binary_crossentropy', metrics=['accuracy'])

  7. Inspect the model configuration using the summary function:

    model.summary()

    The output will be as follows:

    Figure 2.31: Deep neural network model summary using Keras

    In the model summary, we can see that there are a total of 1191 parameters—weights and biases—to learn across the hidden layers to the output layer.

  8. Train the model by calling the fit() method:

    model.fit(x_input, y_label, epochs=50)

    Notice that, in this case, the model reaches 100% accuracy within 50 epochs, unlike the single-layer model, which needed about 1,000 epochs:

    Figure 2.32: Multilayer model train logs

  9. Let's evaluate the model's performance:

    model.evaluate(x_input, y_label)

    The output is as follows:

    21/21 [==============================] - 0s 6ms/sample - loss: 0.1038 - accuracy: 1.0000

    [0.1037961095571518, 1.0]

    Our model has now been trained and demonstrates 100% accuracy.

    Note

    To access the source code for this specific section, please refer to https://packt.live/2ZUkM94.

    You can also run this example online at https://packt.live/3iKsD1W. You must execute the entire Notebook in order to get the desired result.

In this exercise, we learned how to build a multilayer neural network using Keras. This is a binary classifier. In the next exercise, we will build a deep neural network for a multiclass classifier with the MNIST dataset.

Exercise 2.07: Deep Neural Network on MNIST Using Keras

In this exercise, we will perform multiclass classification by implementing a deep (multilayer) neural network for the MNIST dataset. Our input layer takes the 28 x 28 pixel images flattened into 784 input nodes, followed by 2 hidden layers, the first with 50 neurons and the second with 20 neurons. Lastly, there will be a Softmax layer consisting of 10 neurons, since we are classifying the handwritten digits into 10 classes:

  1. Import the required libraries and packages:

    import tensorflow as tf

    import pandas as pd

    import matplotlib.pyplot as plt

    %matplotlib inline

    # Import Keras libraries

    from tensorflow.keras.models import Sequential

    from tensorflow.keras.layers import Dense

    from tensorflow.keras.layers import Flatten

  2. Load the MNIST data:

    mnist = tf.keras.datasets.mnist

    (train_features,train_labels), (test_features,test_labels) = \

    mnist.load_data()

    train_features has the training images in the form of 28 x 28 pixel values.

    train_labels has the training labels. Similarly, test_features has the test images in the form of 28 x 28 pixel values. test_labels has the test labels.

  3. Normalize the data:

    train_features, test_features = train_features / 255.0, \

                                    test_features / 255.0

    The pixel values of the images range from 0 to 255. We need to normalize the values by dividing them by 255 so that the range goes from 0 to 1.

  4. Build the sequential model:

    model = Sequential()

    model.add(Flatten(input_shape=(28,28)))

    model.add(Dense(units = 50, activation = 'relu'))

    model.add(Dense(units = 20 , activation = 'relu'))

    model.add(Dense(units = 10, activation = 'softmax'))

    There are a couple of points to note. The first layer in this case is not actually a layer of neurons but a Flatten function. This flattens the 28 x 28 image into a single array of 784 values, which is fed to the first hidden layer of 50 neurons. The last layer has 10 neurons corresponding to the 10 classes, with a softmax activation function.

  5. Provide training parameters using the compile method:

    model.compile(optimizer = 'adam', \

                  loss = 'sparse_categorical_crossentropy', \

                  metrics = ['accuracy'])

    Note

    The loss function used here is different from the one used for the binary classifier. For a multiclass classifier, one of the following loss functions is used: sparse_categorical_crossentropy, when the labels are not one-hot encoded, as in this case; or categorical_crossentropy, when the labels are one-hot encoded (a short sketch contrasting the two appears at the end of this section).

  6. Inspect the model configuration using the summary function:

    model.summary()

    The output is as follows:

    Figure 2.33: Deep neural network summary

    In the model summary, we can see that there are a total of 40,480 parameters—weights and biases—to learn across the hidden layers to the output layer.

  7. Train the model by calling the fit method:

    model.fit(train_features, train_labels, epochs=50)

    The output will be as follows:

    Figure 2.34: Deep neural network training logs

  8. Test the model by calling the evaluate() function:

    model.evaluate(test_features, test_labels)

    The output will be:

    10000/10000 [==============================] - 1s 76us/sample - loss: 0.2072 - accuracy: 0.9718

    [0.20719025060918111, 0.9718]

    Now that the model is trained and tested, in the next few steps, we will run a prediction on a sample image from the test set.

  9. Load an image from the test dataset. Let's take the image at index 200:

    loc = 200

    test_image = test_features[loc]

  10. Let's see the shape of the image using the following command:

    test_image.shape

    The output is:

    (28,28)

    We can see that the shape of the image is 28 x 28. However, the model expects a 3-dimensional input, where the first dimension is the batch size, so we need to reshape the image accordingly.

  11. Use the following code to reshape the image:

    test_image = test_image.reshape(1,28,28)

  12. Let's call the predict() method of the model and store the output in a variable called result:

    result = model.predict(test_image)

    print(result)

    result has the output in the form of 10 probability values, as shown here:

    [[2.9072076e-28 2.1215850e-29 1.7854708e-21

      1.0000000e+00 0.0000000e+00 1.2384960e-15

      1.2660366e-34 1.7712217e-32 1.7461657e-08

      9.6417470e-29]]

  13. The position of the highest value will be the prediction. Let's use the argmax function we learned about in the previous chapter to find out the prediction:

    result.argmax()

    In this case, it is 3:

    3

  14. In order to check whether the prediction is correct, we check the label of the corresponding image:

    test_labels[loc]

    Again, the value is 3:

    3

  15. We can also visualize the image using pyplot:

    plt.imshow(test_features[loc])

    The output will be as follows:

Figure 2.35: Test image visualized

And this shows that the prediction is correct.

Note

To access the source code for this specific section, please refer to https://packt.live/2O5KRgd.

You can also run this example online at https://packt.live/2O8JHR0. You must execute the entire Notebook in order to get the desired result.

In this exercise, we created a multilayer multiclass neural network model using Keras to classify the MNIST data. With the model we built, we were able to correctly predict a random handwritten digit.
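
Finally, to illustrate the note from step 5 of Exercise 2.07, here is a minimal sketch, assuming the model, train_features, and train_labels from that exercise are still in scope, contrasting the two multiclass loss functions: sparse_categorical_crossentropy works directly with integer labels, whereas categorical_crossentropy expects the labels to be one-hot encoded first:

from tensorflow.keras.utils import to_categorical

# Integer class labels pair directly with sparse_categorical_crossentropy
model.compile(optimizer='adam', \
              loss='sparse_categorical_crossentropy', \
              metrics=['accuracy'])
model.fit(train_features, train_labels, epochs=5)

# One-hot encoded labels pair with categorical_crossentropy
train_labels_onehot = to_categorical(train_labels, num_classes=10)
model.compile(optimizer='adam', \
              loss='categorical_crossentropy', \
              metrics=['accuracy'])
model.fit(train_features, train_labels_onehot, epochs=5)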