Neural Networks and the Structure of Perceptrons
A neuron is a basic building block of the human nervous system; it relays electrical signals across the body. The human brain consists of billions of interconnected biological neurons that constantly communicate with each other by switching themselves on or off, sending minute electrical signals. In general, a neural network is a network of interconnected neurons. In the current context, we are referring to ANNs, which are modeled on biological neural networks. The term artificial intelligence derives from the fact that natural intelligence exists in the human brain (or any brain, for that matter), and we humans are trying to simulate this natural intelligence artificially. Although ANNs are inspired by biological neurons, some advanced neural network architectures, such as CNNs and RNNs, do not actually mimic the behavior of a biological neuron. However, for ease of understanding, we will begin by drawing an analogy between a biological neuron and an artificial neuron (the perceptron).
A simplified version of a biological neuron is represented in Figure 2.1:
This is a highly simplified representation. There are three main components:
- The dendrites, which receive the input signals
- The cell body, where the signal is processed in some form
- The tail-like axon, through which the neuron transfers the signal out to the next neuron
A perceptron can also be represented in a similar way, although it is not a physical entity but a mathematical model. Figure 2.2 shows a high-level representation of an artificial neuron:
In an artificial neuron, as in a biological one, there is an input signal. The central node combines all the incoming signals and fires the output signal if the combined signal is above a certain threshold. A more detailed representation of a perceptron is shown in Figure 2.3. Each component of this perceptron is explained in the sections that follow:
A perceptron has the following components:
- Input layer
- Weights
- Bias
- Net input function
- Activation function
Let's look at these components and their TensorFlow implementations in detail by considering an OR table dataset.
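Before turning to the TensorFlow implementation, here is a minimal plain-Python sketch of the behavior described above: a neuron that sums its weighted inputs and fires only when the total crosses a threshold. The weights, bias, and threshold values used here are purely illustrative assumptions, not part of the exercise code:
# A minimal, illustrative artificial neuron: a weighted sum followed by a hard threshold.
# The weights, bias, and threshold below are arbitrary example values.
def simple_neuron(inputs, weights, bias, threshold=0.0):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > threshold else 0

print(simple_neuron([1.0, 0.0], weights=[0.6, 0.6], bias=-0.5))  # prints 1
print(simple_neuron([0.0, 0.0], weights=[0.6, 0.6], bias=-0.5))  # prints 0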
Input Layer
Each example of input data is fed through the input layer. Referring to the representation shown in Figure 2.3, depending on the size of the input example, the number of nodes will vary from x1 to xm. The input data can be structured data (such as a CSV file) or unstructured data, such as an image. These inputs, x1 to xm, are called features (m refers to the number of features). Let's illustrate this with an example.
Let's say the data is in the form of an OR table, as follows:
x1   x2   y
0    0    0
0    1    1
1    0    1
1    1    1
Here, the inputs to the neuron are the columns x1 and x2 of a single row. It may be difficult to grasp at this point, but for now, accept that during training the data is fed one row at a time, in an iterative manner. We will represent the input data and the true labels (the output y) with the TensorFlow Variable class as follows:
X = tf.Variable([[0.,0.],[0.,1.],\
                 [1.,0.],[1.,1.]], \
                dtype=tf.float32)
y = tf.Variable([0, 1, 1, 1], dtype=tf.float32)
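As a quick check (not part of the original listing), we can inspect the shapes of these variables and iterate over the rows, which is how the data will be consumed during training:
# X holds 4 examples with 2 features each; y holds the 4 corresponding labels.
print(X.shape)   # (4, 2)
print(y.shape)   # (4,)
# During training, the data is fed one example (row) at a time, or in batches.
for row in X.numpy():
    print(row)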
Weights
Each neuron has weights associated with its inputs, and these weights dictate how much influence each input feature has in computing the next node. Each neuron is connected to all of the input features. In our example, since there are two inputs (x1 and x2) and the input layer is connected to one neuron, there will be two weights associated with it: w1 and w2. A weight is a real number; it can be positive or negative and is mathematically represented as an element of R. When we say that a neural network is learning, what is happening is that the network adjusts its weights and biases, based on error feedback, in order to get the correct predictions. We will see this in more detail in the sections that follow. For now, we will initialize the weights to zeros using the same TensorFlow Variable class, as follows:
number_of_features = X.shape[1]
number_of_units = 1
W = tf.Variable(tf.zeros([number_of_features, \
                          number_of_units]), \
                dtype=tf.float32)
Weights would be of the following dimension: number of input features × output size.
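As a quick sanity check (not part of the original listing), we can confirm this for our OR example, which has two input features and a single output unit:
# Two input features and one output unit, so the weight matrix has shape (2, 1).
print(W.shape)   # (2, 1)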
Bias
In Figure 2.3, the bias is represented by b, which is called an additive bias. Every neuron has one bias. When x is zero, that is, when no information comes from the independent variables, the output is just b. Like the weights, the bias is also a real number, and the network has to learn the bias value to get the correct predictions.
In TensorFlow, the bias has the same size as the output and can be created as follows:
B = tf.Variable(tf.zeros([1, 1]), dtype=tf.float32)
Net Input Function
The net input function, also commonly referred to as the input function, is the sum of the products of the inputs and their corresponding weights, plus the bias. Mathematically, it is represented as follows:
z = x1*w1 + x2*w2 + ... + xm*wm + b
Here:
- xi: the input data, x1 to xm
- wi: the weights, w1 to wm
- b: the additive bias
As you can see, this formula involves the inputs, their associated weights, and the bias. It can be written in vectorized form using matrix multiplication, which we learned about in Chapter 1, Building Blocks of Deep Learning. We will see this when we start the code demo. Since all the variables are real numbers, the result of the net input function is also just a real number. The net input function can be implemented easily using TensorFlow's matmul functionality, as follows:
z = tf.add(tf.matmul(X, W), B)
W stands for weight, X stands for input, and B stands for bias.
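As an optional check (not part of the original listing), we can run this line on the OR data defined earlier. With W of shape (2, 1) and B of shape (1, 1), the result contains one net input value per example:
z = tf.add(tf.matmul(X, W), B)
print(z.shape)    # (4, 1): one net input value per input row
print(z.numpy())  # all zeros at this point, because W and B are still zeros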
Activation Function (G)
The output of the net input function (z) is fed as input to the activation function. The activation function squashes the output of the net input function (z) into a new range that depends on the choice of activation function. There are a variety of activation functions, such as sigmoid (logistic), ReLU, and tanh. Each activation function has its own pros and cons. We will take a deep dive into activation functions later in the chapter. For now, we will start with the sigmoid activation function, also known as the logistic function. With the sigmoid activation function, the linear output z is squashed into a new output range of (0,1). The activation function provides non-linearity between layers, which gives neural networks the ability to approximate any continuous function.
The mathematical equation of the sigmoid function is as follows, where G(z) is the sigmoid function and the right-hand equation is its derivative with respect to z:
G(z) = 1 / (1 + e^(-z)),    G'(z) = G(z) * (1 - G(z))
As you can see in Figure 2.7, the sigmoid function is a smooth, S-shaped curve whose values lie between 0 and 1, no matter what the input is:
If we set a threshold (say, 0.5), we can convert this into a binary output: any output greater than or equal to 0.5 is considered 1, and any value less than 0.5 is considered 0.
Activation functions such as sigmoid are provided out of the box in TensorFlow. A sigmoid function can be implemented in TensorFlow as follows:
output = tf.sigmoid(z)
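If a hard 0/1 prediction is needed, one simple way to apply the 0.5 threshold described earlier is to compare and cast the sigmoid output. This is a small illustrative sketch, not part of the exercise code:
# Apply the 0.5 threshold: values >= 0.5 become 1.0, values < 0.5 become 0.0.
binary_output = tf.cast(output >= 0.5, tf.float32)
print(binary_output)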
Now that we have seen the structure of a perceptron and its code representation in TensorFlow, let's put all the components together to make a perceptron.
Perceptrons in TensorFlow
In TensorFlow, a perceptron can be implemented just by defining a simple function, as follows:
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)
    output = tf.sigmoid(z)
    return output
At a very high level, we can see that the input data passes through the net input function. The output of the net input function is passed to the activation function, which, in turn, gives us the predicted output. Now, let's look at each line of the code:
z = tf.add(tf.matmul(X, W), B)
The output of the net input function is stored in z. Let's see how we got that result by breaking it down further into two parts, that is, the matrix multiplication part contained in tf.matmul and the addition contained in tf.add.
Let's say we're storing the result of the matrix multiplication of X and W in a variable called m:
m = tf.matmul(X, W)
Now, let's consider how we got that result. For example, let's say X is a row matrix, [ x1  x2 ], and W is a column matrix with w1 and w2 stacked vertically (one weight per input feature).
Recall from the previous chapter that tf.matmul will perform matrix multiplication. So, the result is this:
m = x1*w1 + x2*w2
And then, we add the output, m, to the bias, B, as follows:
z = tf.add(m, B)
Note that what we do in the preceding step is simply the addition of the two variables m and b:
m + b
Hence, the final output is:
z = x1*w1 + x2*w2 + b
z would be the output of the net input function.
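To make this concrete, here is a small worked example with made-up numbers (purely illustrative, not the values used in the exercise), showing that the vectorized TensorFlow computation matches the expanded formula:
# Illustrative values: one input row with two features, plus hand-picked weights and bias.
X_row = tf.constant([[1., 2.]])        # x1 = 1, x2 = 2
W_ex = tf.constant([[0.5], [-1.0]])    # w1 = 0.5, w2 = -1.0
B_ex = tf.constant([[0.25]])           # b = 0.25
z_vectorized = tf.add(tf.matmul(X_row, W_ex), B_ex)
z_manual = 1. * 0.5 + 2. * (-1.0) + 0.25   # x1*w1 + x2*w2 + b = -1.25
print(z_vectorized.numpy(), z_manual)      # both give -1.25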
Now, let's consider the next line:
output = tf.sigmoid(z)
As we learned earlier, tf.sigmoid is a readily available implementation of the sigmoid function. The net input function's output (z) computed in the previous line is fed as input to the sigmoid function. The result of the sigmoid function is the output of the perceptron, which is in the range of 0 to 1. During training, which will be explained later in the chapter, we will feed the data in batches to this function, which will calculate the predicted values.
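As a preview, here is one possible way to feed the data in batches using tf.data; the batch size of 2 is an arbitrary choice for illustration, and the actual training procedure is covered later in the chapter:
# Split the four OR examples into batches of two and run the perceptron on each batch.
dataset = tf.data.Dataset.from_tensor_slices(X).batch(2)
for batch in dataset:
    print(perceptron(batch))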
Exercise 2.01: Perceptron Implementation
In this exercise, we will implement a perceptron in TensorFlow for the OR table. Let's set up the input data in TensorFlow and fix the design parameters of the perceptron:
- Let's import the necessary package, which, in our case, is tensorflow:
import tensorflow as tf
- Set the input data and labels of the OR table data in TensorFlow:
X = tf.Variable([[0.,0.],[0.,1.],\
[1.,0.],[1.,1.]], \
dtype=tf.float32)
print(X)
As you can see in the output, we will have a 4 × 2 matrix of input data:
<tf.Variable 'Variable:0' shape=(4, 2) dtype=float32,
numpy=array([[0., 0.],
[0., 1.],
[1., 0.],
[1., 1.]], dtype=float32)>
- We will set the actual labels in TensorFlow and use the reshape() function to reshape the y vector into a 4 × 1 matrix:
y = tf.Variable([0, 1, 1, 1], dtype=tf.float32)
y = tf.reshape(y, [4,1])
print(y)
The output is a 4 × 1 matrix, as follows:
tf.Tensor(
[[0.]
[1.]
[1.]
[1.]], shape=(4, 1), dtype=float32)
- Now, let's set the design parameters of the perceptron.
Number of neurons (units) = 1
Number of features (inputs) = 2 (the input matrix X has the shape: number of examples × number of features)
The activation function will be the sigmoid function, since we are doing binary classification:
NUM_FEATURES = X.shape[1]
OUTPUT_SIZE = 1
In the preceding code, X.shape[1] will equal 2: since indexing starts at zero, index 1 refers to the second dimension of X, which holds the number of features (2).
- Define the connection weight matrix in TensorFlow:
W = tf.Variable(tf.zeros([NUM_FEATURES, \
OUTPUT_SIZE]), \
dtype=tf.float32)
print(W)
The weight matrix will essentially be a column matrix, as shown in the following figure. It will have the following dimension: number of input features (the columns of X) × output size:
The output size will be dependent on the number of neurons—in this case, it is 1. So, if you are developing a layer of 10 neurons with two features, the shape of this matrix will be [2,10]. The tf.zeros function creates a tensor with the given shape and initializes all the elements to zeros.
So, this will result in a zero column matrix, like this:
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float32, \
numpy=array([[0.], [0.]], dtype=float32)>
- Now create the variable for the bias:
B = tf.Variable(tf.zeros([OUTPUT_SIZE, 1]), dtype=tf.float32)
print(B)
There is only one bias per neuron, so in this case, the bias is just one number in the form of a single-element array. However, if we had a layer of 10 neurons, then it would be an array of 10 numbers—1 for each neuron.
This will result in a 1 × 1 zero matrix with a single element, like this:
<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32,
numpy=array([[0.]], dtype=float32)>
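To connect this with the earlier remark about a layer of 10 neurons, here is a small hypothetical sketch (not part of this exercise) showing the shapes the weights and bias would take. Note that the bias here is created with shape [1, 10] so that it broadcasts across the examples when added to the net input:
# Hypothetical layer: 2 input features feeding 10 neurons.
W_layer = tf.Variable(tf.zeros([2, 10]), dtype=tf.float32)   # shape (2, 10)
B_layer = tf.Variable(tf.zeros([1, 10]), dtype=tf.float32)   # one bias per neuron
print(W_layer.shape, B_layer.shape)   # (2, 10) (1, 10)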
- Now that we have the weights and bias, the next step is to perform the computation to get the net input function, feed it to the activation function, and then get the final output. Let's define a function called perceptron to get the output:
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)
    output = tf.sigmoid(z)
    return output
print(perceptron(X))
The output will be a 4 × 1 array that contains the predictions by our perceptron:
tf.Tensor(
[[0.5]
[0.5]
[0.5]
[0.5]], shape=(4, 1), dtype=float32)
As we can see, the predictions are not quite accurate: because the weights and the bias were initialized to zeros, the net input z is 0 for every example, and sigmoid(0) = 0.5. We will learn how to improve the results in the sections that follow.
Note
To access the source code for this specific section, please refer to https://packt.live/3feF7MO.
You can also run this example online at https://packt.live/2CkMiEE. You must execute the entire Notebook in order to get the desired result.
In this exercise, we implemented a perceptron, which is a mathematical implementation of a single artificial neuron. Keep in mind that it is just the implementation of the model; we have not done any training. In the next section, we will see how to train the perceptron.