Hands-On Python Deep Learning for the Web

Anatomy of a nonlinear neuron

A nonlinear neuron is one that is capable of responding to the nonlinearities that may be present in the data. Nonlinearity in this context essentially means that the output does not change linearly with a given input. Look at the following diagrams:

Both of the preceding figures depict the relationship between the inputs that are given to a neural network and the outputs that the network produces. The first figure shows input data that is linearly separable, whereas the second shows inputs that cannot be separated by a straight line. In such cases, a linear neuron fails miserably, hence the need for nonlinear neurons.

In the training process of a neural network, conditions can arise where a small change in the bias and weight values may affect the output of the neural network in a drastic way. Ideally, this should not happen. A small change to either the bias or weight values should cause only a small change in the output. When a step function is used, the changes in weight and bias terms can affect the output to a great extent, hence the need for something other than a step function.

Behind the operation of a neuron sits a function. In the case of the linear neuron, we saw that its operation was based on a step function. There are several functions capable of capturing nonlinearities. The sigmoid function is one such function, and neurons that use it are often called sigmoid neurons. Unlike the step function, the output of a sigmoid neuron is produced using the following rule:
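The rule in question is the sigmoid function itself, which in its standard form is written as

\sigma(z) = \frac{1}{1 + e^{-z}}

where z denotes the neuron's weighted input.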

So, our final, updated rule becomes the following:
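Writing the weighted input out explicitly, with weights w_j, inputs x_j, and bias b, the sigmoid neuron's output rule takes the standard form

\text{output} = \sigma\!\left(\sum_j w_j x_j + b\right) = \frac{1}{1 + \exp\!\left(-\sum_j w_j x_j - b\right)}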

But why is the sigmoid function better than a step function in terms of capturing nonlinearities? Let's compare the two graphically to understand this:

The preceding two figures give us a clear picture of the intrinsic nature of the two functions. The step function jumps abruptly from 0 to 1, whereas the sigmoid function changes smoothly, which makes a sigmoid neuron far more responsive to small changes in its weights and bias.
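If the figures are not at hand, a minimal sketch along these lines (assuming NumPy and Matplotlib are available) reproduces the same comparison:

```python
import numpy as np
import matplotlib.pyplot as plt

# Range of weighted inputs over which to compare the two functions
z = np.linspace(-10, 10, 500)

# Step function: jumps abruptly from 0 to 1 at z = 0
step = np.where(z >= 0, 1.0, 0.0)

# Sigmoid function: changes smoothly between 0 and 1
sigmoid = 1.0 / (1.0 + np.exp(-z))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(z, step)
ax1.set_title('Step function')
ax2.plot(z, sigmoid)
ax2.set_title('Sigmoid function')
for ax in (ax1, ax2):
    ax.set_xlabel('z')
    ax.set_ylabel('output')
plt.tight_layout()
plt.show()
```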

Apart from the sigmoid function, the following are some widely known functions that are used to give a neuron a nonlinear character:

  • Tanh
  • ReLU
  • Leaky ReLU

In the literature, these functions, along with the two that we have just studied, are called activation functions. Currently, ReLU and its variants are by far the most successful activation functions.
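As a quick reference, here is a minimal NumPy sketch of these activation functions (the slope of the negative part of Leaky ReLU, alpha, is a common but arbitrary choice, not a value prescribed by this book):

```python
import numpy as np

def sigmoid(z):
    """Squashes z smoothly into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squashes z smoothly into the range (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Passes positive values through unchanged; zeroes out negatives."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but lets a small slope (alpha) through for negative z."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, fn(z))
```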

We are still left with a few other basic things related to artificial neural networks. Let's summarize what we have learned so far:

  • Neurons and their two main types
  • Layers
  • Activation functions

We are now in a position to draw a line between MLPs and neural networks. Michael Nielsen, in his online book Neural Networks and Deep Learning, describes this quite well:

Somewhat confusingly, and for historical reasons, such multiple layer networks are sometimes called multilayer perceptrons or MLPs, despite being made up of sigmoid neurons, not perceptrons.

We are going to use the terms neural network and deep neural network throughout this book. We will now move forward and learn more about the input and output layers of a neural network.