
Introducing non-linearity
We now know how data enters a perceptron unit and how each input feature is paired with an associated weight. We also know how to represent our input features and their respective weights as n x 1 matrices, where n is the number of input features. Lastly, we saw how transposing the feature matrix lets us compute its dot product with the weight matrix, leaving us with a single scalar value. So, what's next? This is a good time to step back and consider what we are trying to achieve, as doing so will help us understand why we want to employ something like an activation function.
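The steps recapped above can be sketched in a few lines of NumPy, using hypothetical feature and weight values purely for illustration:

```python
import numpy as np

# Hypothetical example: n = 3 input features and their weights,
# each represented as an n x 1 column matrix.
x = np.array([[0.5], [1.2], [-0.3]])   # input features (3 x 1)
w = np.array([[0.8], [-0.4], [0.6]])   # associated weights (3 x 1)

# Transposing x gives a 1 x n row matrix, so the product (1 x n)(n x 1)
# collapses to a 1 x 1 result -- one single scalar value.
# Here: 0.5*0.8 + 1.2*(-0.4) + (-0.3)*0.6 = -0.26
z = x.T @ w
print(z.item())
```

This scalar is the weighted sum that the rest of the section builds on.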
Well, you see, real-world data is often non-linear. By this we mean that when we attempt to model an observation as a function of different inputs, that function often cannot be represented linearly, that is, by a straight line.
If all patterns in data formed straight lines, we would probably not be discussing neural networks at all. Techniques such as Support Vector Machines (SVMs) or even linear regression already excel at that task:

Modeling sunny and rainy days as a function of temperature, for example, will produce a non-linear curve. In effect, this means that we cannot separate the two classes with a straight-line decision boundary. In other words, on some days, it may rain despite high temperatures, and on other days, it may remain sunny despite low temperatures.
This is because temperature is not linearly related to the weather. The weather outcome on any given day is very likely a complex function of interacting variables such as wind speed, air pressure, and more. So, on any given day, a temperature of 13 degrees could mean a sunny day in Berlin, Germany, but a rainy day in London, UK:

There are some cases, of course, where a phenomenon may be linearly represented. In physics, for example, the relationship between the mass of an object and its volume can be linearly defined, as shown in the following figure:

This is an example of a linear function:

y = mx + b

Here, m is the slope of the line, x is any input value (an x-value) along the line, and b is the y-intercept, the point where the line crosses the y-axis.
Unfortunately, linearity is rarely guaranteed with real-world data, as we model observations using multiple features, each of which can contribute in varied and disproportionate ways to determining our output classes. In fact, our world is extremely non-linear. To capture this non-linearity in our perceptron model, we need it to incorporate non-linear functions capable of representing such phenomena. Doing so increases the capacity of our neuron to model the more complex patterns that actually exist in the real world, and to draw decision boundaries that would not be possible were we to use only linear functions. These types of functions, used to model non-linear relationships in our data, are known as activation functions.
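To make this concrete, here is a minimal sketch (with hypothetical feature and weight values, and the sigmoid chosen as one common example of an activation function) of passing the perceptron's scalar weighted sum through a non-linearity:

```python
import numpy as np

def sigmoid(z):
    """A common activation function: maps any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input features and weights, as n x 1 column matrices.
x = np.array([[0.5], [1.2], [-0.3]])
w = np.array([[0.8], [-0.4], [0.6]])

z = (x.T @ w).item()   # the linear part: a single scalar weighted sum
a = sigmoid(z)         # the non-linear activation applied to that scalar
print(a)
```

The linear step alone could only ever draw straight decision boundaries; composing it with a non-linear function like this is what lets the unit participate in modeling curved boundaries.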