How to Build Feedforward Neural Networks: A Step-by-Step Guide

In this post, we will start building our own small deep learning library in Python and use it to write a simple feedforward neural network.

The focus will be on the forward pass. Training the network will be covered in the next post.

Below, we look at how a feedforward neural network takes an input and produces an output from it.

Firstly, What is a Neural Network?

Neural networks are a machine learning technique that is loosely inspired by the model of the brain.

Like other supervised machine learning techniques, a neural network learns from a dataset that contains inputs and their corresponding outputs.

Neural networks consist of layers. Each layer is connected to the next layer with weights and biases.

The network uses these weights and biases to calculate its output. They are adjusted during training so that the network's outputs get closer to the correct outputs in the data it is trained on.

This diagram shows a 3-layer neural network. The lines connecting the nodes represent the weights and biases of the network.

How do They Work? The MATH!

Each layer has its own weights and bias.

The weights start out as a matrix of random values, and the biases as a vector of random values.

A basic feedforward neural network is built from linear layers (also called fully connected or dense layers).

Linear layers produce their output with the following formula:

x @ w + b


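Where...
x is the input to the layer, with one row per sample
w is the layer's weight matrix, with one row per input feature and one column per node
b is the layer's bias vector, with one value per node
(@ means matrix multiply)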
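To make the formula concrete, here is a small worked example with hand-picked numbers (the values of x, w and b are made up purely for illustration): one input sample with 2 features passed through a layer with 3 nodes.

```python
import numpy as np

# One input sample with 2 features -> shape (1, 2)
x = np.array([[1.0, 2.0]])

# Weights mapping 2 inputs to 3 nodes -> shape (2, 3)
w = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0, 1.0]])

# One bias per node -> shape (3,)
b = np.array([0.1, 0.2, 0.3])

# Each node's output is a weighted sum of the inputs plus that node's bias:
#   node 0: 1.0*1.0  + 2.0*0.5 + 0.1 = 2.1
#   node 1: 1.0*0.0  + 2.0*2.0 + 0.2 = 4.2
#   node 2: 1.0*-1.0 + 2.0*1.0 + 0.3 = 1.3
out = x @ w + b
print(out)  # [[2.1 4.2 1.3]]
```

The bias vector is added to every row of x @ w through NumPy broadcasting, so the same formula works unchanged for a batch of many input rows.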

The output of each layer is fed as an input into the next layer.
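Chaining layers only requires that each layer's weight matrix has as many rows as the previous layer has nodes. A short sketch with random weights (the layer sizes 2 -> 3 -> 1 and the batch of 4 samples are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 4 samples, each with 2 features
x = rng.standard_normal((4, 2))

# Layer 1: 2 inputs -> 3 nodes
w1, b1 = rng.standard_normal((2, 3)), rng.standard_normal(3)
# Layer 2: 3 inputs (one per node of layer 1) -> 1 node
w2, b2 = rng.standard_normal((3, 1)), rng.standard_normal(1)

h = x @ w1 + b1    # output of layer 1, shape (4, 3)
out = h @ w2 + b2  # layer 1's output is layer 2's input, shape (4, 1)

print(h.shape, out.shape)  # (4, 3) (4, 1)
```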

Note

If you are unfamiliar with how matrix multiplication works, this website here explains it nicely.

This is all we will cover for now. In the next post, we will get into the mathematics behind how these weights and biases are corrected during training!

Activation Functions

Layers of neural nets are composed of nodes.

Activation functions are applied to a layer's output to determine which nodes should "fire" (or "activate"). Neurons in the human brain fire in a similar way, which is why the idea was carried over to neural networks, since they are loosely based on the brain.

Activation functions also allow the network to model non-linear relationships in data. Without them, a stack of linear layers would collapse into a single linear transformation of the input, so the network would be no more expressive than a linear regression model and could not model most real-world data.
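A quick numerical check of why linear layers alone are not enough: two linear layers with nothing in between equal one linear layer, because (x @ w1 + b1) @ w2 + b2 = x @ (w1 @ w2) + (b1 @ w2 + b2). The sketch below uses random weights and arbitrary sizes chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal((5, 4))  # batch of 5 samples, 4 features each

# Two stacked linear layers with no activation function in between
w1, b1 = rng.standard_normal((4, 8)), rng.standard_normal(8)
w2, b2 = rng.standard_normal((8, 3)), rng.standard_normal(3)
two_layers = (x @ w1 + b1) @ w2 + b2

# A single linear layer with merged weights and bias
w = w1 @ w2
b = b1 @ w2 + b2
one_layer = x @ w + b

print(np.allclose(two_layers, one_layer))  # True
```

Putting a non-linear function such as np.maximum(0, ...) between the two layers breaks this equivalence in general, which is what lets a deeper network represent more than a single linear map.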

There are many activation functions, but here are some of the most commonly used ones...

Sigmoid

The sigmoid function maps each value of its input to a value between 0 and 1, as shown in the graph below.

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

(x is the input vector; the function is applied to each element)
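Evaluating sigmoid at a few points shows how it squashes its input: large negative values end up close to 0, an input of 0 maps to exactly 0.5, and large positive values end up close to 1. A standalone NumPy sketch (the sigmoid helper here is only for illustration; the library version is written later in this post):

```python
import numpy as np

def sigmoid(x):
    # Applied element-wise: every value is squashed into the range (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
y = sigmoid(x)

# values near 0, then about 0.119, exactly 0.5, about 0.881, and near 1
print(y)
```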

Relu (Rectified Linear Unit)

The Relu function only lets positive values of the input vector pass through. Negative values are mapped to 0. It is applied element-wise:

$$\mathrm{relu}(x) = \max(0, x)$$

For example:

[[-5, 10], [15, -10]]  --> relu -->  [[0, 10], [15, 0]]
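The example above can be reproduced directly with NumPy's element-wise maximum:

```python
import numpy as np

x = np.array([[-5, 10],
              [15, -10]])

# Relu keeps positive values and replaces negative values with 0
y = np.maximum(0, x)
print(y)
# [[ 0 10]
#  [15  0]]
```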

Tanh

Tanh is similar to Sigmoid, except it maps inputs to values between -1 and 1:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
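One way to see the similarity: tanh is a stretched and shifted sigmoid, tanh(x) = 2 * sigmoid(2x) - 1, which is a standard identity. A quick numerical check (the sigmoid helper is defined here only for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
t = np.tanh(x)

# tanh matches a rescaled sigmoid and stays inside (-1, 1), centered on 0
print(np.allclose(t, 2 * sigmoid(2 * x) - 1))  # True
print(t.min() > -1, t.max() < 1)  # True True
```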

Softmax

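Softmax maps an input vector to a probability distribution: every output value lies between 0 and 1, and all the output values sum to 1.

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K$$

(z is the input vector, K is the length of the input vector)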
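A worked example of the formula with K = 3 (the input values are arbitrary):

```python
import numpy as np

z = np.array([1.0, 2.0, 3.0])  # K = 3

# Exponentiate every element, then divide by the sum of the exponentials
p = np.exp(z) / np.sum(np.exp(z))

print(p)        # one probability per input value
print(p.sum())  # the probabilities sum to 1
```

The outputs are roughly 0.090, 0.245 and 0.665: the largest input gets the largest probability, and because of the exponential, it gets a disproportionately large share.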

Writing the Code

We will need NumPy for our matrix operations...

import numpy as np

First, let's write our linear layer class:

class Linear:
    def __init__(self, units):
        # units specifies how many nodes are in the layer
        self.units = units
        self.initialized = False

    def __call__(self, x):
        # initialize weights and biases the first time the layer is called,
        # once the number of input features (x.shape[-1]) is known
        if not self.initialized:
            self.w = np.random.randn(x.shape[-1], self.units)
            self.b = np.random.randn(self.units)
            self.initialized = True

        return x @ self.w + self.b

Example usage (the weights are random, so your values will differ)...

x = np.array([[0, 1]])
layer = Linear(5)
print(layer(x))

# => [[-2.63399933 -1.18289984  0.32129587  0.2903246  -0.2602642 ]]

Now let's write all our activation function classes, following the formulae given previously:

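class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

class Relu:
    def __call__(self, x):
        return np.maximum(0, x)

class Softmax:
    def __call__(self, x):
        # normalize each row (sample) separately; subtracting the row max
        # keeps np.exp from overflowing without changing the result
        exps = np.exp(x - np.max(x, axis=-1, keepdims=True))
        return exps / np.sum(exps, axis=-1, keepdims=True)

class Tanh:
    def __call__(self, x):
        return np.tanh(x)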
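When the input is a batch (one row per sample), softmax should produce one probability distribution per row. Dividing by the sum over the whole array instead spreads a single distribution across the entire batch. A self-contained sketch contrasting the two (both function names are hypothetical, chosen for this illustration):

```python
import numpy as np

def softmax_whole_array(x):
    # Normalizes over every element of x at once
    return np.exp(x) / np.sum(np.exp(x))

def softmax_rows(x):
    # Normalizes each row separately; subtracting the row max avoids
    # overflow in np.exp without changing the result
    exps = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exps / np.sum(exps, axis=-1, keepdims=True)

batch = np.array([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])

print(softmax_whole_array(batch).sum(axis=-1))  # two values that add up to 1 together
print(softmax_rows(batch).sum(axis=-1))         # [1. 1.]
```

The max subtraction matters for large inputs: in float64, np.exp(1000.0) overflows to inf, so an unshifted softmax of such values would return nan.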

Now let's write a "Model" class. It acts as a container for all our layers and is the actual neural network class.

class Model:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        # feed the output of each layer into the next layer
        output = x
        for layer in self.layers:
            output = layer(output)

        return output
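The loop in __call__ is function composition: for layers f1, f2, ..., fn it computes fn(...f2(f1(x))). A sketch of the same idea with functools.reduce, using simple lambdas as hypothetical stand-ins for layers (any callable that maps an array to an array works):

```python
from functools import reduce

import numpy as np

# Hypothetical stand-in layers
double = lambda x: 2 * x
relu = lambda x: np.maximum(0, x)
add_one = lambda x: x + 1

stack = [double, relu, add_one]

def forward(x, layers):
    # Feed each layer's output into the next layer, like Model.__call__
    return reduce(lambda out, layer: layer(out), layers, x)

x = np.array([[-1.0, 2.0]])
print(forward(x, stack))  # [[1. 5.]]
```

The explicit for loop in the Model class does the same thing and is arguably easier to read and to step through in a debugger, which is a reasonable reason to prefer it in a tutorial library.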

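Save all of those classes, together with the NumPy import, into "layers.py". The code below imports the file as layers, so if you choose a different name, change the import to match.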

Now we can build a simple neural network with our tiny library so far:

import layers
import numpy as np

# inputs: 4 samples with 2 features each
x = np.array([[0, 1], [0, 0], [1, 1], [0, 1]])

# the network uses all the layers we have designed so far
net = layers.Model([
    layers.Linear(32),
    layers.Sigmoid(),
    layers.Linear(16),
    layers.Softmax(),
    layers.Linear(8),
    layers.Tanh(),
    layers.Linear(4),
    layers.Relu(),
])

print(net(x))

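Output (the weights are random, so your values will differ; rows 1 and 4 match because their inputs are identical):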
[[0.         3.87770361 0.17602662 0.        ]
 [0.         3.85640582 0.22373699 0.        ]
 [0.         3.77290517 0.2469388  0.        ]
 [0.         3.87770361 0.17602662 0.        ]]

Also published here.