0 votes

From what I understand about neural networks, you have a number of hidden layers, each consisting of X neurons. A neuron takes in a number of inputs with their respective weights, and then, using an activation function (sigmoid in my case), produces an output.

My task is to implement a network from scratch (using only numpy), with 2 hidden layers, a sigmoid activation function and 500 neurons in each hidden layer. What I don't understand is: how can I implement the concept of neurons? According to this article, a neuron is formed when all the inputs are weighted and passed into the activation function. So do I feed in the same inputs, 500 times, with different weights each time (in the first layer, then again in the second)? I've also read this topic, where the following is said:

The neuron is nothing more than a set of inputs, a set of weights, and an activation function. The neuron translates these inputs into a single output, which can then be picked up as input for another layer of neurons later on.

So according to this, I should indeed weight the inputs differently, 500 times, and then pass the results forward to the next layer, which will do the same. Am I understanding this correctly?
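If that's right, then I imagine a single neuron would look roughly like this (just a sketch to check my understanding; the weight vector w and bias b are made up here and would have to be initialised properly):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# One neuron: a weighted sum of the inputs plus a bias, passed through the activation
def neuron(inputs, w, b):
    return sigmoid(np.dot(w, inputs) + b)

x = np.array([1, 2, 3, 4])   # one record with 4 features
w = np.random.rand(4)        # one weight per input (made up for this sketch)
b = 0.0                      # bias (made up for this sketch)
print(neuron(x, w, b))       # a single scalar activation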

Here is the code I have written so far (it is very elementary, but I did not want to proceed further before clearing this up); I have no idea how I would implement this:

import numpy as np

class NeuralNetwork:
    def __init__(self, data, y, neurons, hidden):
        self.input = data
        self.y = y
        self.output = np.zeros(y.shape)
        self.layers = hidden
        self.neurons = neurons
        self.weights = self.generateWeightArray()
        print(self.weights)

    def generateWeightArray(self):
        weightarr = []
        #Last weight array is for inbetween hidden and output layer
        for i in range(self.layers + 1):
            weightarr.append(self.generateWeightMatrix())
        return np.asarray(weightarr)

    def generateWeightMatrix(self):
        return np.random.rand(self.input.shape[0], self.input.shape[1]-1)

    def sigmoid(self, x):
        return 1/(1+np.exp(-x))

    def dsigmoid(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))

    def train(self):
        pass

    def run(self):
        #Since between each layer we have a matrix of weights, we can just keep going for the number of hidden 
        #layers we have
        for i in range(self.layers):
            out = np.dot(self.input.transpose(), self.weights[i]).transpose() #step1
            self.input = self.sigmoid(out) #step2

        print(self.input)

net = NeuralNetwork(np.array([[1,2,3,4],[3,5,1,2],[5,6,7,8]]), np.array([1,0,1]), 500, 2)
net.run()

EDIT

I have changed my code as follows

import numpy as np

class NeuralNetwork:
    def __init__(self, data, y, neurons, hidden):
        self.input = data
        self.y = y
        self.output = np.zeros(y.shape)
        self.layers = hidden
        self.neurons = neurons
        self.weights_to_hidden = np.random.rand(self.neurons, self.input.shape[1])
        self.weights = self.generateWeightArray()
        self.weights_to_output = np.random.rand(self.neurons, 1)
        print(self.weights_to_output)

    #Generate a matrix with h+1 weight matrices, where h is the number of hidden layers (+1 for output)
    def generateWeightArray(self):
        weightarr = []
        #Last weight array is for inbetween hidden and output layer
        for i in range(self.layers):
            weightarr.append(self.generateWeightMatrix())
        return np.asarray(weightarr)

    #Generate a matrix with n columns and m rows, where n is the number of features and m is the number of neurons
    #in the layer
    def generateWeightMatrix(self):
        return np.random.rand(self.neurons, self.neurons)

    def sigmoid(self, x):
        return 1/(1+np.exp(-x))

    def dsigmoid(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))

    def train(self):
        #2 hidden layers, then hidden -> output layer
        hidden_in = self.sigmoid(np.dot(self.input, self.weights_to_hidden.transpose()).transpose())
        print("Going into hidden layer:")
        print(hidden_in)
        for i in range(self.layers):
            in_hidden = self.sigmoid(np.dot(hidden_in.transpose(), self.weights[i]).transpose())
            print("After ", str(i+1), " hidden layer:")
            print(in_hidden)

        print("Output")
        out = self.sigmoid(np.dot(hidden_in.transpose(), self.weights_to_output).transpose())
        print(out)

net = NeuralNetwork(np.array([[1,2,3,4],[3,5,1,2],[5,6,7,8]]), np.array([1,0,1]), 5, 2)
net.train()

And the output after running is

[[0.89405222 0.89501672 0.89717842]]

I'm not sure self.weights_to_output has the correct shape, though, because it's (n, 1), so all features (in each record) will have the same weight, rather than having 3 weights for each row (?)


1 Answer

1 vote

The 'neurons' (usually called 'units' these days) are the activations in each layer. A layer's activations are represented as a vector (a 1D array) with one element per unit, so the short answer is that the neurons are elements of 1D arrays.

Let's look at the activation of one unit in layer 3 of a deep neural network:

a_3 = sigma(w_3 @ a_2 + b_3)

So:

  • a_3 will be a scalar — the activation for this unit;
  • sigma is the activation function (e.g. logistic function, tanh, or ReLU)
  • w_3 is the vector of weights for this layer (one element for each unit of layer 2)
  • a_2 is the vector of activations for the previous layer (one element for each unit of layer 2)
  • b_3 is the bias for this unit.

Note that w_3 and a_2 are both 1D arrays (vectors). The @ operator does matrix multiplication in Python 3.5+, and in this case it performs the dot product.
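For example, here is a tiny sketch with made-up numbers, just to show one unit's activation being computed:

import numpy as np

sigma = lambda x: 1 / (1 + np.exp(-x))   # logistic activation

a_2 = np.array([0.5, 0.1])    # activations of the 2 units in layer 2
w_3 = np.array([0.4, -0.2])   # weights from layer 2 into this single unit
b_3 = 0.05                    # bias for this unit

a_3 = sigma(w_3 @ a_2 + b_3)  # a scalar: the activation of one unit in layer 3
print(a_3)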

Now, thanks to the magic of linear algebra, it turns out we don't need to loop over each unit to compute all the activations. If we replace the weight vector w_3 with a matrix, W_3 (note the capital letter), it can hold the weights connecting every unit in layer 2 to every unit in layer 3. Then:

a_3 = sigma(W_3 @ a_2 + b_3)

Now a_3 will be a vector.

The tricky part is keeping track of all the shapes. For W @ a to work, the shapes must be compatible. For example, imagine we have a network with 2 units in layer 2 (so a_2 is a 1D array with 2 elements) and 3 units in layer 3 (so a_3 needs to have 3 elements). Now W_3 needs to be 3 × 2 and a_2 needs to be 2 × 1. Then the matrix multiply works. You can just use np.reshape() and np.transpose() to achieve the shapes you need.
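Here is a quick sketch of that example (2 units in layer 2, 3 units in layer 3); the numbers are random and only there to illustrate the shapes:

import numpy as np

sigma = lambda x: 1 / (1 + np.exp(-x))

a_2 = np.array([0.5, 0.1])     # shape (2,)   -- 2 units in layer 2
W_3 = np.random.rand(3, 2)     # shape (3, 2) -- one row of weights per unit in layer 3
b_3 = np.random.rand(3)        # shape (3,)   -- one bias per unit in layer 3

a_3 = sigma(W_3 @ a_2 + b_3)   # shape (3,)   -- the 3 activations of layer 3
print(a_3.shape)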

I hope this helps... it's a lot of words.

Maybe this diagram (from this article) helps explain:

[diagram: simple neural net]

The diagram doesn't say how many records there are, but there are 3 features per data instance (i.e. we have an M × 3 input matrix). The input layer is 'just' another layer: you can think of the inputs as just another set of activations, i.e. think of x as a_0 (the 0-th layer).
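Putting it together for your case (2 hidden layers of 500 units each, sigmoid everywhere), a forward pass might look roughly like this. This is only a sketch under those assumptions; in particular, the single output unit and the random initialisation are made up for the example:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

n_features = 4
n_hidden = 500

X = np.array([[1, 2, 3, 4], [3, 5, 1, 2], [5, 6, 7, 8]])  # shape (3, 4): 3 records, 4 features

# One weight matrix and one bias vector per layer transition
W1 = np.random.rand(n_hidden, n_features)   # input    -> hidden 1
b1 = np.random.rand(n_hidden)
W2 = np.random.rand(n_hidden, n_hidden)     # hidden 1 -> hidden 2
b2 = np.random.rand(n_hidden)
W3 = np.random.rand(1, n_hidden)            # hidden 2 -> output (assuming one output unit)
b3 = np.random.rand(1)

a0 = X.T                                    # the inputs are layer 0, shape (4, 3)
a1 = sigmoid(W1 @ a0 + b1[:, None])         # shape (500, 3)
a2 = sigmoid(W2 @ a1 + b2[:, None])         # shape (500, 3)
a3 = sigmoid(W3 @ a2 + b3[:, None])         # shape (1, 3): one output per record
print(a3)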

This 3Blue1Brown video is well worth watching too: https://www.youtube.com/watch?v=aircAruvnKk