
I am trying to code two neural networks. The architecture of the first network consists of an input layer, one hidden layer, and an output layer. The input is in R^2, so the network accepts two inputs (x1, x2); the hidden layer has two neurons, and the output layer has a single neuron. All the neurons use the rectified linear unit (ReLU) activation function. The only difference between the first and second neural network is that the second has four neurons in the hidden layer; otherwise they are identical.
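The architecture described above can be written compactly with matrix products instead of one variable per weight. A minimal sketch, assuming the hidden weights are stacked into a matrix `W1` (row h holds w_h1, w_h2) and the output weights into a vector `w2`; the function name `forward` and these variable names are hypothetical, not from the question's code. Setting `H = 2` or `H = 4` switches between the two networks.

```python
import numpy as np

def forward(X, W1, b1, w2, b2):
    """Forward pass of a 2 -> H -> 1 network with ReLU hidden units.

    X  : (N, 2) inputs, one row per point (x1, x2)
    W1 : (H, 2) hidden weights, row h holds (w_h1, w_h2)
    b1 : (H,)   hidden biases
    w2 : (H,)   output weights (ww1..wwH in the question's code)
    b2 : scalar output bias (bb in the question's code)
    """
    Y1 = X @ W1.T + b1          # (N, H) hidden pre-activations y1..yH
    H1 = np.maximum(Y1, 0.0)    # ReLU: xx_h = max(y_h, 0)
    yy = H1 @ w2 + b2           # (N,) output score before the final activation
    return Y1, H1, yy

rng = np.random.default_rng(0)
H = 4                            # 2 for the first network, 4 for the second
W1 = 0.5 - rng.random((H, 2))    # uniform values around zero, like 0.5 - np.random.rand()
b1 = 0.5 - rng.random(H)
w2 = 0.5 - rng.random(H)
b2 = 0.5 - rng.random()

X = np.array([[1.0, 1.5], [0.5, 2.5]])
Y1, H1, yy = forward(X, W1, b1, w2, b2)
print(H1.shape, yy.shape)   # (2, 4) (2,)
```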

I finished the code for the first network and was able to run it and plot the results. I am mainly trying to get the neural network to learn to separate two clusters in my data set. I generate 2000 points to form one cluster and another 2000 for the second cluster. Ideally, the output of the neural network finds a separating plane (really multiple planes) between the two clusters. I have set up my plot to run when the error from the testing phase is less than 0.05. I should also explain that I am trying to find the ideal learning rate and number of epochs for training, so I have a few nested loops that iterate through different learning rates (alpha) and epoch counts.
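The cluster construction and the hyperparameter sweep described above can be sketched without per-point loops. This is a minimal sketch that mirrors the formulas in the question's code (parabolas v = 5 + h^2/6 and v = 10 + h^2/4, radial noise with std 0.5, then the rescaling x -> 1 + 0.1x); the helper name `make_clusters` is hypothetical, and NumPy's `default_rng` is used instead of the legacy `np.random` functions.

```python
import numpy as np

def make_clusters(n_per_cluster=2000, std=0.5, rng=None):
    """Two noisy parabolic clusters built with the same formulas as the question.

    Cluster 1 (target 0) follows v = 5 + h^2/6 for h in [-6, 6);
    cluster 2 (target 1) follows v = 10 + h^2/4 for h in [-5, 5).
    Each point is pushed off its curve by radius r ~ N(0, std) at a random angle.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = rng.normal(0.0, std, 2 * n_per_cluster)
    theta = 2 * np.pi * rng.random(2 * n_per_cluster)

    h1 = -6 + 12 * rng.random(n_per_cluster)
    h2 = -5 + 10 * rng.random(n_per_cluster)
    h = np.concatenate([h1, h2])
    v = np.concatenate([5 + h1**2 / 6, 10 + h2**2 / 4])

    x = np.column_stack([h + r * np.cos(theta), v + r * np.sin(theta)])
    x = 1 + 0.1 * x                            # same rescaling as the question's "Normalization"
    t = np.repeat([0.0, 1.0], n_per_cluster)   # 0 for cluster 1, 1 for cluster 2
    return x, t

x, t = make_clusters(rng=np.random.default_rng(42))

# The sweep from the question: 14 learning rates x 6 epoch counts = 84 settings,
# each of which the question's code trains 50 times from a fresh random start.
alphas = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
          0.008, 0.009, 0.01, 0.02, 0.03, 0.04, 0.05]
epochs = range(1, 7)
grid = [(alpha, n_epoch) for alpha in alphas for n_epoch in epochs]
print(x.shape, t.shape, len(grid))   # (4000, 2) (4000,) 84
```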

My first network works fine, but when I add two neurons, my network error and parameters (weights and biases) get all wonky for some reason. I can't get the 4-neuron network to reach an error below 0.4. I think it has something to do with the error and the weights. I have been running the network with print statements to see what's happening to the weights, and noticed that they don't update well because the error during training gets stuck at 0, so the weights never update. However, I am not 100% sure that this always happens.
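Two mechanisms in this setup can stop the weights from moving, and both are easy to check in isolation. With a thresholded output, e = t - (yy > 0) is exactly 0 for every correctly classified point, so those samples produce no update at all (this is the perceptron update rule). Separately, a hidden ReLU unit whose pre-activation is <= 0 on every input has dxx = 0 everywhere, so its incoming weights and bias never change. The helpers below are hypothetical, and `sigmoid_error` is a swapped-in alternative (sigmoid output with cross-entropy loss), not what the question uses.

```python
import numpy as np

def step_error(t, score):
    """Error with a thresholded output: e = t - (score > 0), which is -1, 0 or 1."""
    return t - float(score > 0)

def sigmoid_error(t, score):
    """Swapped-in alternative: e = t - sigmoid(score).

    With a sigmoid output and cross-entropy loss, the derivative of the loss with
    respect to the output score is sigmoid(score) - t, so this e plays the same
    role in an update like w += alpha * e * input, but it is never exactly 0.
    """
    return t - 1.0 / (1.0 + np.exp(-score))

def dead_units(X, W1, b1):
    """Boolean mask of hidden ReLU units whose pre-activation is <= 0 for every row of X.

    For such a unit dxx_h is 0 on all points, so every sample gives it a zero update.
    """
    return np.all(X @ W1.T + b1 <= 0, axis=0)

# A correctly classified point gives e = 0 under the step rule, so nothing updates.
print(step_error(1.0, 0.3))      # 0.0
print(sigmoid_error(1.0, 0.3))   # about 0.4256

X = np.array([[1.0, 1.5], [0.5, 2.5], [1.5, 2.0]])
W1 = np.array([[0.4, 0.3],       # positive on all of these points
               [-0.4, -0.3]])    # negative on every point with positive inputs
b1 = np.array([0.1, -0.2])
print(dead_units(X, W1, b1))     # [False  True]
```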

If anyone has clues as to why my weights and error are not updating properly, I would greatly appreciate it. If you run the code, you will see that when you plot the two clusters, the output of the neural network does not create a colored separation between them. The code for the working two-neuron architecture is the same; just remove the two additional neurons from the code.

Here is the code for the network:

import numpy as np
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions


nData = 2000 #2000 points used on each cluster for 4000 points total
nTrain = 1000 #Number of training points drawn for each run
nEpoch = 1 #Initial epoch value
nTest = 2000 #Used for testing loop
#alpha = 0.001

#Initializing 2D array for x which will carry the x1 and x2 values
#Also creating the radius and theta values for the cluster data
std = 0.5
x = np.zeros((2*nData,2))
t = np.zeros((2*nData))
r = np.random.normal(0,std,2*nData)
theta = 2*np.pi*np.random.rand(2*nData)

#w11f and w12f are used to plot the value of weights w11 and w12 as they update
w11f = np.zeros(nEpoch*nTrain)
w12f = np.zeros(nEpoch*nTrain)

#Creating cluster 1 and target data
h = -6 + 12*np.random.rand(nData)
v = 5 + (h**2)/6
x[0:nData,0] = h + r[0:nData]*np.cos(theta[0:nData])
x[0:nData,1] = v + r[0:nData]*np.sin(theta[0:nData])
t[0:nData] = 0

#Creating cluster 2 and target data
h = -5 + 10*np.random.rand(nData)
v = 10 + (h**2)/4
x[nData:2*nData,0] = h + r[nData:2*nData]*np.cos(theta[nData:2*nData])
x[nData:2*nData,1] = v + r[nData:2*nData]*np.sin(theta[nData:2*nData])
t[nData:2*nData] = 1

#Normalization
x[:,0] = 1 + 0.1*x[:,0]
x[:,1] = 1 + 0.1*x[:,1]

#Parameter Initialization
w11 = 0.5 - np.random.rand()
w12 = 0.5 - np.random.rand()
w21 = 0.5 - np.random.rand()
w22 = 0.5 - np.random.rand()
w31 = 0.5 - np.random.rand()
w32 = 0.5 - np.random.rand()
w41 = 0.5 - np.random.rand()
w42 = 0.5 - np.random.rand()
b4 = 0.5 - np.random.rand()
b3 = 0.5 - np.random.rand()
b2 = 0.5 - np.random.rand()
b1 = 0.5 - np.random.rand()
ww1 = 0.5 - np.random.rand()
ww2 = 0.5 - np.random.rand()
ww3 = 0.5 - np.random.rand()
ww4 = 0.5 - np.random.rand()
bb = 0.5 - np.random.rand()

#Creating a list from 0 to 3999
a = range(0,2*nData)
#Creating a 3D array (tensor) to store all the error values at the end of each 50 iteration loop
er_List = np.zeros((14,50,6))
#Creating the final array to store the counter of successful error. These are errors under 0.05 in value
#the rows represent the alpha values from 0.001 to 0.05 and the columns represent each epoch from 1 to 6. This way you can view the 2D array and see which alpha and epoch give the most successes for the lowest error.
nSuccess_Array = np.zeros((14,6))


#Part B - Creating nested loops to train for multiple alpha and epoch value
#pairs
#Training
alpha = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02, 0.03, 0.04, 0.05]
for l in range(0,14): #loop over the 14 alpha values
    nEpoch=1
    for n in range(0,6): #loop for incrementing epoch values
        nSuccess = 0
        #Initialize these again so the size updates as the epoch changes
        w11f = np.zeros(nEpoch*nTrain)
        w12f = np.zeros(nEpoch*nTrain)
        for j in range(0,50):
            #Re-randomize the parameters at the start of each of the 50 runs
            #for the current alpha/epoch pair
            w11 = 0.5 - np.random.rand()
            w12 = 0.5 - np.random.rand()
            w21 = 0.5 - np.random.rand()
            w22 = 0.5 - np.random.rand()
            w31 = 0.5 - np.random.rand()
            w32 = 0.5 - np.random.rand()
            w41 = 0.5 - np.random.rand()
            w42 = 0.5 - np.random.rand()
            b4 = 0.5 - np.random.rand()
            b3 = 0.5 - np.random.rand()
            b2 = 0.5 - np.random.rand()
            b1 = 0.5 - np.random.rand()
            ww1 = 0.5 - np.random.rand()
            ww2 = 0.5 - np.random.rand()
            ww3 = 0.5 - np.random.rand()
            ww4 = 0.5 - np.random.rand()
            bb = 0.5 - np.random.rand()
            
            sp = random.sample(a,nTrain + nTest)
            p = 0
            for epoch in range(0,nEpoch):
                for i in range(0,nTrain):
                    #Neuron dot product
                    y1 = b1 + w11*x[sp[i],0] + w12*x[sp[i],1]
                    y2 = b2 + w21*x[sp[i],0] + w22*x[sp[i],1]
                    y3 = b3 + w31*x[sp[i],0] + w32*x[sp[i],1]
                    y4 = b4 + w41*x[sp[i],0] + w42*x[sp[i],1]
                    #Neuron activation function ReLU
                    dxx1 = y1 > 0
                    xx1 = y1*dxx1
                    
                    dxx2 = y2 > 0
                    xx2 = y2*dxx2
                    
                    dxx3 = y3 > 0
                    xx3 = y3*dxx3
                    
                    dxx4 = y4 > 0
                    xx4 = y4*dxx4
                    #Output of neural network before activation function
                    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
                    yy = yy > 0 #output activation: a step (threshold), not ReLU
                    e = t[sp[i]] - yy #error is -1, 0 or 1 (0 whenever the point is classified correctly)
                    
                    #Updating parameters
                    #Compute the hidden-layer terms first so they use the output
                    #weights from before this step, as backpropagation requires
                    g1 = alpha[l]*e*ww1*dxx1
                    g2 = alpha[l]*e*ww2*dxx2
                    g3 = alpha[l]*e*ww3*dxx3
                    g4 = alpha[l]*e*ww4*dxx4
                    
                    ww1 = ww1 + alpha[l]*e*xx1
                    ww2 = ww2 + alpha[l]*e*xx2
                    ww3 = ww3 + alpha[l]*e*xx3
                    ww4 = ww4 + alpha[l]*e*xx4
                    
                    bb = bb + alpha[l]*e
                    
                    w11 = w11 + g1*x[sp[i],0]
                    w12 = w12 + g1*x[sp[i],1]
                    
                    w21 = w21 + g2*x[sp[i],0]
                    w22 = w22 + g2*x[sp[i],1]
                    
                    w31 = w31 + g3*x[sp[i],0]
                    w32 = w32 + g3*x[sp[i],1]
                    
                    w41 = w41 + g4*x[sp[i],0]
                    w42 = w42 + g4*x[sp[i],1]
                    
                    b1 = b1 + g1
                    b2 = b2 + g2
                    b3 = b3 + g3
                    b4 = b4 + g4
                    
                    w11f[p] = w11
                    w12f[p] = w12
                    p = p + 1
            er = 0
            #Testing: evaluate every held-out point sp[k]
            for k in range(nTrain,nTrain + nTest):
                y1 = b1 + w11*x[sp[k],0] + w12*x[sp[k],1]
                y2 = b2 + w21*x[sp[k],0] + w22*x[sp[k],1]
                y3 = b3 + w31*x[sp[k],0] + w32*x[sp[k],1]
                y4 = b4 + w41*x[sp[k],0] + w42*x[sp[k],1]
                
                dxx1 = y1 > 0
                xx1 = y1*dxx1
                
                dxx2 = y2 > 0
                xx2 = y2*dxx2
                
                dxx3 = y3 > 0
                xx3 = y3*dxx3
                
                dxx4 = y4 > 0
                xx4 = y4*dxx4
                
                yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
                yy = yy > 0
                e = abs(t[sp[k]] - yy)
                er = er + e #Accumulates error
            er = er/nTest #Calculates average error
            er_List[l,j,n] = er
            
            if er_List[l,j,n] < 0.05:
                nSuccess = nSuccess + 1
        #Part C - Creating an Array that contains the success values of each
        #alpha and epoch value pair
        nSuccess_Array[l,n] = nSuccess #Array that contains the success
        
        if nEpoch < 6:
            nEpoch = nEpoch +1


print(er)

#Plotting

if er < 0.5:
    plt.figure(1)
    plt.scatter(x[0:nData,0],x[0:nData,1])
    plt.scatter(x[nData:2*nData,0],x[nData:2*nData,1])
    
    X = np.arange(0.25,1.75,0.02)
    Y = np.arange(1.25,2.75,0.02)
    X, Y = np.meshgrid(X,Y)
    
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    
    dxx3 = y3 > 0
    xx3 = y3*dxx3    
    
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    Z = yy > 0
    plt.scatter(X,Y,c=Z+1,alpha=0.3)

    plt.figure(2)
    f=np.arange(0,nEpoch*nTrain,1)
    plt.plot(f,w11f)
    
    plt.figure(3)
    plt.plot(f,w12f)
    
    plt.figure(4)
    ax = plt.axes(projection='3d')
    ax.scatter(x[0:nData,0],x[0:nData,1],0,s=30)
    ax.scatter(x[nData:2*nData,0],x[nData:2*nData,1],1,s=30)
    
    #Plotting the separating planes
    X = np.arange(0.25,1.75,0.02)
    Y = np.arange(1.25,2.75,0.02)
    X, Y = np.meshgrid(X,Y)
    
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    
    dxx3 = y3 > 0
    xx3 = y3*dxx3    
    
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    Z = yy > 0
    ax.plot_surface(X,Y,Z,rstride=1, cstride=1,cmap='viridis',alpha=0.5)
    
    plt.figure(5)
    ax = plt.axes(projection='3d')
    X = np.arange(0,5,0.02)
    Y = np.arange(0,5,0.02)
    X, Y = np.meshgrid(X,Y)
    
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    
    dxx3 = y3 > 0
    xx3 = y3*dxx3    
    
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    ax.plot_surface(X, Y, yy, rstride=1, cstride=1,cmap='viridis', edgecolor='none')


1 Answer


Yes, you can do it using np.matmul (a @ b) and calculating the gradients manually. Check out the Fastai v3 course, part 2: https://course.fast.ai/videos/?lesson=8. Jeremy Howard works with PyTorch tensors, but you can do the same in NumPy.
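A minimal NumPy sketch of what this answer describes: the forward pass as matrix products (`@`) and the gradients written out by hand. Assumptions, not taken from the question or the course: the output uses a sigmoid with binary cross-entropy loss instead of a hard threshold (so the gradient is defined everywhere), training is full-batch gradient descent, the inputs are standardized, and the learning rate and step count are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: the question's two parabolic clusters (2000 points each)
n = 2000
r = rng.normal(0, 0.5, 2 * n)
theta = 2 * np.pi * rng.random(2 * n)
h = np.concatenate([-6 + 12 * rng.random(n), -5 + 10 * rng.random(n)])
v = np.concatenate([5 + h[:n] ** 2 / 6, 10 + h[n:] ** 2 / 4])
X = np.column_stack([h + r * np.cos(theta), v + r * np.sin(theta)])
t = np.repeat([0.0, 1.0], n)

# Assumption: standardize each input column (zero mean, unit variance)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Parameters of a 2 -> H -> 1 network
H = 4                                   # 2 or 4 hidden neurons
W1 = rng.normal(0, 1, (2, H))           # (2, H) so that X @ W1 is (N, H)
b1 = np.zeros(H)
W2 = rng.normal(0, 1 / np.sqrt(H), H)
b2 = 0.0
lr, n_steps = 0.2, 3000

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

losses = []
for step in range(n_steps):
    # Forward pass with matrix products
    Z1 = X @ W1 + b1            # (N, H) hidden pre-activations
    A1 = np.maximum(Z1, 0)      # ReLU
    p = sigmoid(A1 @ W2 + b2)   # (N,) predicted probability of class 1
    eps = 1e-12
    losses.append(-np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)))

    # Backward pass, gradients derived by hand
    dz2 = (p - t) / len(t)      # dLoss/dscore for sigmoid + cross-entropy
    dW2 = A1.T @ dz2            # (H,)
    db2 = dz2.sum()
    dA1 = np.outer(dz2, W2)     # (N, H)
    dZ1 = dA1 * (Z1 > 0)        # ReLU derivative
    dW1 = X.T @ dZ1             # (2, H)
    db1 = dZ1.sum(axis=0)

    # Gradient-descent update, applied after all gradients are computed
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

pred = (np.maximum(X @ W1 + b1, 0) @ W2 + b2) > 0
accuracy = np.mean(pred == (t == 1))
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {accuracy:.3f}")
```

Because all gradients are computed from the current parameters before any of them change, going from 2 to 4 hidden neurons only means changing `H`.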