I am trying to code two neural networks. The architecture of the first network consists of an input layer, one hidden layer, and an output layer. The input layer is R^2, so it accepts two inputs (x1, x2); the hidden layer has two neurons; and the output layer has a single neuron. All the neurons use the rectified linear unit (ReLU) activation function. The only difference between the first and second network is that the second has four neurons in the hidden layer; otherwise they are identical.
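To make the architecture concrete, here is a minimal vectorized sketch of the forward pass for the four-neuron version (the names forward, W1, b1v, w2, and b2s are only for this illustration and do not appear in my actual code below, which uses individual scalar weights):

import numpy as np

# Illustrative only: the 2-input, 4-hidden-neuron, 1-output ReLU network in matrix form.
# W1 is 4x2 (hidden weights), b1v has length 4 (hidden biases),
# w2 has length 4 (output weights), b2s is a scalar (output bias).
def forward(x_in, W1, b1v, w2, b2s):
    z = W1 @ x_in + b1v       # hidden pre-activations, shape (4,)
    h = np.maximum(z, 0.0)    # ReLU activations
    y = b2s + w2 @ h          # output pre-activation (a scalar)
    return y, h

# Parameters initialized the same way as in my code: 0.5 minus a uniform random number
rng = np.random.default_rng(0)
W1 = 0.5 - rng.random((4, 2))
b1v = 0.5 - rng.random(4)
w2 = 0.5 - rng.random(4)
b2s = 0.5 - rng.random()
y, h = forward(np.array([1.0, 2.0]), W1, b1v, w2, b2s)
label = y > 0   # in my code below the output is thresholded at 0 to get the class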
I finished the code for the first network and was able to run it and plot the results. I am mainly trying to get the neural network to learn how to separate two clusters in my data set. I generate 2000 points to form one cluster and another 2000 points for the second cluster. Ideally, the output of the neural network will find a separating plane (really multiple planes) between the two clusters. I have set up my plot to run when the error from the testing phase is less than 0.05. I should also explain that I am trying to find the ideal learning rate and number of epochs for training, so I have a few loops that iterate through different learning rates (alpha) and epoch values.
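For context, this is a condensed, self-contained sketch of just the data generation and normalization that my full code below performs (same formulas, only collapsed into a short script so the two clusters are easy to visualize on their own):

import numpy as np
import matplotlib.pyplot as plt

nData = 2000
std = 0.5
r = np.random.normal(0, std, 2*nData)       # radial noise
theta = 2*np.pi*np.random.rand(2*nData)     # random angle for the noise

x = np.zeros((2*nData, 2))
t = np.zeros(2*nData)

# Cluster 1: points scattered around the parabola v = 5 + h^2/6, target 0
h = -6 + 12*np.random.rand(nData)
v = 5 + (h**2)/6
x[0:nData, 0] = h + r[0:nData]*np.cos(theta[0:nData])
x[0:nData, 1] = v + r[0:nData]*np.sin(theta[0:nData])
t[0:nData] = 0

# Cluster 2: points scattered around the parabola v = 10 + h^2/4, target 1
h = -5 + 10*np.random.rand(nData)
v = 10 + (h**2)/4
x[nData:2*nData, 0] = h + r[nData:2*nData]*np.cos(theta[nData:2*nData])
x[nData:2*nData, 1] = v + r[nData:2*nData]*np.sin(theta[nData:2*nData])
t[nData:2*nData] = 1

# Normalization, same as in my full code
x[:, 0] = 1 + 0.1*x[:, 0]
x[:, 1] = 1 + 0.1*x[:, 1]

plt.scatter(x[0:nData, 0], x[0:nData, 1])
plt.scatter(x[nData:2*nData, 0], x[nData:2*nData, 1])
plt.show()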
My first network works fine, but when I add the 2 extra neurons, for some reason the network error and parameters (weights and biases) go haywire. I can't get the 4-neuron network's error below 0.4. I think it has something to do with the error and the weights. I have been running the network with print statements to see what is happening to the weights, and I noticed they don't update well: the error during training gets stuck at 0, so the weights never change. I am not 100% sure this happens every time, though.
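Along the same lines as my print-statement checks, here is a small self-contained sketch of one way to check whether the hidden ReLUs start out inactive over my normalized input range (the function name relu_dead_fraction and the uniform sampling of the plotting box are made up for this example; they are not in my code below):

import numpy as np

# Illustrative diagnostic only: for one random draw of hidden weights/biases,
# report the fraction of sample inputs for which each hidden neuron's
# pre-activation is <= 0, i.e. its ReLU output is stuck at zero.
def relu_dead_fraction(X, W1, b1v):
    z = X @ W1.T + b1v              # pre-activations, shape (n_points, 4)
    return (z <= 0).mean(axis=0)    # fraction of inactive points per neuron

rng = np.random.default_rng(1)
# Roughly the box my normalized inputs fall in (same box as my plotting grid)
X = rng.uniform([0.25, 1.25], [1.75, 2.75], size=(4000, 2))
W1 = 0.5 - rng.random((4, 2))       # hidden weights, initialized like my code
b1v = 0.5 - rng.random(4)           # hidden biases
print(relu_dead_fraction(X, W1, b1v))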
If anyone has clues as to why my weights and error are not updating properly, I would greatly appreciate it. If you run the code and plot the two clusters, you will see that the output of the neural network does not create a colored separation between them. The code for the working two-neuron architecture is the same; just remove the additional two neurons from the code.
Here is the code for the network:
import numpy as np
import random
import gc
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
nData = 2000 #2000 points used on each cluster for 4000 points total
nTrain = 1000 #Used for training loop and to create clusters
nEpoch = 1 #Initial epoch value
nTest = 2000 #Used for testing loop
#alpha = 0.001
#Initializing 2D array for x which will carry the x1 and x2 values
#Also creating the radius and theta values for the cluster data
std = 0.5
x = np.zeros((2*nData,2))
t = np.zeros((2*nData))
r = np.random.normal(0,std,2*nData)
theta = 2*np.pi*np.random.rand(2*nData)
#w11f and w12f are used to plot the value of weights w11 and w12 as they update
w11f = np.zeros(nEpoch*nTrain)
w12f = np.zeros(nEpoch*nTrain)
#Creating cluster 1 and target data
h = -6 + 12*np.random.rand(nData)
v = 5 + (h**2)/6
x[0:nData,0] = h + r[0:nData]*np.cos(theta[0:nData])
x[0:nData,1] = v + r[0:nData]*np.sin(theta[0:nData])
t[0:nData] = 0
#Creating cluster 2 and target data
h = -5 + 10*np.random.rand(nData)
v = 10 + (h**2)/4
x[nData:2*nData,0] = h + r[nData:2*nData]*np.cos(theta[nData:2*nData])
x[nData:2*nData,1] = v + r[nData:2*nData]*np.sin(theta[nData:2*nData])
t[nData:2*nData] = 1
#Normalization
x[:,0] = 1 + 0.1*x[:,0]
x[:,1] = 1 + 0.1*x[:,1]
#Parameter Initialization
w11 = 0.5 - np.random.rand()
w12 = 0.5 - np.random.rand()
w21 = 0.5 - np.random.rand()
w22 = 0.5 - np.random.rand()
w31 = 0.5 - np.random.rand()
w32 = 0.5 - np.random.rand()
w41 = 0.5 - np.random.rand()
w42 = 0.5 - np.random.rand()
b4 = 0.5 - np.random.rand()
b3 = 0.5 - np.random.rand()
b2 = 0.5 - np.random.rand()
b1 = 0.5 - np.random.rand()
ww1 = 0.5 - np.random.rand()
ww2 = 0.5 - np.random.rand()
ww3 = 0.5 - np.random.rand()
ww4 = 0.5 - np.random.rand()
bb = 0.5 - np.random.rand()
#Creating a list from 0 to 3999
a = range(0,2*nData)
#Creating a 3D array (tensor) to store all the error values at the end of each 50 iteration loop
er_List = np.zeros((14,50,6))
#Creating the final array to store the counter of successful error. These are errors under 0.05 in value
#the rows represent the alpha values from 0.001 to 0.05 and the columns represent each epoch from 1 to 6. This way you can view the 2D array and see which alpha and epoch give the most successes for the lowest error.
nSuccess_Array = np.zeros((14,6))
#Part B - Creating nested loops to train for multiple alpha and epoch value
#pairs
#Training
for l in range(0,14): #loop for alpha values
    alpha = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02, 0.03, 0.04, 0.05]
    nEpoch = 1
    for n in range(0,6): #loop for incrementing epoch values
        nSuccess = 0
        #Initialize these again so the size updates as the epoch changes
        w11f = np.zeros(nEpoch*nTrain)
        w12f = np.zeros(nEpoch*nTrain)
        for j in range(0,50):
            #Initialize the parameters again so they are random every 50 iterations (for each new epoch value)
            w11 = 0.5 - np.random.rand()
            w12 = 0.5 - np.random.rand()
            w21 = 0.5 - np.random.rand()
            w22 = 0.5 - np.random.rand()
            w31 = 0.5 - np.random.rand()
            w32 = 0.5 - np.random.rand()
            w41 = 0.5 - np.random.rand()
            w42 = 0.5 - np.random.rand()
            b4 = 0.5 - np.random.rand()
            b3 = 0.5 - np.random.rand()
            b2 = 0.5 - np.random.rand()
            b1 = 0.5 - np.random.rand()
            ww1 = 0.5 - np.random.rand()
            ww2 = 0.5 - np.random.rand()
            ww3 = 0.5 - np.random.rand()
            ww4 = 0.5 - np.random.rand()
            bb = 0.5 - np.random.rand()
            sp = random.sample(a,nTrain + nTest)
            p = 0
            for epoch in range(0,nEpoch):
                for i in range(0,nTrain):
                    #Neuron dot product
                    y1 = b1 + w11*x[sp[i],0] + w12*x[sp[i],1]
                    y2 = b2 + w21*x[sp[i],0] + w22*x[sp[i],1]
                    y3 = b3 + w31*x[sp[i],0] + w32*x[sp[i],1]
                    y4 = b4 + w41*x[sp[i],0] + w42*x[sp[i],1]
                    #Neuron activation function ReLU
                    dxx1 = y1 > 0
                    xx1 = y1*dxx1
                    dxx2 = y2 > 0
                    xx2 = y2*dxx2
                    dxx3 = y3 > 0
                    xx3 = y3*dxx3
                    dxx4 = y4 > 0
                    xx4 = y4*dxx4
                    #Output of neural network before activation function
                    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
                    yy = yy > 0 #activation function
                    e = t[sp[i]] - yy #error calculation
                    #Updating parameters
                    ww1 = ww1 + alpha[l]*e*xx1
                    ww2 = ww2 + alpha[l]*e*xx2
                    ww3 = ww3 + alpha[l]*e*xx3
                    ww4 = ww4 + alpha[l]*e*xx4
                    bb = bb + alpha[l]*e
                    w11 = w11 + alpha[l]*e*ww1*dxx1*x[sp[i],0]
                    w12 = w12 + alpha[l]*e*ww1*dxx1*x[sp[i],1]
                    w21 = w21 + alpha[l]*e*ww2*dxx2*x[sp[i],0]
                    w22 = w22 + alpha[l]*e*ww2*dxx2*x[sp[i],1]
                    w31 = w31 + alpha[l]*e*ww3*dxx3*x[sp[i],0]
                    w32 = w32 + alpha[l]*e*ww3*dxx3*x[sp[i],1]
                    w41 = w41 + alpha[l]*e*ww4*dxx4*x[sp[i],0]
                    w42 = w42 + alpha[l]*e*ww4*dxx4*x[sp[i],1]
                    b1 = b1 + alpha[l]*e*ww1*dxx1
                    b2 = b2 + alpha[l]*e*ww2*dxx2
                    b3 = b3 + alpha[l]*e*ww3*dxx3
                    b4 = b4 + alpha[l]*e*ww4*dxx4
                    w11f[p] = w11
                    w12f[p] = w12
                    p = p + 1
            er = 0
            #Testing
            for k in range(nTrain,nTrain + nTest):
                y1 = b1 + w11*x[sp[i],0] + w12*x[sp[i],1]
                y2 = b2 + w21*x[sp[i],0] + w22*x[sp[i],1]
                y3 = b3 + w31*x[sp[i],0] + w32*x[sp[i],1]
                y4 = b4 + w41*x[sp[i],0] + w42*x[sp[i],1]
                dxx1 = y1 > 0
                xx1 = y1*dxx1
                dxx2 = y2 > 0
                xx2 = y2*dxx2
                dxx3 = y3 > 0
                xx3 = y3*dxx3
                dxx4 = y4 > 0
                xx4 = y4*dxx4
                yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
                yy = yy > 0
                e = abs(t[sp[k]] - yy)
                er = er + e #Accumulates error
            er = er/nTest #Calculates average error
            er_List[l,j,n] = er
            if er_List[l,j,n] < 0.05:
                nSuccess = nSuccess + 1
        #Part C - Creating an Array that contains the success values of each
        #alpha and epoch value pair
        nSuccess_Array[l,n] = nSuccess #Array that contains the success counts
        if nEpoch < 6:
            nEpoch = nEpoch + 1

print(er)
#Plotting
if er < 0.5:
    plt.figure(1)
    plt.scatter(x[0:nData,0],x[0:nData,1])
    plt.scatter(x[nData:2*nData,0],x[nData:2*nData,1])
    X = np.arange(0.25,1.75,0.02)
    Y = np.arange(1.25,2.75,0.02)
    X, Y = np.meshgrid(X,Y)
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    dxx3 = y3 > 0
    xx3 = y3*dxx3
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    Z = yy > 0
    plt.scatter(X,Y,c=Z+1,alpha=0.3)

    plt.figure(2)
    f = np.arange(0,nEpoch*nTrain,1)
    plt.plot(f,w11f)

    plt.figure(3)
    plt.plot(f,w12f)

    plt.figure(4)
    ax = plt.axes(projection='3d')
    ax.scatter(x[0:nData,0],x[0:nData,1],0,s=30)
    ax.scatter(x[nData:2*nData,0],x[nData:2*nData,1],1,s=30)
    #Plotting the separating planes
    X = np.arange(0.25,1.75,0.02)
    Y = np.arange(1.25,2.75,0.02)
    X, Y = np.meshgrid(X,Y)
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    dxx3 = y3 > 0
    xx3 = y3*dxx3
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    Z = yy > 0
    ax.plot_surface(X,Y,Z,rstride=1, cstride=1,cmap='viridis',alpha=0.5)

    plt.figure(5)
    ax = plt.axes(projection='3d')
    X = np.arange(0,5,0.02)
    Y = np.arange(0,5,0.02)
    X, Y = np.meshgrid(X,Y)
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    dxx3 = y3 > 0
    xx3 = y3*dxx3
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    ax.plot_surface(X, Y, yy, rstride=1, cstride=1, cmap='viridis', edgecolor='none')