
I am trying to understand neural networks better, so I am implementing a simple perceptron from scratch in R. I know this is very inefficient, as there are many libraries that already do this extremely well, but my goal is to understand the basics of neural networks better and work my way forward to more complex models.

I have created some artificial test data with a very simple linear decision boundary and split it into a training set and a test set. I then ran a logistic regression on the training data, checked the predictions on the test set, and got over 99% accuracy, which was to be expected given the simple nature of the data. I then tried implementing a perceptron with 2 inputs, 1 neuron, 1000 iterations, a learning rate of 0.1, and a sigmoid activation function.

I would expect to get very similar accuracy to the logistic regression model, but my results are a lot worse (around 70% correct classifications in the training set), so I definitely did something wrong. The predictions only seem to improve over the first couple of iterations and then just oscillate around a specific value (I tried many different learning rates, with no success). I'm attaching my script and I'm thankful for any advice! I think the problem lies in the calculation of the error or the weight adjustment, but I can't put my finger on it...

### Reproducible Example for StackOverflow


#### Setup

# loading libraries
library(data.table)

# remove scientific notation
options(scipen = 999)

# setting seed for random number generation
seed <- 123




#### Self-made Test Data

# input points (seeding first so the generated data is reproducible)
set.seed(seed)
x1 <- runif(10000, -100, 100)
x2 <- runif(10000, -100, 100)

# setting decision boundary to create output
output <- vector()
output[0.5*x1 - 1.2*x2 >= 50] <- 0
output[0.5*x1 - 1.2*x2 < 50] <- 1
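
# sanity check: the threshold of 50 makes the two classes unbalanced
# (roughly 70/30), which is easy to verify
table(output)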

# combining to dataframe
points <- cbind.data.frame(x1,x2,output)

# plotting all data points
plot(points$x1,points$x2, col = as.factor(points$output), main = "Self-created data", xlab = "x1",ylab = "x2")

# split into test and training sets
trainsize <- 0.2
set.seed(seed)
train_rows <- sample(1:dim(points)[1], size = trainsize * dim(points)[1])
train <- points[train_rows,]
test <- points[-c(train_rows),]

# plotting training set only
plot(train$x1,train$x2, col = as.factor(train$output), main = "Self-created data (training set)", xlab = "x1",ylab = "x2")





#### Approaching the problem with logistic regression

# building model
train_logit <- glm(output ~ x1 + x2, data = train, family = "binomial", maxit = 10000)
summary(train_logit)

# testing performance in training set
table(round(train_logit$fitted.values) == train$output)

# testing performance of train_logit model in test set
table(test$output == round(predict(train_logit,test[,c(1,2)], type = "response")))

# We get 100% accuracy in the training set and near 100% accuracy in the test set
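
# the same test-set check condensed to a single accuracy number
# (equivalent to the table above)
mean(test$output == round(predict(train_logit, test[,c(1,2)], type = "response")))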









#### Approaching the Problem with a Perceptron from scratch


# setting inputs, outputs and weights
inputs <- as.matrix(train[,c(1,2)])
output <- as.matrix(train[,3])
set.seed(123456)
weights <- as.matrix(runif(dim(inputs)[2],-1,1))


## Defining activation function + derivative

# defining sigmoid and its derivative
sigmoid <- function(x) {1 / (1 + exp(-x))}
sig_dir <- function(x){sigmoid(x)*(1 - sigmoid(x))}
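
# quick sanity check of the two functions: sigmoid(0) should be 0.5,
# and the derivative peaks there at 0.25
sigmoid(0)   # 0.5
sig_dir(0)   # 0.25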


## Perceptron Initial Settings
bias <- 1

# number of iterations
iterations <- 1000

# setting learning rate
alpha <- 0.1



## Perceptron

# creating vectors for saving results per iteration
weights_list <- list()
weights_list[[1]] <- weights
errors_vec <- vector()
outputs_vec <- vector()

# saving results across iterations
weights_list_all <- list()
outputs_list <- list()
errors_list <- list()


# looping through the backpropagation algorithm "iterations" times
for (j in 1:iterations) {

  # Loop for backpropagation with updating weights after every datapoint
  for (i in 1:dim(train)[1]) {

    # taking the weights from the last iteration of the outer loop as a starting point
    if (j > 1) {

      weights_list[[1]] <- weights

    }

    # Feed Forward (Should we really round this?!)
    output_pred <- round(sigmoid(sum(inputs[i,] * as.numeric(weights)) + bias))
    error <- output_pred - output[i]

    # Backpropagation (Do I need the sigmoid derivative AND a learning rate? Or should I only take one of them?)
    weight_adjustments <- inputs[i,] * (error * sig_dir(output_pred)) * alpha
    weights <- weights - weight_adjustments

    # saving progress for later plots
    weights_list[[i + 1]] <- weights
    errors_vec[i] <- error
    outputs_vec[[i]] <- output_pred

  }

  # saving results for each iteration
  weights_list_all[[j]] <- weights_list
  outputs_list[[j]] <- outputs_vec
  errors_list[[j]] <- errors_vec

}



#### Formatting Diagnostics for easier plotting

# creating an empty list for reshaping the weights lists
WeightList <- list()

# collapsing the individual weights lists into data frames
for (i in 1:iterations) {

  WeightList[[i]] <- t(data.table::rbindlist(weights_list_all[i]))

}

# pasting dataframes together
WeightFrame <- do.call(rbind.data.frame, WeightList)
colnames(WeightFrame) <- paste("w",1:dim(WeightFrame)[2], sep = "")

# pasting the error and output lists into data frames
ErrorFrame <- do.call(rbind.data.frame, errors_list)
OutputFrame <- do.call(rbind.data.frame, outputs_list)




#### Plotting Results


# Development of mean absolute error per iteration
plot(rowMeans(abs(ErrorFrame)),
     type = "l",
     xlab = "Iteration",
     ylab = "Mean absolute error")

# Development of Weights over time
plot(WeightFrame$w1, type = "l",xlim = c(1,dim(train)[1]), ylim = c(min(WeightFrame),max(WeightFrame)), ylab = "Weights", xlab = "Iterations")
lines(WeightFrame$w2, col = "green")

# Empty vector for number of correct categorizations per iteration
NoCorr <- vector()

# Computing percentage of correct predictions per iteration
colnames(OutputFrame) <- paste("V",1:dim(OutputFrame)[2], sep = "")
Output_mat <- as.matrix(OutputFrame)

for (i in 1:iterations) {

  NoCorr[i] <- sum(output == Output_mat[i,]) / nrow(train)

}

# plotting proportion of correct predictions per iteration
plot(NoCorr, type = "l", xlab = "Iteration", ylab = "Proportion correct")


# Performance in training set after last iteration
table(output,round(OutputFrame[iterations,]))

### 1 Answer

First of all, welcome to the world of Neural Networks :).

Secondly, I want to recommend a great article to you, which I personally used to get a better understanding of backpropagation and the whole NN learning process: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/. It might be a bit rough to get through at times, and for the general implementation I think it is much easier to follow pseudocode from a NN book. However, to understand what is going on, this article is very nice!

Thirdly, I will hopefully solve your problem :). You already ask in your own comments whether you should really round output_pred. Yes, you should round it if you want to use output_pred to make a prediction! However, if you want to use it for learning, rounding is generally bad. The reason is that if you round during learning, an output that was rounded up from 0.51 to 1 with a target output of 1 will not learn anything, because the rounded output equals the target and thus looks perfect. However, 0.99 would have been a much better prediction than 0.51, so there is definitely still something to learn!
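
To make that concrete, here is a minimal sketch of how the learning step inside your inner loop could look with the rounding removed. I also evaluate the sigmoid derivative at the pre-activation sum z (rather than at the rounded output), which is the standard form; the variable names are taken from your script, so treat this as a sketch rather than tested R code:

# feed forward WITHOUT rounding -- keep the raw sigmoid output for learning
z <- sum(inputs[i,] * as.numeric(weights)) + bias
output_pred <- sigmoid(z)
error <- output_pred - output[i]

# gradient step: the derivative is evaluated at the pre-activation z
weight_adjustments <- inputs[i,] * (error * sig_dir(z)) * alpha
weights <- weights - weight_adjustments

# round only when you want a hard 0/1 class label, e.g. for accuracy checks
prediction <- round(output_pred)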

I am not 100% sure this solves all of your problems (I'm not an R programmer) or gets your accuracy up to 99%, but it should solve some of them, and hopefully the intuition is clear too :)