
I am trying to classify three mental states from EEG connectome data. The data has shape 99x1x34x34x50x130 (originally graph data, now represented as a matrix), where the axes are, respectively, [subjects, channel, height, width, freq, time series]. For the sake of this study, I can only input a 1x34x34 image of the connectome data. Previous studies found that the alpha band (8-1 hz) carried the most information, so the dataset was narrowed down to 99x1x34x34x4x130. Previous machine learning techniques such as SVMs reached a test accuracy of ~75%. Hence, my goal is to achieve a higher accuracy given the same data (1x34x34). Since my data is very limited, with subjects 1-66 for training and 67-99 for testing (a fixed split, with each class making up 1/3 of each set), I thought of splitting the data along the time series axis (6th axis) and then averaging each piece down to a shape of 1x34x34 (e.g. from 1x34x34x4x10, where 10 is the size of a random sample of time points). This gave me ~1500 samples for training and 33 for testing (the test set is fixed, with a 1/3 class distribution).
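The windowing described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `augment_by_time_windows`, the number of windows per subject, and the random stand-in data are all hypothetical, and the exact sampling scheme used in the study may differ.

```python
import numpy as np

def augment_by_time_windows(data, labels, n_windows, window_len, seed=0):
    """Turn (subjects, 1, 34, 34, n_freq, n_time) into many (1, 34, 34) images.

    Assumption: each image is the mean over all frequency bins and over a
    random subset of `window_len` time points from one subject.
    """
    rng = np.random.default_rng(seed)
    n_time = data.shape[-1]
    images, targets = [], []
    for subject, label in zip(data, labels):
        for _ in range(n_windows):
            # Pick `window_len` distinct time points for this window.
            idx = rng.choice(n_time, size=window_len, replace=False)
            # subject[..., idx] has shape (1, 34, 34, n_freq, window_len);
            # averaging the last two axes leaves a (1, 34, 34) image.
            images.append(subject[..., idx].mean(axis=(-2, -1)))
            targets.append(label)
    return np.stack(images), np.array(targets)

# Random stand-in data for 3 subjects (not real EEG data).
demo_data = np.random.default_rng(1).normal(size=(3, 1, 34, 34, 4, 130))
demo_labels = np.array([0, 1, 2])
X_aug, y_aug = augment_by_time_windows(demo_data, demo_labels,
                                       n_windows=5, window_len=10)
print(X_aug.shape, y_aug.shape)
```

Windows drawn from the same subject are strongly correlated, so if a validation split is carved out of the training data, it is probably safer to split by subject rather than by window.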


  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))    
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (drop1): Dropout(p=0.25, inplace=False)
  (fc1): Linear(in_features=9248, out_features=128, bias=True)
  (drop2): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=128, out_features=3, bias=True)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 5e-06
    weight_decay: 0.0001
)
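As a sanity check on the printed layers, the arithmetic behind `in_features=9248` and the total parameter count can be worked out with plain Python. The helper name `conv_out` is hypothetical; the layer sizes are taken from the printout above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 34
side = conv_out(side, kernel=3, stride=1, padding=1)  # conv1 keeps 34
side = conv_out(side, kernel=3, stride=1, padding=1)  # conv2 keeps 34
side = conv_out(side, kernel=2, stride=2, padding=0)  # pool1 halves to 17
flat_features = 32 * side * side                      # input size of fc1

# Weights plus biases for each trainable layer.
params = {
    "conv1": 1 * 32 * 3 * 3 + 32,
    "conv2": 32 * 32 * 3 * 3 + 32,
    "fc1": flat_features * 128 + 128,
    "fc2": 128 * 3 + 3,
}
total_params = sum(params.values())
print(flat_features, params, total_params)
```

Almost all of the parameters sit in `fc1`, and the total is far larger than the ~1500 training images, which fits with the model being able to memorize the training set.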

Results: The training set can reach 100% accuracy with enough iterations, but at the cost of test set accuracy. After around 20-50 epochs of training, the model starts to overfit to the training set and the test set accuracy starts to decrease (and likewise for the loss).

What I have tried: I have tried tuning the hyperparameters: lr from 0.001 to 0.000001 and weight decay from 0.0001 to 0.00001. Training for 1000 epochs (useless, because the model overfits in fewer than 100 epochs). I have also tried increasing and decreasing the model complexity by adding additional fc layers and varying the number of channels in the CNN layers from 8 to 64. Adding more CNN layers made the model do a bit worse, averaging ~45% accuracy on the test set. I tried manually scheduling the learning rate every 10 epochs; the results were the same. Weight decay didn’t seem to affect the results much; I varied it from 0.1 to 0.000001.

From previous testing, I have a model that achieves 60% accuracy on both the testing and the training set. However, when I try to retrain it, the accuracy instantly drops to ~40% on both sets (training and testing), which makes no sense to me. I have tried altering the learning rate from 0.01 to 0.00000001, and also tried weight decay for this.

From training the model and looking at the graphs, it seems like the model doesn’t know what it’s doing for the first 5-10 epochs and then starts to learn rapidly, reaching ~50%-60% accuracy on both sets. This is where the model starts to overfit: from there, the accuracy on the training set climbs to 100%, while the accuracy on the testing set falls to 33%, which is equivalent to guessing.
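One common way to keep the weights from the point where the two curves diverge is early stopping on a held-out loss. The sketch below is only an illustration: the `EarlyStopping` class, the `patience` value, and the loss curve are hypothetical, and it assumes a validation split taken from the training subjects rather than the fixed test set.

```python
class EarlyStopping:
    """Stop when the held-out loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, val_loss, epoch):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # in real training, save a checkpoint here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: improving at first, then overfitting.
fake_val_losses = [1.10, 1.05, 0.98, 0.95, 0.97, 1.00, 1.04, 1.10]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(fake_val_losses):
    if stopper.step(loss, epoch):
        stopped_at = epoch
        break
print(stopper.best_epoch, stopped_at)
```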

Any tips?


The model’s outputs for the test set are very close to each other, for example:

0.33960407972335815, 0.311821848154068, 0.34857410192489624

The average standard deviation across the softmax outputs for each image, taken over the whole test set, is:


However, the average std for the training set is 0.22, so...
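For reference, a spread measure of this kind can be computed as below. This is a sketch under assumptions: it uses the population standard deviation (the original may have used the sample version), the function name `mean_prediction_spread` is hypothetical, and the "confident" row is invented purely for contrast.

```python
from statistics import mean, pstdev

def mean_prediction_spread(prob_rows):
    """Average, over samples, of the std across each sample's class probabilities."""
    return mean(pstdev(row) for row in prob_rows)

# The softmax output quoted above for one test image.
near_uniform = [[0.33960407972335815, 0.311821848154068, 0.34857410192489624]]
# A made-up confident prediction, for comparison only.
confident = [[0.90, 0.05, 0.05]]

spread_uniform = mean_prediction_spread(near_uniform)
spread_confident = mean_prediction_spread(confident)
print(spread_uniform, spread_confident)
```

A spread close to zero means the three probabilities are nearly equal, i.e. the network is close to a uniform guess on that image.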

F1 Scores:

Micro Average: 0.6060606060606061
Macro Average: 0.5810185185185186
Weighted Average: 0.5810185185185186
Scores for each class: 0.6875 0.5 0.55555556
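The averages above can be cross-checked from the per-class scores. The sketch assumes 11 test samples per class (33 samples with a 1/3 class distribution, as stated earlier); the variable names are illustrative.

```python
per_class_f1 = [0.6875, 0.5, 0.55555556]
support = [11, 11, 11]  # assumption: 33 test samples, evenly split over 3 classes

# Macro average: unweighted mean of the per-class scores.
macro_f1 = sum(per_class_f1) / len(per_class_f1)

# Weighted average: mean weighted by class support. With equal class sizes
# it coincides with the macro average, as in the reported numbers.
weighted_f1 = sum(f * s for f, s in zip(per_class_f1, support)) / sum(support)

# For single-label multiclass problems, micro-averaged F1 equals accuracy,
# so the reported micro score implies a count of correct test predictions.
reported_micro_f1 = 0.6060606060606061
implied_correct = round(reported_micro_f1 * sum(support))
print(macro_f1, weighted_f1, implied_correct)
```

Under these assumptions, the reported micro average corresponds to 20 of the 33 test samples being classified correctly.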

It seems that the model assigns all samples to one class. Did you use this data augmentation for the SVM test too? - mirzanahal
No, the SVM training set consisted of 66 samples and the testing set consisted of 33 samples. - Aditya Kendre
Could you include the F1-score and the AUC plot of the model? As this is a multiclass classification problem, these would be better measures for evaluating the model's performance. - yudhiesh
How does the current SVM algorithm handle data preparation? Does it also average over the freq and time axes? It seems like you'd throw away a lot of information if you simply took the mean. Maybe generating features like min, max, and trend per freq band would help accuracy while still letting you reduce the dimensionality. As an alternative, you could look into using a recurrent layer such as an LSTM to encode the time dimension. - quizzical_panini
I think the problem you're facing is that your dataset is on the small side after data prep for a "deep" model. I also don't think you're going to find any pretrained models that are remotely similar to this use case. You could create your own pretrained model on a similar task, e.g. create a model that predicts the next frame (34x34x4), then use transfer learning to build a classifier with the same weights. - quizzical_panini

1 Answer


I have some suggestions for things I would try; maybe you've already done some of them:

  • increase the dropout probability, which could reduce overfitting,
  • I did not see it, or I missed it, but if you aren't doing so already, shuffle all the samples,
  • there is not much data; did you think about using another NN to generate more data for the classes with the lowest scores? I am not sure if it applies here, but even randomly rotating or scaling the images can produce more training examples,
  • another approach, if you haven't tried it already, is transfer learning with another popular CNN, to see how it does the job; that gives you a comparison to tell whether something is wrong with your architecture or whether it's a lack of examples :)

I know these are just suggestions, but if you haven't tried some of them, maybe they will bring you closer to the solution. Good luck!
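The first two suggestions (higher dropout and shuffling) could look roughly like this in PyTorch. It is a minimal sketch, not the asker's actual code: the class name `ConnectomeCNN`, the ReLU activations, the layer order in `forward`, the dropout values 0.4 and 0.7, and the random stand-in data are all assumptions; only the layer sizes and optimizer settings come from the printout in the question.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class ConnectomeCNN(nn.Module):  # hypothetical name
    def __init__(self, p_conv=0.25, p_fc=0.5):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.drop1 = nn.Dropout(p_conv)
        self.fc1 = nn.Linear(32 * 17 * 17, 128)  # 9248 input features
        self.drop2 = nn.Dropout(p_fc)
        self.fc2 = nn.Linear(128, 3)

    def forward(self, x):
        # Assumed order and activations; the printout does not show them.
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.drop1(self.pool1(x))
        x = torch.flatten(x, 1)
        x = self.drop2(torch.relu(self.fc1(x)))
        return self.fc2(x)  # raw logits; CrossEntropyLoss applies softmax

# Random stand-in data, not real connectomes.
x = torch.randn(64, 1, 34, 34)
y = torch.randint(0, 3, (64,))
# shuffle=True reshuffles the samples at the start of every epoch.
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = ConnectomeCNN(p_conv=0.4, p_fc=0.7)  # higher dropout than the original
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()  # enables dropout
for xb, yb in loader:  # one epoch
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()

model.eval()  # disables dropout for evaluation
with torch.no_grad():
    logits = model(x[:4])
print(logits.shape)
```

Remember to call `model.eval()` before measuring test accuracy, otherwise dropout stays active and the test predictions become noisier than they need to be.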