0 votes

I am writing a neural network in C++ to approximate the function x·sin(x) using a single hidden layer with 5 hidden neurons. The hidden neurons use tanh activation and the output layer uses a linear activation. I use 30 training examples and train for 10,000 epochs.
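For context, the training set is just 30 samples of y = x·sin(x); a minimal sketch of how such a set could be built (the sampling interval here is an arbitrary illustrative choice, not necessarily what my real code uses):

#include <cmath>

const int numTrainingSets = 30;
const int numInputs  = 1;
const int numOutputs = 1;

double training_inputs[numTrainingSets][numInputs];
double training_outputs[numTrainingSets][numOutputs];

// Sample y = x*sin(x) on an evenly spaced grid over [-3, 3] (illustrative range).
void buildTrainingSet()
{
    for (int i = 0; i < numTrainingSets; ++i) {
        double x = -3.0 + 6.0 * i / (numTrainingSets - 1);
        training_inputs[i][0]  = x;
        training_outputs[i][0] = x * std::sin(x);
    }
}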

Before I shuffled my data, the predicted curve tracked the actual data closely [plot: red = predicted data, green = actual data], and the MSE was near 0.

But when I shuffle the indices of the training examples (and I have verified that the shuffling really does shuffle), I get terrible results:

[plot: predicted vs. actual data after shuffling]

and the error vs. epoch curve looks like this:

[plot: MSE vs. epoch]

What could possibly go wrong? Can shuffling be responsible for this?

Here is the simplified code for reference:

//Shuffle Function (Fisher–Yates style, using rand())
void shuffle(int *array, size_t n)
{
    if (n > 1) //If no. of training examples > 1
    {
        size_t i;
        for (i = 0; i < n - 1; i++)
        {
            //Pick j roughly uniformly from [i, n), then swap array[i] and array[j]
            size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
            int t = array[j];
            array[j] = array[i];
            array[i] = t;
        }
    }
}
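As an aside, the same index shuffle could be done with the standard library; a sketch (the helper name is illustrative) that both fills and permutes the index array:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>

// Fill the array with 0..n-1, then permute it with std::shuffle and a
// mt19937 engine, which avoids the quality issues of rand().
void shuffleIndices(int *array, std::size_t n)
{
    static std::mt19937 rng{std::random_device{}()};
    std::iota(array, array + n, 0);
    std::shuffle(array, array + n, rng);
}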


int main(int argc, const char * argv[])
{
    //Some other actions

    ///FOR INDEX SHUFFLING
    int trainingSetOrder[numTrainingSets];
    for(int j=0; j<numTrainingSets; ++j)
        trainingSetOrder[j] = j;


    ///TRAINING
    //std::cout<<"start train\n";
    vector<double> performance, epo; ///STORE MSE, EPOCH
    for (int n=0; n < epoch; n++)
    {

        shuffle(trainingSetOrder,numTrainingSets);
         for (int i=0; i<numTrainingSets; i++)
        {
            int x = trainingSetOrder[i];
            //cout<<" "<<"("<<training_inputs[x][0]<<","<<training_outputs[x][0] <<")";

            /// Forward pass
            for (int j=0; j<numHiddenNodes; j++)
            {
                double activation=hiddenLayerBias[j];
                //std::cout<<"Training Set :"<<x<<"\n";
                 for (int k=0; k<numInputs; k++) {
                    activation+=training_inputs[x][k]*hiddenWeights[k][j];
                }
                hiddenLayer[j] = tanh(activation);
            }

            for (int j=0; j<numOutputs; j++) {
                double activation=outputLayerBias[j];
                for (int k=0; k<numHiddenNodes; k++)
                {
                    activation+=hiddenLayer[k]*outputWeights[k][j];
                }
                outputLayer[j] = lin(activation);
            }



           /// Backprop
           ///   For V
            double deltaOutput[numOutputs];
            for (int j=0; j<numOutputs; j++) {
                double errorOutput = (training_outputs[i][j]-outputLayer[j]);
                deltaOutput[j] = errorOutput*dlin(outputLayer[j]);
            }

            ///   For W
           //Some Code

            ///Updation
            ///   For V and b
            ///Some Code

            ///   For W and c
            for (int j=0; j<numHiddenNodes; j++) {
                //c
                hiddenLayerBias[j] += deltaHidden[j]*lr;
                //W
                for(int k=0; k<numInputs; k++) {
                  hiddenWeights[k][j]+=training_inputs[i][k]*deltaHidden[j]*lr;
                }
            }
        }
      }


    return 0;
}
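(For completeness: lin and dlin above are the linear output activation and its derivative, which for a linear output are just the identity and a constant 1, along these lines:)

double lin(double x)  { return x; }        // identity
double dlin(double /*x*/) { return 1.0; }  // derivative of the identity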
Your link to the code does not work for me. Please get into the good habit of including the code directly in your post inside proper code tags, instead of linking to somewhere else that may not always be valid. – Fareanor
Sorry for the inconvenience. The full code would be long and cumbersome for this question, and I couldn't narrow it down to a particular snippet, hence the link. I'll correct it. – Pe Dro
I understand, no problem :) AFAIK, shuffling the training dataset order is not supposed to make the prediction fail. I guess you made a careless mistake when you shuffled or handled the shuffled data, but without the code it cannot be anything other than a guess. – Fareanor
The shuffle function shuffles the indices, not the data. – Pe Dro

2 Answers

0 votes

Your model doesn't seem to be randomly initialized, because your init_weight function

double init_weight() { return (2*rand()/RAND_MAX -1); }

almost always returns -1, because everything on the right-hand side is an int, so 2*rand()/RAND_MAX is computed with integer division. Such an initialization makes the model very hard or impossible to train.

Fix:

double init_weight() { return 2. * rand() / RAND_MAX - 1; }

The literal 2. has type double, so the other integer operands in the expression are promoted to double and the division is done in floating point.
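A quick standalone check makes the difference visible (2LL is used here so the comparison itself avoids the signed overflow that the original expression can also run into):

#include <cstdio>
#include <cstdlib>

int main()
{
    for (int i = 0; i < 5; ++i) {
        int r = std::rand();
        // Integer arithmetic: the quotient truncates, so the result is
        // (almost) always -1 or 0.
        long long asInt = 2LL * r / RAND_MAX - 1;
        // Floating-point arithmetic: values spread across (-1, 1).
        double asDouble = 2. * r / RAND_MAX - 1;
        std::printf("r=%d  integer=%lld  double=%f\n", r, asInt, asDouble);
    }
    return 0;
}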


Xavier initialization is a good method that speeds up training.
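For example, a minimal sketch of Xavier (Glorot) uniform initialization using <random> (the helper name is mine, not from the question):

#include <cmath>
#include <random>

// Draw each weight uniformly from [-limit, +limit],
// with limit = sqrt(6 / (fan_in + fan_out)).
double init_weight_xavier(int fanIn, int fanOut)
{
    static std::mt19937 gen{std::random_device{}()};
    const double limit = std::sqrt(6.0 / (fanIn + fanOut));
    std::uniform_real_distribution<double> dist(-limit, limit);
    return dist(gen);
}

// Usage with the layer sizes from the question (array names assumed from the post):
//   hiddenWeights[k][j] = init_weight_xavier(numInputs, numHiddenNodes);
//   outputWeights[k][j] = init_weight_xavier(numHiddenNodes, numOutputs);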

0 votes

I found two (silly!) mistakes in the training part: in both the output-error computation and the hidden-weight update, the training data is indexed with the loop position i instead of the shuffled index x. Before shuffling, trainingSetOrder[i] == i, so the bug had no effect; once the indices were shuffled, the backprop step used examples that did not match the forward pass.

1)

/// Backprop
           ///   For V
            double deltaOutput[numOutputs];
            for (int j=0; j<numOutputs; j++) {
                double errorOutput = (training_outputs[i][j]-outputLayer[j]);
                deltaOutput[j] = errorOutput*dlin(outputLayer[j]);
            }

should have been

/// Backprop
           ///   For V
            double deltaOutput[numOutputs];
            for (int j=0; j<numOutputs; j++) {
                double errorOutput = (training_outputs[x][j]-outputLayer[j]);
                deltaOutput[j] = errorOutput*dlin(outputLayer[j]);
            }

2)

///   For W and c
            for (int j=0; j<numHiddenNodes; j++) {
                //c
                hiddenLayerBias[j] += deltaHidden[j]*lr;
                //W
                for(int k=0; k<numInputs; k++) {
                  hiddenWeights[k][j]+=training_inputs[i][k]*deltaHidden[j]*lr;
                }
            }

should have been

///   For W and c
            for (int j=0; j<numHiddenNodes; j++) {
                //c
                hiddenLayerBias[j] += deltaHidden[j]*lr;
                //W
                for(int k=0; k<numInputs; k++) {
                  hiddenWeights[k][j]+=training_inputs[x][k]*deltaHidden[j]*lr;
                }
            }

After these fixes, the predicted curve matches the actual data again [plot: predicted vs. actual data] and the MSE decreases with epoch as expected [plot: MSE vs. epoch].