
I trained a DNN model and got good training accuracy, but bad evaluation accuracy.

import numpy as np
import tensorflow as tf


def DNN_Metrix(shape, dropout):
    # Note: the dropout argument is accepted but never used below.
    model = tf.keras.Sequential()
    print(shape)
    model.add(tf.keras.layers.Flatten(input_shape=shape))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
    # Two more hidden ReLU layers of width 10.
    for i in range(0, 2):
        model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dense(8, activation=tf.nn.tanh))
    # Single sigmoid unit for binary classification.
    model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))
    model.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model

model_dnn = DNN_Metrix(shape=(28,20,1), dropout=0.1)
model_dnn.fit(
    train_dataset, 
    steps_per_epoch=1000, 
    epochs=10, 
    verbose=2
)

Here is my training process and result:

Epoch 10/10 - 55s - loss: 0.4763 - acc: 0.7807

But when I evaluated with the test dataset, I got:

result = model_dnn.evaluate(np.array(X_test), np.array(y_test), batch_size=len(X_test))

loss, accuracy = [0.9485417604446411, 0.3649936616420746]

It's a binary classification problem; the ratio of positive to negative labels is about 0.37 : 0.63.

I don't think this is a result of overfitting: I have 700k instances for training, each with shape 28 * 20, and my DNN model is simple and has few parameters.
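
For what it's worth, the parameter count is easy to verify directly on the model built above; a one-line check:

# Prints each layer's output shape and parameter count, plus the total.
model_dnn.summary()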

Here is my code for generating the test data and training data:

def parse_function(example_proto):
    # Feature spec for the serialized examples (TF 1.x API).
    dics = {
            'feature': tf.FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
            'label': tf.FixedLenFeature(shape=(2), dtype=tf.float32),
            'shape': tf.FixedLenFeature(shape=(2), dtype=tf.int64)
            }
    parsed_example = tf.parse_single_example(example_proto, dics)
    # Decode the raw bytes back into a float64 tensor and restore its shape.
    parsed_example['feature'] = tf.decode_raw(parsed_example['feature'], tf.float64)
    parsed_example['feature'] = tf.reshape(parsed_example['feature'], [28, 20, 1])
    # Note: label_t is computed but never used.
    label_t = tf.cast(parsed_example['label'], tf.int32)

    # Keep only the second element of the two-element label as the binary target.
    parsed_example['label'] = parsed_example['label'][1]

    return parsed_example['feature'], parsed_example['label']


def read_tfrecord(train_tfrecord):
    dataset = tf.data.TFRecordDataset(train_tfrecord)
    dataset = dataset.map(parse_function)
    # Shuffle, repeat for 100 passes, and batch for training.
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(100)
    dataset = dataset.batch(670)
    return dataset


def read_tfrecord_test(test_tfrecord):
    # Test pipeline: parse only, with no shuffling, repeating, or batching.
    dataset = tf.data.TFRecordDataset(test_tfrecord)
    dataset = dataset.map(parse_function)
    return dataset


# tf_record_target = 'train_csv_temp_norm_vx.tfrecords'
train_files = 'train_baseline.tfrecords'
test_files = 'test_baseline.tfrecords'

train_dataset = read_tfrecord(train_files)
test_dataset  = read_tfrecord_test(test_files)


# Materialise the test set into Python lists by draining the one-shot iterator.
it_test_dts = test_dataset.make_one_shot_iterator()
it_train_dts = train_dataset.make_one_shot_iterator()


X_test = []
y_test = []

el = it_test_dts.get_next()

with tf.Session() as sess:
    while True:
        try:
            x_t, y_t = sess.run(el)
            X_test.append(x_t)
            y_test.append(y_t)
        except tf.errors.OutOfRangeError:
            break
I finally found the reason: I used different functions to construct the TFRecord files for the training and test data. When constructing the training data I used df.fillna(0), but for the test data I used df.fillna(0.01). After I reconstructed the test data, everything worked. Thanks to everyone who helped me. - Richard Lee
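
For reference, a minimal sketch of that fix: use one shared constant when filling missing values so the training and test TFRecords are built from identically preprocessed tables (the DataFrames below are hypothetical stand-ins, not the original data):

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the raw training and test tables.
df_train = pd.DataFrame(np.random.rand(5, 4))
df_train.iloc[0, 0] = np.nan
df_test = pd.DataFrame(np.random.rand(5, 4))
df_test.iloc[1, 2] = np.nan

# One shared fill value keeps the preprocessing identical for both splits
# before the TFRecord files are written.
FILL_VALUE = 0.0
df_train = df_train.fillna(FILL_VALUE)
df_test = df_test.fillna(FILL_VALUE)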

2 Answers

1 vote

Judging from the fact that the label distribution in your test set is [37% / 63%] and your final accuracy is 0.365, I would first check the labels predicted on the test set.

Most probably, all your predictions are of class 0, given that class 0 accounts for 37% of your dataset. If that is the case, it means that your neural network is not able to learn anything useful from the training set, and you have a massive case of overfitting.
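
If it helps, a minimal sketch of that check, reusing model_dnn and the X_test list built in the question (the 0.5 threshold is an assumption based on the single sigmoid output):

import numpy as np

# Predict sigmoid scores on the test set and threshold them at 0.5.
probs = model_dnn.predict(np.array(X_test))
preds = (probs > 0.5).astype(int).ravel()

# Count how often each class is predicted; if one class dominates,
# the network is effectively guessing a single label.
print(np.unique(preds, return_counts=True))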

I recommend always using a validation set, so that at the end of each epoch you can check whether your neural network has learnt anything. In a situation like yours, you would spot the overfitting issue very quickly.
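
A minimal sketch of wiring a validation set into the existing fit call (the validation file name and validation_steps value are hypothetical, and this assumes a TensorFlow version whose tf.keras fit accepts a tf.data.Dataset as validation_data):

# Hypothetical held-out split, parsed with the same pipeline and then batched.
validation_dataset = read_tfrecord_test('validation_baseline.tfrecords').batch(670)

model_dnn.fit(
    train_dataset,
    steps_per_epoch=1000,
    epochs=10,
    validation_data=validation_dataset,
    validation_steps=50,   # assumed number of validation batches per epoch
    verbose=2
)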

0 votes

Training accuracy doesn't mean much. An NN can fit any random set of inputs and outputs, even if they're unrelated. That's why you want to use validation data.

After training, look at your loss curves; this will give you a better idea of where things are going wrong.
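
For example, a minimal sketch of plotting the loss curve from the History object that fit returns, reusing the model and dataset from the question (passing validation data as well would add a 'val_loss' curve to compare against):

import matplotlib.pyplot as plt

# fit() returns a History object; its .history dict holds per-epoch metrics
# ('loss' and 'acc' here, plus 'val_loss'/'val_acc' when validation data is given).
history = model_dnn.fit(train_dataset, steps_per_epoch=1000, epochs=10, verbose=2)

plt.plot(history.history['loss'], label='training loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()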

For classification problems, NNs default to just guessing the most popular class they've seen in the training data. This is usually what happens when you haven't set up your experiment correctly.

And since you're dealing with binary classification, you might want to look at things like StratifiedKFold, which will provide folds of train/test data where the class percentages are preserved.
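
A minimal sketch of the StratifiedKFold idea with scikit-learn (the flat X and y arrays are hypothetical stand-ins; the original (28, 20, 1) features would need to be flattened or indexed accordingly):

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical feature matrix and binary labels standing in for the real data.
X = np.random.rand(100, 28 * 20)
y = np.random.randint(0, 2, size=100)

# Each fold preserves the positive/negative ratio of the full dataset.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    print(fold, np.bincount(y_train), np.bincount(y_val))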