2 votes

I'm using a CNN for short text classification (classifying product titles). The code is from http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/

The accuracy on the training set, test set, and validation set is shown below: [plot of training, test, and validation accuracy]

The loss is also different: the validation loss is double the loss on the training and test sets. (I can't upload more than 2 pictures, sorry!)

The training set and test set come from the web via a crawler and are split 7:3. The validation set comes from real app messages and was tagged by manual labeling.

I have tried almost every hyperparameter:

up-sampling, down-sampling, and no sampling

batch sizes of 1024, 2048, 5096

dropout of 0.3, 0.5, 0.7

embedding_size of 30, 50, 75

But none of these worked!

Now I use the parameters below (a rough sketch of the corresponding configuration follows the list):

batch_size is 2048

embedding_size is 30

sentence_length is 15

filter_sizes are 3, 4, 5

dropout_prob is 0.5

l2_lambda is 0.005
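Concretely, here is roughly how these settings map onto the tutorial's TextCNN class (num_classes, vocab_size, and num_filters below are placeholders, not my exact values):

```python
# Rough configuration sketch based on the tutorial's TextCNN class.
# num_classes, vocab_size and num_filters are placeholder values.
cnn = TextCNN(
    sequence_length=15,        # sentence_length
    num_classes=num_classes,   # number of product-title categories
    vocab_size=vocab_size,
    embedding_size=30,
    filter_sizes=[3, 4, 5],
    num_filters=128,           # tutorial default, assumed here
    l2_reg_lambda=0.005)       # l2_lambda

# Dropout is applied at train time through the feed dict, e.g.:
# feed_dict = {cnn.input_x: x_batch,
#              cnn.input_y: y_batch,
#              cnn.dropout_keep_prob: 0.5}
```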

At first I thought it was overfitting, but the model performs better on the test set than on the training set, so I'm confused!

Is the distribution of the test set very different from the training set?

How can I improve the performance on the validation set?

Are you sure you have the traces in that plot labeled correctly? It seems weird that your test accuracy is the highest. Almost definitely not right? – chris
@chris_anderson Thanks! I'm sure the traces in that plot are labeled correctly. I don't know why the validation accuracy is so low. – Nan.Zhang
Are you able to reproduce the accuracy in the original tutorial? What is the expected validation accuracy for this model? – Yao Zhang
@YaoZhang Thanks! The expected validation accuracy is at least 92%, like the dev accuracy in the graph. After training for several hours, the validation loss is 6 times larger than the training loss. Is something wrong with my validation set? – Nan.Zhang

2 Answers

0 votes

I think this difference in loss comes from the fact that the validation dataset was collected from a different domain than the training/test sets:

The training set and test set come from the web via a crawler and are split 7:3. The validation set comes from real app messages and was tagged by manual labeling.

The model did not see any real app message data during training, so it unsurprisingly fails to deliver good results on the validation set. Traditionally, all three sets are generated from the same pool of data (say, with a 7-1-2 split). The validation set is used for hyperparameter tuning (batch_size, embedding_size, etc.), while the test set is held out for an objective measure of model performance.

If you are ultimately concerned with performance on the app data, I would split that dataset 7-1-2 (train-validation-test) and augment the training data with the web crawler data.
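A rough sketch of that split, assuming scikit-learn is available and using hypothetical app_texts/app_labels and web_texts/web_labels arrays for the two labeled datasets:

```python
from sklearn.model_selection import train_test_split

# Hypothetical arrays: app_texts/app_labels are the hand-labeled app
# messages, web_texts/web_labels are the crawled data.
# First carve out 20% of the app data as a held-out test set.
app_train_val_x, app_test_x, app_train_val_y, app_test_y = train_test_split(
    app_texts, app_labels, test_size=0.2, stratify=app_labels, random_state=42)

# Then take 1/8 of the remaining 80% as validation (0.8 * 1/8 = 0.1),
# giving a 7-1-2 train/validation/test split overall.
app_train_x, app_val_x, app_train_y, app_val_y = train_test_split(
    app_train_val_x, app_train_val_y, test_size=0.125,
    stratify=app_train_val_y, random_state=42)

# Augment only the training portion with the crawled data; validation
# and test stay app-only so they reflect the target distribution.
train_x = list(app_train_x) + list(web_texts)
train_y = list(app_train_y) + list(web_labels)
```

This way the validation accuracy you tune against actually tracks the app-message performance you care about.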

0 votes

I think the loss on the validation set is high because the validation data comes from real app messages, which may be more realistic than the training data you obtained from web crawling, which may contain noise. Your learning rate is very high and your batch size is much bigger than what's usually recommended. You can try learning rates in [0.1, 0.01, 0.001, 0.0001] and batch sizes in [32, 64]; the other hyperparameter values seem okay. A simple sweep could look like the sketch below.
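A minimal sweep sketch, assuming a hypothetical train_and_evaluate(learning_rate, batch_size) helper that trains your CNN with those settings and returns validation accuracy (this helper is not part of the tutorial code):

```python
# Hypothetical grid search over learning rate and batch size.
# train_and_evaluate() is an assumed helper that trains the model
# with the given settings and returns accuracy on the validation set.
best = None
for lr in [0.1, 0.01, 0.001, 0.0001]:
    for batch_size in [32, 64]:
        val_acc = train_and_evaluate(learning_rate=lr, batch_size=batch_size)
        print(f"lr={lr}, batch_size={batch_size}: val_acc={val_acc:.4f}")
        if best is None or val_acc > best[0]:
            best = (val_acc, lr, batch_size)
print("Best (val_acc, lr, batch_size):", best)
```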

I would also like to comment on the training, validation, and test sets. The training data is split into training and validation sets during development, while the test set is data we don't touch and use only to evaluate the final model. I think your "validation set" is really the test set and your "test set" is really the validation set; that's how I would refer to them.