
I am new to TensorFlow and machine learning. I'm trying to create a sentiment analysis NN with tensorflow.

I've set up my architecture and I'm attempting to train the model, but I encounter this error:

ValueError: Cannot feed value of shape (32, 2) for Tensor 'InputData/X:0', which has shape '(?, 100)'

I think the error has to do with my input layer, net = tflearn.input_data([None, 100]). The tutorial I was following suggested this input shape: None for the batch size, and 100 for the length, since that's the sequence length. Hence (None, 100). To my understanding, these are the dimensions the training data fed into the network needs to have, correct?

Could someone explain why the suggested batch size in the input shape was None, and also why TensorFlow is attempting to feed the network input shaped (32, 2)? Where is the sequence length of 2 coming from?

If my understanding is wrong anywhere in this explanation, feel free to correct me; I'm still trying to learn the theory as well.

Thanks in advance

In [1]:

import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb

In [2]:

#Loading IMDB dataset
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)
trainX, trainY = train
testX, testY = test

In [3]:

#Data sequence padding 
trainX = pad_sequences(trainX, maxlen=100, value=0.)  
testX = pad_sequences(testX, maxlen=100, value=0.)
#converting labels of each review to vectors
trainY = to_categorical(trainY, nb_classes=2)
trainX = to_categorical(testY, nb_classes=2)


In [4]:

#network building 
net = tflearn.input_data([None, 100])
net = tflearn.embedding(net, input_dim=10000, output_dim=128)
net = tflearn.lstm(net, 128, dropout = 0.8)
net = tflearn.fully_connected(net, 2, activation='softmax') 
net = tflearn.regression(net, optimizer = 'adam', learning_rate=0.0001,
                         loss='categorical_crossentropy')


WARNING:tensorflow:From C:\Users\Nason\Anaconda33\envs\TensorFlow1.8CPU\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead


In [5]:

#Training
model = tflearn.DNN(net, tensorboard_verbose=0)   #train using tensorflow Deep nueral net
model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,    #fit launches training process for training and validation data, metric displays data as its training.
          batch_size=32)


---------------------------------
Run id: U7NONK
Log directory: /tmp/tflearn_logs/
INFO:tensorflow:Summary name Accuracy/ (raw) is illegal; using Accuracy/__raw_ instead.
---------------------------------
Training samples: 2500
Validation samples: 2500
--

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-7ffd0a8836f9> in <module>()
      2 model = tflearn.DNN(net, tensorboard_verbose=0)   #train using tensorflow Deep nueral net
      3 model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,    #fit launches training process for training and validation data, metric displays data as its training.
----> 4           batch_size=32)

~\Anaconda33\envs\TensorFlow1.8CPU\lib\site-packages\tflearn\models\dnn.py in fit(self, X_inputs, Y_targets, n_epoch, validation_set, show_metric, batch_size, shuffle, snapshot_epoch, snapshot_step, excl_trainops, validation_batch_size, run_id, callbacks)
    214                          excl_trainops=excl_trainops,
    215                          run_id=run_id,
--> 216                          callbacks=callbacks)
    217 
    218     def fit_batch(self, X_inputs, Y_targets):

~\Anaconda33\envs\TensorFlow1.8CPU\lib\site-packages\tflearn\helpers\trainer.py in fit(self, feed_dicts, n_epoch, val_feed_dicts, show_metric, snapshot_step, snapshot_epoch, shuffle_all, dprep_dict, daug_dict, excl_trainops, run_id, callbacks)
    337                                                        (bool(self.best_checkpoint_path) | snapshot_epoch),
    338                                                        snapshot_step,
--> 339                                                        show_metric)
    340 
    341                             # Update training state

~\Anaconda33\envs\TensorFlow1.8CPU\lib\site-packages\tflearn\helpers\trainer.py in _train(self, training_step, snapshot_epoch, snapshot_step, show_metric)
    816         tflearn.is_training(True, session=self.session)
    817         _, train_summ_str = self.session.run([self.train, self.summ_op],
--> 818                                              feed_batch)
    819 
    820         # Retrieve loss value from summary string

~\Anaconda33\envs\TensorFlow1.8CPU\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    898     try:
    899       result = self._run(None, fetches, feed_dict, options_ptr,
--> 900                          run_metadata_ptr)
    901       if run_metadata:
    902         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\Anaconda33\envs\TensorFlow1.8CPU\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1109                              'which has shape %r' %
   1110                              (np_val.shape, subfeed_t.name,
-> 1111                               str(subfeed_t.get_shape())))
   1112           if not self.graph.is_feedable(subfeed_t):
   1113             raise ValueError('Tensor %s may not be fed.' % subfeed_t)

ValueError: Cannot feed value of shape (32, 2) for Tensor 'InputData/X:0', which has shape '(?, 100)'

3 Answers

0
votes

The error comes from the line trainX = to_categorical(testY, nb_classes=2), which overwrites the padded trainX with one-hot test labels of shape (N, 2). It needs to be changed to testY = to_categorical(testY, nb_classes=2).
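To make the origin of the 2 concrete, here is a minimal sketch in plain Python. The helpers `pad`, `one_hot` and `shape` are hypothetical stand-ins written only so the example runs without TensorFlow or tflearn; they imitate the roles of pad_sequences and to_categorical but are not the library implementations, and the toy data is made up.

```python
def pad(seqs, maxlen, value=0):
    """Hypothetical stand-in: truncate or right-pad every row to maxlen."""
    return [list(s[:maxlen]) + [value] * (maxlen - len(s[:maxlen])) for s in seqs]


def one_hot(labels, nb_classes):
    """Hypothetical stand-in: turn integer labels into one-hot rows."""
    return [[1.0 if i == y else 0.0 for i in range(nb_classes)] for y in labels]


def shape(rows):
    """Return (number of rows, length of the first row)."""
    return (len(rows), len(rows[0]))


reviews = [[5, 8, 2], [7, 1], [3, 3, 3, 9]]  # toy word-id sequences
labels = [0, 1, 1]                           # toy sentiment labels

trainX = pad(reviews, maxlen=100)
padded_shape = shape(trainX)
print(padded_shape)       # (3, 100): what the input layer expects

# The bug: the label conversion is assigned to trainX instead of testY.
trainX = one_hot(labels, nb_classes=2)
overwritten_shape = shape(trainX)
print(overwritten_shape)  # (3, 2): the padded sequences are gone
```

With batch_size=32, each batch takes 32 rows from the overwritten array, which is where the (32, 2) in the error message comes from.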

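Also, setting the batch dimension to None means the input layer accepts batches of any size. Since you set batch_size=32, you could also set the input shape to [32, 100], but the layer would then reject any batch that does not have exactly 32 rows, so None is the more flexible choice.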
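As a rough illustration of what None means in a declared shape, the sketch below models the compatibility check in plain Python. The function `feed_is_compatible` is a hypothetical name and a simplified model, not TensorFlow's actual code: it treats None as a wildcard and requires every other dimension to match exactly.

```python
def feed_is_compatible(value_shape, declared_shape):
    """Return True if value_shape fits declared_shape, with None matching any size."""
    if len(value_shape) != len(declared_shape):
        return False
    return all(d is None or d == v for v, d in zip(value_shape, declared_shape))


print(feed_is_compatible((32, 100), (None, 100)))  # True: a full batch
print(feed_is_compatible((4, 100), (None, 100)))   # True: a smaller batch also fits
print(feed_is_compatible((4, 100), (32, 100)))     # False: fixed batch size rejects it
print(feed_is_compatible((32, 2), (None, 100)))    # False: the case from the error
```

The last call mirrors the reported error: the batch dimension is fine, and it is the second dimension (2 instead of 100) that fails.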

0
votes
tflearn.input_data([None, 100])

This declares the input as a tensor holding any number of instances, each with 100 features.

trainX = pad_sequences(trainX, maxlen=100, value=0.)  
testX = pad_sequences(testX, maxlen=100, value=0.)
#converting labels of each review to vectors
trainY = to_categorical(trainY, nb_classes=2)
trainX = to_categorical(testY, nb_classes=2)  # <-- the problem is here

This line is the problem in your code: it overwrites trainX, replacing the padded sequences with an array of a different shape. I think you meant:

testY = to_categorical(testY, nb_classes=2)

If this still doesn't work, I suspect that a reshaping step is missing. You are indeed padding, but on the arrays as a whole. Try padding each "row" separately, so that each instance has a length of 100, as the input layer expects.

Before doing that, print the shapes of the arrays (for example print(trainX.shape)) to check that the preprocessing really does what you expect. I also suggest splitting the work into two scripts: one for loading, preprocessing, reshaping and padding, and the other for the TensorFlow logic.
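One way to automate that shape check before calling fit is sketched below. `check_shapes` is a hypothetical helper, not part of tflearn; it only relies on len(), so it should work on nested lists as well as on 2-D NumPy arrays. The sample data is made up.

```python
def check_shapes(X, Y, seq_len, n_classes):
    """Raise ValueError if X and Y differ in row count or any row has the wrong length."""
    if len(X) != len(Y):
        raise ValueError(f"X has {len(X)} rows but Y has {len(Y)}")
    for name, rows, width in (("X", X, seq_len), ("Y", Y, n_classes)):
        for i, row in enumerate(rows):
            if len(row) != width:
                raise ValueError(
                    f"{name}[{i}] has length {len(row)}, expected {width}"
                )


good_X = [[0] * 100, [0] * 100]    # two padded sequences
good_Y = [[1.0, 0.0], [0.0, 1.0]]  # two one-hot labels
check_shapes(good_X, good_Y, seq_len=100, n_classes=2)  # passes silently

bad_X = [[1.0, 0.0], [0.0, 1.0]]   # labels stored in X by mistake
try:
    check_shapes(bad_X, good_Y, seq_len=100, n_classes=2)
except ValueError as err:
    message = str(err)
    print(message)  # X[0] has length 2, expected 100
```

Running a check like this right after preprocessing surfaces the mistake before training starts, with a message that names the offending array.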

0
votes

trainX ends up with a second dimension of 2, but your model is expecting 100.

EDIT:

I just noticed that you are setting trainX from testY in this bit of code:

trainX = to_categorical(testY, nb_classes=2)

Whereas it should be:

testY = to_categorical(testY, nb_classes=2)

Therefore you need to change your code to:

#Data sequence padding
trainX = pad_sequences(trainX, maxlen=100, value=0.)
testX = pad_sequences(testX, maxlen=100, value=0.)
#converting labels of each review to vectors
trainY = to_categorical(trainY, nb_classes=2)
#assign the one-hot test labels to testY, not trainX
testY = to_categorical(testY, nb_classes=2) #CHANGE HERE!!

With this change, trainX keeps its padded shape of (N, 100) and should match the input layer.

It is fine to set the input shape to [None, 100]; it gives you more flexibility to change the batch size later if you need to.