Please see the following code for creating an LSTM network:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping, ModelCheckpoint

NumberofClasses = 8

model = Sequential()
model.add(LSTM(256, dropout=0.2, input_shape=(32, 512), return_sequences=False))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(NumberofClasses, activation='softmax'))
print(model.summary())

sgd = SGD(lr=0.00005, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

callbacks = [EarlyStopping(monitor='val_loss', patience=10, verbose=1),
             ModelCheckpoint('video_1_LSTM_1_1024.h5', monitor='val_loss',
                             save_best_only=True, verbose=1)]

nb_epoch = 500
model.fit(train_data, train_labels,
          validation_data=(validation_data, validation_labels),
          batch_size=batch_size, epochs=nb_epoch,
          callbacks=callbacks, shuffle=False, verbose=1)
In the above code I am creating an LSTM with the Keras library in Python. My dataset consists of 131 videos belonging to 8 different classes. I set a fixed sequence length of 32 frames per video, so the 131 videos yield 4192 frames in total. For each frame I extracted features from a pre-trained VGG16 model and stacked these features into an array, which produced a final feature array of shape (4192, 512). The corresponding train_labels array holds the one-hot encoding of the eight classes for each frame and has shape (4192, 8).

Since an LSTM expects input in (samples, timesteps, features) format, and each video in my case is a sequence of 32 frames, I reshaped the training data to (131, 32, 512) and applied the same reshaping to train_labels. However, when I run this I get the following error:
ValueError: Error when checking target: expected dense_2 to have 2 dimensions, but got
array with shape (131, 32, 8)
If I do not reshape train_labels and leave it as (4192, 8), the error is:
ValueError: Input arrays should have the same number of samples as target
arrays. Found 131 input samples and 4192 target samples.
Please note that, because each of my videos has a sequence length of 32 frames, I applied this reshaping: (131, 32, 512) to the training data and (131, 32, 8) to the corresponding labels. I would appreciate any comment or advice on how to solve this problem.
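For reference, here is a minimal, runnable sketch of the data preparation described above, with random arrays standing in for my real VGG16 features and class labels (only the shapes match my actual setup):

```python
import numpy as np

n_videos, seq_len, n_feat, n_classes = 131, 32, 512, 8

# Random features standing in for the VGG16 output: 4192 frames x 512 features.
frame_features = np.random.rand(n_videos * seq_len, n_feat).astype('float32')

# One-hot frame labels of shape (4192, 8); every frame of a video shares its class.
video_classes = np.random.randint(0, n_classes, size=n_videos)
frame_labels = np.eye(n_classes)[np.repeat(video_classes, seq_len)]

# Reshape the features into (samples, timesteps, features) for the LSTM.
train_data = frame_features.reshape(n_videos, seq_len, n_feat)     # (131, 32, 512)

# The same reshape applied to the labels, which is what triggers the error.
train_labels = frame_labels.reshape(n_videos, seq_len, n_classes)  # (131, 32, 8)

print(train_data.shape, train_labels.shape)
```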