
This is my first time asking a question here, so I really need help, and sorry for my bad English. I want to build a CNN-LSTM model for video classification in Keras, but I have a problem creating my y_train. I have a video dataset (each video has 10 frames), and I converted the videos to images. First, I split the dataset into X_train, X_test, y_train, and y_test (80% train, 20% test):

X_train, X_test = img_data[:trainco], img_data[trainco:]
y_train, y_test = y[:trainco], y[trainco:]

X_train shape: (2280, 64, 64, 1) -> 2280 images, 64x64 (height x width), 1 channel

y_train shape: (2280, 26) -> 26 classes

Then I reshape them before feeding them into the CNN-LSTM. (Note: I do the same for X_test and y_test.)

time_steps = 10 (because I have 10 frames per video)

X_train = X_train.reshape(int(X_train.shape[0] / time_steps), time_steps, X_train.shape[1], X_train.shape[2], X_train.shape[3])
y_train = y_train.reshape(int(y_train.shape[0] / time_steps), time_steps, y_train.shape[1])

X_train shape : (228, 10, 64, 64, 1), y_train shape : (228, 10, 26)
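The reshape above only groups frames into the right videos if each video's 10 frames are stored consecutively and the frame count is a multiple of time_steps (which also means trainco has to fall on a video boundary). A minimal NumPy check with dummy arrays, assuming that ordering; the `-1` lets NumPy infer the number of videos:

```python
import numpy as np

time_steps = 10
num_classes = 26

# Dummy stand-ins shaped like the question's arrays (the values don't matter here)
X_train = np.zeros((2280, 64, 64, 1), dtype="float32")
y_train = np.zeros((2280, num_classes), dtype="float32")

# reshape only works if the frames split evenly into videos
assert X_train.shape[0] % time_steps == 0

# -1 lets NumPy infer the number of videos (2280 / 10 = 228)
X_videos = X_train.reshape(-1, time_steps, *X_train.shape[1:])
y_frames = y_train.reshape(-1, time_steps, num_classes)
print(X_videos.shape, y_frames.shape)  # (228, 10, 64, 64, 1) (228, 10, 26)

# Grouping is purely positional: frames 10..19 become video 1
frame_ids = np.arange(2280).reshape(-1, time_steps)
print(frame_ids[1])  # [10 11 12 13 14 15 16 17 18 19]
```

If the frames were shuffled before the split, this positional grouping would silently mix frames from different videos.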

This is my model:

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense
from keras.callbacks import ModelCheckpoint

model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(2, 2), activation='relu', padding='same'), input_shape=X_train.shape[1:]))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(256, return_sequences=False))
model.add(Dense(128))
model.add(Dense(64))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
checkpoint = ModelCheckpoint(fname, monitor='acc', verbose=1, save_best_only=True, mode='max', save_weights_only=True)
hist = model.fit(X_train, y_train, batch_size=num_batch, epochs=num_epoch, verbose=1, validation_data=(X_test, y_test), callbacks=[checkpoint])

But I got this error:

ValueError: Error when checking target: expected dense_3 to have 2 dimensions, but got array with shape (228, 10, 26)

It says it expects 2 dimensions, so I changed the code to

y_train = y_train.reshape(int(y_train.shape[0] / time_steps), y_train.shape[1])

and got another error:

ValueError: cannot reshape array of size 59280 into shape (228,26)
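This error is just element counting: reshape can rearrange elements but never drop them. A minimal NumPy sketch with a dummy array of the same shape as the question's y_train:

```python
import numpy as np

y_train = np.zeros((2280, 26))  # per-frame one-hot labels, shaped as in the question

print(y_train.size)  # 59280 = 2280 * 26
print(228 * 26)      # 5928: the requested shape holds only one tenth of the elements

# Asking for fewer elements than the array has raises ValueError
try:
    y_train.reshape(228, 26)
except ValueError as err:
    print(err)  # cannot reshape array of size 59280 into shape (228,26)
```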

Then I changed the code again to

y_train = y_train.reshape(y_train.shape[0], y_train.shape[1])

and I still got an error:

ValueError: Input arrays should have the same number of samples as target arrays. Found 228 input samples and 2280 target samples.

What should I do? I understand what the problem is, but I don't know how to solve it. Please help me.


1 Answer


I recreated a slightly simplified version of your situation to reproduce the problem. Because the LSTM layer uses return_sequences=False, it outputs a single vector for the entire sequence of time steps instead of one per frame, so the time dimension disappears: the LSTM output is (batch, 256) and the final Dense layer outputs (batch, 26), which is 2D, while your y_train is 3D. I've added model.summary() to my reproduction program, which prints the output shape of every layer.
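To see the shape difference in isolation, here is a minimal sketch, assuming TensorFlow 2.x (imported as `from tensorflow import keras`); the sizes 4, 16 and 8 are arbitrary dummy values. It applies an LSTM to a dummy batch with and without return_sequences, then adds a TimeDistributed Dense head that keeps one 26-class prediction per frame:

```python
import numpy as np
from tensorflow import keras  # assumption: TensorFlow 2.x with its bundled Keras

# Dummy batch: 4 videos, 10 time steps, 16 features per step (e.g. flattened CNN output)
x = np.random.random((4, 10, 16)).astype("float32")

# return_sequences=False: one output vector for the whole sequence -> 2D
last_only = keras.layers.LSTM(8, return_sequences=False)(x)
print(tuple(last_only.shape))  # (4, 8)

# return_sequences=True: one output vector per time step -> 3D
per_step = keras.layers.LSTM(8, return_sequences=True)(x)
print(tuple(per_step.shape))  # (4, 10, 8)

# A TimeDistributed Dense head keeps the time axis: one prediction per frame
head = keras.layers.TimeDistributed(keras.layers.Dense(26, activation="softmax"))
per_frame = head(per_step)
print(tuple(per_frame.shape))  # (4, 10, 26)
```

A per-frame head like this would accept targets shaped (videos, 10, 26), at the cost of classifying each frame separately rather than the video as a whole.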

from keras import Sequential
from keras.layers import TimeDistributed, Dense, Conv2D, MaxPooling2D, Flatten, LSTM
import numpy as np

X_train = np.random.random((228, 10, 64, 64, 1))
y_train = np.random.randint(2, size=(228, 10, 26))
num_classes = 26

# Create the model
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(2, 2), activation='relu', padding='same'), input_shape=X_train.shape[1:]))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten(),name='Flatten'))
model.add(LSTM(256, return_sequences=False, input_shape=(64, 64)))
model.add(Dense(128))
model.add(Dense(64))
model.add(Dense(num_classes, activation='softmax', name='FinalDense'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])

# Print each layer's output shape; FinalDense reports (None, 26)
model.summary()
# hist = model.fit(X_train, y_train, epochs=1)  # fails: targets are (228, 10, 26)

You'll need to either reduce y_train (the target) to one label per video, shape (228, 26), so that it matches the model's 2D output, or change the model so that it outputs one prediction per frame. I hope this helps.
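One way to take the first option, assuming every frame of a video carries the same label and each video's frames are stored consecutively (as the reshape in the question already assumes), is to group the per-frame labels by video and keep one row per video. The helper name `frames_to_video_labels` is hypothetical, and the data below is dummy:

```python
import numpy as np

def frames_to_video_labels(y_frames, time_steps):
    """Collapse per-frame one-hot labels (N*T, C) into per-video labels (N, C).

    Hypothetical helper; assumes each video's frames are consecutive
    and share a single label.
    """
    num_classes = y_frames.shape[1]
    grouped = y_frames.reshape(-1, time_steps, num_classes)  # (N, T, C)
    # Refuse to silently mislabel a video whose frames disagree
    if not np.all(grouped == grouped[:, :1, :]):
        raise ValueError("frames within a video have different labels")
    return grouped[:, 0, :]  # the first frame's label stands for the video


# Dummy data: 228 videos, each with one class repeated over its 10 frames
time_steps, num_classes = 10, 26
video_classes = np.random.randint(num_classes, size=228)
y_train = np.eye(num_classes)[np.repeat(video_classes, time_steps)]  # (2280, 26)

y_train_videos = frames_to_video_labels(y_train, time_steps)
print(y_train.shape, "->", y_train_videos.shape)  # (2280, 26) -> (228, 26)
```

Applied to both y_train and y_test, this gives 228 targets for 228 input videos, which matches the final Dense layer's (batch, 26) output without changing the model.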