3 votes

I want to feed spectrograms (corresponding to uttered digits) to a model whose first layer is a Conv1D. I then use an RNN layer to classify which word is uttered. These spectrograms have different sequence/time lengths but, of course, the same number of features.

In the Keras Conv1D docs:

When using this layer as the first layer in a model, provide an input_shape argument [..] (None, 128) for variable-length sequences with 128 features per step.

So it seems like the layer itself handles variable-length input. No need for padding/resizing.
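
Something like this is what I have in mind; the filter count, the 128 features per step, and the 10 digit classes are just placeholders for my actual numbers:

from keras.models import Sequential
from keras.layers import Conv1D, GRU, Dense

# placeholders: 128 features per time step, 10 digit classes
model = Sequential()
model.add(Conv1D(64, kernel_size=5, activation='relu',
                 input_shape=(None, 128)))  # None = variable time length
model.add(GRU(32))                          # final state summarizes the sequence
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')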

I'm used to preparing same-shape data with NumPy (e.g. numpy.vstack), but now that my arrays have various shapes, I can't figure out how to do it! And I have only found examples where people had same-shaped data.

Or maybe it is not possible with numpy, and I have to use something else?
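
To make it concrete, here is the kind of thing that fails (toy arrays with 128 features, like my spectrograms):

import numpy as np

# two toy "spectrograms": same feature count, different time lengths
a = np.zeros((50, 128))
b = np.zeros((70, 128))

np.vstack([a, b]).shape  # (120, 128) -- concatenates along time, merging my samples
# np.stack([a, b])       # ValueError: all input arrays must have the same shape

spectrograms = [a, b]    # so far I just keep them in a plain Python list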

Thank you!


1 Answer

1 vote

It's possible, but you have to make sure that the sequences that are batched together have the same length; that's why most people simply pad all their sequences. If you use masking, the masked values are ignored anyway, so there is no difference.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.utils import to_categorical

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(None, 5)))  # None = any sequence length
model.add(LSTM(8, return_sequences=True))
model.add(TimeDistributed(Dense(2, activation='softmax')))

model.summary(line_length=90)

model.compile(loss='categorical_crossentropy',
              optimizer='adam')

def train_generator():
    while True:
        # every sequence in this batch shares one (random) length
        sequence_length = np.random.randint(10, 100)
        x_train = np.random.random((1000, sequence_length, 5))
        # y_train will depend on past 5 timesteps of x
        y_train = x_train[:, :, 0].copy()  # copy: a view would let the += below corrupt x_train
        for i in range(1, 5):
            y_train[:, i:] += x_train[:, :-i, i]
        y_train = to_categorical(y_train > 2.5, num_classes=2)  # 2 classes: sum above/below 2.5
        yield x_train, y_train

model.fit_generator(train_generator(), steps_per_epoch=30, epochs=10, verbose=1)
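
If you'd rather pad everything up front instead of batching by length, a Masking layer hides the padded steps from the RNN. A sketch (the toy data, the pad_sequences settings, and the 2-class head are just illustrative):

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# hypothetical ragged data: a list of (time, features) arrays
sequences = [np.random.random((np.random.randint(10, 100), 5))
             for _ in range(1000)]

# zero-pad along the time axis so everything shares one shape
x_train = pad_sequences(sequences, padding='post', dtype='float32')  # (1000, max_len, 5)

model = Sequential()
# padded steps (all-zero frames) are masked out downstream
model.add(Masking(mask_value=0.0, input_shape=(None, 5)))
model.add(LSTM(32))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Only mask-supporting layers (such as the recurrent ones) actually skip the masked steps, and a genuine all-zero frame would be masked too, so choose the mask_value accordingly.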