I want to feed spectrograms --corresponding to uttered digits-- to a model whose first layer is a Conv1D. I then use an RNN layer to classify which digit is uttered. These spectrograms have different sequence/time lengths, but the same number of features, of course.
In Keras's Conv1D doc:
When using this layer as the first layer in a model, provide an input_shape argument [..] (None, 128) for variable-length sequences with 128 features per step.
So it seems like the layer handles variable lengths; no padding/resizing should be needed.
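To show what I mean, here is a minimal sketch of the model I have in mind (layer sizes and names are placeholders, not my actual code):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, GRU, Dense

    n_features = 128  # frequency bins per time step (placeholder)
    n_classes = 10    # digits 0-9

    model = Sequential([
        # None in input_shape = variable number of time steps
        Conv1D(32, kernel_size=3, activation='relu',
               input_shape=(None, n_features)),
        GRU(64),
        Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

This builds and compiles fine, so the model itself accepts variable-length input.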
I'm used to preparing same-shape data with numpy (e.g. numpy.vstack), but now that my samples have different shapes I can't figure out how to do it! And I have only found examples where people had same-shaped data.
Or maybe it is not possible with numpy, and I have to use something else?
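To make the problem concrete, here is a hypothetical sketch of my data (made-up lengths) and the only workaround I can think of, using the `model` from the sketch above:

    import numpy as np

    # Each utterance has a different number of time steps, but always 128 features
    spectrograms = [np.random.rand(length, 128) for length in (73, 91, 60)]
    labels = np.array([3, 1, 4])

    # This is what I would do with same-shape data, but here it raises a
    # ValueError because the time dimensions differ:
    # X = np.stack(spectrograms)

    # The only thing I can think of is feeding one sample at a time (batch of 1):
    for x, y in zip(spectrograms, labels):
        model.train_on_batch(np.expand_dims(x, axis=0),  # shape (1, time, 128)
                             np.array([y]))

Training with a batch size of 1 like this feels wasteful, so I suspect there is a better way to prepare the data.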
Thank you!