As (I think) I understand it, in Keras, LSTM layers expect input data to have 3 dimensions: (batch_size, timesteps, input_dim).
However, I'm really struggling to understand what these values actually correspond to when it comes to my data. I'm hoping that if someone can explain how I might go about inputting the following mock data (with a similar structure to my actual dataset) to an LSTM layer, I might then understand how I can achieve this with my real dataset.
So the example data is sequences of categorical data encoded as one-hot vectors. For example, the first 3 samples look like this:
[ [0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0] ]
[ [0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0] ]
[ [0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1] ]
i.e. the sequences are of length 5, with 4 categorical options for each position within the sequence. Let's also say I have 3000 sequences, and that it's a binary classification problem.
So I believe this would make the shape of my dataset (3000, 5, 4)?
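A quick sanity check with placeholder data (random one-hot vectors, not my real dataset) seems to confirm that stacking everything gives a 3-D array of that shape:

```python
import numpy as np

num_samples, timesteps, num_categories = 3000, 5, 4

# Placeholder data: pick a random category index at each timestep, then
# one-hot encode it by indexing rows of the identity matrix.
labels = np.random.randint(num_categories, size=(num_samples, timesteps))
x = np.eye(num_categories)[labels]

print(x.shape)  # (3000, 5, 4) -> (batch_size, timesteps, input_dim)
```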
The model I want to use looks like this:
model = keras.Sequential([
    keras.layers.LSTM(units=3, batch_input_shape=(???)),
    keras.layers.Dense(128, activation='tanh'),
    keras.layers.Dense(64, activation='tanh'),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20)
This ignores any training/testing split for now, so just assume I'm training with the entire dataset. The part I'm struggling with is input_shape.
I want each element within the sequence to be a timestep. I've tried lots of different shapes and got lots of different errors. I'm guessing I actually need to reshape x_train instead of just adjusting input_shape. The problem is I have no idea what shape it actually needs to be.
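The easiest check I can think of is converting the nested lists to a NumPy array first, since plain Python lists have no .shape attribute (a sketch with placeholder lists standing in for the mock data):

```python
import numpy as np

# Placeholder nested lists standing in for the mock dataset: 3 samples,
# each a length-5 sequence of 4-element one-hot vectors.
x_train = [[[0, 0, 0, 1]] * 5] * 3

x_arr = np.asarray(x_train)
print(x_arr.ndim, x_arr.shape)  # 3 (3, 5, 4) -> already a 3-D layout
```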
I think I understand the theory behind LSTM, it's just the practicalities of the dimensionality requirements that I'm struggling to get my head around.
Any help or advice would be massively appreciated. Thank you.
EDIT - As suggested by @scign, here is an example of an error I'm getting, using the following code with the mock dataset:
x_train = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]], [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]], [[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
y_train = [1, 0, 1]
model = keras.Sequential([
    keras.layers.LSTM(units=3, batch_input_shape=(1, 5, 4)),
    keras.layers.Dense(128, activation='tanh'),
    keras.layers.Dense(64, activation='tanh'),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20)
Error - ValueError: Error when checking input: expected lstm_input to have 3 dimensions, but got array with shape (5, 4)
Comment: batch_input_shape should be given as (batch_size, timesteps, data_dim). See keras.io/getting-started/sequential-model-guide for some examples. – alec_djinn
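Following that (batch_size, timesteps, data_dim) hint, my understanding is that the mock data needs to be stacked into a single NumPy array rather than left as a tuple of three 2-D lists, and the batch size left out of the shape argument. An untested sketch of what I mean:

```python
import numpy as np

# Stack the three mock samples into one 3-D array; the "(5, 4)" in the
# error suggests Keras was seeing each sample on its own, without the
# batch dimension.
x_train = np.array([
    [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]],
    [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]],
], dtype="float32")
y_train = np.array([1, 0, 1], dtype="float32")

print(x_train.shape)  # (3, 5, 4) -> (batch_size, timesteps, input_dim)

# The LSTM layer would then take input_shape=(5, 4), i.e. (timesteps,
# input_dim), leaving the batch size out so any batch size is accepted:
# keras.layers.LSTM(units=3, input_shape=(5, 4))
```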