TensorFlow: How to reshape image to fit LSTM layer?

Question

I need some help converting my CNN model into an RNN model. First, I load my labels:

import pickle
import tensorflow as tf

labels = pickle.load(open("./labels.p", "rb"))
print("1")
print(labels)
print("2")
print(type(labels))
print(labels.shape)

Below is the output:

1
[[ 902.69956724  512.52732211   12.54330104]
 [1145.09702932  401.66131612   10.68199206]
 [ 461.22967364   56.20169521    8.11038109]
 [1280.78974907  665.87978551    7.54409773]
 [ 884.68210632  480.90786089   14.95856894]
 [1210.83786034  328.45081132   13.05409176]
 [1438.93924358  127.49080945   18.27168386]
 [1281.81584308   95.668115      7.62392635]
 [ 337.53575975  803.34024046   10.60111608]
 [1481.01890124  439.19175118   13.1070263 ]
 [ 215.65222765  687.71369479   18.74468127]
 [ 269.08710447  472.80278553   13.80920512]
 [ 948.32277166  348.69578896   10.7822973 ]
 [1474.65892421  163.96707338   18.45881795]
 [ 292.23754158  149.22307508   13.4992139 ]
 [ 136.9471735   278.05769081    9.12228086]
 [ 985.74851539  588.16808794   15.61136775]
 [ 820.15600586  413.44663686    7.33000344]
 [ 600.40973646  224.73483946   15.34583232]
 [ 989.19660841  561.69562362   13.54216196]
 [1000.15931758  663.87920314   11.15197741]
 [ 298.12969626  167.39119793    5.35186742]
 [ 638.61253698  295.83490355    9.3218228 ]
 [1223.90900603  888.44809641    6.684419  ]
 [1201.89311595  749.11837266   14.4013575 ]
 [ 937.86739849  652.09623989   16.44335687]
 [ 162.47729223  463.31552105   11.75272485]
 [1156.98025949  615.87893056   14.29241447]
 [1009.83765282  165.71673262   17.80982335]
 [ 705.35752704  819.24557476   14.42351445]
 [1037.50829126  159.56129246   13.29909752]
 [1028.36148773  260.52347256    6.64187257]
 [ 597.10138934  835.7720793    17.13412845]
 [ 768.91368905  836.91912098   11.3426373 ]
 [ 460.83668559  769.8292998     6.34995396]
 [ 994.04975288  253.57209883   19.5308339 ]
 [ 895.41331805  280.30494414   17.13954225]
 [ 511.1852535   139.21590627    7.56179778]
 [ 435.40744864  952.78539745   18.65784566]
 [1271.17155467  463.45098885   19.6584129 ]
 [ 149.09007975  397.47032936   15.11780791]
 [ 997.12240755  637.36302863    6.29461804]
 [ 332.88548391  658.82651389    6.12252151]
 [1042.11968461  375.38079434   12.28855765]
 [ 705.70382871  166.88958859   13.83288034]
 [1445.74603852  814.76523232   10.99454478]
 [ 257.574952    166.86709416   17.1052005 ]
 [1201.92362302  665.70493243   12.25584347]
 [1390.86751896  427.08727019   17.86179994]
 [1134.12356525  776.99614606   15.24974708]
 [1239.4344012   749.66481108   15.93442116]
 [1312.74137859  972.81737253   15.77331154]
 [ 996.00292311  432.82690562   16.54539616]
 [ 485.73734914  748.98481509   16.47033807]
 [ 225.05390585  801.77953762    6.25199535]
 [ 719.07339038  558.49059786    6.43030475]
 [ 288.87950534  294.97441026   13.02183236]
 [ 833.35657913  520.77988763   18.52122489]
 [  80.57338018  827.11187278    8.70100782]
 [  98.64616045  795.62446572   16.84380171]
 [1026.83986177  294.71529913    6.88628891]
 [1422.67546814  668.23639302    9.48665262]
 [1081.78577113  306.63881707   15.74209534]
 [ 327.69665086  350.56995892   11.99900411]
 [ 393.97635096  542.1259421    13.33891976]
 [ 369.07280668  710.05765754    6.47136363]
 [ 211.04899084  361.80913397   12.22177137]
 [1452.62867746  540.28274757   14.40846748]
 [1024.82270684  949.89106339   19.58472306]
 [ 855.66223478  352.35966078    5.60886187]
 [1233.07514824  690.26435986    5.80422432]
 [1481.676745    144.53939859   17.86730875]
 [ 116.90898102  200.03546528   12.06906204]
 [ 344.96838994  647.59487088   17.90802996]
 [ 198.49601919  561.5796024    11.62667088]
 [1473.50407692  823.61023589   13.1372917 ]
 [1453.83133544  892.69288604   19.5555176 ]
 [1223.66982193  700.47249608    6.99368812]
 [ 176.35675008  127.33238222    8.39737645]
 [ 173.14195439  106.58526168   10.20118347]
 [1270.79511303  389.32797878   19.63275348]
 [1307.776142    973.10077476   19.95083908]
 [ 470.56947482  850.03911329    9.50662325]
 [ 536.50755789  814.16742226   18.83865899]
 [ 340.29335666  363.27534976   13.08864079]
 [1036.40320277  490.11110882   13.24101537]
 [ 942.87072899  654.04747436    9.09688255]
 [ 743.42528406  191.94994057   11.63154405]
 [ 656.05725683  252.73054433    6.99323612]
 [ 828.21065045  786.47618832   16.03489754]
 [ 444.30867675  134.33513505   19.47964341]
 [ 634.46988235  382.26825509   15.48071222]
 [ 651.90231919  349.13707809   11.68690785]
 [ 798.32702908  764.88900343   10.66210542]
 [1217.4519886   721.94313243   14.85203151]
 [  58.28239437  700.73885755    7.68089927]
 [ 578.67205191  778.34479309   16.07327847]
 [ 276.52791372  605.71030874   17.40231962]
 [1484.96952853  487.82282634   19.12038912]
 [1467.77484467  241.84709196    9.10076222]]
2
<class 'numpy.ndarray'>
(100, 3)

Then, I load the image tensor:

fetched_image_list = pickle.load(open("./image_list/STFT_image_list.p", "rb"))
fetched_image_list = tf.convert_to_tensor(fetched_image_list)
print("3")
print(fetched_image_list.shape)
print(type(fetched_image_list))

Below is the output for the image tensor:

3
(100, 128, 128, 3)
<class 'tensorflow.python.framework.ops.EagerTensor'>

Next, I assembled the dataset as below:

dataset = tf.data.Dataset.from_tensor_slices((fetched_image_list, labels))
dataset = dataset.batch(32)
print("4")
print(dataset)

Below is the output:

4
<BatchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float64, name=None), TensorSpec(shape=(None, 3), dtype=tf.float64, name=None))>

My CNN structure is:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), strides=(2, 2),  dilation_rate=(1,1), input_shape=(128, 128, 3), activation='relu'),
    tf.keras.layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same", dilation_rate=(1,1), activation='relu'),
    tf.keras.layers.Conv2D(128, (3, 3), strides=(2, 2),  dilation_rate=(1,1),activation='relu'),
    tf.keras.layers.Conv2D(128, (3, 3), strides=(2, 2),  dilation_rate=(1,1),activation='relu'),
    tf.keras.layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same", dilation_rate=(1, 1), activation='relu'),
    tf.keras.layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same", dilation_rate=(1, 1), activation='relu'),
    tf.keras.layers.Dropout(0.30),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3)
])

For the CNN, I know the input shape is simply (height, width, channels), with the batch dimension left out. In this case, the input_shape is (128, 128, 3).

However, when building an LSTM, the configuration gets more complicated.

This is the ConvLSTM2D model I built:

model = tf.keras.Sequential([
    tf.keras.layers.ConvLSTM2D(filters= 32, kernel_size=3, input_shape=(128, 128, 3), return_sequences=True),
    tf.keras.layers.Dropout(0.30),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(label_size)
])

When I run the code, I get this error:

ValueError: Input 0 of layer "conv_lstm2d" is incompatible with the layer: expected ndim=5, found ndim=3. Full shape received: (128, 128, 3)
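
As far as I can tell, the message means that ConvLSTM2D wants 5 dimensions including the batch, i.e. (samples, time, height, width, channels), so input_shape would need 4 entries rather than 3. Below is a minimal sketch of that reading, not a confirmed fix. It assumes each image is treated as a sequence of length 1, and it uses random arrays as hypothetical stand-ins for my pickled data. I also set return_sequences=False here so that only the last time step is passed on to Flatten.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the pickled images and labels (random values,
# same per-sample shapes as in the question, but only 8 samples).
images = np.random.rand(8, 128, 128, 3).astype("float32")
labels = np.random.rand(8, 3).astype("float32")

# Assumption: each image is a sequence of length 1, so a time axis is
# inserted after the batch axis: (samples, time, height, width, channels).
images_5d = np.expand_dims(images, axis=1)

model = tf.keras.Sequential([
    # The input shape excludes the batch axis, so it has 4 entries here.
    tf.keras.Input(shape=(1, 128, 128, 3)),
    # return_sequences=False keeps only the output of the last time step.
    tf.keras.layers.ConvLSTM2D(filters=32, kernel_size=3, return_sequences=False),
    tf.keras.layers.Dropout(0.30),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3),
])

# Forward pass only, to check that the shapes line up.
predictions = model(images_5d, training=False)
print(images_5d.shape)
print(predictions.shape)
```

With the batched dataset above, the same time axis could presumably be added with dataset.map(lambda x, y: (tf.expand_dims(x, axis=1), y)). I am not sure whether a sequence length of 1 makes any sense for an LSTM, which is part of what I am asking.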

So, my questions are:

How can I reshape these images to fit the LSTM layer?
How can I create an LSTM layer to implement the image classification task?
Is there anything else I should know to get this working?

Many thanks.