I have training data covering 250 days, with 72 features and one target column. I want to predict the next 30 days for each of 21351 rows with 72 features. How should I reshape my data, both input and output? I am a little confused, and the library gives me an error about shape incompatibility.
I was reshaping as:
trainX.reshape(1, len(trainX), trainX.shape[1])
trainY.reshape(1, len(trainX))
But this gives me the error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 250 target samples.
Same error with:
trainX.reshape(1, len(trainX), trainX.shape[1])
trainY.reshape(len(trainX), )
and same error with:
trainX.reshape(1, len(trainX), trainX.shape[1])
trainY.reshape(len(trainX), 1)
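The shapes these calls produce can be checked with plain NumPy. This is a minimal sketch, assuming zero-filled placeholder arrays of the sizes described above (250 rows, 72 features) rather than the real data. It only shows which leading dimension each reshape yields, and that `reshape` returns a new array instead of modifying the original in place.

```python
import numpy as np

# Placeholder arrays with the sizes described above (assumption: zeros, not real data).
trainX = np.zeros((250, 72))
trainY = np.zeros((250, 1))

# reshape returns a new array; the original keeps its shape unless it is reassigned.
x_one_sample = trainX.reshape(1, len(trainX), trainX.shape[1])
y_flat = trainY.reshape(len(trainX), )

print(trainX.shape)        # original array, unchanged by the call above
print(x_one_sample.shape)  # leading (samples) axis has length 1
print(y_flat.shape)        # leading (samples) axis has length 250

# The "same number of samples" check compares these leading axes.
samples_match = x_one_sample.shape[0] == y_flat.shape[0]
print(samples_match)
```

With these placeholder sizes the leading axes are 1 and 250, which matches the sample counts reported in the error message.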
Currently, trainX is reshaped as:
trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
array([[[ 4.49027601e+00, -3.71848297e-01, -3.71848297e-01, ...,
1.06175239e+17, 1.24734085e+06, 5.16668131e+00]],
[[ 2.05921386e+00, -3.71848297e-01, -3.71848297e-01, ...,
8.44426594e+17, 1.39098642e+06, 4.01803817e+00]],
[[ 9.25515792e+00, -3.71848297e-01, -3.71848297e-01, ...,
4.08800518e+17, 1.24441013e+06, 3.69129399e+00]],
...,
[[ 3.80037999e+00, -3.71848297e-01, -3.71848297e-01, ...,
1.35414902e+18, 1.23823291e+06, 3.54601899e+00]],
[[ 3.73994822e+00, -3.71848297e-01, 8.40698741e+00, ...,
3.93863169e+17, 1.25693299e+06, 3.29993440e+00]],
[[ 3.56843035e+00, -3.71848297e-01, 1.53710656e+00, ...,
3.28306336e+17, 1.22667253e+06, 3.36569960e+00]]])
trainY reshaped as:
trainY.reshape(trainY.shape[0], )
array([[-0.7238661 ],
[-0.43128777],
[-0.31542821],
[-0.35185375],
...,
[-0.28319519],
[-0.28740503],
[-0.24209411],
[-0.3202021 ]])
and testX reshaped as:
testX.reshape(1, testX.shape[0], testX.shape[1])
array([[[ -3.71848297e-01, -3.71848297e-01, -3.71848297e-01, ...,
-3.71848297e-01, 2.73982042e+06, -3.71848297e-01],
[ -3.71848297e-01, -3.71848297e-01, -3.71848297e-01, ...,
-3.71848297e-01, 2.73982042e+06, -3.71848297e-01],
[ -3.71848297e-01, -3.71848297e-01, -3.71848297e-01, ...,
2.00988794e+18, 1.05992636e+06, 2.49920150e+01],
...,
[ -3.71848297e-01, -3.71848297e-01, -3.71848297e-01, ...,
-3.71848297e-01, -3.71848297e-01, -3.71848297e-01],
[ -3.71848297e-01, -3.71848297e-01, -3.71848297e-01, ...,
-3.71848297e-01, -3.71848297e-01, -3.71848297e-01],
[ -3.71848297e-01, -3.71848297e-01, -3.71848297e-01, ...,
-3.71848297e-01, -3.71848297e-01, -3.71848297e-01]]])
and the error is:
ValueError: Error when checking : expected lstm_25_input to have shape (None, 1, 72) but got array with shape (1, 2895067, 72)
EDIT 1:
Here is the code of my model:
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense

# Reshape the inputs to 3D for the LSTM
trainX = trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
trainY = trainY.reshape(trainY.shape[0], )
testX = testX.reshape(1, testX.shape[0], testX.shape[1])

# Two stacked LSTM layers followed by a single linear output
model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape=(trainX.shape[0], trainX.shape[2])))
model.add(LSTM(100))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(trainX, trainY, epochs=500, shuffle=False, verbose=1)

# Save the trained model, reload it, and predict on the test set
model.save('model_lstm.h5')
model = load_model('model_lstm.h5')
prediction = model.predict(testX, verbose=0)
ValueError Traceback (most recent call last)
in ()
     43 model.compile(loss='mse', optimizer='adam')
     44
---> 45 model.fit(exog, endog, epochs=50, shuffle=False, verbose=1)
     46
     47 start_date = endog_end + timedelta(days = 1)

D:\AnacondaIDE\lib\site-packages\keras\models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    865                               class_weight=class_weight,
    866                               sample_weight=sample_weight,
--> 867                               initial_epoch=initial_epoch)
    868
    869     def evaluate(self, x, y, batch_size=32, verbose=1,

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1520             class_weight=class_weight,
   1521             check_batch_axis=False,
-> 1522             batch_size=batch_size)
   1523         # Prepare validation data.
   1524         do_validation = False

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
   1376                                     self._feed_input_shapes,
   1377                                     check_batch_axis=False,
-> 1378                                     exception_prefix='input')
   1379         y = _standardize_input_data(y, self._feed_output_names,
   1380                                     output_shapes,

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    142                             ' to have shape ' + str(shapes[i]) +
    143                             ' but got array with shape ' +
--> 144                             str(array.shape))
    145     return arrays
    146

ValueError: Error when checking input: expected lstm_31_input to have shape (None, 250, 72) but got array with shape (21351, 1, 72)
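The `(None, 250, 72)` in this message appears to come from the first entry of `input_shape`. Keras treats `input_shape` as the shape of a single sample, without the leading samples axis. The sketch below is a hedged illustration using a zero-filled placeholder array with the sizes from the question (250 rows, 72 features) and the `(samples, 1, features)` reshape shown above. It compares the tuple built from `shape[0]` with the one built from `shape[1]`, using NumPy only.

```python
import numpy as np

# Placeholder data (assumption): 250 rows with 72 features, zeros instead of real values.
trainX = np.zeros((250, 72)).reshape(250, 1, 72)

samples, timesteps, features = trainX.shape
print(samples, timesteps, features)

# input_shape describes one sample, so it leaves out the leading samples axis.
shape_from_axis0 = (trainX.shape[0], trainX.shape[2])  # row count used as timesteps
shape_per_sample = (trainX.shape[1], trainX.shape[2])  # timesteps and features only

print(shape_from_axis0)
print(shape_per_sample)
```

Under these assumptions, the first tuple would make the layer expect 250 timesteps per sample, while the array itself carries one timestep per sample.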
EDIT 2:
After trying the updated solution from @Paddy, I got this error on calling predict():
ValueError Traceback (most recent call last)
in ()
      1 model = load_model('model_lstm.h5')
      2
----> 3 prediction = model.predict(exog_test, verbose=0)
      4 # for x in range(0, len(exog_test)):

D:\AnacondaIDE\lib\site-packages\keras\models.py in predict(self, x, batch_size, verbose)
    911         if not self.built:
    912             self.build()
--> 913         return self.model.predict(x, batch_size=batch_size, verbose=verbose)
    914
    915     def predict_on_batch(self, x):

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in predict(self, x, batch_size, verbose, steps)
   1693         x = _standardize_input_data(x, self._feed_input_names,
   1694                                     self._feed_input_shapes,
-> 1695                                     check_batch_axis=False)
   1696         if self.stateful:
   1697             if x[0].shape[0] > batch_size and x[0].shape[0] % batch_size != 0:

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    130                             ' to have ' + str(len(shapes[i])) +
    131                             ' dimensions, but got array with shape ' +
--> 132                             str(array.shape))
    133         for j, (dim, ref_dim) in enumerate(zip(array.shape, shapes[i])):
    134             if not j and not check_batch_axis:

ValueError: Error when checking : expected lstm_64_input to have 3 dimensions, but got array with shape (2895067, 72)
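This last message says the array passed to `predict()` is still 2D. Below is a minimal sketch of adding a length-1 timesteps axis to such an array, so that it matches the `(samples, 1, features)` layout used for training. It assumes a zero-filled placeholder with 1000 rows standing in for the real 2895067, and that `exog_test` is a 2D NumPy array. Whether one timestep per row is the right framing for the forecast itself is a separate question that this does not settle.

```python
import numpy as np

# Placeholder test data (assumption): 1000 rows stand in for the real 2895067.
exog_test = np.zeros((1000, 72))

# Insert a timesteps axis of length 1 between the samples and features axes.
exog_test_3d = np.expand_dims(exog_test, axis=1)

# Equivalent reshape, written the same way as the training reshape.
exog_test_reshaped = exog_test.reshape(exog_test.shape[0], 1, exog_test.shape[1])

print(exog_test.ndim, exog_test.shape)
print(exog_test_3d.ndim, exog_test_3d.shape)
print(exog_test_reshaped.shape)
```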