
I was reading the tutorial on Multivariate Time Series Forecasting with LSTMs in Keras https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/#comment-442845

I followed the entire tutorial and got stuck on the following problem.

In this tutorial, the train and test splits have 8 input features ('pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain') at step 't-1', while the output feature is 'pollution' at the current step 't'. This is because the dataset is framed as a supervised learning problem: predict the 'pollution' at the current hour/time step 't', given the pollution and weather measurements at the prior hour/time step 't-1'.

After fitting the model on the training and test splits, what if I want to make predictions for a new dataset that has only 7 features? It has no 'pollution' feature, and I want to predict exactly that one feature from the other 7.

Thanks for your help!

How do I handle such a situation, given that the remaining 7 features stay the same?

Edit: Assume that my dataset has the following 3 features while training/fitting the model: shop_number, item_number, number_of_units_sold.

After I have trained the LSTM model, I get a dataset with the features 'shop_number' and 'item_number'. This dataset does NOT have 'number_of_units_sold'.

During training, the 'input_shape' argument of 'LSTM' has 1 time step and 3 features. When predicting, I still have 1 time step but ONLY 2 features (since 'number_of_units_sold' is what I have to predict).

So how should I proceed?

I really like that blog. I don't think it will work for that. LSTM has the input shape = (1, 8) there. - cemsazara
Then how do we make predictions for new data that is not part of the dataset? Or are we limited to making predictions only for the dataset we have, and not for any new, foreign data? - Arun

1 Answer


If pollution is the last feature:

X = original_data[:,:,:-1]
Y = original_data[:,:,-1:]

If pollution is the first feature:

X = original_data[:,:,1:]
Y = original_data[:,:,:1]

Otherwise:

import numpy as np

i = index_of_pollution_feature
# join the features before and after index i along the feature axis
X = np.concatenate([original_data[:,:,:i], original_data[:,:,i+1:]], axis=-1)
Y = original_data[:,:,i:i+1]
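
The three cases above can be folded into one helper. This is only a sketch: the name `split_target` and the toy array are made up for illustration, and it uses `np.delete` in place of the explicit `np.concatenate`, which gives the same result. It assumes the data is shaped (samples, steps, features).

```python
import numpy as np

def split_target(data, target_index):
    """Split a (samples, steps, features) array into inputs X and target Y.

    Hypothetical helper, not part of the tutorial.
    """
    # Normalise negative indices so the slice below stays valid.
    target_index = target_index % data.shape[-1]
    # X keeps every feature except the target one.
    X = np.delete(data, target_index, axis=-1)
    # Y keeps only the target feature, with the feature axis preserved.
    Y = data[:, :, target_index:target_index + 1]
    return X, Y

# Toy data: 2 samples, 3 steps, 8 features (pollution assumed at index 0).
data = np.arange(2 * 3 * 8, dtype=float).reshape(2, 3, 8)
X, Y = split_target(data, target_index=0)
print(X.shape, Y.shape)  # (2, 3, 7) (2, 3, 1)
```

Slicing with `target_index:target_index + 1` rather than a bare index keeps Y three-dimensional, so it matches the output of a model that returns one value per step.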

Make a model with return_sequences=True, stateful=False and that's it. Don't use Flatten, global pooling layers, or anything else that removes the steps dimension.
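
A minimal sketch of such a model, assuming the `tf.keras` API. The layer size, loss, optimizer and random data are placeholders I picked for illustration; they do not come from the tutorial.

```python
import numpy as np
from tensorflow import keras

n_features = 7  # assumed: the 7 remaining features, pollution excluded

model = keras.Sequential([
    # None leaves the number of time steps flexible.
    keras.Input(shape=(None, n_features)),
    # return_sequences=True keeps the steps dimension in the output.
    keras.layers.LSTM(16, return_sequences=True, stateful=False),
    # Dense acts on the last axis, giving one predicted value per step.
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")

# Random stand-ins for X and Y: (samples, steps, features) and (samples, steps, 1).
X = np.random.rand(4, 5, n_features).astype("float32")
Y = np.random.rand(4, 5, 1).astype("float32")

model.fit(X, Y, epochs=1, verbose=0)
pred = model.predict(X, verbose=0)
print(pred.shape)  # (4, 5, 1)
```

Because the model is trained on 7 input features only, new data that lacks the pollution column can be passed to `predict` as long as it has the same 7 features in the same order.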


If you don't have any pollution data at all for training, then you can't train a model to predict it.