Train and test split for multivariate and multi-step?

Question

Following this tutorial, which covers only multivariate, one-step forecasting, I have been trying to write code for multivariate, multi-step forecasting. Since the code is too long to post, I'm attaching it here (you'll find the dataset in the same repository).

The aim of the code is to predict the pollution values for the next 6 hours.

After preprocessing the data and normalizing it, I have split the data and reshaped it as follows:

# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24 * 1 # dataset spans 5 years; train on the first year
train = values[:n_train_hours, :]
test = values[n_train_hours:n_train_hours+50, :]

# split into input and outputs
n_obs = n_hours * n_features
train_X, train_y = train[:, :n_obs], train[:, :n_out]  # the problem is here 
test_X, test_y = test[:, :n_obs], test[:,:n_out] # and here

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
train_y = train_y
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

I have a problem with train_X, train_y, test_X, and test_y. I'm not sure whether to slice with n_obs and n_out, with n_obs and n_out*n_features, or with some other alternative. In both cases, after using inverse_transform to get inv_y and inv_yhat, the predicted values are far from the real ones in the dataset:

print('predicted: ',inv_yhat)
print('real:' ,inv_y)

predicted:  [  3.04286     7.406884    6.121824  ... -10.307352   -7.0151763
  -3.4667058]
real: [36.       30.999998 19.999998 ... 24.999998 48.       48.999996]

Please let me know if you need more details.

FisheyJay · Accepted Answer · 2020-06-22T20:39:48

Using your exact code on my MacBook Pro in PyCharm with Python 3.7.3, I get the following results with the same dataset:

Using TensorFlow backend.
                     pollution  dew  temp   press wnd_dir  wnd_spd  snow  rain
date                                                                          
2010-01-02 00:00:00      129.0  -16  -4.0  1020.0      SE     1.79     0     0
2010-01-02 01:00:00      148.0  -15  -4.0  1020.0      SE     2.68     0     0
2010-01-02 02:00:00      159.0  -11  -5.0  1021.0      SE     3.57     0     0
(8760, 30, 8) (8760, 6) (35005, 30, 8) (35005, 6)
2020-06-22 16:32:21.057171: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-22 16:32:21.072767: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd8396b94b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-22 16:32:21.072780: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch 1/3
122/122 - 3s - loss: 0.2229 - val_loss: 0.1685
Epoch 2/3
122/122 - 2s - loss: 0.1521 - val_loss: 0.1490
Epoch 3/3
122/122 - 2s - loss: 0.1433 - val_loss: 0.1415
(35005, 6)
(35005, 6)
Test RMSE: 191.383
predicted:  [57.93556  30.241518 41.339252 ... 39.774303 45.738094 49.2507  ]
real: [36.       30.999998 19.999998 ... 24.999998 48.       48.999996]

The predicted values I get look a little more realistic. Gee, I hope this isn't a processor-architecture issue.
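One thing worth checking independently of the hardware: in the question, train[:, :n_out] takes the first n_out columns of the frame, and those columns are also part of train[:, :n_obs], so the targets are lagged inputs rather than future pollution values. Below is a minimal sketch of one way to pick the targets instead. It assumes that reframed is laid out as lagged steps followed by forecast steps, with all features repeated at every step and pollution as the first feature of each step. The sizes and the stand-in array are hypothetical and only chosen to match the shapes printed above.

```python
import numpy as np

# Hypothetical sizes matching the printed shapes: 30 lag hours, 8 features, 6 forecast steps.
n_hours, n_features, n_out = 30, 8, 6
n_obs = n_hours * n_features

# Stand-in for reframed.values, ASSUMING the column layout
# [t-30 ... t-1 | t, t+1, ..., t+5], each step holding all 8 features,
# with pollution as the first feature of every step.
n_rows = 100
n_cols = (n_hours + n_out) * n_features
values = np.arange(n_rows * n_cols, dtype="float32").reshape(n_rows, n_cols)

# Inputs: every feature of the lagged steps.
X = values[:, :n_obs]

# Targets: the pollution column (offset 0) of each forecast step.
target_cols = [n_obs + step * n_features for step in range(n_out)]
y = values[:, target_cols]

# Reshape inputs to 3D [samples, timesteps, features].
X = X.reshape((X.shape[0], n_hours, n_features))
print(X.shape, y.shape)  # (100, 30, 8) (100, 6)
```

If your frame keeps only the pollution column for the forecast steps, the targets would simply be the last n_out columns; print reframed.columns to confirm which case applies.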

Adding some additional info about the MinMaxScaler feature_range parameter:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
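As a rough illustration of why the scaler matters when inverting predictions: a MinMaxScaler fitted on all features expects the same number of columns in inverse_transform, so a (samples, n_out) block of pollution predictions cannot be passed to it directly. The sketch below undoes the scaling for the pollution column by hand using the fitted scale_ and min_ attributes. The random data, the column position of pollution (column 0), and the fake predictions are all assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical raw data: 50 rows, 8 features, pollution assumed to be column 0.
raw = rng.uniform(0.0, 500.0, size=(50, 8))

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)

# Fake "predictions": the scaled pollution column repeated for 6 forecast steps.
n_out = 6
yhat = np.tile(scaled[:, [0]], (1, n_out))

# MinMaxScaler computes scaled = raw * scale_ + min_ per column,
# so the pollution column is inverted with its own scale_ and min_.
inv_yhat = (yhat - scaler.min_[0]) / scaler.scale_[0]

print(np.allclose(inv_yhat[:, 0], raw[:, 0]))
```

The same inversion has to be applied to both the predictions and the true targets, with the statistics of the pollution column, before computing RMSE in original units.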

You can write the results out to a file like this:

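import pandas as pd

# inv_yhat is a 1-D NumPy array here, so the default integer index is kept;
# set results.index yourself if you have the matching timestamps.
results = pd.DataFrame(inv_yhat, columns=["prediction"])
results.to_csv("prediction_results.csv")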
