
I'm having trouble getting my data into the correct format for an RNN with Keras. I have a CSV file with 22 columns and 1344 rows. The data is continuous variables recorded at 30-minute intervals over a number of weeks.

I understand that Keras requires input in the format (num_samples, timesteps, n_features), so for my data I saw this as (1344, 48, 22) (as there are 48 readings in a 24-hour period in my data).

The x data has shape (1344, 22) when imported from the CSV.

Here is my code:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(21, input_shape=(1344, 22), kernel_initializer='normal', activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(19, activation='relu'))  # hidden layer 2
model.add(Dropout(0.2))
model.add(Dense(8, activation='relu'))  # output layer
model.compile(loss='mean_squared_error', optimizer=optimiser, metrics=['accuracy', 'mse'])

This resulted in the error: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (1344, 22)

I tried to get the x data into the correct shape by adding an Embedding layer. My code now reads:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense

model = Sequential()
model.add(Embedding(input_dim=22, input_length=1344, output_dim=48))
model.add(LSTM(21, input_shape=(1344, 22), kernel_initializer='normal', activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(19, activation='relu'))  # hidden layer 2
model.add(Dropout(0.2))
model.add(Dense(8, activation='relu'))  # output layer
model.compile(loss='mean_squared_error', optimizer=optimiser, metrics=['accuracy', 'mse'])
history = model.fit(x, y, verbose=0, epochs=150, batch_size=70, validation_split=0.2)

This resulted in the error: Error when checking input: expected embedding_1_input to have shape (1344,) but got array with shape (22,).

I'm not sure I have fully understood the Embedding layer or the meaning of (num_samples, timesteps, n_features). Could someone explain the meanings of input_dim, input_length and output_dim with reference to my data? I've read many other posts on this issue and can't seem to apply the solutions to my data.

Many thanks for your help.


1 Answer


You can directly feed the data to the LSTM without using an Embedding layer.
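An Embedding layer maps integer token ids (for example, word indices) to dense vectors, which is why it doesn't fit continuous readings like yours. To make its parameters concrete, here is a minimal sketch with made-up numbers (vocabulary size 100, sequences of 10 tokens, 8-dimensional vectors):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# input_dim    = vocabulary size (largest token id + 1)
# input_length = number of tokens per sample
# output_dim   = size of the dense vector each token id is mapped to
model = Sequential()
model.add(Embedding(input_dim=100, input_length=10, output_dim=8))

tokens = np.random.randint(0, 100, size=(3, 10))  # 3 samples of 10 token ids each
print(model.predict(tokens).shape)  # (3, 10, 8)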

1344 rows => so I assume each row of 22 columns is a reading taken at a single time point.

For the input shape, there are three parts:

(1, 48, 22) => batch size = 1, time-steps = 48, input-feature-size = 22.

Batch size is optional. 'time-steps' is how many past time points you would like to use to make the predictions. In the example below, 48 means the past 24 hours' worth of data will be used for prediction. So you have to reshape the 1344 rows of data into overlapping windows like this:

1st sample = rows 1 - 48

2nd sample = rows 2 - 49 and so on.

import numpy as np

model.add(LSTM(21, input_shape=(48, 22), kernel_initializer='normal', activation='relu', return_sequences=True))
# Other layers remain the same as in your first code snippet

print(model.predict(np.zeros((1, 48, 22))))  # feed a dummy sample to the network
# [[0. 0. 0. 0. 0. 0. 0. 0.]]

def create_dataset(dataset, look_back):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back):
        dataX.append(dataset[i:(i+look_back)]) # all 22 columns for X
        dataY.append(dataset[i + look_back, 0:8]) # first 8 columns for Y, just as an example
    return np.array(dataX), np.array(dataY)

csv_data = np.random.randn(1344,22) # simulate csv data
X, Y = create_dataset(csv_data, 48) 
print(X.shape, Y.shape) # (1296, 48, 22) (1296, 8)
model.fit(X, Y)

The create_dataset function above is adapted from this simple example of cosine wave prediction, which is easy to play around with: https://github.com/sachinruk/PyData_Keras_Talk/blob/master/cosine_LSTM.ipynb
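If you want a self-contained toy to experiment with, here is a rough sketch along those lines (the window size, layer sizes and epoch count are arbitrary choices for illustration, not taken from the notebook):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def create_dataset(dataset, look_back):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:(i + look_back)])   # window of past readings
        dataY.append(dataset[i + look_back])       # the next reading to predict
    return np.array(dataX), np.array(dataY)

series = np.cos(np.linspace(0, 20 * np.pi, 1000)).reshape(-1, 1)  # single feature
X, Y = create_dataset(series, 48)  # X: (952, 48, 1), Y: (952, 1)

model = Sequential()
model.add(LSTM(16, input_shape=(48, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, Y, epochs=5, batch_size=32, verbose=0)
print(model.predict(X[:1]))  # one-step-ahead prediction for the first window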

Regarding reshaping data: https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
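As a quick illustration of the reshape discussed there, using this question's shapes (treating the whole file as one long sequence is just one option; the sliding-window approach above is the other):

import numpy as np

x = np.random.randn(1344, 22)              # 2D data as loaded from the CSV
x_one_sequence = x.reshape((1, 1344, 22))  # (samples, timesteps, features)
print(x_one_sequence.shape)                # (1, 1344, 22)
# The sliding-window approach via create_dataset instead yields X of shape (1296, 48, 22).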