
I am new to Keras and RNNs. I need to build a classifier using an LSTM RNN in Keras for a dataset with a train set of shape (1795575, 6) and a label array of shape (1795575, 1). The labels cover 11 classes (0 to 10). The test set has shape (575643, 6), with a label array of shape (575643, 1). Again, the labels cover 11 classes (0 to 10).

How should I shape the following Keras model to fit my dataset? What values should I put in place of the ? placeholders?

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import SGD
import numpy as np
data_dim = ?
timesteps = ?
num_classes = ?
batch_size = ?
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,batch_input_shape=
(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(?, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',optimizer='sgd', 
metrics=['accuracy'])
model.fit(train_X_arr, train_y_arr,batch_size=batch_size, epochs=epochs, 
 shuffle=False,validation_data=(test_X_arr, test_y_arr))

I appreciate your help. Thanks in advance.

It depends on your problem. What do the timesteps mean in your problem? What do the sequences in your data represent? I am asking because, according to the shapes you gave, you should use another architecture (not an LSTM). - Dvir Samuel
Thanks, Samuel, for your reply. I need to implement an RNN for intrusion detection. - King Of Diamond
One more question :) You wrote "a Dataset that contain a train set of shape (1795575, 6)". Does that mean you have 1795575 examples in the train set and each example is a vector of 6 elements (scalars)? Or does it mean your training data contains an unknown number of examples, each a sequence of 1795575 timesteps, where each timestep is a vector of 6 elements? - Dvir Samuel
Thanks, Samuel. The data is organized in a CSV file as follows: the train set has 1795575 instances in rows and 6 features in columns, while the test set has 575643 instances in rows and 6 features in columns. The labels are 11 classes (0 to 10). - King Of Diamond
I think the main question @DvirSameul is trying to get at is what makes this a time series problem. 2D data suggests you wouldn't need an RNN. What are the timesteps? Is, say, every ten rows a sequence? You need to clarify how your data forms a sequence. A small sample may help to better understand this issue. - DJK

1 Answer


What you would like to do is this:

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import SGD
import numpy as np
data_dim = 1 # each timestep is a scalar, so the feature dimension is 1
timesteps = 6 # each example contains 6 timesteps
num_classes = 11 # labels run from 0 to 10, so there are 11 classes
batch_size = 1 # with stateful=True this must divide the number of examples in both the train and test data; larger batches train faster
epochs = 10 # example value, not given in the question; pick what suits your training budget
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(num_classes, activation='softmax')) # one output per class; sparse_categorical_crossentropy takes the integer labels directly
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, # pass the configured SGD object, not the string 'sgd'
              metrics=['accuracy'])
model.fit(train_X_arr, train_y_arr, batch_size=batch_size, epochs=epochs,
          shuffle=False, validation_data=(test_X_arr, test_y_arr))

and that's it.
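
As a rough check on the batch size, the sketch below lists the values that divide both example counts given in the question. It assumes a stateful model with a fixed batch_input_shape needs every batch to be full, for both training and validation data. The helper name divisors is made up for illustration.

```python
from math import gcd

n_train = 1795575  # training rows, as stated in the question
n_test = 575643    # test rows, as stated in the question


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    small = [d for d in range(1, int(n ** 0.5) + 1) if n % d == 0]
    large = [n // d for d in reversed(small) if d * d != n]
    return small + large


# Any batch size that divides both counts must divide their gcd.
common = gcd(n_train, n_test)
valid_batch_sizes = divisors(common)
print(common, valid_batch_sizes)
```

If the counts are exactly as stated, this prints 3 and [1, 3], so only very small batches satisfy the constraint. Dropping a few rows so that both counts divide a larger batch size, or using stateful=False, are possible ways around that.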

EDIT: In addition, make sure that you reshape your train data to (1795575, 6, 1): 1795575 examples, each with 6 timesteps, each timestep a scalar. The test data needs the same treatment, giving (575643, 6, 1).
You can achieve that easily with np.expand_dims(train_data, -1).
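
A minimal sketch of that reshape, using a tiny stand-in array instead of the real CSV data (the values here are made up; only the shapes matter):

```python
import numpy as np

# Small stand-in for the real (1795575, 6) training array.
train_data = np.arange(12, dtype="float32").reshape(2, 6)

# Append a trailing axis so each of the 6 timesteps holds one scalar feature.
train_X_arr = np.expand_dims(train_data, -1)

print(train_data.shape)   # (2, 6)
print(train_X_arr.shape)  # (2, 6, 1)
```

The values are unchanged; only a length-1 feature axis is added, which matches the (batch_size, timesteps, data_dim) layout the LSTM layer expects.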