
I want to create an LSTM model to classify signals.

Let's say I have 1000 files of signals. Each file contains a matrix of shape (500, 5), meaning that each file has 500 rows and 5 features (columns).

         0       1      2      3      4
0        5       5.3    2.3    4.2    2.2
...      ...     ...    ...    ...    ...
499      2500    1.2    7.4    6.7    8.6

For each file, there is one output, which is a Boolean (True or False), so its shape is (1,).

I created a dataset, data, with shape (1000, 5, 500), and the target vector has shape (1000, 1).

Then I split the data into X_train, X_test, y_train and y_test.

Is it okay to pass the matrix to the LSTM model like this? I ask because I get very poor performance. From what I have seen, people start with 1D or 2D data and then reshape it into a 3D input for the LSTM layer.

The code for the LSTM model looks like this:

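from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

input_shape = (X_train.shape[1], X_train.shape[2])  # (5, 500), i.e. (timesteps, features)
model = Sequential()
model.add(LSTM(20, return_sequences=True, input_shape=input_shape))  # first LSTM returns the full sequence
model.add(LSTM(20))  # second LSTM returns only its last output
model.add(Dense(1))  # single output value per sample
model.compile(loss='mae', optimizer='adam')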

I changed the number of cells in the LSTM layers and the number of layers, but the score is basically the same (0.19). Is it normal to get such a bad score in my case? Is there a better way to go?

Thanks


1 Answer


With data of shape (samples, 5, 500), you are giving the LSTM 5 timesteps with 500 features each. From your description, it seems you want the model to process all 500 rows, each holding 5 features, to make a prediction. The LSTM expects input of shape (samples, timesteps, features). So if your rows are timesteps at which 5 measurements are taken, you need to swap the last two dimensions so that the data has shape (samples, 500, 5), and set input_shape=(500, 5) in the first LSTM layer.
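As a minimal sketch of that axis swap, assuming the data is held in a NumPy array (the random array below is only a hypothetical stand-in for the real signals), np.transpose moves the axes without mixing up the values. Note that a plain reshape to (1000, 500, 5) would produce the right shape but scramble timesteps and features.

```python
import numpy as np

# Hypothetical stand-in for the real dataset: 1000 samples stored as (5, 500)
data = np.random.rand(1000, 5, 500)

# Swap the last two axes so each sample becomes (timesteps=500, features=5)
data_fixed = np.transpose(data, (0, 2, 1))

print(data_fixed.shape)  # (1000, 500, 5)
```

After this, feature 2 of sample 0 over time is data_fixed[0, :, 2], which holds the same values as data[0, 2, :] did before.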

Also, since your output is Boolean, training will be more stable if you use activation='sigmoid' in the final Dense layer and train with loss='binary_crossentropy', the standard loss for binary classification.
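Putting both suggestions together, the model could look roughly like the sketch below. It assumes the tensorflow.keras API; build_model is a hypothetical helper name, the layer sizes are copied from the question, the accuracy metric is an addition for monitoring, and the random batch only serves to check the output shape.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense


def build_model(timesteps=500, features=5):
    """Hypothetical helper: two stacked LSTMs with a sigmoid output."""
    model = Sequential([
        Input(shape=(timesteps, features)),  # (timesteps, features) per sample
        LSTM(20, return_sequences=True),     # pass the full sequence to the next LSTM
        LSTM(20),                            # keep only the last output
        Dense(1, activation='sigmoid'),      # probability of the True class
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model


model = build_model()

# Random batch of 4 samples, only to confirm the shapes line up
probs = model.predict(np.random.rand(4, 500, 5).astype('float32'), verbose=0)
print(probs.shape)  # (4, 1)
```

The sigmoid keeps every prediction between 0 and 1, so you can threshold at 0.5 to get True or False. Train with model.fit on the transposed X_train and y_train as 0/1 values.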