2
votes

Below is the code to predict whether the price closes up or down the next day (up = 1, down = 0).

I created a dataframe and use only PriceChange (1 if today's close is above yesterday's close) to predict whether the next day's close is up or down (1 if the next day's close is above today's close):

df['PriceChange'] = (df['Close'] > df['Close'].shift(1)).astype(int)
df['Closeupnextday'] = (df['Close'].shift(-1) > df['Close']).astype(int)

So the dataframe looks like this:

            PriceChange  Closeupnextday
    0             0               1
    1             1               1
    2             1               1
    3             1               1
    4             1               0
    5             0               0
    6             0               0
    7             0               1

It constantly gives me an accuracy of 1.000. Realistically, it should only be a little above 50%. I believe something is wrong in the code below, but I can't find it.

I should add that from epoch 20/500 onwards it constantly gives me 1.000 accuracy.

Any advice, please?

def load_data(stock, seq_len):
    amount_of_features = len(stock.columns)
    data = stock.values  # as_matrix() was removed in pandas 1.0
    sequence_length = seq_len + 1
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])

    result = np.array(result)
    row = round(0.9 * result.shape[0])
    train = result[:int(row), :]
    x_train = train[:, :-1]
    y_train = train[:, -1][:,-1]
    x_test = result[int(row):, :-1]
    y_test = result[int(row):, -1][:,-1]

    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features))  

    return [x_train, y_train, x_test, y_test]

def build_model(layers):
    model = Sequential()

    model.add(LSTM(
        input_dim=layers[0],
        output_dim=layers[1],
        return_sequences=True))
    model.add(Dropout(0.0))

    model.add(LSTM(
        layers[2],
        return_sequences=False))
    model.add(Dropout(0.0))

    model.add(Dense(
        output_dim=layers[2]))
    model.add(Activation("linear"))

    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop",metrics=['accuracy'])
    print("Compilation Time : ", time.time() - start)
    return model

def build_model2(layers):
        d = 0.2
        model = Sequential()
        model.add(LSTM(128, input_shape=(layers[1], layers[0]), return_sequences=True))
        model.add(Dropout(d))
        model.add(LSTM(64, input_shape=(layers[1], layers[0]), return_sequences=False))
        model.add(Dropout(d))
        model.add(Dense(16, activation="relu", kernel_initializer="uniform"))        
        model.add(Dense(1, activation="relu", kernel_initializer="uniform"))
        model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
        return model


window = 5
X_train, y_train, X_test, y_test = load_data(df[::-1], window)
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape) 

# model = build_model([3,lag,1])
model = build_model2([len(df.columns),window,1]) #11 = Dataframe axis 1

model.fit(
    X_train,
    y_train,
    batch_size=512,
    epochs=500,
    validation_split=0.1,
    verbose=1)


trainScore = model.evaluate(X_train, y_train, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore[0], math.sqrt(trainScore[0])))

testScore = model.evaluate(X_test, y_test, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore[0], math.sqrt(testScore[0])))


# print(X_test[-1])
diff=[]
ratio=[]
p = model.predict(X_test)
for u in range(len(y_test)):
    pr = p[u][0]
    ratio.append((y_test[u]/pr)-1)
    diff.append(abs(y_test[u]- pr))
    #print(u, y_test[u], pr, (y_test[u]/pr)-1, abs(y_test[u]- pr))


print(p)
print(y_test)
1
Check whether you have accidentally included the target values as training data. Unless you have made that kind of mistake, this is impossible, I guess. - Dimuth Tharaka Menikgama
My dataframe has no issues... I think the problem is in the code, but I can't figure it out. - J Ng
Out of curiosity, why did you opt for minimising the MSE and have your final layer be a ReLU for a classification? - jonnybazookatone
This is adapted from someone else's code, so those are the defaults, which I did not change. Any suggestion as to what would be better? - J Ng
Have you written a test set out to, say, a CSV for visual inspection? This always helps me determine whether I have a good model or something is wrong. Also, as others allude to, MSE is for regression problems; this is binary classification, so your y's should be binary and your loss should be binary_crossentropy. - DJK
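
Following up on the first comment's suggestion to check for target values among the training data, here is a minimal sketch on made-up closing prices. It assumes the two columns are defined as in the question and mimics the slicing done in load_data; the helper names make_rows and target_always_in_window are hypothetical, and the boundary days are simply dropped, unlike in the pandas version. By construction, Closeupnextday on one day compares the same two closes as PriceChange on the following day, so whether the target shows up inside the feature window depends on the row order.

```python
def make_rows(close):
    """Build (PriceChange, Closeupnextday) rows as defined in the question.

    The first and last days are dropped because they lack a neighbour.
    """
    return [
        (int(close[i] > close[i - 1]), int(close[i + 1] > close[i]))
        for i in range(1, len(close) - 1)
    ]


def target_always_in_window(rows, window):
    """Mimic load_data's slicing and report whether every target equals
    the PriceChange value of the last feature row in its window."""
    sequence_length = window + 1
    checks = []
    for index in range(len(rows) - sequence_length):
        seq = rows[index: index + sequence_length]
        x, y = seq[:-1], seq[-1][-1]  # features, then target (last column of last row)
        checks.append(y == x[-1][0])
    return all(checks)


# Made-up closing prices, for illustration only.
close = [10, 11, 12, 11, 13, 14, 13, 12, 15, 16]
rows = make_rows(close)

print(target_always_in_window(rows[::-1], window=2))  # reversed, like df[::-1]
print(target_always_in_window(rows, window=2))        # chronological order
```

On this toy series the reversed order prints True and the chronological order prints False, so the df[::-1] passed to load_data may be worth a second look.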

1 Answer

6
votes

(Since you don't clarify it, I assume here that you are talking about the test accuracy - the train accuracy can indeed be 1.0, depending on the details of your data & model.)

Such issues are common when one mixes up problem types, losses, and metrics - see this answer of mine for a similar confusion when binary_crossentropy is used as loss in Keras for a multi-class classification problem.

Before trying any remedy, try predicting a couple of examples manually (i.e. with model.predict instead of model.evaluate). I cannot do it myself since I don't have your data, but I bet the results you get will not match the perfect accuracy implied by your model.evaluate results.
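
A minimal sketch of such a manual check, in case it helps: it thresholds raw predictions and counts the hits itself instead of relying on the metric Keras reports. The helper name manual_accuracy, the 0.5 threshold, and the sample numbers are assumptions for illustration; in practice the raw values would come from something like model.predict(X_test).ravel().

```python
def manual_accuracy(y_true, y_pred_raw, threshold=0.5):
    """Fraction of raw predictions that fall on the correct side of `threshold`."""
    labels = [int(p > threshold) for p in y_pred_raw]
    hits = sum(int(label == int(t)) for label, t in zip(labels, y_true))
    return hits / len(labels)


# Hypothetical raw outputs standing in for model.predict(X_test).ravel()
raw = [0.93, 0.08, 0.61, 0.47]
truth = [1, 0, 0, 1]
print(manual_accuracy(truth, raw))  # 0.5
```

If this number disagrees with what model.evaluate reports, the reported "accuracy" is probably not measuring what you think it is.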

To the heart of your issue: since you have a binary classification problem, you should definitely ask for loss='binary_crossentropy' in your model compilation, and not mse.
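
As a rough sketch of what that could look like, here is a variant of build_model2 compiled with binary_crossentropy. The function name build_classifier is hypothetical, and the sigmoid output layer is my assumption (it keeps the output in [0, 1], which this loss expects), not something taken from your code; layer sizes and dropout are copied from build_model2.

```python
# Compile settings for a binary classifier.
COMPILE_KWARGS = {
    "loss": "binary_crossentropy",
    "optimizer": "adam",
    "metrics": ["accuracy"],
}


def build_classifier(n_features, window):
    """Hypothetical variant of build_model2 with a sigmoid output unit."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow.keras import Input
    from tensorflow.keras.layers import LSTM, Dense, Dropout
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Input(shape=(window, n_features)),
        LSTM(128, return_sequences=True),
        Dropout(0.2),
        LSTM(64),
        Dropout(0.2),
        Dense(16, activation="relu"),
        Dense(1, activation="sigmoid"),  # assumption: sigmoid instead of relu
    ])
    model.compile(**COMPILE_KWARGS)
    return model
```

It would be called the same way as build_model2, e.g. build_classifier(len(df.columns), window).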

I cannot be sure what exactly the value of 1.0 returned by model.evaluate represents. As I show in the answer linked above, the metric Keras reports for a model compiled with metrics=['accuracy'] depends heavily on the loss you specify. I was eventually able to pin down the issue in that question, but I cannot even begin to imagine what exactly goes on here, where you request accuracy (a classification metric) together with a regression loss (mse).