
I am trying to use scikit-learn to perform linear regression on labeled time-series data. My data format is data=(timestamp,value,label)

The labels assigned to my data are either 0 or 1. I tried to follow this example from the scikit-learn website.

My questions:

1- Where are the labels of the training data in the example? Are they in diabetes_y_train?

2- What are the return values of the method predict()? In my code, it returns an array of n_samples predicted values in the range [0, 1]. However, I expected it to return binary values of either 0 or 1 (no intermediate values).


2 Answers


1 - Yes, diabetes_y_train holds the labels (target values) for the training set.
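A minimal sketch of where the labels live, roughly following the structure of the scikit-learn diabetes linear regression example (the exact code of that example may differ between scikit-learn versions). The features go into the `X` arrays and the labels (targets) into the `y` arrays; `fit()` takes both.

```python
import numpy as np
from sklearn import datasets, linear_model

# Load the diabetes data bundled with scikit-learn: X holds features, y holds targets
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# Keep a single feature (column 2), as the scikit-learn example does
diabetes_X = diabetes_X[:, np.newaxis, 2]

# Features: all rows except the last 20 for training, the last 20 for testing
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Labels (targets) are split the same way; these are what fit() learns from
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)  # inputs first, labels second

print(diabetes_X_train.shape, diabetes_y_train.shape)
```

So in the example, `diabetes_y_train` plays the role that the `label` column would play in your own data.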

2 - You are using a regression model, so it is expected to get continuous values. If you want binary output, you are not solving a regression problem but a classification one. You can then either set a threshold to discretise the predictions or use one of the classifiers offered by sklearn.
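Both options can be sketched as follows. The data here is hypothetical and synthetic, arranged in the asker's (timestamp, value, label) layout; the 0.5 threshold and the choice of `LogisticRegression` as the classifier are assumptions for illustration, not the only reasonable choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data in the (timestamp, value, label) layout
rng = np.random.default_rng(0)
timestamps = np.arange(200, dtype=float)
values = rng.normal(size=200)
labels = (values > 0).astype(int)  # synthetic 0/1 labels, for illustration only

X = np.column_stack([timestamps, values])  # features: timestamp and value
y = labels                                 # targets: the 0/1 labels

# Option 1: regression, then discretise with a threshold (0.5 is an assumed choice)
reg = LinearRegression().fit(X, y)
continuous = reg.predict(X)                # floats; not guaranteed to stay in [0, 1]
thresholded = (continuous >= 0.5).astype(int)

# Option 2: a classifier, whose predict() returns class labels directly
clf = LogisticRegression(max_iter=1000).fit(X, y)
classes = clf.predict(X)                   # values drawn from {0, 1}
probabilities = clf.predict_proba(X)[:, 1] # estimated probability of label 1
```

A classifier is usually the cleaner route, since it models the 0/1 labels directly and can still expose probabilities through `predict_proba()` if you need them.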


1 - Yes

2 - predict() returns floating-point numbers because the example is trying to predict a continuous value, not a binary one. So there is no yes/no answer, only a predicted value. To estimate the error, the squared differences between predictions and true values are averaged (the mean squared error) in np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2)
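For reference, that expression is the mean squared error, which scikit-learn also provides as `sklearn.metrics.mean_squared_error`. A small sketch with made-up numbers (hypothetical values, not from the example) showing that both give the same result:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true targets and predictions
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# Mean squared error by hand, the same formula as in the example
mse_manual = np.mean((y_pred - y_true) ** 2)

# scikit-learn's helper computes the same quantity
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # prints 0.5625 0.5625
```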