
I just started using the Python scikit-learn package for linear regression, and I am confused about the dimensions of the dataset it requires. For example, I want to regress Y on X using the following code:

from sklearn import linear_model
x = [0, 1, 2]  # a flat list: one number per sample
y = [0, 1, 2]
regr = linear_model.LinearRegression()
regr.fit(x, y)  # this call raises the error below
print('Coefficients: \n', regr.coef_)

This returned the error "tuple index out of range". According to the scikit-learn website, valid arrays should look like this:

x = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]

(http://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares)

from sklearn import linear_model
x = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]
regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Coefficients: \n', regr.coef_)
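To see why the first attempt fails and the second works, it may help to compare the array shapes. This is a minimal sketch assuming NumPy is available (scikit-learn depends on it); scikit-learn expects X with shape (n_samples, n_features) and y with shape (n_samples,).

```python
import numpy as np

x_1d = [0, 1, 2]                  # x from the first attempt
x_2d = [[0, 0], [1, 1], [2, 2]]   # x from the documentation example
y = [0, 1, 2]

# 1-D: three numbers, but no explicit "feature" axis
print(np.asarray(x_1d).shape)  # (3,)
# 2-D: 3 samples (rows), each with 2 features (columns)
print(np.asarray(x_2d).shape)  # (3, 2)
# The target stays 1-D: one value per sample
print(np.asarray(y).shape)     # (3,)
```

Read this way, [0, 0] in the documentation example is one sample with two features, and its target is 0.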

So does this mean the package cannot regress Y[i] on X[i] when each is a single number? Must each X[i] be an array mapped to a number, like [0, 0] in X mapping to 0 in Y?

Thanks in advance.


2 Answers


You can. Simply reshape your data to be x = [[0], [1], [2]].

In this case, every sample in your data has a single feature: a single number.
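A minimal sketch of this answer applied to the question's data: with x = [[0], [1], [2]], the fit runs and coef_ holds one coefficient for the single feature. The predict call on [[3]] is an illustrative addition; it needs the same 2-D layout as fit.

```python
from sklearn import linear_model

x = [[0], [1], [2]]  # 3 samples, 1 feature each
y = [0, 1, 2]        # one target value per sample

regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Coefficients: \n', regr.coef_)    # one coefficient per feature, approximately [1.]
print('Intercept: \n', regr.intercept_)  # approximately 0.0
print(regr.predict([[3]]))               # new samples must also be 2-D; approximately [3.]
```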


scikit-learn requires your x to be a 2-dimensional array of shape (n_samples, n_features). It need not be a NumPy array; a plain Python list of lists works too.

If your x is a 1-dimensional list, as in your question, you can convert it like this:

x = [[value] for value in [0, 1, 2]]

This turns your 1D list into a 2D list in which every individual value is stored as its own one-element list, i.e. one sample with one feature.
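The list comprehension can be sketched next to a NumPy alternative that is not part of the original answer: reshape(-1, 1) turns a 1-D array into a single column, with -1 letting NumPy infer the number of rows. The variable names below are illustrative.

```python
import numpy as np

values = [0, 1, 2]  # the original 1-D data

# Option 1 (from the answer): wrap each value in its own list
x_list = [[value] for value in values]

# Option 2 (NumPy alternative): reshape into a single column;
# -1 tells NumPy to infer the number of rows from the data
x_arr = np.asarray(values).reshape(-1, 1)

print(x_list)       # [[0], [1], [2]]
print(x_arr.shape)  # (3, 1)
# Same samples, same single feature, in two container types
print(np.array_equal(np.asarray(x_list), x_arr))  # True
```

Either x_list or x_arr can then be passed to regr.fit.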