Data dimension of scikit-learn linear regression

Question

I just started using the Python scikit-learn package to do linear regression. I am confused about the dimensions of the data set it requires. For example, I want to regress y on x using the following code:

from sklearn import linear_model
x = [0, 1, 2]
y = [0, 1, 2]
regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Coefficients: \n', regr.coef_)

This fails with the error: tuple index out of range. According to the scikit-learn website, valid arrays should look like this:

x=[[0,0],[1,1],[2,2]]
y=[0,1,2]

(http://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares)

from sklearn import linear_model
x = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]
regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Coefficients: \n', regr.coef_)

So does this mean the package cannot regress a single number Y[i] on a single number X[i]? Must each sample be an array mapped to a number, like [0,0] in X to 0 in Y?

Thanks in advance.

Farseer · Accepted Answer · 2016-04-05T07:02:11

You can. Simply reshape your data to be x = [[0], [1], [2]].

In this case, every point in your data will have a single feature: a single number.
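As a minimal sketch of that reshape, assuming NumPy is installed alongside scikit-learn: the variable names follow the question, and using `reshape(-1, 1)` is just one convenient way to get the `[[0], [1], [2]]` layout, not something the original answer prescribes.

```python
import numpy as np
from sklearn import linear_model

# Start from the flat 1-D data used in the question.
x = np.array([0, 1, 2])
y = np.array([0, 1, 2])

# Turn x into a column: 3 samples, 1 feature each, i.e. [[0], [1], [2]].
# The -1 asks NumPy to infer the number of rows from the data length.
x = x.reshape(-1, 1)

regr = linear_model.LinearRegression()
regr.fit(x, y)

# One feature gives one coefficient; for this data the slope is close to 1.
print('Coefficients: \n', regr.coef_)
```

Only x needs the extra dimension here; y can stay a flat list of targets, as in the scikit-learn example quoted in the question.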
