
I'm trying to fit a parabola to a simple generated dataset using linear regression, but no matter what I do, the curve I get straight out of the model turns out to be an incomprehensible mess.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

#xtrain, ytrain datasets have been generated earlier

model = LinearRegression(fit_intercept = True)
model.fit(np.hstack([xtrain, xtrain**2]), ytrain)  
xfit = np.linspace(-3,3,20)  
yfit = model.predict(np.hstack([xtrain, xtrain**2]))
plt.plot(xfit, yfit)
plt.scatter(xtrain, ytrain, color="black")

This code outputs the following graph:

Code output

However, when I manually generate the plot from the coefficients the model produces, by changing the yfit line as follows, I get exactly the result I want.

yfit = model.coef_[0]*xfit + model.coef_[1]*xfit**2 + model.intercept_

Manual output

This seems like a clunky way of going about things, so I'd like to learn how to generate the curve properly. I suspect the issue is the discrete nature of my data, but I haven't been able to figure it out on my own.

there's a typo.. it should be model.predict(np.hstack([xfit, xfit**2])) – StupidWolf

1 Answer


Here is your bug fixed:

yfit = model.predict(np.hstack([xfit, xfit**2]))

In your code you were plotting xfit values on the X-axis against predictions for xtrain on the Y-axis, so the plotted points did not correspond to the x coordinates they were drawn at. One caveat: for np.hstack to build the same two-column feature matrix used in training, xfit must be a column vector, e.g. xfit = np.linspace(-3, 3, 20).reshape(-1, 1); applied to a 1-D array, np.hstack just concatenates it into one long vector and predict will fail.
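Putting it all together, here is a minimal self-contained sketch of the fix. Since your xtrain/ytrain are generated earlier and not shown, I've substituted hypothetical training data (a noisy parabola); the shapes and feature construction are what matter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data standing in for your xtrain/ytrain:
# a noisy parabola y = 1 + 2x - x**2.
rng = np.random.default_rng(0)
xtrain = rng.uniform(-3, 3, size=(50, 1))   # column vector, as sklearn expects
ytrain = (1 + 2 * xtrain - xtrain**2 + rng.normal(scale=0.1, size=(50, 1))).ravel()

model = LinearRegression(fit_intercept=True)
model.fit(np.hstack([xtrain, xtrain**2]), ytrain)

# Build the prediction grid with the SAME feature construction as training.
xfit = np.linspace(-3, 3, 20).reshape(-1, 1)      # (20, 1), not 1-D
yfit = model.predict(np.hstack([xfit, xfit**2]))  # (20,), aligned with xfit

# Now each yfit value corresponds to the xfit value it is plotted against:
# plt.plot(xfit, yfit) produces a smooth parabola.
```

As a side note, sklearn.preprocessing.PolynomialFeatures can build the polynomial feature matrix for you, which avoids having to repeat the np.hstack construction by hand at prediction time.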