I'm trying to fit a parabola to a simple generated dataset using linear regression; however, no matter what I do, the curve I get straight out of the model comes out as an incomprehensible mess.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
#xtrain, ytrain datasets have been generated earlier
model = LinearRegression(fit_intercept=True)
model.fit(np.hstack([xtrain, xtrain**2]), ytrain)   # features: [x, x**2]
xfit = np.linspace(-3, 3, 20)
yfit = model.predict(np.hstack([xtrain, xtrain**2]))
plt.plot(xfit, yfit)
plt.scatter(xtrain, ytrain, color="black")
This code outputs the following graph (image omitted), in which the plotted curve is the jumbled mess described above:
However, when I manually generate the plot from the coefficients the model produces, by simply replacing the yfit line with the following, I get exactly the result I want.
yfit = model.coef_[0]*xfit + model.coef_[1]*xfit**2 + model.intercept_
This seems like a clunky way of going about things, so I'd like to learn how to generate the curve properly. I suspect the issue is the discrete nature of my data, but I haven't been able to figure it out on my own.
Comment from StupidWolf:

model.predict(np.hstack([xfit, xfit**2]))
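
For reference, here is a minimal self-contained sketch of that suggestion. The training data below is made up purely for illustration (the question doesn't show how xtrain and ytrain were generated), and it assumes they are column arrays of shape (n, 1) so that np.hstack builds an (n, 2) feature matrix:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# hypothetical training data: a noisy parabola sampled at random points
rng = np.random.default_rng(0)
xtrain = rng.uniform(-3, 3, size=(50, 1))              # column vector of x values
ytrain = 0.5 * xtrain**2 - xtrain + rng.normal(0, 0.5, size=(50, 1))

model = LinearRegression(fit_intercept=True)
model.fit(np.hstack([xtrain, xtrain**2]), ytrain)      # features: [x, x**2]

xfit = np.linspace(-3, 3, 20).reshape(-1, 1)           # column vector, same layout as xtrain
yfit = model.predict(np.hstack([xfit, xfit**2]))       # predict at xfit, not xtrain

plt.plot(xfit, yfit)                                   # smooth parabola
plt.scatter(xtrain, ytrain, color="black")
plt.show()

The only substantive difference from the original code is that the feature matrix passed to predict is built from xfit (the x values actually being plotted) rather than from xtrain.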