I've built a linear regression model to examine a relationship between two variables (chemical_1 and chemical_2) from this dataset.
According to the results, intercept = 16.83488364225717.
I've just started learning the math basics of data science, and my current understanding of an intercept is that it's the value where the regression line crosses the y-axis (at x=0). So now I'm confused by the resulting plot built with Seaborn.
Why does it show the regression line crossing the y-axis between 10 and 12, not at the actual value of the intercept (16.83488364225717) at x=0? What should I do to fix that?
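To make the mismatch concrete, here is a small sketch that evaluates the fitted line at x=0 and at a possible left edge of the plot. The slope and intercept are the values reported in this post; `x_left = 6.0` is an assumed value chosen only for illustration (it is not read from the dataset), and `predict` is a helper name introduced here.

```python
# Fitted values reported in this post
slope = -0.9345759557752411
intercept = 16.83488364225717


def predict(x):
    """Value of the fitted line y = slope * x + intercept at x."""
    return slope * x + intercept


# Hypothetical left edge of the plotted x-range (assumption, not from the dataset)
x_left = 6.0

y_at_zero = predict(0.0)     # where the line meets the true y-axis (x = 0)
y_at_left = predict(x_left)  # where the line meets the left border of the plot

print("y at x=0:     ", y_at_zero)
print("y at x=x_left:", y_at_left)
```

If the plotted x-range starts somewhere around 6 rather than at 0, the line would meet the left border of the figure between 10 and 12 even though the intercept itself is about 16.83.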
Here is my code:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
X = df['chemical_1']
Y = df['chemical_2']
slope, intercept, r_value, p_value, slope_std_error = stats.linregress(X,Y)
print ("slope = " + str(slope))
print ("intercept = " + str(intercept))
print ("r_squared = " + str(r_value**2))
print ("r_value = " + str(r_value))
print ("p_value = " +str(p_value))
slope = -0.9345759557752411
intercept = 16.83488364225717
r_squared = 0.04205938806347038
r_value = -0.20508385617466426
p_value = 0.00784469031490164
predict_y = slope * X + intercept
fig, ax = plt.subplots()
sns.set(color_codes=True)
sns.set(rc={'figure.figsize':(10, 10)})
ax = sns.regplot(x=X, y=Y, line_kws={'label':'$y=%3.7s*x+%3.7s$'%(slope, intercept)});
sns.regplot(x=X, y=Y, fit_reg=False, ax=ax);
sns.regplot(x=X, y=predict_y,scatter=False, ax=ax);
ax.set_ylabel('chemical_2')
ax.legend()
plt.show()
UPD: when I use the solution proposed by Simon (extending the limits of the axes), the intercept is still not shown and the plot looks like this:
When I use set_ylim(0,20), the data on the plot looks squeezed. In fact, any axis parameters that I set (other than the defaults) make the data and the confidence interval on the plot look squeezed.
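One possible approach, sketched here on synthetic stand-in data rather than the real dataset, is to draw the fitted line explicitly from x=0 and pin the left x-limit at 0. The data, the random seed, and the variable names below are assumptions for illustration; the fit uses `np.polyfit`, not Seaborn's own regression, so no confidence band is drawn.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption: the real x values lie well away from 0)
rng = np.random.default_rng(0)
x = rng.uniform(5.0, 8.0, 150)
y = 16.8 - 0.93 * x + rng.normal(0.0, 1.5, 150)

# Ordinary least-squares fit of a straight line
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots(figsize=(10, 10))
ax.scatter(x, y, alpha=0.6)

# Draw the fitted line starting at x = 0 so it meets the y-axis at the intercept
x_line = np.linspace(0.0, x.max(), 100)
(line,) = ax.plot(x_line, slope * x_line + intercept, color="C1")

ax.set_xlim(left=0.0)  # the left border of the plot now sits at x = 0
ax.set_xlabel("chemical_1")
ax.set_ylabel("chemical_2")
```

Call `plt.show()` afterwards to display the figure. Extending the x-range down to 0 necessarily leaves empty space to the left of the data, so some visual compression of the points is expected with this approach.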
x-axis starts at 0 and the intercept crosses the y-axis at 16.83488364225717? – samba