
I've built a linear regression model to examine the relationship between two variables (chemical_1 and chemical_2) from this dataset. According to the results, intercept = 16.83488364225717.

I've just started learning the math basics of data science, and my current understanding is that the intercept is the value at which the regression line crosses the y-axis (i.e. where x=0). So I'm confused by the plot I built with Seaborn.

Why does the plot show the regression line crossing the y-axis between 10 and 12, rather than at the intercept (16.83488364225717) at x=0? What should I do to fix that?

Here is my code:

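import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# df is the dataset referenced above (loading code not shown)
X = df['chemical_1']
Y = df['chemical_2']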

slope, intercept, r_value, p_value, slope_std_error = stats.linregress(X,Y)
print ("slope = " + str(slope))
print ("intercept = " + str(intercept))
print ("r_squared = " + str(r_value**2))
print ("r_value = " + str(r_value))
print ("p_value = " +str(p_value))

Output:

slope = -0.9345759557752411
intercept = 16.83488364225717
r_squared = 0.04205938806347038
r_value = -0.20508385617466426
p_value = 0.00784469031490164

predict_y = slope * X + intercept

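# Set the theme (including figure size) before creating the figure;
# changing figure.figsize afterwards does not resize an existing figure
sns.set(color_codes=True, rc={'figure.figsize': (10, 10)})
fig, ax = plt.subplots()
# Scatter, seaborn's own fitted line and its confidence band
sns.regplot(x=X, y=Y, ax=ax, line_kws={'label': '$y=%3.7s*x+%3.7s$' % (slope, intercept)})
sns.regplot(x=X, y=Y, fit_reg=False, ax=ax)
# Line through the linregress predictions (coincides with the fitted line above)
sns.regplot(x=X, y=predict_y, scatter=False, ax=ax)
ax.set_ylabel('chemical_2')
ax.legend()
plt.show()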

[Image: regression plot produced by the code above]

UPD: when I use the solution proposed by Simon (extending the limits of the axes), the intercept is still not shown and the plot looks like this: [Image: plot with extended axis limits]
When I use set_ylim(0,20), the data on the plot looks squeezed. In fact, any axis parameters I set (other than the defaults) make the data and the confidence interval on the plot look squeezed.

[Image: plot with the data squeezed]

If you look at the x-axis, you'll see that it starts just below 6, not at 0. The intercept is where the function intersects the y-axis at x=0, not at x=6.8. - Yngve Moe
@YngveMoe is it possible to visualize it so that the x-axis starts at 0 and the regression line crosses the y-axis at 16.83488364225717? - samba

Answer

As mentioned in the comments, the intercept is the value of the fitted line (the predicted Y) when X is 0. The range of your x-axis does not include 0, so the actual intercept cannot be shown.
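To see where the "between 10 and 12" crossing comes from, plug the question's slope and intercept into the fitted line at the left edge of the plotted x-range. Taking that edge to be roughly x = 6 (an approximation based on the comment that the x-axis starts just below 6):

```latex
\hat{y} = b_0 + b_1 x, \qquad b_0 \approx 16.835, \quad b_1 \approx -0.9346

\hat{y}(0) = b_0 \approx 16.835

\hat{y}(6) \approx 16.835 - 0.9346 \times 6 \approx 16.835 - 5.608 = 11.227
```

So the line does pass through 16.835 at x = 0; what the plot shows at its left edge is simply the line's value near x = 6.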

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns

# Synthetic data: 5..9 plus Gaussian noise, so x = 0 lies well outside the data
np.random.seed(1236)
X = np.arange(5,10) + np.random.normal(0,1,5)
Y = np.arange(5,10) + np.random.normal(0,1,5)

slope, intercept, r_value, p_value, slope_std_error = stats.linregress(X,Y)
predict_y = slope * X + intercept

print("slope = " + str(slope))
print("intercept = " + str(intercept))

sns.regplot(x=X, y=Y, fit_reg=False)
sns.regplot(x=X, y=predict_y,scatter=False)
plt.show()

Here we can see that the intercept is 0.115:

slope = 0.9897768121234015
intercept = 0.11521162448067557

This gives a Seaborn plot that looks like this:

[Image: Seaborn plot of the synthetic data and fitted line]
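As a quick check that the intercept is just the fitted value at X = 0, here is a small sketch reusing the same synthetic data (seed 1236), with only NumPy and SciPy. Because a least-squares line always passes through the point of means, the intercept can also be recovered as mean(Y) minus slope times mean(X); the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Same synthetic data as in the example above
np.random.seed(1236)
X = np.arange(5, 10) + np.random.normal(0, 1, 5)
Y = np.arange(5, 10) + np.random.normal(0, 1, 5)

res = stats.linregress(X, Y)

# The fitted line evaluated at x = 0 is exactly the intercept
y_at_zero = res.slope * 0 + res.intercept

# Least-squares lines pass through (mean(X), mean(Y)),
# so the intercept can also be recovered from the means
intercept_from_means = Y.mean() - res.slope * X.mean()

print("y at x=0:", y_at_zero)
print("intercept from means:", intercept_from_means)
print("smallest X in the data:", X.min())
```

The last line makes the point of the answer concrete: the smallest X is far from 0, so a plot limited to the data range never reaches the intercept.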

If you want to actually see the crossing point, extend the limits of your axes so that they include 0:

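p = sns.regplot(x=X, y=Y, fit_reg=False)
p.set_xlim(left=0)    # regplot returns the matplotlib Axes it drew on
p.set_ylim(bottom=0)
# truncate=False extends the line to the current x-limits (here down to x=0);
# newer seaborn versions default to truncate=True, which stops it at the data range
sns.regplot(x=X, y=predict_y, scatter=False, truncate=False, ax=p)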

[Image: the same plot with axis limits starting at 0]
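In recent seaborn versions regplot bounds its fitted line to the data range by default (truncate=True), so widening the axes alone may leave the line stopping short of x = 0, which could explain the UPD in the question. One way around it, sketched here with plain matplotlib and assuming the line comes from scipy's linregress, is to draw the fitted line yourself over an x-range that starts at 0 and mark the intercept point. The helper plot_line_with_intercept is hypothetical, not part of seaborn or matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_line_with_intercept(x, y, ax=None):
    """Scatter x/y and draw the least-squares line from x = 0 to max(x).

    Hypothetical helper for illustration only; not part of seaborn or matplotlib.
    """
    if ax is None:
        _, ax = plt.subplots()
    res = stats.linregress(x, y)

    # Evaluate the fitted line on a grid that starts at x = 0,
    # so the y-axis crossing is part of what gets drawn.
    x_line = np.linspace(0, np.max(x), 100)
    y_line = res.slope * x_line + res.intercept

    ax.scatter(x, y, label="data")
    (line,) = ax.plot(x_line, y_line,
                      label=f"y = {res.slope:.3f}x + {res.intercept:.3f}")
    # Mark the intercept itself at (0, intercept)
    ax.plot([0], [res.intercept], "o", color="red", label="intercept")
    ax.set_xlim(left=0)
    ax.legend()
    return ax, line, res


# Usage with the same synthetic data as above
np.random.seed(1236)
X = np.arange(5, 10) + np.random.normal(0, 1, 5)
Y = np.arange(5, 10) + np.random.normal(0, 1, 5)
ax, line, res = plot_line_with_intercept(X, Y)
plt.show()
```

With the question's data the y-limits will autoscale to include 16.83, so the points will still occupy a narrower band of the plot; that is the same trade-off the UPD describes.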

EDIT:

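If you want to get around the problem of squeezed data when you widen your axis limits, you could standardize your data by calculating z-scores. This centers both variables at 0, so x=0 falls in the middle of the data instead of far away from it: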

X = np.arange(5,10) + np.random.normal(0,1,5)
Y = np.arange(5,10) + np.random.normal(0,1,5)
# Standardize each variable: (value - mean) / standard deviation
X = stats.zscore(X)
Y = stats.zscore(Y)

slope, intercept, r_value, p_value, slope_std_error = stats.linregress(X,Y)
predict_y = slope * X + intercept

print("slope = " + str(slope))
print("intercept = " + str(intercept))

sns.regplot(x=X, y=Y, fit_reg=False)
sns.regplot(x=X, y=predict_y,scatter=False)

Parameter values:

slope = 0.667021422528575
intercept = -2.8128800822178726e-16

[Image: plot of the standardized data]

It's very important to note that in this case your X and Y are no longer in their original units. So the slope is now interpreted as "for a 1 standard deviation increase in X, Y increases by 0.667 standard deviations". But you'll see that the intercept is now essentially 0 (i.e. the value of Y when X=0), because standardizing moves both means to 0 and a least-squares line always passes through the point of means. The crossing point is therefore shown towards the center of the plot.
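Both properties can be checked numerically. With both variables standardized, the slope equals Pearson's r and the intercept is 0 up to rounding. The sketch below (NumPy and SciPy only; variable names are illustrative) reuses the data from the first example rather than the EDIT's draw, so the numbers are not meant to match 0.667. It also converts the standardized fit back to the original units, which is one way to report the raw intercept while plotting on the standardized scale.

```python
import numpy as np
from scipy import stats

# Same synthetic data as the first example (fresh draw with seed 1236)
np.random.seed(1236)
X_raw = np.arange(5, 10) + np.random.normal(0, 1, 5)
Y_raw = np.arange(5, 10) + np.random.normal(0, 1, 5)

# Standardize both variables (stats.zscore uses ddof=0 by default)
zx, zy = stats.zscore(X_raw), stats.zscore(Y_raw)
res_z = stats.linregress(zx, zy)

# Pearson correlation of the raw data
r = np.corrcoef(X_raw, Y_raw)[0, 1]

# Undo the standardization to get the raw-unit line:
# slope_raw = slope_z * sd(Y) / sd(X), intercept_raw = mean(Y) - slope_raw * mean(X)
slope_raw = res_z.slope * Y_raw.std() / X_raw.std()
intercept_raw = Y_raw.mean() - slope_raw * X_raw.mean()

res_raw = stats.linregress(X_raw, Y_raw)
print("standardized slope:", res_z.slope, " Pearson r:", r)
print("standardized intercept:", res_z.intercept)
print("raw slope:", slope_raw, "vs linregress:", res_raw.slope)
print("raw intercept:", intercept_raw, "vs linregress:", res_raw.intercept)
```

The last two lines should agree, which shows the standardized plot loses no information: the raw-unit intercept can always be recovered from the means and standard deviations.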