What could explain the difference in intercepts between a statsmodels OLS regression and a seaborn lmplot?

My statsmodels code:

X = mmm_ma[['Xvalue']]
Y = mmm_ma['Yvalue']
model2 = sm.OLS(Y, sm.add_constant(X))  # sm.OLS takes the arrays directly; data= is only used by the formula API
model_fit = model2.fit()
model_fit.summary()

My seaborn lmplot code:

sns.lmplot(x='Xvalue', y='Yvalue', data=mmm_ma)

My statsmodels intercept is 28.9775, while the intercept in my seaborn lmplot appears to be around 45.5.

Questions

  • Should the intercepts be the same?
  • What might explain why they are different? (Can I change some code to make them equal?)
  • Is there a way to produce a plot similar to seaborn lmplot from the exact regression results, so that the two are guaranteed to align?

Edit

@Massoud thanks for posting that. I think I have realised what the problem is. My x-values range from 1400 to 2600 and my y-values from 40 to 70. seaborn lmplot only draws the regression line over the range of the data, so the value I was reading as the intercept is really the height of the line at the lowest X value, which is about 46.

statsmodels OLS, however, reports the intercept as the fitted value at X = 0, which is why I get an intercept of around 29.
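To see the effect in isolation, here is a minimal sketch on made-up data in roughly the ranges described above. The data, the seed, and the "true" line (29 + 0.012x) are assumptions chosen for illustration, not the original `mmm_ma` values. It fits a straight line with `np.polyfit` and compares the intercept at x = 0 with the height of the fitted line at the smallest observed x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data (NOT the original mmm_ma): x between 1400 and 2600,
# y scattered around an assumed line 29 + 0.012 * x.
x = rng.uniform(1400, 2600, size=200)
y = 29.0 + 0.012 * x + rng.normal(0, 2.0, size=200)

# Ordinary least squares fit of y = intercept + slope * x.
slope, intercept = np.polyfit(x, y, deg=1)

# Height of the fitted line where the data (and a truncated plot line) begin.
y_at_x_min = intercept + slope * x.min()

print(f"intercept (fitted value at x = 0): {intercept:.2f}")
print(f"fitted value at the smallest x:    {y_at_x_min:.2f}")
```

Both numbers come from the same fitted line; they differ only in the x at which the line is evaluated.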

So I guess the question is: is there a way to extend the seaborn trend line all the way to x = 0?

I tried changing the axis limits, but that doesn't seem to extend the line.

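lm = sns.lmplot(x='Xvalue', y='Yvalue', data=mmm_ma)  # keep the returned FacetGrid
axes = lm.axes
axes[0,0].set_xlim(0,)  # widens the view but does not redraw the regression line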
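One approach that may work is to set the x-limits before the regression is drawn and pass `truncate=False`. This assumes a seaborn version whose `regplot` accepts `truncate` and takes the line's extent from the current axis limits when it is off. Because `lmplot` creates its own figure, this sketch uses `sns.regplot` on an axis prepared in advance. The data are made up for illustration and are not the original `mmm_ma`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Made-up stand-in for mmm_ma (assumed ranges: x 1400-2600, y roughly 40-70).
df = pd.DataFrame({"Xvalue": rng.uniform(1400, 2600, size=200)})
df["Yvalue"] = 29.0 + 0.012 * df["Xvalue"] + rng.normal(0, 2.0, size=200)

fig, ax = plt.subplots()
ax.set_xlim(0, 2700)  # set the limits BEFORE drawing the regression

# truncate=False asks seaborn to draw the line across the axis limits
# instead of stopping at the range of the data.
sns.regplot(x="Xvalue", y="Yvalue", data=df, truncate=False, ax=ax)

# The regression line is the only Line2D on the axis; read back its endpoints.
line_x = ax.lines[0].get_xdata()
line_y = ax.lines[0].get_ydata()

# Independent least-squares fit for comparison.
slope, intercept = np.polyfit(df["Xvalue"], df["Yvalue"], deg=1)
print(f"line starts at x = {line_x[0]:.0f}, y = {line_y[0]:.2f}")
print(f"least-squares intercept: {intercept:.2f}")
```

With the line drawn from x = 0, the value read off the y-axis should match the intercept that an ordinary least-squares fit reports.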

Please read "How to create a Minimal, Complete, and Verifiable example" and edit your question. - Mr. T

1 Answer

That is odd. If you could provide more details, we could help better. I tried to replicate the problem, but I get the same intercept from both approaches.

The code:

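import matplotlib.pyplot as plt
import statsmodels.api as sm  # public entry point; provides both OLS and add_constant
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(0)  # make the random Y values reproducible

# Toy data: X runs from 0 to 39, Y is random integers in [0, 40)
mmm_ma = {'Xvalue': range(0, 40), 'Yvalue': np.random.randint(low=0, high=40, size=40)}

mmm_ma = pd.DataFrame(mmm_ma)

X = mmm_ma[['Xvalue']]
Y = mmm_ma['Yvalue']
# add_constant adds the intercept column; sm.OLS takes the arrays directly (no data= argument)
model2 = sm.OLS(Y, sm.add_constant(X))
model_fit = model2.fit()
print(model_fit.summary())


# Same data through seaborn, which fits and draws its own regression line
sns.lmplot(x='Xvalue', y='Yvalue', data=mmm_ma)
plt.show()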

Here is the output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Yvalue   R-squared:                       0.005
Model:                            OLS   Adj. R-squared:                 -0.021
Method:                 Least Squares   F-statistic:                    0.2071
Date:                Wed, 18 Jul 2018   Prob (F-statistic):              0.652
Time:                        00:51:04   Log-Likelihood:                -155.75
No. Observations:                  40   AIC:                             315.5
Df Residuals:                      38   BIC:                             318.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         17.2183      3.783      4.551      0.000       9.559      24.877
Xvalue         0.0760      0.167      0.455      0.652      -0.262       0.414
==============================================================================
Omnibus:                        3.327   Durbin-Watson:                   1.618
Prob(Omnibus):                  0.189   Jarque-Bera (JB):                1.738
Skew:                           0.197   Prob(JB):                        0.419
Kurtosis:                       2.058   Cond. No.                         44.5
==============================================================================

And below is the plot from Seaborn: [image: lmplot of Yvalue against Xvalue]