0
votes

I am working through a hands-on exercise on Poisson regression in the Stats with Python course on Fresco Play. The problem statement is: Load the R dataset Insurance from the MASS package. Capture the data as a pandas DataFrame. Build a Poisson regression model with the log of the independent variable Holders and the dependent variable Claims. Fit the model to the data and find the sum of the residuals.

I am stuck on the last step, i.e. the sum of the residuals.

I used np.sum(model.resid), but the answer is not accepted.

Here is my code:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np

INS_data = sm.datasets.get_rdataset('Insurance','MASS').data
model = smf.poisson('Claims ~ np.log(Holders)', INS_data).fit()
print(np.sum(model.resid))
3
Did you get an error when you used np.sum? Or did it not give the right answer? - cenh
@cenh I got an answer, with no error. But the answer is not accepted. - Pazuzu
Do you need a sum or a cumulative sum, by any chance? - Dalen
@Dalen As per the question, it should be the sum. - Pazuzu
What form does model.resid take? What kind of data container is it, and what are its value types and ranges? Did you take a look? Try np.cumsum() instead of np.sum(), just in case. - Dalen

3 Answers

0
votes

I was running the code in Python 2, which gave the wrong answer; running it in Python 3 gave the correct answer. I don't know the reason, but the code works perfectly in Python 3.
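One possible explanation, offered as a guess rather than a confirmed cause: for a Poisson regression with a log link and an intercept, the maximum-likelihood equations force the raw residuals (observed minus fitted) to sum to zero. If that applies here, np.sum(model.resid) is only floating-point noise, and its printed digits can differ between environments. The sketch below uses only the standard library and made-up numbers. The `holders` and `claims` values and the helper `fit_poisson` are hypothetical; they are not the MASS data or the optimizer statsmodels uses.

```python
import math

def fit_poisson(x, y, iterations=50):
    """Fit log(mu) = b0 + b1 * x by Newton's method and return (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * d = g explicitly
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical data, for illustration only
holders = [50, 100, 200, 400, 800]
claims = [4, 11, 18, 45, 78]

log_holders = [math.log(h) for h in holders]
b0, b1 = fit_poisson(log_holders, claims)
fitted = [math.exp(b0 + b1 * x) for x in log_holders]
residuals = [c - f for c, f in zip(claims, fitted)]
residual_sum = sum(residuals)
print(residual_sum)  # very close to zero; the exact digits are rounding noise
```

The first score equation, g0, is exactly the sum of the residuals, so any fit that has converged drives it to zero.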

0
votes

For the residuals, you can use the basic definition of a residual, i.e. actual - predicted.

Here is the code snippet.

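import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Load the Insurance dataset from the R package MASS
Insurance = sm.datasets.get_rdataset('Insurance', 'MASS')
data = Insurance.data

# Add the log of Holders as an explicit column
data['Holders_'] = np.log(data['Holders'])

# Fit the Poisson regression
model = smf.poisson('Claims ~ Holders_', data).fit()

# Residual = actual - predicted (the original called an undefined name, p)
y_predicted = model.predict(data)
residual = data['Claims'] - y_predicted
print(sum(residual))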
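To make the "actual - predicted" idea concrete without statsmodels, here is a minimal hand-computed sketch. The data and the coefficients `intercept` and `slope` are hypothetical, picked for round numbers and not fitted to anything. With a log link, the predicted mean is exp(intercept + slope * log(Holders)).

```python
import math

# Hypothetical data and coefficients, chosen only for illustration
holders = [100, 200, 50]
claims = [12, 18, 7]
intercept = math.log(0.1)
slope = 1.0

# Predicted mean under a log link: mu = exp(intercept + slope * log(Holders))
predicted = [math.exp(intercept + slope * math.log(h)) for h in holders]

# Residual = actual - predicted, one value per observation
residuals = [c - p for c, p in zip(claims, predicted)]
total = sum(residuals)
print(residuals, total)
```

Here the predictions are roughly 10, 20 and 5, so the residuals are roughly 2, -2 and 2.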

-1
votes

After much searching, I found that it expects the cumulative sum, so use np.cumsum(model.resid). It will pass in Fresco Play.
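For anyone unsure how the two differ, here is a small sketch with made-up residuals. It uses the standard library's `itertools.accumulate` as a stand-in for np.cumsum so it runs without NumPy. A sum returns one number, while a cumulative sum returns one running total per observation, and its last entry equals the plain sum.

```python
from itertools import accumulate

# Hypothetical residuals, for illustration only
residuals = [1.5, -0.5, 2.0, -3.0]

total = sum(residuals)                 # a single number, like np.sum
running = list(accumulate(residuals))  # running totals, like np.cumsum

print(total)    # 0.0
print(running)  # [1.5, 1.0, 3.0, 0.0]
```

So if the grader compares printed output, the two calls will never match each other, even though the final running total and the sum agree.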