2
votes

I am trying to fit a linear regression model and a logistic one to predict fraud in real estate data. I work with a mixed data set (categorical and numerical variables) that I have pre-processed and recoded so that the levels of each categorical variable are reasonably balanced (for example, avoiding variables where a level with a single record sits next to levels with many observations). I added an interaction to increase the R^2 of my lm. When I plot my linear model I get this warning:

    Warning messages:
    1: In sqrt(crit * p * (1 - hh)/hh) : NaNs produced
    2: In sqrt(crit * p * (1 - hh)/hh) : NaNs produced

It appears to be related to Cook's distance (see https://bugs.r-project.org/bugzilla3/show_bug.cgi?format=multiple&id=9316), that is, to influential observations, even though I removed outliers. Any idea what is causing this warning and what can be done to plot the linear model?

Example of my code:

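    library(MASS)  # provides stepAIC()
    # Full model: every other predictor interacted with file_status
    lm.a3 <- lm(log(response) ~ (. - file_status) * file_status, data = data)
    final.lm3 <- stepAIC(lm.a3, direction = "both")  # stepwise selection by AIC
    summary(final.lm3)  # R^2 = 64%
    par(mfrow = c(2, 2))
    plot(final.lm3)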

Thanks for your time and I appreciate your answers

1
Could you kindly provide a minimal subset of your data so that this error is reproducible? Use dput and insert the output in your post. – mlegge
Does response have values equal to zero? – LyzandeR
Thanks @LyzandeR for your response. In fact, I applied logarithm transformations before the lm was run, and some of my response values were equal to 1. After the logarithm transformation, the output was equal to zero. The solution consisted of adding a small constant (0.0001234) to the argument so that the stepAIC function could run. – NuValue
Thanks @mkemp6 for your feedback. I did not need to upload a data subset because I worked out how to solve this issue from LyzandeR's response. Thanks anyway! – NuValue

1 Answer

3
votes

The problem was that I applied a logarithm transformation before running the stepAIC function to improve the fit. Some of my response values were equal to 1, so log(response_variable) was equal to zero for those cases. Adding a small constant to the argument of the logarithm resolved the issue: log(response_variable + 0.0001234). Thanks to @LyzandeR for the feedback.
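
A minimal sketch of the same idea on made-up data, in case it helps someone reproduce it. The data frame `toy` and its columns `x` and `response` are hypothetical and only stand in for the real data. The last two lines are an extra check rather than part of the fix: as far as I can tell, `hh` in the warning is the leverage grid that `plot.lm` uses to draw the Cook's distance contours, so looking at `hatvalues()` shows whether any observation sits at or very near a leverage of 1.

```r
# Hypothetical toy data; names and values are illustrative only.
toy <- data.frame(
  x        = c(1, 2, 3, 4, 5, 6),
  response = c(1, 1, 2.5, 4, 7, 12)
)

# log(1) is exactly 0, so the first two transformed responses are zero.
log(toy$response)

# Adding a small constant (the value used above) moves them off zero.
log(toy$response + 0.0001234)

# Refit with the shifted response.
fit <- lm(log(response + 0.0001234) ~ x, data = toy)

# Leverage of each observation; values at or very near 1 are the ones
# that can push (1 - hh) / hh outside the range where sqrt() is defined.
lev <- hatvalues(fit)
which(lev > 0.99)
```

If `which(lev > 0.99)` returns any rows, those observations are worth inspecting before trusting the Residuals vs Leverage panel.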