
My data consists of survey responses from car buyers. It has a weight column that I used in SPSS to get sample sizes; the weights reflect demographic factors and vehicle sales. I am now trying to build a logistic regression model for a car segment that includes a few vehicles. I want to use the weight column in the model, and I tried to do so with the "weights" argument of the glm function, but the results are horrific: the deviances are too high and McFadden's R-squared is too low. My dependent variable is binary, and the independent variables are on a 1 to 5 scale. The weight column is numeric, ranging from 32 to 197. Could that be why the results are poor? Do I need the values in the weight column to be below 1?

The format of the input file to R is:

WGT output I1 I2 I3 I4 I5
67   1      1  3  1  5  4

I1, I2, I3, etc. are the independent variables.

logr<-glm(output~1,data=data1,weights=WGT,family="binomial")

logrstep<-step(logr,direction = "both",scope = formula(data1))

logr1<-glm(output~ (formula from final iteration),weights = WGT,data=data1,family="binomial")

hl <- hoslem.test(data1$output,fitted(logr1),g=10)

I want a logistic regression model with better accuracy, and I would like to better understand how to use weights with logistic regression.


1 Answer


I would check out the survey package. This will allow you to specify weights for the survey design using the svydesign function. Additionally, you can use the svyglm function to perform your weighted logistic regression. See http://r-survey.r-forge.r-project.org/survey/

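Something like the following, assuming your data is in a data frame called df:

# Name the data argument explicitly; the second positional argument of svydesign is probs, not data
my_svy <- svydesign(ids = ~1, weights = ~WGT, data = df)

Then you can do the following:

my_fit <- svyglm(output ~ 1, my_svy, family = "binomial")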

For a full reprex, see the example below:

library(survey)

# Generate Some Random Weights
mtcars$wts <- rnorm(nrow(mtcars), 50, 5)

# Make vs a factor just for illustrative purposes
mtcars$vs <- as.factor(mtcars$vs)

# Build the Complete survey Object
svy_df <- svydesign(data = mtcars, ids = ~1, weights = ~wts)

# Fit the logistic regression
fit <- svyglm(vs ~ gear + disp, svy_df, family = "binomial")

# Store the summary object
(fit_sumz <- summary(fit))

# Look at the AIC if desired
AIC(fit)

# Pull out the deviance if desired
fit_sumz$deviance
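
On the question of whether the weights need to be below 1: a minimal sketch, using simulated data (the values are made up; only the column names WGT, I1, I2 and output mirror the question), suggests that the scale of the weights is not what drives the coefficients or the deviance-based pseudo R-squared. With family = binomial, glm treats each weight roughly as a number of replicated observations, so large weights inflate the deviance and shrink the standard errors, which is the reason a design-based fit such as svyglm is usually preferable for inference.

```r
# Simulated stand-in for the questioner's data; values are assumptions.
set.seed(1)
n <- 200
data1 <- data.frame(
  WGT = round(runif(n, 32, 197)),
  I1  = sample(1:5, n, replace = TRUE),
  I2  = sample(1:5, n, replace = TRUE)
)
data1$output <- rbinom(n, 1, plogis(-1 + 0.4 * data1$I1))

# Raw weights: glm() treats each row as if it were WGT observations.
fit_raw <- glm(output ~ I1 + I2, data = data1, weights = WGT,
               family = binomial)

# Weights rescaled to average 1. Non-integer weights make binomial glm
# warn about "non-integer #successes"; the warning is suppressed here.
data1$WGT1 <- data1$WGT / mean(data1$WGT)
fit_scaled <- suppressWarnings(
  glm(output ~ I1 + I2, data = data1, weights = WGT1, family = binomial)
)

# Point estimates agree up to convergence tolerance.
all.equal(coef(fit_raw), coef(fit_scaled), tolerance = 1e-5)

# The deviance scales with the weights: the ratio is about mean(WGT).
deviance(fit_raw) / deviance(fit_scaled)

# Deviance-based (McFadden-style) pseudo R-squared is the same in both fits.
1 - fit_raw$deviance / fit_raw$null.deviance
1 - fit_scaled$deviance / fit_scaled$null.deviance

# Standard errors differ, because glm() reads the weights as counts.
summary(fit_raw)$coefficients[, "Std. Error"]
summary(fit_scaled)$coefficients[, "Std. Error"]
```

So rescaling the weights changes the reported deviance and standard errors but not the fitted coefficients; a low pseudo R-squared more likely reflects the predictors than the weight scale.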

As for the stepwise regression, it typically isn't a great methodology from a statistical point of view. It tends to produce an R2 that is biased upward and causes other problems for inference (see https://www.stata.com/support/faqs/statistics/stepwise-regression-problems/).
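
To illustrate the inference problem, here is a rough sketch on simulated data in which the outcome is unrelated to every candidate predictor (all names and values are made up for illustration). Stepwise selection by AIC will often still retain a few of the noise variables, and their p-values in the final model look better than they should because the selection step is ignored. The sketch also builds the scope explicitly from the predictor names, so that neither the response nor a weight column can end up among the candidates.

```r
# Pure-noise example: y is independent of V1..V20 by construction.
set.seed(42)
n <- 100
noise <- as.data.frame(matrix(rnorm(n * 20), nrow = n, ncol = 20))
noise$y <- rbinom(n, 1, 0.5)

# Start from the intercept-only model.
null_fit <- glm(y ~ 1, data = noise, family = binomial)

# Upper scope: a one-sided formula with every candidate predictor,
# excluding the response itself.
upper <- reformulate(setdiff(names(noise), "y"))

step_fit <- step(null_fit,
                 scope = list(lower = ~1, upper = upper),
                 direction = "both", trace = 0)

# Terms that survived selection, with their (over-optimistic) p-values.
summary(step_fit)$coefficients
```

Any predictor that shows up in the final table here is a false positive by construction, which is the kind of issue the linked FAQ describes.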