
I have insurance data with the columns age, sex, BMI, children, smoker, region and charges. Sex, smoker and region are factors with these levels: sex (male, female), smoker (yes, no), region (northeast, southeast, southwest, northwest).

m2 <- lm(charges ~ age + sex + bmi + children + smoker + region, data = data)

After fitting a linear regression model to the data, I need to predict charges for: male, age=40, bmi=30, smoker=yes, region=northwest. I tried converting the categorical variables to factors after reading the data:

data$sex <- as.factor(data$sex)
data$region <- as.factor(data$region)

Using the predict function:

predict(m2, list(age=40, sex=factor(male), bmi=30, children=2, smoker=factor(yes), 
                 region=factor(northwest)), int="p", level=0.98)

I only get errors. Please help.

1) Do the factor levels in the new data match the levels in the old data? In your sample, e.g., the smoker variable will only have one level (yes). 2) Try passing the new data as a data.frame, not a list. – arvi1000
Also, when you pass new data, strings need to be quoted. smoker=factor(yes) will look for an object called yes. Perhaps you mean something like smoker = factor('yes', levels = c('yes', 'no')). – arvi1000

1 Answer


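Instead of wrapping each value in factor(), pass the factor levels as quoted strings in predict(). Unquoted names such as male or yes are looked up as R objects, which is what triggers the errors; quoted strings are matched against the factor levels stored in the fitted model.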

predict(m2, list(age=40, sex="male", bmi=30, children=2, smoker="yes", 
                 region="northwest"), int="p", level=0.98)
#         fit       lwr      upr
# 1 -1.978994 -9.368242 5.410254
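For comparison, here is a minimal self-contained sketch of the data.frame approach suggested in the comments, with the levels set explicitly. The data frame `toy`, its coefficients and the object names `fit`, `new_obs` and `new_chr` are made up for illustration and are not the original insurance data; the sketch only checks that quoted strings and explicitly levelled factors lead to the same prediction.

```r
# Hypothetical toy data (NOT the original insurance data), built
# deterministically so that every factor level is present.
set.seed(1)
n <- 40
toy <- data.frame(
  age      = 18 + (seq_len(n) * 7) %% 47,
  sex      = factor(rep(c("female", "male"), length.out = n)),
  bmi      = round(rnorm(n, mean = 28, sd = 4), 1),
  children = rep(0:4, length.out = n),
  smoker   = factor(rep(c("no", "no", "yes"), length.out = n)),
  region   = factor(rep(c("northeast", "northwest",
                          "southeast", "southwest"), each = n / 4))
)
# Made-up relationship plus noise, just to have a response to fit.
toy$charges <- 2000 + 250 * toy$age + 300 * toy$bmi +
  20000 * (toy$smoker == "yes") + rnorm(n, sd = 3000)

fit <- lm(charges ~ age + sex + bmi + children + smoker + region, data = toy)

# New observation as a data.frame, with factor levels copied from the
# training data (the explicit form suggested in the comments).
new_obs <- data.frame(
  age      = 40,
  sex      = factor("male", levels = levels(toy$sex)),
  bmi      = 30,
  children = 2,
  smoker   = factor("yes", levels = levels(toy$smoker)),
  region   = factor("northwest", levels = levels(toy$region))
)
pred_factor <- predict(fit, newdata = new_obs,
                       interval = "prediction", level = 0.98)

# Same observation with plain quoted strings.
new_chr <- data.frame(age = 40, sex = "male", bmi = 30, children = 2,
                      smoker = "yes", region = "northwest")
pred_chr <- predict(fit, newdata = new_chr,
                    interval = "prediction", level = 0.98)

print(pred_factor)
print(all.equal(pred_factor, pred_chr))
```

Setting the levels explicitly is more verbose, but it makes a misspelled level visible when the new data is built rather than inside predict().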

Data

dat <- structure(list(charges = c(1.37095844714667, -0.564698171396089, 
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484, 
1.51152199743894, -0.0946590384130976, 2.01842371387704, -0.062714099052421
), age = c(20L, 58L, 44L, 53L, 22L, 51L, 20L, 75L, 59L, 41L), 
    sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", 
    "male"), class = "factor"), bmi = c(25.3024309248682, 24.6058854935878, 
    25.7881406228236, 25.6707038267505, 24.0508191903124, 25.036135738485, 
    27.115755613237, 25.1674409043556, 24.1201634714689, 25.9469131749433
    ), children = c(4L, 1L, 5L, 1L, 1L, 4L, 0L, 0L, 3L, 4L), 
    smoker = c("no", "yes", "yes", "no", "no", "yes", "yes", 
    "yes", "yes", "no"), region = structure(c(1L, 2L, 2L, 3L, 
    1L, 2L, 3L, 3L, 3L, 2L), .Label = c("northeast", "northwest", 
    "southeast"), class = "factor")), row.names = c(NA, -10L), class = "data.frame")