
I'm trying to fit a binary logistic regression model to estimate the odds of a disease. Here is the original disease outbreak data (there are 196 observations; I have omitted some entries here):

Column 1: ID (person)

Column 2: Age of the person

Column 3: SES (Socio-economic status of the person) (1=upper class, 2=middle class, 3=lower class)

Column 4: Sect (categorical: two different regions)

Column 5: Y (1=disease, 0=no disease)

Column 6: Savings (1=person has savings, 0=no savings)

1     33      1      1      0      1
2     35      1      1      0      1
3      6      1      1      0      0
...
194     31      3      1      0      0
195     85      3      1      0      1
196     24      2      1      0      0

I tried the following command to fit the binary regression model:

lm1=glm(Y~factor(Age)+factor(SES)+factor(Sect)+factor(Savings),family=binomial("logit"))
summary(lm1)

Not surprisingly, the result is a mess, because there are too many age terms (ages range from 2 to 85). Could someone help me modify the command so that I get age estimates in increments of, for example, 5 or 10 years?

Also, the above model doesn't include any interaction terms. If I wanted to consider, say, an SES*Age interaction and still see the age estimate for every 5 or 10 years, how should I write the command?

Please: Do not use factor on Age. If you want to model it as a spline function of age, that would be fine. If you model it as a factor, I predict you will get a mess. - IRTFM
Why not treat age as a continuous predictor (by just leaving out the factor() around Age)? - Roy Pardee
So I guess I just need to remove the factor on Age, right? Something like: lm1=glm(Y~Age+factor(SES)+factor(Sect)+factor(Savings),family=binomial("logit")) - leo
Also, I'm wondering what I should do if I would like to test the age effect in 5-year increments? - leo
You need to say what you plan on testing. You would greatly reduce your power to find either a linear or a quadratic effect of age if you cut it into five-year categories. I'm getting the idea that you need some statistical consultation more than R code help. The CrossValidated.com website has some very smart people contributing there, and such questions are more on-topic there than they are here. - IRTFM
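
The suggestion in the comments above (keep Age continuous) still allows an effect to be reported per 5 or 10 years, because the odds ratio for a k-year increase is exp(k * beta_Age). The sketch below illustrates this on simulated data; the data frame dat and all of its values are made up for illustration, and only the column names follow the question.

```r
# Simulated stand-in for the outbreak data (values are NOT the real data).
set.seed(2)
n <- 196
dat <- data.frame(
  Age     = sample(2:85, n, replace = TRUE),
  SES     = sample(1:3, n, replace = TRUE),
  Sect    = sample(1:2, n, replace = TRUE),
  Savings = sample(0:1, n, replace = TRUE)
)
dat$Y <- rbinom(n, 1, plogis(-2 + 0.03 * dat$Age))

# Age enters as a single continuous (linear) term on the log-odds scale.
fit <- glm(Y ~ Age + factor(SES) + factor(Sect) + factor(Savings),
           family = binomial("logit"), data = dat)

# Odds ratios for a 5-year and a 10-year increase in age.
b_age <- coef(fit)["Age"]
or_5  <- exp(5 * b_age)
or_10 <- exp(10 * b_age)

# Wald 95% confidence interval for the 10-year odds ratio.
ci_10 <- exp(10 * confint.default(fit)["Age", ])
```

This assumes the age effect is linear on the log-odds scale; a spline term, as mentioned in the comments, would relax that assumption.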

1 Answer


Use cut to turn a numeric variable into a factor; see ?cut for more information.

The argument you are probably interested in is breaks=:

If you pass a single number, cut divides the whole range into that many equal-width intervals, as in the example below. You can also pass a vector of numbers that gives the exact break points.

data(mtcars)
mydata <- mtcars
# Cut the whole numeric range of hp into 10 equal-width intervals
mydata$myhp <- cut(mydata$hp, 10)
# Here is what the data look like (first 8 rows):
> head(mydata, 8)
                     mpg cyl  disp  hp  drat  wt  qsec   vs am gear carb      myhp
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4   (108,137]
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4   (108,137]
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  (80.1,108]
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1   (108,137]
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   (165,194]
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  (80.1,108]
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4   (222,250]
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 (51.7,80.1]

> str(mydata)
'data.frame':   32 obs. of  12 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 ....
 $ myhp: Factor w/ 10 levels "(51.7,80.1]",..: 3 3 2 3 5 2 7 1 2 3 ...
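
Applied to the question, passing a vector to breaks= gives fixed 10-year age groups that can go straight into glm. This is only a sketch under stated assumptions: the data frame dat is simulated (the real data are not available here), the names AgeGrp, fit and fit_int are made up for illustration, and only the column names follow the question.

```r
# Simulated stand-in for the outbreak data (values are NOT the real data).
set.seed(1)
n <- 196
dat <- data.frame(
  Age     = sample(2:85, n, replace = TRUE),
  SES     = sample(1:3, n, replace = TRUE),
  Sect    = sample(1:2, n, replace = TRUE),
  Savings = sample(0:1, n, replace = TRUE)
)
dat$Y <- rbinom(n, 1, plogis(-2 + 0.03 * dat$Age))

# Explicit break points give 10-year bins: [0,10), [10,20), ..., [80,90).
# right = FALSE closes each interval on the left; use by = 5 for 5-year bins.
dat$AgeGrp <- cut(dat$Age, breaks = seq(0, 90, by = 10), right = FALSE)
table(dat$AgeGrp)

# Main-effects model with one coefficient per age group
# (the first group is the reference level).
fit <- glm(Y ~ AgeGrp + factor(SES) + factor(Sect) + factor(Savings),
           family = binomial("logit"), data = dat)
summary(fit)

# SES-by-age-group interaction: A * B expands to A + B + A:B.
fit_int <- glm(Y ~ AgeGrp * factor(SES) + factor(Sect) + factor(Savings),
               family = binomial("logit"), data = dat)
```

With 196 observations, many age-by-SES cells will be small or empty, so the interaction model may produce warnings or unstable estimates; as the comments point out, binning age also costs power compared with a continuous term.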