3
votes

I tried implementing logistic regression using glm in R on the Wisconsin breast cancer dataset. I analysed the dataset and found that wbc$V7 contained missing values. I imputed the missing values using the Hmisc package and then fitted the logistic regression with glm:

wbc=read.csv(file="https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",header = FALSE)
wbc[wbc=='?']=NA  #replacing '?' with NA
a=sapply(wbc,function(x) sum(is.na(x))) #analyse the number of NA in each column
print(a)
library(Hmisc)
wbc$V7=impute(wbc$V7,mode)  #impute missing values with mode in V7
wbc$V11[wbc$V11==2]=0; #V11 has either '2' or '4' as entries, replacing '2' with '0' and '4' with '1' 
wbc$V11[wbc$V11==4]=1;
model <- glm(V11~V2+V3+V4+V5+V6+V7+V8+V9+V10,family=binomial(),data=wbc)

OUTPUT:


Call:  glm(formula = V11 ~ V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10, 
    family = binomial(), data = wbc)

Coefficients:
(Intercept)           V2           V3           V4           V5           V6          V71         V710
     8.6625       0.4511      -0.1013       0.4842       0.2206       0.1684     -18.7466     -14.8168
        V72          V73          V74          V75          V76          V77          V78          V79
   -17.6684     -16.0272     -15.3552     -16.3765       0.7704     -16.2944     -16.6171           NA
         V8           V9          V10
     0.5052       0.1144       0.4550

Degrees of Freedom: 698 Total (i.e. Null);  681 Residual
Null Deviance:      900.5 
Residual Deviance: 102.9    AIC: 138.9

Why does the output contain coefficients for V71, V710, V72, V73, V74, V75, V76, V77, V78 and V79 when the wbc data frame only has columns V1 to V11?


2 Answers

3
votes

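If V7 is a factor, glm dummy-codes it automatically, so you get a separate coefficient for each level of the factor (apart from the reference level) instead of a single slope. The names are the variable name followed by the level: V71 is V7 at level 1, V710 is V7 at level 10, and so on.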
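To see the dummy coding in isolation, here is a minimal sketch on made-up data. The `toy` data frame and its columns `x` and `y` are hypothetical stand-ins, not the breast cancer data. `model.matrix()` shows the design matrix that glm would build from the same formula.

```r
# Toy data (hypothetical): digits stored as a factor, like V7 in the question
set.seed(1)
toy <- data.frame(
  y = rbinom(60, 1, 0.5),
  x = factor(sample(c("1", "2", "10"), 60, replace = TRUE))
)

class(toy$x)    # "factor"
levels(toy$x)   # levels are sorted as text, so "10" comes before "2"

# One 0/1 column per level except the first (reference) level;
# column names are the variable name pasted onto the level
design_cols <- colnames(model.matrix(y ~ x, data = toy))
design_cols     # "(Intercept)" "x10" "x2"
```

Running `class(wbc$V7)` on the real data is a quick way to confirm whether the same thing is happening there.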

0
votes

You should convert your variable V7 to numeric. It is currently a factor, so glm estimates a separate coefficient for each value that occurs in the column. Once V7 is numeric, the model has a single coefficient for it.
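A minimal sketch of the conversion, using a small made-up vector `v7` as a stand-in for wbc$V7 (the values are hypothetical). Note that calling `as.numeric()` directly on a factor returns the internal level codes, not the original numbers, so the conversion goes through `as.character()` first. The second part shows one possible alternative, assuming you are free to change the import step: telling `read.csv` that "?" means missing, so the column is read as numbers to begin with.

```r
# Hypothetical stand-in for wbc$V7: digits read in as a factor, "?" for missing
v7 <- factor(c("1", "10", "?", "3", "1"))

# Pitfall: this gives the level codes, not the values 1, 10, 3, 1
as.numeric(v7)

# Convert via character instead; "?" cannot be parsed and becomes NA
v7_num <- suppressWarnings(as.numeric(as.character(v7)))
v7_num   # 1 10 NA 3 1

# Alternative: declare "?" as the missing-value marker when reading the file.
# Inline text is used here so the sketch runs without a download.
csv <- "5,1,2\n3,?,4\n8,10,4"
d <- read.csv(text = csv, header = FALSE, na.strings = "?")
class(d$V2)   # "integer"
```

With a numeric V7 the missing values still need to be handled, for example by imputing them, before or while fitting the model.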

Hope this helps