I will preface this by saying that I am fairly new to R. I have been stuck on this issue for a few weeks and seem to be getting nowhere. I am looking to perform a multivariate logistic regression to determine whether water main material and soil type play a role in the location of water main breaks in my study area.
I have 417 positive water main break locations and created an additional 400 false locations to use in my analysis. I understand that the water main material and the soil type are both categorical variables and should be re-coded into dummy variables before fitting the GLM. That is where I am having trouble. I have not worked with dummy variables until now and can't seem to understand how they are created in R. Below is a breakdown of the data I have and the GLM call that I am currently using.
INDICATOR: 0 or 1 (Indicates if the location XY was or was not a water main break location)
MAIN MATERIAL: Material of the water main at the XY location (categorical value - about 8 unique values)
SOIL CLASSIFICATION: Type of soil at location of break (categorical value - around 20 values)
(logAnalysis <- glm(Indicator ~ main_material + soil_classification, data = Breaks, family = binomial(link = "logit")))
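For reference, here is a minimal sketch of how R handles the dummy coding when the categorical columns are factors. The toy data frame and its level names (CI, DI, PVC, clay, loam, sand) are made up purely for illustration and are not my real data; only the column names match the model above. As far as I understand it, glm() builds the 0/1 dummy columns itself from factors, and model.matrix() lets you look at them.

```r
# Hypothetical toy data standing in for the real Breaks data frame.
# The level names are invented for illustration only.
set.seed(1)
Breaks <- data.frame(
  Indicator = rbinom(60, 1, 0.5),
  main_material = rep(c("CI", "DI", "PVC"), times = 20),
  soil_classification = rep(c("clay", "loam", "sand"), each = 20)
)

# Mark the categorical columns as factors so R treats them as categories.
Breaks$main_material <- factor(Breaks$main_material)
Breaks$soil_classification <- factor(Breaks$soil_classification)

# model.matrix() exposes the dummy (0/1) columns that glm() builds internally.
# The first level of each factor is the reference and gets no column,
# so the columns look like main_materialDI, main_materialPVC, and so on.
X <- model.matrix(~ main_material + soil_classification, data = Breaks)
colnames(X)

# Same model as above, fitted on the factor columns.
logAnalysis <- glm(Indicator ~ main_material + soil_classification,
                   data = Breaks, family = binomial(link = "logit"))
summary(logAnalysis)
```

Each coefficient is then read relative to the reference level of its factor; relevel() can change which level is the reference.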
I have only used Stack Exchange one other time so if more information is needed, please let me know.
After trying Aurther's suggestion of using factor(), this is the output that I get: R Output
I am a bit confused as to why many of the soil classifications and the PE main material have such high Std. Errors.
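One thing that may be worth checking is whether some categories have very few rows, or only ever occur with one outcome. In logistic regression, a level whose observations are all 0 or all 1 can produce very large standard errors. I have not confirmed that this is the cause here. The sketch below uses made-up data in which "PE" only appears with Indicator equal to 1, just to show the cross-tabulation check.

```r
# Hypothetical toy data: "PE" only ever occurs with Indicator == 1.
Breaks <- data.frame(
  Indicator = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 1),
  main_material = factor(c("CI", "CI", "CI", "CI",
                           "DI", "DI", "DI", "DI",
                           "PE", "PE"))
)

# Cross-tabulate each category against the outcome.
counts <- table(Breaks$main_material, Breaks$Indicator)
counts

# Flag levels where one of the two outcomes never occurs.
sparse_levels <- rownames(counts)[apply(counts == 0, 1, any)]
sparse_levels
```

Running the same table() call on the real soil_classification and main_material columns would show whether any level has a zero or near-zero count for one of the outcomes.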