The following code runs a very simple lm()
and tries to summarise the results (factor level, coefficient) in a small data frame:
df <- data.frame(star_sign = c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"),
                 y = c(1.1, 1.2, 1.4, 1.3, 1.8, 1.6, 1.4, 1.3, 1.2, 1.1, 1.5, 1.3),
                 stringsAsFactors = TRUE) # needed in R >= 4.0.0, where strings are no longer converted to factors by default
levels(df$star_sign) # alphabetical order
# fit a simple linear model
my_lm <- lm(y ~ 1 + star_sign, data = df)
summary(my_lm) # intercept is based on the first level of the factor, Aquarius
# I want the levels to work properly 1..12 = Aries, Taurus...Pisces so I'm going to redefine the factor levels
df$my_levels <- c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")
df$star_sign <- factor(df$star_sign, levels = df$my_levels)
my_lm <- lm(y ~ 1 + star_sign, data = df)
summary(my_lm) # intercept is based on first level of factor which is now Aries
# but for my model fit I want the reference level to be Virgo (because reasons)
df$star_sign_2 <- relevel(df$star_sign, ref = "Virgo")
my_lm <- lm(y ~ 1 + star_sign_2, data = df)
summary(my_lm)
df_results <- data.frame(factor_level = names(my_lm$coefficients), coeff = my_lm$coefficients)
# tidy up
rownames(df_results) <- 1:12
df_results$factor_level <- as.factor(gsub("star_sign_2", "", df_results$factor_level))
# change label of "(Intercept)" to "Virgo"
df_results$factor_level <- plyr::revalue(df_results$factor_level, c("(Intercept)" = "Virgo"))
levels(df_results$factor_level) # the levels are alphabetical + Virgo at the front (not same as display order from lm)
The factor levels aren't in the right order. I want to sort df_results so that the star signs appear in their original order (Aries, Taurus, ..., Pisces), as captured in the df$my_levels column.
I don't have a good understanding of how to manipulate factors and their labels/levels, so I'm struggling to work out how to do this.
Also this is quite a long-winded and clumsy bit of code. Are there more concise ways to do this sort of thing?
Thank you.
(PS: mathematically the model is obviously trivial, but that's OK for these purposes. I'm just interested in how to manipulate the outputs.)
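
For illustration, here is a minimal sketch of the kind of result described above. It is one possible approach, not necessarily the best one. It uses base R only and keeps the desired order in a standalone vector rather than in a data frame column. The names sign_order and lvl_names are made up for this example. It also assumes that every coefficient name is the variable name star_sign_2 followed directly by the level, which is how lm() labels the default treatment contrasts.

```r
# Sketch only: `sign_order` and `lvl_names` are illustrative names, not from the original code.
sign_order <- c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
                "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")

df <- data.frame(
  star_sign = factor(sign_order, levels = sign_order),  # levels in calendar order
  y = c(1.1, 1.2, 1.4, 1.3, 1.8, 1.6, 1.4, 1.3, 1.2, 1.1, 1.5, 1.3)
)

# Refit with Virgo as the reference level
df$star_sign_2 <- relevel(df$star_sign, ref = "Virgo")
my_lm <- lm(y ~ 1 + star_sign_2, data = df)

# Strip the variable-name prefix and relabel the intercept as the reference level
cf <- coef(my_lm)
lvl_names <- sub("^star_sign_2", "", names(cf))
lvl_names[lvl_names == "(Intercept)"] <- "Virgo"

# Build the results with the factor levels set explicitly to the desired order
df_results <- data.frame(
  factor_level = factor(lvl_names, levels = sign_order),
  coeff = unname(cf)
)

# order() on a factor sorts by level order, not alphabetically
df_results <- df_results[order(df_results$factor_level), ]
rownames(df_results) <- NULL
df_results
```

The key point is that factor(x, levels = ...) fixes the level order at creation time, and ordering the rows by that factor then follows the levels rather than the alphabet.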