My question is similar to this question. I'm trying to tidyr::gather
multiple columns. However, the solution provided in the link is less than ideal because the attributes are generally not identical across all columns and so they are dropped.
Note, I know how to do this with base R, but I'm trying to learn how to do the equivalent operation with tidyr and/or dplyr.
Below I've simulated some data (poorly, but quickly) that illustrate the situation I often find myself in (although I generally have far more columns that follow this same sort of a pattern). I've provided the base solution with stats::reshape
so you can see my desired output.
Any help would be much appreciated.
set.seed(123)
male_g6 <- rbinom(100, 1, .5)
ell_g6 <- rbinom(100, 1, .1)
sped_g6 <- rbinom(100, 1, .15)
pullouts_g6 <- rbinom(100, 5, .1)
disability_g6 <- replicate(100,
sample(
c("asd", "cd", "ed", "hi", "id", "ohi", "ld", "none"),
1,
prob = c(rep(0.01, 6), 0.05, 0.89)
)
)
score_g6 <- rnorm(100, 200, 10)
score_g7 <- score_g6 + 5 + rnorm(100, 0, 2)
score_g8 <- score_g7 + 5 + rnorm(100, 0, 2)
d <- data.frame(
SID = 1:100,
male_g6 = male_g6,
male_g7 = male_g6,
male_g8 = male_g6,
ell_g6 = ell_g6,
ell_g7 = ell_g6,
ell_g8 = ell_g6,
sped_g6 = sped_g6,
sped_g7 = sped_g6,
sped_g8 = sped_g6,
pullouts_g6 = pullouts_g6,
pullouts_g7 = pullouts_g6,
pullouts_g8 = pullouts_g6,
disability_g6 = disability_g6,
disability_g7 = disability_g6,
disability_g8 = disability_g6,
score_g6 = score_g6,
score_g7 = score_g7,
score_g8 = score_g8
)
With base reshape
ld <- stats::reshape(d,
idvar = "SID",
varying = list(
c("male_g6", "male_g7", "male_g8"),
c("ell_g6", "ell_g7", "ell_g8"),
c("sped_g6", "sped_g7", "sped_g8"),
c("pullouts_g6", "pullouts_g7", "pullouts_g8"),
c("disability_g6", "disability_g7", "disability_g8"),
c("score_g6", "score_g7", "score_g8")
),
v.names = c("male", "ell", "sped", "pullouts", "disability", "score"),
times = 6:8,
timevar = "Grade",
direction = "long"
)
ld <- ld[order(ld$SID), ]
reshape(d, idvar="SID", direction="long", varying=2:19, sep="_g", timevar="Grade")
-reshape
has some magical guessing of variable names and times if you specify an appropriatesep=
– thelatemailsep
in your suggestion. – Daniel Anderson