I have been given a dataset where each participant completes seven trials in one of 16 possible conditions. The 16 conditions arise from a 2x2x2x2 design (that is, there are four manipulated variables, each with two levels). Let’s say Var1 has levels ‘Pot’ and ‘Pan’, Var2 has levels ‘Hi’ and ‘Low’, Var3 has levels ‘Up’ and ‘Down’, and Var4 has levels ‘One’ and ‘Two’.
The dataset has one column for each observation in each condition, so each row has 112 (16*7) observation columns (along with some columns of demographic information), 105 (15*7) of which are empty for any given participant. The conditions are encoded in the column labels, so the columns range from ‘PotHiUpOne1’ to ‘PanLowDownTwo7’.
The data thus look like this:
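library(dplyr)
# Levels of the four manipulated variables, plus the seven trials
Var1 <- c('Pot', 'Pan')
Var2 <- c('Hi', 'Low')
Var3 <- c('Up', 'Down')
Var4 <- c('One', 'Two')
Obs <- seq(1, 7, 1)
# Build the 112 column labels, e.g. 'PotHiUpOne1'
# (expand.grid names its unnamed columns Var1, ..., Var5)
df <- expand.grid(Var1, Var2, Var3, Var4, Obs)
df <- df %>%
  arrange(Var1, Var2, Var3, Var4)
x <- apply(df, 1, paste, collapse = "")
# One row per participant, with id and age as stand-in demographic columns
id <- seq(1, 16, 1)
age <- rep(20, 16)
df <- as.data.frame(cbind(id, age))
# Add the 112 observation columns, all empty to begin with
for (i in seq_along(x)) {
  df[, ncol(df) + 1] <- NA
  names(df)[ncol(df)] <- x[i]
}
# Fill in the seven observations for each participant's own condition
j <- seq(3, ncol(df), 7)
for (i in seq_len(nrow(df))) {
  df[i, j[i]:(j[i] + 6)] <- 10
}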
I want to tidy this data frame so that each row has four columns (one for each variable) specifying the condition and seven columns holding the observations.
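To make the target concrete, here is a rough sketch of what one row of the tidied data would look like for the toy example. The names Obs1 to Obs7 are placeholders I made up for illustration; any consistent naming would do.

```r
# Hypothetical target layout for participant 1 (condition Pot/Hi/Up/One).
# Obs1..Obs7 are placeholder column names, not names from the real dataset.
target <- data.frame(
  id = 1, age = 20,
  Var1 = "Pot", Var2 = "Hi", Var3 = "Up", Var4 = "One",
  Obs1 = 10, Obs2 = 10, Obs3 = 10, Obs4 = 10,
  Obs5 = 10, Obs6 = 10, Obs7 = 10
)
str(target)
```

The full result would have 16 such rows in the toy example, one per participant.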
My solution is to filter the data using dplyr like so:
Df1 <- df %>%
  filter(!is.na(PotHiUpOne1)) %>%
  mutate(Var1 = 'Pot', Var2 = 'Hi', Var3 = 'Up', Var4 = 'One')
then remove the NA columns like so:
Df1 <- Filter(function(x)!all(is.na(x)), Df1)
I do this 16 times (once for each condition), rename the seven remaining observation columns in each result so that they match, and finally bind the 16 data frames back together.
I am wondering if anyone can suggest a more efficient approach, preferably using dplyr.
Edit: I should add that when I say "efficient" I really mean a more elegant approach code-wise rather than something that will run fast (the dataset is not large) - i.e., something that doesn't involve writing out more or less the same block of code 16 times.
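For what it is worth, one direction that might avoid the repetition is reshaping with tidyr instead of filtering per condition. The following is only a sketch under my own assumptions: that tidyr (version 1.0.0 or later) is acceptable alongside dplyr, that every column label follows the Var1Var2Var3Var4Obs pattern exactly, and that Obs1 to Obs7 are acceptable output names. It rebuilds the toy data in a compact way so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Rebuild the toy data (compact equivalent of the example above).
# In expand.grid the first argument varies fastest, so Obs cycles within each condition.
conds <- expand.grid(Obs = 1:7, Var4 = c("One", "Two"), Var3 = c("Up", "Down"),
                     Var2 = c("Hi", "Low"), Var1 = c("Pot", "Pan"))
cols <- with(conds, paste0(Var1, Var2, Var3, Var4, Obs))
df <- data.frame(id = 1:16, age = 20)
df[cols] <- NA_real_
for (i in 1:16) {
  df[i, cols[(7 * (i - 1) + 1):(7 * i)]] <- 10
}

tidy <- df %>%
  # Go long: split each column label into its four variables plus the trial number,
  # dropping the 105 empty cells per participant
  pivot_longer(
    cols = -c(id, age),
    names_to = c("Var1", "Var2", "Var3", "Var4", "Obs"),
    names_pattern = "^(Pot|Pan)(Hi|Low)(Up|Down)(One|Two)(\\d)$",
    values_drop_na = TRUE
  ) %>%
  # Go wide again: one column per trial (Obs1..Obs7 are placeholder names)
  pivot_wider(names_from = Obs, values_from = value, names_prefix = "Obs")
```

With real data the regular expression would need to match the actual level names, and any extra demographic columns would need to be excluded in `cols`.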