I have seen this dplyr mutate/replace several columns on a subset of rows and this Conditional replacement of values in a data.frame
I want to conditionally replace certain values in my dataframe with other values in my dataframe, based on the nested structure of my data.
I have a dataframe which, in some cases, has multiple rows representing a single study. In these cases, I have data representing overall ratings of the study (where effect
= "All.outcomes.") followed by rows representing ratings of specific effects within the study (effect
= "Study.1..Effect 1", for example). The rows representing specific effects nested within a study always have missing data on a number of variables. In these cases, I would like to replace the NA values in the specific effect rows with data from the "All.outcomes" row for each StudyID.
Missing data is always on given variables: q1
, q2B
, q2C
, q2A
, and q4
. In the cases where there is missing data on these variables, I would like to fill in the data from the line where effects
= "All.outcomes."
Here's what it looks like now:
studyID effect q1 q11 q2B q2C q2a q4 q9
a s100 All.outcomes low NA low high low low NA
b s100 Study.1..Effect.1 NA high NA NA NA NA low
c s100 Study.1..Effect.2 NA high NA NA NA NA low
d s101 All.outcomes low low low high low low low
e s102 All.outcomes low low low high low high low
f s104 All.outcomes low NA low high low low NA
g s104 Study.1..Effect.1 NA low NA NA NA NA low
h s104 Study.2..Effect.1 NA high NA NA NA NA high
i s104 Study.3..Effect.1 NA low NA NA NA NA low
Here's how I want the dataframe to look:
studyID effect q1 q11 q2B q2C q2a q4 q9
a s100 All.outcomes low NA low high low low NA
b s100 Study.1..Effect.1 low high low high low low low
c s100 Study.1..Effect.2 low high low high low low low
f s104 All.outcomes low NA low high low low NA
g s104 Study.1..Effect.1 low low low high low low low
h s104 Study.2..Effect.1 low high low high low low high
i s104 Study.3..Effect.1 low low low high low low low
You can see that, for example, for s100 effect
= "Study.1..Effect.1" and "Study.1..Effect.2", I filled in q1
, q2B
, q2C
, q2A
, and q4
with values from effect
= `All.outcomes."
Note that here are studyIDs in the database where this pattern does not occur (i.e., they only have effect
= "All.outcomes") and that in cases where a given study does have multiple effects nested within it, the number and names of these nested effects vary (as can be seen comparing studyID
= s100 and s104 in the sample data).
Is there an efficient solution to this? Thank you in advance for your help.
#sample data
a <- c("s100", "All.outcomes", "low", "NA", "low", "high", "low", "low", "NA")
b <- c("s100", "Study.1..Effect.1", "NA", "high", "NA", "NA", "NA", "NA", "low")
c <- c("s100", "Study.1..Effect.2", "NA", "high", "NA", "NA", "NA", "NA", "low")
d <- c("s101", "All.outcomes", "low", "low", "low", "high", "low", "low", "low")
e <- c("s102", "All.outcomes", "low", "low", "low", "high", "low", "high", "low")
f <- c("s104", "All.outcomes", "low", "NA", "low", "high", "low", "low", "NA")
g <- c("s104", "Study.1..Effect.1", "NA", "low", "NA", "NA", "NA", "NA", "low")
h <- c("s104", "Study.2..Effect.1", "NA", "high", "NA", "NA", "NA", "NA", "high")
i <- c("s104", "Study.3..Effect.1", "NA", "low", "NA", "NA", "NA", "NA", "low")
df <- as.data.frame(rbind(a,b,c,d,e,f,g,h,i))
colnames(df)=c("studyID", "effect", "q1", "q11", "q2B", "q2C", "q2a", "q4", "q9")