I have two large data frames, which I have associated using a "1" to indicate a match between them. For example:
Name<-c("N1", "N2", "N3")
A1<-c(1, NA, 1)
A2<-c(NA, 1, NA)
A3<-c(NA, 1, 1)
df1<-data.frame(Name, A1, A2, A3)
> df1
Name A1 A2 A3
1 N1 1 NA NA
2 N2 NA 1 1
3 N3 1 NA 1
Var<-c("A1", "A2", "A3")
Detail<-c("Red", "Red, Blue", "Green, Red")
df2<-data.frame(Var, Detail)
> df2
Var Detail
1 A1 Red
2 A2 Red, Blue
3 A3 Green, Red
I am trying to aggregate all details associated with each variable from df2 with each of the names marked with a "1" for that particular variable from df1. In the end, I hope to have all unique entries of "Detail" as individual columns with 1 if a name is associated with them. df3 would be an example of the desired result.
Red<-c(1,1,1)
Blue<-c(0,1,0)
Green<-c(0,1,1)
df3<-data.frame(Name, Red, Blue, Green)
> df3
Name Red Blue Green
1 N1 1 0 0
2 N2 1 1 1
3 N3 1 0 1
After going through similar questions and trying for a LONG time, this is what I was trying to do:
I replaced cells with "1" in df1 with column name so that they are able to be matched with the corresponding rows in df2
ones <- which(df1=="1", arr.ind=T)
df1[ones]<-colnames(df1)[ones[,2]]
df1;
I know it is preferable to avoid loops, but i can't think of another way. I am trying to cycle through the columns and rows to compare them with the Var names from df2 and copy the corresponding details into that cell back into df1 using "which". But I keep getting error messages...
for(i in 2:ncol(df1)){
for(j in 1:nrow(df1)){
+ tmp <- df2[which(df2[,1]==df1[j,i]),]
+ df1[j,i] <- tmp[which(tmp[j,]==df1[j,i]),4]
}
}
If I try to replace it individually I don't get the desired result either
tmp <- df2[which(df2[,1]==df1[1,2]),]
df1[1,2] <- tmp[which(tmp[1,1]==df1[1,2]),2]
And this is where I am stuck. After this step I would aggregate all details into a single comma separated string in a single cell, make a list of all unique Details and turned them each into a column, then made another loop to add "1"s to each instance where a part of the string with the the details matches the new column headers.
There is probably a simpler way that avoids half (or all of) these steps, apologies if this is a basic question. Any advice is greatly appreciated!