0
votes

I have three data frames. First data frame is as follows

 ID      Col1    Col2    Col3
 XX:11   1       0       0
 YY:22   1       0       0
 ZZ:33   0       1       0
 AA:44   0       0       1
 BB:55   0       0       1

Second data frame is as follows

 ID      Col1
 YY:22   0
 AA:44   1
 XX:11   1
 DD:99   0

Third data frame is

 ColA    ColB    ColC

ColA,ColB and ColC correspond to Col1,Col2 and Col3 respectively from data frame 1.

I am trying to find ID`s from second data frame which exist in the first and if they exist, then increment the corresponding Col* in the third data frame.

For example taking ID YY:22 from the second data frame. This exists in the first data frame , then look to see which column(from Col1, Col2 and Col3) has a 1 and then accordingly increment the corresponding column in third data frame. IN this case YY:22 has a 1 in COl1 in the first data frame, so ColA needs to be incremented in the third data frame.

edit: expected output is

 ColA    ColB    ColC
    2       0       1

Col1 in Data Frame 2 is irrelevant

3
So the 2nd dataframe Col1 column is irrelevant? Maybe read about %in%, merge?zx8754

3 Answers

2
votes

Using dplyr we can do:

df3 = df1 %>%
    filter(ID %in% df2$ID) %>%
    summarize_at(vars(Col1:Col3), sum)

Output:

  Col1 Col2 Col3
1    2    0    1

If you need the column names to be A, B, C then you can just rename them now like akrun's answer:

names(df3) <- paste0("Col", LETTERS[1:3])
2
votes

You could give semi_join from dplyr a try:

library(dplyr)
df3 <- df1 %>% 
  semi_join(df2, by ="ID") %>%
  summarise_at(vars(Col1:Col3), funs(sum))
1
votes

We can use match between the 'ID's of the two dataset, subset the 'df1' based on that and get the sum of the 'Col' columns with colSums

df3 <- as.data.frame.list(colSums(df1[match(df2$ID, df1$ID, nomatch = 0),-1]))
names(df3) <- paste0("Col", LETTERS[1:3])