R: how to select data from a data frame based on specific rules and add the data as new columns to an existing data frame

Question

I have two data frames df1 and df2.

df1 <- data.frame(x1=c("A35", "A41", "A49"),
                  x2=c(8, 24, 33),
                  x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

I want to select the rows from the df2 based on the following three criteria

(1) The y1 of df2 matches x1 of df1;

(2) The y2 of df2 >= x2 of df1

(3) The y2 of df2 =< x3 of df1

The data meeting the criteria will be added to df1 as new columns. If the row(s) of df1 has more than one matches, those additional match(es) will be added as new columns as well.

The expected results are

data.frame(x1=c("A35", "A41", "A49"),
           x2=c(8, 24, 33),
           x3=c(15, 63, 54),
           z1 = c("A35", "A41", ""),
           z2 = c(9, 24,""),
           z3 = c("", "A41", ""),
           z4 = c("", 32, ""))

x1 x2 x3 z1 z2 z3 z4
A35 8 15 A35 9  
A41 24 63 A41 24 A41 32
A49 33 54

Thanks in advance!

Why does df2 have 5 rows, but the resultant data frame only has 4 new columns? — Tim Biegeleisen
df2 has 5 rows, but only 3 rows match to df1. The row 1 of df1 has one df2 match and row 2 of df1 has two df2 matches. The row 1 match and the first row2 match of df2 are added as the new columns (z1, z2). The second row 2 match is added as the new columns (z3, and z4). I hope I have clarified it. — mi3567
how can you compare y2 of df2 >= x2 of df1 if there are more y2 values than x2 values? you can't go element by element...or are you saying that condition1 holds then condition2 and 3 must also hold. or is that 1 or 2 or 3 must hold? — road_to_quantdom

Jonas Coussement Jonas Coussement · Accepted Answer · 2015-12-09T10:55:30

It is not really recommended to work with dataframes of unequal length, using Lists should work better for this purpose.

I created a code that gets the job done, even though I'm not sure it's the most efficient way.

First you need to compare every row from both dataframes. This can be done using an apply function within an apply function (basicly: for each row in df1, compare with each row in df2) and returning the matching values, along with their index.

This will be stored in a messy list full of empty elements that did not match. So after cleaning up the list, the resulting matches can be added to each individual row of df1 using the sapply function.

df1 <- data.frame(x1=c("A35", "A41", "A49"),
              x2=c(8, 24, 33),
              x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

matches <- apply(df2,1,function(x) apply(df1,1,function(y) 
  if(x[1]==y[1]&&x[2]>=y[2]&&x[2]<=y[3]){
    c(which(df1==x[1]),x[1:2])
  }))
addedelem <- t(array(unlist(matches), dim=c(3,length(unlist(matches))/3)))
result <- sapply(1:length(df1$x1), function(x) (c(as.matrix(df1[x,]),t(addedelem[which(addedelem[,1]==x),2:3]))))

The resulting list is what you're looking for. If it's necessary, you can re-transform this into a dataframe.

> result
[[1]]
[1] "A35" "8"   "15"  "A35" " 9" 

[[2]]
[1] "A41" "24"  "63"  "A41" "24"  "A41" "32" 

[[3]]
[1] "A49" "33"  "54"

R: how to select data from a data frame based on specific rules and add the data as new columns to an existing data frame

2 Answers