1
votes

I have two data frames df1 and df2.

df1 <- data.frame(x1=c("A35", "A41", "A49"),
                  x2=c(8, 24, 33),
                  x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

I want to select the rows from the df2 based on the following three criteria

(1) The y1 of df2 matches x1 of df1;

(2) The y2 of df2 >= x2 of df1

(3) The y2 of df2 =< x3 of df1

The data meeting the criteria will be added to df1 as new columns. If the row(s) of df1 has more than one matches, those additional match(es) will be added as new columns as well.

The expected results are

data.frame(x1=c("A35", "A41", "A49"),
           x2=c(8, 24, 33),
           x3=c(15, 63, 54),
           z1 = c("A35", "A41", ""),
           z2 = c(9, 24,""),
           z3 = c("", "A41", ""),
           z4 = c("", 32, ""))

x1 x2 x3 z1 z2 z3 z4
A35 8 15 A35 9  
A41 24 63 A41 24 A41 32
A49 33 54  

Thanks in advance!

2
Why does df2 have 5 rows, but the resultant data frame only has 4 new columns? - Tim Biegeleisen
df2 has 5 rows, but only 3 rows match to df1. The row 1 of df1 has one df2 match and row 2 of df1 has two df2 matches. The row 1 match and the first row2 match of df2 are added as the new columns (z1, z2). The second row 2 match is added as the new columns (z3, and z4). I hope I have clarified it. - mi3567
how can you compare y2 of df2 >= x2 of df1 if there are more y2 values than x2 values? you can't go element by element...or are you saying that condition1 holds then condition2 and 3 must also hold. or is that 1 or 2 or 3 must hold? - road_to_quantdom
@mi3567 is my answer not sufficient? - road_to_quantdom

2 Answers

0
votes

It is not really recommended to work with dataframes of unequal length, using Lists should work better for this purpose.

I created a code that gets the job done, even though I'm not sure it's the most efficient way.

First you need to compare every row from both dataframes. This can be done using an apply function within an apply function (basicly: for each row in df1, compare with each row in df2) and returning the matching values, along with their index.

This will be stored in a messy list full of empty elements that did not match. So after cleaning up the list, the resulting matches can be added to each individual row of df1 using the sapply function.

df1 <- data.frame(x1=c("A35", "A41", "A49"),
              x2=c(8, 24, 33),
              x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

matches <- apply(df2,1,function(x) apply(df1,1,function(y) 
  if(x[1]==y[1]&&x[2]>=y[2]&&x[2]<=y[3]){
    c(which(df1==x[1]),x[1:2])
  }))
addedelem <- t(array(unlist(matches), dim=c(3,length(unlist(matches))/3)))
result <- sapply(1:length(df1$x1), function(x) (c(as.matrix(df1[x,]),t(addedelem[which(addedelem[,1]==x),2:3]))))

The resulting list is what you're looking for. If it's necessary, you can re-transform this into a dataframe.

> result
[[1]]
[1] "A35" "8"   "15"  "A35" " 9" 

[[2]]
[1] "A41" "24"  "63"  "A41" "24"  "A41" "32" 

[[3]]
[1] "A49" "33"  "54" 
0
votes

if i am understanding your question correctly this should work:

### we use the matches to pick our values from df1
### we use our conditions to pick our values from df2
matches <- match(df2$y1,df1$x1)
matches <- matches[!is.na(matches)]
condition1 <- df2$y1 %in% df1$x1
condition2 <- df2$y2[condition1] >= df1$x2[matches]
condition3 <- df2$y2[condition1] <= df1$x3[matches]

### i create these tmp variables so you can see step by step
### what each line of code is doing
### here i am finding the values that meet all the conditions
### then i am pulling the associated y2 values
tmp <- data.frame(x1=df1$x1[matches],y2=df2$y2[condition1])
tmp <- tmp[condition2&condition3,]
tmp <- droplevels(tmp)

### now that we have the values we want
### we are organizing the data in the desired output you 
### specified. 
x <- split(tmp[-1], tmp[[1]])
tmp2 <- data.frame()
for(i in 1:length(x)){

  df <- data.frame(t(unlist(x[[i]], use.names=FALSE)))
  colnames(df) <- seq(1,nrow(x[[i]]))
  tmp2 <- rbind.fill(tmp2,df)

}
colnames(tmp2) <- paste(rep("z",ncol(tmp2)),1:ncol(tmp2),sep="")
res <- data.frame(df1[df1$x1 %in% names(x),],tmp2)
res <- rbind.fill(res,df1[!df1$x1 %in% names(x),])

>res
   x1 x2 x3 z1 z2
1 A35  8 15  9 NA
2 A41 24 63 24 32
3 A49 33 54 NA NA