I'm working with two dataframes in R:
df1 = data.frame(c("A", "B"), c(1, 21), c(17, 29))
colnames(df1) = c("location", "start", "stop")
df1
location start stop
A 1 17
B 21 29
df2 = data.frame(c("A", "A", "A", "A", "B"), c(1, 10, 20, 40, 20), c(10, 20, 30, 50, 30), c("x1", "x2","x4", "x5", "x3"))
colnames(df2) = c("location", "start", "stop", "out")
df2
location start stop out
A 1 10 x1
A 10 20 x2
A 20 30 x4
A 40 50 x5
B 20 30 x3
Now I want to check for each row of df1:
- is there a match between 'location' with a 'location' from df2
- if the 'start' value is in the range of start and stop from df2 or if the 'end' value is in the range of start and stop from df2, then the corresponding 'out' value from df2 should be pasted in a new column in df1
This is how the output would look in the case of this example
df1_new
location start stop out
A 1 17 x1,x2
B 21 29 x3
I've started in R, but I'm stuck at the point where I need to look in the complete dataframe of df2
for (i in nrow(df1)) {
if(df1$location[i] == df2$location # it needs to look for a match in the complete dataframe of df2. I don't know how to do this
& if (df1$start[i] %in% # it needs to check if the start value lies in the range between df2$start & df2$end
}