R: ifelse statement test involving multiple dataframes

Question

I am trying to create a new variable using ifelse by combining data from two data.frames (similar to this question but without factors).

My problem is that df1 features yearly data, whereas vars in df2 are temporally aggregated: e.g. df1 has multiple obs (1997,1998,...,2005) and df2 only has a range (1900-2001).

For illustration, a 2x2 example would look like

df1 <- data.frame(id = c("2","20"), stringsAsFactors = FALSE)
df1$year <- c(1960, 1870)

df2 <- data.frame(id = df1$id, stringsAsFactors = FALSE)
df2$styear <- c(1800, 1900)
df2$endyear <- c(2001, 1950)

I want to combine both in such a way that the id (same variable exists in both) is matched, and further, the year in df1 is within the range of df2. I tried the following

df1$new.var <- ifelse(df1$id==df2$id & df1$year>=df2$styear & 
df1$year<df2$endyear,1,0)

Which ideally should return 1 and 0, respectively.

But instead I get warning messages:

1: In df1$id == df2$id : longer object length is not a multiple of shorter object length

2: In df1$year >= df2$styear : longer object length is not a multiple of shorter object length

3: In df1$year < df2$endyear : longer object length is not a multiple of shorter object length

For the record, the 'real' df1 has 500 obs and df2 has 14. How can I make this work?

Edit: I realised some obs in df2 are repeated, with multiple periods e.g.

id    styear    endyear
1      1800      1915
1      1950      2002
2      1912      1988
3      1817      2000

So, I believe what I need is something like a double-ifelse:

df1$new.var <- ifelse(df1$id==df2$id & df1$year>=df2$styear & 
df1$year<df2$endyear | df1$year>=df2$styear & 
df1$year<df2$endyear,1,0)

Obviously, this wouldn't work, but it is a way to get out of the duplicates-problem.

For example, if df1 has a row with id=1 and year=1801, it will pass the first year-range test (1801 is between 1800 and 1915) but fail the second one (1801 is not between 1950 and 2002), so it should be coded once and no extra rows should be added (currently the duplicates add extra rows).

see: rdocumentation.org/packages/data.table/versions/1.9.6/topics/… — Bulat
@Bulat Hello, foverlaps was recommended by others too, I can't seem to get it to work - says "Duplicate columns are not allowed in overlap joins. This may change in the future." — user6550364

Stephen · Accepted Answer · 2016-10-05T20:43:06

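library(dplyr)

# Build the example data frames; years are numeric so the comparisons are numeric
df1 <- data.frame(id = c("2", "20"), year = c(1960, 1870),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = df1$id, styear = c(1800, 1900),
                  endyear = c(2001, 1950), stringsAsFactors = FALSE)

# Match on id, then keep only the rows whose year lies within [styear, endyear]
df3 <- left_join(df1, df2, by = "id") %>%
  filter(year >= styear, year <= endyear)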

I highly recommend the dplyr package for data manipulation.
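
If you need the 0/1 variable on df1 itself rather than a filtered table, and some ids have several periods in df2, something along these lines might work. It is only a base R sketch, not the join approach above: the data are made up to mirror the table in the question's edit, `in_any_period()` is a hypothetical helper name, and it uses the question's `year < endyear` comparison rather than `<=`.

```r
# Made-up example data: id 1 has two periods in df2, id 4 has none
df1 <- data.frame(id = c(1, 1, 2, 3, 4),
                  year = c(1801, 1930, 1990, 1900, 1950))
df2 <- data.frame(id = c(1, 1, 2, 3),
                  styear = c(1800, 1950, 1912, 1817),
                  endyear = c(1915, 2002, 1988, 2000))

# Hypothetical helper: TRUE if `year` falls inside at least one
# [styear, endyear) period listed for `id` in `periods`
in_any_period <- function(id, year, periods) {
  rows <- periods[periods$id == id, ]
  any(year >= rows$styear & year < rows$endyear)
}

# mapply() checks each df1 row against all df2 periods for its id,
# so df1 keeps its original number of rows
df1$new.var <- as.integer(mapply(in_any_period, df1$id, df1$year,
                                 MoreArgs = list(periods = df2)))
df1$new.var
# 1 0 0 1 0
```

Because each row of df1 is compared only with the df2 rows that share its id, the two data frames can have different lengths without triggering the recycling warning, and an id with no period in df2 is simply coded 0.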
