Replace specific values based on another dataframe

Question

First, let's start with DataFrame 1 (DF1) :

DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", 
                    "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                    "06/22/2016", "06/23/2016"),
                  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                  c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                  c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY", 
                    "NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")

I also have DataFrame 2 (DF2) :

DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                  c(1, 1, 2, 2),
                  c(9999, 8888, 777, 555),
                  c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")

For every rows in DF1, I have to look if there is a row in DF2 that has the same date and id. If yes, I have to replace the values in DF1 by the values in DF2.

DF2 will always have less columns than DF1. If a column is not in DF2, I must keep the original value that was in DF1 for that specific column.

The final output would like this:

results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
                        "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                        "06/22/2016", "06/23/2016"),
                      c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                      c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
                      c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY", 
                        "NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")

Do you have any suggestions?

Jaap Jaap · Accepted Answer · 2016-06-23T14:21:08

You could use the join functionality of the data.table-package for this:

library(data.table)
setDT(DF1)
setDT(DF2)

DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]

which gives:

> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC

When you have many columns in both datasets, it is easier to use mget instead off typing all the column names. For the used data in the question it would look like:

DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]

When you want to construct a vector of columnnames that need to be added beforehand, you could do this as follows:

cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]

Replace specific values based on another dataframe

3 Answers