Original Request
One option is to use full_join and then the fill function to impute the missing value.
library(tidyverse)
DB_final <- DB %>%
  full_join(Hist, by = "Date") %>%
  arrange(Date) %>%
  fill(Index, .direction = "up") %>%
  filter(!is.na(Value))
DB_final
# Value Date Index
# 1 20 2017-10-19 13.517,98
# 2 19 2017-10-23 13.404,58
# 3 19 2017-11-03 13.378,96
# 4 20 2017-11-10 13.206,35
However, this requires knowing the fill direction (up or down) in advance, so it is of little use when the right direction is not known beforehand.
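To make the effect of the direction concrete, here is a minimal sketch on toy data. The data frame `toy` and its values "A" and "B" are made up for illustration and are not part of the question's data.

```r
library(tidyr)

# Hypothetical toy data: one missing Index between two known values
toy <- data.frame(
  Date  = as.Date(c("2017-10-19", "2017-10-23", "2017-10-25")),
  Index = c("A", NA, "B"),
  stringsAsFactors = FALSE
)

# "up" fills the gap with the next known value (the one below it)
up <- fill(toy, Index, .direction = "up")

# "down" fills the gap with the previous known value (the one above it)
down <- fill(toy, Index, .direction = "down")

up$Index[2]
down$Index[2]
```

With "up" the gap takes the value from the later date, with "down" the value from the earlier date, so the two directions can give different answers for the same row.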
Impute Missing Value based on the Nearest Date
Here is another option, which I think is more robust. It imputes the missing value using the Index from the nearest date.
Step 1: Find the Nearest Date
# Collect all dates
Date_vec <- sort(unique(c(DB$Date, Hist$Date)))
# Create a distance matrix based on dates, then convert to a data frame
dt <- Date_vec %>%
  dist() %>%
  as.matrix() %>%
  as.data.frame() %>%
  rowid_to_column(var = "ID") %>%
  gather(ID2, Value, -ID) %>%
  mutate(ID2 = as.integer(ID2)) %>%
  filter(ID != ID2) %>%
  arrange(ID, Value) %>%
  group_by(ID) %>%
  slice(1) %>%
  select(-Value)
dt$ID <- Date_vec[dt$ID]
dt$ID2 <- Date_vec[dt$ID2]
names(dt) <- c("Date1", "Date2")
dt
# # A tibble: 5 x 2
# # Groups: ID [5]
# Date1 Date2
# <date> <date>
# 1 2017-10-19 2017-10-23
# 2 2017-10-23 2017-10-25
# 3 2017-10-25 2017-10-23
# 4 2017-11-03 2017-11-10
# 5 2017-11-10 2017-11-03
For each date, dt shows the nearest other date.
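Note that dt picks the nearest date among all dates from both tables, so Date2 can be a date that exists only in DB and has no Index in Hist. As a possible alternative, here is a minimal base R sketch that compares each DB date only against the dates in Hist. The names `nearest` and `DB_nearest` are mine, and ties are resolved by taking the first match, which is an assumption rather than something the question specifies.

```r
# Data equivalent to the DATA section at the end of this answer
DB <- data.frame(
  Value = c(20L, 19L, 19L, 20L),
  Date  = as.Date(c("2017-10-19", "2017-10-23", "2017-11-03", "2017-11-10"))
)
Hist <- data.frame(
  Date  = as.Date(c("2017-11-10", "2017-11-03", "2017-10-25", "2017-10-19")),
  Index = c("13.206,35", "13.378,96", "13.404,58", "13.517,98"),
  stringsAsFactors = FALSE
)

# For each DB date, find the row of Hist whose date is closest
# (which.min returns the first position in case of a tie)
nearest <- vapply(
  seq_along(DB$Date),
  function(i) which.min(abs(as.numeric(DB$Date[i]) - as.numeric(Hist$Date))),
  integer(1)
)

# Look up the Index at those rows
DB_nearest <- DB
DB_nearest$Index <- Hist$Index[nearest]
DB_nearest
```

On this data it gives the same Index column as the join-based steps below, since an exact date match has distance zero and is always chosen first.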
Step 2: Perform Multiple Joins
Join DB and dt, and then join Hist twice based on different date columns.
DB2 <- DB %>% left_join(dt, by = c("Date" = "Date1"))
DB3 <- DB2 %>%
  left_join(Hist, by = "Date") %>%
  left_join(Hist, by = c("Date2" = "Date"))
DB3
# Value Date Date2 Index.x Index.y
# 1 20 2017-10-19 2017-10-23 13.517,98 <NA>
# 2 19 2017-10-23 2017-10-25 <NA> 13.404,58
# 3 19 2017-11-03 2017-11-10 13.378,96 13.206,35
# 4 20 2017-11-10 2017-11-03 13.206,35 13.378,96
Step 3: Finalize the Index
If there are values in Index.x, use that, otherwise, use the values in Index.y.
DB4 <- DB3 %>%
  mutate(Index = ifelse(is.na(Index.x), Index.y, Index.x)) %>%
  select(Value, Date, Index)
DB4
# Value Date Index
# 1 20 2017-10-19 13.517,98
# 2 19 2017-10-23 13.404,58
# 3 19 2017-11-03 13.378,96
# 4 20 2017-11-10 13.206,35
DB4 is the final output.
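The "use Index.x, otherwise Index.y" rule in Step 3 can also be written with dplyr's coalesce, which returns the first non-missing value at each position. A minimal sketch, using standalone vectors that copy the Index.x and Index.y columns shown in the DB3 output:

```r
library(dplyr)

# Standalone copies of the two columns from the DB3 output
Index.x <- c("13.517,98", NA, "13.378,96", "13.206,35")
Index.y <- c(NA, "13.404,58", "13.206,35", "13.378,96")

# Take Index.x where available, otherwise fall back to Index.y
Index <- coalesce(Index.x, Index.y)
Index
```

Inside the pipeline this would presumably read mutate(Index = coalesce(Index.x, Index.y)), and should behave the same as the ifelse call here.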
DATA
DB <- structure(list(Value = c(20L, 19L, 19L, 20L), Date = structure(c(17458,
17462, 17473, 17480), class = "Date")), class = "data.frame", .Names = c("Value",
"Date"), row.names = c(NA, -4L))
Hist <- structure(list(Date = structure(c(17480, 17473, 17464, 17458), class = "Date"),
Index = c("13.206,35", "13.378,96", "13.404,58", "13.517,98"
)), class = "data.frame", .Names = c("Date", "Index"), row.names = c(NA,
-4L))