How to merge a large dataset and a small dataset on POSIXct and Date respectively?

Question

Subject

I have two (simplified) datasets:

A dataset of 500 observations of some.value every hour (date.time variable as POSIXct)
A dataset of 10 daily temperatures (date variable as Date)

The objective is to add the temperature of the second dataset as a new variable to the first dataset where the variable date.time corresponds to the date variable.

I tried a data.table solution using setkey() and roll="nearest" according to : R – How to join two data frames by nearest time-date?

Unfortunately the temperature that gets merged is always the same value for the entire merged dataset.

A simplified example

Here is the exemple code that illustrates my problem and my solution attempt:

Setting random seed

set.seed(10)

Generating the two datasets

observations <- data.frame(date.time = seq(from=ymd_hms("2017-02-01 00:00:00"), length.out=500, by=60*60), some.value = runif(500,0.0,1.0))
daily.temperature <- data.frame(date = seq(from=as.Date("2017-02-01"), length.out = 10, by=1), temperature = runif(10,10,40))

Solution attempt using data.tables and roll="nearest"

# converting dataframes to datatables
library(data.table)
observations <- as.data.table(observations)
daily.temperature <- as.data.table(daily.temperature)

# setting the keys of the two datasets
setkey(observations,date.time)
setkey(daily.temperature,date)

# Combinding the datasets
combined <- daily.temperature[observations, roll = "nearest" ]
combined

Note that the temperature variable in the combined dataset is always the same regardless of date.

Notes regading the unsimplified (real) problem:

In my real problem the observations are recorded every minute instead of every hour.
In my real problem the daily.temperature dataset does not cover the entire range of observations. In that case, adding 'NA' or nothing at all as the temperature would be fine.

Antonis Antonis · Accepted Answer · 2018-04-24T09:10:21

Do you want something like this?

    set.seed(10)
library(dplyr)
    observations <- data.frame(date.time = seq(from=ymd_hms("2017-02-01 00:00:00"), length.out=500, by=60*60), some.value = runif(500,0.0,1.0))
    daily.temperature <- data.frame(date = seq(from=as.Date("2017-02-01"), length.out = 10, by=1), temperature = runif(10,10,40))
    observations$date<-as.Date(observations$date.time)
    combined<-left_join(observations,daily.temperature,by="date")
> head(combined)
            date.time some.value       date temperature
1 2017-02-01 00:00:00  0.8561467 2017-02-01    38.64702
2 2017-02-01 01:00:00  0.7820957 2017-02-01    38.64702
3 2017-02-01 02:00:00  0.2443390 2017-02-01    38.64702
4 2017-02-01 03:00:00  0.3138552 2017-02-01    38.64702
5 2017-02-01 04:00:00  0.1284753 2017-02-01    38.64702
6 2017-02-01 05:00:00  0.9299472 2017-02-01    38.64702

How to merge a large dataset and a small dataset on POSIXct and Date respectively?

1 Answers