How to backfill at equally-spaced timestamps within a Julia `TimeArray` object with unequally spaced observations

Question

Given a TimeArray variable with unequally spaced observations, I would like to insert the "missing" timestamps, i.e. the equally spaced timestamps at which there are no observations. Since no observations are available at these new timestamps, I want to fill them with the most recent available data point (backfill). How can I accomplish this in Julia? Thank you for any pointers!

My TimeSeries.TimeArray variable looks like this:

                      price
2011-08-14T14:14:00 | 10.4
2011-08-14T14:15:00 | 10.4
2011-08-14T14:21:00 | 10.5

What I want to generate is this:

                      price
2011-08-14T14:14:00 | 10.4
2011-08-14T14:15:00 | 10.4
2011-08-14T14:16:00 | 10.4 (back-filled)
2011-08-14T14:17:00 | 10.4 (back-filled)
2011-08-14T14:18:00 | 10.4 (back-filled)
2011-08-14T14:19:00 | 10.4 (back-filled)
2011-08-14T14:20:00 | 10.4 (back-filled)
2011-08-14T14:21:00 | 10.5

Ideally you should show a few lines of code to show how you have tried to do this yourself. — Alexander Morley
There shouldn't be an automatic functionality for this. You can do it manually, by creating a new time series Range and filling in — Michael K. Borregaard

Colin T Bowers · Accepted Answer · 2017-05-06T04:42:27

As far as I know, this functionality is not available yet for TimeArray, although I suspect at some point it will be.

In essence, what you actually want boils down to the following general problem: Given a sorted, unique vector x, and a sorted, unique, reference vector m, for each element m[i] find the index j of the last observation in x such that x[j] <= m[i].
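As a small illustration of this problem statement, here is a minimal sketch using Base's searchsortedlast. The integer values of x and m are made up for the example; an index of 0 means that no element of x is <= m[i].

```julia
# Illustrative data only: x holds the observation positions, m the reference grid.
x = [2, 3, 9]
m = collect(1:10)

# For each m[i], the index j of the last x[j] <= m[i]; 0 if no such element exists.
inds = [searchsortedlast(x, mi) for mi in m]
```

Here the first entry of inds is 0 because m[1] = 1 precedes every element of x.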

There are two ways to do this:

1) If m is small relative to x, then the fastest method will likely be just to call the base Julia function searchsortedlast on x for each element of m, i.e. you will make length(m) calls to the function.

2) If m is large relative to x, then it will probably be faster to loop over m and x and exploit the sort order so that only a single loop is required to find all relevant indices. As with many problems of this type, it is often easier to do it backwards, i.e. start at the end of x and m and work back up. The following function does this, but assumes both inputs are sorted and unique:

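function linear_search_last_index(x::Vector{T}, m::Vector{T})::Vector{Int} where {T}
    # inds[i] will hold the index of the last x[j] <= m[i]; 0 means no such element
    inds = zeros(Int, length(m))
    (isempty(x) || isempty(m)) && return inds
    nx = length(x)
    nm = length(m)
    # Skip reference points later than the last observation; their entries stay 0
    x[nx] < m[nm] && (nm = searchsortedlast(m, x[nx]))
    nm == 0 && return inds
    # Walk both vectors backwards, exploiting the sort order
    while nx >= 1 && nm >= 1
        if x[nx] <= m[nm]
            inds[nm] = nx
            nm -= 1
        else
            nx -= 1
        end
    end
    return inds
end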

You should be able to just wrap this with a few bells and whistles for the TimeArray case.
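As a rough sketch of what such a wrapper might look like for the data in the question, the snippet below works on plain vectors of timestamps and prices rather than on a TimeArray. It uses searchsortedlast (the first approach above) for brevity. The one-minute step and the variable names ts, price, grid and filled are assumptions made for illustration; extracting the vectors from a TimeArray and rebuilding one afterwards is not shown.

```julia
using Dates

# Timestamps and prices taken from the question.
ts = [DateTime(2011, 8, 14, 14, 14),
      DateTime(2011, 8, 14, 14, 15),
      DateTime(2011, 8, 14, 14, 21)]
price = [10.4, 10.4, 10.5]

# Regular one-minute grid spanning the observed range (the step is an assumption).
grid = collect(ts[1]:Minute(1):ts[end])

# Index of the most recent observation at or before each grid point.
inds = [searchsortedlast(ts, t) for t in grid]

# Every grid point lies at or after ts[1], so no index is 0 here.
filled = price[inds]
```

If the grid could start before the first observation, the zero indices would need separate handling (for example, marking those entries as missing) before indexing into price.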
