7
votes

Question: How to locally interpolate over small lengths of NaNs?

I have a time series ("x" data sampled evenly at "t" times) that has blocks of NaNs. For example:

x = [ 1   2   4    2 3 15 10 NaN NaN NaN NaN 2 4 NaN 19 25]
t = [0.1 0.2 0.3 ...etc..]

I want to interpolate over the NaNs.

The most elementary approach would be to interpolate linearly between the last valid data point before a gap and the first valid data point after it. E.g., draw a line from x = 10 to x = 2 and assign the 4 NaN values from that line.

The time series has ~1.5 million samples with ~10000 NaNs, so I don't want the interpolation to incorporate data that is far away from the NaN locations. Some of the NaN blocks span 1000-2000 samples.

x(isnan(x)) = interp1(find(~isnan(x)), x(~isnan(x)), find(isnan(x)), 'linear');

will linearly interpolate over the NaN using the whole time series.

How would I interpolate locally? Linear should be sufficient. Perhaps linear interpolation incorporating a few points to the left and to the right of the NaN blocks (maybe 100-200 points). A natural neighbour or spline (?) algorithm might be more suitable; I must be careful not to introduce anomalous behaviour into the time series (e.g. interpolation that adds fictitious "power" at some frequency).

UPDATE: The time series is a record of a minute-sampled temperature over a year long period. Linear interpolation is sufficient; I just need to fill in the ~6-7 hour length gaps of NaNs (I am provided with data before the NaN gaps and after the NaN gaps).

Linear interpolation only uses the values adjacent to the region being interpolated, so there is no need to worry about "using the whole time series". Or is the problem performance? – Jonas
Ah, silly me. I was under the impression that it was using a least-squares linear fit and then assigning points using the fit. If interp1 'linear' just joins the neighbouring left and right points and interpolates, what differences do 'cubic' and 'pchip' make? I.e., it doesn't fit a cubic over the data and then interpolate? – Justin
Are you asking what is the best method of interpolation? If so, then the best method really depends on your application. For example, for some applications, you might only want to interpolate using past data, as a method like linear interpolation implies you know ahead of time what the next non-NaN observation will be. At the other end of the spectrum you could apply an EM algorithm which replaces missing observations with their conditionally expected values, conditional on the joint distribution of every other observation. So it is hard to answer without knowing your application. – Colin T Bowers
@JustinChiu: Cubic fits a spline, which uses two data points on either side of the region being interpolated to define the curve in between. – Jonas
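
The point made in the comments, that 'linear' only depends on the two valid samples bounding each gap, can be checked with a small sketch. It uses the example vector from the question and sample indices in place of t (an assumption that is harmless here because the sampling is even); the names idx, xAll and xLocal are illustrative only.

```matlab
% Example data from the question
x = [1 2 4 2 3 15 10 NaN NaN NaN NaN 2 4 NaN 19 25];
nans = isnan(x);
idx = 1:numel(x);          % sample indices stand in for evenly spaced t

% Fill every NaN, handing interp1 all valid samples
xAll = x;
xAll(nans) = interp1(idx(~nans), x(~nans), idx(nans), 'linear');

% Fill the first gap (samples 8-11) from its two bounding samples only
xLocal = interp1([7 12], x([7 12]), 8:11, 'linear');

% Largest difference between the two fills; expected to be zero up to rounding
maxDiff = max(abs(xAll(8:11) - xLocal));
disp(maxDiff)
```

If the two fills agree, passing the whole series to interp1 does not let distant data influence the values inside a gap.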

2 Answers

5
votes

I think this is (at least partially) what you seek:

% example data
x = [ 1   2   4    2 3 15 10 NaN NaN NaN NaN 2 4 NaN 19 25];
t = linspace(0.1, 10, numel(x));

% indices to NaN values in x 
% (assumes there are no NaNs in t)
nans = isnan(x);

% replace all NaNs in x with linearly interpolated values
x(nans) = interp1(t(~nans), x(~nans), t(nans));

Note that you can easily switch the interpolation method here:

% cubic splines
x(nans) = interp1(t(~nans), x(~nans), t(nans), 'spline');

% nearest neighbor
x(nans) = interp1(t(~nans), x(~nans), t(nans), 'nearest');
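
If only short gaps should be filled and longer ones left as NaN, one possible approach is to find each run of NaNs first and interpolate only the short ones. The following is a minimal sketch, not part of the original answer: maxGap is a hypothetical threshold, the helper names (gapStart, gapEnd, gapLen, fillMask) are illustrative, and x is assumed to be a row vector.

```matlab
% Example data from the question
x = [1 2 4 2 3 15 10 NaN NaN NaN NaN 2 4 NaN 19 25];
t = linspace(0.1, 10, numel(x));
maxGap = 2;                % assumption: fill gaps of at most 2 samples

nans = isnan(x);

% Start and end index of every run of consecutive NaNs
d = diff([false, nans, false]);
gapStart = find(d == 1);
gapEnd   = find(d == -1) - 1;
gapLen   = gapEnd - gapStart + 1;

% Mark only the NaNs that belong to sufficiently short gaps
fillMask = false(size(x));
for k = find(gapLen <= maxGap)
    fillMask(gapStart(k):gapEnd(k)) = true;
end

% Linear interpolation from the valid samples, applied to short gaps only
x(fillMask) = interp1(t(~nans), x(~nans), t(fillMask));
```

With this threshold the single NaN at position 14 is filled, while the four-sample gap at positions 8-11 stays NaN. Leading or trailing NaNs also stay NaN, because interp1 does not extrapolate unless asked to.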

3
votes

Consider using inpaint_nans (available on the MATLAB Central File Exchange), a very nice tool designed to interpolate NaN elements in a 1-d or 2-d array using the non-NaN elements. It can also extrapolate, as it does not use a triangulation of the data, and it offers several different approaches to the interpolation.