
I am trying to use na.spline, part of the zoo package, to replace NA values in some imported speed data with cubic-spline-interpolated values.

na.spline is modifying the NA values as it is supposed to; however, it is also modifying values that originally equaled 0.

library(zoo)

# speed is the imported numeric vector of speed values
ex <- data.frame(speed)
ex$speed2 <- na.spline(ex$speed)

My data set has ~1400 values. I have included the first ~40 values below. Here you can see the original speed values and the incorrectly interpolated results in speed2:

speed       speed2
NA          8.639277e-06
0.000000    0.000000e+00
0.000000    0.000000e+00
0.000000    0.000000e+00
0.000000    -1.694066e-21
0.000000    0.000000e+00
0.000000    -2.710505e-20
0.000000    0.000000e+00
0.000000    -4.336809e-19
0.000000    0.000000e+00
0.000000    6.938894e-18
0.000000    0.000000e+00
0.000000    1.110223e-16
2.661698    2.661698e+00
3.107128    3.107128e+00
7.319669    7.319669e+00
10.800864   1.080086e+01
17.855491   1.785549e+01
18.250267   1.825027e+01
28.587002   2.858700e+01
36.405397   3.640540e+01
38.467383   3.846738e+01
38.685956   3.868596e+01
43.917737   4.391774e+01
40.829615   4.082962e+01
43.519173   4.351917e+01
45.597497   4.559750e+01
43.252656   4.325266e+01
45.581646   4.558165e+01
48.258325   4.825832e+01
48.269969   4.826997e+01
50.905045   5.090505e+01
53.258165   5.325817e+01
58.391370   5.839137e+01
59.278440   5.927844e+01
58.720518   5.872052e+01
56.933438   5.693344e+01
62.062116   6.206212e+01
59.860849   5.986085e+01
60.183378   6.018338e+01

Has anyone seen a similar issue, or does anyone have an alternative method for replacing the NA values with interpolated data?

See here: stackoverflow.com/questions/18695335/… - though na.spline should just work. - thelatemail
Actually, looking at the example data, the spline has worked fine. Have you tried plotting the two sets of data? - thelatemail
In most applications I would be fine with the results, but I have to perform some vehicle simulations with the results and I need 0 speed to be exactly 0. Although something like 1.6e-21 is very close to 0, it will be the difference between switch=ON and switch=OFF in the simulation, and that is not good. - bwhitzo
Try round(na.spline(df$speed), 6) - Steven Beaupré
I agree with @StevenBeaupré - you need to set a tolerance for your output that determines how close to 0 a value must be before you consider it effectively 0. - thelatemail
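
The tolerance idea from the comments above can be sketched in base R as follows. This is only an illustration: the helper name snap_to_zero and the default tolerance are assumptions, not part of zoo, and the right tolerance depends on the smallest genuine speed the simulation needs to distinguish from 0.

```r
# Hypothetical helper (not part of zoo): set values within `tol` of zero
# to exactly 0 and leave everything else unchanged.
snap_to_zero <- function(x, tol = 1e-9) {
  x[!is.na(x) & abs(x) < tol] <- 0
  x
}

# A few values taken from the speed2 column in the question
interp <- c(8.639277e-06, 0, -1.694066e-21, 6.938894e-18, 2.661698)
snapped <- snap_to_zero(interp)
snapped
```

Unlike round(), this leaves values outside the tolerance untouched, but it will also zero out any genuinely small nonzero speed that falls below tol.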

1 Answer


Internally, na.spline does essentially the following, which uses only base R's splinefun and does not involve zoo:

y <- c(NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.661698, 3.107128, 
7.319669, 10.800864, 17.855491, 18.250267, 28.587002, 36.405397, 
38.467383, 38.685956, 43.917737, 40.829615, 43.519173, 45.597497, 
43.252656, 45.581646, 48.258325, 48.269969, 50.905045, 53.258165, 
58.39137, 59.27844, 58.720518, 56.933438, 62.062116, 59.860849, 
60.183378)

x <- xout <- seq_along(y)
na <- is.na(y)

splinefun(x[!na], y[!na])(xout)

giving:

 [1]  8.639280e-06  0.000000e+00  0.000000e+00  0.000000e+00  3.388132e-21
 [6]  0.000000e+00 -2.710505e-20  0.000000e+00  4.336809e-19  0.000000e+00
[11]  6.938894e-18  0.000000e+00  0.000000e+00  2.661698e+00  3.107128e+00
[16]  7.319669e+00  1.080086e+01  1.785549e+01  1.825027e+01  2.858700e+01
[21]  3.640540e+01  3.846738e+01  3.868596e+01  4.391774e+01  4.082961e+01
[26]  4.351917e+01  4.559750e+01  4.325266e+01  4.558165e+01  4.825832e+01
[31]  4.826997e+01  5.090505e+01  5.325816e+01  5.839137e+01  5.927844e+01
[36]  5.872052e+01  5.693344e+01  6.206212e+01  5.986085e+01  6.018338e+01

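Also note that the following would zero out the output components corresponding to zero values in the input. y != 0 is FALSE (that is, 0) where the input is zero, and na.fill replaces the NA entries of that mask with 1 so that the interpolated positions are kept: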

na.fill(y != 0, 1) * na.spline(y)
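
Another option, sketched here as an assumption rather than anything prescribed by zoo, is to keep the spline output only where the input was NA and copy every observed value back unchanged. That makes every observed value, including the zeros, exactly equal to the input. The short vector y below is made up for illustration.

```r
library(zoo)

# Illustrative data: a leading NA, a run of zeros, and one interior NA
y <- c(NA, 0, 0, 0, 2.661698, NA, 7.319669, 10.800864)

filled <- na.spline(y)             # spline values at every position
observed <- !is.na(y)              # positions that had real data
filled[observed] <- y[observed]    # restore the original values exactly

filled
```

The same result can be written in one line as ifelse(is.na(y), na.spline(y), y).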