How to keep "true" NAs while aggregating hierarchical or grouped time series using aggts?

Question

I am using the aggts() function from the hts package to aggregate my hierarchical time series. The function replaces NAs with zeros before the time series are aggregated. This is useful if at least one of the observations is not NA. But if all observations for a given time point are NA, I want to keep NA instead of 0.

Edit (working example):

library(hts)

df <- data.frame(
  AB = c(5, 10, 15, NA, 25, 30, NA, 40)
  , AA = c(10, 20, 30, NA, 50, 60, 70, 80)
)

hts_object <- hts(df)

> aggts(hts_object)
Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4     0  0  0
5    75 25 50
6    90 30 60
7    70  0 70
8   120 40 80

But what I need is:

> aggts(hts_object)
Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4    NA NA NA
5    75 25 50
6    90 30 60
7    70 NA 70
8   120 40 80

Edit2 (after updating 'hts' package):

> aggts(hts_object)
Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4    NA NA NA
5    75 25 50
6    90 30 60
7    NA NA 70
8   120 40 80

This is not what I was expecting: the total at time 7 is now NA, although AA has a value there. Maybe this will be clearer with some background information. Due to Covid-19 I have to flag several monthly data points as outliers. If the observations at a given time are NA across all hierarchy levels, I would like to keep the NAs after aggregating the time series. But if only some of the observations at a specific hierarchy level are NA, the sum of the non-missing values is required.

My real life business examples are:

1. global outliers for all hierarchy levels (like for Covid-19)
   --> all aggregated time series should contain NA if all bottom time series are NA

2. products with different market entry time (some time series have leading NAs)
   --> aggregated levels require sum(na.rm = TRUE)

3. classic missing observations
   --> aggregated levels require sum(na.rm = TRUE) and maybe interpolating is required beforehand
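
The rule described in these cases (sum the non-missing values, but keep NA when every bottom-level series is missing) can be sketched in base R without aggts(). This is only an illustration for the two-level example above: sum_keep_all_na is a hypothetical helper, not part of the hts package, and a deeper hierarchy would need the same rule applied per group.

```r
# Hypothetical helper (not part of hts): row-wise sum that ignores NAs
# but returns NA when every value in a row is missing.
sum_keep_all_na <- function(m) {
  m <- as.matrix(m)
  total <- rowSums(m, na.rm = TRUE)        # all-NA rows give 0 at this point
  total[rowSums(!is.na(m)) == 0] <- NA     # restore NA where nothing was observed
  total
}

# Bottom-level series from the question
df <- data.frame(
  AB = c(5, 10, 15, NA, 25, 30, NA, 40),
  AA = c(10, 20, 30, NA, 50, 60, 70, 80)
)

# Total plus the untouched bottom-level series
agg <- ts(cbind(Total = sum_keep_all_na(df), as.matrix(df)))
agg
```

With the example data, the total is NA at time 4 (both series missing) and 70 at time 7 (only AB missing), while the NAs in the bottom-level columns are left as they are.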

Rob Hyndman · Accepted Answer · 2020-09-09T00:20:52

Now fixed in the dev version at https://github.com/earowang/hts

If the NAs are actually zeros, then replace them with zeros.

If they are really missing, then by definition the sum must be NA because you cannot know its value. You could replace them with estimates, or proceed with the data containing NAs. Some models, including ARIMA models, will handle NAs without a problem.
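
The two replacement options can be sketched in base R before the data is passed to hts(). Linear interpolation with approx() is only one possible way to produce estimates and is an assumption here, not something the answer prescribes; na.interp() from the forecast package is another option.

```r
# AB series from the question
ab <- c(5, 10, 15, NA, 25, 30, NA, 40)

# Option 1: the NAs actually mean zero, so replace them with zeros.
ab_zero <- ab
ab_zero[is.na(ab_zero)] <- 0

# Option 2: the NAs are truly missing, so fill them with estimates.
# approx() drops the NA points and interpolates linearly between neighbours.
idx <- seq_along(ab)
ab_interp <- approx(idx, ab, xout = idx)$y

ab_zero
ab_interp
```

Note that with the default rule = 1, approx() does not extrapolate, so leading or trailing NAs (such as products with a later market entry) would remain NA.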
