0
votes

I am using the aggts() function from the hts package to aggregate my hierarchical time series. The function replaces NAs with zeros before the time series are aggregated. This is useful if at least one of the observations at a given time is not NA. But if all observations at a given time are NA, I want to keep NA instead of 0.

Edit (working example):

library(hts)

df <- data.frame(
  AB = c(5, 10, 15, NA, 25, 30, NA, 40)
  , AA = c(10, 20, 30, NA, 50, 60, 70, 80)
)

hts_object <- hts(df)

> aggts(hts_object)
Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4     0  0  0
5    75 25 50
6    90 30 60
7    70  0 70
8   120 40 80

But what I need is:

> aggts(hts_object)
Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4    NA NA NA
5    75 25 50
6    90 30 60
7    70 NA 70
8   120 40 80

Edit2 (after updating 'hts' package):

> aggts(hts_object)
Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4    NA NA NA
5    75 25 50
6    90 30 60
7    NA NA 70
8   120 40 80

This is not what I was expecting. Some background may make this clearer. Due to Covid-19 I have to flag several monthly data points as outliers. If the observations at a given time are NA for all bottom-level series, I would like to keep the NAs after aggregating. But if only some of the series are NA at that time, I need the sum of the observations that are available.

My real life business examples are:

  • global outliers for all hierarchy levels (like for Covid-19)

    --> all aggregated time series should contain NA if all bottom time series are NA

  • products with different market entry times (some time series have leading NAs)

    --> aggregated levels require sum(na.rm = TRUE)

  • classic missing observations

    --> aggregated levels require sum(na.rm = TRUE) and maybe interpolating is required beforehand

2 Answers

1
votes

Now fixed in the dev version at https://github.com/earowang/hts

If the NAs are actually zeros, then replace them with zeros.

If they are really missing, then by definition the sum must be NA because you cannot know its value. You could replace them with estimates, or proceed with the data containing NAs. Some models, including ARIMA models, will handle NAs without a problem.
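As a rough illustration using the AB column from the question, here is a minimal base R sketch. It assumes linear interpolation is an acceptable estimate, which may not be appropriate for months flagged as outliers; `ab`, `idx` and `ab_filled` are names made up for this example.

```r
# AB series from the question, with two missing observations
ab <- c(5, 10, 15, NA, 25, 30, NA, 40)

# A plain sum over missing values is NA, because the true total is unknown
sum(ab)

# One way to replace the NAs with estimates: linear interpolation
# between the neighbouring observed points, using base R's approx()
idx <- seq_along(ab)
ab_filled <- approx(idx[!is.na(ab)], ab[!is.na(ab)], xout = idx)$y
ab_filled
```

The filled series can then be used in place of the original column before building the hts object.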

0
votes

You can consider pre-processing the dataframe such that NA values are converted to zeros UNLESS the entire row contains only NAs:

library(hts)
library(dplyr)  # plyr must also be installed, since plyr::adply() is called below

df %>%
  
  # label which rows contain only NAs
  plyr::adply(1, 
              .fun = function(x) ifelse(all(is.na(x)), TRUE, FALSE)) %>%
  
  # in all columns with numeric data, convert NAs to zeros, UNLESS the row contains only NAs
  mutate(across(where(is.numeric),
                function(x) ifelse(is.na(x) & !V1, 0, x))) %>%
  
  # remove column of NA-only labels created in first step
  select(-V1) %>%
  
  hts() %>%
  aggts()

Result using dev version of hts::aggts:

Time Series:
Start = 1 
End = 8 
Frequency = 1 
  Total AB AA
1    15  5 10
2    30 10 20
3    45 15 30
4    NA NA NA
5    75 25 50
6    90 30 60
7    70  0 70
8   120 40 80
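
The same pre-processing can also be sketched in base R, without plyr or dplyr. This is only an alternative way to write the step, not part of the hts API; `all_na`, `df_clean` and `sum_or_na()` are hypothetical names used for illustration.

```r
df <- data.frame(
  AB = c(5, 10, 15, NA, 25, 30, NA, 40),
  AA = c(10, 20, 30, NA, 50, 60, 70, 80)
)

# TRUE for rows in which every bottom-level series is NA
all_na <- rowSums(!is.na(df)) == 0

# Set NAs to zero, except in rows that are NA everywhere.
# all_na has one entry per row, so it recycles down each column of the mask.
df_clean <- df
df_clean[is.na(df) & !all_na] <- 0

# Row-wise check of the intended rule: sum the available values,
# but return NA when nothing was observed at that time
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
total <- apply(df, 1, sum_or_na)
```

`df_clean` can then be passed to hts() and aggts() in the same way as the piped data frame above.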