Redundant dates in time series

Question

I'm setting up a time series from a data frame ("TotalGuirvidig") in which I have three main columns of interest: "Date", "Animals", and a third column I created called "Daily_Animals", the sum of all animal counts on the same day. I created "Daily_Animals" using

TotalGuirvidig <- Guirvidig %>% group_by(Date) %>% mutate(Daily_Animals = sum(Animals))

Great, but let's say the date is 07-11-2017, and we observed 40, 23, and 17 animals. Now, in the Daily_Animals column, we get the sum of 80 three times, once for each original observation.

I didn't think much of this at first, but as I'm shifting the data frame into a time series

GTS <- zoo(TotalGuirvidig$Daily_Animals, order.by=as.Date(TotalGuirvidig$Date, format='%Y/%m/%d'))
ts(GTS)

I'm noticing that dates with multiple observations show up multiple times in the series, because each observation on the same date carries the same Daily_Animals sum.

I'm planning on doing some forecasting and I'm concerned that these repeats of the daily sums will throw off my forecasts. Is there some way I can get R to ignore the repeated sums on the same date when establishing my time series?

Edit: Here is an example of the data frame

Year  Week       Date  Location Animals           From            To     Notes Daily_Animals
<int> <int>     <date>     <chr>   <int>          <chr>         <chr>     <chr>         <int>
1  2010    31 2010-08-01 GUIRVIDIG     580 Bongor – Tchad BANKI NIGERIA       RAS           580
2  2010    32 2010-08-08 GUIRVIDIG     780  Glenden Tchad BANKI NIGERIA       RAS           780
3  2010    33         NA GUIRVIDIG      NA           <NA>          <NA>      <NA>            NA
4  2010    34 2010-08-22 GUIRVIDIG     680 Bongor – Tchad BANKI NIGERIA   2 voles           680
5  2010    34 2010-08-23 GUIRVIDIG     880  Glenden Tchad BANKI NIGERIA       RAS           880
6  2010    35 2010-08-29 GUIRVIDIG     495 Bongor – Tchad BANKI NIGERIA       RAS           495
7  2010    35 2010-08-30 GUIRVIDIG     506  Glenden Tchad BANKI NIGERIA 1 malades           506
8  2010    36 2010-09-06 GUIRVIDIG     262   kijabe-tchad BANKI NIGERIA       RAS           262
9  2010    37 2010-09-13 GUIRVIDIG      70  Glenden Tchad BANKI NIGERIA       RAS            70
10  2010    38         NA GUIRVIDIG      NA           <NA>          <NA>      <NA>            NA
# ... with 484 more rows

Edit2: Below is a made-up example that actually demonstrates how the new column is being written

Year  Week       Date  Location Animals           From            To     Notes Daily_Animals
<int> <int>     <date>     <chr>   <int>          <chr>         <chr>     <chr>         <int>
1  2010    31 2010-08-01 GUIRVIDIG     40 Bongor – Tchad BANKI NIGERIA       RAS           80
2  2010    32 2010-08-01 GUIRVIDIG     23  Glenden Tchad BANKI NIGERIA       RAS           80
3  2010    34 2010-08-21 GUIRVIDIG     17 Bongor – Tchad BANKI NIGERIA   2 voles           80

Sarah Sarah · Accepted Answer · 2017-07-12T06:40:56

If you don't need to keep a row for each of the individual animal counts, then you probably want dplyr::summarise instead of mutate: mutate just adds a column and keeps all rows, whereas summarise returns one row per group.

?summarise 
TotalGuirvidig <- Guirvidig %>% group_by(Date) %>% summarise(Daily_Animals = sum(Animals))
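To make the difference concrete, here is a minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical, chosen to mirror the 40, 23 and 17 counts from the question). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: three observations on one date, one on another
toy <- tibble(
  Date    = as.Date(c("2017-07-11", "2017-07-11", "2017-07-11", "2017-07-12")),
  Animals = c(40L, 23L, 17L, 55L)
)

# mutate() keeps all four rows and repeats the daily sum on each of them
with_mutate <- toy %>%
  group_by(Date) %>%
  mutate(Daily_Animals = sum(Animals)) %>%
  ungroup()

# summarise() collapses each date to a single row holding the daily sum
daily <- toy %>%
  group_by(Date) %>%
  summarise(Daily_Animals = sum(Animals))

nrow(with_mutate)  # one row per original observation
nrow(daily)        # one row per distinct date
```

Note that summarise drops every column that is not a grouping variable or a summary, so columns such as Location or Notes would have to be summarised explicitly if they are still needed.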

Or you could use dplyr::distinct after mutate and choose which columns you want to keep distinct values from.

?distinct
TotalGuirvidig <- Guirvidig %>% group_by(Date) %>% mutate(Daily_Animals = sum(Animals)) %>%
          distinct(Date, .keep_all = TRUE)

It's a little hard to see exactly what is best, as there are no duplicated Daily_Animals in your example data, so if the above doesn't work, maybe show a section with duplicates and the desired output?
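As a rough end-to-end sketch, the summarised data can then be handed to zoo so that each date appears once in the index. The data below are made up, and two choices here are assumptions rather than anything stated in the question: rows with a missing Date are dropped, and missing counts are ignored with na.rm = TRUE.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data, including a row with missing Date and Animals
toy <- tibble(
  Date    = as.Date(c("2010-08-01", "2010-08-01", "2010-08-01", NA, "2010-08-08")),
  Animals = c(40L, 23L, 17L, NA, 55L)
)

daily <- toy %>%
  filter(!is.na(Date)) %>%                               # assumption: undated rows are not wanted
  group_by(Date) %>%
  summarise(Daily_Animals = sum(Animals, na.rm = TRUE))  # assumption: ignore missing counts

# One observation per date, so the zoo index has no repeated dates
GTS <- zoo(daily$Daily_Animals, order.by = daily$Date)

anyDuplicated(index(GTS))  # 0 means no date is repeated
```

Whether dropping the undated weeks is appropriate depends on the forecasting method; some methods expect a regularly spaced series, in which case the gaps may need to be kept as explicit NA values instead.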
