0
votes

I have a problem with a time series that I don't know how to solve.

I have a tibble with 4 variables. My real dataset has over 10,000 documents.

document   date             author            label
1          2018-04-05       Mr.X                    1
2          2018-02-05       Mr.Y                    0
3          2018-04-17       Mr.Z                    1

As a first step, I want to count the articles that occur in each month of each year, for every month in my time series. I know that I can filter for a specific month in a year like this:

tibble %>%
  filter(date >= "2018-02-01" & date <= "2018-02-28")

This returns a tibble with 1 observation, but my problem is that I have 360 different time periods in my data. Can I write a function to solve this, or do I need to make 360 separate calculations?

The best solution for me would be a table with 360 columns, where each column holds the number of articles counted in that month. Is this possible?

Thank you so much in advance.

2
Sorry, I need every item as a separate list, ideally with an integer giving how many documents are counted in the specific month. - Sylababa
Can you please include the desired outcome for the sample data given? I am unable to understand. - AnilGoyal
Though I have given a solution that splits it into different time periods. Doesn't that meet your requirement? - AnilGoyal

2 Answers

1
votes

If you want each result as a separate list element, you can do something like this:

suppressMessages(library(dplyr))

df %>%
  mutate(date = as.Date(date)) %>%
  group_split(substr(date, 1, 7), .keep = FALSE)

<list_of<
  tbl_df<
    document: integer
    date    : date
    author  : character
    label   : integer
  >
>[2]>
[[1]]
# A tibble: 1 x 4
  document date       author label
     <int> <date>     <chr>  <int>
1        2 2018-02-05 Mr.Y       0

[[2]]
# A tibble: 2 x 4
  document date       author label
     <int> <date>     <chr>  <int>
1        1 2018-04-05 Mr.X       1
2        3 2018-04-17 Mr.Z       1

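You can then use list2env() to save each element of this list as a separate object. Note that the list has to be named first, because group_split() returns an unnamed list and list2env() needs names.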
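
As a rough sketch of the list2env() step, and of the per-month integer counts asked for in the comments: this version swaps group_split() for base split(), because split() names each list element after its month. The sample data is copied from the question; the `docs_` prefix and the object names `by_month` and `n_docs` are made up for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  document = 1:3,
  date     = as.Date(c("2018-04-05", "2018-02-05", "2018-04-17")),
  author   = c("Mr.X", "Mr.Y", "Mr.Z"),
  label    = c(1L, 0L, 1L)
)

# split() names each element after its month, e.g. "2018-02"
by_month <- split(df, format(df$date, "%Y-%m"))

# Hypothetical naming scheme: "docs_2018_02" is a syntactic R name
names(by_month) <- paste0("docs_", sub("-", "_", names(by_month), fixed = TRUE))

# Create one object per month in the global environment
invisible(list2env(by_month, envir = .GlobalEnv))

# One integer per month: how many documents fall in it
n_docs <- lapply(by_month, nrow)
```

With 360 months this creates 360 objects in the workspace, so keeping everything in the named list `by_month` and indexing it (for example `by_month[["docs_2018_02"]]`) is probably easier to work with.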

1
votes

To count the number of rows for each month-year combination, in the tidyverse you can do:

library(dplyr)
library(tidyr)

df %>%
  mutate(date = as.Date(date), 
         year_mon = format(date, '%Y-%m')) %>%
  select(year_mon) %>%
  pivot_wider(names_from = year_mon, values_from = year_mon, 
              values_fn = length, values_fill = 0)

#   `2018-04` `2018-02`
#      <int>     <int>
#1         2         1
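
One thing to watch with 360 periods: a month with no articles produces no column at all, and the columns come out in order of first appearance. A possible variation, assuming you want every month between the first and last date in chronological order, is to count in long format and fill the gaps with tidyr::complete() before widening. The names `month`, `n_docs`, `monthly` and `wide` are only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  document = 1:3,
  date     = c("2018-04-05", "2018-02-05", "2018-04-17"),
  author   = c("Mr.X", "Mr.Y", "Mr.Z"),
  label    = c(1L, 0L, 1L)
)

monthly <- df %>%
  # Represent each month by its first day so it stays a Date
  mutate(month = as.Date(format(as.Date(date), "%Y-%m-01"))) %>%
  count(month, name = "n_docs") %>%
  # Add rows for months without any article and give them a count of 0
  complete(month = seq(min(month), max(month), by = "month"),
           fill = list(n_docs = 0L))

# Optional: one column per month, in chronological order
wide <- monthly %>%
  mutate(month = format(month, "%Y-%m")) %>%
  pivot_wider(names_from = month, values_from = n_docs)
```

The long table `monthly` (one row per month) is usually easier to plot or join than a 360-column table, so it may be worth keeping even if you also need the wide version.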

And in base R:

df$date <- as.Date(df$date)
table(format(df$date, '%Y-%m'))
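
If you need the base R result as a data frame for further processing rather than a printed table, something along these lines should work. It is only a sketch using the sample data from the question; the column names `year_mon` and `n_docs` are chosen here for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  document = 1:3,
  date     = c("2018-04-05", "2018-02-05", "2018-04-17"),
  author   = c("Mr.X", "Mr.Y", "Mr.Z"),
  label    = c(1L, 0L, 1L)
)

df$date <- as.Date(df$date)

# Count documents per year-month; the table is sorted by month label
counts <- table(format(df$date, "%Y-%m"))

# Convert to a data frame with one row per month
counts_df <- as.data.frame(counts, responseName = "n_docs",
                           stringsAsFactors = FALSE)
names(counts_df)[1] <- "year_mon"
```

Like the tidyverse version, table() only lists months that actually occur in the data.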