0
votes

I'm trying to create a monthly time series in ggplot for time series analysis. This is my data:

rdata1 <- read_table2("date  sales_revenue_incl_credit 
                                    2017-07 56037.46
                                    2017-08 38333.9
                                    2017-09 48716.92
                                    2017-10 65447.67
                                    2017-11 134752.57
                                    2017-12 116477.39
                                    2018-01 78167.25
                                    2018-02 75991.44
                                    2018-03 42520.93
                                    2018-04 70489.92
                                    2018-05 121063.35
                                    2018-06 76308.47
                                    2018-07 118085.7
                                    2018-08 96153.38
                                    2018-09 82827.1
                                    2018-10 109288.83
                                    2018-11 145774.52
                                    2018-12 141572.77
                                    2019-01 123055.83
                                    2019-02 104232.24
                                    2019-03 435086.33
                                    2019-04 74304.96
                                    2019-05 117237.82
                                    2019-06 82013.47
                                    2019-07 99382.67
                                    2019-08 138455.2
                                    2019-09 97301.99
                                    2019-10 137206.09
                                    2019-11 109862.44
                                    2019-12 118150.96
                                    2020-01 140717.9
                                    2020-02 127622.3
                                    2020-03 134126.09")

I then use the code below to convert date to the Date class and plot it, so that I can set the axis breaks and labels with date_breaks and date_labels.

rdata1 %>%
  mutate(date = ymd(date)) %>%
  ggplot(aes(date, sales_revenue_incl_credit)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, vjust=0.5), 
        panel.grid.minor = element_blank())

I get the following error:

Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) : 'from' must be a finite number

It seems the ymd() function didn't pick up your dates properly. Try mutate(date = ymd(paste0(date, "-01"))). - teunbrand
+1 @teunbrand. Test ymd(rdata1$date[1]) and you'll see that the result is NA. Even as.Date(rdata1$date[1], format = "%Y-%m") fails, because a Date needs a day as well. The suggestion is to append "-01" to each value in your column; then ymd() will work, and so will as.Date() if you specify format = "%Y-%m-%d". - chemdork123
Thank you, guys. It works. - axel_p
Just one last question, since I don't want to start another thread for it: how do I give row names to my monthly time series data? For example, with yearly data I would use rownames(data) <- seq(from=1927, to=2016). Any idea how to do this for months? - axel_p
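
The diagnosis in the comments can be checked on a couple of values. This is a minimal sketch, assuming lubridate is installed; the vector x is a hypothetical stand-in for the date column, and truncated = 1 is offered as an alternative to pasting "-01" that the comments do not mention.

```r
library(lubridate)

# Hypothetical stand-in for the year-month strings in the date column
x <- c("2017-07", "2017-08")

# A year-month string has no day component, so ymd() fails to parse it
# and returns NA (warnings are suppressed here to keep the output clean).
bad <- suppressWarnings(ymd(x))
is.na(bad)

# Appending a day gives ymd() a complete date to work with.
good <- ymd(paste0(x, "-01"))
class(good)

# Alternative: allow one trailing component (the day) to be missing.
also_good <- ymd(x, truncated = 1)
also_good
```

Either way, each month ends up on its first day, which is what scale_x_date() needs.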

2 Answers

1
votes

Putting these concerns together, I did some data preparation to get your desired output. First, as noted in the comments, I appended the first day of the month to each "year-month" string so that you can work with a proper Date variable in R. Next, I built a month_year label and passed it to column_to_rownames(). The label combines the month name with the year because duplicate (non-unique) row names are not permitted. I should caution you against using row names, though. Quoting from the documentation (see ?tibble::rownames_to_column):

While a tibble can have row names (e.g., when converting from a regular data frame), they are removed when subsetting with the [ operator. A warning will be raised when attempting to assign non-NULL row names to a tibble. Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column.

You can manipulate the row names below with different naming conventions. Just make sure the labels are unique! See the R code below:

# Loading the required libraries

library(tibble)
library(ggplot2)
library(dplyr)
library(lubridate)

df <- tribble( 
  ~date, ~sales_revenue_incl_credit,
  "2017-07", 56037.46,
  "2017-08", 38333.9,
  "2017-09", 48716.92,
  "2017-10", 65447.67,
  "2017-11", 134752.57,
  "2017-12", 116477.39,
  "2018-01", 78167.25,
  "2018-02", 75991.44,
  "2018-03", 42520.93,
  "2018-04", 70489.92,
  "2018-05", 121063.35,
  "2018-06", 76308.47,
  "2018-07", 118085.7,
  "2018-08", 96153.38,
  "2018-09", 82827.1,
  "2018-10", 109288.83,
  "2018-11", 145774.52,
  "2018-12", 141572.77,
  "2019-01", 123055.83,
  "2019-02", 104232.24,
  "2019-03", 435086.33,
  "2019-04", 74304.96,
  "2019-05", 117237.82,
  "2019-06", 82013.47,
  "2019-07", 99382.67,
  "2019-08", 138455.2,
  "2019-09", 97301.99,
  "2019-10", 137206.09,
  "2019-11", 109862.44,
  "2019-12", 118150.96,
  "2020-01", 140717.9,
  "2020-02", 127622.3,
  "2020-03", 134126.09
  )

# Data preparation

df %>%
  mutate(date = ymd(paste0(date, "-01")),
         month_year = paste(month(date, label = TRUE), year(date), sep = "-")
         ) %>%
  column_to_rownames("month_year") %>%  # moves the month_year column into the row names
  head()

# Preview of the data frame with row names (e.g., Jul-2017, Aug-2017, Sep-2017, etc.)

               date sales_revenue_incl_credit
Jul-2017 2017-07-01                  56037.46
Aug-2017 2017-08-01                  38333.90
Sep-2017 2017-09-01                  48716.92
Oct-2017 2017-10-01                  65447.67
Nov-2017 2017-11-01                 134752.57
Dec-2017 2017-12-01                 116477.39

# Reproducing your plot

df %>%
  mutate(date = ymd(paste0(date, "-01"))) %>%  # df still holds "YYYY-MM" strings, so convert again
  ggplot(aes(x = date, y = sales_revenue_incl_credit)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5), 
        panel.grid.minor = element_blank())
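
For the direct monthly analogue of rownames(data) <- seq(from=1927, to=2016), base R's seq() can step a Date by month. The sketch below is only an illustration: the sales data frame is a hypothetical placeholder with 33 rows (the number of months in your data), and the "%Y-%m" label format is one possible naming convention.

```r
# Hypothetical stand-in for your data: 33 monthly rows, July 2017 onwards
sales <- data.frame(sales_revenue_incl_credit = seq_len(33))

# Monthly analogue of seq(from = 1927, to = 2016): step a Date by one month
months <- seq(from = as.Date("2017-07-01"), by = "month",
              length.out = nrow(sales))

# "%Y-%m" labels are numeric, so they do not depend on the locale
labels <- format(months, "%Y-%m")

# Row names must be unique; stop early if they are not
stopifnot(anyDuplicated(labels) == 0)

rownames(sales) <- labels
head(sales, 3)
```

The same caution applies as above: keeping months as an ordinary Date column is usually easier to work with than row names.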
0
votes

A simpler version of @Tom's answer is to use a tsibble object and the feasts package:

# Loading the required libraries

library(tibble)
library(dplyr)
library(ggplot2)
library(lubridate)
library(tsibble)
library(feasts)

# Data preparation

df <- tribble( 
    ~date, ~sales_revenue_incl_credit,
    "2017-07", 56037.46,
    "2017-08", 38333.9,
    "2017-09", 48716.92,
    "2017-10", 65447.67,
    "2017-11", 134752.57,
    "2017-12", 116477.39,
    "2018-01", 78167.25,
    "2018-02", 75991.44,
    "2018-03", 42520.93,
    "2018-04", 70489.92,
    "2018-05", 121063.35,
    "2018-06", 76308.47,
    "2018-07", 118085.7,
    "2018-08", 96153.38,
    "2018-09", 82827.1,
    "2018-10", 109288.83,
    "2018-11", 145774.52,
    "2018-12", 141572.77,
    "2019-01", 123055.83,
    "2019-02", 104232.24,
    "2019-03", 435086.33,
    "2019-04", 74304.96,
    "2019-05", 117237.82,
    "2019-06", 82013.47,
    "2019-07", 99382.67,
    "2019-08", 138455.2,
    "2019-09", 97301.99,
    "2019-10", 137206.09,
    "2019-11", 109862.44,
    "2019-12", 118150.96,
    "2020-01", 140717.9,
    "2020-02", 127622.3,
    "2020-03", 134126.09
  ) %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index=date)

# Reproducing your plot

df %>% autoplot(sales_revenue_incl_credit) +
  scale_x_yearmonth(breaks=seq(1e3)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5), 
        panel.grid.minor = element_blank())

Created on 2020-06-19 by the reprex package (v0.3.0)
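
If you would rather stay in base R for the time series analysis, a ts object carries its own monthly index, so row names are not needed at all. This is a rough sketch, not part of the answer above: values holds only the first six figures from the question, and in practice you would pass the full sales_revenue_incl_credit column.

```r
# First six values from the question, used here as a short stand-in
values <- c(56037.46, 38333.90, 48716.92, 65447.67, 134752.57, 116477.39)

# frequency = 12 marks the series as monthly; start = c(2017, 7) is July 2017
sales_ts <- ts(values, start = c(2017, 7), frequency = 12)

# The index is stored with the object
start(sales_ts)
end(sales_ts)

# Subset by (year, month) instead of by row name
window(sales_ts, start = c(2017, 10))
```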