
I've created three datasets (banks2016, banks2017, banks2018), each filtered by year. I've made a single plot with the three datasets, so there are three lines in different colours.

The issue is that, because the transactions are weekly, I get four points per month, and they all sit at that month's position on the x-axis. E.g. if I've got payments on 1-1-16, 8-1-16, 15-1-16 and 22-1-16, they all show on the January tick. Ideally I'd like the line and points to be spread out between January and February.

I've tried a few different things, including date_breaks from the scales package. I've also tried changing the way I use lubridate, but to no avail. Any suggestions?

Below is my code.

ggplot(rbind(banks2016,banks2017,banks2018), 
       aes(month(Date, label=TRUE, abbr=TRUE), Balance, 
       group = factor(year(Date)), colour=factor(year(Date)))) +  
  geom_line() +
  geom_point() +
  labs(x="Month", colour="Year") +
  theme_classic()

Below is a dput of banks2016. I want to plot the total balance against Date: one continuous line from week to week, but with months as the x-axis labels. Looking more closely at the data now, the dates are not always weekly as I initially thought, so I may have to rework the data.

structure(list(Date = structure(c(17038, 17038, 17038, 17031, 17029, 17024, 17022, 17017, 17017, 17014, 17009, 17008, 16996, 16989, 16989, 16987, 16987, 16987, 16983), class = "Date"), Debits = c(NA, NA, 1686451.25, NA, NA, 3111755.91, NA, NA, 25100, 3.66, NA, NA, 313.26, NA, 1566.27, NA, NA, NA, 0.8), Credits = c(14693.48, 10250, NA, 409.25, 5655863.07, NA, 2304.45, 2443, NA, NA, 300, 122, NA, 8716.45, NA, 30000, 25000, 5993.6, NA), Balance = c(15824841.24, 15810147.76, 15799897.76, 17486349.01, 17485939.76, 11830076.69, 14941832.6, 14939528.15, 14937085.15, 14962185.15, 14962188.81, 14961888.81, 14961766.81, 14962080.07, 14953363.62, 14954929.89, 14924929.89, 14899929.89, 14893936.29)), row.names = c(NA, -19L ), class = "data.frame")

Can you add your data (e.g. the output from dput(head(banks201x))) to the question as an edit? (Not in the comments, please.) - morgan121
Also, could you please clarify what you'd like the Jan and Feb points to represent? Totals? Averages? The first or last of the four points? - Jon Spring

1 Answer


It sounds like you want the x-axis to show January - December, and each line to show the balance over time for a separate calendar year; is that right? If so, one technique (described in this excellent answer) is to create a new date column that puts all the dates in the same year, and plot that, but group by the year in the real date. Here's how that would look for your dataset:

library(ggplot2)
library(lubridate)
library(dplyr)

# Posted dataset.
banks = structure(list(Date = structure(c(17038, 17038, 17038, 17031, 17029, 17024, 17022, 17017, 17017, 17014, 17009, 17008, 16996, 16989, 16989, 16987, 16987, 16987, 16983), class = "Date"), Debits = c(NA, NA, 1686451.25, NA, NA, 3111755.91, NA, NA, 25100, 3.66, NA, NA, 313.26, NA, 1566.27, NA, NA, NA, 0.8), Credits = c(14693.48, 10250, NA, 409.25, 5655863.07, NA, 2304.45, 2443, NA, NA, 300, 122, NA, 8716.45, NA, 30000, 25000, 5993.6, NA), Balance = c(15824841.24, 15810147.76, 15799897.76, 17486349.01, 17485939.76, 11830076.69, 14941832.6, 14939528.15, 14937085.15, 14962185.15, 14962188.81, 14961888.81, 14961766.81, 14962080.07, 14953363.62, 14954929.89, 14924929.89, 14899929.89, 14893936.29)), row.names = c(NA, -19L ), class = "data.frame")
# The posted dataset is for only one year (2016).  Duplicate it for 2017 and
# 2018, and change the balances a bit, so we can see the grouping.
banks = bind_rows(
  banks,
  banks %>%
    mutate(Date = Date + years(1),
           Balance = Balance * 1.1),
  banks %>%
    mutate(Date = Date + years(2),
           Balance = Balance * 1.2)
)

# Add a utility "date for plotting" field that puts all the dates in the year
# 2000.
banks = banks %>%
  mutate(DateToPlot = Date - years(year(Date) - 2000))

# Plot Balance as a function of DateToPlot.  Group/color by year.  Make the
# x-axis labels look pretty.
ggplot(banks, 
       aes(x = DateToPlot, y = Balance,
           group = factor(year(Date)), colour=factor(year(Date)))) +  
  geom_line() +
  geom_point() +
  scale_x_date(date_breaks = "1 month",
               date_labels = "%B") +
  labs(x="Month", colour="Year") +
  theme_classic()
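
If it helps to see why the original plot stacked the weekly points on one tick, here is a small sketch using the four example dates from the question (read as day-month-year, which is my assumption). The names dates, month_labels and dates_to_plot are made up for illustration. month() with label = TRUE returns an ordered factor, so every date in a month maps to the same x position, whereas shifting the dates into a common year keeps the day-level spacing:

```r
library(lubridate)

# The four weekly payment dates from the question's example
# (assumed to be day-month-year, i.e. all in January 2016).
dates <- as.Date(c("2016-01-01", "2016-01-08", "2016-01-15", "2016-01-22"))

# month(..., label = TRUE) returns an ordered factor of month names, so all
# four dates collapse to a single level and therefore a single x position.
month_labels <- month(dates, label = TRUE, abbr = TRUE)
print(length(unique(month_labels)))

# Shifting every date into the year 2000 (same expression as DateToPlot
# above) keeps them as real dates, still one week apart.
dates_to_plot <- dates - years(year(dates) - 2000)
print(dates_to_plot)
```

The choice of 2000 as the common year is arbitrary, but it is a leap year, so a 29 February in the real data still has a valid date to map to.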