1
votes

I want to plot a line graph showing no. of articles published over time. X axis should be date, Y axis should be no. of articles published that day. My dataframe looks something like

Article_title   Date_Published
Title 1         2016-08-11
Title 2         2016-08-11
Title 3         2016-08-11
Title 4         2016-08-12
Title 5         2016-08-13
Title 6         2016-08-13
Title 7         2016-08-14
Title 8         2016-08-14
Title 9         2016-08-14
Title 10        2016-08-14

So what I'm looking for is to count the data by date, so that I can plot the daily counts as a line graph.

Not bothered with the actual formatting, I just want to know how to specify the aes() mapping in ggplot().

Thanks in advance.

2
ggplot(dat, aes(x=Date_Published)) + stat_count(geom='line', aes(y=..count..)), assuming your data frame is called dat. ..count.. is an internal variable that stat_count creates to store the count values. - eipi10
Thanks, this worked perfectly! - jceg316
Note a line graph is misleading here, since there's no definition of publication count between the time points. Should probably use bars instead. - Spacedman
Hey @eipi10, if I wanted to add geom_smooth, putting +geom_smooth() on the end results in an error (missing y aesthetic). Is there a better way to add geom_smooth? Thanks. - jceg316
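
For anyone wanting to try the stat_count one-liner from the comments, here is a minimal sketch on the sample data from the question. It assumes the data frame is called dat and that Date_Published has been converted with as.Date; both are assumptions, since the question does not show the column types. It uses after_stat(count), which recent ggplot2 releases prefer over the older ..count.. spelling, and also builds the bar version suggested in the comments.

```r
library(ggplot2)

# Sample data reconstructed from the question (the name `dat` is an assumption)
dat <- data.frame(
  Article_title = paste("Title", 1:10),
  Date_Published = as.Date(c(
    "2016-08-11", "2016-08-11", "2016-08-11",
    "2016-08-12",
    "2016-08-13", "2016-08-13",
    "2016-08-14", "2016-08-14", "2016-08-14", "2016-08-14"
  ))
)

# Line version: stat_count tallies rows per date, after_stat(count) maps the tally to y
p_line <- ggplot(dat, aes(x = Date_Published)) +
  stat_count(geom = "line", aes(y = after_stat(count)))

# Bar version: geom_bar() uses stat_count by default, so no y aesthetic is needed
p_bar <- ggplot(dat, aes(x = Date_Published)) +
  geom_bar()
```

Printing p_line or p_bar draws the plot. Neither plot has a y aesthetic in the data itself, which is why adding geom_smooth() directly on top fails.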

2 Answers

3
votes

Use table to get the counts. Given sample data thus:

> data
   Article_title Date_Published
1        Title 1     2016-08-11
2        Title 2     2016-08-11
3        Title 3     2016-08-11
4        Title 4     2016-08-12
5        Title 5     2016-08-13
6        Title 6     2016-08-13
7        Title 7     2016-08-14
8        Title 8     2016-08-14
9        Title 9     2016-08-14
10      Title 10     2016-08-14

table on the date column does the counts for you:

> table(data$Date_Published)

2016-08-11 2016-08-12 2016-08-13 2016-08-14 
         3          1          2          4 

Wrap table in data.frame and setNames to get a neat data frame suitable for ggplot:

> setNames(data.frame(table(data$Date_Published)),c("Date","Count"))
        Date Count
1 2016-08-11     3
2 2016-08-12     1
3 2016-08-13     2
4 2016-08-14     4

Then do a line plot with that, using Date as x aesthetic and Count as y.

You might also want to convert that column to actual Date objects (for example with as.Date), since data.frame(table(...)) returns it as a factor. As posted, you've not given a reproducible example.
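
Putting those steps together, a minimal sketch might look like the following. The sample data is rebuilt from the question with Date_Published stored as character, and the object name counts is just an illustrative choice.

```r
library(ggplot2)

# Sample data rebuilt from the question; dates stored as plain character strings
data <- data.frame(
  Article_title = paste("Title", 1:10),
  Date_Published = c(
    "2016-08-11", "2016-08-11", "2016-08-11",
    "2016-08-12",
    "2016-08-13", "2016-08-13",
    "2016-08-14", "2016-08-14", "2016-08-14", "2016-08-14"
  )
)

# Count rows per date and give the columns readable names
counts <- setNames(data.frame(table(data$Date_Published)), c("Date", "Count"))

# table() leaves Date as a factor; convert it so ggplot uses a date axis
counts$Date <- as.Date(as.character(counts$Date))

# Line plot with Date on x and Count on y
p <- ggplot(counts, aes(x = Date, y = Count)) +
  geom_line()
```

If the Date column were left as a factor, geom_line would also need group = 1 in aes() to connect the points.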

2
votes

This is count data, so we'll use a Poisson model. Compared with my comment, I've switched to pre-summarizing the data in the initial call to ggplot, since geom_smooth needs an explicit y aesthetic anyway. I've also changed to a point geom, which makes more sense if you're adding a smoothing line. The Poisson model assumes the counts are independent. If they're serially correlated, a time series model would be more appropriate.

library(tidyverse)

# count() collapses dat to one row per date, with the tally in column n
ggplot(dat %>% count(Date_Published), aes(x = Date_Published, y = n)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = poisson)) +
  scale_y_continuous(limits = c(0,9), breaks = seq(0,9,2)) +
  labs(x = "Publication Date")
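
To look at the independence assumption, one option is to fit the same kind of Poisson model outside ggplot and inspect the autocorrelation of its residuals. This is only a rough sketch: the sample data is rebuilt from the question, the names daily, day and fit are illustrative, and four data points are far too few for the autocorrelation to mean anything.

```r
library(dplyr)

# Sample data rebuilt from the question (the name `dat` is an assumption)
dat <- data.frame(
  Article_title = paste("Title", 1:10),
  Date_Published = as.Date(c(
    "2016-08-11", "2016-08-11", "2016-08-11",
    "2016-08-12",
    "2016-08-13", "2016-08-13",
    "2016-08-14", "2016-08-14", "2016-08-14", "2016-08-14"
  ))
)

# One row per date with the count in n, plus days since the first date as a numeric predictor
daily <- dat %>%
  count(Date_Published) %>%
  mutate(day = as.numeric(Date_Published - min(Date_Published)))

# Poisson GLM of daily count on time, analogous to the geom_smooth fit
fit <- glm(n ~ day, family = poisson, data = daily)

# Autocorrelation of the Pearson residuals; large values at low lags hint at serial correlation
res <- residuals(fit, type = "pearson")
res_acf <- acf(res, plot = FALSE)
```

With a realistically long series, clearly non-zero autocorrelation at short lags would be a reason to move to a time series model.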
