0
votes

I have a data.frame referencing scientific papers on a specific subject. All these papers were published between 1985 and 2015.

The structure of my data.frame (my_db) is:

"Authors" , "Title"       , "Year"
"Doe, J." , "Hello world" , 1985
"Foo, B." , "Baz"         , 2013
...

I want to plot a histogram showing the number of entries per year. I also want the histogram drawn as a single line rather than bars. Therefore, I plot my graph as:

ggplot(my_db, aes(Year)) + stat_bin(geom = "line", binwidth = 1, center = 1980)

I get the following graph:

[Image: the resulting line plot, which starts at 1985; a red line marks the desired zero counts before 1985]

The line starts at 1985 (which is the oldest entry I have). However, I'd like the line to show 0s for the years before 1985 (see red line), so that it is more obvious there are no data prior to 1985. I've tried coord_cartesian(xlim = c(1980,2015)): it changes the span of the graph, but the line remains the same...
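For anyone who wants to reproduce this, here is a minimal sketch. The data below are invented for illustration (they are not my real entries), and the ggplot2 part only runs if the package is installed. As far as I understand, coord_cartesian() only zooms the view and does not change which values get binned, which would explain why the line still starts at 1985.

```r
# Hypothetical stand-in for my_db: invented entries, not the real data.
# Each year from 1985 to 2015 appears at least once.
years <- rep(1985:2015, times = seq_len(31))
my_db <- data.frame(
  Authors = paste0("Author", seq_along(years)),
  Title   = paste("Paper", seq_along(years)),
  Year    = years
)

# The oldest entry is from 1985, so nothing exists to count before that
range(my_db$Year)

# Counting per observed year gives no rows at all for 1980-1984
counts <- table(my_db$Year)
head(counts)

# Same plot as in the question, with the zoom that did not help
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(my_db, aes(Year)) +
    stat_bin(geom = "line", binwidth = 1, center = 1980) +
    coord_cartesian(xlim = c(1980, 2015))
  print(p)
}
```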

1
Wouldn't it be better to use geom_density() instead of stat_bin()? - PavoDive
@PavoDive ggplot(my_db, aes(Year)) + geom_density(aes(window = "rectangular",from = 1980), stat = "count") is indeed a great start, but the "binwidth" seems too wide... - ebosi

1 Answer

1
votes

Following PavoDive's suggestion, this plot did the job:

ggplot(my_db, aes(Year)) + geom_freqpoly(aes(y = ..count..), binwidth = 1, center = 1980) + scale_x_continuous(limits = c(1980, 2015))

[Image: resulting plot]
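If you would rather not depend on how the stat pads its bins, another option is to count the entries per year yourself over the whole 1980 to 2015 range, so that years without papers get an explicit 0, and then draw the counts with geom_line(). This is only a sketch: the sample my_db and the names all_years and year_counts are made up for illustration, and the plotting step is skipped if ggplot2 is not installed.

```r
# Hypothetical sample data standing in for my_db (invented values)
my_db <- data.frame(Year = c(1985, 1985, 1990, 2013, 2013, 2013, 2015))

# Fix the factor levels to every year of interest so that table()
# reports a count of 0 for years with no entries
all_years <- 1980:2015
counts <- table(factor(my_db$Year, levels = all_years))

# One row per year, including the zero-count years before 1985
year_counts <- data.frame(
  Year  = all_years,
  Count = as.vector(counts)
)

# Plot the precomputed counts as a single line
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(year_counts, aes(Year, Count)) + geom_line()
  print(p)
}
```

Because the zeros are stored in year_counts itself, the line reaches back to 1980 regardless of the axis limits.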