0
votes

Using ggplot2 in R, I want to plot a histogram whose bins start exactly at the minimum value of the dataset and end exactly at the maximum value.

When I add vertical lines at the minimum and maximum, the histogram bins extend past those values. I have tried narrowing the bins, changing the number of bins, and reducing the space between them, but nothing helped.

bins <- 5
# Bin width chosen so that `bins` bins span the observed range of deltaQ
bwidth <- (max(data$deltaQ) - min(data$deltaQ)) / bins
ggplot(data = data) +
  geom_histogram(
    mapping = aes(x = deltaQ),
    binwidth = bwidth,
    na.rm = TRUE,
    fill = "yellow",
    color = "black",
    position = "stack",   # other options: "identity", "dodge"
    boundary = 0
  ) +
  geom_vline(xintercept = min(data$deltaQ), color = "green", na.rm = TRUE, mapping = aes(size = 5)) +
  geom_vline(xintercept = max(data$deltaQ), color = "green", na.rm = TRUE, mapping = aes(size = 5)) +
  geom_vline(mapping = aes(size = 5), xintercept = min(data$deltaQMin), color = "red", na.rm = TRUE, linetype = "longdash") +
  geom_vline(mapping = aes(size = 5), xintercept = max(data$deltaQMin), color = "red", na.rm = TRUE, linetype = "longdash") +
  geom_vline(mapping = aes(size = 5), xintercept = max(data$deltaQMax), color = "red", na.rm = TRUE, linetype = "longdash") +
  geom_vline(mapping = aes(size = 5), xintercept = min(data$deltaQMax), color = "red", na.rm = TRUE, linetype = "longdash") +
  xlim(-50, 50)

Currently, hist() and geom_histogram place a bin center at the minimum and at the maximum, so the outermost bins straddle those values. I need to rule out any bin crossing the minimum or maximum value.

1
We don't have your data, so we can't run this code, and we can't see any output, so we don't know exactly what problem you're describing. See here on making an R post that is easy to help with. – camille
Also, you can use xintercept inside the aes of geom_vline. Instead of using the same geom 6 times, you'd probably be better off reshaping your data to fit the ggplot paradigm and only calling that geom once. – camille
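
A rough sketch of what that suggestion could look like. The original data was not posted, so the data frame below is simulated and purely illustrative; the helper table `vlines` and its columns `x` and `kind` are hypothetical names, not anything from the question.

```r
library(ggplot2)

# Simulated stand-in for the question's data (the real data was not posted)
set.seed(1)
data <- data.frame(deltaQ = rnorm(200, sd = 10))
data$deltaQMin <- data$deltaQ - 5
data$deltaQMax <- data$deltaQ + 5

# One row per reference line, instead of one geom_vline() call per line
vlines <- data.frame(
  x    = c(range(data$deltaQ), range(data$deltaQMin), range(data$deltaQMax)),
  kind = rep(c("deltaQ", "deltaQMin/deltaQMax"), times = c(2, 4))
)

p <- ggplot(data, aes(x = deltaQ)) +
  geom_histogram(bins = 5, fill = "yellow", color = "black") +
  # xintercept is mapped inside aes(), so a single layer draws all six lines
  geom_vline(data = vlines, aes(xintercept = x, color = kind, linetype = kind)) +
  scale_color_manual(values = c("deltaQ" = "green", "deltaQMin/deltaQMax" = "red")) +
  scale_linetype_manual(values = c("deltaQ" = "solid", "deltaQMin/deltaQMax" = "longdash"))
p
```

Mapping color and linetype to `kind` also produces a legend, which the six separate calls would not.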

1 Answer

1
votes

Try setting the boundary argument to the min() or max() of the data in your call to geom_histogram.

Using the diamonds dataset from ggplot2, you can see that setting the boundary to min(diamonds$carat) gives you boundaries at the minimum and maximum values of the data. max(diamonds$carat) does the same.

library(tidyverse)

data(diamonds)
diamonds <- filter(diamonds, carat <= 1)

ggplot(diamonds, aes(x = carat)) +
  geom_histogram(boundary = min(diamonds$carat)) +
  geom_vline(aes(xintercept = min(carat)), color = 'red') +
  geom_vline(aes(xintercept = max(carat)), color = 'red')
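
If you want a fixed number of bins, as in the question, another option is to pass the bin edges explicitly through the `breaks` argument of geom_histogram. This is only a sketch under the assumption that equally spaced bins are wanted; `n_bins`, `x_min`, `x_max` and `edges` are illustrative names.

```r
library(ggplot2)

# Same subset of diamonds as in the example above
d <- subset(diamonds, carat <= 1)

n_bins <- 5                          # bin count taken from the question
x_min  <- min(d$carat, na.rm = TRUE)
x_max  <- max(d$carat, na.rm = TRUE)

# n_bins + 1 equally spaced edges: the first is x_min, the last is x_max
edges <- seq(x_min, x_max, length.out = n_bins + 1)

p <- ggplot(d, aes(x = carat)) +
  # explicit breaks override binwidth, bins, center and boundary
  geom_histogram(breaks = edges, fill = "yellow", color = "black") +
  geom_vline(xintercept = c(x_min, x_max), color = "red")
p
```

Because the edges are built from the data range itself, no bin can extend below the minimum or above the maximum.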
