1
votes

I want to compare two histograms in a single plot in R, but I couldn't work out how to implement it. My histograms are based on two sub-dataframes, obtained by dividing the dataset according to a type (Action, Adventure Family). My first histogram is:

split_action <- split(df, df$type)
dataset_action <- split_action$Action
hist(dataset_action$year)

split_adventure <- split(df, df$type)
dataset_adventure <- split_adventure$Adventure
hist(dataset_adventure$year)

I want to see how much the two overlap, comparing them by year in the same histogram. Thank you in advance.

2
Please give a minimal reproducible example in your question! - jogo

2 Answers

2
votes

Using the iris dataset, suppose you want to make a histogram of sepal length for each species. First, you can make one data frame per species (3 in total) by subsetting:

irissetosa<-subset(iris,Species=='setosa',select=c('Sepal.Length','Species'))
irisversi<-subset(iris,Species=='versicolor',select=c('Sepal.Length','Species'))
irisvirgin<-subset(iris,Species=='virginica',select=c('Sepal.Length','Species'))

Then make the histograms for these 3 data frames. Don't forget to set the argument "add" to TRUE for the second and third histograms, because you want to draw them on the same plot.

hist(irissetosa$Sepal.Length,col='red')
hist(irisversi$Sepal.Length,col='blue',add=TRUE)
hist(irisvirgin$Sepal.Length,col='green',add=TRUE)

You will get something like this. You can set the histogram breaks yourself; in this picture I set them to 15.
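
As a rough sketch of setting the breaks yourself: the variant below assumes you want all three histograms on one shared set of 15 bins, with semi-transparent fills so overlapping bars stay visible. The names `sl`, `brks`, `h` and `cols`, and the particular colours, are illustrative choices and not necessarily what produced the picture above.

```r
# Split sepal length by species: a named list with one numeric vector each
sl <- split(iris$Sepal.Length, iris$Species)

# One common set of 15 equal-width bins spanning all the data
brks <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 16)

# Pre-compute the counts so the y-axis can fit the tallest bar of any species
h <- lapply(sl, hist, breaks = brks, plot = FALSE)
ymax <- max(sapply(h, function(x) max(x$counts)))

# Semi-transparent red, blue and green (the 4th rgb() argument is alpha)
cols <- c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4), rgb(0, 1, 0, 0.4))

plot(h[[1]], col = cols[1], xlim = range(brks), ylim = c(0, ymax),
     main = "Sepal length by species", xlab = "Sepal.Length")
plot(h[[2]], col = cols[2], add = TRUE)
plot(h[[3]], col = cols[3], add = TRUE)
legend("topright", legend = names(sl), fill = cols)
```

Using identical breaks for every group means the bars line up, and fixing `ylim` up front keeps the later histograms from being cut off by the axis range of the first one.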

Then you can see which parts overlap, although admittedly it's not very clear. Another way to see the overlap is to use the density function:

plot(density(irissetosa$Sepal.Length),col='red')
lines(density(irisversi$Sepal.Length),col='blue')
lines(density(irisvirgin$Sepal.Length),col='green')

Then you will get something like this. You can set the bandwidth if you want to.
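
A minimal sketch of setting the bandwidth: `density()` accepts a `bw` argument, and giving every curve the same value makes them easier to compare. The value 0.2 is an arbitrary assumption for illustration. Computing the densities before plotting also lets you choose axis limits that fit all three curves, since `plot()` otherwise sizes the axes for the first curve only.

```r
# Split sepal length by species: a named list with one numeric vector each
sl <- split(iris$Sepal.Length, iris$Species)

# Same bandwidth for every species (0.2 is an illustrative choice)
d <- lapply(sl, density, bw = 0.2)

# Axis limits that fit all three curves
xr <- range(sapply(d, function(k) range(k$x)))
ymax <- max(sapply(d, function(k) max(k$y)))

cols <- c("red", "blue", "green")
plot(d[[1]], col = cols[1], xlim = xr, ylim = c(0, ymax),
     main = "Sepal length density by species")
lines(d[[2]], col = cols[2])
lines(d[[3]], col = cols[3])
legend("topright", legend = names(d), col = cols, lty = 1)
```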

Hope it helps!!

0
votes

You don't need to split the data if using ggplot. The key is to use transparency ("alpha") and change the value of the "position" argument to "identity" since the default is "stack".

Using the iris dataset:

library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
  geom_histogram(binwidth=0.2, alpha=0.5, position="identity") +
  theme_minimal()

It's not easy to see the overlap, so a density plot may be a better choice if that's the main objective. Again, use transparency to avoid obscuring overlapping plots.

ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
  geom_density(alpha=0.5) +
  xlim(3.9,8.5) +
  theme_minimal()

So for your data, the command would be something like this:

ggplot(data=df, aes(x=year, fill=type)) +
  geom_histogram(alpha=0.5, position="identity")
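
Since the question only compares Action and Adventure, a sketch along the following lines may be closer to what you need. The data frame here is made up purely so the example runs, because the real `df` was not posted; only the column names `year` and `type` come from the question, and `binwidth = 5` and the name `two_types` are arbitrary assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data; replace with the real df
set.seed(1)
df <- data.frame(
  year = c(sample(1980:2015, 200, replace = TRUE),
           sample(1960:2005, 150, replace = TRUE),
           sample(1970:2010, 100, replace = TRUE)),
  type = rep(c("Action", "Adventure", "Family"), times = c(200, 150, 100))
)

# Keep only the two types being compared; no split() needed
two_types <- subset(df, type %in% c("Action", "Adventure"))

p <- ggplot(two_types, aes(x = year, fill = type)) +
  geom_histogram(binwidth = 5, alpha = 0.5, position = "identity") +
  theme_minimal()
print(p)
```

With `position = "identity"` both sets of bars start from zero, so the region where the two colours blend is the overlap by year.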