My question is very similar to

Normalizing y-axis in histograms in R ggplot to proportion by group

Except that I need density plots, and I would like the y-axis to be a rate, such as x counts per 1000 patients.

I have multiple groups of data of different sizes, and I would like each proportion to be relative to its own group size rather than to the total size.

To make it clearer, let's say I have two sets of data in a data frame

example data:

dataA<-rnorm(10000,3,sd=2)
dataB<-rnorm(40000,5,sd=3)
bp_combi<-data.frame(dataset=c(rep('A',length(dataA)),rep('B',length(dataB))),
                     value=c(dataA,dataB))

I can plot the distributions together relative to the total size, but not relative to each group's size.

combi_dens = ggplot(bp_combi, 
                    aes(x=value, 
                        number_of_cases=nrow(bp_combi),
                        y=(..count..)/number_of_cases*1000, fill=dataset)) +
               geom_density(bw = 1, alpha=0.4, size = 1.5 )

Is it possible to have it relative to each group's size?

Thanks!

1 Answer

For those still interested: the answer is rather simple. First create a separate column holding each group's size, then use that column in ggplot.

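library(dplyr)
library(ggplot2)

# Number of rows per dataset; count() adds a column named n
unique_episodes = bp_combi %>% count(dataset)
# Attach each group's size n to every row
data2 = merge(x = bp_combi, y = unique_episodes, by = "dataset", all.x = TRUE)

# Use the merged data (data2), which carries n, rather than bp_combi
combi_dens = ggplot(data2,
                    aes(x=value,
                        y=(..count..)/n*1000, fill=dataset)) +
  geom_density(bw = 1, alpha=0.4, size = 1.5 )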
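
A possible shortcut, offered as a sketch rather than a definitive solution: stat_density documents its count variable as density times the number of points in the group, so dividing count by the group size should give back the density itself. Under that reading, the same per-group scaling can be written without the extra column. The sketch below assumes ggplot2 3.3.0 or later (for after_stat()), a fixed seed added only for reproducibility, and a helper vector area_per_1000 that is not part of the original post; it uses base R's density() to check that each scaled curve has an area of roughly 1000 whatever the group size.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
bp_combi <- data.frame(
  dataset = rep(c("A", "B"), times = c(10000, 40000)),
  value   = c(rnorm(10000, 3, sd = 2), rnorm(40000, 5, sd = 3))
)

# after_stat(density) is normalised within each group, so multiplying by
# 1000 gives a rate per 1000 observations of that group (per unit of x).
combi_dens <- ggplot(bp_combi, aes(x = value, fill = dataset)) +
  geom_density(aes(y = after_stat(density) * 1000), bw = 1, alpha = 0.4)

# Base R check (hypothetical helper): area under each scaled curve,
# approximated by a Riemann sum over density()'s evenly spaced grid.
area_per_1000 <- sapply(split(bp_combi$value, bp_combi$dataset), function(v) {
  d <- density(v, bw = 1)
  sum(d$y) * diff(d$x[1:2]) * 1000
})
```

Printing combi_dens draws the plot. If both entries of area_per_1000 are close to 1000, the two curves are on the same per-group scale even though group B has four times as many rows as group A.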