
I am trying to produce a histogram with two main properties. I have managed to generate each one individually, but I have no idea how to combine the two methods to get what I want.

I am aiming for a stacked histogram with Age (in 5-year bins) on the X axis, where each bin is split into two stacked bars (Admitted=1, Admitted=0) showing the proportion/percentage of that bin that was admitted or not admitted.

Dataset (first six rows):

> dput(head(example_data))
structure(list(GAPS = c(26L, 16L, 21L, 15L, 17L, 13L), Age = c(62L, 
62L, 62L, 58L, 70L, 70L), Admitted = c(0L, 1L, 1L, 0L, 0L, 0L
)), row.names = c(NA, 6L), class = "data.frame")

I am working in R, but the dataset originates from a pandas DataFrame, so if an easier solution exists in Python (matplotlib, etc.), I am happy to use that instead.

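So far, I can generate a bar chart with one bar per age and the proportions on the Y axis, as desired, using the code below:

myTable <- table(dataset$Admitted, dataset$Age)  # counts: rows = Admitted (0/1), columns = each age
myTable
myTable2 <- prop.table(myTable, 2)               # proportions within each age column (sum to 1)

barplot(myTable2)                                # plain stacked bars
barplot(myTable2, legend=rownames(myTable2), xlab="Age", col=c(7, 4))  # with legend and colours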
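To see what `prop.table(myTable, 2)` does, here is a minimal sketch on the six rows from the `dput()` output above, rebuilt by hand as `example_data` (the unused GAPS column is left out). Margin 2 normalises each age column, so the two stacked segments of every bar add up to 1.

```r
# Rebuild the six example rows from the question's dput() output
example_data <- data.frame(
  Age      = c(62L, 62L, 62L, 58L, 70L, 70L),
  Admitted = c(0L, 1L, 1L, 0L, 0L, 0L)
)

# Counts: rows are Admitted (0/1), columns are the individual ages 58, 62, 70
myTable <- table(example_data$Admitted, example_data$Age)

# margin = 2 divides each column by its own total, so every age column sums to 1
myTable2 <- prop.table(myTable, 2)
myTable2
```

For age 62, one of three rows has Admitted = 0 and two have Admitted = 1, so that column becomes 1/3 and 2/3.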

I can also easily create a simple binned histogram with:

hist(dataset$Age)

My question is how I can adapt the barplot approach to use histogram-style age bins, since one bar per individual age makes the chart too busy.
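A sketch of one way the two methods can meet, assuming the same six example rows: `hist(..., plot = FALSE)` returns the breaks and counts it would draw, and `cut()` with the same breaks and closure rules (right-closed bins, with the lowest bin also closed on the left, which are `hist()`'s defaults) assigns every row to the same bin while keeping its Admitted value. The 55 to 75 break range is chosen only to fit these rows.

```r
# Rebuild the six example rows from the question's dput() output
example_data <- data.frame(
  Age      = c(62L, 62L, 62L, 58L, 70L, 70L),
  Admitted = c(0L, 1L, 1L, 0L, 0L, 0L)
)

# plot = FALSE returns the binning without drawing anything
h <- hist(example_data$Age, breaks = seq(55, 75, by = 5), plot = FALSE)
h$breaks   # 55 60 65 70 75
h$counts   # 1 3 2 0

# cut() with the same breaks and closure rules reproduces hist()'s bins,
# but unlike hist() it keeps each row paired with its Admitted value
age_bin <- cut(example_data$Age, breaks = h$breaks, include.lowest = TRUE)
counts_by_bin <- table(example_data$Admitted, age_bin)
counts_by_bin
```

Passing `counts_by_bin` through `prop.table(..., 2)` and `barplot()`, exactly as with `myTable`, then gives binned stacked proportions instead of one bar per age.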

You need to provide data; please add the output of dput(dataset) as an edit to your question. - jay.sf
It is an extremely large dataset that can't be displayed with that function; would it be acceptable to include head(dataset) instead? - purpleeggshells
Yes, but dput is important; you could use dput(head(dataset)). You could also make a smaller example, there's no need to provide all the data. Read: stackoverflow.com/questions/5963269/… - jay.sf
OK, I've edited the question so that dput(head(dataset)) includes the important variables. I hope that's helpful. - purpleeggshells

1 Answer


You could create bins, e.g. one every ten years, using the cut() function.

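# 10-year bins (0,10], ..., (90,100], labelled by upper bound; include.lowest keeps age 0
dat$age.bins <- as.numeric(as.character(cut(dat$age, breaks=(0:10)*10, labels=(1:10)*10, include.lowest=TRUE)))
# Share of adm = 0 / adm = 1 within each age bin (each column sums to 1)
myTable3 <- with(dat, prop.table(table(adm, age.bins), 2))
barplot(myTable3, legend=rownames(myTable3), xlab="Age", col=c(7, 4))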

Result

[Image: stacked bar chart of the adm = 0 / adm = 1 proportions in each 10-year age bin]

Data

# Simulated example: 10,000 ages (negative binomial, mean 30) and a 0/1 flag with P(adm = 1) = 1/3
set.seed(42)
dat <- data.frame(age=rnbinom(1e4, 5, mu=30),
                  adm=rbinom(1e4, 1, 2/6))
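The answer uses ten-year bins, while the question asked for five-year bins shown as percentages. Below is a minimal sketch of that variant on the question's own column names (Age, Admitted), using the six `dput()` rows as stand-in data. The 0 to 100 break range and the "55-59" style labels are assumptions; widen `breaks` if any ages reach 100 or more, since those would otherwise become NA.

```r
# Rebuild the six example rows from the question's dput() output
example_data <- data.frame(
  Age      = c(62L, 62L, 62L, 58L, 70L, 70L),
  Admitted = c(0L, 1L, 1L, 0L, 0L, 0L)
)

# 5-year bins closed on the left: [55,60) -> "55-59", [60,65) -> "60-64", ...
breaks <- seq(0, 100, by = 5)
labels <- paste0(head(breaks, -1), "-", head(breaks, -1) + 4)
example_data$AgeBin <- droplevels(cut(example_data$Age, breaks = breaks,
                                      labels = labels, right = FALSE))

# Percentage not admitted / admitted within each bin (each column sums to 100)
pct <- prop.table(table(example_data$Admitted, example_data$AgeBin), 2) * 100

barplot(pct, legend.text = c("Not admitted (0)", "Admitted (1)"),
        xlab = "Age (5-year bins)", ylab = "Percent", col = c(7, 4))
```

Setting `right = FALSE` makes each bin include its lower edge, which is what labels like "60-64" imply, and `droplevels()` removes empty bins so they do not appear as blank columns.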