
Using R, I am trying to make a simple stacked bar graph of the counts of different settlement types by date. I have three ways of representing the date (Start, End, and Mid). Below is an example of my data:

ID  Settlement  Start  End   Mid
01  Urban       200    400   300
02  Rural       450    850   650
03  Military    1300   1400  1350
04  Castle      2      1000  501

So far I have

count(ratData, vars = "Settlement")

which returns

    Settlement           freq
1                        78
2   Castle               25
3   Cave                 3
4   Fortification        5
5   Hill Fort            2
6   Industrial (quarry)  1
7   Manor                2
8   Military             4
9   Military camp        1
10  Military Camp        3
11  Military site        1
12  Mining               1
13  Monastic             15
14  Monastic/Rural?      1
15  Port                 5
16  River-site           2
17  Roman fort           1
18  Roman Fort           1
19  Roman settlement     3
20  Rural                22
21  Settlement           2
22  urban                1
23  Urban                123
24  Villa                4
25  Wic                  13

Then to plot

ggplot(v, aes(x=Settlement, y=freq)) + geom_bar(stat='identity', fill='lightblue', color='black')

However, this puts settlement type on the x-axis instead of stacking the settlement types, and it ignores the date data entirely. I would like to bin the sites into 100-year bins from 1 to 1500 and make a stacked bar graph of settlement types per bin to illustrate presence over time.

We need some more information here. Which variable would you like to use for binning? Start? End? Mid? - JmeCS
Mid would be most appropriate. - Mathew James

1 Answer


This should do the trick. The cut function is very useful in situations like this where you need to create a categorical variable based on some range of a continuous variable. I've gone the Tidyverse route but there are base R options as well.

library(dplyr)
library(ggplot2)

# Some dummy data that resembles your problem
s <- data.frame(ID = 1:100,
                Settlement = c(rep('Urban', 50), rep('Rural', 20), rep('Military', 10), rep('Castle', 20)),
                Start = signif(rnorm(100, 500, 100), 2),
                End = signif(rnorm(100, 1000, 100), 2))
s$Mid <- s$Start + ((s$End - s$Start) / 2)

# Find the range of the mid variable to decide on cut locations
r <- range(s$Mid)

# Make a new factor variable from 5 equal-width bins of Mid.
# The labels below come from one run of the random data; change them to match your actual data
s$group <- cut(s$Mid, 5, labels = c('575-640', '641-705', '706-770', '771-835', '836-900'))

# Frequency count per factor level
grouped <- s %>%
  group_by(group) %>%
  count(Settlement)

# You'll need to clean up axis labels, etc.
ggplot(grouped, aes(x = group, y = n, fill = Settlement)) +
  geom_bar(stat = 'identity')
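
For the fixed 100-year bins from 1 to 1500 asked for in the question, one option is to pass explicit break points to cut instead of a bin count. The following is only a sketch: the sample rows, the bin and bin_labels names, and the axis titles are made up for illustration, and it assumes Mid is the binning variable, as stated in the comments.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample rows in the same shape as the question's data
ratData <- data.frame(
  ID         = 1:6,
  Settlement = c("Urban", "Rural", "Military", "Castle", "Urban", "Urban"),
  Start      = c(200, 450, 1300, 2, 250, 300),
  End        = c(400, 850, 1400, 1000, 450, 400)
)
ratData$Mid <- (ratData$Start + ratData$End) / 2

# Bin edges every 100 years: (0,100], (100,200], ..., (1400,1500]
edges <- seq(0, 1500, by = 100)

# One label per bin: "1-100", "101-200", ..., "1401-1500"
bin_labels <- paste(head(edges, -1) + 1, tail(edges, -1), sep = "-")

# Assign each site to a bin by its Mid value
ratData$bin <- cut(ratData$Mid, breaks = edges, labels = bin_labels)

# Count sites per bin and settlement type
binned <- ratData %>%
  count(bin, Settlement)

# Stacked bars; drop = FALSE keeps empty bins on the axis
p <- ggplot(binned, aes(x = bin, y = n, fill = Settlement)) +
  geom_col() +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Date (100-year bins)", y = "Number of sites")
print(p)
```

Any Mid value outside 1-1500 becomes NA in bin, so it is worth checking sum(is.na(ratData$bin)) on the real data. The count output in the question also lists case variants such as urban/Urban and Military camp/Military Camp as separate types; they will appear as separate fill colours unless the labels are made consistent first.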
