I'm new to R and I'm trying to figure some stuff out.

I've got a dataset with headers City, Year and Population which I've imported into RStudio.

My sample data looks like:

(Melbourne, 2005, 5000)
(Melbourne, 2010, 4000) 
(Adelaide, 2005, 3000) 
(Adelaide, 2010, 9000)

I want to create another dataset that shows the growth rate for each city between 2005 and 2010. For example, if Melbourne's population is 5000 in 2005 and 4000 in 2010, the growth rate is (4000 - 5000) / 5000 = -0.2.

I'm not sure how to apply this formula to my data.

Could anyone help me out?

Thanks.

1 Answer

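You could use the dplyr package:

library(dplyr)

# Sample data from the question
df <- data.frame(city = c("Melbourne", "Melbourne", "Adelaide", "Adelaide"),
                 year = c(2005, 2010, 2005, 2010),
                 pop = c(5000, 4000, 3000, 9000))

df %>%
  group_by(city) %>%
  arrange(year) %>%
  # lag(pop) is the previous row's population within the same city
  mutate(growth = (pop - lag(pop)) / lag(pop))

This returns:
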
# A tibble: 4 x 4
# Groups:   city [2]
  city       year   pop growth
  <chr>     <dbl> <dbl>  <dbl>
1 Melbourne  2005  5000   NA  
2 Adelaide   2005  3000   NA  
3 Melbourne  2010  4000   -0.2
4 Adelaide   2010  9000    2  
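
The result above still has one row per city and year, with NA for the first year of each city. If you want one row per city, as the question asks, a minimal sketch is to collapse each group with summarise(). This assumes every city has exactly one row for 2005 and one for 2010; the names growth_by_city and growth are only illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(city = c("Melbourne", "Melbourne", "Adelaide", "Adelaide"),
                 year = c(2005, 2010, 2005, 2010),
                 pop = c(5000, 4000, 3000, 9000))

# Assumption: each city has exactly one 2005 row and one 2010 row.
growth_by_city <- df %>%
  group_by(city) %>%
  summarise(growth = (pop[year == 2010] - pop[year == 2005]) / pop[year == 2005])

growth_by_city
```

With more than two years per city, or with missing years, the lag() approach above is probably the safer starting point; you could then drop the NA rows with filter(!is.na(growth)).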

(%>% is called the pipe operator. It passes the result of the expression on its left as the first argument of the function on its right.)
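
To make the pipe concrete, here is a small sketch comparing a piped call with the equivalent ordinary call. The vector pops is made up for illustration.

```r
library(dplyr)

# Illustrative vector, not part of the original data frame
pops <- c(5000, 4000)

# The pipe supplies `pops` as the first argument of dplyr::lag(),
# so these two calls do the same thing.
piped  <- pops %>% dplyr::lag()
nested <- dplyr::lag(pops)

identical(piped, nested)
```

The same applies to the longer chain above: each step receives the data frame produced by the previous step as its first argument.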