0
votes

I have dataframe with country, gender, 2013,2014,2014,2015 column names.

City           Gender  2013  2014  2015
Aberdeen       Female   30    40    50
Aberdeen       Male     20    15    16
Aberdeenshire  Female   60    80    70
Aberdeenshire  Male     50    40    15
.....Includes 425 records.

I want to perform female to male ratio (dividing Female/male for each city) for each city, so this is how i tried to get,

City           2013_ratio  2014_ratio  2015_ration
Aberdeen        1.5        2.66        2.5
Aberdeenshire   1.2        2           4.66

can anyone help me to solve this. I have tried grouping by city but I don't know how to do by getting value by rows in gender.

3
Transform the DF into a long DF using gather(). Then you can group_by(City, Year) and calculate the ratio.Robert

3 Answers

1
votes

The code from Rob's suggested solution would be (with an additional spread() step:

# data
df = data.frame(City = c("a", "a", "b", "b"),
                Gender = c("Female", "Male", "Female", "Male"),
                `2013` = c(30, 20, 60, 50),
                `2014` = c(40, 15, 80, 40),
                `2015` = c(50, 16, 70, 15))

# Actual process
library("dplyr")
library("tidyr")
df %>%
    # Transform wide table into tidy
    gather("Year", "Number", X2013:X2015) %>%
    # Reshape gender columns for easier summaries
    spread("Gender", "Number") %>%
    # Compute ratios
    group_by(City, Year) %>%
    summarise(ratio = Female/(Male + Female))
#> # A tibble: 6 x 3
#> # Groups:   City [?]
#>   City  Year  ratio
#>   <fct> <chr> <dbl>
#> 1 a     X2013 0.6  
#> 2 a     X2014 0.727
#> 3 a     X2015 0.758
#> 4 b     X2013 0.545
#> 5 b     X2014 0.667
#> 6 b     X2015 0.824

Created on 2018-10-10 by the reprex package (v0.2.1)

To get exactly your result you can apply back the function spread() to spread the ratios over years, (spread(Year, ratio))

2
votes

You can more easily calculate the ratio if the Male and Female are in different columns, which you can change the structure by using tidyr

library(dplyr)
library(tidyr)

df %>% 
  gather(Year, Value, -City, - Gender) %>% 
  spread(Gender, Value) %>% 
  mutate(Ratio = Female/Male, Year = paste0(Year, "_Ratio")) %>% 
  select(-Female, -Male) %>% 
  spread(Year, Ratio)
1
votes

With tidyverse:

 df = read.table(text="City           Gender  2013  2014  2015
 Aberdeen       Female   30    40    50
 Aberdeen       Male     20    15    16
 Aberdeenshire  Female   60    80    70
 Aberdeenshire  Male     50    40    15", header = T)
> library(tidyverse)
> 
> df %>%
   group_by(City) %>%
   arrange(City, Gender) %>%
   summarise_at(vars(X2013:X2015), .funs = funs(ratio = first(.)/last(.)))
# A tibble: 2 x 4
  City          X2013_ratio X2014_ratio X2015_ratio
  <fct>               <dbl>       <dbl>       <dbl>
1 Aberdeen              1.5        2.67        3.12
2 Aberdeenshire         1.2        2           4.67

or

df %>%
  group_by(City) %>%
  arrange(City,Gender) %>%
  summarise_at(vars(X2013:X2015), .funs = funs(ratio = .[Gender == "Female"]/.[Gender != "Female"]))