I am working on a dataframe where I am using group_by and summarise to get some results using dplyr. However, one of the variables I intend to generate in summarise needs to access a second dataframe value based on the value of the grouping variable, and I cannot guess how to do that. Here's an example.
These are my 2 df:
ExampleData <- structure(list(country = structure(c(5L, 5L, 5L, 1L, 1L, 1L,
4L, 4L, 4L, 2L, 2L, 2L), .Label = c("Bolivia", "Colombia", "Ecuador",
"Peru", "Venezuela"), class = "factor"), area = c(21962759.1957539,
6116515271.82745, 4420526.44962988, 950155731.837125, 3284949253.71748,
13008533744.7177, 181171.153229255, 724458.059924146, 545485754.118267,
646585511.365563, 5586512056.6131, 4025165194.1968)), .Names = c("country",
"area"), row.names = c(0L, 1L, 2L, 87L, 88L, 89L, 117L, 118L,
country.areas <- structure(list(country = c("Bolivia", "Colombia", "Ecuador",
"Peru", "Venezuela"), area = c(1090353, 1141962, 256932, 1296912,
916560.5)), .Names = c("country", "area"), row.names = c(NA,
5L), class = "data.frame")
> head(ExampleData)
country area
0 Venezuela 21962759
1 Venezuela 6116515272
2 Venezuela 4420526
87 Bolivia 950155732
88 Bolivia 3284949254
89 Bolivia 13008533745
> head(country.areas)
country area
1 Bolivia 1090353.0
2 Colombia 1141962.0
3 Ecuador 256932.0
4 Peru 1296912.0
5 Venezuela 916560.5
Now, I want to work with ExampleData, by group_by
the country
field and summarise
to generate a variable PercOfCountry
which is the sum areas for each country divided by the total area of the country, as taken from country.areas
. I am trying with:
by.country <- ExampleData %>%
group_by(country) %>%
summarise(km2.country = sum(area)/1000000,
PercOfCountry = km2.country/country.ares$area[country.areas$country == country])
where the last country
(the very last word) wants to refer to area of the country which is being considered in the group_by as taken from the df country.areas (e.g.: 1090353.0 for Bolivia). The km2.country
part works as expected... I just want to divide that value by the area of the country, so I get a percentage.
Of course, I could do that on a next step quite easily... but I am trying to learn dplyr, and it is still difficult to understand to me which are the capabilities of the group_by
function, which seems powerful.
Thanks!
inner_join
command. – MrFlick