
Hi, I have been struggling with a problem for a couple of days and haven't found an answer yet.

Suppose I have a dataset with the columns Country and Population. Country is encoded as numbers, so the raw dataset looks like this:

df <- data.frame(country=c(1,2,3,4,5,6), population=c(10000,20000,30000,4000,50000,60000))
df
  country population
1       1      10000
2       2      20000
3       3      30000
4       4       4000
5       5      50000
6       6      60000

I want country to be a factor with the following levels: France, Germany, Canada, USA, India, China and also Europe, America, Asia. In other words, a factor combining:

df$country <- factor(df$country, labels = c("France", "Germany", "Canada", "USA", "India", "China"))
df
  country population
1  France      10000
2 Germany      20000
3  Canada      30000
4     USA       4000
5   India      50000
6   China      60000

and (applied to the original numeric codes, since cut() needs a numeric vector)

df$country <- cut(df$country, breaks = c(0,2,4,6),labels = c("Europe", "America", "Asia"))
df
  country population
1  Europe      10000
2  Europe      20000
3 America      30000
4 America       4000
5    Asia      50000
6    Asia      60000

My aim is to do something like:

tapply(df$population, df$country, sum)

with a result like this:

France Germany Canada  USA India China Europe America    Asia 
 10000   20000  30000 4000 50000 60000 30000    34000  110000 
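A minimal base-R sketch of one way to get that target vector while leaving df untouched. It assumes the two groupings can live as standalone vectors outside the data frame (the names `by_country` and `by_region` are illustrative, not from the question): build each grouping from the same numeric codes, call tapply() once per grouping, and concatenate the named results with c().

```r
# Raw data as in the question: country is still a numeric code
df <- data.frame(country = c(1, 2, 3, 4, 5, 6),
                 population = c(10000, 20000, 30000, 4000, 50000, 60000))

# Two groupings derived from the same codes, kept outside df
# (by_country / by_region are hypothetical names chosen for this sketch)
by_country <- factor(df$country,
                     labels = c("France", "Germany", "Canada", "USA", "India", "China"))
by_region <- cut(df$country, breaks = c(0, 2, 4, 6),
                 labels = c("Europe", "America", "Asia"))

# One tapply() per grouping; c() flattens the two 1-d results
# into a single named vector, countries first, then regions
result <- c(tapply(df$population, by_country, sum),
            tapply(df$population, by_region, sum))
result
```

The design choice here is to avoid a combined factor altogether: a single factor can assign each row to only one level, so a row cannot count toward both "France" and "Europe" in one tapply() call, whereas two separate calls can.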

Is there a way to do this without creating a third column in the data frame? I hope my problem is understandable. I already tried interaction(), but that's not what I want.


1 Answer


The following function from the plyr package splits your data frame into sub-data-frames (one per country) and then sums up the population values in each. The t() function just transposes the result.

> library(plyr)
> neu <- ddply(df, .(country), summarise, Summe = sum(population))
> t(neu)
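For comparison, a base-R sketch of the same split-and-sum idea, using aggregate() instead of ddply() and grouping by region rather than by country. This is a swapped-in technique, not part of the answer above, and the names `region` and `totals` are hypothetical. The region grouping is built from the numeric codes as a separate vector, so df itself gets no new column.

```r
# Same raw data as in the question
df <- data.frame(country = c(1, 2, 3, 4, 5, 6),
                 population = c(10000, 20000, 30000, 4000, 50000, 60000))

# Region grouping derived from the numeric codes (not stored in df)
region <- cut(df$country, breaks = c(0, 2, 4, 6),
              labels = c("Europe", "America", "Asia"))

# aggregate() splits population by region and sums each group,
# returning a data frame with one row per region (value column is named "x")
totals <- aggregate(df$population, by = list(region = region), FUN = sum)
names(totals)[2] <- "Summe"

# t() transposes it so the groups run across columns, as in the answer
t(totals)
```

Note that t() on a data frame with a factor column goes through as.matrix(), so the transposed result is a character matrix; use totals itself if you need the sums as numbers.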