Using R's dcast to aggregate by mean with missing entries

Question

I'm new to using reshape2 and its functionality. I have a data table, d, for which I'm trying to aggregate data on species counts at stations in years, to get the mean count for each species over all stations for each year:

library(data.table)
d <- data.table(station=c(1,1,4,3), year=c(2000,2000,2001,2000),
   species=c("cat","dog","dog","owl"), abundance=c(10,20,30,10))
d

>   station year species abundance
 1:       1 2000     cat        10
 2:       1 2000     dog        20
 3:       4 2001     dog        30
 4:       3 2000     owl        10

I use dcast to aggregate on abundance, but what I get looks like a per-cell sum, with NaN wherever a year/species combination has no data, rather than the mean I'm after:

dm<-dcast(d, year~ species,value.var="abundance",fun.aggregate = mean)
dm
>   year cat dog owl
 1: 2000  10  20  10
 2: 2001 NaN  30 NaN

What I want is:

>   year  cat   dog   owl
 1: 2000  3.33  6.67  3.33
 2: 2001  0     30    0

Using the argument fill=0 just results in the NaNs being replaced by 0s.

I'd be very grateful for any advice. I've read the documentation and looked for tutorials, but haven't been able to solve this.

amatsuo_net · Accepted Answer · 2017-06-09T10:24:40

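Your use of the term "mean" here is not standard: the output you want divides each abundance by the number of rows recorded for that year (three in 2000, one in 2001), whereas fun.aggregate = mean averages only the values that fall in each year/species cell. I think creating a new variable called mean_abundance would be the best solution.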

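# Divide each abundance by the number of rows in its year
d[, mean_abundance := abundance/length(abundance), by = year]

# Cast to wide format, then replace the missing year/species cells with 0
dm <- dcast(d, year~ species,value.var="mean_abundance")
dm[is.na(dm)] <- 0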
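
As a variation on the same idea, here is a minimal sketch that does the zero-filling inside dcast itself. It assumes the data.table version of dcast, and it swaps in fun.aggregate = sum together with fill = 0. That choice is my assumption rather than part of the approach above: it only matters if a year/species pair appears on more than one row, in which case the shares are added instead of dcast falling back to counting rows. .N is data.table's per-group row count and is equivalent to length(abundance) here.

```r
library(data.table)

d <- data.table(station   = c(1, 1, 4, 3),
                year      = c(2000, 2000, 2001, 2000),
                species   = c("cat", "dog", "dog", "owl"),
                abundance = c(10, 20, 30, 10))

# Divide each abundance by the number of rows recorded in its year
d[, mean_abundance := abundance / .N, by = year]

# sum adds up repeated year/species pairs (none in this example);
# fill = 0 supplies the value for pairs that never occur
dm <- dcast(d, year ~ species, value.var = "mean_abundance",
            fun.aggregate = sum, fill = 0)
dm
```

For the sample data this gives 10/3, 20/3 and 10/3 for cat, dog and owl in 2000, and 0, 30 and 0 in 2001, which matches the table requested in the question.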
