The data is a data.frame with 3199 rows and 30 columns. It holds yearly measurements of many variables for different plots at different locations. I now want to calculate the mean of one specific variable by location and year. This is what the data looks like:

year  location variable1 variable2 ...
1923   1013    
1924   1013
1925   1013  
 .      .
 .      .
1930   1014 

So I first split the data by year and location. Now I want to calculate the mean, which is what the following code does:

calculatedmean <- lapply(data, function(x) {
  lapply(x, function(y) {
    # mean of variable1 within one year/location group
    sum(y$variable1) / length(y$variable1)
  })
})

After that I want to get the result back into a data.frame, which is what the following code should do:

calculatedmean <- rbind.fill(lapply(calculatedmean, function(x) {
  as.data.frame(t(x), stringsAsFactors = FALSE)
}))

I need to use rbind.fill from the plyr package because the results differ in length. What I get is a data.frame with lists in its cells. It looks like this:

    colname1                     colname2                     colname3  ...
    list(x0.00029 = 0.00029)     NULL                         NULL 
    list(X0.000313 = 0.000313)   NULL                         NULL  
    list(X0.000272 = 0.000272)   list(X0.000625 = 0.000625)   NULL 
        .                         .                             .
        .                         .                             .   

I want to replace the list elements with the calculated means; for data[1,1], for example, that is 0.00029. I want to keep the layout of the data.frame, with NA in place of NULL. I tried:

t(as.data.frame(sapply(calculatedmean, function(x) unname(unlist(x)))))

But that doesn't work because the columns have different lengths. I think the solution shouldn't be too complicated, but I just can't figure it out right now.

It would help a lot if you posted a reproducible example with your data, so that answers can be reproducible too. This looks like a job for ddply rather than nested lapply calls. - Cron Merdek
I used ddply to get the results: means <- ddply(data, c("location", "year"), summarise, mean = mean(variable1, na.rm = TRUE)). But now I have a very similar problem: the means are stored in one column, but I want them split by location into different columns. As the locations have different numbers of years, I tried cbind.fill, but that didn't work. (Note: I edited the question and added some example data; hope that helps.) - dementation

1 Answer

I finally solved it. The approach is a bit complicated, but it worked for me:

I first calculated the mean by location and year with the help of the plyr package.

library(plyr)
means <- ddply(data, c("location", "year"), summarise, mean = mean(variable1, na.rm = TRUE))
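
For anyone who wants to try this step without the original data, here is a minimal sketch. The data.frame is made up (the values are invented; only the column names follow the question), so treat it as an illustration rather than the real result.

```r
library(plyr)

# Made-up stand-in for the real data (illustrative values only)
data <- data.frame(
  year      = c(1923, 1923, 1924, 1923, 1924, 1924),
  location  = c(1013, 1013, 1013, 1014, 1014, 1014),
  variable1 = c(1, 3, 5, 2, 4, NA)
)

# One row per location/year combination, NA values ignored in the mean
means <- ddply(data, c("location", "year"), summarise,
               mean = mean(variable1, na.rm = TRUE))
print(means)
```

The result is in long format: one column each for location, year and mean.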

After that I wanted to have the means in a data.frame with one column per location, so I split the result by location.

a <- split(means, means$location)

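I used the zoo package to turn each location into a time-series object indexed by year and then merged them. The merge aligns the series by year and fills years that a location lacks with NA.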

library(zoo)

# one zoo series per location, indexed by year
a <- lapply(a, function(x) zoo(x$mean, x$year))

# merge the series; columns are named after the locations
a <- do.call("merge", a)
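
As a side note, base R can probably reach the same wide layout without zoo. The following is a sketch of an alternative, not the zoo approach used above: tapply over year and location gives a year-by-location table in which missing combinations are NA. The `means` data.frame here is made up and only mimics the shape of the ddply result; `wide_mat` and `wide` are names chosen for this example.

```r
# Made-up long-format means, in the same shape as the ddply result
means <- data.frame(
  location = c(1013, 1013, 1013, 1014, 1014),
  year     = c(1923, 1924, 1925, 1924, 1930),
  mean     = c(0.00029, 0.000313, 0.000272, 0.000625, 0.0004)
)

# Rows are years, columns are locations.
# Each cell holds exactly one value here, so mean() just passes it through;
# year/location combinations without data become NA.
wide_mat <- tapply(means$mean,
                   list(year = means$year, location = means$location),
                   mean)

# Convert the matrix to a data.frame if that is the format needed downstream
wide <- as.data.frame(wide_mat)
print(wide)
```

Because tapply works on the long table directly, the split and merge steps are not needed in this variant.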