3
votes

I have the following data frame, df (fragment shown here):

    H2475  H2481  H2669  H2843  H2872  H2873  H2881  H2909
E1 94.470 26.481 15.120 18.490 16.189 11.422 14.886  0.512
E2  1.016  0.363  0.509  1.190  1.855  0.958  0.771  0.815
E3  9.671  0.637  0.571  0.447  0.116  0.452  0.403  0.003
E4  3.448  2.826  2.183  2.607  4.288  2.526  2.820  3.523
E5  2.548  1.916  1.126  1.553  1.089  1.228  0.887  1.065

What I want to do is compute the mean of each row after removing the two extreme values (one minimum and one maximum). For whole rows I used plyr:

library(plyr)
df.my_means <- adply(df, 1, transform, my_means = mean(as.matrix(df[i,]) ) )

It would also be OK to create a temporary data frame/matrix with the min and max values replaced by NAs, but as a beginner I am not able to do it.
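For what it is worth, the NA-masking idea could be sketched roughly like this. The helper name mask_extremes is made up for illustration, the matrix m just rebuilds the fragment above (with a real data frame, as.matrix(df) would take its place), and only the first minimum and first maximum of each row are masked:

```r
# Rebuild the fragment shown above as a numeric matrix.
m <- matrix(c(94.470, 26.481, 15.120, 18.490, 16.189, 11.422, 14.886, 0.512,
               1.016,  0.363,  0.509,  1.190,  1.855,  0.958,  0.771, 0.815,
               9.671,  0.637,  0.571,  0.447,  0.116,  0.452,  0.403, 0.003,
               3.448,  2.826,  2.183,  2.607,  4.288,  2.526,  2.820, 3.523,
               2.548,  1.916,  1.126,  1.553,  1.089,  1.228,  0.887, 1.065),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("E", 1:5),
                            c("H2475", "H2481", "H2669", "H2843",
                              "H2872", "H2873", "H2881", "H2909")))

# mask_extremes is a hypothetical helper: it sets the first minimum and the
# first maximum of a vector to NA and leaves all other positions untouched.
mask_extremes <- function(x) {
  x[c(which.min(x), which.max(x))] <- NA
  x
}

# apply() over rows returns each result as a column, so transpose back
# to get the original rows-by-columns layout.
masked <- t(apply(m, 1, mask_extremes))

# Row means that skip the masked cells.
row_means <- rowMeans(masked, na.rm = TRUE)
```

Because the masked matrix keeps every value in its original column, any function with an na.rm argument can be run on it afterwards. Ties are not handled specially: if a row has several equal minima or maxima, only the first of each becomes NA.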

Thanks a lot for your help

EDIT 1

I was obviously unaware that mean has a trim option. I would like to have a solution where, instead of mean, I can plug in any other function. For example:

library(plyr)
library(e1071)
df.my_means <- adply(df, 1, transform, my_skew = skewness(as.matrix(df[i,]), , 3 ) )

I apologize if this breaks the question posting rules, but then having separate questions for mean, median etc. is counter-intuitive.
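One possible shape for such a plug-in solution, as a rough sketch only: drop_extremes and apply_trimmed are hypothetical names, the two-row df below is just a cut-down copy of the fragment above, and only one minimum and one maximum per row are dropped.

```r
# drop_extremes is a hypothetical helper: it removes one minimum and one
# maximum (the first occurrence of each) and returns the remaining values.
drop_extremes <- function(x) {
  x[-c(which.min(x), which.max(x))]
}

# apply_trimmed is also hypothetical: it applies any function f to each row
# of a numeric matrix or data frame after the extremes have been dropped.
# Extra arguments in ... are passed on to f.
apply_trimmed <- function(data, f, ...) {
  apply(as.matrix(data), 1, function(row) f(drop_extremes(row), ...))
}

# Small example: the first two rows of the fragment above.
df <- data.frame(H2475 = c(94.470, 1.016), H2481 = c(26.481, 0.363),
                 H2669 = c(15.120, 0.509), H2843 = c(18.490, 1.190),
                 H2872 = c(16.189, 1.855), H2873 = c(11.422, 0.958),
                 H2881 = c(14.886, 0.771), H2909 = c(0.512, 0.815),
                 row.names = c("E1", "E2"))

trimmed_means   <- apply_trimmed(df, mean)
trimmed_medians <- apply_trimmed(df, median)
```

With e1071 loaded, something like apply_trimmed(df, skewness, type = 3) should work the same way, since the extra argument is simply forwarded to the function.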

EDIT 2 Partial solution without plyr:

df.my_means <- apply(df ,1, function(x){y=x[order(x)]; (y[2:(length(y)-1)])})

This breaks the connection between the values and their columns.

1
What if you have multiple occurrences of min/max, do you also want to remove them? - themel
If you want to calculate row means, then you probably should be using a matrix or transposing your data frame. - Richie Cotton
@themel: good point. The original data is already filtered to remove rows likely to contain multiple zeros, but with ca. 10k rows and 30 columns this may happen. I will stay with removing just one min and one max value from each row. - darked89

1 Answer

5
votes

You can use the trim argument to mean (x here is the data frame from the question):

apply(x,1,mean,trim=1/NCOL(x))
#         E1         E2         E3         E4         E5 
# 17.0980000  0.8765000  0.4376667  2.9583333  1.3295000
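To see why this drops exactly one value from each end: trim is the fraction of observations removed from each end of the sorted vector, and the number removed is floor(n * trim). A quick check on row E1 alone (the variable names below are only for illustration):

```r
# Row E1 from the question, as a plain numeric vector.
row_e1 <- c(94.470, 26.481, 15.120, 18.490, 16.189, 11.422, 14.886, 0.512)
n <- length(row_e1)

# Number of observations mean() drops from each end for trim = 1/n.
dropped_per_end <- floor(n * (1 / n))

# Trimmed mean as in the answer, for a single row.
trimmed <- mean(row_e1, trim = 1 / n)

# The same thing done by hand: sort, drop the first and last value, average.
manual <- mean(sort(row_e1)[2:(n - 1)])
```

Both trimmed and manual come out as 17.098 here. Since the count goes through floor(), it may be worth confirming for the real column count that n * (1/n) does not land just below 1 in floating point, in which case nothing would be trimmed.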