Split a data.frame, sort and subset from a list of data.frames

Question

I have a large data.frame that looks like this:

   Statistic1    fdr1     Value1   Statistic2  fdr2   Value2
       2        0.0001    Signif      1.8      0.001   Signif 
      0.3        0.13       0          5        0.5      0
      1.5        0.01     Signif      0.4      0.009   Signif

I would like to split the data frame every 3 columns (for example Statistic1, fdr1 and Value1), then sort each resulting data.frame by its Statistic* column in descending order and take the first 20 row names among the rows labelled Signif in the Value* column.

Desired output

>       df1         

>        Statistic1    fdr1     Value1   
>            2        0.0001    Signif            
>           1.5        0.01     Signif     

>        Statistic2    fdr2     Value2
>           1.8        0.001    Signif 
>           0.4        0.009    Signif

From each single data.frame I will take the first 20 row names.

Can anyone help me please?

Sotos · Accepted Answer · 2019-04-24T07:09:31

You can split the data frame by using split.default. Loop over the list and do the required actions. Translating your requirements would give,

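lapply(split.default(df, gsub('\\D+', '', names(df))), function(i) {
  # keep rows whose Value* column (3rd in each group) is not 0, i.e. 'Signif'
  i <- i[i[[3]] != 0, ]
  # sort by the Statistic* column (1st in each group), largest first
  i <- i[order(i[[1]], decreasing = TRUE), ]
  # take the first 20 rows
  i[1:20, ]
})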

Note, however, that your example has only 3 rows, so the last step (i[1:20, ]) pads the result with NA rows.
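If only the row names are needed and the NA padding is unwanted, one possible variation is to use head(), which returns at most 20 elements. The sketch below rebuilds the sample data from the question; the row names r1 to r3 are hypothetical (the question does not show any), and the Value* columns are assumed to be character vectors holding "Signif" or "0".

```r
# Sample data from the question. Row names r1..r3 are made up for illustration.
df <- data.frame(
  Statistic1 = c(2, 0.3, 1.5),
  fdr1       = c(0.0001, 0.13, 0.01),
  Value1     = c("Signif", "0", "Signif"),
  Statistic2 = c(1.8, 5, 0.4),
  fdr2       = c(0.001, 0.5, 0.009),
  Value2     = c("Signif", "0", "Signif"),
  row.names  = c("r1", "r2", "r3"),
  stringsAsFactors = FALSE
)

# Group columns by the digits in their names: "1" -> Statistic1, fdr1, Value1, etc.
groups <- split.default(df, gsub("\\D+", "", names(df)))

top_names <- lapply(groups, function(i) {
  # keep only the rows labelled Signif in the Value* column (3rd column of the group)
  i <- i[i[[3]] == "Signif", , drop = FALSE]
  # sort by the Statistic* column (1st column of the group), largest first
  i <- i[order(i[[1]], decreasing = TRUE), , drop = FALSE]
  # head() returns at most 20 names and never pads with NA
  head(rownames(i), 20)
})

top_names
```

With this sample data both groups keep rows r1 and r3, in that order, so top_names is a list of two character vectors named "1" and "2".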
