
My data frame looks like this:

         Model           w0        p0          w1           p1       w2      p.value
 1  Null_model 3.950000e-05 0.7366921 0.988374029 0.000000e+00 1.296464
 2    alt_test 1.366006e-02 0.4673263 0.139606503 3.049244e-01 1.146653
 3     alt_ref 2.000000e-07 0.4673263 0.000846849 3.049244e-01 1.635038 5.550000e-15
 8  Null_model 2.790000e-05 0.7240479 0.987016439 0.000000e+00 1.263556
 9    alt_test 7.550000e-09 0.7231176 0.991768899 1.060000e-13 1.369259
10     alt_ref 2.770000e-05 0.7231176 0.995373167 1.060000e-13 1.192839 3.073496e-01
           ...          ...       ...         ...          ...      ...          ...

I want to subset my data.frame so that it keeps every row where p.value < 0.05 and also keeps the row immediately before each of those rows.

Ideally, my output would look something like this:

       Model           w0          w1       w2
2   alt_test 1.366006e-02 0.139606503 1.146653
3    alt_ref 2.000000e-07 0.000846849 1.635038

I've tried the following, but it doesn't quite work:

subset(v, p.value < 0.05, select = c(Model,w0,w1,w2))

The output doesn't include the alt_test row.
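subset() keeps only the rows where the condition itself is TRUE. The alt_test row has a blank p.value of its own, so it can never appear in that result. One workaround is to build the logical index first and then also flag the row just before each hit, by shifting that index up by one position. Below is a minimal base-R sketch of that idea. The data frame is retyped from the rows shown above, and the blank p.value cells are assumed to be NA. The names `hit`, `prev` and `out` are illustrative only.

```r
# Rows retyped from the question; blank p.value cells are ASSUMED to be NA
v <- data.frame(
  Model   = c("Null_model", "alt_test", "alt_ref", "Null_model", "alt_test", "alt_ref"),
  w0      = c(3.95e-05, 1.366006e-02, 2e-07, 2.79e-05, 7.55e-09, 2.77e-05),
  w1      = c(0.988374029, 0.139606503, 0.000846849, 0.987016439, 0.991768899, 0.995373167),
  w2      = c(1.296464, 1.146653, 1.635038, 1.263556, 1.369259, 1.192839),
  p.value = c(NA, NA, 5.55e-15, NA, NA, 3.073496e-01),
  row.names = c(1, 2, 3, 8, 9, 10),
  stringsAsFactors = FALSE
)

hit  <- !is.na(v$p.value) & v$p.value < 0.05  # rows passing the test (NA counts as FALSE)
prev <- c(hit[-1], FALSE)                      # TRUE on the row just before each hit
out  <- subset(v, hit | prev, select = c(Model, w0, w1, w2))
out
```

Because `hit` and `prev` are not columns of `v`, subset() looks them up in the calling environment. The lookup works the same way as for any other variable there.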

I have also tried:

with(v, ifelse(p.value < 0.05, paste(dplyr::lag(c(w0,w1,w2),1)), ""))

The output in this case looks like this:

  [1] NA            NA            NA            NA            "0.013660056" NA            NA            NA            NA            ""           
 [11] NA            NA            NA            NA            ""            NA            NA            NA            NA            ""           
 [21] NA            NA            NA            NA            ""            NA            NA            NA            NA            ""           
 [31] NA            NA            NA            NA            ""            NA            NA            NA            NA            ""           
 [41] NA            NA            NA            NA            ""            NA            NA            NA            NA            ""           
 [51] NA            NA            NA            NA            "1.34e-11"    NA            NA            NA            NA            ""    ...       
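This NA / "" pattern has a mechanical cause, which three facts explain:

- c(w0, w1, w2) stacks the three columns into one vector that is three times as long.
- Lagging that vector shifts values across column boundaries.
- ifelse() returns a result that is as long as its test. Position i therefore only ever picks up element i - 1 of the stacked vector, which is the previous row's w0.

Rows whose p.value is NA give NA, and rows that fail the test give "". The sketch below uses hypothetical toy vectors. It uses base R's c(NA, head(x, -1)) as a stand-in for dplyr::lag(x, 1) and leaves out the paste() call.

```r
# Hypothetical toy columns: 3 rows; row 2 fails the test, row 3 passes it
w0      <- c(0.1, 0.2, 0.3)
w1      <- c(1.1, 1.2, 1.3)
w2      <- c(2.1, 2.2, 2.3)
p.value <- c(NA, 0.30, 0.01)

stacked <- c(w0, w1, w2)             # one vector of length 9, not three rows of 3 values
lagged  <- c(NA, head(stacked, -1))  # base-R stand-in for dplyr::lag(stacked, 1)

# The result has length(p.value) = 3 elements and element i comes from lagged[i],
# so the TRUE in row 3 picks up stacked[2], i.e. only w0 of row 2
res <- ifelse(p.value < 0.05, lagged, "")
res
```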

I also tried:

subset(v, p.value < 0.05, select = c(w0, w1,w2, w0-1, w1-1, w2-1))

but this selects the column before each of w0, w1 and w2. Is there something similar that selects the previous rows instead?
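Rows can be handled with the same kind of index arithmetic. which() turns the condition into row positions, and subtracting 1 gives the row before each hit. This is much like w0 - 1 inside select, which gives the column before w0. Below is a minimal base-R sketch on a trimmed copy of the question's data; the blank p.value cells are assumed to be NA, and `idx`, `rows` and `res` are illustrative names.

```r
# Trimmed copy of the question's data; blank p.value cells ASSUMED to be NA
v <- data.frame(
  Model   = c("Null_model", "alt_test", "alt_ref", "Null_model", "alt_test", "alt_ref"),
  w0      = c(3.95e-05, 1.366006e-02, 2e-07, 2.79e-05, 7.55e-09, 2.77e-05),
  p.value = c(NA, NA, 5.55e-15, NA, NA, 3.073496e-01),
  stringsAsFactors = FALSE
)

idx  <- which(v$p.value < 0.05)        # which() ignores NA, leaving only real hits
rows <- sort(unique(c(idx - 1, idx)))  # add each preceding row; unique() drops repeats
rows <- rows[rows >= 1]                # a hit in row 1 has no previous row
res  <- v[rows, c("Model", "w0")]
res
```

Compared with a logical mask, this form also extends naturally to more rows of context, for example c(idx - 2, idx - 1, idx).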

Thank you

Try w = which(with(DF, my_condition)); DF[rep(w, each=2)-1:0, my_cols] – Frank
@Rafael I don't see why the data.table tag should be added here. – Frank
@Frank, it is not strictly necessary, but it helps get attention from people who use data.table, which is quite handy for data manipulation, and I know it's possible to solve this question using data.table. – rafa.pereira
@Rafael OK, just FYI: tags should not be used to get attention. They should reflect the original poster's intentions, not what's used in the answers. – Frank
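To see what the index expression in the first comment produces, here it is on a hypothetical condition vector with hits in rows 3 and 6. rep(w, each = 2) duplicates each position, and the recycled 1:0 subtracts 1 from the first copy. As a result, every hit is preceded by the row above it.

```r
# Hypothetical condition vector: rows 3 and 6 satisfy it
cond <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
w    <- which(cond)             # c(3, 6)
idx  <- rep(w, each = 2) - 1:0  # c(3, 3, 6, 6) - c(1, 0, 1, 0) -> c(2, 3, 5, 6)
idx
```

Two edge cases are worth knowing:

- When two hits are adjacent, the shared row appears twice. Wrap the index in unique() if that matters.
- A hit in row 1 produces index 0, which `[` silently drops.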

1 Answer


If your data.frame always alternates between alt_test and alt_ref rows, you can construct the subset index manually, as below:

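library(data.table)
setDT(myDf)
# Keep a row if its own p.value < 0.05, or, when its own test is NA,
# if the next row's p.value < 0.05 (i.e. the alt_test row just before a hit)
myDf[Reduce(function(x, y) ifelse(!is.na(x), x, ifelse(!is.na(y), y, FALSE)),
            shift(p.value < 0.05, n = 0:1, type = "lead")),
     .(Model, w0, w1, w2)]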
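For readers who don't use data.table, the selection rule in this answer can be sketched in base R. shift(x, n = 0:1, type = "lead") returns a list holding two vectors: x itself, and x moved up one row and padded with NA. The Reduce() call then uses the row's own test when it is not NA. Otherwise it uses the next row's test, and if that is also NA, it returns FALSE. The p.value vector below is the question's, with the blanks assumed to be NA; `cur`, `nxt` and `keep` are illustrative names.

```r
# p.value from the question; blank cells ASSUMED to be NA
p.value <- c(NA, NA, 5.55e-15, NA, NA, 3.073496e-01)

cur <- p.value < 0.05     # like shift(..., n = 0): the row itself
nxt <- c(cur[-1], NA)     # like shift(..., n = 1, type = "lead"): the next row

# Same rule as the Reduce() call: own test if known, else next row's test, else FALSE
keep <- ifelse(!is.na(cur), cur, ifelse(!is.na(nxt), nxt, FALSE))
keep
```

The fallback only works because alt_test rows carry no p.value of their own. If they did, their own FALSE would win and the next row would never be consulted. This is why the answer is conditioned on the alternating layout.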