
This is quite complicated for me and I would be really grateful if someone could tell me how to go about this problem. My dataframe has two columns:

dat <- structure(list(day = 172:208,
                      x = c(0.14, 0.02, 0.09, 3.06, 3.21, 
                            4.15, 6.24, 6.27, 3.31, 6.28, 
                            16.9, 20.1, 20.29, 20.45, 17.52, 
                            6.22, 1.14, 0.84, 0.68, 0.49, 
                            0.22, 0.01, 0.01, 0.6, 0.64, 0.64, 
                            0.66, 0.69, 0.15, 0.15, 3.16, 
                            3.44, 3.42, 3.37, 3.51, 2.77, 3.51
                      )),
                 .Names = c("day", "x"),
                 class = "data.frame", row.names = c(NA,-37L))

dat
  day      x 
 172    0.14
 173    0.02
 174    0.09
 175    3.06
 176    3.21
 177    4.15
 178    6.24
 179    6.27
 180    3.31
 181    6.28
 182    16.90
 183    20.10
 184    20.29
 185    20.45
 186    17.52
 187    6.22
 188    1.14
 189    0.84
 190    0.68
 191    0.49
 192    0.22
 193    0.01
 194    0.01
 195    0.60
 196    0.64
 197    0.64
 198    0.66
 199    0.69
 200    0.15
 201    0.15
 202    3.16
 203    3.44
 204    3.42
 205    3.37
 206    3.51
 207    2.77
 208    3.51

What I want to do is this:

1) In column x, look for values greater than 2.3

which(dat$x > 2.3)

2) For each day where x is greater than 2.3, calculate the percentage change in x for each of the next 3 days. For example, on day 175 x is 3.06 (> 2.3), so for the next 3 consecutive values of x (3.21 on day 176, 4.15 on day 177, 6.24 on day 178), do this:

(3.21 - 3.06)*100/3.06 = 4.9
(4.15 - 3.21)*100/3.21 = 29.28
(6.24 - 4.15)*100/4.15 = 50.36

If all three values above are greater than -30, store the middle day of 176, 177 and 178 in a separate vector (in this case, 177).

3) Whether or not the middle day was stored, start again from day 179 (x > 2.3 mm) and repeat step 2 for days 180, 181 and 182:

(3.31 - 6.27)*100/6.27 = -47.21
(6.28 - 3.31)*100/3.31 = 89.73
(16.9 - 6.28)*100/6.28 = 169.11

If all three values are greater than -30, store the middle day (181). Here one of the values is less than -30, so store nothing and start again from day 183 (x > 2.3 mm), repeating step 2 for days 184, 185 and 186. If any of those three values is again less than -30, start from day 187 (x > 2.3) and repeat step 2 for days 188, 189 and 190. If again any one of the three values is less than -30, start from day 202 (the next day on which x > 2.3).
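Steps 1 and 2 can be checked on the first window with a short sketch like the one below. Because x is a column of dat, which() needs dat$x. Here the two columns are copied into plain vectors day and x for brevity. Note that which() returns row positions, not days. The names idx, start, win and kept are illustrative only.

```r
# The two columns from the question, copied into plain vectors
day <- 172:208
x <- c(0.14, 0.02, 0.09, 3.06, 3.21, 4.15, 6.24, 6.27, 3.31, 6.28,
       16.9, 20.1, 20.29, 20.45, 17.52, 6.22, 1.14, 0.84, 0.68, 0.49,
       0.22, 0.01, 0.01, 0.6, 0.64, 0.64, 0.66, 0.69, 0.15, 0.15, 3.16,
       3.44, 3.42, 3.37, 3.51, 2.77, 3.51)

# Step 1: row positions where x > 2.3, and the matching days
idx <- which(x > 2.3)
day[idx]                 # days 175-187 and 202-208

# Step 2 for the first such day (day 175, position 4)
start <- idx[1]
win <- start + 1:3       # positions of days 176, 177, 178
# percentage change of each window day relative to the day before it
pct <- 100 * (x[win] - x[win - 1]) / x[win - 1]
round(pct, 2)            # 4.90 29.28 50.36

kept <- day[0]           # empty integer vector for the stored days
if (all(pct > -30)) kept <- c(kept, day[win[2]])   # store the middle day
kept                     # 177
```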

I'm sorry; I don't have much programming experience in R, so I'm posting this question, which has bogged me down for quite a while.

Thanks a lot

OK, so would you repeat this for every day which has x > 2.3? Even if they are consecutive? – AndrewMacDonald
No, I won't repeat it for each day. Say a day is > 2.3: then for the next three days I do the percentage thing, and then move to the fourth day. I don't need to go back to those three days to test again for x > 2.3. – user3013423
Sorry, I wasn't clear with my question. In your example, days 176, 177 and 178 all have x > 2.3. So you begin the 3-day-interval-percentage thing starting on day 176 and so on. Once that is done, do you then want to begin another 3-day-interval algorithm, starting with day 177? Or is it excluded because it was contained in the intervals which started on 176? – AndrewMacDonald
Hi, thanks. 177 will be excluded since it was contained in the intervals. – user3013423
Oh. I wish I had known that; it looks like I just solved a different problem than the one you have! So, in your example, you only need to define groups of 3 starting with day 175! So there are only 11 groups of 3. Is that right? – AndrewMacDonald
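Putting the question and the comments together, here is my reading of the rule. It is an interpretation, not something the asker wrote. In it, t indexes days, Δ_t is the percentage change from the previous day, and s_k is the k-th day on which a 3-day window starts. After a window, the search moves on to the fourth day, and days inside a window are not re-tested:

```latex
\Delta_t = 100\,\frac{x_t - x_{t-1}}{x_{t-1}}

s_1 = \min\{\, t : x_t > 2.3 \,\}, \qquad
s_{k+1} = \min\{\, t \ge s_k + 4 : x_t > 2.3 \,\}

\text{store day } s_k + 2 \iff \min_{j \in \{1,2,3\}} \Delta_{s_k + j} > -30
```

With the data above, the starting days are 175, 179, 183, 187, 202 and 206. The last of these would need day 209, which does not exist.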

1 Answer


We can solve this without any loops by using the dplyr package:

library(dplyr)
library(magrittr)

Let's calculate percent change first, as a new column:

dat <- dat %>%
  mutate(change = (x-lag(x))/lag(x)*100)
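lag(x) is simply the previous day's value, so change is the same percentage change as in the question. For comparison, here is a base-R sketch of the same column that needs no dplyr, shown on the first seven values of x only:

```r
x <- c(0.14, 0.02, 0.09, 3.06, 3.21, 4.15, 6.24)   # days 172 to 178

# Same formula as the mutate() call: (x - previous x) / previous x * 100.
# diff(x) gives x[t] - x[t-1] and head(x, -1) gives x[t-1]; the first day
# has no previous value, so it gets NA, as dplyr::lag() does by default.
change <- c(NA, 100 * diff(x) / head(x, -1))
round(change, 2)
```

The values for days 176 to 178 (4.90, 29.28 and 50.36 after rounding) match the question's example.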

Next, make a vector of subscripts that indicates which groups of 3 should be included in your answer. In this chunk of code, we find the first (min) row that matches your condition, count from there in intervals of 4, and make those the "starting points" for our subscripts; each group is then the 3 rows after a starting point. The last line simply turns this into a data frame:

grps <- which(dat$x > 2.3) %>% 
  min %>%
  seq(from = ., to = nrow(dat), by = 4) %>%
  lapply('+',1:3) %>%
  do.call(c,.) %>%
  (function(l) data.frame(group = gl(length(l)/3, 3), ss = l))  # parenthesised so the pipe calls it on the vector
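The sketch below rebuilds the same subscripts in base R so you can see what the pipeline produces. It hard-codes nrow(dat) as 37 and the first match, min(which(dat$x > 2.3)), as 4, both taken from the question's data. Note that the last group points at rows 37, 38 and 39, while dat has only 37 rows:

```r
n <- 37                           # nrow(dat) in the question
first <- 4                        # min(which(dat$x > 2.3)), i.e. day 175
starts <- seq(from = first, to = n, by = 4)       # 4 8 12 ... 36
ss <- do.call(c, lapply(starts, '+', 1:3))        # 3 rows after each start
grp <- gl(length(ss) / 3, 3)      # group labels 1,1,1,2,2,2,...
nlevels(grp)                      # 9 groups
tail(ss, 3)                       # 37 38 39: rows 38 and 39 do not exist
```

Indexing dat with rows 38 and 39 returns rows of NA. For that last group, all(change > -30) is therefore NA, and filter() drops the group because it keeps only rows where the condition is TRUE.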

Then you use those group subscripts (ss) to pull out the necessary rows of dat. Let's look at the top of the new dataframe before proceeding:

grps %>%
  do(data.frame(.,dat[.$ss,])) %>%
  head

   group ss day     x     change
5      1  5 176  3.21   4.901961
6      1  6 177  4.15  29.283489
7      1  7 178  6.24  50.361446
9      2  9 180  3.31 -47.208931
10     2 10 181  6.28  89.728097
11     2 11 182 16.90 169.108280

As you can see, the values for days 176, 177 and 178 exactly match those in your example. This is a group for which we will want the middle number, since every value of change is greater than -30. We won't use 181, however, because one value of change in that group ("group 2") is less than -30. Again, this matches the original question.

Then group by the group column (which keeps the data in sets of 3). Finally, keep only the groups in which every change is greater than -30, and select the middle row of each:

grps %>%
  do(data.frame(.,dat[.$ss,])) %>%
  group_by(group) %>%
  filter(all(change > -30)) %>%
  do(.[2,])

  group ss day     x     change
1     1  6 177  4.15 29.2834891
2     3 14 185 20.45  0.7885658
3     6 26 197  0.64  0.0000000
4     8 34 205  3.37 -1.4619883
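For comparison, here is a plain loop that follows the rule as I read it from the question and its comments. After each 3-day window, it resumes on the fourth day and only opens a new window on a day with x > 2.3. This is a sketch: find_middle_days is a hypothetical helper name, and I assume that a window running past the last day is simply skipped. With the question's data it keeps days 177, 185 and 204. The output above instead contains 197 and 205, because its starting days are fixed at every fourth day from 175 rather than at the next day with x > 2.3. For example, it opens a window on day 195, where x = 0.60.

```r
# Data from the question
dat <- data.frame(
  day = 172:208,
  x = c(0.14, 0.02, 0.09, 3.06, 3.21, 4.15, 6.24, 6.27, 3.31, 6.28,
        16.9, 20.1, 20.29, 20.45, 17.52, 6.22, 1.14, 0.84, 0.68, 0.49,
        0.22, 0.01, 0.01, 0.6, 0.64, 0.64, 0.66, 0.69, 0.15, 0.15, 3.16,
        3.44, 3.42, 3.37, 3.51, 2.77, 3.51)
)

# Hypothetical helper: walk through the days, open a 3-day window after
# each day with x > threshold, keep the window's middle day if all three
# percentage changes exceed min_change, then resume on the fourth day.
find_middle_days <- function(day, x, threshold = 2.3, min_change = -30) {
  n <- length(x)
  kept <- day[0]                  # empty vector of the same type as day
  i <- 1
  while (i <= n) {
    if (x[i] <= threshold) {      # not a starting day: try the next one
      i <- i + 1
      next
    }
    win <- i + 1:3                # the next three days
    if (win[3] > n) break         # assumption: incomplete windows are skipped
    # percentage change of each window day relative to the day before it
    pct <- 100 * (x[win] - x[win - 1]) / x[win - 1]
    if (all(pct > min_change)) kept <- c(kept, day[win[2]])
    i <- i + 4                    # days inside the window are not re-tested
  }
  kept
}

res <- find_middle_days(dat$day, dat$x)
res
```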