For each person, there are two types of visits and for each visits, there are date records. The dataset looks like below.
p <-c(1,1,1,2,2,2,2,3,3,3,4)
type <- c(15,20,20,15,20,15,20,20,15,15,15)
date <- as.Date.factor(c("2014-02-03","2014-02-04","2014-02-06","2014-01-28","2014-02-03","2014-03-03","2014-03-13","2014-04-03","2014-04-09","2014-12-03","2014-04-05"))
d <- data.frame(p,type,date)
So now the dataset looks like this.
> d
p type date
1 1 15 2014-02-03
2 1 20 2014-02-04
3 1 20 2014-02-06
4 2 15 2014-01-28
5 2 20 2014-02-03
6 2 15 2014-03-03
7 2 20 2014-03-13
8 3 20 2014-04-03
9 3 15 2014-04-09
10 3 15 2014-12-03
Now, I'd like to create three new columns.
indicating whether a type 20 visit happens in 7 days after the type 15 visit, if yes then the indicator is 1, otherwise 0.(for example, for p2, in the line 4, this value should be 1, and in the line 6, this value should be 0)
What is the first date of type 20 visit happened in 7 days after the type 15 visit. If there is no type 20 visit in 7 days after the type 15, then keep it blank. (for example, for p1, the value should be 2014-02-04 instead of 2014-02-06)
How many days is between the type 15 visit and type 20 visit happened in 7 days from it. If there is no type 20 visit in 7 days after the type 15, then keep it blank.(for example, the value in line 1 should be 1)
I'm a super newbie in R, and basically have no idea of what to do. I tried a for loop within group, but it never works.
group_by(p)%>%
for(i in i:length(date)){
*if(type[i]== 15 && date[i]+7 >= date[i+1:length(date)]){
indicator = 1
first_date =
days =* #Have no idea how to check in this part
} else {
indicator = 0
first_date = NA
days = NA
}
The expected output is as below.
p type date ind first_date days
1 1 15 2014-02-03 1 2014-02-04 1 # = 2014-02-04 - 2014-02-03
2 1 20 2014-02-04 NA <NA> NA
3 1 20 2014-02-06 NA <NA> NA
4 2 15 2014-01-28 1 2014-02-03 6 # = 2014-02-03 - 2014-01-28
5 2 20 2014-02-03 NA <NA> NA
6 2 15 2014-03-03 0 <NA> NA # since (2014-03-13 - 2014-03-03) > 7
7 2 20 2014-03-13 NA <NA> NA
8 3 20 2014-04-03 NA <NA> NA #I don't care about the value for type 20 lines
9 3 15 2014-04-09 0 <NA> NA
10 3 15 2014-12-03 0 <NA> NA
So I come up with a new idea. What if we group records by p and type == 15.Then we can use subtraction within groups as days, and the rest will be easy.
I found one way in doing this:
d[,group:= cumsum(type ==15)]
However, this will count group when encountering a new type 15 record. How to add p as another grouping condition?
data.frame()
, instead ofdata.frame(cbind())
– Axeman