5
votes

I have a data frame called dat_new containing clinic visit data, with hrn as the patient ID and dov as the date of visit (multiple visits per person). I also have a data frame called event with dated hospital admissions (multiple admissions per person).

For each clinic visit, I want to count the hospital admissions, by event code, that occurred on or before that clinic visit. Simple.

This works with ddply from plyr; it takes a bit of time but works well.

temp <- ddply(dat_new, .(hrn,dov), summarise,
              dka2 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==2),
              dka3 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==3),
              dka8 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==8)
)
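To make the intended result concrete, here is a minimal base R sketch of the same per-visit lookup. The column names follow the question, but the data values and the helper `count_prior` are hypothetical, made up purely for illustration.

```r
# Hypothetical toy data: column names follow the question, values are made up.
dat_new <- data.frame(hrn = c("P1", "P1", "P2"),
                      dov = as.Date(c("2013-03-01", "2013-09-01", "2013-06-01")),
                      stringsAsFactors = FALSE)
event <- data.frame(hrn = c("P1", "P1", "P1", "P2"),
                    doa = as.Date(c("2013-01-15", "2013-05-20",
                                    "2013-10-01", "2013-07-04")),
                    event_code = c(2, 2, 3, 8),
                    stringsAsFactors = FALSE)

# Hypothetical helper: count admissions with a given code for one patient
# that fall on or before one visit date.
count_prior <- function(id, visit_date, code) {
  sum(event$hrn == id & event$doa <= visit_date & event$event_code == code)
}

# Apply the lookup to every visit row by index.
dat_new$dka2 <- vapply(seq_len(nrow(dat_new)),
                       function(i) count_prior(dat_new$hrn[i], dat_new$dov[i], 2),
                       integer(1))
dat_new
```

In this toy data the two P1 visits get 1 and 2 prior code-2 admissions, and the P2 visit gets 0 because its only admission falls after the visit.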

Now, trying to rewrite this in dplyr, I get an error:

Error: binding not found: 'event_code'

I have it coded like this:

temp2 <- group_by(dat_new, hrn, dov)
temp3 <- summarise(temp2,
                   dka2 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==2))

Obviously event_code is not in the temp2 data frame. Is it the case that dplyr cannot work with 'other' data frames when summarising? If there is a far better way to do this lookup/sum, I'm all ears.

I tried this a few times, loading the packages in different orders in a vanilla R session, to try to eliminate any namespace issues.

Thanks

EDIT - REPRODUCIBLE EXAMPLE

This is a quick and dirty example just to illustrate the issue. If we make a 'lookup' data.frame that has two rows for each car, with an mpg around 500, we can then go through the original data.frame, look up each car in the new data.frame, and sum its two mpgs. plyr gives the expected figures of around 1000; dplyr errors.

# add the model names as a column so they're easier to get at
mtcars$models <- row.names(mtcars)

# create a 'lookup' table
xtra <- data.frame(models = rep(row.names(mtcars),2),
                    newmpg = rnorm(2*nrow(mtcars),500,10)
)
xtra <- xtra[sample(row.names(xtra)), ]

library(plyr)
ddply(mtcars, .(models), summarise,
        revisedmpg = sum(xtra$newmpg[models==xtra$models]) )
# great, one row per car, with both mpgs added together
library(dplyr)

temp2 <- group_by(mtcars, models)
temp3 <- summarise(temp2,
                   revisedmpg = sum(xtra$newmpg[models==xtra$models]) )
# error
1
I experienced a similar issue a few weeks ago, and I think this is related to github.com/hadley/dplyr/issues/170. I do hope there's an elegant way to achieve this kind of task using dplyr, and I look forward to seeing the answer to this question. Your question is really interesting, so please make some effort to make it reproducible if you want people to help you. Use the mtcars data set, for example. - dickoa
Okay, done, as crude as it is :) - nzcoops
Thanks. Let's hope a dplyr guru will find a workaround now. - dickoa
I think what you want here is a cross join, as in stackoverflow.com/questions/19552104, which dplyr doesn't currently support. This is the third time this problem has come up, so I'll think about it for a future release: github.com/hadley/dplyr/issues/197. - hadley
Perhaps, @hadley. It's probably a poor choice of words, but I prefer to think of this (my current working plyr version) as more of a lookup than any sort of join. As I alluded to in the comment on Troy's answer below, I don't like the idea of a 'join', given that you get n1 x n2 records in the resulting data frame/table (that is then manipulated). - nzcoops

1 Answer

2
votes

How about:

merge(mtcars,xtra,by="models") %.% group_by(models) %.% summarise(sum(newmpg)) 
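The `%.%` operator above was dplyr's chaining operator at the time; later dplyr releases use `%>%` instead. The following is a rough equivalent under that assumption, a sketch rather than the original answer's code. It swaps `merge` for dplyr's `inner_join`; the copy `mtcars2` and the fixed seed are illustrative additions.

```r
library(dplyr)

# Work on a copy so the built-in mtcars is left untouched (illustrative choice).
mtcars2 <- mtcars
mtcars2$models <- row.names(mtcars2)

set.seed(1)  # fixed seed only so the sketch is repeatable
xtra <- data.frame(models = rep(row.names(mtcars), 2),
                   newmpg = rnorm(2 * nrow(mtcars), 500, 10),
                   stringsAsFactors = FALSE)

# Join the lookup rows onto each car, then sum the two mpgs per model.
revised <- mtcars2 %>%
  inner_join(xtra, by = "models") %>%
  group_by(models) %>%
  summarise(revisedmpg = sum(newmpg))

head(revised)
```

This should give one row per car with a revisedmpg around 1000, matching what the plyr version in the question produces.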

EDIT: Sorry, I think this is what you want:

# from what I can tell of your data:
dat_new <- data.frame(hrn = c("P1", "P2"), dov = 42000)
event <- data.frame(hrn = sample(dat_new$hrn, 20, TRUE),
                    doa = 41990 + sample(1:20, 20),
                    event_code = sample(2:8, 20, TRUE))


merge(dat_new,event,by="hrn") %.%
filter(doa<=dov) %.% 
group_by(hrn,dov) %.%
summarise(dka2=length(event_code[event_code==2]),
          dka3=length(event_code[event_code==3]),
          dka8=length(event_code[event_code==8]))

Source: local data frame [2 x 5]
Groups: hrn

  hrn   dov dka2 dka3 dka8
1  P1 42000    2    1    0
2  P2 42000    1    0    1

Apologies, I'd mixed up doa and dov before the edit. You may need to tweak the merge(,by=c("x",..)) call depending on what else is in your tables.
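One difference from the ddply version is worth noting: `merge` does an inner join by default, and the `filter` step then removes rows, so a visit with no earlier admissions disappears from the result instead of showing zeros. A possible variant that keeps every visit is sketched below. It assumes a dplyr version that provides `%>%` and `left_join`, and the toy data (including patient "P3", who has no admissions) is hypothetical.

```r
library(dplyr)

# Hypothetical toy data; "P3" has no admissions at all.
dat_new <- data.frame(hrn = c("P1", "P2", "P3"), dov = 42000,
                      stringsAsFactors = FALSE)
event <- data.frame(hrn = c("P1", "P1", "P2", "P2"),
                    doa = c(41995, 42005, 41990, 41999),
                    event_code = c(2, 2, 8, 3),
                    stringsAsFactors = FALSE)

counts <- dat_new %>%
  left_join(event, by = "hrn") %>%   # keep visits even without any admission
  group_by(hrn, dov) %>%
  # Put the date condition inside sum() instead of filter(), so that
  # visits with no qualifying admission still yield a row of zeros.
  summarise(dka2 = sum(doa <= dov & event_code == 2, na.rm = TRUE),
            dka3 = sum(doa <= dov & event_code == 3, na.rm = TRUE),
            dka8 = sum(doa <= dov & event_code == 8, na.rm = TRUE)) %>%
  ungroup()

counts
```

The intermediate table still has one row per visit and admission pair for each patient, so the memory concern raised in the comments about join-based approaches applies here too.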