0
votes

This question is related to the recent question I have posted here

I want to compute exposure averages of the preceding years 1, 2, 3. etc. based on the exact date of entry into the study.

Example for the first person (entered study in Feb 2002) the one year average is computed from the number of days in 2002(37) and 2001(328). Exposure for 2002 was 18 and for 2001 it was 19. This gives a one year exposure of 18.88569 which is calculated from formula (37*18)/365.25 and (328*19)/365.25.

A two year average for the same person includes exposure data from years 2002, 2001 and 2000 and will be: contribution of 2002 (37*18) / 365.25
contribution of 2001: 19
contribution of 2000: (328*18)/365.25 ) / 2 which gives 18.49384

Thanks Fake data is found following this link: https://drive.google.com/file/d/0B_4NdfcEvU7La1ZCd2EtbEdaeGs/view?usp=sharing

1

1 Answers

1
votes

Once again, here is one idea. There will be better approaches. Anyway, I modified my previous answer for your previous question. This is not an elegant answer either. But, this does what you are after. The first step was to rearrange your data frame. test in ana has the number of the days for each subject's (id) entry year (e.g., 37 for id 1) as well as 365 for the other years. In the second step, I split the data by subject (id) and removed rows which years are larger than the entry year. In the final step, I did math as you described in your question. I chose the entry year row for each subject and created a data frame.

library(reshape2)
library(lubridate)
library(stringi)
library(dplyr)
library(tidyr)

### Arrange the data frame.
ana <- mutate(mydf,id = 1:n()) %>%
       melt(id.vars = c("id","entry")) %>%
       mutate(variable = stri_extract_last(variable, regex = "\\d+"),
              entry = as.Date(entry, format = "%d%b%Y"),
              entryYear = as.character(format(entry, "%Y")),
              check = ifelse(variable == entryYear, "Y", "N"),
              test = ifelse(variable == entryYear, yday(entry) - 1, 365)) %>%
              arrange(id)


### Get rid of rows which have larger year numbers than the entry year

bob <- lapply(split(ana, ana$id), function(x) {

                    indx <- which(x$check == "Y")

                    if(indx < nrow(x)){

                        x[-c(indx + 1: nrow(x)),]

                    }else{x}
                }
            )


### Get one-year and two-year averages

cathy <- lapply(bob, function(x){

    x$one <- ((x[nrow(x),4]) * (x[nrow(x),7]) / 365.25) +
             ((x[nrow(x)-1,4]) * ((x[nrow(x)-1,7]) - (x[nrow(x),7])) / 365.25)


    x$two <- (((x[nrow(x),4]) * (x[nrow(x),7]) / 365.25) +
             (x[nrow(x)-1,4]) +
             ((x[nrow(x)-2,4]) * ((x[nrow(x)-2,7]) - (x[nrow(x),7])) / 365.25)) / 2 

    subset(x, check == "Y")
})

### Create a data frame. unnest is available in the dev version of tidyr.
unnest(cathy)

#  id      entry variable value entryYear check test       one       two
#1  1 2002-02-07     2002    18      2002     Y   37 18.885695 18.493840
#2  2 2002-06-06     2002    16      2002     Y  156 16.561259 16.780630
#3  3 2003-04-16     2003    14      2003     Y  105 15.414100 15.707050
#4  4 2003-05-26     2003    16      2003     Y  145 16.591376 17.096851
#5  5 2003-06-11     2003    13      2003     Y  161 13.549624 14.054073
#6  6 2004-02-20     2004     3      2004     Y   50  2.997947  3.430185
#7  7 2004-07-25     2004     2      2004     Y  206  1.998631  2.216975
#8  8 2004-08-19     2004     4      2004     Y  231  3.997262  4.182067
#9  9 2004-12-19     2004     5      2004     Y  353  5.029432  5.481862