0
votes

This is a follow up to this question: Return values in sucessive years

@akrun asked me to post a new question

A sample of my data can be found here: https://dl.dropboxusercontent.com/u/5699030/SuccessiveYears.xlsx

Or here it is in a dataframe

id <- c(1,1,1,2,3,4,4,5,5,5,6,7,7,8,8,8,8,9,9,9,10,10,11,12,12,12,13,13,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15)
Year <- c(2010,2011,2012,2010,2013,2012,2013,2010,2011,2012,2013,2011,2013,2011,2012,2014,2015,2010,2011,2013,2010,2012,2013,2010,2011,2013,2011,2012,2011,2012,2013,2014,2015,2010,2010,2011,2011,2012,2012,2013,2013,2014,2014,2015,2015)
Class <- c("c","c","c","c","a","c","b","c","c","c","b","c","b","d","f","f","f","c","c","c","c","c","b","c","c","b","c","c","c","c","b","c","b","e","d","e","d","e","d","e","d","e","d","e","d")
ValueA <- c(0.1,0.12,0.15,0.21,0.09,0,0.04,0.05,0.27,0.1,0,0,0,0.22,0.18,0,0.11,0,0,0.06,0,0,0,0.19,0.23,0.18,0,0.08,0,0,0.02,0.08,0.09,0.51,0.99,0.71,0.87,0.97,0.95,1.38,0.84,0.53,1.05,1.04,0.77)
ValueB <- c(0.42,0.58,0.38,0.26,0.28,0.28,0.27,0.22,0.27,0.2,0.11,0,0.05,0.34,0.27,0,0.11,0,0,0.06,0,0,0,0.38,0.31,0.35,0.15,0.17,0.13,0.04,0.07,0.08,0.28,0.92,1.34,0.98,1.18,1.06,1.29,1.74,1.15,0.75,1.49,1.33,1.07)
ValueC <- c(1.3,0.81,1.07,1.16,3.04,0.28,0.18,1.09,1.42,1.19,0.11,0.2,0.45,1.79,1.08,0.59,0.66,0.35,0.3,0.17,0.44,0.57,0.65,2.32,2.45,2.01,3.24,2.45,0.73,0.5,0.4,0.56,0.56,5.96,5.88,5.27,5.01,5.44,6.09,5.5,6.08,3.92,4.98,5.7,5.12)
df <- data.frame(id, Year, Class, ValueA, ValueB, ValueC)
df

For every combination of id and class, where there are values for two successive years I would like the ValueA, ValueB and ValueC returned in a row for each of those two successive years. All of the column headers for the subsequent year values should be prefixed with "Y2".

Each id can have values for more than one class in any one year, hence why we need to look at the combination of id and class.

The output should look as follows:

id <- c(1,1,5,5,8,9,13,14,15,15,15,15,15,15,15,15,15,15)
Year <- c(2010,2011,2010,2011,2014,2010,2011,2011,2010,2010,2011,2011,2012,2012,2013,2013,2014,2014)
Class <- c("c","c","c","c","f","c","c","c","e","d","e","d","e","d","e","d","e","d")
ValueA <- c(0.1,0.12,0.05,0.27,0,0,0,0,0.51,0.99,0.71,0.87,0.97,0.95,1.38,0.84,0.53,1.05)
ValueB <- c(0.42,0.58,0.22,0.27,0,0,0.15,0.13,0.92,1.34,0.98,1.18,1.06,1.29,1.74,1.15,0.75,1.49)
ValueC <- c(1.3,0.81,1.09,1.42,0.59,0.35,3.24,0.73,5.96,5.88,5.27,5.01,5.44,6.09,5.5,6.08,3.92,4.98)
Y2Year <- c(2011,2012,2011,2012,2015,2011,2012,2012,2011,2011,2012,2012,2013,2013,2014,2014,2015,2015)
Y2Class <- c("c","c","c","c","f","c","c","c","e","d","e","d","e","d","e","d","e","d")
Y2ValueA <- c(0.12,0.15,0.27,0.1,0.11,0,0.08,0,0.71,0.87,0.97,0.95,1.38,0.84,0.53,1.05,1.04,0.77)
Y2ValueB <- c(0.58,0.38,0.27,0.2,0.11,0,0.17,0.04,0.98,1.18,1.06,1.29,1.74,1.15,0.75,1.49,1.33,1.07)
Y2ValueC <- c(0.81,1.07,1.42,1.19,0.66,0.3,2.45,0.5,5.27,5.01,5.44,6.09,5.5,6.08,3.92,4.98,5.7,5.12)
df1 <- data.frame(id, Year, Class, ValueA, ValueB, ValueC, Y2Year, Y2Class, Y2ValueA, Y2ValueB, Y2ValueC)
df1
1
Rather than link to a file and try to explain what you want (certainly do that as well), it's best if we can see a small example dataset here, along with the output you're after. Your description seems OK, but I'm personally not getting a clear picture of what you're after. Seeing a small sample of the output you'd like to see will make that much clearer.rosscova
The output is in the second file that I linked; I thought that would have been an OK way to present it. is that a no no? dl.dropboxusercontent.com/u/5699030/Output.xlsxcookie
@rosscova Original post edited to include data and outputs as dataframes. Hope that is better etiquette.cookie
An easier way to include the data would be to read your Excel file into R, e.g. df <- read_excel("~/Downloads/SuccessiveYears.xlsx") from the readxl package, "print" the data using dput(df), and then copy-paste the output structure(...) into your question. See the FAQ here.Weihuang Wong
@WeihuangWong Thanks for explaining how to include the data here. It took me quite a while to create that dataframe manually, so that shortcut is most appreciatedcookie

1 Answers

0
votes

First I use dlply to split the dataset into groups identified by id and Class:

library(plyr)
grouped <- dlply(df, .(id, Class), function(x) return(x[order(x$Year), ]))

Then, apply the consecutive years logic, using merge to join data from consecutive years, and returning the result as a list:

out <- lapply(grouped, function(x) {
    if (nrow(x)>1) {
        out <- lapply(1:(nrow(x)-1), function(i) {
            if ((x$Year[i]+1)==(x$Year[i+1])) {
                merge(x[i,], x[i+1,], by=c("id", "Class"))
            } else {
                NULL
            }
        })      
    }
})

Finally, bind the results into a dataframe:

final <- do.call(rbind, lapply(out, function(x) {
    if(length(x)>=1) {
        do.call(rbind, x)   
    } else {
        x
    }
}))

And finish up by renaming the columns.

cols <- c("Year", paste0("Value", LETTERS[1:3]))
names(final) <- c("id", "Class", cols, paste0("Y2", cols))
rownames(final) <- 1:nrow(final)

Result:

> final
   id Class Year ValueA ValueB ValueC Y2Year Y2ValueA Y2ValueB Y2ValueC
1   1     c 2010   0.10   0.42   1.30   2011     0.12     0.58     0.81
2   1     c 2011   0.12   0.58   0.81   2012     0.15     0.38     1.07
3   5     c 2010   0.05   0.22   1.09   2011     0.27     0.27     1.42
4   5     c 2011   0.27   0.27   1.42   2012     0.10     0.20     1.19
5   8     f 2014   0.00   0.00   0.59   2015     0.11     0.11     0.66
6   9     c 2010   0.00   0.00   0.35   2011     0.00     0.00     0.30
7  12     c 2010   0.19   0.38   2.32   2011     0.23     0.31     2.45
8  13     c 2011   0.00   0.15   3.24   2012     0.08     0.17     2.45
9  14     c 2011   0.00   0.13   0.73   2012     0.00     0.04     0.50
10 15     d 2010   0.99   1.34   5.88   2011     0.87     1.18     5.01
11 15     d 2011   0.87   1.18   5.01   2012     0.95     1.29     6.09
12 15     d 2012   0.95   1.29   6.09   2013     0.84     1.15     6.08
13 15     d 2013   0.84   1.15   6.08   2014     1.05     1.49     4.98
14 15     d 2014   1.05   1.49   4.98   2015     0.77     1.07     5.12
15 15     e 2010   0.51   0.92   5.96   2011     0.71     0.98     5.27
16 15     e 2011   0.71   0.98   5.27   2012     0.97     1.06     5.44
17 15     e 2012   0.97   1.06   5.44   2013     1.38     1.74     5.50
18 15     e 2013   1.38   1.74   5.50   2014     0.53     0.75     3.92
19 15     e 2014   0.53   0.75   3.92   2015     1.04     1.33     5.70

I ended up with one more row than your output. Perhaps you missed a row?