15
votes

I have 2 time series and I am using ccf to find the cross correlation between them. ccf(ts1, ts2) lists the cross-correlations for all time lags. How can I find the lag which results in maximum correlation without manually looking at the data?

4
Why don't you put that as an answer and credit the posters from the R help mailing list? - Roman Luštrik
yes I would have done so, but I do not have enough reputation points to answer my own question. - tan
Revisit the question when you have. :) - Roman Luštrik
@tan You can also mark your own answer as the correct one. And, as well as the link, I personally think it is nice to summarize what the answer was, to save Stackoverflowers an extra click. (I've edited your answer to show what I mean; no offence taken if you want to edit it back :-) - Darren Cook

4 Answers

20
votes

Posting the answer http://r.789695.n4.nabble.com/ccf-function-td2288257.html

Find_Max_CCF<- function(a,b)
{
 d <- ccf(a, b, plot = FALSE)
 cor = d$acf[,,1]
 lag = d$lag[,,1]
 res = data.frame(cor,lag)
 res_max = res[which.max(res$cor),]
 return(res_max)
} 
11
votes

I thought I'd redo the above function but have it find the absolute max correlation that returns the original correlation (positive or negative). I also maxed out (nearly) the number of lags.

Find_Abs_Max_CCF<- function(a,b)
{
 d <- ccf(a, b, plot = FALSE, lag.max = length(a)-5)
 cor = d$acf[,,1]
 abscor = abs(d$acf[,,1])
 lag = d$lag[,,1]
 res = data.frame(cor,lag)
 absres = data.frame(abscor,lag)
 absres_max = res[which.max(absres$abscor),]
 return(absres_max)
}
2
votes

Because 3 is more than 4, I also had a stab at modifying this function, this time by implementing an idea from here:

ccfmax <- function(a, b, e=0)
{
 d <- ccf(a, b, plot = FALSE, lag.max = length(a)/2)
 cor = d$acf[,,1]
 abscor = abs(d$acf[,,1])
 lag = d$lag[,,1]
 res = data.frame(cor, lag)
 absres = data.frame(abscor, lag)
 maxcor = max(absres$abscor)
 absres_max = res[which(absres$abscor >= maxcor-maxcor*e &
                        absres$abscor <= maxcor+maxcor*e),]
 return(absres_max)
}

Essentially an "error" term is added, so that if there are several values close to the maximum, they all get returned, eg:

ayy <- jitter(cos((1:360)/5), 100)
bee <- jitter(sin((1:360)/5), 100)

ccfmax(ayy, bee, 0.02)
           cor lag
348  0.9778319  -8
349  0.9670333  -7
363 -0.9650827   7
364 -0.9763180   8

If no value for e is given it is taken to be zero, and the function behaves just like the one nvogen posted.

1
votes

I've modified the original solution as well, in order to loop over the function and output the values corresponding to a character vector of indices (x):

abs.max.ccf <- function(x,a,b) {
  d <- ccf(a, b, plot=FALSE, lag.max=length(a)-5)
  cor <- d$acf[,,1]
  abscor <- abs(d$acf[,,1])
  lag <- d$lag[,,1]
  abs.cor.max <- abscor[which.max(abscor)]
  abs.cor.max.lag <- lag[which.max(abscor)]
  return(c(x, abs.cor.max, abs.cor.max.lag))
}

I removed the data.frame part within the function, as it is unnecessarily slow. To loop over each column in a data.frame and return the results to a new data.frame, I use this method:

max.ccf <- lapply(colnames(df), function(x) unlist(abs.max.ccf(x, df$y, df[x])))
max.ccf <- data.frame(do.call(rbind, max.ccf))
colnames(max.ccf) <- c('Index','Cor','Lag')