Find best correlation between two vectors of data

Question

I am trying to find the best correlation (i.e. highest r squared value) between two lists of data within a specified range (i.e. find the range of 'x' values that have the best correlation with their corresponding 'y' values). Basically I am looking for the linear range in the data. This is what I have so far:

    # Example data - actually have a much more complicated data set
    x <- c(1,2,3,4,5,6,7,8,9)
    y <- c(0.25,1.5,3,4,5,6,6.5,7,7.5)
    data.range <- 0 #create a new variable which will contain the 
    r.sq <- 0
    for (i in 1:length(x)) {
      r.sq[i] <- round(cor(x[i:(i+5)], y[i:(i+5)]), 4)
      data.range[i] <- paste(x[i], x[i+5], sep = " - ")
      output <- data.frame(na.omit(cbind(data.range, r.sq)))
    }
    # Example read out
    head(output)
      data.range    r.sq
      1 - 6         0.9963
      2 - 7         0.9906
      3 - 8         0.9885
      4 - 9         0.9839

Here, the output is a data frame containing the ranges of 'x' that are being correlated with the associated 'y', and the cor() value for each range. Right now, I am computing the correlation between 'x' and 'y' over a fixed window of 6 points (i through i+5), but in the end I don't want to hard-code the "5", as the linear range may span 6 or 8 points. So I want to compute all possible correlations of 'x' and 'y' and return a list of the ranges of data (data.range) with the corresponding cor() value (r.sq), like this:

data.range     r.sq        
1 - 4          0.9999
1 - 5          0.9808
1 - 6          0.9805
1 - 7          etc...
1 - 8
1 - 9
2 - 5
2 - 6
2 - 7
2 - 8
etc....

Any advice is welcome!

you're already using a loop, why not use a nested loop for the second value in the range? — jwells
I feel like that's the solution, but I can't seem to make the code work. I'm pretty new to R, would you mind giving me an idea of how it would look? — Dorton

jwells · Accepted Answer · 2017-04-08T10:35:13

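Sure. You have an i loop that goes from 1 to length(x). Add an inner j loop for the end of the range, where desired_start and desired_finish are placeholders for the end indices you want to try and n is the number of digits for round():

    for (i in 1:length(x)) {
        for (j in desired_start:desired_finish) {
            # start from r.sq <- numeric(0); r.sq[i] would be overwritten for each j
            r.sq <- c(r.sq, round(cor(x[i:j], y[i:j]), n))
        }
    }

You get the rest. There are more ways to do this, but if you're new this is a really nice start and you seem to have a nice grasp on loops. This will loop through i first and capture each possible value of j for each i.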
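For reference, here is a minimal sketch of how the complete nested loop might look on the example data from the question. A few things in it are assumptions rather than part of the original code: min.points is a hypothetical lower bound on the window size (with only two points the correlation is always 1 or -1, so very short windows are not informative), and cor() is squared because the question asks for r squared; drop the ^2 to report plain r as the question's code does.

```r
# Example data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
y <- c(0.25, 1.5, 3, 4, 5, 6, 6.5, 7, 7.5)

min.points <- 4   # assumption: smallest window worth correlating
n <- length(x)

data.range <- character(0)   # labels such as "1 - 4"
r.sq <- numeric(0)           # squared correlation for each window

# i is the start index of the window, j is the end index
for (i in seq_len(n - min.points + 1)) {
  for (j in (i + min.points - 1):n) {
    data.range <- c(data.range, paste(x[i], x[j], sep = " - "))
    r.sq <- c(r.sq, round(cor(x[i:j], y[i:j])^2, 4))
  }
}

# Build the data frame once, after the loops, and sort best-first
output <- data.frame(data.range, r.sq, stringsAsFactors = FALSE)
output <- output[order(-output$r.sq), ]
head(output)
```

Building the data frame once after the loops avoids recreating it on every iteration. Note that short windows tend to score well simply because they have fewer points, so ties at the top are possible; you may want to break them by window length.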
