I am trying to find the range of 'x' values that gives the best correlation (i.e. the highest r squared value) with the corresponding 'y' values. Basically, I am looking for the linear range in the data. This is what I have so far:
#Example data - actually have a much more complicated data set
x <- c(1,2,3,4,5,6,7,8,9)
y <- c(0.25,1.5,3,4,5,6,6.5,7,7.5)
data.range <- 0 #create a new variable which will contain the range of 'x' values for each window
r.sq <- 0
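for (i in 1:length(x)) {
  #correlation over the window x[i] to x[i+5]; windows running past the end give NA
  r.sq[i] <- round(cor(x[i:(i+5)], y[i:(i+5)]), 4)
  data.range[i] <- paste(x[i], x[i+5], sep = " - ")
}
#drop the incomplete (NA) windows and collect the results
output <- data.frame(na.omit(cbind(data.range, r.sq)))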
#Example read out
head(output)
data.range r.sq
1 - 6 0.9963
2 - 7 0.9906
3 - 8 0.9885
4 - 9 0.9839
Here, the output is a data frame containing each range of 'x' values being correlated with the associated 'y' values, along with the cor() value for that range. Right now, I am computing the correlation over a fixed window (x[i] to x[i+5], which is actually 6 points), but in the end I don't want to have to define the window width in advance, as the linear range may span, say, 6 or 8 points. So I want to compute the correlation of 'x' and 'y' over all possible ranges and return a list of the ranges (data.range) with the corresponding cor() values (r.sq), something like this:
data.range r.sq
1 - 4 0.9999
1 - 5 0.9808
1 - 6 0.9805
1 - 7 etc...
1 - 8
1 - 9
2 - 5
2 - 6
2 - 7
2 - 8
etc....
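A minimal sketch of one way this might be done, for illustration only. It assumes a minimum window of 4 points (`min.points`, a name introduced here, chosen because the desired output above starts at "1 - 4"), and it squares the result of cor(), since cor() returns r rather than r squared. The `windows` and `n.points` names are also just illustrative.

```r
# Example data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
y <- c(0.25, 1.5, 3, 4, 5, 6, 6.5, 7, 7.5)

min.points <- 4  # assumed minimum window size; adjust as needed
n <- length(x)

# Build every (start, end) index pair, then keep only windows
# that contain at least min.points points
windows <- expand.grid(start = seq_len(n), end = seq_len(n))
windows <- windows[windows$end - windows$start + 1 >= min.points, ]
windows <- windows[order(windows$start, windows$end), ]

# cor() returns r, so square it to get r squared for each window
r.sq <- mapply(function(i, j) cor(x[i:j], y[i:j])^2,
               windows$start, windows$end)

output <- data.frame(
  data.range = paste(x[windows$start], x[windows$end], sep = " - "),
  n.points   = windows$end - windows$start + 1,
  r.sq       = round(r.sq, 4)
)

# Best-correlated ranges first; ties broken in favour of longer ranges
output <- output[order(-output$r.sq, -output$n.points), ]
head(output)
```

Ranking on r squared alone will tend to favour short windows, so it may be worth setting `min.points` higher or weighing `n.points` alongside `r.sq` when picking the linear range.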
Any advice is welcome!