I have a 100 x 100 correlation matrix with zip codes as the row and column names. I also have a data frame that contains the latitude and longitude of every zip code, and a function that calculates the distance between two points from their latitude and longitude.
Here is a snippet of the correlation matrix:
08846 48186 90621 92602 92701 92702 92703 92705 92706 92712
08846 1.00000000 -0.18704668 0.17631080 -0.0195590 -0.08640209 -0.09109788 -0.04251868 -0.1586506 -0.0778115 -0.0572327
48186 -0.18704668 1.00000000 -0.09365048 0.1616530 0.20468051 0.17682056 0.18009911 0.1417840 0.1958971 0.1938676
90621 0.17631080 -0.09365048 1.00000000 0.5880756 0.75200501 0.74694849 0.76071605 0.6593806 0.7640519 0.7657806
92602 -0.01955900 0.16165299 0.58807565 1.0000000 0.88187818 0.88947447 0.89310793 0.9615530 0.8926566 0.8926482
92701 -0.08640209 0.20468051 0.75200501 0.8818782 1.00000000 0.99314798 0.98011569 0.9294281 0.9827633 0.9886139
92702 -0.09109788 0.17682056 0.74694849 0.8894745 0.99314798 1.00000000 0.98791442 0.9470895 0.9853157 0.9933086
92703 -0.04251868 0.18009911 0.76071605 0.8931079 0.98011569 0.98791442 1.00000000 0.9321385 0.9938496 0.9981231
92705 -0.15865058 0.14178399 0.65938061 0.9615530 0.92942815 0.94708954 0.93213849 1.0000000 0.9268797 0.9357917
92706 -0.07781150 0.19589706 0.76405191 0.8926566 0.98276329 0.98531570 0.99384961 0.9268797 1.0000000 0.9948550
92712 -0.05723270 0.19386757 0.76578065 0.8926482 0.98861389 0.99330864 0.99812312 0.9357917 0.9948550 1.0000000
Here is a snippet of the zip code table:
zip city state latitude longitude
1 00210 Portsmouth NH 43.0059 -71.0132
2 00211 Portsmouth NH 43.0059 -71.0132
3 00212 Portsmouth NH 43.0059 -71.0132
4 00213 Portsmouth NH 43.0059 -71.0132
5 00214 Portsmouth NH 43.0059 -71.0132
6 00215 Portsmouth NH 43.0059 -71.0132
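Before any distances can be computed, each row/column name of the correlation matrix has to be matched to its coordinates. Below is a minimal sketch using `match()`, assuming the `zip` column is stored as character so that codes like "08846" keep their leading zero (read as a number, it becomes 8846 and will not match). The names `corr_mat`, `zips` and `coords_for_matrix` are hypothetical, and apart from the Portsmouth row taken from the snippet above, the toy zip codes and coordinates are made up for illustration.

```r
# Return latitude/longitude for each zip code in the correlation matrix,
# in the same order as rownames(corr_mat). Hypothetical helper.
coords_for_matrix <- function(corr_mat, zips) {
  idx <- match(rownames(corr_mat), zips$zip)  # zip must be character, e.g. "08846"
  if (anyNA(idx)) {
    stop("zip codes missing from lookup table: ",
         paste(rownames(corr_mat)[is.na(idx)], collapse = ", "))
  }
  data.frame(zip       = as.character(zips$zip[idx]),
             latitude  = zips$latitude[idx],
             longitude = zips$longitude[idx],
             stringsAsFactors = FALSE)
}

# Toy data: "11111" and "22222" and their coordinates are made up.
zips <- data.frame(zip       = c("00210", "11111", "22222"),
                   latitude  = c(43.0059, 40.0, 34.0),
                   longitude = c(-71.0132, -74.0, -118.0),
                   stringsAsFactors = FALSE)
corr_mat <- diag(2)
dimnames(corr_mat) <- list(c("22222", "00210"), c("22222", "00210"))
coords <- coords_for_matrix(corr_mat, zips)
```

Because `match()` returns positions in the order of `rownames(corr_mat)`, the coordinates line up with the matrix regardless of how the zip table is sorted.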
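And here is the function that calculates the distance between two latitude/longitude points:
Calc_Dist <- function (long1, lat1, long2, lat2)
{
  # Haversine great-circle distance; inputs in decimal degrees
  rad <- pi/180
  a1 <- lat1 * rad    # latitude of point 1, in radians
  a2 <- long1 * rad   # longitude of point 1, in radians
  b1 <- lat2 * rad    # latitude of point 2, in radians
  b2 <- long2 * rad   # longitude of point 2, in radians
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145       # Earth's radius in km (close to the equatorial radius)
  d <- R * c          # distance in km; divide by 1.609344 for miles
  return(d)
}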
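Every operation in `Calc_Dist` (`sin`, `cos`, `sqrt`, `atan2`) is vectorised, so there is no need to call it once per pair in a loop. A minimal sketch of the same haversine formula and radius, rewritten with `outer()` so that one call returns the full n x n distance matrix; the name `pairwise_dist_km` is hypothetical.

```r
# Pairwise haversine distances (km) between n points given in decimal degrees.
# Same formula and Earth radius as Calc_Dist, vectorised with outer().
pairwise_dist_km <- function(lat, lon, R = 6378.145) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  dlat <- outer(lat, lat, "-")   # n x n matrix of latitude differences
  dlon <- outer(lon, lon, "-")   # n x n matrix of longitude differences
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  a[a > 1] <- 1                  # guard against rounding slightly above 1
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Two points a quarter of the way around the equator
d_km    <- pairwise_dist_km(lat = c(0, 0), lon = c(0, 90))
d_miles <- d_km / 1.609344       # 1 mile = 1.609344 km exactly
```

For a 100 x 100 matrix this is instantaneous. For n = 10000 each intermediate matrix holds 10^8 doubles (about 800 MB), so the full-matrix version needs several gigabytes of temporary memory.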
My goal is to subset the correlation matrix so that it only includes pairs of zip codes that are more than 500 miles apart (the distance function currently returns kilometers, but that is easy to change and immaterial for now). The less computationally expensive the approach, the better, since I may have to do this with much larger correlation matrices (~10000 x 10000). Any suggestions?
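Since distance is a property of a pair of zip codes, one way to read "subset" is to list the pairs that are more than 500 miles apart together with their correlation. The sketch below does that while bounding memory: it processes the rows in blocks, so each temporary matrix is `block` x n rather than n x n, and it keeps each unordered pair only once (upper triangle). It assumes `lat` and `lon` are already in the same order as `rownames(corr_mat)`; the function name `far_pairs` and the `block` size of 1000 are hypothetical choices, not tuned values.

```r
# List zip-code pairs more than min_miles apart, with correlation and distance.
# Rows are processed in blocks so temporaries are block x n, not n x n.
# Uses the same haversine formula and radius (6378.145 km) as Calc_Dist.
far_pairs <- function(corr_mat, lat, lon, min_miles = 500, block = 1000) {
  km_per_mile <- 1.609344
  min_km <- min_miles * km_per_mile
  lat_r <- lat * pi / 180
  lon_r <- lon * pi / 180
  n <- nrow(corr_mat)
  zip <- rownames(corr_mat)
  out <- list()
  for (start in seq(1, n, by = block)) {
    rows <- start:min(start + block - 1, n)
    dlat <- outer(lat_r[rows], lat_r, "-")
    dlon <- outer(lon_r[rows], lon_r, "-")
    a <- sin(dlat / 2)^2 + outer(cos(lat_r[rows]), cos(lat_r)) * sin(dlon / 2)^2
    a[a > 1] <- 1                    # guard against rounding slightly above 1
    d_km <- 2 * 6378.145 * atan2(sqrt(a), sqrt(1 - a))
    # Far enough apart, and column index > row index (each pair once)
    keep <- d_km > min_km & outer(rows, seq_len(n), "<")
    hit <- which(keep, arr.ind = TRUE)
    if (nrow(hit) > 0) {
      i <- rows[hit[, 1]]              # global row index
      j <- hit[, 2]                    # column index
      out[[length(out) + 1]] <- data.frame(
        zip1 = zip[i], zip2 = zip[j],
        correlation = corr_mat[cbind(i, j)],
        miles = d_km[hit] / km_per_mile,
        stringsAsFactors = FALSE)
    }
  }
  if (length(out) == 0) return(NULL)   # no pair is far enough apart
  do.call(rbind, out)
}

# Toy example with hypothetical zip codes and coordinates
z  <- c("00001", "00002", "00003")
cm <- matrix(c(1, .9, .2, .9, 1, .3, .2, .3, 1), 3, dimnames = list(z, z))
res <- far_pairs(cm, lat = c(0, 0, 0), lon = c(0, 0, 90))
```

Returning a long table of pairs avoids materialising a second 10000 x 10000 matrix. Peak memory is governed by `block`: smaller blocks use less memory at the cost of more loop iterations.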
Thanks in advance, Ben