1
votes

G'day All,

I have two sets of point data, one with 24 locations and the other with ~16,000. I want to calculate the distance from each of the 24 points to the 16,000 points. This is easy to do with pointDistance() in R raster package, but I can't tell which pairs are being compared. I would like to create a data.frame which has the location names for each comparison so I can combine it with the distances.

a <- data.frame('lon' = c(1,5,55,31), 'lat' = c(3,7,20,22), 'loc' = c('a', 'b', 'c', 'd'))
b <- data.frame('lon' = c(4,2,8,65), 'lat' = c(50,-90,20,32), 'loc' = c('e', 'f', 'g', 'h'))   
dist <- function(x, y){
 for( i in 1:length(a$lon)){
    my_vector <- vector(mode = "numeric", length = 0)
    d <- pointDistance(cbind(x[i,'lon'], x[i,'lat']), cbind(y$lon, y$lat), lonlat=TRUE)
    my_vector <- c(my_vector, d)
    }
    my_vector
 }

I am having a very slow morning and can't figure out why my function above is not outputting each combination. It isn't adding each iteration of d to my_vector.

At the same time I would like to include the pairwise combination of locations that produced each distance measure, eg:

  loc1   loc2   dist
     a      e     10
     a      f     16
     a      g     12
     a      h     19
     b      e     15
     b      f     17
     b      g     14
     b      h     13
     c      e     11
   etc    etc    etc

I am sorry to bother you with this, any help would be really appreciated.

Thanks in advance.

Adam

2

2 Answers

0
votes

You are over-writing the result with a zero-length vector with every iteration. Declare your variable outside the loop:

dist <- function(x, y){my_vector <- vector(mode = "numeric", length = 0)
 for( i in 1:length(a$lon)){
    d <- pointDistance(cbind(x[i,'lon'], x[i,'lat']), 
                       cbind(y$lon, y$lat), lonlat=TRUE)
    my_vector <- rbind(my_vector, d)
    }
    my_vector
 }
> dist(a,b)
     [,1]     [,2]    [,3]    [,4]
d 5239684 10352713 2039490 7401111
d 4787647 10797991 1482960 6785859
d 5571477 12245144 4899364 1666956
d 3909092 12467783 2398381 3534050

I think rbind is a better "binder" in this case than is c()

0
votes

plyr Assuming that the values of "loc" are unique...

fdist <- function(p1,p2) { pointDistance(cbind(a$lon[a$loc==p1], a$lat[a$loc==p1]),
                                     cbind(b$lon[b$loc==p2], b$lat[b$loc==p2]), lonlat=TRUE)}
d <- ddply(expand.grid(a=a$loc,b=b$loc),.(a,b),mutate,dist=fdist(a,b))

produces:

   a b      dist
1  a e  47.09565
2  a f  93.00538
3  a g  18.38478
4  a h  70.26379
5  b e  43.01163
6  b f  97.04638
7  b g  13.34166
8  b h  65.00000
9  c e  59.16925
   etc...

The values are different because I used the flat Pythagorian distance for pointDistance.