0
votes

I'm trying to create a new table with a list of each address in another table and it's 10 nearest addreses. Each record has latitude and longitude. Clearly running all against all is much more work than is needed (several million records).

However it occurred to me if I resorted/indexed the table using latitude, longitude it would then sort by distance automatically and I could just step through each record doing ID+/- 25.

However this still leaves me with the question; what would the query to find the distance between two records each having latitude/longitude look like?

1
Why are you creating a new table with a hundred million records, instead of just querying the ten nearest for a given address when needed? Changes to the address table will result in mass recomputation of the derived table.Mike DeSimone

1 Answers

0
votes

If I were you, I'd have indexes on the latitude and longitude. Then I'd do something like SELECT COUNT(*) FROM table WHERE (table.latitude BETWEEN (query_lat - x) AND (query_lat + x)) AND (table.longitude BETWEEN (query_long - y) AND (query_long + y));, refining x and y each time until the count got down around 100 or so records. (Maybe I'd store the final x and y values into the record for the query address so I don't have to do so many queries next time.) Then I'd run the query for all the columns, not just COUNT(*), to get the actual latitude and longitude and maybe use a heap queue to get the 25 shortest distances.

The real difficult thing here is that the relationship between longitude and distance depends on latitude as well; they're not independent. I'm not sure putting a spherical coordinate transformation into the query is a great idea.

Your use-latitude/longitude-for-id trick isn't going to work because it just won't sort right. For example, that sort would tell you Houston is closer to San Antonio than Austin is.