61
votes

I want to be able to get an estimate of the distance between two (latitude, longitude) points. I want to undershoot, as this will be for A* graph search, and I want it to be fast. The points will be at most 800 km apart.

8
Should we infer these points lie on a sphere? - phs
Yes, on Earth, but speed matters. AFAIK complex math is not fast enough. - fread2281
I suggest you measure first before concluding it's not fast enough. - phs
Sometimes it's possible to know enough about an implementation and algorithm to know performance won't be good enough even before benchmarking. For instance, one case where the haversine distance method isn't appropriate is when attempting to match large datasets on proximity, as the haversine algorithm doesn't allow any predicate pushdowns or partition matching in most query engines. We found that leveraging approximate distances with pushdowns to produce a Cartesian clustering base took ~1/50th the time on a 250k-record dataset. The accepted answer would take over a week to run here. - bsplosion

8 Answers

118
votes

The answers to "Haversine Formula in Python (Bearing and Distance between two GPS points)" provide Python implementations that answer your question.

Using the implementation below, I ran 100,000 iterations in less than 1 second on an older laptop, which I think should be sufficient for your purposes. However, you should profile anything before you optimize for performance.

from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371* c
    return km

To underestimate, multiply the result by 0.90 or whatever factor you want: haversine(lon1, lat1, lon2, lat2) * 0.90. I don't see how introducing additional approximation error into your underestimate would be useful.
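A minimal sketch for reproducing the rough timing above and applying the scaling factor, using only the standard library. The name haversine_lower_bound and its default factor of 0.90 are hypothetical illustrations of the suggestion, not part of the original answer, and timings will vary by machine.

```python
import timeit
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

def haversine_lower_bound(lon1, lat1, lon2, lat2, factor=0.90):
    """Hypothetical A* heuristic: the spherical distance scaled down by `factor`."""
    return haversine(lon1, lat1, lon2, lat2) * factor

# Munich -> Berlin, in the (lon, lat) argument order used by haversine()
args = (11.5756, 48.1372, 13.4083, 52.5186)
print(round(haversine(*args), 1), round(haversine_lower_bound(*args), 1))

# Time 100,000 calls, mirroring the answer's rough measurement
seconds = timeit.timeit(lambda: haversine(*args), number=100_000)
print(f"100,000 calls took {seconds:.3f} s")
```

A factor somewhat below 1 is also a reasonable safety margin because the spherical model with R = 6371 km can itself differ slightly from true distances on the Earth's ellipsoid.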

42
votes

Since the distance is relatively small, you can use the equirectangular distance approximation, which is faster than the Haversine formula. To get the distance from your reference point (lat1/lon1) to the point you are testing (lat2/lon2), use the formula below. Important: you need to convert all lat/lon values to radians first:

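from math import cos, sqrt
def equirectangular_distance(lat1, lon1, lat2, lon2):
    # all inputs must already be in radians
    R = 6371  # radius of the earth in km
    x = (lon2 - lon1) * cos(0.5 * (lat2 + lat1))
    y = lat2 - lat1
    d = R * sqrt(x*x + y*y)
    return d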

Since 'R' is in km, the distance 'd' will be in km.

Reference: http://www.movable-type.co.uk/scripts/latlong.html
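A small sketch comparing the approximation with the haversine formula for one pair of points, showing the degrees-to-radians conversion in practice. The function names and the Munich/Berlin sample coordinates are illustrative assumptions. Note that this approximation is not guaranteed to undershoot: for two points on the same latitude it measures along the parallel, which is longer than the great-circle route, so check it against your data before relying on it as an A* heuristic.

```python
from math import radians, cos, sin, asin, sqrt

R = 6371  # mean radius of the earth in km

def equirectangular_km(lat1, lon1, lat2, lon2):
    # convert decimal degrees to radians first, as the answer requires
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    x = (lon2 - lon1) * cos(0.5 * (lat2 + lat1))
    y = lat2 - lat1
    return R * sqrt(x * x + y * y)

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance for comparison
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * R * asin(sqrt(a))

munich = (48.1372, 11.5756)
berlin = (52.5186, 13.4083)
approx = equirectangular_km(*munich, *berlin)
exact = haversine_km(*munich, *berlin)
print(f"equirectangular: {approx:.1f} km, haversine: {exact:.1f} km")
```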

8
votes

One idea for speed is to transform the lat/long coordinates into 3D (x, y, z) coordinates. After preprocessing the points, use the Euclidean distance between them as a quickly computed undershoot of the actual distance: the straight chord through the Earth is never longer than the great-circle arc along its surface.
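A sketch of that idea, assuming a spherical Earth of radius 6371 km; the helper names are hypothetical. Each point is converted once to a 3D vector, and the chord between two vectors equals 2 * R * sin(theta / 2) for central angle theta, which never exceeds the arc length R * theta, so it is an admissible undershoot on the sphere.

```python
from math import radians, cos, sin, asin, sqrt

R = 6371.0  # assumed mean Earth radius in km

def to_xyz(lat, lon):
    """Convert decimal-degree latitude/longitude to a 3D point on a sphere of radius R."""
    lat, lon = radians(lat), radians(lon)
    return (R * cos(lat) * cos(lon), R * cos(lat) * sin(lon), R * sin(lat))

def chord_km(p, q):
    """Euclidean (straight-line) distance between two precomputed 3D points."""
    return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on the same sphere, for comparison."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * R * asin(sqrt(a))

# Preprocess once; each heuristic call is then a few subtractions, squares and one sqrt
munich, berlin = (48.1372, 11.5756), (52.5186, 13.4083)
p, q = to_xyz(*munich), to_xyz(*berlin)
print(chord_km(p, q), haversine_km(*munich, *berlin))
```

For points at most 800 km apart, the chord is shorter than the arc by less than 0.1% on this sphere, so the undershoot stays tight.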

4
votes

If the distance between points is relatively small (meters to a few km), one fast approach could be:

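from math import cos, sqrt
def quick_distance(Lat1, Long1, Lat2, Long2):
    # inputs in decimal degrees, result in km
    x = Lat2 - Lat1
    # scale the longitude difference by cos(mean latitude)
    y = (Long2 - Long1) * cos((Lat2 + Lat1)*0.00872664626)
    # 111.319 km per degree at the Equator
    return 111.319 * sqrt(x*x + y*y)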

Lat, Long are in degrees; the distance is in km.

Deviation from the Haversine distance is on the order of 1%, while the speed gain is more than ~10x.

0.00872664626 = 0.5 * pi/180, which turns the sum of the two latitudes in degrees into their mean in radians.

111.319 is the distance in km that corresponds to 1 degree at the Equator. You could replace it with your median value, as described here https://www.cartographyunchained.com/cgsta1/, or with a simple lookup table.
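To check the deviation and speed claims on your own inputs, a sketch like the following compares a copy of the answer's function (spelled quick_distance here, taking degrees and returning km) against a standard haversine and times both with timeit. The sample points are arbitrary assumptions, and the measured speed ratio will depend on your Python version and machine.

```python
import timeit
from math import radians, cos, sin, asin, sqrt

def quick_distance(Lat1, Long1, Lat2, Long2):
    # inputs in decimal degrees, result in km
    x = Lat2 - Lat1
    y = (Long2 - Long1) * cos((Lat2 + Lat1) * 0.00872664626)
    return 111.319 * sqrt(x * x + y * y)

def haversine_km(lat1, lon1, lat2, lon2):
    # reference great-circle distance on a 6371 km sphere
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

pts = (48.1372, 11.5756, 52.5186, 13.4083)  # sample pair (Munich, Berlin)
approx, exact = quick_distance(*pts), haversine_km(*pts)
print(f"relative deviation: {abs(approx - exact) / exact:.3%}")

t_quick = timeit.timeit(lambda: quick_distance(*pts), number=100_000)
t_hav = timeit.timeit(lambda: haversine_km(*pts), number=100_000)
print(f"speed-up: {t_hav / t_quick:.1f}x")
```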

3
votes

For maximal speed, you could create something like a rainbow table for coordinate distances. It sounds like you already know the area that you are working with, so it seems like pre-computing them might be feasible. Then, you could load the nearest combination and just use that.

For example, in the continental United States, longitude spans about 55 degrees and latitude about 20, which gives 1100 whole-number points. Finding the distance between every pair of them is a handshake problem, answered by n(n-1)/2, or about 600k combinations. That seems quite feasible to store and retrieve. If you provide more information about your requirements, I could be more specific.
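A minimal sketch of the precomputed-table idea, assuming a hypothetical bounding box on a whole-degree grid and the haversine formula for the stored values. Queries are snapped to the nearest grid point with round(), so the returned value is only an approximation and is not guaranteed to undershoot; for an admissible A* heuristic you would need to subtract a margin covering the snapping error of both points.

```python
from itertools import combinations
from math import radians, cos, sin, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def build_table(lats, lons):
    """Precompute distances for every unordered pair of whole-degree grid points."""
    points = [(lat, lon) for lat in lats for lon in lons]
    # combinations() yields n*(n-1)/2 pairs: the handshake count from the answer
    return {(p, q): haversine_km(*p, *q) for p, q in combinations(points, 2)}

def lookup(table, lat1, lon1, lat2, lon2):
    """Snap both points to the nearest grid point and read the stored distance."""
    p = (round(lat1), round(lon1))
    q = (round(lat2), round(lon2))
    if p == q:
        return 0.0
    # each pair is stored once, in the order combinations() produced it
    return table[(p, q)] if (p, q) in table else table[(q, p)]

# Hypothetical box roughly covering Munich and Berlin: 7 x 5 = 35 grid points
table = build_table(range(47, 54), range(10, 15))
print(len(table), lookup(table, 48.1372, 11.5756, 52.5186, 13.4083))
```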

1
votes

You can use cdist from the scipy.spatial.distance module with a custom metric:

For example:

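from scipy.spatial.distance import cdist
# df1, df2: DataFrames with 'lat'/'lon' columns in degrees; haversine is the top answer's
df1_latlon = df1[['lat', 'lon']]
df2_latlon = df2[['lat', 'lon']]
# cdist passes single rows [lat, lon] as u, v; haversine expects (lon1, lat1, lon2, lat2)
distanceCalc = cdist(df1_latlon, df2_latlon,
                     metric=lambda u, v: haversine(u[1], u[0], v[1], v[0]))
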
0
votes

To calculate the haversine distance between two points, you can use the mpu library's mpu.haversine_distance() function, like this:

>>> import mpu
>>> munich = (48.1372, 11.5756)
>>> berlin = (52.5186, 13.4083)
>>> round(mpu.haversine_distance(munich, berlin), 1)
504.2

-1
votes

Please use the following code, which takes coordinates in decimal degrees:

import math
from math import sin, cos, sqrt, atan2

def distance(lat1, lng1, lat2, lng2):
    # returns the distance in meters; for km, remove the "* 1000"
    radius = 6371 * 1000

    dLat = (lat2-lat1) * math.pi / 180
    dLng = (lng2-lng1) * math.pi / 180

    lat1 = lat1 * math.pi / 180
    lat2 = lat2 * math.pi / 180

    # haversine formula, using atan2 for the central angle
    val = sin(dLat/2) * sin(dLat/2) + sin(dLng/2) * sin(dLng/2) * cos(lat1) * cos(lat2)
    ang = 2 * atan2(sqrt(val), sqrt(1-val))
    return radius * ang