
I have a dataset containing GPS points (lon, lat) and POI locations (lon, lat). I want Python code that, for every GPS point, finds the nearest POI locations within 500, 1000, and 2000 meters.

I tried the following:

kNN in Python, but it is time-consuming (the points have to be converted to UTM first)

geopy.distance, but I could not work out how to apply it between a set of GPS points and a set of POI locations

I found the solution below in SQL, but I want one in Python. My problem is how to get the nearest distance between a set of GPS points and a collection of POIs.

SELECT z.zip,
        z.primary_city,
        z.latitude, z.longitude,
        p.distance_unit
                 * DEGREES(ACOS(COS(RADIANS(p.latpoint))
                 * COS(RADIANS(z.latitude))
                 * COS(RADIANS(p.longpoint) - RADIANS(z.longitude))
                 + SIN(RADIANS(p.latpoint))
                 * SIN(RADIANS(z.latitude)))) AS distance_in_km
  FROM zip AS z
  JOIN (   /* these are the query parameters */
        SELECT  42.81  AS latpoint,  -70.81 AS longpoint,
                50.0 AS radius,      111.045 AS distance_unit
    ) AS p ON 1=1
  WHERE z.latitude
     BETWEEN p.latpoint  - (p.radius / p.distance_unit)
         AND p.latpoint  + (p.radius / p.distance_unit)
    AND z.longitude
     BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
         AND p.longpoint + (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
  ORDER BY distance_in_km
  LIMIT 15

1 Answer


To speed things up, it is a good idea to prepare your data first. Note that there are several ways to tackle this. I'm assuming you want a simple solution without third-party tooling (although PostGIS and geospatial indexes can be of great help here).

  1. If the number of points is relatively small, you could precompute all pairwise distances and store them in a table, with the pair of points as the primary key and the distance as the value. Lookups are fast, but the table needs one row per GPS/POI pair, so it takes a lot of space, and it is impractical if your dataset changes often.

  2. Another approach is to group your points into tiles. If you store the tile along with each point, you only need to consider the tiles near the query point, so you have far fewer distances to calculate.

  3. You can speed up the calculations themselves. Your query converts the coordinates to radians every time it evaluates a row. If you store them as radians, you don't need to repeat that conversion again and again.
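
Since you asked for Python, here is a rough sketch of points 2 and 3 combined, using only the standard library. The function names (`haversine_m`, `build_tiles`, `pois_within`), the tile size and the sample coordinates are all made up for illustration. It uses the haversine formula rather than the ACOS formula from your query, a mean Earth radius of 6,371 km, and the same longitude-window approximation as the SQL, so it assumes small radii away from the poles and does not handle the antimeridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption: spherical Earth)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; all four inputs are in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_tiles(pois, tile_deg):
    """Group POIs given as (lat, lon) in degrees into square tiles.

    The radians are computed once here and stored (point 3).
    """
    tiles = defaultdict(list)
    for idx, (lat, lon) in enumerate(pois):
        key = (math.floor(lat / tile_deg), math.floor(lon / tile_deg))
        tiles[key].append((idx, math.radians(lat), math.radians(lon)))
    return tiles


def pois_within(lat, lon, tiles, tile_deg, radius_m):
    """Return (distance_m, poi_index) pairs within radius_m, nearest first."""
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    # Size of the search window in degrees, then in whole tiles (point 2).
    deg_lat = math.degrees(radius_m / EARTH_RADIUS_M)
    deg_lon = deg_lat / max(math.cos(lat_r), 1e-12)
    reach_lat = math.ceil(deg_lat / tile_deg)
    reach_lon = math.ceil(deg_lon / tile_deg)
    ty, tx = math.floor(lat / tile_deg), math.floor(lon / tile_deg)

    hits = []
    for dy in range(-reach_lat, reach_lat + 1):
        for dx in range(-reach_lon, reach_lon + 1):
            for idx, plat, plon in tiles.get((ty + dy, tx + dx), ()):
                d = haversine_m(lat_r, lon_r, plat, plon)
                if d <= radius_m:
                    hits.append((d, idx))
    hits.sort()
    return hits


# Made-up sample data: POIs and one GPS point as (lat, lon) in degrees.
pois = [(42.81, -70.81), (42.814, -70.81), (42.82, -70.81), (42.90, -70.81)]
tiles = build_tiles(pois, tile_deg=0.02)
for radius in (500, 1000, 2000):
    print(radius, pois_within(42.81, -70.81, tiles, 0.02, radius))
```

Build the tiles once, then call `pois_within` for each GPS point and each radius. A tile size close to your largest radius (0.02 degrees of latitude is roughly 2.2 km) keeps the number of tiles visited per lookup small.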