3
votes

I have a Django/PostgreSQL application that shows which users are nearest a particular user. It uses the PostGIS 2.0 KNN (K Nearest Neighbors) <-> operator in the ORDER BY clause to list the users, nearest first. What I've found with my initial dataset is that two of the search results are out of order (all distances are measured from Los Angeles, CA):

Member, City, State, Distance (miles)

user1, North Las Vegas, NV, 239
user2, Phoenix, AZ, 365
user3, Provo, UT, 568
user4, Twin Falls, ID, 630
user5, Albuquerque, NM, 673
user6, Portland, OR, 828
user7, Bozeman, MT, 896
user8, Seattle, WA, 962
user9, Boulder, CO, 834       <- Out of order!
user10, Laramie, WY, 862      <- Out of order!
user11, Naperville, IL, 1756

The Member name is just the username column from Django's contrib.auth.models User class. The UserAccount class which contains the geometry information is defined as follows:

class UserAccount(models.Model):
    user = models.OneToOneField(User, primary_key=True, unique=True)
    address_line_1 = models.CharField(max_length=30)
    address_line_2 = models.CharField(max_length=30, blank=True)
    city = models.CharField(max_length=30)
    region = models.CharField(max_length=30, blank=True)
    postal_code = models.CharField(max_length=10, blank=True)
    country = models.ForeignKey('Country')
    measurement_sys = models.CharField(max_length=5)  # US or Metric

    # User's home (default) and current longitude and latitude
    home_lon = models.FloatField(default=0.0)
    home_lat = models.FloatField(default=0.0)
    current_lon = models.FloatField(default=0.0)
    current_lat = models.FloatField(default=0.0)

    # GeoDjango-specific fields 
    home_point = models.PointField(srid=4326)
    current_point = models.PointField(srid=4326)
    objects = models.GeoManager()

Here's the query in my Django view:

def members(request, template):
    """View all members of the website."""
    uid = request.session['uid']   # PK from User table

    # Get the current user's lon/lat and measurement system
    try:
        ua = UserAccount.objects.get(user_id=uid)
        lon = ua.current_lon
        lat = ua.current_lat
        measurement_sys = ua.measurement_sys
    except UserAccount.DoesNotExist as e:
        return HttpResponseRedirect(reverse('unable-to-display-members'))

    # Define the proximity query.
    if measurement_sys == 'US':
        multiplier = 0.000621371  # Convert to miles
    else:
        multiplier = 0.001  # Convert to kilometers

    query = "SELECT \
                ua.user_id, \
                au.username, \
                ua.city, \
                ua.region, \
                ST_Distance( \
                    ua.current_point::geography, \
                    ST_GeographyFromText( \
                        'SRID=4326;POINT(" \
                            + str(lon) \
                            + " " \
                            + str(lat) + \
                        ")' \
                    ) \
                )*" + str(multiplier) + " AS distance \
            FROM \
                user_account ua \
                INNER JOIN \
                auth_user au \
                ON (ua.user_id = au.id) \
            WHERE ua.user_id != %s \
            ORDER BY \
                ua.current_point::geometry \
                <-> \
                'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry \
            LIMIT 250;"

    # Run the proximity query
    raw_queryset = UserAccount.objects.raw(query, [uid])

    # Paginate results
    user_list = [user for user in raw_queryset]
    list_size = len(list(user_list))
    paginator = Paginator(user_list, 10, 4)
    paginator._count = list_size

    page = request.GET.get('page')
    try:
        users = paginator.page(page)
    except PageNotAnInteger:
        users = paginator.page(1)
    except EmptyPage:
        users = paginator.page(paginator.num_pages)
    return render(request, template, {'users': users})

Am I doing anything wrong in my query? Can the KNN operator sometimes "hiccup" and return some results out of order? I ask this because when I tried taking the two out-of-order records out of my table but then adding additional records for users with addresses in that are even farther away (i.e. in IL, LA, MI, NC, PA, NY and ME), all the results were in the correct order.

By the way, my inputs are located here.

Thanks!

1
That is very odd. Could you maybe post some data, so we can test?John Powell
Sure. Here's the test data: pastebin.com/sGyiipWsJim

1 Answers

3
votes

UPDATED ANSWER:

Postgis has two approximate solutions for kNN neighbors functionality, since September 2011:

  • Using the <-> operator, you get the nearest neighbour using the centers of the bounding boxes to calculate the inter-object distances.
  • Using the <#> operator, you get the nearest neighbour using the bounding boxes themselves to calculate the inter-object distances.

Your problem is, that both are approximate, so they are not perfect. So, if you want the best 250 results, you can use any of them to retrieve for example the best 1000 results and then order the same results by ST_DISTANCE and LIMIT 250 to get the best 250 results out of those approximate 1000.

Example:

SELECT * FROM 
    (SELECT *,ST_DISTANCE(current_point::geography, 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geography ) AS st_dist
    FROM ua
    ORDER BY current_point::geometry <-> 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry 
    LIMIT 1000) AS s
    ORDER BY st_dist LIMIT 250;