0
votes

I have a table of users, each with latitude and longitude. I have something like a "distance tolerance" that each user can define. An easy way to think about this is to frame it as a dating app:

Alice will only date users within 30km

Using PostGIS, it's not too hard to set up a query that finds users nearby using a GIST index of the users location. Such an approach is outlined here: https://ngauthier.com/2013/08/postgis-and-rails-a-simple-approach.html, but here's the basics:

=> CREATE EXTENSION postgis;

=> CREATE TABLE users (
    id uuid DEFAULT public.gen_random_uuid() NOT NULL,
    latitude numeric(9,6),
    longitude numeric(9,6),
    distance int DEFAULT 20000
);

=> INSERT INTO users (latitude, longitude, distance) (
  SELECT
    40.7128 + x.lat AS latitude,
    -74.006 + x.lon AS longitude,
    10000 * (1 + floor(random() * 3)) AS distance
  FROM (
    SELECT random() * 2 as lat, random() * 2 as lon
    FROM generate_series(1,1000000)
  ) AS x);

=> CREATE INDEX index_on_users_location ON users
  USING gist (st_geographyfromtext((((('SRID=4326;POINT('::text
  || longitude) || ' '::text) || latitude) || ')'::text)));

At this point, it's easy enough to query for nearby users based on Alice's preference:

=> SELECT COUNT(*) FROM users WHERE
   ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' ||
      users.longitude || ' ' || users.latitude || ')'),
    ST_GeographyFromText('SRID=4326;POINT(' ||
      '-74.006 40.7128)'),
    30000);

Postgres uses the index and finds the rows. The problem is, if you want to also account for Bob's preferences, you need to use the minimum distance of the two to evaluate whether or not both users are close enough to each other.

Alice will only date users within 30km
Bob will only date users within 60km

My (naive) query to find Alice's possible matches now reads:

=> SELECT COUNT(*) FROM users WHERE
   ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' ||
      users.longitude || ' ' || users.latitude || ')'),
    ST_GeographyFromText('SRID=4326;POINT(' ||
      '-74.006 40.7128)'),
    LEAST(users.distance, 30000));

At which point, Postgres decides the index is no longer useful, and switches over to sequentially scanning the table. This is presumably because I need to compare each row and use whoever has the smaller distance tolerance. Is there a way I can store the distance along side the point? Maybe using a btree_gist? I'm relative sure that I can't "include" the distance along side, from the docs...

...A non-key column cannot be used in an index scan search qualification...

https://www.postgresql.org/docs/11/sql-createindex.html

2

2 Answers

3
votes

It won't use the index because the distance value is (at least could be) different for every row.

You can query with Alice's distance - and make use of the index, then refine using the other user distances.

something like:

SELECT COUNT(*) 
FROM users 
WHERE ST_DWithin(user.geog, alice.geog, 3000)
 AND  ST_DWithin(user.geog, alice.geog, user.distance)

If it doesn't work, you may have to split the query in two, using a materialized CTE first for Alice's distance, then for the user distance

PS: do yourself a favor and store the geography in its own column...

2
votes

Perhaps the best solution is to just apply two conditions anded together.

SELECT COUNT(*) FROM users WHERE
   ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' || users.longitude || ' ' || users.latitude || ')'),
              ST_GeographyFromText('SRID=4326;POINT('|| '-74.006 40.7128)'),
               30000) 
   AND 
   ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' || users.longitude || ' ' || users.latitude || ')'),
           ST_GeographyFromText('SRID=4326;POINT('|| '-74.006 40.7128)'),
           users.distance);

(But really you should probably be using ST_makepoint, rather converting into and out of text all the time).

This will use the index on the condition which has the hard-coded distance to get a list of candidates, and then will iterate over them to individually filter out ones which fail to meet the other condition.

Is there a way I can store the distance along side the point?

You could index a circle which has a radius of a user-defined amount around each user's point location, but it is not clear what you do with that index. It would be useful for some queries, but probably not the one you want to run.

CREATE INDEX index_on_users_datable_circle ON users
  USING gist (st_buffer(ST_GeographyFromText('SRID=4326;POINT(' || users.longitude || ' ' || users.latitude || ')'),users.distance));