How can I set up an index in Postgres using PostGIS that calculates proximity efficiently?

Question

I have a table of users, each with latitude and longitude. I have something like a "distance tolerance" that each user can define. An easy way to think about this is to frame it as a dating app:

Alice will only date users within 30km

Using PostGIS, it's not too hard to set up a query that finds users nearby using a GIST index of the users location. Such an approach is outlined here: https://ngauthier.com/2013/08/postgis-and-rails-a-simple-approach.html, but here's the basics:

=> CREATE EXTENSION postgis;

=> CREATE TABLE users (
    id uuid DEFAULT public.gen_random_uuid() NOT NULL,
    latitude numeric(9,6),
    longitude numeric(9,6),
    distance int DEFAULT 20000
);

=> INSERT INTO users (latitude, longitude, distance) (
  SELECT
    40.7128 + x.lat AS latitude,
    -74.006 + x.lon AS longitude,
    10000 * (1 + floor(random() * 3)) AS distance
  FROM (
    SELECT random() * 2 as lat, random() * 2 as lon
    FROM generate_series(1,1000000)
  ) AS x);

=> CREATE INDEX index_on_users_location ON users
  USING gist (st_geographyfromtext((((('SRID=4326;POINT('::text
  || longitude) || ' '::text) || latitude) || ')'::text)));

At this point, it's easy enough to query for nearby users based on Alice's preference:

=> SELECT COUNT(*) FROM users WHERE
   ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' ||
      users.longitude || ' ' || users.latitude || ')'),
    ST_GeographyFromText('SRID=4326;POINT(' ||
      '-74.006 40.7128)'),
    30000);

Postgres uses the index and finds the rows. The problem is, if you want to also account for Bob's preferences, you need to use the minimum distance of the two to evaluate whether or not both users are close enough to each other.

Alice will only date users within 30km
Bob will only date users within 60km

My (naive) query to find Alice's possible matches now reads:

=> SELECT COUNT(*) FROM users WHERE
   ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' ||
      users.longitude || ' ' || users.latitude || ')'),
    ST_GeographyFromText('SRID=4326;POINT(' ||
      '-74.006 40.7128)'),
    LEAST(users.distance, 30000));

At which point, Postgres decides the index is no longer useful, and switches over to sequentially scanning the table. This is presumably because I need to compare each row and use whoever has the smaller distance tolerance. Is there a way I can store the distance along side the point? Maybe using a btree_gist? I'm relative sure that I can't "include" the distance along side, from the docs...

...A non-key column cannot be used in an index scan search qualification...

https://www.postgresql.org/docs/11/sql-createindex.html

JGH JGH · Accepted Answer · 2020-09-23T20:00:39

It won't use the index because the distance value is (at least could be) different for every row.

You can query with Alice's distance - and make use of the index, then refine using the other user distances.

something like:

SELECT COUNT(*) 
FROM users 
WHERE ST_DWithin(user.geog, alice.geog, 3000)
 AND  ST_DWithin(user.geog, alice.geog, user.distance)

If it doesn't work, you may have to split the query in two, using a materialized CTE first for Alice's distance, then for the user distance

PS: do yourself a favor and store the geography in its own column...

How can I set up an index in Postgres using PostGIS that calculates proximity efficiently?

2 Answers