I have a table of users, each with latitude and longitude. I have something like a "distance tolerance" that each user can define. An easy way to think about this is to frame it as a dating app:
Alice will only date users within 30km
Using PostGIS, it's not too hard to set up a query that finds users nearby using a GIST index of the users location. Such an approach is outlined here: https://ngauthier.com/2013/08/postgis-and-rails-a-simple-approach.html, but here's the basics:
=> CREATE EXTENSION postgis;
=> CREATE TABLE users (
id uuid DEFAULT public.gen_random_uuid() NOT NULL,
latitude numeric(9,6),
longitude numeric(9,6),
distance int DEFAULT 20000
);
=> INSERT INTO users (latitude, longitude, distance) (
SELECT
40.7128 + x.lat AS latitude,
-74.006 + x.lon AS longitude,
10000 * (1 + floor(random() * 3)) AS distance
FROM (
SELECT random() * 2 as lat, random() * 2 as lon
FROM generate_series(1,1000000)
) AS x);
=> CREATE INDEX index_on_users_location ON users
USING gist (st_geographyfromtext((((('SRID=4326;POINT('::text
|| longitude) || ' '::text) || latitude) || ')'::text)));
At this point, it's easy enough to query for nearby users based on Alice's preference:
=> SELECT COUNT(*) FROM users WHERE
ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' ||
users.longitude || ' ' || users.latitude || ')'),
ST_GeographyFromText('SRID=4326;POINT(' ||
'-74.006 40.7128)'),
30000);
Postgres uses the index and finds the rows. The problem is, if you want to also account for Bob's preferences, you need to use the minimum distance of the two to evaluate whether or not both users are close enough to each other.
Alice will only date users within 30km
Bob will only date users within 60km
My (naive) query to find Alice's possible matches now reads:
=> SELECT COUNT(*) FROM users WHERE
ST_DWithin(ST_GeographyFromText('SRID=4326;POINT(' ||
users.longitude || ' ' || users.latitude || ')'),
ST_GeographyFromText('SRID=4326;POINT(' ||
'-74.006 40.7128)'),
LEAST(users.distance, 30000));
At which point, Postgres decides the index is no longer useful, and switches over to sequentially scanning the table. This is presumably because I need to compare each row and use whoever has the smaller distance tolerance. Is there a way I can store the distance along side the point? Maybe using a btree_gist
? I'm relative sure that I can't "include" the distance along side, from the docs...
...A non-key column cannot be used in an index scan search qualification...