2
votes

I have postgres (13.2) based API with RLS enabled (I use postgraphile) and it's extremely slow. User sends JWT from Google OAuth. Access to tables are based on roles (there are 2: person, admin) + RLS. I have 2 tables for users auth: person, person_groups

CREATE TABLE IF NOT EXISTS myschema.person_groups (
    id serial PRIMARY KEY,
    person_id citext NOT NULL REFERENCES myschema.person (id),
    google_id text NOT NULL REFERENCES myschema_private.person_account (google_id),
    group_id serial NOT NULL REFERENCES myschema.groups (id),
    updated_at timestamp DEFAULT now(),
    CONSTRAINT unq_person_id_group_id UNIQUE (person_id, group_id)
);

CREATE INDEX persongroups_google_group_idx ON myschema.person_groups (google_id, group_id);

For RLS to check I have function specified as:

CREATE OR REPLACE FUNCTION myschema.is_in_group (group_id int[])
  RETURNS boolean
  AS $$
  SELECT
    CASE WHEN current_setting('role', FALSE) = 'admin' THEN
      TRUE
    WHEN EXISTS (
      SELECT
        1
      FROM
        myschema.person_groups
      WHERE
        person_groups.group_id = ANY ($1) AND person_groups.google_id = current_setting('user.sub', TRUE)) THEN
      TRUE
    ELSE
      FALSE
    END
$$
LANGUAGE SQL
STABLE
STRICT
SECURITY DEFINER;

I have table: "gate_enterlogs", which user wants to access. RLS for this table is:

CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
  FOR SELECT TO person
    USING (myschema.is_in_group (ARRAY[6, 1]));

If I use such code:

BEGIN;
SET local ROLE person;
SET local "user.sub" TO 'yyy';
EXPLAIN ANALYZE VERBOSE
SELECT COUNT(id) FROM myschema.gate_enterlog;
COMMIT;

I end up with:

Aggregate  (cost=23369.00..23369.01 rows=1 width=8) (actual time=2897.487..2897.487 rows=1 loops=1)
  Output: count(id)
  ->  Seq Scan on myschema.gate_enterlog  (cost=0.00..23297.08 rows=28769 width=4) (actual time=2897.484..2897.484 rows=0 loops=1)
        Output: id, person_id, checkpoint_time, direction, place
        Filter: is_in_group('{6,1}'::integer[])
        Rows Removed by Filter: 86308
Planning Time: 0.626 ms
Execution Time: 2897.567 ms

If I disable RLS policy:

CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog FOR SELECT TO person USING (TRUE);

Aggregate  (cost=1935.85..1935.86 rows=1 width=8) (actual time=17.671..17.672 rows=1 loops=1)
  Output: count(id)
  ->  Seq Scan on myschema.gate_enterlog  (cost=0.00..1720.08 rows=86308 width=4) (actual time=0.008..7.364 rows=86308 loops=1)
        Output: id, person_id, checkpoint_time, direction, place
Planning Time: 0.594 ms
Execution Time: 17.737 ms

Do you have any thoughts how can I optimize RLS so postgres would "remember" that user has privileges to access table. My only idea is to end up with USING (TRUE) for select and grant access once before calling query, but before going that way I hope that somebody can give me a hint what I did wrong

1
Hm, it's certainly weird that postgres doesn't realise is_in_group('{6,1}'::integer[]) is a constant expression that evaluates to FALSE, then skips scanning the table…Bergi
I don't know what kind of authentication mechanism you use, but for whole-table permissions (where the policy does not depend on values from the individual roles) normal user/group permissions seem to be more suited than RLS. Can you switch the database role depending on whether the user is part of groups 1 or 6? Then give only that role access (GRANT SELECT …) to the table, and deny it for normal users altogether.Bergi
I'm using this approach for security: graphile.org/postgraphile/security . Basically user fetch Google JWT token, sends it to NodeJS app, app checks JWT validity and if valid sends user_id to postgres which suppose to check is_in_group With this approach admin user can add groups, can add user to groups via api. With role approach as far as I can understand admin can't create 'roles' (as it's not normal table) via api. Anyway, I'm very curious what I'm doing wrong that after first run of is_in_group('{6,1}'::integer[]) value it's not cached against other rows.pszafer
I mean postgraphile doesn't only send the user_id to the database, it also gets the role from the JWT and uses that for the session. I don't know how you configure this with Google, but adding a user to a group would then need to be done via that (Google) API that issues the tokens. I don't see how adding new groups is relevant here, since the membership check is hardcoded anyway in your case. But yeah, this approach gets a bit more complicated if you also need dynamic groupsBergi
I had more dynamic approach of checking if user_sub is in group which admin applied, but as I'm trying to narrow down performance issue I created hard-coded group permissions and I'm out of ideas.pszafer

1 Answers

0
votes

I figured it out somehow. It seems that for some reason boolean function aren't optimised. I changed my auth function to:

CREATE OR REPLACE FUNCTION myschema.auth_group (group_id int[])
  RETURNS SETOF int
  AS $$
BEGIN
  IF current_setting('role', FALSE)  = 'admin' THEN
    RETURN QUERY SELECT 1;
  ELSIF EXISTS (SELECT 1 FROM myschema.person_groups
      WHERE person_groups.google_id = current_setting('user.sub', TRUE) AND person_groups.group_id = ANY ($1)) THEN
    RETURN QUERY SELECT 1;
  END IF;
END;
$$
LANGUAGE plpgsql
STABLE STRICT
SECURITY DEFINER;
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
  FOR SELECT TO person USING (EXISTS (SELECT myschema.auth_group (ARRAY[6, 1])));

With such function planner is efficient:

Aggregate  (cost=1827.97..1827.98 rows=1 width=8) (actual time=6.005..6.006 rows=1 loops=1)
  Output: count(gate_enterlog.id)
  InitPlan 1 (returns $0)
    ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=4) (actual time=0.158..0.159 rows=0 loops=1)
          Output: auth_group(current_setting('role'::text, false), current_setting('user.sub'::text, true), '{6,1}'::integer[])
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Seq Scan on mychema.gate_enterlog  (cost=0.00..1720.08 rows=43154 width=4) (actual time=6.002..6.002 rows=0 loops=1)
        Output: gate_enterlog.id, gate_enterlog.person_id, gate_enterlog.checkpoint_time, gate_enterlog.direction, gate_enterlog.place
        Filter: $0
        Rows Removed by Filter: 86308
Planning Time: 0.500 ms
Execution Time: 6.100 ms

Cost is pretty much the same as USING(TRUE) in RLS.