
I have a jsonb document in a table. The document contains an array of cameraIds. I am trying to join this data with the cameras table, a normal table where cameraId is a column, and return unique rows from the table with the jsonb column (which is why I am using GROUP BY in my query).

Any advice on how to optimize this query for performance would be greatly appreciated.

JSONB Col Example:

{
  "date": {
    "end": "2018-11-02T22:00:00.000Z", 
    "start": "2018-11-02T14:30:00.000Z"
  }, 
  "cameraIds": [100, 101], 
  "networkId": 5, 
  "filters": [], 
  "includeUnprocessed": true, 
  "reason": "some reason", 
  "vehicleFilter": { 
    "bodyInfo": "something", 
    "lpInfo": "something"
  }
}
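
For reference, here is a minimal standalone sketch (values hardcoded for illustration) of how I unnest the cameraIds array into joinable rows:

```sql
-- Hypothetical standalone example: each array element becomes one row.
-- The direct jsonb -> int cast requires Postgres 11+.
SELECT elem::int AS camera_id
FROM   jsonb_array_elements(
         '{"cameraIds": [100, 101]}'::jsonb -> 'cameraIds'
       ) AS t(elem);
```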

Query:

select ssr.id,
       a.name                                       as user_name,
       ssr.start_date,
       ssr.end_date,
       ssr.created_at,
       ssr.payload -> 'filters'                     as pretty_filters,
       ssr.payload -> 'reason'                      as reason,
       ssr.payload -> 'includePlates'               as include_plates,
       ssr.payload -> 'vehicleFilter' -> 'bodyInfo' as vbf,
       ssr.payload -> 'vehicleFilter' -> 'lpInfo'   as lpInfo,
       array_agg(n.name) filter (where n.organization_id = ${orgId}) as network_names,
       array_agg(c.name) filter (where n.organization_id = ${orgId}) as camera_names
from ssr
     cross join jsonb_array_elements(ssr.payload -> 'cameraIds') camera_id
     inner join cameras  as c on c.id = camera_id::int
     inner join networks as n on n.id = c.network_id
     inner join accounts as a on ssr.account_id = a.id
where n.organization_id = ${someId}
and   ssr.created_at between ${startDate} and ${endDate}
group by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
order by ssr.created_at desc
offset 0
limit 25;
You're saying the query is working exactly as expected, but performance is bad and you want us to help optimise that? – Bergi
For performance questions, please include the definitions of the involved tables and the indexes on them, as well as the query plan of your statement (EXPLAIN ANALYZE). – Bergi
Your existing query requires reading every ssr.payload column to work. Since ssr has a created_at column, I suggest adding a table ssr_camera with ssr_id, camera_id, and created_at as columns, and maintaining it when you insert into ssr. Join through the new ssr_camera table in your query. – Mike Organek
So do you have your answer? – Erwin Brandstetter

1 Answer


Your query says:

where n.organization_id = ${someId}

But then the aggregate FILTER says:

where n.organization_id = ${orgId}

... which is a contradiction. The aggregated arrays would always be empty, except where ${orgId} happens to be the same as ${someId}, in which case the FILTER clause is useless noise. In other words, the query doesn't seem to make sense as given.

The query might make sense after dropping the aggregate FILTER clauses:

SELECT s.id
     , a.name                                      AS user_name
     , s.start_date
     , s.end_date
     , s.created_at
     , s.payload ->> 'filters'                     AS pretty_filters
     , s.payload ->> 'reason'                      AS reason
     , s.payload ->> 'includePlates'               AS include_plates
     , s.payload -> 'vehicleFilter' ->> 'bodyInfo' AS vbf
     , s.payload -> 'vehicleFilter' ->> 'lpInfo'   AS lpInfo
     , cn.camera_names
     , cn.network_names
FROM   ssr      s
JOIN   accounts a ON a.id = s.account_id  -- assuming referential integrity
CROSS  JOIN LATERAL (
   SELECT array_agg(c.name) AS camera_names   -- sort order?
        , array_agg(n.name) AS network_names  -- same order? distinct?
   FROM   jsonb_array_elements_text(s.payload -> 'cameraIds') i(camera_id)
   JOIN   cameras  c ON c.id = i.camera_id::int
   JOIN   networks n ON n.id = c.network_id
   WHERE  n.organization_id = ${orgId}
   ) cn
WHERE  s.created_at BETWEEN ${startDate} AND ${endDate}  -- ?
ORDER  BY s.created_at DESC NULLS LAST
LIMIT  25;

The key is the LATERAL subquery, which avoids duplicating rows from ssr, so we can also drop the outer GROUP BY. This should be considerably faster.
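
Assuming typical column types, indexes along these lines would support the rewritten query (index names are invented for illustration; adjust to your actual schema, and skip any that duplicate an existing PK):

```sql
-- Hypothetical supporting indexes:
CREATE INDEX ssr_created_at_idx ON ssr (created_at DESC);          -- range filter + ORDER BY ... LIMIT
CREATE INDEX networks_org_id_idx ON networks (organization_id, id); -- LATERAL filter + join
-- cameras (id) is typically covered by the primary key already
```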

Also note ->> instead of -> and jsonb_array_elements_text(). See:

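To illustrate the difference with a standalone snippet (not tied to the tables above): -> returns jsonb, ->> returns text.

```sql
SELECT '{"reason": "some reason"}'::jsonb ->  'reason' AS as_jsonb, -- "some reason" (jsonb, quoted)
       '{"reason": "some reason"}'::jsonb ->> 'reason' AS as_text;  -- some reason  (plain text)
```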
I left some question marks at more dubious spots in the query. Notably, BETWEEN is almost always the wrong tool for timestamps. See:
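
For completeness, the usual half-open range pattern for timestamps would look like this (using the placeholders from the question):

```sql
-- Half-open range: includes the lower bound, excludes the upper,
-- so adjacent ranges neither overlap nor leave gaps.
WHERE s.created_at >= ${startDate}
AND   s.created_at <  ${endDate}
```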