
I have a jsonb document in a table. The document contains an array of cameraIds. I am trying to join this data with the cameras table, a normal table where cameraId is a column, and return unique rows from the table with the jsonb column (which is why I am using GROUP BY in my query).

Any advice on how to optimize this query for performance would be greatly appreciated.

JSONB Col Example:

{
  "date": {
    "end": "2018-11-02T22:00:00.000Z", 
    "start": "2018-11-02T14:30:00.000Z"
  }, 
  "cameraIds": [100, 101], 
  "networkId": 5, 
  "filters": [], 
  "includeUnprocessed": true, 
  "reason": "some reason", 
  "vehicleFilter": { 
    "bodyInfo": "something", 
    "lpInfo": "something"
  }
}
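
For reference, here is a minimal standalone sketch (values hardcoded for illustration) of how I unnest the cameraIds array into joinable rows:

```sql
-- Hypothetical standalone example: each array element becomes one row.
-- The direct jsonb -> int cast requires Postgres 11+.
SELECT elem::int AS camera_id
FROM   jsonb_array_elements(
         '{"cameraIds": [100, 101]}'::jsonb -> 'cameraIds'
       ) AS t(elem);
```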

Query:

select ssr.id,
       a.name                                       as user_name,
       ssr.start_date,
       ssr.end_date,
       ssr.created_at,
       ssr.payload -> 'filters'                     as pretty_filters,
       ssr.payload -> 'reason'                      as reason,
       ssr.payload -> 'includePlates'               as include_plates,
       ssr.payload -> 'vehicleFilter' -> 'bodyInfo' as vbf,
       ssr.payload -> 'vehicleFilter' -> 'lpInfo'   as lpInfo,
       array_agg(n.name) filter (where n.organization_id = ${orgId}) as network_names,
       array_agg(c.name) filter (where n.organization_id = ${orgId}) as camera_names
from ssr
     cross join jsonb_array_elements(ssr.payload -> 'cameraIds') camera_id
     inner join cameras  as c on c.id = camera_id::int
     inner join networks as n on n.id = c.network_id
     inner join accounts as a on ssr.account_id = a.id
where n.organization_id = ${someId}
and   ssr.created_at between ${startDate} and ${endDate}
group by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
order by ssr.created_at desc
offset 0
limit 25;
You're saying the query is working exactly as expected, but performance is bad and you want us to help optimise that? – Bergi
For performance questions, please include the definitions of the involved tables and the indexes on them, as well as the query plan of your statement (EXPLAIN ANALYZE). – Bergi
Your existing query requires reading every ssr.payload column to work. Since ssr has a created_at column, I suggest adding a table ssr_camera with ssr_id, camera_id, and created_at as columns, and maintaining it when you insert into ssr. Join through the new ssr_camera table in your query. – Mike Organek
So do you have your answer? – Erwin Brandstetter

1 Answer


Your query says:

where n.organization_id = ${someId}

But then the aggregate FILTER says:

where n.organization_id = ${orgId}

... which is a contradiction. The aggregated arrays would always be empty, except where ${orgId} happens to be the same as ${someId}, in which case the FILTER clause is useless noise. In other words, the query doesn't seem to make sense as given.

The query might make sense after dropping the aggregate FILTER clauses:

SELECT s.id
     , a.name                                      AS user_name
     , s.start_date
     , s.end_date
     , s.created_at
     , s.payload ->> 'filters'                     AS pretty_filters
     , s.payload ->> 'reason'                      AS reason
     , s.payload ->> 'includePlates'               AS include_plates
     , s.payload -> 'vehicleFilter' ->> 'bodyInfo' AS vbf
     , s.payload -> 'vehicleFilter' ->> 'lpInfo'   AS lpInfo
     , cn.camera_names
     , cn.network_names
FROM   ssr      s
JOIN   accounts a ON a.id = s.account_id  -- assuming referential integrity
CROSS  JOIN LATERAL (
   SELECT array_agg(c.name) AS camera_names   -- sort order?
        , array_agg(n.name) AS network_names  -- same order? distinct?
   FROM   jsonb_array_elements_text(s.payload -> 'cameraIds') i(camera_id)
   JOIN   cameras  c ON c.id = i.camera_id::int
   JOIN   networks n ON n.id = c.network_id
   WHERE  n.organization_id = ${orgId}
   ) cn
WHERE  s.created_at BETWEEN ${startDate} AND ${endDate}  -- ?
ORDER  BY s.created_at DESC NULLS LAST
LIMIT  25;

The key is the LATERAL subquery, which avoids duplicating rows from ssr, so we can also drop the outer GROUP BY. This should be considerably faster.
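
Assuming typical column types, indexes along these lines would support the rewritten query (index names are invented for illustration; adjust to your actual schema, and skip any that duplicate an existing PK):

```sql
-- Hypothetical supporting indexes:
CREATE INDEX ssr_created_at_idx ON ssr (created_at DESC);          -- range filter + ORDER BY ... LIMIT
CREATE INDEX networks_org_id_idx ON networks (organization_id, id); -- LATERAL filter + join
-- cameras (id) is typically covered by the primary key already
```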

Also note ->> instead of -> and jsonb_array_elements_text(). See:

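To illustrate the difference with a standalone snippet (not tied to the tables above): -> returns jsonb, ->> returns text.

```sql
SELECT '{"reason": "some reason"}'::jsonb ->  'reason' AS as_jsonb, -- "some reason" (jsonb, quoted)
       '{"reason": "some reason"}'::jsonb ->> 'reason' AS as_text;  -- some reason  (plain text)
```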
I left some question marks at more dubious spots in the query. Notably, BETWEEN is almost always the wrong tool for timestamps. See:
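
For completeness, the usual half-open range pattern for timestamps would look like this (using the placeholders from the question):

```sql
-- Half-open range: includes the lower bound, excludes the upper,
-- so adjacent ranges neither overlap nor leave gaps.
WHERE s.created_at >= ${startDate}
AND   s.created_at <  ${endDate}
```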