0
votes

I have a postgres table with a jsonb column that gets inserted/updated as soon as a status is reached. I would like to query the latest status with it's date.

Given I have the following jsonb status column.

{
 "pending": "2018-01-12T12:34:41.785945+00:00",
 "started":"2018-01-10T15:52:41.785945+00:00",
 "processed":"2018-01-18T12:52:41.785945+00:00"
}

What would be the query to get?

"processed":"2018-01-18T12:52:41.785945+00:00"...

Basically select the property with the max date, Is this possible at all? And if so how would look the query?

EDIT: the Status JSON can have different properties at different times. So it not always the 3 'pending','started','processed'. There could only be pending or in future others.... The question is the latest status with it's date.

1

1 Answers

0
votes

Let's create the table with couple of sample data.

DROP TABLE IF EXISTS test;

CREATE TABLE test(
id integer,  
status json  
);

INSERT INTO test VALUES (1, '{
 "pending": "2018-01-12T12:34:41.785945+00:00",
 "started":"2018-01-10T15:52:41.785945+00:00",
 "processed":"2018-01-18T12:52:41.785945+00:00"
}');
INSERT INTO test VALUES (2, '{
 "pending": "2018-01-12T12:31:41.785945+00:00",
 "started":"2018-01-20T15:55:41.785945+00:00",
 "processed":"2018-01-10T12:20:41.785945+00:00"
}');

I managed to get the max date property by extracting values from json to a separate table, and applying typical max function and case statement to get corresponding property of interest. This probably defeats the purpose of using json type (maybe try json_type(json) instead?)

--- unpack values and put them into different columns
WITH t1 AS(
SELECT id, status->>'pending' AS pending, 
      status->>'started' AS started, 
      status->>'processed' AS processed       
FROM test),
--- find maximum time corresponding to each_id
t2 AS (
SELECT id, GREATEST(t1.pending, t1.processed, t1.started) AS latest
FROM t1
)
--- join to get matching column that gave latest time
SELECT t1.id, CASE WHEN t1.pending = t2.latest THEN 'pending'
                  WHEN t1.started = t2.latest THEN 'started'
                  WHEN t1.processed = t2.latest THEN 'processed' 
                  END AS max_status,
              t2.latest AS max_time
FROM t1
JOIN t2
ON t1.id = t2.id
;

You can carry along other columns of interest for each id. This results

 id | max_status |             max_time             
----+------------+----------------------------------
  1 | processed  | 2018-01-18T12:52:41.785945+00:00
  2 | started    | 2018-01-20T15:55:41.785945+00:00
(2 rows)

EDIT:: After your edit, I made some changes to take care of arbitrary presence/absence of particular keys in your json field. Let's use json_each(json) to unpack key, value for each id to different columns

SELECT id, (json_each(status)).* 
FROM test
WHERE id=1;

 id |    key    |               value                
----+-----------+------------------------------------
  1 | pending   | "2018-01-12T12:34:41.785945+00:00"
  1 | started   | "2018-01-10T15:52:41.785945+00:00"
  1 | processed | "2018-01-18T12:52:41.785945+00:00"
(3 rows)

Not quite what we want but almost there. First thing to notice is the column names key, value which we can use to transform data. Second, values are json objects and therefore we have to take care of double quotations after casting to text. trim handles this nicely. Rest is to partition your data by id, get maximum time, and filter rows to get corresponding status.

WITH t1 AS(
    SELECT id, key as status, trim(both '"' from value::text) as time_of 
    FROM test, json_each(status)
),
t2 as(
SELECT id, status,
  to_timestamp(time_of, 'YYYY-MM-DD"T"HH24:MI:SS') as time_of,
  MAX(to_timestamp(time_of, 'YYYY-MM-DD"T"HH24:MI:SS')) 
        OVER(PARTITION by id) AS max_time       
FROM t1)
SELECT id, status, max_time 
FROM t2
WHERE time_of = max_time;