
tl;dr

Using PostgreSQL 9.4, is there a way to retrieve multiple values from a jsonb field in one call, as you could with the imaginary function:

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

With the hope of speeding up the otherwise almost linear cost of selecting multiple values (1 value = 300ms, 2 values = 450ms, 3 values = 600ms).

Background

I have the following jsonb table:

CREATE TABLE "public"."analysis" (
  "date" date NOT NULL,
  "name" character varying (10) NOT NULL,
  "country" character (3) NOT NULL,
  "x" jsonb,
  PRIMARY KEY(date,name)
);

With roughly 100 000 rows, where each row has a jsonb dictionary with 90+ keys and corresponding values, I'm trying to write an SQL query that selects a few (< 10) keys and values reasonably quickly (< 500 ms).
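For illustration, a row might look like this (key names and values here are hypothetical; the real dictionary has 90+ keys):

```sql
-- Hypothetical sample row; the real dictionary holds 90+ key/value pairs.
INSERT INTO analysis (date, name, country, x)
VALUES ('2014-02-14', 'AB-123', 'SWE',
        '{"a_dictionary_key": 123.4, "a_second_dictionary_key": 56.7}');
```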

Index and querying: 190ms

I started by adding an index:

CREATE INDEX ON analysis USING GIN (x);

This makes querying based on values in the "x" dictionary fast, such as this:

SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

This takes ~190 ms (acceptable for us)
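To verify what the planner actually does with this filter (and whether the GIN index is used at all for the cast comparison), running EXPLAIN ANALYZE on the same query is a useful first check:

```sql
-- Show the chosen plan and real timings for the filter query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date, name, country
FROM analysis
WHERE date > '2014-01-01' AND date < '2014-05-01'
  AND cast(x #>> '{a_dictionary_key}' AS float) > 100;
```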

Retrieving dictionary values

However, once I start adding keys to return in the SELECT part, execution time rises almost linearly:

1 value: 300ms

select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

Takes 366ms (+175ms)

select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

Takes 300ms (+110ms)

3 values: 600ms

select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

Takes 600ms (+410ms, or roughly +100ms for each value selected)

select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

Takes 600ms (+410ms, or roughly +100ms for each value selected)

Retrieving more values faster

Is there a way to retrieve multiple values from a jsonb field, such as you would with the imaginary function:

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

Which could possibly speed up these lookups. It could return them as columns, as a list/array, or even as a json object.
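One construct that does exist in 9.4 is jsonb_each, which decomposes each document once; filtering its output on the wanted keys returns key/value rows rather than columns. Whether it is actually faster than repeated #> extraction would need benchmarking:

```sql
-- Decompose each jsonb document once, then keep only the wanted keys.
-- Returns one row per (row, key) pair rather than one column per key.
SELECT a.date, a.name, kv.key, kv.value
FROM analysis a,
     jsonb_each(a.x) AS kv(key, value)
WHERE a.date > '2014-01-01' AND a.date < '2014-05-01'
  AND cast(a.x #>> '{a_dictionary_key}' AS float) > 100
  AND kv.key IN ('a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key');
```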

Retrieving an array using PL/Python

Just for the heck of it I made a custom function using PL/Python, but that was much slower (5s+), possibly due to json.loads:

CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
  if not data:
    return []

  # In 9.4, PL/Python receives jsonb arguments as text, so parse it first.
  import simplejson as json
  j = json.loads(data)

  # Look up each requested key; .get() yields NULL for a missing key
  # instead of raising a KeyError.
  return [j.get(i) for i in k]
$$ LANGUAGE plpython2u;

-- Usage:
-- select retrieve_objects(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
-- from analysis where date > '2014-01-01' and date < '2014-05-01';

Update 2015-05-21

I re-implemented the table using hstore with a GIN index, and the performance is almost identical to jsonb, i.e. not helpful in my case.


2 Answers

Answer 1

You're using the #> operator, which looks like it performs a path search. Have you tried a normal -> lookup? Like:

select  json_column->'json_field1'
,       json_column->'json_field2'
from    your_table

It would be interesting to see what happened if you used a temporary table. Like:

create temporary table tmp_doclist (doc jsonb)
;
insert into tmp_doclist
        (doc)
select  x
from    analysis
where   ... your conditions here ...
;
select  doc->'col1'
,       doc->'col2'
,       doc->'col3'
from    tmp_doclist
;

Answer 2

This is hard to test without the data.
Create a custom type

create type my_query_result_type as (
    a_dictionary_key float,
    a_second_dictionary_key float
);

And your query

select (jsonb_populate_record(null::my_query_result_type, x)).* from analysis;

You should be able to use a temporary table instead of a type, created at runtime, which would make the query dynamic.
But first check whether this actually helps from a performance point of view.