
I have a Postgres table with a large jsonb column.

CREATE TABLE mytable (id integer, my_jsonb jsonb)

The my_jsonb column contains data like this:

{
  "name": "Bob",
  "city": "Somecity",
  "zip": "12345"
}

The table contains several million rows.

I need to create facets, i.e. aggregations, on individual fields in our user interface. For example:

city        | count
New York    | 1000
Chicago     | 3000
Los Angeles | 4000
maybe 200 more values...

My current query, which yields the correct results, looks like this:

select my_jsonb->>'city', count(*)
from mytable
where foo='bar'
group by my_jsonb->>'city'
order by my_jsonb->>'city'

The problem is that it is painfully slow. It takes 5-10 seconds, depending on the particular field that I pick. Postgres has to do a full table scan and extract the jsonb value from each row.

Question: how do I create an index that does this query efficiently, and works no matter which jsonb field I choose?

A GIN index doesn't help; the query planner doesn't use it. The same goes for a plain B-tree index on the jsonb column.
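
For reference, the GIN index I mean is the standard whole-column one (the index name is arbitrary). As far as I can tell it only helps the containment/existence operators, not the ->> extraction my query does:

-- whole-column GIN index with the default jsonb_ops operator class;
-- it supports operators such as @>, ?, ?| and ?&, but nothing here
-- lets the planner satisfy a GROUP BY on my_jsonb->>'city'
CREATE INDEX mytable_my_jsonb_gin ON mytable USING gin (my_jsonb);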

I'm thinking that there might be some kind of expression index, and I might be able to rewrite the facet query to use the expression, but I haven't figured it out.
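What I have in mind is something like the per-field expression index below (made-up name), but it hard-codes a single field, and I haven't found a way to rewrite the facet query so the planner actually uses it:

-- per-field expression index; one of these would be needed
-- for every field we want to facet on
CREATE INDEX mytable_city_expr_idx ON mytable ((my_jsonb->>'city'));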

Worst case, I can extract all of the values into a second table and index that, but I'd prefer not to.


1 Answer


Your only hope would be an index-only scan, but since the planner doesn't use index-only scans with expression indexes, you're out of luck. There is no way to avoid scanning the whole table and extracting the JSON values.

You'll have to extract the JSON values into a normalized form. Let this serve as a reminder that data models relying on JSON are very often a bad choice in a relational database (although there are valid use cases).
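
A minimal sketch of such an extraction, assuming your facets only ever need top-level keys with text values (all object names here are made up):

-- one row per (source row, top-level key/value pair)
CREATE TABLE mytable_facets AS
SELECT t.id, kv.key, kv.value
FROM mytable AS t,
     jsonb_each_text(t.my_jsonb) AS kv(key, value);

-- a facet query then only has to scan one key range of this index
CREATE INDEX ON mytable_facets (key, value);

-- facet for any field, e.g. city
SELECT value AS city, count(*)
FROM mytable_facets
WHERE key = 'city'
GROUP BY value
ORDER BY value;

You'd have to keep that table in sync with mytable (triggers, or a periodic rebuild), which is the price of normalizing after the fact.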