I have an ever-growing table called transactions
which grows ~10 million rows per month.
This table has a jsonb
column called extra
.
70% of extra
column of the transactions
records are NULL and the rest of them has a json value like so:
{
"lang": "en",
"pages": 3,
"message": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Egestas purus viverra accumsan in nisl nisi. Arcu cursus vitae congue mauris rhoncus aenean vel elit scelerisque. In egestas erat imperdiet sed euismod nisi porta lorem mollis. Morbi tristique senectus et netus. Mattis pellentesque id nibh tortor id aliquet lectus proin. Sapien faucibus et molestie ac feugiat sed lectus vestibulum..."
}
Notice: All extra
json keys are fixed for all rows and will not change.
An overview of transactions
table :
id | price | type | extra
-------------------------------------------
1 | 2000.00 | SMS | null
2 | 2000.00 | SMS | null
3 | 4000.00 | SMS | null
4 | 5000.00 | SMS | {"lang": "en", "pages":8, "message":"Lore..."}
5 | 4000.00 | SMS | null
6 | 4000.00 | SMS | null
7 | 5000.00 | SMS | {"lang": "de", "pages":5, "message":"Some..."}
Why I did that?
I'm using jsonb
column instead of three separate columns to avoid many NULL values.
With jsonb
I only have 30% NULLs only on 1 column, But when I use 3 separate columns instead of 1 jsonb column I have 30% NULL for each column.
The Question:
Is it good idea to split the my extra
column into 3 separate columns?
Like so:
id | price | type | lang | pages | message
--------------------------------------------
1 | 2000.00 | SMS | null | null | null
2 | 2000.00 | SMS | null | null | null
3 | 4000.00 | SMS | null | null | null
4 | 5000.00 | SMS | en | 8 | Lorem...
5 | 4000.00 | SMS | null | null | null
6 | 4000.00 | SMS | null | null | null
7 | 5000.00 | SMS | de | 5 | Some...
Or instead, I can add an extra table(for example transaction_info
) with One-To-One relationship. Like so:
transcations
id | price | type
-------------------
1 | 2000.00 | SMS
2 | 2000.00 | SMS
3 | 4000.00 | SMS
4 | 5000.00 | SMS
5 | 4000.00 | SMS
6 | 4000.00 | SMS
7 | 5000.00 | SMS
transaction_info
id | transaction_id | lang | pages | message
--------------------------------------------
1 | 4 | en | 8 | Lorem...
2 | 7 | de | 5 | Some...
With this approach I don't have any NULL values on both tables.
Which one do you prefer?