0
votes

I have an ever-growing table called transactions which grows ~10 million rows per month.

This table has a jsonb column called extra.

70% of extra column of the transactions records are NULL and the rest of them has a json value like so:

{
   "lang": "en",
   "pages": 3,
   "message": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Egestas purus viverra accumsan in nisl nisi. Arcu cursus vitae congue mauris rhoncus aenean vel elit scelerisque. In egestas erat imperdiet sed euismod nisi porta lorem mollis. Morbi tristique senectus et netus. Mattis pellentesque id nibh tortor id aliquet lectus proin. Sapien faucibus et molestie ac feugiat sed lectus vestibulum..."
}

Notice: All extra json keys are fixed for all rows and will not change.

An overview of transactions table :

id |  price  | type |    extra    
-------------------------------------------
 1 | 2000.00 | SMS  |     null    
 2 | 2000.00 | SMS  |     null    
 3 | 4000.00 | SMS  |     null    
 4 | 5000.00 | SMS  | {"lang": "en", "pages":8, "message":"Lore..."}
 5 | 4000.00 | SMS  |     null    
 6 | 4000.00 | SMS  |     null    
 7 | 5000.00 | SMS  | {"lang": "de", "pages":5, "message":"Some..."}

Why I did that?

I'm using jsonb column instead of three separate columns to avoid many NULL values.

With jsonb I only have 30% NULLs only on 1 column, But when I use 3 separate columns instead of 1 jsonb column I have 30% NULL for each column.

The Question:

Is it good idea to split the my extra column into 3 separate columns?

Like so:

id |  price  | type | lang  | pages |  message       
--------------------------------------------
 1 | 2000.00 | SMS  |  null |  null | null
 2 | 2000.00 | SMS  |  null |  null | null
 3 | 4000.00 | SMS  |  null |  null | null
 4 | 5000.00 | SMS  |  en   |   8   | Lorem...
 5 | 4000.00 | SMS  |  null |  null | null
 6 | 4000.00 | SMS  |  null |  null | null
 7 | 5000.00 | SMS  |  de   |   5   | Some...

Or instead, I can add an extra table(for example transaction_info) with One-To-One relationship. Like so:

transcations

id |  price  | type
-------------------
 1 | 2000.00 | SMS 
 2 | 2000.00 | SMS 
 3 | 4000.00 | SMS 
 4 | 5000.00 | SMS 
 5 | 4000.00 | SMS 
 6 | 4000.00 | SMS 
 7 | 5000.00 | SMS 

transaction_info

id |  transaction_id  | lang  | pages |  message       
--------------------------------------------
 1 |       4          |   en  |   8   |  Lorem...
 2 |       7          |   de  |   5   |  Some...

With this approach I don't have any NULL values on both tables.

Which one do you prefer?

1

1 Answers

2
votes

You should to read some about normal forms - 1. NF says - every value is atomic. That expects so any attribute has own column - it is usually good idea (when number of attributes are less than 50). NULL value needs only 1bite - and probably storing data in clean relational 1NF is more effective than in JSON format.

So, because your new columns are only three, then my reply to your question is yes. It is good idea.

The second question one or two tables - there is not clean reply - both variants are correct from relation model perspective. If there is visible separation in reality - there are two entities, then I prefer two tables. Elsewhere (and when number of columns is small) I prefer one table.