PostgreSQL just introduced JSONB and it's already trending on hacker news. It would be great if someone could explain how it's different from Hstore and JSON previously present in PostgreSQL. What are its advantages and limitations and when should someone consider using it?
9 Answers
First, hstore
is a contrib module, which only allows you to store key => value pairs, where keys and values can only be text
s (however values can be sql NULL
s too).
Both json
& jsonb
allows you to store a valid JSON value (defined in its spec).
F.ex. these are valid JSON representations: null
, true
, [1,false,"string",{"foo":"bar"}]
, {"foo":"bar","baz":[null]}
- hstore
is just a little subset compared to what JSON is capable (but if you only need this subset, it's fine).
The only difference between json
& jsonb
is their storage:
json
is stored in its plain text format, whilejsonb
is stored in some binary representation
There are 3 major consequences of this:
jsonb
usually takes more disk space to store thanjson
(sometimes not)jsonb
takes more time to build from its input representation thanjson
json
operations take significantly more time thanjsonb
(& parsing also needs to be done each time you do some operation at ajson
typed value)
When jsonb
will be available with a stable release, there will be two major use cases, when you can easily select between them:
- If you only work with the JSON representation in your application, PostgreSQL is only used to store & retrieve this representation, you should use
json
. - If you do a lot of operations on the JSON value in PostgreSQL, or use indexing on some JSON field, you should use
jsonb
.
Peeyush:
The short answer is:
- If you are doing a lot of JSON manipulation inside PostgreSQL, such as sorting, slicing, splicing, etc., you should use JSONB for speed reasons.
- If you need indexed lookups for arbitrary key searches on JSON, then you should use JSONB.
- If you are doing neither of the above, you should probably use JSON.
- If you need to preserve key ordering, whitespace, and duplicate keys, you should use JSON.
For a longer answer, you'll need to wait for me to do a full "HowTo" writeup closer to the 9.4 release.
A simple explanation of the difference between json and jsonb (original image by PostgresProfessional):
SELECT '{"c":0, "a":2,"a":1}'::json, '{"c":0, "a":2,"a":1}'::jsonb;
json | jsonb
------------------------+---------------------
{"c":0, "a":2,"a":1} | {"a": 1, "c": 0}
(1 row)
- json: textual storage «as is»
- jsonb: no whitespaces
- jsonb: no duplicate keys, last key win
- jsonb: keys are sorted
More in speech video and slide show presentation by jsonb developers. Also they introduced JsQuery, pg.extension provides powerful jsonb query language
hstore
is more of a "wide column" storage type, it is a flat (non-nested) dictionary of key-value pairs, always stored in a reasonably efficient binary format (a hash table, hence the name).json
stores JSON documents as text, performing validation when the documents are stored, and parsing them on output if needed (i.e. accessing individual fields); it should support the entire JSON spec. Since the entire JSON text is stored, its formatting is preserved.jsonb
takes shortcuts for performance reasons: JSON data is parsed on input and stored in binary format, key orderings in dictionaries are not maintained, and neither are duplicate keys. Accessing individual elements in the JSONB field is fast as it doesn't require parsing the JSON text all the time. On output, JSON data is reconstructed and initial formatting is lost.
IMO, there is no significant reason for not using jsonb
once it is available, if you are working with machine-readable data.
JSONB is a "better" version of JSON.
Let's look at an example:
SELECT '{"c":0, "a":2,"a":1}'::json, '{"c":0, "a":2,"a":1}'::jsonb;
json | jsonb
------------------------+---------------------
{"c":0, "a":2,"a":1} | {"a": 1, "c": 0}
(1 row)
- JSON stores white space, they is why we can see spaces when key "a" is stored, while JSONB does not.
- JSON stores all the values of key. This is the reason you can see multiple values (2 and 1) against the key "a" , while JSONB only "stores" the last value.
- JSON maintains the order in which elements are inserted, while JSONB maintains the "sorted" order.
- JSONB objects are stored as decompressed binary as opposed to "raw data" in JSON , where no reparsing of data is required during retrieval.
- JSONB also supports indexing, which can be a significant advantage.
In general, one should prefer JSONB , unless there are specialized needs, such as legacy assumptions about ordering of object keys.
Regarding the differences between json
and jsonb
datatypes, it worth mentioning the official explanation:
PostgreSQL offers two types for storing JSON data:
json
andjsonb
. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in Section 8.14.6.The
json
andjsonb
data types accept almost identical sets of values as input. The major practical difference is one of efficiency. Thejson
data type stores an exact copy of the input text, which processing functions must reparse on each execution; whilejsonb
data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed.jsonb
also supports indexing, which can be a significant advantage.Because the
json
type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast,jsonb
does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.In general, most applications should prefer to store JSON data as
jsonb
, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.
Source: https://www.postgresql.org/docs/current/datatype-json.html
I was at the pgopen today benchmarks are way faster than mongodb, I believe it was around 500% faster for selects. Pretty much everything was faster at least by at 200% when contrasted with mongodb, than one exception right now is a update which requires completely rewriting the entire json column something mongodb handles better.
The gin indexing on on jsonb sounds amazing.
Also postgres will persist types of jsonb internally and basically match this with types such as numeric, text, boolean etc.
Joins will also be possible using jsonb
Add PLv8 for stored procedures and this will basically be a dream come true for node.js developers.
Being it's stored as binary jsonb will also strip all whitespace, change the ordering of properties and remove duplicate properties using the last occurance of the property.
Besides the index when querying against a jsonb column contrasted to a json column postgres doesn't have to actually run the functionality to convert the text to json on every row which will likely save a vast amount of time alone.
Another important difference, that wasn't mentioned in any answer above, is that there is no equality operator for json
type, but there is one for jsonb
.
This means that you can't use DISTINCT
keyword when selecting this json
-type and/or other fields from a table (you can use DISTINCT ON
instead, but it's not always possible because of cases like this).
As far as I can tell,
hstore as it currently exists (in Postgresql 9.3) does not allow for nesting other objects and arrays as the values of its key/value pairs. however a future hstore patch will allow for nesting. this patch will not be in the 9.4 release and may not be included any time soon.
json as it currently exists does allow for nesting, but is text-based, and does not allow for indexing, thus it is "slow"
jsonb that will be released with 9.4 will have the current nesting capabilities of json, as well as the GIN/GIST indexing of hstore, so it will be fast
People working on postgresql 9.4 seem to be saying that the new, fast jsonb type will appeal to people who would have chosen to use a noSQL data store like MongoDB, but can now combine a relational database with query-able unstructured data under one roof
http://www.databasesoup.com/2014/02/why-hstore2jsonb-is-most-important.html
Benchmarks of postgresql 9.4 jsonb seem to be on par with or in some cases faster than MongoDB
http://texture.io/alphabetum/postgresql-incl-hstore-vs-mongodb