399
votes

PostgreSQL just introduced JSONB and it's already trending on hacker news. It would be great if someone could explain how it's different from Hstore and JSON previously present in PostgreSQL. What are its advantages and limitations and when should someone consider using it?

9
From PGCon2014: youtube.com/…msanford
@CraigRinger url is not precise enough, now, 1 year later it doesn't even point close enough to JSONB-related content.berkus
@berkus I thought I linked to the specific post. How frustrating.Craig Ringer
It does point to the specific video.talonx

9 Answers

518
votes

First, hstore is a contrib module, which only allows you to store key => value pairs, where keys and values can only be texts (however values can be sql NULLs too).

Both json & jsonb allows you to store a valid JSON value (defined in its spec).

F.ex. these are valid JSON representations: null, true, [1,false,"string",{"foo":"bar"}], {"foo":"bar","baz":[null]} - hstore is just a little subset compared to what JSON is capable (but if you only need this subset, it's fine).

The only difference between json & jsonb is their storage:

  • json is stored in its plain text format, while
  • jsonb is stored in some binary representation

There are 3 major consequences of this:

  • jsonb usually takes more disk space to store than json (sometimes not)
  • jsonb takes more time to build from its input representation than json
  • json operations take significantly more time than jsonb (& parsing also needs to be done each time you do some operation at a json typed value)

When jsonb will be available with a stable release, there will be two major use cases, when you can easily select between them:

  1. If you only work with the JSON representation in your application, PostgreSQL is only used to store & retrieve this representation, you should use json.
  2. If you do a lot of operations on the JSON value in PostgreSQL, or use indexing on some JSON field, you should use jsonb.
156
votes

Peeyush:

The short answer is:

  • If you are doing a lot of JSON manipulation inside PostgreSQL, such as sorting, slicing, splicing, etc., you should use JSONB for speed reasons.
  • If you need indexed lookups for arbitrary key searches on JSON, then you should use JSONB.
  • If you are doing neither of the above, you should probably use JSON.
  • If you need to preserve key ordering, whitespace, and duplicate keys, you should use JSON.

For a longer answer, you'll need to wait for me to do a full "HowTo" writeup closer to the 9.4 release.

97
votes

A simple explanation of the difference between json and jsonb (original image by PostgresProfessional):

SELECT '{"c":0,   "a":2,"a":1}'::json, '{"c":0,   "a":2,"a":1}'::jsonb;

          json          |        jsonb 
------------------------+--------------------- 
 {"c":0,   "a":2,"a":1} | {"a": 1, "c": 0} 
(1 row)
  • json: textual storage «as is»
  • jsonb: no whitespaces
  • jsonb: no duplicate keys, last key win
  • jsonb: keys are sorted

More in speech video and slide show presentation by jsonb developers. Also they introduced JsQuery, pg.extension provides powerful jsonb query language

59
votes
  • hstore is more of a "wide column" storage type, it is a flat (non-nested) dictionary of key-value pairs, always stored in a reasonably efficient binary format (a hash table, hence the name).
  • json stores JSON documents as text, performing validation when the documents are stored, and parsing them on output if needed (i.e. accessing individual fields); it should support the entire JSON spec. Since the entire JSON text is stored, its formatting is preserved.
  • jsonb takes shortcuts for performance reasons: JSON data is parsed on input and stored in binary format, key orderings in dictionaries are not maintained, and neither are duplicate keys. Accessing individual elements in the JSONB field is fast as it doesn't require parsing the JSON text all the time. On output, JSON data is reconstructed and initial formatting is lost.

IMO, there is no significant reason for not using jsonb once it is available, if you are working with machine-readable data.

38
votes

JSONB is a "better" version of JSON.

Let's look at an example:

SELECT '{"c":0,   "a":2,"a":1}'::json, '{"c":0,   "a":2,"a":1}'::jsonb;
          json          |        jsonb 
------------------------+--------------------- 
 {"c":0,   "a":2,"a":1} | {"a": 1, "c": 0} 
(1 row)
  1. JSON stores white space, they is why we can see spaces when key "a" is stored, while JSONB does not.
  2. JSON stores all the values of key. This is the reason you can see multiple values (2 and 1) against the key "a" , while JSONB only "stores" the last value.
  3. JSON maintains the order in which elements are inserted, while JSONB maintains the "sorted" order.
  4. JSONB objects are stored as decompressed binary as opposed to "raw data" in JSON , where no reparsing of data is required during retrieval.
  5. JSONB also supports indexing, which can be a significant advantage.

In general, one should prefer JSONB , unless there are specialized needs, such as legacy assumptions about ordering of object keys.

19
votes

Regarding the differences between json and jsonb datatypes, it worth mentioning the official explanation:

PostgreSQL offers two types for storing JSON data: json and jsonb. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in Section 8.14.6.

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.

Source: https://www.postgresql.org/docs/current/datatype-json.html

14
votes

I was at the pgopen today benchmarks are way faster than mongodb, I believe it was around 500% faster for selects. Pretty much everything was faster at least by at 200% when contrasted with mongodb, than one exception right now is a update which requires completely rewriting the entire json column something mongodb handles better.

The gin indexing on on jsonb sounds amazing.

Also postgres will persist types of jsonb internally and basically match this with types such as numeric, text, boolean etc.

Joins will also be possible using jsonb

Add PLv8 for stored procedures and this will basically be a dream come true for node.js developers.

Being it's stored as binary jsonb will also strip all whitespace, change the ordering of properties and remove duplicate properties using the last occurance of the property.

Besides the index when querying against a jsonb column contrasted to a json column postgres doesn't have to actually run the functionality to convert the text to json on every row which will likely save a vast amount of time alone.

9
votes

Another important difference, that wasn't mentioned in any answer above, is that there is no equality operator for json type, but there is one for jsonb.

This means that you can't use DISTINCT keyword when selecting this json-type and/or other fields from a table (you can use DISTINCT ON instead, but it's not always possible because of cases like this).

6
votes

As far as I can tell,

  • hstore as it currently exists (in Postgresql 9.3) does not allow for nesting other objects and arrays as the values of its key/value pairs. however a future hstore patch will allow for nesting. this patch will not be in the 9.4 release and may not be included any time soon.

  • json as it currently exists does allow for nesting, but is text-based, and does not allow for indexing, thus it is "slow"

  • jsonb that will be released with 9.4 will have the current nesting capabilities of json, as well as the GIN/GIST indexing of hstore, so it will be fast

People working on postgresql 9.4 seem to be saying that the new, fast jsonb type will appeal to people who would have chosen to use a noSQL data store like MongoDB, but can now combine a relational database with query-able unstructured data under one roof

http://www.databasesoup.com/2014/02/why-hstore2jsonb-is-most-important.html

Benchmarks of postgresql 9.4 jsonb seem to be on par with or in some cases faster than MongoDB

http://texture.io/alphabetum/postgresql-incl-hstore-vs-mongodb