Trie fields make range queries faster by precomputing certain range results and storing them as a single record in the index. For clarity, my examples will use integers in base ten, but the same concept applies to all trie types, including dates, since a date can be represented as the number of seconds since, say, 1970.
Let's say we index the number `12345678`. We can tokenize this into the following tokens:

```
12345678
123456xx
1234xxxx
12xxxxxx
```
The `12345678` token represents the actual integer value. The tokens with `x` digits represent ranges: `123456xx` represents the range `12345600` to `12345699`, and matches all the documents that contain a token in that range.
Notice how each token in the list has successively more `x` digits. This is controlled by the precision step. In my example, you could say that I was using a precision step of 2, since I trim 2 digits to create each extra token. If I were to use a precision step of 3, I would get these tokens:

```
12345678
12345xxx
12xxxxxx
```
A precision step of 4:

```
12345678
1234xxxx
```
A precision step of 1:

```
12345678
1234567x
123456xx
12345xxx
1234xxxx
123xxxxx
12xxxxxx
1xxxxxxx
```
It's easy to see how a smaller precision step results in more tokens and increases the size of the index. However, it also speeds up range queries.
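To make the tokenization concrete, here is a minimal Java sketch of the base-ten scheme described above. It is not Lucene's actual implementation (which, as noted later, trims bits rather than decimal digits); the `tokenize` helper and its parameters are names I made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class TrieTokens {
    // Produce base-ten trie tokens for a zero-padded value of `width` digits,
    // trimming `precisionStep` digits (replaced by 'x') per extra token.
    static List<String> tokenize(long value, int width, int precisionStep) {
        List<String> tokens = new ArrayList<>();
        String digits = String.format("%0" + width + "d", value);
        for (int trimmed = 0; trimmed < width; trimmed += precisionStep) {
            tokens.add(digits.substring(0, width - trimmed) + "x".repeat(trimmed));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [12345678, 123456xx, 1234xxxx, 12xxxxxx]
        System.out.println(tokenize(12345678L, 8, 2));
        // Prints [12345678, 12345xxx, 12xxxxxx]
        System.out.println(tokenize(12345678L, 8, 3));
    }
}
```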
Without the trie field, if I wanted to query a range from 1250 to 1275, Lucene would have to fetch 26 entries (`1250`, `1251`, `1252`, ..., `1275`) and combine the search results. With a trie field (and a precision step of 1), we could get away with fetching 8 entries (`125x`, `126x`, `1270`, `1271`, `1272`, `1273`, `1274`, `1275`), because `125x` is a precomputed aggregation of `1250`-`1259`. If I were to use a precision step larger than 1, the query would go back to fetching all 26 individual entries.
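Here is a sketch of how that decomposition can work, again in the illustrative base-ten form with a precision step of 1. The `splitRange` and `term` helpers are my own names, and Lucene's real range splitter operates on bits, but the idea is the same: peel off the unaligned edges of the range at the current level, then recurse one level up on the aligned middle.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieRangeSplit {
    // Decompose [lo, hi] into base-ten trie terms, assuming every level of
    // 'x' tokens exists in the index (i.e., a precision step of 1).
    static List<String> splitRange(long lo, long hi, int width) {
        List<String> out = new ArrayList<>();
        split(lo, hi, 0, width, out);
        return out;
    }

    static void split(long lo, long hi, int trimmed, int width, List<String> out) {
        if (lo > hi) return;
        // Align the range to whole buckets of ten at the next coarser level.
        long alignedLo = (lo % 10 == 0) ? lo : (lo / 10 + 1) * 10;
        long alignedHi = (hi % 10 == 9) ? hi : (hi / 10) * 10 - 1;
        if (alignedLo > alignedHi || trimmed == width - 1) {
            // No full coarser bucket fits: emit every term at this level.
            for (long v = lo; v <= hi; v++) out.add(term(v, trimmed, width));
            return;
        }
        // Emit the unaligned edges here, then recurse one level up on the middle.
        for (long v = lo; v < alignedLo; v++) out.add(term(v, trimmed, width));
        for (long v = alignedHi + 1; v <= hi; v++) out.add(term(v, trimmed, width));
        split(alignedLo / 10, alignedHi / 10, trimmed + 1, width, out);
    }

    static String term(long v, int trimmed, int width) {
        return String.format("%0" + (width - trimmed) + "d", v) + "x".repeat(trimmed);
    }

    public static void main(String[] args) {
        // Prints the 8 terms: [1270, 1271, 1272, 1273, 1274, 1275, 125x, 126x]
        System.out.println(splitRange(1250, 1275, 4));
    }
}
```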
Note: In reality, the precision step refers to the number of bits trimmed for each token. If you were to write your numbers in hexadecimal, a precision step of 4 would trim one hex digit for each token. A precision step of 8 would trim two hex digits.
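For a rough feel of the bit-based version, here is a sketch that trims `precisionStep` low-order bits per level by right-shifting. It only illustrates the trimming; it is not Lucene's actual token encoding (which, as I understand it, also records the shift amount in each indexed token so that levels don't collide):

```java
public class BitTrieTokens {
    // Each coarser token keeps the high-order bits of the value after dropping
    // `precisionStep` more low-order bits, i.e., one level per shift step.
    static void printTokens(int value, int precisionStep) {
        for (int shift = 0; shift < Integer.SIZE; shift += precisionStep) {
            System.out.printf("shift=%2d prefix=0x%x%n", shift, value >>> shift);
        }
    }

    public static void main(String[] args) {
        // With a precision step of 4, each level trims one hex digit:
        // 0xbc614e, 0xbc614, 0xbc61, 0xbc6, 0xbc, 0xb, 0x0, 0x0
        printTokens(12345678, 4);
    }
}
```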