
A couple of questions about the ES index structure:

(1) Is _source a field in Lucene? If so, how does Lucene store it: as a key-value store rather than in the inverted index?

(2) Is the ES _id a field in Lucene, or is it kept in some other key-value storage? If I use an MD5 hash as my doc's _id and also create an md5 field in my doc, which query would be faster: looking up by _id or searching on md5?

(3) Is the ES _type a field in Lucene? If so, why can documents of different _type in ES have the same _id? Thanks in advance!


1 Answer


(1) Yes. The _source field, which contains the original JSON, is kept as a Lucene stored field so it can be fetched (via get requests, scripts, etc.). However, it is not indexed: it never goes into the inverted index, and is thus not searchable.
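To make the stored-versus-indexed distinction concrete, here is a toy model in Python. It is only a sketch of the idea, not how Lucene is implemented: the class `ToyIndex` and all of its method names are made up for illustration.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model (hypothetical, not Lucene's real layout):
    an inverted index for searchable fields plus a per-document stored blob."""

    def __init__(self):
        self.inverted = defaultdict(set)  # (field, term) -> set of doc numbers
        self.stored = {}                  # doc number -> original JSON string

    def add(self, doc_num, source_json, indexed_fields):
        # The original JSON is only stored; none of it is tokenized here.
        self.stored[doc_num] = source_json
        # Only the explicitly indexed fields get postings in the inverted index.
        for field, value in indexed_fields.items():
            for term in str(value).lower().split():
                self.inverted[(field, term)].add(doc_num)

    def search(self, field, term):
        # Term lookup in the inverted index; unknown fields simply have no postings.
        return sorted(self.inverted.get((field, term.lower()), set()))

    def fetch_source(self, doc_num):
        # Key-value style lookup by document number.
        return self.stored[doc_num]


idx = ToyIndex()
idx.add(0, '{"title": "hello world"}', {"title": "hello world"})
print(idx.search("title", "hello"))    # finds the doc through the indexed field
print(idx.search("_source", "hello"))  # nothing: _source has no postings
print(idx.fetch_source(0))             # the original JSON comes back verbatim
```

In this model the stored blob behaves like the key-value store the question asks about: it can be fetched by document number but cannot be matched by a query.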

(3) Each document has a _type and an _id field. Together they form the _uid field, whose value is {type}#{id}. Both the _uid and the _type fields are indexed and can be used in queries, aggregations, scripts and sorting. The _uid field is also the reason the same _id can be used under different _type values: the _uid is still unique. The _id field itself is not indexed, because its value can be derived from the _uid field.
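The {type}#{id} composition can be sketched in a few lines of Python. The helper names `make_uid` and `split_uid` are hypothetical, and the split assumes the type name itself contains no `#`.

```python
from typing import Tuple


def make_uid(doc_type: str, doc_id: str) -> str:
    """Compose a _uid-style value in the {type}#{id} form described above."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid: str) -> Tuple[str, str]:
    """Recover (type, id) from a uid.

    Splits at the first '#', so this assumes the type name has no '#' in it;
    the id part may contain one.
    """
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id


# The same _id under two different types yields two distinct uids.
print(make_uid("user", "42"))
print(make_uid("order", "42"))
print(split_uid("user#42"))
```

This also shows why the _id does not need its own index entry: it can always be recovered from the uid.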

(2) You can retrieve a document by its _id, and that will always be faster than searching for the document via any other field, whether that field holds an MD5 hash or not. It is worth noting, though, that before ES 2.0 it was important to choose your document IDs carefully. As of 2.0, this has become less of a concern and you can pick whatever ID you like.
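As a rough illustration of the two access paths, the sketch below builds (but does not send) the two REST requests: a direct GET by _id and a term query on a separate md5 field. The index name `myindex`, the type name `mytype` and both helper functions are hypothetical, and the term query assumes the md5 field is mapped so that the full hash is indexed as a single term.

```python
import hashlib
import json
from urllib.parse import quote


def get_by_id_request(index, doc_type, doc_id):
    """Direct lookup: GET /{index}/{type}/{id}, no query body needed."""
    path = f"/{quote(index)}/{quote(doc_type)}/{quote(doc_id, safe='')}"
    return ("GET", path, None)


def search_by_field_request(index, doc_type, field, value):
    """Search: POST /{index}/{type}/_search with a term query on one field."""
    body = {"query": {"term": {field: value}}}
    path = f"/{quote(index)}/{quote(doc_type)}/_search"
    return ("POST", path, json.dumps(body))


# Use the MD5 hex digest of the content both as _id and as the md5 field value.
digest = hashlib.md5(b"hello").hexdigest()

print(get_by_id_request("myindex", "mytype", digest))
print(search_by_field_request("myindex", "mytype", "md5", digest))
```

The first request addresses one document directly, while the second has to go through the search machinery even though it matches at most one document.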