How does Solr store documents?

Question

I know Solr uses Lucene and Lucene uses an inverted index. But from the Lucene examples I have seen so far, I am not sure I understand how it works in combination with Solr.

Given the following document:

<doc>
  <field name="id">9885A004</field>
  <field name="name">Canon PowerShot SD500</field>
  <field name="manu">Canon Inc.</field>
  <field name="inStock">true</field>
</doc>

From the examples I have seen so far, I would think that Lucene has to treat each field as a document. It would then say: the word Canon appears in field name and field manu.

Is the index broken down this much? Or does the index only say: "the word Canon appears in the document with id such and such"?

How does this work exactly when using Lucene with Solr? What would this document look like in the index? (supposing each field has indexed="true")

You can get a detailed rundown on how Lucene stores data through one of the presentations from Lucene/Solr Revolution in 2013. I'm not sure if it mentions DocValues, which is a column oriented storage as opposed to the regular, inverted index that speeds up certain operations as well. — MatsLindh

Alessandro Benedetti · Accepted Answer · 2017-11-27T12:06:27

I wrote a blog post a few years ago that explains this in detail [1].

Short answer to this question:

"From the examples I have seen so far, I would think that Lucene has to treat each field as a document."

Absolutely not. Lucene's unit of information is the document, which is composed of a map of field -> value(s). A Solr document is just a slightly different representation, because Solr incorporates a schema in which the fields are described. So in Solr you can simply add fields to a document without having to describe their type and other properties (which are stored in the schema), while in Lucene you need to define them explicitly when creating the document.
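To make the "document is the unit, terms are scoped by field" idea concrete, here is a rough Python sketch of a toy inverted index built from the document in the question. It is only an illustration under simplifying assumptions: the `tokenize` and `build_index` helpers are hypothetical names, the analysis is a naive lowercase-and-split (in a real schema `id` would typically not be analyzed at all), and Lucene's actual postings, segments and file formats are far more involved.

```python
from collections import defaultdict

# Toy model only: this is NOT Lucene's real data structure or API.

def tokenize(value):
    """Naive analysis: split on whitespace, strip '.' and ',', lowercase."""
    tokens = (tok.strip(".,").lower() for tok in value.split())
    return [tok for tok in tokens if tok]

def build_index(docs):
    """Map (field, term) -> sorted list of internal document numbers."""
    postings = defaultdict(set)
    for doc_num, doc in enumerate(docs):
        for field, value in doc.items():
            for term in tokenize(value):
                postings[(field, term)].add(doc_num)
    return {key: sorted(nums) for key, nums in postings.items()}

# The document from the question, as a field -> value map.
docs = [
    {
        "id": "9885A004",
        "name": "Canon PowerShot SD500",
        "manu": "Canon Inc.",
        "inStock": "true",
    },
]

index = build_index(docs)
for (field, term), doc_nums in sorted(index.items()):
    print(f"{field}:{term} -> {doc_nums}")
```

In this sketch the term "canon" appears under two separate keys, `("name", "canon")` and `("manu", "canon")`, and both point to the same document number. So the fields are not separate documents; the index records which document contains a term in which field.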

[1] https://sease.io/2015/07/exploring-solr-internals-lucene.html
