I'm new to Solr and I want to understand exactly how it indexes documents.
Let's say I have a 100 MB document (document1) full of text. The text is unstructured; it's just raw text. I send that document to Solr to be indexed.
As far as I understand, Lucene will parse the document, extract all the words based on the default schema (let's assume we're using the default schema), and create an index that is basically a mapping from each word to the list of documents that contain it, like so:
word1 -> [document1]
word2 -> [document1]
etc.
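To make my mental model concrete, here is a minimal Python sketch of the kind of word-to-documents mapping I have in mind. It is only an illustration under my own assumptions: the function name build_inverted_index, the second sample document, and the naive lowercase-and-split tokenization are made up for this example and are not Solr or Lucene APIs. Real analysis would depend on the field types in the schema.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of document IDs that contain it.

    `docs` is a dict of {doc_id: raw_text}. Tokenization is a naive
    lowercase + whitespace split (an assumption, not Lucene's analyzer).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Convert the sets to sorted lists for stable, readable output.
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample data; document2 exists only to show a shared word.
docs = {
    "document1": "word1 word2 word1",
    "document2": "word2 word3",
}
index = build_inverted_index(docs)
print(index["word1"])  # ['document1']
print(index["word2"])  # ['document1', 'document2']
```

In this sketch a lookup only yields document IDs, not the text itself.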
Now, if I want to search for the word "word1", Solr will give me the entire 100 MB document that contains the word "word1", correct?
Please correct me if I'm wrong; I need to understand exactly how it works.