We know that Elasticsearch (which is built on Lucene) and Google's search engine store the positions (offsets) of words in each indexed document so that they can return better results. Both perform indexing and searching over very large amounts of data. What special index (data structure) or algorithm do they use internally to make this efficient and fast? What does it cost in time and space? Is there a web page or document that explains the word-offset, distance-based algorithm used by Google or Elasticsearch (Lucene)? Below is a picture of what I would like to build myself.

Have a look at the match_phrase query, which pretty much implements those needs. Use a slop of 2 so that terms appearing in a different order also match. - Val
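To make the idea behind the comment concrete, here is a minimal Python sketch of a positional inverted index with a slop-tolerant phrase match. It is not Lucene's or Google's actual implementation: the names `build_index` and `phrase_match` are hypothetical, tokenization is a plain whitespace split, and the slop rule is a simplified approximation (each term's position is shifted back by its index in the phrase, and the spread of the shifted positions must not exceed the slop).

```python
from collections import defaultdict
from itertools import product


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_match(index, phrase, slop=0):
    """Return sorted doc ids whose terms match the phrase within `slop`.

    Simplified rule (an assumption, not Lucene's exact scorer): subtract each
    term's index in the phrase from its position in the document; the phrase
    matches if max(shifted) - min(shifted) <= slop.
    """
    terms = phrase.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]

    # Only documents that contain every term can match.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    hits = []
    for doc_id in sorted(candidates):
        position_lists = [posting[doc_id] for posting in postings]
        # Brute force over position combinations; fine for a sketch,
        # too slow for long documents with many repeated terms.
        for combo in product(*position_lists):
            if len(set(combo)) < len(combo):
                continue  # one position cannot serve two phrase terms
            shifted = [pos - i for i, pos in enumerate(combo)]
            if max(shifted) - min(shifted) <= slop:
                hits.append(doc_id)
                break
    return hits


docs = {
    1: "quick brown fox",
    2: "brown quick fox",
    3: "quick red brown fox jumps",
}
index = build_index(docs)
print(phrase_match(index, "quick brown", slop=0))
print(phrase_match(index, "quick brown", slop=2))
```

Under this rule, two adjacent terms in swapped order have a spread of 2, which is why a slop of 2 is needed to match a different order. The space cost is one stored position per term occurrence, on top of the usual term-to-document postings. In Elasticsearch itself, `slop` is a parameter of the `match_phrase` query.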