Is there an index structure (data structure) or algorithm that performs proximity search efficiently and quickly?

Question

We know that Elasticsearch (built on Lucene) and Google's search engine keep the offsets (positions) of words in each indexed document in order to return better results. Both index and search very large amounts of data. What special index (data structure) or algorithm do they use internally to make this efficient and fast? What does it cost in time and space? Is there a web page or document that explains the word-offset, distance-based algorithm used by Google or Elasticsearch (Lucene)? Below is a picture of what I would like to build myself.
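The structure described here, where word offsets are stored per document, is commonly called a positional inverted index. The following is only a minimal sketch of that general idea, not Lucene's or Google's actual implementation. It assumes whitespace tokenization and an in-memory dictionary, and the function names `build_positional_index` and `within_distance` are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; positions are in ascending order."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def within_distance(index, term_a, term_b, max_dist):
    """Return ids of docs where term_a and term_b occur at most max_dist positions apart."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        pa, pb = postings_a[doc_id], postings_b[doc_id]
        i = j = 0
        # Two-pointer merge over the sorted position lists.
        while i < len(pa) and j < len(pb):
            if abs(pa[i] - pb[j]) <= max_dist:
                hits.append(doc_id)
                break
            if pa[i] < pb[j]:
                i += 1
            else:
                j += 1
    return hits


docs = {
    1: "the quick brown fox",
    2: "quick thinking saved the lazy brown dog",
}
index = build_positional_index(docs)
print(within_distance(index, "quick", "brown", 1))
```

In this sketch the index stores one position per token occurrence, so its size grows with the total number of tokens, and the proximity check for one document is linear in the lengths of the two position lists.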

You should try the match_phrase query which pretty much implements those needs. Use a slop of 2 to make sure to match different orders. — Val
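For reference, a `match_phrase` query with a slop of 2 can be sketched as the request body below, written as a Python dictionary. The field name `content`, the phrase, and the helper `match_phrase_body` are hypothetical placeholders; only the `match_phrase`, `query`, and `slop` keys come from the Elasticsearch query DSL.

```python
import json


def match_phrase_body(field, phrase, slop=2):
    """Build the JSON body of a match_phrase query with the given slop."""
    return {
        "query": {
            "match_phrase": {
                field: {
                    "query": phrase,
                    "slop": slop,  # how far apart, in positions, the terms may be
                }
            }
        }
    }


# Hypothetical field and phrase, for illustration only.
print(json.dumps(match_phrase_body("content", "quick fox"), indent=2))
```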
I appreciate your answer, but I'm not trying to build something with Elasticsearch. What I want is to implement by hand the data structures (index structures) or algorithms mentioned in the question, in a way that works efficiently for a very large number of documents. — 전원표
I would be very grateful if you could explain the internal structure or point me to documents that describe it. — 전원표

nicemayi · Accepted Answer · 2017-07-12T01:14:38

0 votes

Check TF-IDF (https://en.wikipedia.org/wiki/Tf-idf). That is pretty much it.
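As a rough illustration of what TF-IDF computes, here is a minimal sketch in Python. It assumes whitespace tokenization, term frequency normalized by document length, and the plain `log(N / df)` form of inverse document frequency; other variants exist, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


weights = tf_idf(["the quick fox", "the lazy dog"])
print(weights[0])
```

Note that this weighting uses only term counts per document; it does not look at word positions.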
