Solr/Lucene - equivalent of LIKE '%foo%' with scoring

Question

What's the best way in Solr/Lucene to index a text column so that it behaves like a SQL LIKE '%foo%', with the best matches coming first? "Best" in my case means exact matches first, then matches with fewer extra characters before those with more. For example:

search for "1234" should return

1234
12345 (one extra char)
01234
123456 (two extra chars)
001234567890
etc.

What I've tried so far that doesn't quite work (column mapped as text_en_splitting):

search for 1234 -> only exact matches
search for *1234* -> finds everything but doesn't score exact matches higher
search for 1234~ -> will match 12345 but doesn't score exact matches higher. Will NOT match longer strings like "001234567890".

Fuxi Fuxi · Accepted Answer · 2013-05-16T21:43:54

To score exact matches higher, you can search two fields at once: one that produces all (substring) matches, and a second that produces only exact matches and carries a boost.

To avoid wildcards (asterisks) in the query, which slow searching down, you could use NGramFilterFactory in your schema.
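
To see why n-grams can stand in for the wildcards, here is a rough Python sketch of what an n-gram filter conceptually emits at index time. It is not Solr's implementation: the gram sizes 2 and 4 are illustrative values only, the sample value "120034" is made up, and the sketch assumes the query term is matched as one whole token against the indexed grams, so the maximum gram size has to be at least as long as the search term.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # When text is shorter than size, this range is empty and nothing is added.
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# Illustrative values; "120034" is a made-up non-match.
indexed = ["1234", "12345", "01234", "001234567890", "120034"]

# A value is a hit when the whole query term is one of its indexed grams.
hits = [value for value in indexed if "1234" in ngrams(value, 2, 4)]
print(hits)
```

Every value that contains "1234" anywhere is a hit with no wildcard involved, which is the LIKE '%1234%' behaviour. Ranking exact matches first is a separate job, handled by the boosted second field.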

With that in mind, your query might look like this (entries in qf are separated by a space, written as + in a URL):

q=1234&qf=ngram_text_field+simple_text_field^2&defType=edismax
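
If you build the request from code, a minimal sketch could look like the following. The host, port and core name in `base_url` are placeholders I made up, and `build_like_query` is a hypothetical helper, not part of any Solr client. The field names are the ones from the query above. The qf entries are joined with a space, which is how the dismax/edismax documentation lists multiple fields.

```python
from urllib.parse import urlencode


def build_like_query(term, base_url="http://localhost:8983/solr/mycore/select"):
    """Build an edismax select URL; base_url is a placeholder for your own core."""
    params = {
        "q": term,
        "defType": "edismax",
        # Substring matches come from the n-gram field; the plain field
        # carries a boost of 2 so exact matches score higher.
        "qf": "ngram_text_field simple_text_field^2",
    }
    # urlencode escapes the space and the ^ so the URL is safe to send as-is.
    return base_url + "?" + urlencode(params)


url = build_like_query("1234")
print(url)
```

The boost value of 2 is just the one from the example query. Raise it if exact matches still don't end up on top.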

More info about edismax
