What's the best way in Solr/Lucene to index a text column so that it behaves like SQL LIKE '%foo%', with the best matches first? "Best" in my case means exact matches first, then values with fewer extra characters before those with more. For example:

A search for "1234" should return

  • 1234
  • 12345 (one extra char)
  • 01234
  • 123456 (two extra chars)
  • 001234567890
  • etc.

What I've tried so far, none of which quite works (column mapped as text_en_splitting):

  • search for 1234 -> only exact matches
  • search for *1234* -> finds everything but doesn't score exact matches higher
  • search for 1234~ -> will match 12345 but doesn't score exact matches higher. Will NOT match longer strings like "001234567890".
1 Answer

To score exact matches higher, you might need to search two fields: one that returns all matches (including partial ones) and a second that returns only exact matches, with a boost on the second.

To avoid wildcards in the query (asterisks, especially leading ones, slow the search down), you could use NGramFilterFactory in your schema, so that substrings of each value are indexed as terms.
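
To make the idea concrete, here is a small Python illustration of what an n-gram filter produces conceptually and why a plain term lookup then behaves like a substring search. It is only a sketch, not Solr's implementation: the function name `ngrams`, the gram sizes 2 to 4, and the sample values (borrowed from the question) are assumptions made for the example.

```python
def ngrams(text, min_size=2, max_size=4):
    """Return every substring of text with length between min_size and max_size."""
    grams = set()
    for start in range(len(text)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(text):
                grams.add(text[start:start + size])
    return grams


# Hypothetical indexed values, taken from the question's example.
docs = ["1234", "12345", "01234", "123456", "001234567890", "9999"]

# Simulated index: each value is stored together with all of its n-grams.
index = {doc: ngrams(doc) for doc in docs}

# A plain term lookup for "1234" finds every value that contains it,
# with no leading or trailing wildcard in the query.
query = "1234"
matches = [doc for doc in docs if query in index[doc]]
print(matches)
```

One consequence worth keeping in mind: if the query itself is not split into n-grams, a search string longer than the largest gram size will not match as a single term, so the maximum gram size should cover the longest search string you expect.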

With that in place, your query might look like this:

q=1234&qf=ngram_text_field+simple_text_field^2&defType=edismax
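
If you build the request from code, something like the following Python sketch would produce an equivalent URL. The host, port, and core name (`mycore`) are assumptions, as are the two field names, which simply follow the example query. The Solr documentation describes `qf` as a whitespace-separated list of fields, so the sketch separates them with a space and lets `urlencode` take care of the escaping.

```python
from urllib.parse import urlencode

# Assumed Solr location and core name; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = {
    "q": "1234",
    "defType": "edismax",
    # Search both fields; matches in the exact-match field count double.
    "qf": "ngram_text_field simple_text_field^2",
}

# urlencode escapes the space as "+" and the "^" as "%5E".
url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

The boost of 2 is only a starting point; if exact matches still do not come out on top, raising it is the first thing to try.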

For more information about edismax, see the Extended DisMax (eDisMax) Query Parser section of the Solr Reference Guide.