
I have an application that accepts free-text searches from users. Suppose a user types "one two three" into an HTML text input; my search URI is then ".../solr/my_index/select?q=expressions:(one two three)...".
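A minimal sketch of how the application might build that request from the raw input, using Python's standard `urllib.parse`. The host and port are hypothetical (the question elides them), and the sketch assumes the input contains no Lucene special characters such as `:` or `"`, which would otherwise need escaping.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the question elides the host ("..."), so
# localhost:8983 (Solr's default port) is only an assumption here.
SOLR_SELECT = "http://localhost:8983/solr/my_index/select"

def build_search_url(user_text: str) -> str:
    """Wrap the user's free text in a fielded query on `expressions`.

    Assumes the text has no Lucene special characters; real input
    would need escaping before being placed inside the parentheses.
    """
    words = " ".join(user_text.split())  # collapse extra whitespace
    params = {"q": f"expressions:({words})"}
    # urlencode() percent-encodes ':' '(' ')' and turns spaces into '+'
    return f"{SOLR_SELECT}?{urlencode(params)}"

print(build_search_url("one two three"))
```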

Documents are described in the schema as follows:

<field name="id" type="int" indexed="true" stored="true" required="true" />
<field name="expressions" type="text_general" indexed="true" stored="true" multiValued="true" />

In "my_index" I have two documents indexed:

id:"1", expressions: ["seven one two three four five", "seven eight seven", "two six nine six"]

id:"2", expressions: ["one", "one two", "one two four", "four one two one"]

The result of the query is that document id=2 gets the higher score, because it has more matches of the words "one" and "two". But my requirements are more specific: the SCORE must reflect not the match count but "similarity to the search phrase". Document id=1 has the value "seven one two three four five", which contains the substring "...one two three...", very similar to the phrase the user typed, so document id=1 must get the higher SCORE.
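To make the requirement concrete, here is a plain-Python sketch (not Solr code) of one possible reading of "similarity to the search phrase": score each document by the longest run of consecutive query words found inside a single value. The scoring rule and function names are hypothetical, and lowercase-plus-whitespace splitting only roughly approximates the `text_general` analyzer.

```python
# Hypothetical scoring rule, NOT Solr/Lucene behaviour: a document's score is
# the longest run of consecutive query words found inside any single value.

def longest_common_run(a, b):
    """Length of the longest contiguous token sequence shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the run that ended at (i-1, j-1)
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def phrase_score(query, values):
    q = query.lower().split()
    return max((longest_common_run(q, v.lower().split()) for v in values), default=0)

docs = {
    1: ["seven one two three four five", "seven eight seven", "two six nine six"],
    2: ["one", "one two", "one two four", "four one two one"],
}

ranked = sorted(docs, key=lambda d: phrase_score("one two three", docs[d]), reverse=True)
print([(d, phrase_score("one two three", docs[d])) for d in ranked])  # [(1, 3), (2, 2)]
```

Under this rule document 1 wins because "one two three" appears contiguously in one value, while document 2 never gets past "one two".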

Can this be done? I am very new to Solr/Lucene, so I don't know whether I need to use a specific query parser, build a custom one...

Thanks.


2 Answers

0 votes

So basically your problem boils down to boosting a document based on how early in the field the match occurs. Lucene supports this with SpanFirstQuery. There is an open ticket in the Solr JIRA about adding SpanFirst support to Solr, but I have not seen it implemented yet. However, you can look here for help on this.
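The idea behind SpanFirst can be illustrated with a toy Python model; this is not the Lucene API. A span matches only if it ends at or before a given position, and matching documents can then be boosted. In Lucene itself the phrase part would typically be a `SpanNearQuery` (slop 0, in order) wrapped in a `SpanFirstQuery`. The boost value and helper names below are hypothetical, and the model checks each value separately, whereas Lucene numbers positions across all values of a multivalued field (with a position increment gap between them).

```python
# Toy model of SpanFirstQuery semantics (not the Lucene API): a span matches
# only if it ENDS at or before position `end` (positions start at 0).

def first_phrase_end(tokens, phrase):
    """End position (exclusive) of the earliest occurrence of `phrase`."""
    n = len(phrase)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase:
            return start + n
    return None

def span_first_match(tokens, phrase, end):
    stop = first_phrase_end(tokens, phrase)
    return stop is not None and stop <= end

def boosted_score(values, phrase, end, boost=10.0):
    # Hypothetical boost scheme: base score 1.0, plus `boost` if any value
    # contains the phrase within its first `end` positions.
    hit = any(span_first_match(v.lower().split(), phrase, end) for v in values)
    return 1.0 + (boost if hit else 0.0)

phrase = ["one", "two", "three"]
doc1 = ["seven one two three four five", "seven eight seven", "two six nine six"]
doc2 = ["one", "one two", "one two four", "four one two one"]
print(span_first_match(doc1[0].split(), phrase, 4))  # True: span is [1, 4)
print(span_first_match(doc1[0].split(), phrase, 3))  # False
print(boosted_score(doc1, phrase, 4), boosted_score(doc2, phrase, 4))  # 11.0 1.0
```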

0 votes

You can try the minimum match (mm) parameter with the dismax/edismax query parser.

mm indicates the minimum number of clauses that must match in a query.

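Example Solr request URL with the mm parameter (dismax does not accept fielded syntax such as expressions:(...), so the field goes in qf):

.../solr/my_index/select?q=one two three&defType=dismax&qf=expressions&mm=3...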
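A toy Python model of what `mm` does on the two example documents (not Solr code; tokenization is a rough stand-in for `text_general`). Each query word is treated as one optional clause, and a document matches only if at least `mm` clauses hit the field. It shows a design consequence worth knowing: `mm` is a matching constraint, not a scoring signal. With `mm=3` document 2 disappears from the results instead of being ranked lower, and document 1 would still match even if the three words were scattered across different values.

```python
# Toy model of dismax's mm (minimum should match), NOT Solr code: each query
# word is one optional clause, and a document matches only if at least `mm`
# clauses hit the field.

def field_terms(values):
    # A multivalued field is searched as a whole, so pool every value's tokens.
    return {tok for v in values for tok in v.lower().split()}

def passes_mm(query, values, mm):
    terms = field_terms(values)
    matched = sum(1 for clause in query.lower().split() if clause in terms)
    return matched >= mm

docs = {
    1: ["seven one two three four five", "seven eight seven", "two six nine six"],
    2: ["one", "one two", "one two four", "four one two one"],
}

print([d for d in docs if passes_mm("one two three", docs[d], 3)])  # [1]
print([d for d in docs if passes_mm("one two three", docs[d], 2)])  # [1, 2]
```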