I want to create a field that will only match if the document's value for that field matches the query term with no additions. For instance, a query for "john" should only return results where the name is "john", not "johnson", "johns", etc.

I've seen other posts about exact matching in Solr, and the prevailing answer seems to be to create a new field in schema.xml with type string. I've tried that, but the field still seems to match when the query is only part of the field value (results containing "johnson" still appear for the query "john").

The schema has fields lastName and lastName_ngram (which we're currently searching with):

<field name="lastName_ngram" type="text_token_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true"/>
<fieldType name="text_token_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
</fieldType>

<field name="lastName" type="text_token" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
<fieldType name="text_token" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
</fieldType>

And I'd like to include a field lastNameExact so that documents that exactly match the entire field can be boosted:

<field name="lastNameExact" type="string" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true"/>
<copyField source="lastName" dest="lastNameExact"/>

Is there a modification I can make to this so that the lastNameExact field will only hit on documents containing a field with the entirety of the search query?

Comment (arun): Strange! String fields should give you an exact, case-sensitive match, so a search for lastNameExact:john should not return documents with last names such as johnson.

1 Answer

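I can suggest a fix for that. Instead of type string, use the following exact_match field type for lastNameExact. KeywordTokenizerFactory keeps the whole field value as a single token, so a query only matches when it equals the entire value; the lowercase and trim filters additionally make the match case-insensitive and ignore leading and trailing whitespace.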

<fieldType name="exact_match" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
</fieldType>

The copyField definition can stay the same.
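
For illustration, here is roughly how the field declaration from the question would look with the new type. This is only a sketch: it assumes all other attributes are kept exactly as in the question, and that the index is rebuilt after the schema change, since documents indexed earlier are not re-analyzed.

```xml
<!-- Sketch: same attributes as in the question, only the type is changed
     from "string" to "exact_match". -->
<field name="lastNameExact" type="exact_match" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true"/>

<!-- Unchanged. copyField passes the original, un-analyzed source value,
     so the exact_match analyzer is what processes it for lastNameExact. -->
<copyField source="lastName" dest="lastNameExact"/>
```

With this in place, a query such as lastNameExact:john should match a document whose last name is "John" or "john", but not "johnson".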

A working schema.xml is available here: https://github.com/MysterionRise/information-retrieval-adventure/blob/dadb683820fe4f1eaf6081185a933a28a5e1e481/lucene5/src/main/resources/solr/cores/test/conf/schema.xml
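
Since the goal in the question is to boost exact matches, one possible way to wire that up is through the eDisMax query parser in solrconfig.xml. The snippet below is a hedged sketch and not taken from the linked schema: the handler name /select and the boost factor of 10 are assumptions chosen for illustration, and the right boost would have to be tuned against real data.

```xml
<!-- Sketch for solrconfig.xml: search the ngram field for partial matches
     and give a higher weight to hits on the exact-match field.
     The handler name and the boost value (^10) are illustrative assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">lastName_ngram lastNameExact^10</str>
    </lst>
</requestHandler>
```

With defaults like these, a query for john would still return "johnson" through lastName_ngram, but documents whose last name is exactly "john" would also match lastNameExact and should rank higher.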