Solr: phrase query returns results for some phrases but not for others

Question

I get Solr results for the following:

Sports
World Health Organisation
percent

but I don't get results for the below:

Sport (UK)
World Health Organisat
1-percent

All of these are in the text field, which definitely contains these phrases, and I have used an ngram filter at index time, so the combinations do exist. While the Analysis tab of the Solr UI shows me exactly what I am expecting, I am not getting the required results in my Java output.

My SolrJ code is as follows:

query.setQuery("full_text:\"World Health Organisation\"");

Also, I have to wrap the phrase in escaped quotes (\"...\"): if I remove them I always get errors in my front end, and half of the results I otherwise get don't turn up either.

Can someone help with what I might be missing?

Much thanks!

Edit: definition of full_text in schema.xml

<field name="full_text" type="text_en" indexed="true" stored="false" multiValued="true"/>   
   <copyField source="title" dest="full_text"/>
   <copyField source="content" dest="full_text"/>

   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

Solution: I figured out what the problem was. For "Sports (UK)" and "1-percent", the tokeniser I was using removed all special characters, so I changed my tokeniser. For "World Health Organisation", the cause was the stemmer: at index time it reduced "Organisation" to "Organis", while a query term like "Organisat" was left as it is, so it matched none of the n-grams built from the stemmed term. Hence I did not get results. Since I am already using an ngram filter, I removed the stemmer.

Hope this helps others in the long run. :)

Can you check in the logs what exact query Solr is receiving? Also, I assume you are using double quotes because you want to match the exact phrase. — YoungHobbit
The queries are exactly the way they show on the analyzer. And yes, I am trying to get an exact match, but the issue is that I get the same results for "World Health Organisation" and "World Health Organis", yet no results for "World Health Organisat". Why would that happen? — catchingPatterns
Also, since I am using quotes, which indicate an exact match, why aren't phrases like '1-percent' or 'Sport UK' showing up in the result set? — catchingPatterns
Please post your definition of "full_text" in your schema.xml — Karsten R.

catchingPatterns · Accepted Answer · 2016-01-19T10:05:14

I figured out what the problem was. For "Sports (UK)" and "1-percent", the tokeniser I was using removed all special characters, so I changed my tokeniser. For "World Health Organisation", the cause was the stemmer: at index time it reduced "Organisation" to "Organis", while a query term like "Organisat" was left as it is, so it matched none of the n-grams built from the stemmed term. Hence I did not get results. Since I am already using an ngram filter, I removed the stemmer.
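
For anyone who wants to see the mismatch without a running Solr, here is a minimal plain-Java sketch. It is not Solr code: the class and method names (NGramMismatchDemo, ngrams, roughTokens) are made up for illustration, the stemmed form "organis" is simply taken from the explanation above rather than computed by a real Porter stemmer, and roughTokens is only a crude stand-in for a tokeniser that drops punctuation. The gram sizes 3 and 20 are the minGramSize and maxGramSize from the schema in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NGramMismatchDemo {

    // All character n-grams of `term` with lengths minGram..maxGram.
    // In this sketch a term shorter than minGram yields no grams at all.
    public static Set<String> ngrams(String term, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    // Crude approximation (an assumption, not Solr's tokeniser):
    // lowercase the text and split on anything that is not a letter or digit.
    public static List<String> roughTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Index side with the stemmer: grams are built from the stemmed term.
        Set<String> stemmed = ngrams("organis", 3, 20);
        // Index side without the stemmer: grams are built from the full word.
        Set<String> unstemmed = ngrams("organisation", 3, 20);

        System.out.println(stemmed.contains("organis"));     // true
        System.out.println(stemmed.contains("organisat"));   // false
        System.out.println(unstemmed.contains("organisat")); // true

        // Punctuation is gone after tokenising.
        System.out.println(roughTokens("Sport (UK)"));  // [sport, uk]
        System.out.println(roughTokens("1-percent"));   // [1, percent]
    }
}
```

The query term "organisat" is longer than the stemmed index term "organis", so it cannot be one of its grams; once the grams are built from the unstemmed word, it is. The last two lines only show that the brackets and the hyphen no longer exist as characters after this kind of tokenising. Note also that in this sketch "uk" and "1" are shorter than the minimum gram size of 3 and so produce no grams; whether real Solr behaves the same way depends on the filter version and its options, so treat that part as an illustration only.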
