I get Solr results for the following:

  • Sports
  • World Health Organisation
  • percent

but I don't get results for the below:

  • Sport (UK)
  • World Health Organisat
  • 1-percent

All of these phrases are definitely present in the text field, and I have used an n-gram filter at index time, so the combinations do exist. The Analysis tab of the Solr UI shows exactly what I expect, but I am not getting the expected results from my Java code.

My SolrJ code is below:

query.setQuery("full_text:\"World Health Organisation\"");

Also, I have to add the escaped quotes (\"...\") because I always get errors in my front end if I remove them, and half of the results I otherwise get no longer turn up.
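For context on why the quotes matter: with the standard Lucene query parser, an unquoted `full_text:World Health Organisation` ties the field prefix only to `World`, and characters such as `(`, `)` and `-` are read as query syntax. Below is a minimal sketch of a helper that wraps the user's text in a quoted phrase. `PhraseQueryBuilder` and `phrase` are hypothetical names, not part of SolrJ, and the sketch assumes only `"` and `\` need escaping inside the quotes.

```java
public class PhraseQueryBuilder {

    // Hypothetical helper: builds field:"text", escaping the two characters
    // that would otherwise end or corrupt the quoted phrase.
    static String phrase(String field, String text) {
        StringBuilder sb = new StringBuilder();
        sb.append(field).append(":\"");
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(phrase("full_text", "Sport (UK)"));
        System.out.println(phrase("full_text", "1-percent"));
    }
}
```

The result could then be passed to the existing call, e.g. `query.setQuery(PhraseQueryBuilder.phrase("full_text", "World Health Organisation"));`.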

Can someone help with what I might be missing?

Much thanks!

Edit: Definition of full_text in schema.xml

<field name="full_text" type="text_en" indexed="true" stored="false" multiValued="true"/>   
   <copyField source="title" dest="full_text"/>
   <copyField source="content" dest="full_text"/>

   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

Solution: I figured out what the problem was. For "Sports (UK)" and "1-percent", the tokeniser I was using removed all special characters, so I changed my tokeniser. As for "World Health Organisation", the cause was the stemmer, which changed "Organisation" to "Organis", while a query like "Organisat" was left as it is. Hence I did not get results. Since I am using an n-gram filter anyway, I removed the stemmer.
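To illustrate the first half of this, here is a rough sketch of what a word tokeniser does to these phrases. It is an approximation, not Solr's StandardTokenizer: the hypothetical `tokenize` method simply keeps runs of letters and digits and drops everything else, which is enough to show the parentheses and the hyphen disappearing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PunctuationSplitDemo {

    // Rough stand-in for a word tokeniser (an assumption, not Solr code):
    // split on anything that is not a letter or digit, then lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Sport (UK)")); // [sport, uk]
        System.out.println(tokenize("1-percent"));  // [1, percent]
    }
}
```

Once the special characters are gone, the index only holds the separate word tokens, so a search that depends on "(UK)" or "1-percent" as written has nothing literal to match against.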

Hope this helps others in the long run. :)

In the logs, can you check the exact query Solr is receiving? Also, you want to match the exact phrase, which is why you are using double quotes. - YoungHobbit
The queries are exactly as they appear in the analyzer. And yes, I am trying to get an exact match, but the issue is that I get the same results for "World Health Organisation" and "World Health Organis", yet no results for "World Health Organisat". Why would that happen? - catchingPatterns
Also, since I am using quotes, which indicate an exact match, why aren't phrases like '1-percent' or 'Sport UK' showing up in the result set? - catchingPatterns
Please post your definition of "full_text" in your schema.xml. - Karsten R.
I've updated the question with the definition. - catchingPatterns

1 Answer

I figured out what the problem was. For "Sports (UK)" and "1-percent", the tokeniser I was using removed all special characters, so I changed my tokeniser. As for "World Health Organisation", the cause was the stemmer, which changed "Organisation" to "Organis", while a query like "Organisat" was left as it is. Hence I did not get results. Since I am using an n-gram filter anyway, I removed the stemmer.
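The stemmer interaction can be reproduced without Solr. The sketch below is plain Java, not the NGramFilterFactory itself: the hypothetical `ngrams` method generates every substring of length 3 to 20, mirroring the minGramSize and maxGramSize in the schema, and the stem "organis" is taken from the explanation above rather than computed.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NGramStemDemo {

    // All substrings of token whose length lies between min and max.
    static Set<String> ngrams(String token, int min, int max) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Assumption from the answer: the stemmer turns "organisation" into "organis"
        // before the n-gram filter runs at index time.
        Set<String> withStemmer = ngrams("organis", 3, 20);
        Set<String> withoutStemmer = ngrams("organisation", 3, 20);

        System.out.println(withStemmer.contains("organis"));      // true
        System.out.println(withStemmer.contains("organisat"));    // false
        System.out.println(withoutStemmer.contains("organisat")); // true
    }
}
```

With the stemmer in place, the longest indexed gram is "organis", so the nine-character query term "organisat" can never match. Without the stemmer, the grams are cut from the full word and "organisat" is among them.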