I have indexed my documents' text in Solr using the following configuration:
```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
<field name="desc" type="text_general" indexed="true" stored="true" multiValued="false"/>
```
This test query:

```
desc:Alabama Crimson Tide Toddler Crimson Team Logo Flannel Pajama Pants
```

returns these as the top two results:
```json
{
  "id":"_:node1b897e5ffccc354e5da5128066e2e9e4|https://www.crookscountry.com/product/alabama-greatest-hits",
  "name":"Alabama - Greatest Hits",
  "source_entity_index":"prod03",
  "category":"",
  "category_str":"",
  "desc":"Alabama ~ Alabama - Greatest Hits",
  "host":"www.crookscountry.com",
  "url":"https://www.crookscountry.com/product/alabama-greatest-hits",
  "_version_":1652845859059007489
},
{
  "id":"_:noded8c4ca8e98bb12e1132af18c76f277b|https://shop.spreadshirt.com/thatshirtcray/amateur+sketch+shirt-A12174934",
  "name":"Amateur Sketch Shirt | Men's T-Shirt",
  "source_entity_index":"prod03",
  "category":"",
  "category_str":"",
  "desc":"Leprechaun in Alabama amateur sketch.",
  "host":"shop.spreadshirt.com",
  "url":"https://shop.spreadshirt.com/thatshirtcray/amateur+sketch+shirt-A12174934",
  "_version_":1652846254331265025
}
```
But the documents I actually want near the top are ranked below position 100, for example:
```json
{
  "id":"_:nodec65a89504cb5f3af808caf654ac7cb72|http://shop.rolltide.com/Alabama_Crimson_Tide_Sweatshirts_And_Fleece_Sweaters",
  "host":"shop.rolltide.com",
  "name":"Men's Crimson Alabama Crimson Tide Big Logo Sweater",
  "text":"Show off your team spirit with this Alabama Crimson Tide Big Logo sweater.",
  "_version_":1646377538225700866
},
{
  "id":"_:nodeebc0adb5a11937556ebdf77132fab580|http://shop.foxsports.com/FOX_Alabama_Crimson_Tide_Sweaters_And_Dress_Shirts",
  "host":"shop.foxsports.com",
  "name":"Men's Crimson Alabama Crimson Tide Big Logo Sweater",
  "text":"Show off your team spirit with this Alabama Crimson Tide Big Logo sweater.",
  "_version_":1646383652576165892
}
```
I don't quite understand how Solr's default ranking works. It seems to favour short text, even when only one word overlaps with the query. Is there any way I can change this to suit my needs?
Much appreciated!
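On the "favours short text" part: that bias comes from the similarity's length normalization. Assuming a Solr version whose default similarity is BM25 (Solr 6 and later) and whose schema-wide similarity is `solr.SchemaSimilarityFactory` (the default there, which honours per-field-type overrides), one possible sketch is to attach a `BM25SimilarityFactory` to the field type and lower its `b` parameter. In BM25, `b` controls how strongly field length dampens the score: `0.75` is the default and `0` disables length normalization. The value `0.3` below is an arbitrary illustration, not a tuned value; compare scores with `debug=all` before and after. This only affects fields of this type, so it would not by itself help documents that have no `desc` field.

```xml
<!-- Sketch: the text_general type from above with a per-type BM25 similarity.
     Assumes the global similarity is solr.SchemaSimilarityFactory. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <!-- k1: term-frequency saturation, 1.2 is the usual default -->
    <float name="k1">1.2</float>
    <!-- b: length normalization, 0.75 default, 0.0 off; 0.3 is only an example -->
    <float name="b">0.3</float>
  </similarity>
</fieldType>
```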
`desc:Alabama Crimson Tide Toddler Crimson Team Logo Flannel Pajama Pants` searches for `Alabama` in `desc`, but the rest of the terms are searched in the default search field. Seeing as the two documents you want ranked higher don't even have a `desc` field, it's hard to say exactly why the score is what it is. Append `debug=all` to your query to see how each document is scored (i.e. which terms contribute what to the total score). Using the `edismax` query parser (`defType=edismax`) with `qf` and explicit field weights usually gives you a better result. – MatsLindh