Solr provides several field types out of the box in the managed schema for different languages, such as English, French, and Japanese.
We are using the common field type "text_general" for our field declarations and stopwords.txt for stopword filtering.
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
While syncing data to the Solr core, we import text in different languages, such as French, English, and German, into these fields.
My question is: should we put the stopwords for all of these languages into the same "stopwords.txt" file, or how does Solr handle stopwords for different languages?
fieldname_en, fieldname_jp, etc. – MatsLindh
curl http://your-solr/solr/your-core/schema/fieldtypes/text_cjk and curl http://your-solr/solr/your-core/schema/fieldtypes/text_en – Hector Correa
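The per-language naming hinted at in the comments (fieldname_en, fieldname_jp) could look roughly like the sketch below: one field type per language, each pointing at its own stopword file, and one field per language. This is only an illustration, not a confirmed answer. The field names title_en and title_fr are hypothetical, the analyzer chains are deliberately minimal, and the lang/stopwords_en.txt and lang/stopwords_fr.txt paths assume the per-language files that ship with Solr's default configset are present in your core's conf directory.

```xml
<!-- Sketch only: names and file paths are assumptions, adjust to your configset. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- English-only stopword list -->
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- French-only stopword list; the bundled file is assumed to be in Snowball format -->
    <filter class="solr.StopFilterFactory" words="lang/stopwords_fr.txt" format="snowball" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical per-language fields; the indexing client routes each text to the matching field -->
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_fr" type="text_fr" indexed="true" stored="true"/>
```

Keeping the lists separate avoids a word that is a stopword in one language (for example "die" in German) being dropped from text in another language where it carries meaning, which is the main risk of merging everything into a single stopwords.txt.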