I am trying to get multi-language stemming working with Solr. I have set up language detection with LangDetectLanguageIdentifierUpdateProcessorFactory as per the official Solr guides. The language is recognized, and now I have a whole bunch of dynamic fields like:
- description_en
- description_de
- description_fr
- ...
which are properly stemmed.
The question now is how do I search across so many fields? Building a long query every time that searches across dozens of possible language fields doesn't seem like a smart option. I have tried using copyField like:
<copyField source="description_*" dest="text"/>
but stemming is lost in the text field when I do that.
The text field is defined as solr.TextField with solr.WhitespaceTokenizerFactory. Maybe I am not setting up the text field properly, or how is this supposed to be done?
The original text is sent from the "source" field to the "dest" field, before any configured analyzers for the originating or destination field are invoked.
copyField will not take the tokens from the description_* fields after all the analysis is done. It will take the inputs to the description_* fields and apply the analysis defined for its own field type, which is just the TextField with a whitespace tokenizer in your case. So copyField is not a solution for this. – arun
copyField didn't work. The second link is also very helpful. So I see that at this time my only choice is to list all the possible description_[en|fr|de|...] fields as the list of fields to search on in each query. This is still OK I guess; I was just thinking that there were some other ways to do that. Thank you again for your help, Arun! – user2113581
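If listing the fields in every request becomes tedious, one option is to keep the list in solrconfig.xml as a handler default, so clients only send q. The following is a minimal sketch, not a tested configuration: it assumes the eDisMax query parser is acceptable for your queries, that the handler is the standard /select, and that en, de and fr are the only languages in use. The field names come from the question; everything else is an assumption.

```xml
<!-- solrconfig.xml (sketch): search all per-language fields by default. -->
<!-- Assumption: the handler name "/select" and the three languages listed. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the Extended DisMax parser so that qf is honored. -->
    <str name="defType">edismax</str>
    <!-- Each field is queried with its own analyzer, so per-language stemming is kept. -->
    <str name="qf">description_en description_de description_fr</str>
  </lst>
</requestHandler>
```

With this in place, a request such as /select?q=running would be matched against every field in qf, each analyzed by its own field type. A new language still has to be added to the qf list by hand.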