We have a SOLR v5.5.0 server that we have loaded with documents. Each of the SOLR fields is copied into a composite field that we want to search against.
For example, in our schema we have:
<field name="Key" type="int" indexed="true" stored="true" required="true"/>
<field name="_version_" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="Name" type="text_suggest_ngram" indexed="true" stored="true" required="false"/>
<field name="EmailAddress" type="text_email" indexed="true" stored="true" required="false"/>
<field name="Indexing" type="text_suggest_ngram" indexed="true" stored="true" multiValued="true"/>
There are about 20 different fields. Each field is copied into the Indexing field:
<copyField source="Key" dest="Indexing"/>
<copyField source="Name" dest="Indexing"/>
<copyField source="EmailAddress" dest="Indexing"/>
The custom field types are defined with the following tokenisers and filters:
<fieldType name="text_email" class="solr.TextField"/>
<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>
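To make the index-time analysis concrete, here is a rough Python sketch of what an edge n-gram filter with minGramSize="2" and maxGramSize="20" would emit for a single token. It is only an approximation of the filter, not SOLR code: the function name `edge_ngrams` is made up for illustration, and it assumes the tokenizer and lower-case filter have already produced the whole email address as one token.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading-edge prefixes of `token`, from min_gram up to max_gram characters."""
    # Prefixes can never be longer than the token itself or than max_gram.
    upper = min(len(token), max_gram)
    # A token shorter than min_gram yields no grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


# Assumption: the address arrives here as a single lower-cased token.
grams = edge_ngrams("someone@example.com")
print(len(grams), grams[0], grams[-1])
```

If this sketch reflects the filter's behaviour, a token longer than 20 characters is only ever indexed as prefixes of up to 20 characters, while the query analyser (which has no n-gram filter) leaves the query token at its full length.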
Hence the Indexing field becomes a multi-value field. We run our searches against this field because we offer a general search function that should be able to search across all fields.
When we import data into SOLR and then do a search, some records work as expected. For example, if we search for an email address (e.g. select?q=Indexing%3Asomeone%40example.com), SOLR returns the correct document.
However, for other documents SOLR returns 0 results (especially when searching on email addresses). For example, a search for secondexample@example.com finds no documents, but changing the query to secondexample finds the document. Changing the query to secondexample@e again finds no documents. If we search against the EmailAddress field directly (select?q=EmailAddress%3Asecondexample%40example.com), the search succeeds as expected.
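For anyone wanting to reproduce the requests above, this is a minimal Python sketch of how the URL-encoded queries can be built. The helper name `select_url` is hypothetical, and the relative `select` path simply mirrors the form used in this question; a real request would need the full host and core path in front of it.

```python
from urllib.parse import urlencode


def select_url(field, value, base="select"):
    """Build a select request with a single field:value query, URL-encoded."""
    # urlencode percent-encodes ':' as %3A and '@' as %40.
    return base + "?" + urlencode({"q": f"{field}:{value}"})


print(select_url("Indexing", "someone@example.com"))
print(select_url("EmailAddress", "secondexample@example.com"))
```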
We don't want to encode the search for specific named fields as the field names are subject to change and changing our search service each time is undesirable.
Is there any way to find out why SOLR does not search multi-value fields correctly?
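One avenue that may help with diagnosis is SOLR's field analysis request handler and the debugQuery parameter, which show how a value is tokenised at index time and how a query is parsed. The sketch below only builds the request URLs. The host, port and core name (`localhost:8983`, `mycore`) are placeholders, the helper names are made up, and it assumes the `/analysis/field` handler is available on the core.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust host, port and core name to the actual installation.
SOLR_BASE = "http://localhost:8983/solr/mycore"


def field_analysis_url(field, indexed_value, query_text):
    """URL asking SOLR to run both the index and query analysers for one field."""
    params = {
        "analysis.fieldname": field,         # field whose analysers are applied
        "analysis.fieldvalue": indexed_value,  # text run through the index analyser
        "analysis.query": query_text,        # text run through the query analyser
        "wt": "json",
    }
    return SOLR_BASE + "/analysis/field?" + urlencode(params)


def debug_select_url(field, query_text):
    """URL for a normal select that also returns the parsed query and score explanation."""
    params = {"q": f"{field}:{query_text}", "debugQuery": "true", "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)


print(field_analysis_url("Indexing", "secondexample@example.com", "secondexample@example.com"))
print(debug_select_url("Indexing", "secondexample@example.com"))
```

Comparing the index-side and query-side token lists in the analysis response should show whether the query token actually exists among the indexed tokens.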
Update: sample JSON document (content fuzzed for security):
{
"Phone": "555",
"IndexText": [
"555",
"7854",
"",
"Main App",
"16",
"Life MTG L",
"New MTG LL",
"Application",
"574",
"574",
"[email protected]",
"",
"",
"M M S N",
"Open",
"P",
"3876 K E 4 O N W 2619 S B",
"",
"A",
"6055 C P E 32 L S C P B G 1501 S B",
"S I N",
"1597456 1254735"
],
"Id": "7854",
"Name": "Open",
"WP": "",
"OK": "16",
"HP": "574",
"LK": 1048808,
"FN": "",
"PN": "",
"TN": "",
"FN2": "MS",
"LN2": "M M S N",
"CL": "2",
"Type": "P",
"Laddr": "3876 K E 4 O N W 2619 S B",
"EmailAddress": "[email protected]",
"LES": "A",
"PA": "6055 C P E 32 L S C P B G 1501 S B",
"LIT": "S I N",
"S": "N",
"Acc": "1597456 1254735",
"_version_": "1557490405902123010",
"score": 11.771251
}
The fields and content have been edited from the real data, but this gives the idea. The real field names and content are longer words. This is taken from the SOLR admin search interface.