25
votes

I have a rather simple Solr schema that holds three different fields:

id, text and tags

In schema.xml I set the following:

<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="tags" dest="text"/>

However, when I search for a word that appears only as a tag, the document is not found.

My question is: does copyField happen before any analyzer runs (index and query), as described here, or just before the query analyzer?


EDIT

The analyzer definition:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />              
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />              
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

The field-type definitions (pretty much the default config):

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

And finally the field definitions:

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
    <field name="tags" type="text" indexed="false" stored="false" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="tags" dest="text"/>
3
Be careful with defaultSearchField: "It is preferable to not use or rely on this setting; instead the request handler or query LocalParams for a search should specify the default field(s) to search on. This setting here can be omitted and it is being considered for deprecation." From the documentation: wiki.apache.org/solr/SchemaXml#The_Default_Search_Field - Erowlin

3 Answers

32
votes

The copyField is applied when a document is indexed, so it happens before the index analyzer runs. It is just as if you had put the same input text into two different fields. After that, everything depends on the analyzers you defined for each field.
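
As a rough illustration (a sketch in Solr's XML update format; the document values are invented, only the field names come from the question), the copy can be pictured like this:

```xml
<!-- Hypothetical document as it is posted to Solr -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="text">some body text</field>
    <field name="tags">holiday</field>
  </doc>
</add>

<!-- With <copyField source="tags" dest="text"/> in place, the document is
     indexed roughly as if it had been posted like this. The raw tags value
     becomes a second value of text and is only then passed through the
     index analyzer configured for the text field. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="text">some body text</field>
    <field name="text">holiday</field>
    <field name="tags">holiday</field>
  </doc>
</add>
```

Because the destination ends up with more than one value here, it needs multiValued="true", which the text field in the question already has.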

3
votes

If you search q=tags:xyz, then xyz will not be found, because you have set tags not to be indexed (indexed="false").

If you do a default search then yes, it should search the copyField destination. However, according to the Solr wiki:

"Any number of <copyField> declarations can be included in your schema, to instruct Solr that you want it to duplicate any data it sees in the "source" field of documents that are added to the index"

I think that not indexing 'tags' would also cause the copy of 'tags' not to be indexed.
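
One way to test this guess is to mark tags as indexed and reindex. The fragment below is only a sketch of that experiment; apart from the changed indexed attribute, it repeats the definitions from the question:

```xml
<!-- Experiment: same field as in the question, but with indexed="true" -->
<field name="tags" type="text" indexed="true" stored="false" />

<!-- Unchanged directive from the question -->
<copyField source="tags" dest="text"/>
```

After the documents have been reindexed, q=tags:xyz can be compared with a default search for xyz against text to see whether the indexed setting on the source field makes any difference.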

1
votes

I haven't tried using the copyField to append additional text to an existing field. I suppose Solr could concatenate it, or add it as a second value.

But here are a couple of ideas to try:

  1. Experiment with a document where the text field is blank, perhaps not even present as a field in the document you submit. Does it make a difference to whether tags make it into the main text if text starts out totally blank or not?

  2. Declare a second field, call it text2, and then ALSO copy tags into text2 via a second copyField directive. This text2 field won't have anything else in it (presumably it is not even mentioned in your documents), so it should definitely receive the content.

In both cases you'd check results with the schema browser, as before. I'd be very curious to hear how you find out!
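
The second idea could look roughly like the following in schema.xml. This is a sketch only: text2 is the hypothetical field name suggested above, and its attributes are simply copied from the text field in the question.

```xml
<!-- Goes inside <fields>: hypothetical extra field that only ever
     receives copied tags -->
<field name="text2" type="text" indexed="true" stored="false" multiValued="true" />

<!-- Goes next to the existing directive, outside <fields> -->
<copyField source="tags" dest="text"/>
<copyField source="tags" dest="text2"/>
```

After reindexing, a query such as q=text2:xyz, where xyz is a word that occurs only in tags, shows whether the copied content arrived in the new field.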