
I would like to use Solr atomic updates in combination with some stored copyField destination fields. This is not a recommended combination, so I wish to understand the risks.

The Solr documentation for Atomic Updates says (my emphasis):

The core functionality of atomically updating a document requires that all fields in your schema must be configured as stored (stored="true") or docValues (docValues="true") except for fields which are <copyField/> destinations, which must be configured as stored="false". Atomic updates are applied to the document represented by the existing stored field values. All data in copyField destinations fields must originate from ONLY copyField sources.
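To make the quoted rule concrete, here is a small Python sketch that checks a schema against it. The `FieldDef` class, the `atomic_update_problems` function and the field names are hypothetical stand-ins for the `<field/>` and `<copyField/>` entries of a real schema. The sketch encodes only the rule as quoted, not everything Solr itself checks.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    # Hypothetical, simplified view of a <field/> definition in the schema.
    name: str
    stored: bool = False
    doc_values: bool = False


def atomic_update_problems(fields, copy_fields):
    """Return violations of the documented atomic-update rule.

    fields: list of FieldDef
    copy_fields: list of (source, dest) pairs, mirroring <copyField source=... dest=.../>
    """
    destinations = {dest for _, dest in copy_fields}
    problems = []
    for f in fields:
        if f.name in destinations:
            # copyField destinations must be stored="false".
            if f.stored:
                problems.append(f"{f.name}: copyField destination is stored")
        elif not (f.stored or f.doc_values):
            # Every other field must be recoverable from stored values or docValues.
            problems.append(f"{f.name}: neither stored nor docValues")
    return problems


schema = [
    FieldDef("id", stored=True),
    FieldDef("title", stored=True),
    FieldDef("title_hl", stored=True),  # stored copyField destination, e.g. for highlighting
    FieldDef("status", doc_values=True),
]
print(atomic_update_problems(schema, [("title", "title_hl")]))
# ['title_hl: copyField destination is stored']
```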

However, I have some copyField destinations that I would like to set stored=true so that highlighting works correctly for them (see this question, for example).

I need atomic updates so that an (unrelated) field can be modified by another process, without losing data indexed by my process.
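For reference, the other process's atomic update might look like the following minimal Python sketch. It assumes the uniqueKey field is `id`, a hypothetical field `status`, a hypothetical collection `mycollection`, and Solr's JSON `{"set": ...}` syntax posted to the collection's `/update` handler. The request is only built here, not sent.

```python
import json
import urllib.request


def atomic_set_payload(doc_id, field, value):
    # Atomic-update syntax: the uniqueKey plus {"set": value} for each field
    # to modify. Fields not mentioned are rebuilt by Solr from the existing
    # stored/docValues data, which is where copyField destinations matter.
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def build_update_request(base_url, collection, payload):
    # Hypothetical base URL and collection name; nothing is sent here.
    url = f"{base_url}/solr/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:8983", "mycollection",
    atomic_set_payload("doc1", "status", "reviewed"),
)
```

Against a running Solr instance the request could then be sent with `urllib.request.urlopen(req)`. Committing on every update is only for illustration.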

The documentation warns that:

If destinations are configured as stored, then Solr will attempt to index both the current value of the field as well as an additional copy from any source fields. If such fields contain some information that comes from the indexing program and some information that comes from copyField, then the information which originally came from the indexing program will be lost when an atomic update is made.
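One concrete reading of "some information that comes from the indexing program and some information that comes from copyField": the indexing program writes a value directly into the destination field, and copyField also adds the source's value to that same field. The toy Python model below (not Solr's actual code) assumes an atomic update rebuilds the document from stored fields only, applies the change and re-runs copyField. With the destination stored="false", as the docs require, only the copied part can be regenerated. The field names `title`, `text_all` and `status` are hypothetical.

```python
# Toy model: copyField title -> text_all, where text_all is a
# copyField destination with stored="false".
COPY_FIELDS = [("title", "text_all")]
STORED = {"id", "title", "status"}  # text_all is deliberately absent


def index(doc):
    """Apply copyField rules and return what ends up in the index."""
    indexed = {k: list(v) for k, v in doc.items()}
    for src, dest in COPY_FIELDS:
        indexed.setdefault(dest, []).extend(doc.get(src, []))
    return indexed


def atomic_set(indexed, field, value):
    """Rebuild the document from stored fields, apply 'set', re-index."""
    doc = {k: v for k, v in indexed.items() if k in STORED}
    doc[field] = [value]
    return index(doc)


# The indexing program writes into text_all directly AND via copyField.
original = index({"id": ["1"], "title": ["Solr atomic updates"],
                  "text_all": ["extra keywords from the indexer"]})
print(original["text_all"])
# ['extra keywords from the indexer', 'Solr atomic updates']

updated = atomic_set(original, "status", "reviewed")  # unrelated field
print(updated["text_all"])
# ['Solr atomic updates']
```

In this model the indexer-supplied value in `text_all` silently disappears after an update to an unrelated field, because nothing stored remembers it.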

But what does that mean? Can someone give an example that demonstrates this information-loss problem?

I am unsure what is meant by "some information that comes from the indexing program and some information that comes from copyField", in concrete terms.

Is it safe to make one copyField destination stored, whilst atomically updating other fields, or vice versa? I have tried this out via the Solr Admin console, and have not been able to demonstrate any issues, but would like to be clear on what circumstances would trigger the problem.

Answer

It means that the copyField destination gets an additional value copied from the source field on top of the value it already stores, effectively turning it into a multi-valued field. If the destination isn't defined as multiValued, the document no longer fits the schema, and no further updates can be made to it until you reindex everything. I'm currently struggling with this exact issue: we need the copyField values to come back in the response, which means the field has to be stored, but doing so breaks the structure of the document whenever we do an atomic update on a different field.
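A toy Python model of the situation described in this answer (not Solr's actual code; field names are hypothetical). It follows the documentation's statement that, for a stored destination, Solr attempts to index both the current stored value and a fresh copy from the source. Whether a particular Solr version behaves exactly like this may vary.

```python
# Toy model: copyField title -> title_hl, where title_hl is STORED
# (e.g. for highlighting) but single-valued.
COPY_FIELDS = [("title", "title_hl")]
STORED = {"id", "title", "title_hl", "status"}
MULTI_VALUED = set()  # hypothetical: every field is single-valued


class SchemaError(Exception):
    pass


def index(doc):
    """Apply copyField rules, then enforce single-valued fields."""
    indexed = {k: list(v) for k, v in doc.items()}
    for src, dest in COPY_FIELDS:
        indexed.setdefault(dest, []).extend(doc.get(src, []))
    for name, values in indexed.items():
        if len(values) > 1 and name not in MULTI_VALUED:
            raise SchemaError(f"multiple values for single-valued field {name!r}: {values}")
    return indexed


def atomic_set(indexed, field, value):
    # The stored title_hl value is carried over AND copyField adds a new copy.
    doc = {k: v for k, v in indexed.items() if k in STORED}
    doc[field] = [value]
    return index(doc)


doc = index({"id": ["1"], "title": ["Solr atomic updates"]})
print(doc["title_hl"])  # ['Solr atomic updates']
try:
    atomic_set(doc, "status", "reviewed")  # update an unrelated field
except SchemaError as e:
    print(e)
# multiple values for single-valued field 'title_hl': ['Solr atomic updates', 'Solr atomic updates']
```

If `title_hl` were declared multiValued instead, the update would succeed in this model but the copied value would accumulate one duplicate per atomic update.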