I have a Solr 7.6.0 (Lucene) index containing lots of .pdf, .docx and .xlsx files.

The index was created using the post command in a command window, pointing to a directory share (mapped filepath) where the files exist.

Each document also has a web URL, which I have in a database and which Lucene currently knows nothing about. I would like to 'enrich' the existing index with this URL data.

Can I extract the id of the currently indexed files and then use the Solr web interface to modify the existing index, injecting the URL?

I am looking at the following tutorial for advice: https://www.tutorialspoint.com/apache_solr/apache_solr_indexing_data.htm

The tutorial shows an example of adding a document but not modifying one.

You can modify a single field, but it'll require all your existing fields to be set as stored="true" - assuming you're using Solr. It'll also require running a Solr version that supports atomic updates. - MatsLindh
Thanks @MatsLindh, I am using Solr 7.6.0 (added to original question). Not all my fields are stored. _text_, for example, is only indexed. Will that be a problem? - GoodJuJu
_text_ is usually generated from a copyField statement. All the fields that aren't defined as copyField destinations must be stored, however - otherwise the value of that field is lost. - MatsLindh
I see. Thank you... I'll give it a go then and see what happens. - GoodJuJu

1 Answer

Thanks @MatsLindh I managed to get it to work:

I used the Solr GUI to run the JSON add-field update:

{
    "add-field": {
        "name": "URL",
        "type": "string",
        "stored": true,
        "indexed": true
    }
}
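
For anyone who would rather script this step than use the GUI, the same add-field command can be POSTed to the core's Schema API endpoint. This is only a minimal sketch: the base URL `http://localhost:8983/solr` and the core name `mycore` are assumptions, not values from my setup, so substitute your own.

```python
import json
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: Solr on its default local port
CORE = "mycore"                           # hypothetical core name, replace with yours


def build_add_field_request(name, field_type="string", stored=True, indexed=True):
    """Build (but do not send) a Schema API request that adds one field."""
    body = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return urllib.request.Request(
        f"{SOLR_BASE}/{CORE}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_add_field_request("URL"))` against a running Solr instance should have the same effect as the GUI step above.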

I then inserted/set the property:

{"id":"S:\\Docs\\forIndexing\\indexThisFile_001.pdf",
 "URL":{"set":"https://localhost/urlToFiles/indexThisFile_001.pdf"}
}
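
With many documents, doing this one at a time in the GUI gets tedious. The same atomic "set" update can be generated from the database rows and sent to the `/update` handler as a JSON array. The following is a rough sketch rather than what I actually ran: the update URL, the core name `mycore`, and the `(id, url)` row format are all assumptions for illustration.

```python
import json
import urllib.request

# Assumption: local Solr and a hypothetical core named "mycore".
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def build_atomic_updates(rows, field="URL"):
    """Turn (id, url) pairs into atomic 'set' updates, one per document."""
    return [{"id": doc_id, field: {"set": url}} for doc_id, url in rows]


def post_updates(updates, update_url=SOLR_UPDATE_URL):
    """POST the list of atomic updates to Solr and return the raw response text."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(updates).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical rows as they might come back from the database query.
example_rows = [
    (r"S:\Docs\forIndexing\indexThisFile_001.pdf",
     "https://localhost/urlToFiles/indexThisFile_001.pdf"),
]
example_updates = build_atomic_updates(example_rows)
```

Passing `example_updates` to `post_updates` would send the batch. The ids must match the indexed ids exactly (here, the mapped file path), so it may be worth listing the existing ids first with a query such as `q=*:*&fl=id` and checking them against the database before updating.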