
I am implementing Solr for free-text search in a project where the searchable records will be added and deleted on a large scale every day.

Because of this scale, I need to make sure the index does not grow larger than it needs to be.

On my test installation of Solr, I index a set of 10 documents. Then I change one of the documents and want to replace the indexed document that has the same ID. This works correctly, and searches behave as expected.

I am using this code to update the document:

getSolrServer().deleteById(document.getIndexId());
getSolrServer().add(document.getSolrInputDocument());
getSolrServer().commit();

What I noticed, though, is that the figures on the Solr server's stats page are not what I expect.

After the initial index, numDocs and maxDoc both equal 10, as expected. When I update the document, however, numDocs is still 10 (expected) but maxDoc is 11 (unexpected).
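These numbers can be reproduced with a toy model of a Lucene-style index, assuming that each document occupies a slot and that a delete only flags its slot instead of freeing it. Under that assumption maxDoc counts every slot, while numDocs counts only the unflagged ones. `ToyIndex` and its methods are hypothetical names for illustration; they are not part of the Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy model for illustration only; not the Solr/Lucene API.
class ToyIndex {
    private final List<String> ids = new ArrayList<>(); // one slot per indexed document
    private final BitSet deleted = new BitSet();         // slots flagged as logically deleted

    void add(String id) {
        ids.add(id); // a new slot is always appended, never reused
    }

    void deleteById(String id) {
        // Flag every live slot holding this id; the slot itself stays in place.
        for (int i = 0; i < ids.size(); i++) {
            if (!deleted.get(i) && ids.get(i).equals(id)) {
                deleted.set(i);
            }
        }
    }

    int maxDoc() {
        return ids.size(); // includes logically deleted slots
    }

    int numDocs() {
        return ids.size() - deleted.cardinality(); // live slots only
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int i = 1; i <= 10; i++) {
            index.add("doc" + i);
        }
        System.out.println(index.numDocs() + " " + index.maxDoc()); // 10 10

        // Update as in the question: delete the old version, then add the new one.
        index.deleteById("doc3");
        index.add("doc3");
        System.out.println(index.numDocs() + " " + index.maxDoc()); // 10 11
    }
}
```

In this model the old copy of the updated document still occupies a slot, which is exactly the gap between numDocs and maxDoc.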

The documentation says:

maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index.
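In Lucene, flagged documents are physically dropped when a segment is rewritten during a merge, because only live documents are copied into the new segment. The sketch below models just that rewrite step for a single segment and shows that maxDoc = numDocs + deletedDocs before the rewrite and maxDoc = numDocs after it. `ToySegment` and `merge()` are hypothetical names; in real Lucene/Solr, merges are scheduled by the merge policy rather than called like this.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy model for illustration only; not the Solr/Lucene API.
class ToySegment {
    final List<String> ids;   // one slot per document, in insertion order
    final BitSet deleted;     // slots flagged as logically deleted

    ToySegment(List<String> ids, BitSet deleted) {
        this.ids = ids;
        this.deleted = deleted;
    }

    int maxDoc() { return ids.size(); }
    int deletedDocs() { return deleted.cardinality(); }
    int numDocs() { return maxDoc() - deletedDocs(); } // maxDoc = numDocs + deletedDocs

    // Rewrite the segment, copying only documents that are not flagged as deleted.
    ToySegment merge() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (!deleted.get(i)) {
                live.add(ids.get(i));
            }
        }
        return new ToySegment(live, new BitSet());
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add("doc" + i);
        }
        ids.add("doc3");           // the re-added, updated version
        BitSet deleted = new BitSet();
        deleted.set(2);            // the old doc3 (slot 2) is only flagged

        ToySegment before = new ToySegment(ids, deleted);
        System.out.println(before.numDocs() + " " + before.maxDoc()); // 10 11

        ToySegment after = before.merge();
        System.out.println(after.numDocs() + " " + after.maxDoc());   // 10 10
    }
}
```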

So the question is, how do I remove logically deleted documents from the index?

If these documents remain in the index, do I risk a performance penalty when running with a very large volume of documents?

Thanks :)

