
I have a Solr instance where I index a large number of documents from my client so that users can search them in a web application.

Because we have a large number of files and users only need to search the recent ones (the last 90 days or so), we have a scheduled job that removes old documents from the index.

The problem is that disk usage keeps growing by about 2 GB a day, even with the deletions.

Is this normal behavior, or should we do something more to keep the index at a stable size?

We use a Java application to add documents to and remove them from the index.


1 Answer


Deletions only mark documents as deleted; they are still physically present in the index. Since actually removing them requires rewriting the index segments, the space is not reclaimed until you issue an optimize command.
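The optimize can be triggered outside the application by posting an update message to Solr's update handler. A minimal sketch, assuming a core named `mycore` on the default port (host and core name are placeholders for your setup):

```xml
<!-- POST to http://localhost:8983/solr/mycore/update
     with Content-Type: text/xml -->
<!-- Merges the index segments and physically removes
     documents that were marked as deleted -->
<optimize waitSearcher="false"/>
```

Since your indexing application is already written in Java, the SolrJ client can do the same thing programmatically via `SolrClient.optimize()`.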

There is also an expungeDeletes option you can set when issuing a commit, but as far as I can see it is better to issue an optimize outside normal operating hours. If you remove documents nightly, you can issue the optimize right after the removal, or even less frequently, such as every second or third day.
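If you would rather reclaim space as part of the nightly delete job instead of running a full optimize, the commit message itself can carry the expungeDeletes flag. A sketch, using the same placeholder core as above:

```xml
<!-- POST to http://localhost:8983/solr/mycore/update
     with Content-Type: text/xml -->
<!-- expungeDeletes merges only the segments that contain
     deleted documents, which is cheaper than a full optimize -->
<commit expungeDeletes="true" waitSearcher="false"/>
```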

Optimizing requires as much free disk space as the index itself occupies, since in the worst case the whole index is rewritten.