
What is the process for purging a Solr index that contains deleted documents (after a delete by query)?

I'm asking because I'm working on a project based on Solr, I've noticed some strange behavior, and I would like more information about it.

My system has these characteristics:

  • My documents are indexed continuously (1000 docs per second)

  • A purge is done every couple of seconds with this query:

    <delete><query>timestamp_utc:[ * TO NOW-10MINUTES ]</query></delete>
    

So my index should always hold about 600000 visible documents: 10 minutes * 60 = 600 seconds, and at 1000 docs/s that gives 600 * 1000 = 600000.
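The arithmetic above can be written as a small sketch. The function name and parameters are hypothetical, chosen only for illustration; the sketch counts visible documents only and says nothing about the space still held by documents marked as deleted.

```python
def visible_documents(docs_per_second: int, retention_minutes: int) -> int:
    """Steady-state number of visible documents for a sliding retention window."""
    retention_seconds = retention_minutes * 60
    return docs_per_second * retention_seconds


# Figures from the question: 1000 docs/s kept for 10 minutes.
print(visible_documents(1000, 10))
```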

But the size of my index increases over time. As far as I know, when you do a delete by query, the documents are only flagged with a "deleted" marker (or something like that) in the index.

I've seen and tried the attribute "expungeDeletes=true", but I didn't notice any considerable change in my index size.
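For reference, here is a minimal sketch of how the purge and a commit with expungeDeletes could be sent as Solr XML update messages from Python. The helper names and the update URL in the usage comment are assumptions made for illustration, not part of the setup described above; only the delete query itself is taken from the question.

```python
import urllib.request
from xml.sax.saxutils import escape


def delete_by_query_xml(query: str) -> str:
    """Build a Solr XML delete-by-query message (the query text is XML-escaped)."""
    return f"<delete><query>{escape(query)}</query></delete>"


def commit_xml(expunge_deletes: bool = False) -> str:
    """Build a Solr XML commit message with the expungeDeletes attribute set."""
    flag = "true" if expunge_deletes else "false"
    return f'<commit expungeDeletes="{flag}"/>'


def post_update(update_url: str, xml: str) -> bytes:
    """POST one XML update message to a Solr update handler and return the raw reply."""
    request = urllib.request.Request(
        update_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Usage sketch (the URL is an assumption, adjust it to your own core):
#   url = "http://localhost:8983/solr/mycore/update"
#   post_update(url, delete_by_query_xml("timestamp_utc:[ * TO NOW-10MINUTES ]"))
#   post_update(url, commit_xml(expunge_deletes=True))
```

Keeping the message builders separate from the HTTP call makes it easy to check the exact payload before sending anything to a live index.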

Any information about the index purge process would be appreciated.

Thanks.

Edit

I know that an optimize can do this job, but it's a long operation and I want to avoid it.

See previous related question - stackoverflow.com/questions/3053425/… - Paige Cook
Unrelated to your question: do you really need Solr for this use case? If all you need is your doc IDs from the past 10 minutes, a technology like Redis may be better suited. - arun
Yes, I need to use Solr because it's for an internship project. But thank you for this alternative, I will bring it up. - Corentin
Thanks, Paige, for this advice. But optimizing is very time-consuming. Is there another way to speed up the purge of deleted docs, for example by adjusting the mergeFactor or the commit frequency? - Corentin

1 Answer


You can create a new collection/core every 10 minutes, switch to it (while still querying the previous one), and delete the oldest collection/core once it is more than 10 minutes old.
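The bookkeeping for such a rotation might look like the sketch below. It only decides which core to write to, which cores to query, and which ones have expired; actually creating and dropping cores is left out. The `docs_<n>` naming scheme and all function names are hypothetical.

```python
WINDOW_SECONDS = 600  # 10-minute retention window from the question


def bucket(timestamp: float, window: int = WINDOW_SECONDS) -> int:
    """Index of the time window that a Unix timestamp falls into."""
    return int(timestamp // window)


def core_name(bucket_index: int, prefix: str = "docs") -> str:
    """Hypothetical naming scheme: one core per time window."""
    return f"{prefix}_{bucket_index}"


def write_core(now: float) -> str:
    """Core that receives newly indexed documents."""
    return core_name(bucket(now))


def read_cores(now: float) -> list[str]:
    """Previous and current cores; together they cover the last 10 minutes."""
    current = bucket(now)
    return [core_name(current - 1), core_name(current)]


def expired_cores(existing: list[str], now: float, prefix: str = "docs") -> list[str]:
    """Cores older than the previous window, which can be dropped as a whole."""
    oldest_kept = bucket(now) - 1
    expired = []
    for name in existing:
        head, _, tail = name.rpartition("_")
        if head == prefix and tail.isdigit() and int(tail) < oldest_kept:
            expired.append(name)
    return expired
```

Because the previous and current cores together can hold up to 20 minutes of documents, queries would still need a filter on timestamp_utc to return only the last 10 minutes. The benefit is that old data disappears by dropping a whole core instead of marking individual documents as deleted.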