We have monthly indexes (currently 11 months) with 22 shards per index. I am seeing what seems to be a lot of segments per index (roughly 1200 to 1380). The older indexes should see very few updates, if any. From everything I have read, it sounds like ES should be merging segments automatically, but now I am a bit concerned that this is not happening. I know we can run an optimize manually, but we would need to allocate another resource to do that work (so as not to impact the current system). I am fairly new to ES (if that isn't obvious) and am really trying to understand whether we have an issue or not. It could also be that we need to tweak index.merge.policy.segments_per_tier to be less than 10. Really not sure.

Rough index stats:

11 indexes, 22 shards per index, 65 million docs per index, 350 GB per index

Any info, suggestions, etc. are greatly appreciated.

Thanks,

S

The best step you can take now, especially since you have time-based indices, is to manually optimize the indices that are no longer written to. You will see improvements in performance for sure. The more segments there are, the more heap memory is used. ES does merge segments automatically, but certain conditions must be met before Lucene merges them (the size of the segments, the number of deleted docs in them, the number of segments of roughly the same size, etc.). Past versions had issues related to merging, but I am not sure whether you are hitting one. - Andrei Stefan
Thanks for the info Andrei! We are currently on 1.3 (soon moving to 1.5). How impactful is running the optimize manually? If we stop all indexing will it roughly take minutes, hours, etc. to optimize an index similar in size to ours? Thanks again. - scarpacci
Do you still index into the older indices? - Andrei Stefan
Up to a certain point, then no. So the indexes that are older than 6 months should not be modified. - scarpacci
I don't think you should stop indexing altogether just to optimize indices. You could try to optimize a single index per day, when you believe the load on the cluster is not that high... - Andrei Stefan

1 Answer

The best step you can take now, especially since you have time-based indices, is to manually optimize the indices that are no longer written to. You will see improvements in performance for sure. The more segments there are, the more heap memory is used.

ES does merge segments automatically, but certain conditions must be met before Lucene merges them (the size of the segments, the number of deleted docs in them, the number of segments of roughly the same size, etc.). Past versions had issues related to merging, but I am not sure whether you are hitting one.
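
To see how the segments are spread across shards before deciding anything, something like the following might help. It is only a sketch: it assumes the indices segments API (`GET /<index>/_segments`) returns its usual `indices -> <index> -> shards -> [copies] -> segments` layout, and the function names, base URL and index name are made up for illustration.

```python
import json
import urllib.request


def segments_url(base_url, index):
    # Endpoint of the indices segments API for a single index.
    return "{}/{}/_segments".format(base_url.rstrip("/"), index)


def count_segments_per_shard(segments_response, index):
    """Return {shard_id: segment count}, looking at the first listed copy of each shard."""
    shards = segments_response["indices"][index]["shards"]
    return {
        shard_id: len(copies[0]["segments"])
        for shard_id, copies in shards.items()
        if copies
    }


def fetch_segments(base_url, index):
    # Performs the actual HTTP call; not needed if you already have the JSON.
    with urllib.request.urlopen(segments_url(base_url, index)) as resp:
        return json.load(resp)


# Hypothetical usage against a local node and a made-up index name:
# counts = count_segments_per_shard(
#     fetch_segments("http://localhost:9200", "logs-2014-06"), "logs-2014-06")
```

For scale: 1200 to 1380 segments over 22 shards is roughly 55 to 63 segments per shard, if that total covers primaries only. If replicas were included in the total, the figure per shard copy is lower.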

You could try optimizing a single index per day, at a time when you expect the load on the cluster to be low. You are probably aware of Curator, which can be used for this and other operations.
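
If you would rather script the "one index per day" idea yourself, a rough sketch could look like this. The `logs-YYYY-MM` naming, the base URL, the six-month cutoff (taken from your comment) and `max_num_segments=1` are all assumptions for illustration; only the `_optimize` endpoint with its `max_num_segments` parameter is the actual ES 1.x API.

```python
import datetime
import urllib.request


def months_between(index_month, today):
    # Whole calendar months from the index's month to today's month.
    return (today.year - index_month.year) * 12 + (today.month - index_month.month)


def read_only_indices(index_names, today, min_age_months=6, prefix="logs-"):
    """Pick indices older than min_age_months; names are assumed to be <prefix>YYYY-MM."""
    old = []
    for name in index_names:
        month = datetime.datetime.strptime(name[len(prefix):], "%Y-%m").date()
        if months_between(month, today) > min_age_months:
            old.append(name)
    return sorted(old)


def index_for_today(old_indices, today):
    # Rotate through the old indices so each daily run touches a single index.
    if not old_indices:
        return None
    return old_indices[today.toordinal() % len(old_indices)]


def optimize_request(base_url, index, max_num_segments=1):
    # Builds (but does not send) POST /<index>/_optimize?max_num_segments=N.
    url = "{}/{}/_optimize?max_num_segments={}".format(
        base_url.rstrip("/"), index, max_num_segments)
    return urllib.request.Request(url, data=b"", method="POST")


# Hypothetical daily run, e.g. from cron during a quiet period:
# target = index_for_today(read_only_indices(names, datetime.date.today()),
#                          datetime.date.today())
# if target:
#     urllib.request.urlopen(optimize_request("http://localhost:9200", target))
```

On newer Elasticsearch versions the same operation is exposed as `_forcemerge` rather than `_optimize`, so the URL would need adjusting after an upgrade past 1.x.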