Context:
I have an AWS EC2 instance
- 8Gb RAM
- 8Gb of disk space
It runs Solr 5.1.0 with
- Java Heap of 2048Mb
-Xms2048m -Xmx2048m
Extra: (updated)
- Logs are generated on the server
- Imports happen in intervals of 10s (always delta)
- Importing from DB (
JdbcDataSource
) - I don't think I have any optimization strategy configured right now
- GC profiling? I don't know.
- How can I find out how large the fields are .. and what is large?
Situation:
The index on Solr has 200.000 documents and is queried not more than once per second. However, in about 10 days, the memory and disk space of the server reaches 90% - 95% of the available space.
When investigating the disk usage sudo du -sh /
it only returns a total of 2.3G
. Not nearly as much as what df -k
tells me (Use% -> 92%
).
I can, sort of, resolve the situation by restarting the Solr service.
What am i missing? How come Solr consumes all memory and disk space and how to prevent it?
Extra info for @TMBT
Sorry for the delay, but I’ve been monitoring the Solr production server for the last few days. You can see a roundup here: https://www.dropbox.com/s/x5diyanwszrpbav/screencapture-app-datadoghq-com-dash-162482-1468997479755.jpg?dl=0 The current state of Solr: https://www.dropbox.com/s/q16dc5t5ctl32od/Screenshot%202016-07-21%2010.29.13.png?dl=0 I restarted Solr at the beginning of the monitoring and now, 2 days later I see the disk space goes down at a rate of 1,5Gb per day. If you need more specifics, let me know.
- There are not so many deleted docs per day. We’re talking 50 - 250 per day max.
- The current logs directory of Solr:
ls -lh /var/solr/logs
->total 72M
- There is no master-slave setup
- The importer runs ever 10 seconds, but it imports no more than 10 - 20 docs each time. The large import of 3k-4k docs happens each night. There is not much action going on in Solr at that time.
- There are no large fields, the largest field can contain up to 255 chars.
With the monitoring in place I tested the most common queries. It does contain faceting (field, queries), sorting, grouping, … But I doesn’t really affect the various metrics of heap and gc count.