I have an installation of solr-4.8.1 under development. I have a script that runs the query id:* once a minute and extracts QTime and numFound. The result looks like follows:
2014-07-07:09:30:12 QTime=10 numFound="6095776"
2014-07-07:09:32:02 QTime=50022 numFound="6095776"
2014-07-07:09:33:02 QTime=30 numFound="6095776"
2014-07-07:09:34:02 QTime=19 numFound="6095776"
2014-07-07:09:35:02 QTime=10 numFound="6095776"
2014-07-07:09:36:52 QTime=50029 numFound="6095776"
2014-07-07:09:37:52 QTime=26 numFound="6095776"
2014-07-07:09:38:52 QTime=12 numFound="6095776"
2014-07-07:09:39:52 QTime=11 numFound="6095776"
2014-07-07:09:40:52 QTime=12 numFound="6095776"
As you can see, once in a while the QTime is >50 seconds.
Further observations:
- First I thought it might correlate with the autocommit interval of 5 Minutes, but the times between the long QTimes are irregular, not regularly every 5 minutes. Further, I am not feeding anything.
- Apart from the query once a minute (or rather, with a delta of one minute between queries), the machine does absolutely nothing. There is no query load.
- The behavior started already when we had far less documents indexed.
Any ideas where to look?
EDIT:
I took a closer look at the logs and find strange things. All 10 core (all on the same machine), report a QTime of 0 or 1, but one of them logs this 50 seconds after the other 9. (loglines cut for brevity)
10:53:55.452 [core000] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core000/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=610796 status=0 QTime=1
10:53:55.452 [core005] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core005/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=609562 status=0 QTime=0
10:53:55.453 [core002] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core002/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=610392 status=0 QTime=1
10:53:55.453 [core004] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core004/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=609186 status=0 QTime=0
10:53:55.452 [core001] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core001/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=608294 status=0 QTime=0
10:53:55.454 [core003] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core003/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=612155 status=0 QTime=1
10:53:55.455 [core006] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core006/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=608606 status=0 QTime=1
10:53:55.456 [core009] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core009/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=608956 status=0 QTime=1
10:53:55.458 [core008] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core008/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=608847 status=0 QTime=1
10:54:45.469 [core007] params={NOW=1404723235442&shard.url=http://x.x.x.x/solr/core007/&df=content&fl=id,score&start=0&q=id:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=0} hits=608982 status=0 QTime=1
10:54:45.470 [core000] params={start=0&q=id:*&wt=xml&rows=0} hits=6095776 status=0 QTime=50028
And it seems to always be exactly 50 seconds (plus some minimal overhead), as if there is some timeout involved, which is exactly 50s. This makes it hard to believe it is the garbage collector (as proposed in one answer and which I will nevertheless lookk after), in particular because it happens independent of the index content.
EDIT: I seem to have found a complete crazy pseudo-solution: trying to figure out what is going on I attached strace to the solr-process. Since then the 50s hangs did not appear anymore.