We're using Lucene 4.7 to build and query a rather large data set (110+ million documents).
One of the document fields, which we use for faceting, is defined as follows:
<field name="topic_paths"
       type="string"
       indexed="false"
       stored="false"
       docValues="true"
       multiValued="true"
       termVectors="false"
       termPositions="false"
       termOffsets="false"/>
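Because the field has indexed="false", topic_paths has no inverted index, only a docValues column. The following is a simplified, hypothetical Java model (the class and method names are made up and are not Lucene API) contrasting two ways a filter on one value can be evaluated: reading per-document term ordinals, the general shape of a multi-valued docValues column, which must visit every document; or reading a postings list, which touches only the matching documents. It assumes, without the question confirming it, that the topic_path constraints end up being checked document by document against docValues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model (NOT Lucene code) of filtering a
 * multi-valued field either through per-document ordinals (docValues-like)
 * or through per-term postings lists (inverted-index-like).
 */
public class TopicPathsModel {
    private final String[] terms;   // sorted dictionary of distinct values
    private final int[][] docOrds;  // docOrds[doc] = ascending ordinals of that doc's values
    private final int[][] postings; // postings[ord] = ascending doc ids containing terms[ord]

    public TopicPathsModel(List<Set<String>> docs) {
        TreeSet<String> dict = new TreeSet<>();
        for (Set<String> values : docs) dict.addAll(values);
        String[] sortedTerms = dict.toArray(new String[0]);
        terms = sortedTerms;

        docOrds = new int[docs.size()][];
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < sortedTerms.length; i++) lists.add(new ArrayList<>());
        for (int doc = 0; doc < docs.size(); doc++) {
            // Map each value of the document to its ordinal in the dictionary.
            int[] ords = docs.get(doc).stream()
                    .mapToInt(v -> Arrays.binarySearch(sortedTerms, v))
                    .sorted()
                    .toArray();
            docOrds[doc] = ords;
            // Docs are visited in increasing order, so each postings list stays sorted.
            for (int ord : ords) lists.get(ord).add(doc);
        }
        postings = new int[sortedTerms.length][];
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            postings[ord] = lists.get(ord).stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** DocValues-style filter: visits every document, whatever the hit count. */
    public int[] filterByDocValues(String value) {
        int ord = Arrays.binarySearch(terms, value);
        if (ord < 0) return new int[0];
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < docOrds.length; doc++) {
            if (Arrays.binarySearch(docOrds[doc], ord) >= 0) hits.add(doc);
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Inverted-index-style filter: one dictionary lookup, then only the matching docs. */
    public int[] filterByPostings(String value) {
        int ord = Arrays.binarySearch(terms, value);
        return ord < 0 ? new int[0] : postings[ord].clone();
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("/a", "/a/b"),
                Set.of("/c"),
                Set.of("/a/b", "/c"));
        TopicPathsModel m = new TopicPathsModel(docs);
        System.out.println(Arrays.toString(m.filterByDocValues("/a/b"))); // [0, 2]
        System.out.println(Arrays.toString(m.filterByPostings("/a/b")));  // [0, 2]
    }
}
```

Both methods return the same documents; the difference is that the first loop runs over every document for each value, while the second only reads the postings of that value. Whether the slow queries actually follow the first path depends on how they are built, which the question does not show.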
Whenever we include this field in queries, they become extremely slow: about 7 seconds per topic_path value included in the search, so about 30 seconds for four topic_path values (typical in our case).
Queries that don't use this field are very fast (15 ms).
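The timings scale roughly linearly with the number of topic_path values. As a back-of-the-envelope check, assuming every value costs the same and (for the second figure only) that each value is checked against every document, which the numbers suggest but do not prove:

```latex
4 \times 7\,\text{s} \approx 28\,\text{s} \approx 30\,\text{s}, \qquad
\frac{110 \times 10^{6}\ \text{docs}}{7\,\text{s}} \approx 1.6 \times 10^{7}\ \text{docs/s}, \qquad
\frac{7\,\text{s}}{15\,\text{ms}} \approx 470
```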
Is this the performance we should expect from Lucene with multi-valued fields used for faceting? Is there anything wrong or suboptimal with our field definition? Are there tricks we could use to speed up searches?
Details:
- Hardware: Xen VM, 8-core Xeon CPU E5-2670 v2 at 2.5 GHz, 64 GB RAM
- OS: Windows Server 2012 Standard
- JVM: started with -Xmx8000m (Lucene is using about 45% of that)
- Lucene queries are single-threaded
We pass -Xmx8000m to the JVM; Lucene seems to be using 45% of that. – François Beaune