
We're using Lucene 4.7 to build and query a rather large data set (110+ million documents).

One of the document fields, which we use for faceting, is defined as follows:

<field name="topic_paths"
       type="string"
       indexed="false"
       stored="false"
       docValues="true"
       multiValued="true"
       termVectors="false"
       termPositions="false"
       termOffsets="false"/>

Whenever we include this field in queries, they become extremely slow: about 7 seconds per topic_path value included in the search, so about 30 seconds for four topic_path values (typical in our case).

Queries that don't use this field are very fast (15 ms).

Is this the performance we should expect from Lucene with multi-valued fields used for faceting? Is there anything wrong or suboptimal with our field definition? Are there tricks we could use to speed up searches?

Details:

  • Hardware: Xen VM, 8-core Xeon CPU E5-2670 v2 at 2.5 GHz, 64 GB RAM
  • OS: Windows Server 2012 Standard
  • JVM: started with -Xmx8000m (Lucene is using about 45% of that)
  • Lucene queries are single-threaded

I guess the facets will use docValues, so performance depends on how efficient the OS file cache is. Do you see a lot of I/O when you extract facets? If you run the same query a second time, is it faster? How much heap space do you give Java (Xmx)? - nomoa
If I run the exact same query a second time, it is instant (thanks to the query cache, I presume). I don't yet know how much disk traffic the queries generate, since for some reason the I/O counters in Process Explorer are stuck at 0. However, the queries appear to be CPU-bound. Let me add that individual queries run single-threaded. We pass -Xmx8000m to the JVM; Lucene seems to be using 45% of that. - François Beaune
I'm guessing it's a matter of warming up the OS file cache. Could you try running a long MatchAllQuery with facets and see whether all subsequent facet queries are faster? You can also check the virtual memory usage. - nomoa
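
The warm-up suggested in the last comment could also be automated. The fragment below is only a sketch, assuming the index is served by Solr (the field definition in the question is a Solr schema entry) and that a match-all query faceting on topic_paths is representative of the real workload. It would go inside the query section of solrconfig.xml; the query parameters are illustrative and not taken from the original setup.

```xml
<!-- Assumption: placed inside the <query> section of solrconfig.xml. -->
<!-- Runs once when the first searcher is opened, so the facet data is
     read before any user query arrives. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- Match every document and facet on the slow field. -->
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">topic_paths</str>
    </lst>
  </arr>
</listener>
```

If the slowdown really comes from a cold file cache, facet queries issued after this warm-up should be noticeably faster; if they are not, the bottleneck is probably elsewhere.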

1 Answer


Read this article: http://wiki.apache.org/solr/SchemaXml#Fields

You need to index your field (indexed="true") to include it in searches or faceting; otherwise Solr will skip the field without raising any exception.
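
As a sketch of what this change might look like, assuming the field stays in the Solr schema.xml exactly as shown in the question, only the indexed attribute is switched on and everything else is left untouched. Existing documents would need to be reindexed before the change has any effect.

```xml
<!-- Same definition as in the question, with indexed set to true
     so the field can be used in search clauses. -->
<field name="topic_paths"
       type="string"
       indexed="true"
       stored="false"
       docValues="true"
       multiValued="true"
       termVectors="false"
       termPositions="false"
       termOffsets="false"/>
```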