I'm in the process of transitioning from a conventional master-slave setup to SolrCloud. I'm using Solr 4.4 and have set up 2 shards with 1 replica each, plus a 3-node ZooKeeper ensemble. All the nodes run on AWS EC2 instances. The shards run on m1.xlarge instances, which also host ZooKeeper (mounted on a separate volume). 6 GB of memory is allocated to each Solr instance.

I have around 10 million documents in the index. With the previous standalone setup, queries averaged around 100 ms. SolrCloud query response times have been abysmal so far: over 1000 ms, often reaching 2000 ms. I expected some increase due to the additional servers, network latency, etc., but this difference is really baffling. The hardware is similar in both cases, except that a couple of the SolrCloud nodes also host ZooKeeper. m1.xlarge I/O performance is high, so that shouldn't be a bottleneck either.

The other difference from the old setup is that I'm using the new CloudSolrServer class, which is given the 3 ZooKeeper addresses for load balancing. But I don't think it has any major impact, since queries executed from the Solr admin query panel are just as slow.

Here are some of my configuration settings:

Commit frequency


<autoCommit> 
        <maxTime>30000</maxTime> 
        <openSearcher>false</openSearcher> 
</autoCommit> 

<autoSoftCommit> 
        <maxTime>1000</maxTime> 
</autoSoftCommit>

Boolean clause:

<maxBooleanClauses>1024</maxBooleanClauses>

Cache settings:


<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<queryResultCache class="solr.LRUCache" size="16384" initialSize="8192" autowarmCount="4096"/>

<documentCache class="solr.LRUCache" size="32768" initialSize="16384" autowarmCount="0"/>

<fieldValueCache class="solr.FastLRUCache" size="16384" autowarmCount="8192" showItems="4096"/>

<enableLazyFieldLoading>true</enableLazyFieldLoading> 

<queryResultWindowSize>200</queryResultWindowSize> 

<queryResultMaxDocsCached>400</queryResultMaxDocsCached> 

Query listeners:


<listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
                <lst><str name="q">line</str></lst>
                <lst><str name="q">xref</str></lst>
                <lst><str name="q">draw</str></lst>
        </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
                <lst><str name="q">line</str></lst>
                <lst><str name="q">draw</str></lst>
                <lst><str name="q">line</str><str name="fq">language:english</str></lst>
                <lst><str name="q">line</str><str name="fq">Source2:documentation</str></lst>
                <lst><str name="q">line</str><str name="fq">Source2:CloudHelp</str></lst>
                <lst><str name="q">draw</str><str name="fq">language:english</str></lst>
                <lst><str name="q">draw</str><str name="fq">Source2:documentation</str></lst>
                <lst><str name="q">draw</str><str name="fq">Source2:CloudHelp</str></lst>
        </arr>
</listener>

<maxWarmingSearchers>2</maxWarmingSearchers>

Request Handler:

<requestHandler name="/cloudhelp" class="solr.SearchHandler"> 
                <lst name="defaults"> 
                        <str name="echoParams">explicit</str> 
                        <float name="tie">0.01</float> 
                        <str name="wt">velocity</str> 
                        <str name="v.template">browse</str> 
                        <str name="v.contentType">text/html;charset=UTF-8</str>
                        <str name="v.layout">layout</str> 
                        <str name="v.channel">cloudhelp</str> 

                        <str name="defType">edismax</str> 
                        <str name="q.alt">*:*</str> 
                        <str name="rows">15</str> 
                        <str name="fl">id,url,Description,Source2,text,filetype,title,LastUpdateDate,PublishDate,ViewCount,TotalMessageCount,Solution,LastPostAuthor,Author,Duration,AuthorUrl,ThumbnailUrl,TopicId,score</str>
                        <str name="qf">text^1.5 title^2 IndexTerm^.9 keywords^1.2 ADSKCommandSrch^2 ADSKContextId^1</str>
                        <str name="bq">Source2:CloudHelp^3 Source2:youtube^0.85</str>
                        <str name="bf">recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0</str>
                        <str name="df">text</str> 


                        <str name="facet">on</str> 
                        <str name="facet.mincount">1</str> 
                        <str name="facet.limit">100</str> 
                        <str name="facet.field">language</str> 
                        <str name="facet.field">Source2</str> 
                        <str name="facet.field">DocumentationBook</str> 
                        <str name="facet.field">ADSKProductDisplay</str> 
                        <str name="facet.field">audience</str> 


                        <str name="hl">true</str> 
                        <str name="hl.fl">text title</str> 
                        <str name="f.text.hl.fragsize">250</str> 
                        <str name="f.text.hl.alternateField">ShortDesc</str> 


                        <str name="spellcheck">true</str> 
                        <str name="spellcheck.dictionary">default</str> 
                        <str name="spellcheck.collate">true</str> 
                        <str name="spellcheck.onlyMorePopular">false</str> 
                        <str name="spellcheck.extendedResults">false</str> 
                        <str name="spellcheck.count">1</str> 
                </lst> 
                <arr name="last-components"> 
                        <str>spellcheck</str> 
                </arr> 
</requestHandler>

One thing I've noticed is that the queryResultCache hit rate is really low, and I'm not sure our queries are really that unique. I'm using edismax, and there's a recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0 boost function; could this contribute?
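
To test whether the unrounded NOW is the culprit, one experiment would be to round it with date math, so the boost (and hence the parsed query) stays identical for a whole hour instead of changing every millisecond. This is only a sketch, under the assumption that NOW is what makes otherwise identical queries look unique to the cache. NOW/HOUR is an arbitrary choice of granularity, and the rest of the function is unchanged from the handler above:

```xml
<!-- Assumption: rounding NOW keeps the boost function constant within each hour,
     so repeated queries can map to the same queryResultCache entry. -->
<str name="bf">recip(ms(NOW/HOUR,PublishDate),3.16e-11,1,1)^2.0</str>
```

Since 3.16e-11 per millisecond corresponds to a decay scale of roughly one year, hour-level rounding should barely change the scores.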

Sorry about the long post, but I'm struggling to nail down the issue here, especially when queries are running fine in a master-slave environment with similar hardware and network.

Any pointers will be highly appreciated.

Thanks
Is there any indexing running in the background as well? Which version did you use for master-slave? There shouldn't be a major difference if you use the same Solr 4.4 without indexing in the background. It's the very same distributed search. - lexk
@lexk Even without indexing, the performance is bad. For master-slave, I was using 4.2. I picked the latest version on the understanding that it had specific features and fixes targeting SolrCloud. Moreover, the hardware spec is the same for both environments, the only difference being that SolrCloud has 4 servers whereas there was only 1 slave server earlier. - Shamik

1 Answer


Thanks for the note:

the only difference being that SolrCloud has 4 servers whereas there was only 1 slave server earlier.

By default, SolrCloud distributes each request across the active shards and collates the results. My suggestion is to make use of document and query routing across shards, which should give better performance.
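
A rough sketch of what document routing could look like, assuming the collection uses the compositeId router (the default when a collection is created with numShards) and that Source2 is a reasonable routing key. The id value and the choice of key are hypothetical, not taken from your index. Prefixing the unique key with CloudHelp! sends all documents with that prefix to the same shard:

```xml
<add>
  <doc>
    <!-- "CloudHelp!" is the (hypothetical) routing prefix; the part after "!" is the usual id -->
    <field name="id">CloudHelp!doc-0001</field>
    <field name="Source2">CloudHelp</field>
    <field name="title">Example title</field>
  </doc>
</add>
```

A query that only needs CloudHelp documents can then be limited to that shard by adding shard.keys=CloudHelp! to the request (as far as I know this is the parameter name in Solr 4.x; later releases call it _route_). Queries that span all sources will still fan out to every shard.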