I am using SolrJ + Solr in my project, and I have run into an unclear bottleneck involving Solr/Jetty.
Using jvisualvm, I connected to the JVM instance that runs Solr and saw that 77% of the time is spent in the method "org.eclipse.jetty.io.ByteArrayBuffer.readFrom()". The stack trace of one of the threads is below:
"qtp64700533-36718" - Thread t@36718
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1040)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
So it may look OK that the time is spent on I/O, but:
- the application that runs the queries is launched on the local machine (so I/O time should not be large, and the thread state "RUNNABLE" in the stack trace above seems suspicious)
- query response times can be up to 5-10 seconds
- the load average on the machine (CentOS) is about 10
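On the "RUNNABLE" point: the sketch below is only a side check, not taken from Solr or Jetty. It blocks a thread in a plain socket read on the loopback interface and asks the JVM for that thread's state. The class name, the 500 ms wait, and the loopback setup are assumptions made for illustration. On common HotSpot JVMs a thread waiting inside a native socket read is still reported as RUNNABLE, so that state alone may not indicate CPU work.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;

// Illustrative class name; not part of Solr or Jetty.
final class BlockedReadState {

    /** Reports the JVM thread state of a thread waiting in a blocking socket read. */
    static Thread.State stateWhileBlockedInRead() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             final Socket accepted = server.accept()) {

            final CountDownLatch aboutToRead = new CountDownLatch(1);
            Thread reader = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        aboutToRead.countDown();
                        accepted.getInputStream().read(); // blocks: the client never writes
                    } catch (IOException releasedByClose) {
                        // Expected: closing the socket at the end of the try block unblocks the read.
                    }
                }
            }, "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            aboutToRead.await();
            Thread.sleep(500); // assumed to be long enough for the reader to enter the read call
            return reader.getState();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("state while blocked in read: " + stateWhileBlockedInRead());
    }
}
```

If this reports RUNNABLE, the qtp threads in the stack trace above may simply be idle connections waiting for the next request, and the 77% figure would then reflect waiting rather than work.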
Any help or advice is appreciated, thanks!
UPD:
Indeed, guys, I forgot to give additional info. Here it is:
hardware: i3770, 32gb ram; iotop shows 50-600kb/sec read and 200-1000kb/sec write (almost all of it relates to the SOLR process)
OS: Centos 6.6
java: OpenJDK 64-Bit Server VM (1.7.0_71 24.65-b04)
solr: 4.9.0 (launched with -Xmx=24000, but I think I should split the SOLR cores into separate JVM SOLR instances to minimize GC time)
solrj: 4.10.3; adding/updating/removing documents is done with commitWithin=10000 msec in the Java code.
about schemas: I am storing data (ads + objects) in SOLR for 5 countries: UA, RU, PL, BY, KZ. So there are 2 cores for each country, for example for Ukraine: ua_ads and ua_objects (10 cores in total). The schemas are almost identical between countries; see below for Ukraine.
"ua_ads" schema (should rename it from "example" though :) )
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>adId</uniqueKey>
<field name="adId" type="long" indexed="true" stored="true" required="true"/>
<field name="objectId" type="long" indexed="true" stored="true" required="false"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="false" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="false" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="address" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="priceUsd" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="false" stored="true" required="false"/>
<field name="area" type="int" indexed="false" stored="true" required="false"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
</schema>
"ua_objects" schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="binary" class="solr.BinaryField"/>
<fieldType name="addr_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- no stemming for address, dots must be followed by a space: "г. Киев" -->
<!-- char filters always run first (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- removing everything except digits, letters and "-" (kept for house numbers such as 9-А) -->
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- joining house number and letter by removing "-" or space ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/cities_ukr2rus.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<!-- min length 1 keeps single-character tokens such as house numbers: "Хрещатик, 3" -->
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt,lang/stopwords_addr.txt" format="snowball"/>
</analyzer>
</fieldType>
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- dots must be followed by a space: "г. Киев" -->
<!-- char filters always run first (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- joining house number and letter by removing "-" or space ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/synonyms.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
</analyzer>
</fieldType>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>objectId</uniqueKey>
<field name="objectId" type="long" indexed="true" stored="true" required="true"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="true" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="true" stored="true" required="true"/>
<field name="address" type="addr_ru" indexed="true" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="ownerDetected" type="boolean" indexed="true" stored="true" required="true"/>
<field name="priceUsd" type="long" indexed="true" stored="true" required="false"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="true" stored="true" required="false"/>
<field name="area" type="int" indexed="true" stored="true" required="false"/>
<field name="dateUpdated" type="tdate" indexed="true" stored="true" required="true"/>
<field name="dateClosed" type="tdate" indexed="true" stored="true" required="false"/>
<field name="m2priceRel" type="float" indexed="true" stored="true" required="false"/>
<field name="ceddData" type="binary" indexed="false" stored="true" required="false" multiValued="true"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
<field name="uniqAdTexts" type="string" indexed="false" stored="true" required="true" multiValued="true"/>
</schema>
biggest indexes:
ru_ads: 2.99gb
ru_objects: 3.25gb
ua_ads: 5.45gb
ua_objects: 2.36gb
the other cores' indexes are relatively small
The queries that run too long ("too long" as seen from the client side) look like these (taken from the SOLR log; "????" is just non-English letters):
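Rough arithmetic on the figures above, as a sketch only. It assumes that the -Xmx value means 24000 MB and that "32gb" means 32 GiB; neither is confirmed here. It compares the combined size of the four largest indexes with the RAM left outside the Java heap, which is what the OS has available for caching index files. The class and method names are made up for illustration.

```java
// Illustrative arithmetic only; the units of -Xmx and of the RAM size are assumptions.
final class MemoryBudgetSketch {

    /** Sums index sizes given in GB. */
    static double sumGb(double... sizesGb) {
        double total = 0.0;
        for (double size : sizesGb) {
            total += size;
        }
        return total;
    }

    /** RAM left outside the heap, assuming the heap size is given in MB. */
    static double remainingGb(double ramGb, double heapMb) {
        return ramGb - heapMb / 1024.0;
    }

    public static void main(String[] args) {
        // ru_ads, ru_objects, ua_ads, ua_objects as listed above
        double indexes = sumGb(2.99, 3.25, 5.45, 2.36);
        double outsideHeap = remainingGb(32.0, 24000.0);
        System.out.printf("four largest indexes: %.2f GB%n", indexes);
        System.out.printf("RAM outside the heap: %.2f GB%n", outsideHeap);
    }
}
```

Under those assumptions the four indexes add up to about 14 GB while roughly 8.6 GB remain outside the heap, so the indexes may not fit in the OS file cache. Whether that contributes to the slow queries is not established by these numbers alone.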
400723188 [qtp64700533-40547] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+????????\+???????\+????????)+AND+type:3+AND+regionId:2+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[2+TO+2])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+60])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[23500+TO+70500])+AND+dateUpdated:[2014-12-09T10:23:07Z+TO+2015-01-28T10:23:07Z]+AND+-objectId:(27824841)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=18 status=0 QTime=287
401989528 [qtp64700533-40830] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(?????????????\+??????)+AND+type:4+AND+regionId:162+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+58])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9+TO+27])+AND+dateUpdated:[2014-12-09T10:44:08Z+TO+2015-01-28T10:44:08Z]+AND+-objectId:(26415616)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=820 status=0 QTime=5755
400832723 [qtp64700533-40322] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(????????\+???????)+AND+type:4+AND+regionId:102+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[31+TO+45])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[115+TO+343])+AND+dateUpdated:[2014-12-09T10:24:57Z+TO+2015-01-28T10:24:57Z]+AND+-objectId:(26415342)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=9 status=0 QTime=372
402069370 [qtp64700533-40832] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=1&fl=*&start=0&q=(????????\+?????????\+??\+????????)+AND+type:3+AND+regionId:135+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[28+TO+40])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9529+TO+28585])+AND+dateUpdated:[2014-10-30T10:45:33Z+TO+2015-01-28T10:45:33Z]+AND+-objectId:(26415855)&qf=address^20+title^2+text&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=14075 status=0 QTime=544
401805198 [qtp64700533-40233] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+??\+??????\+?????\+??????????)+AND+type:3+AND+regionId:16+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[3+TO+3])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[93+TO+95])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[284050+TO+313950])+AND+dateUpdated:[2015-01-08T10:41:09Z+TO+2015-01-28T10:41:09Z]+AND+-objectId:(27826334)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=6 status=0 QTime=462
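Each optional numeric constraint in the logged queries follows the same pattern: match documents where the field is missing, or where its value falls in the given range. A minimal sketch of how such a clause can be assembled (shown URL-decoded, with spaces instead of "+"); the class and method names are hypothetical and not part of SolrJ:

```java
// Hypothetical helper; builds the "field missing OR field in range" clause seen in the logs.
final class OptionalRangeClause {

    /** Returns a clause matching documents without the field or with a value in [from, to]. */
    static String missingOrInRange(String field, long from, long to) {
        String missing = "(*:* AND -" + field + ":[* TO *])";
        String inRange = field + ":[" + from + " TO " + to + "]";
        return "(" + missing + " OR " + inRange + ")";
    }

    public static void main(String[] args) {
        // Values taken from the first logged query
        System.out.println(missingOrInRange("roomsCount", 2, 2));
        System.out.println(missingOrInRange("area", 40, 60));
        System.out.println(missingOrInRange("priceUsd", 23500, 70500));
    }
}
```

Every query above carries three such clauses (roomsCount, area, priceUsd) in addition to the text match and the type, regionId, and dateUpdated conditions.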
[Screenshot: fresh profiling output from jvisualvm]
[Screenshot: part of the "top" command output, delay=10sec]
rows=2147483647. Then it is no wonder that queries may take some time. Solr has to render output for 820 results in your second query – cheffe
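To illustrate the comment above: instead of asking for every hit at once, the client could request results in bounded pages using Solr's standard `start` and `rows` parameters. The sketch below only computes the parameter pairs and does not talk to Solr; the page size of 100 and the class and method names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only builds start/rows parameter pairs, no Solr calls are made.
final class PagingSketch {

    /** Returns one "start=..&rows=.." pair per page needed to cover numFound hits. */
    static List<String> pageParams(long numFound, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> pages = new ArrayList<String>();
        for (long start = 0; start < numFound; start += pageSize) {
            pages.add("start=" + start + "&rows=" + pageSize);
        }
        return pages;
    }

    public static void main(String[] args) {
        // 820 hits, as in the second logged query; page size 100 is an assumed value
        for (String page : pageParams(820, 100)) {
            System.out.println(page);
        }
    }
}
```

With a page size of 100, the 820-hit query would be fetched in 9 requests, and a client that only needs the top matches could stop after the first one.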