Basic configurations:

  • cluster: two nodes, each both master-eligible and a data node
  • es version: 7.2.0
  • jvm heap size: 31GB per node
  • node memory: 256GB per node
  • primary/replica shards: 1/1
  • disk: 2TB+ per node
  • deploy mode: docker
  • host system: debian 8
  • deploy configuration:
    docker run -d --name esnode01 \
            -v /home/workspace/elasticsearch/data:/usr/share/elasticsearch/data \
            --ulimit memlock=-1:-1 \
            --ulimit nofile=65535:65535 \
            -p 9200:9200 \
            -p 9300:9300 \
            -e node.name=esnode01 \
            -e node.data=true \
            -e node.master=true \
            -e network.publish_host=$ip01 \
            -e discovery.seed_hosts=$ip02:9300 \
            -e cluster.initial_master_nodes=esnode01,esnode02 \
            -e cluster.name=es-docker-cluster \
            -e bootstrap.memory_lock=true \
            -e "ES_JAVA_OPTS=-Xms31g -Xmx31g" \
            -e http.cors.enabled=true \
            -e "http.cors.allow-origin=*" \
            --restart=always \
            docker.elastic.co/elasticsearch/elasticsearch:7.2.0
  • index mappings:
        {
          "mapping": {
            "properties": {
              "question": {
                "type": "text"
              },
              "update_time": {
                "type": "date"
              }
            }
          }
        }
  • index docs: 11k
  • ps:
    1. the question field contains Chinese text
    2. search slow log is enabled: "index.search.slowlog.level": "debug", "index.search.slowlog.threshold.fetch.debug": "5ms", "index.search.slowlog.threshold.query.debug": "1ms"
    3. no highly parallel queries

Query:

POST /index_name/_search
    {
        "from": 0,
        "size": 50,
        "query": {
            "match": {
                "question": "<text>"
            }
        }
    }

Problem: Most queries take < 20 ms, but occasionally the 'took' field is 100 ms, 200 ms, or even 900+ ms.

Analysis: 1. Kibana monitoring shows no high indexing rate, no long GC times, no high system load, not too many segments, and no high search or query latency; 2. the search slow log shows no queries with a high took time, and the profile API shows a very small query time_in_nanos

Any advice is really appreciated.

Are you running the same queries, or are they different every time (does the text change)? Is there any pattern in how the took time changes, e.g. does it happen for similar queries? Also, do you use SSDs or spinning disks? - Nikolay Vasiliev
@Nikolay Vasiliev Thanks for the reply. There were no common features or patterns among the query texts that took a long time. I use HDDs, not SSDs, but since the docs are very small, maybe all of them can be loaded into memory: the disk monitoring component shows very few disk read ops. - bin zhang
Do you also index new documents in the background, or does the index stay unchanged (always the same fixed set of documents)? - Nikolay Vasiliev
@Nikolay Vasiliev Docs are being added and deleted in the background. Will this affect search time, and if so, how? - bin zhang
Adding, deleting or updating documents invalidates Elasticsearch's caches; Elasticsearch also relies heavily on the filesystem cache. I will now post an answer with some tips on how to tune for search speed. - Nikolay Vasiliev

1 Answer

There's a great article on how to tune for search speed.

In your case I would recommend considering the following. First, use SSDs if possible.

Second, try setting index.refresh_interval to, say, 10 seconds or more. This might help when you are indexing and deleting documents in the background: with a larger refresh interval, the filesystem cache for the updated index segments is invalidated less often.
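
For illustration, here is a minimal sketch of changing that setting through the update index settings API (PUT /index_name/_settings). It assumes the cluster is reachable at http://localhost:9200 and that the index is called index_name as in the question; the helper name build_refresh_interval_request is made up for this example. It only builds the request, so nothing is sent until you uncomment the last lines.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index_name, interval="10s"):
    """Build (but do not send) a PUT <index>/_settings request.

    base_url and index_name are assumptions; adjust them to your cluster.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_refresh_interval_request("http://localhost:9200", "index_name")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The trade-off is that newly indexed or deleted documents may take up to the refresh interval to become visible to searches, so pick a value your application can tolerate.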

I would also recommend taking a look at the general recommendations.

Hope that helps!