I'm using Elasticsearch to index two types of objects.

**Data details**

- Contract object: ~60 properties (object size: 120 bytes)
- Risk item object: ~125 properties (object size: 250 bytes)
- Contract is the parent of risk item (`_parent`)
- I'm storing 240 million such objects in a single index (210 million risk items, 30 million contracts)
- Index size: 322 GB
**Cluster details**

- 11 m2.4xlarge EC2 boxes (68 GB memory, 1.6 TB storage, 8 cores each); 1 box is a load balancer node with `node.data = false`
- 50 shards
- 1 replica
**elasticsearch.yml**

    node.data: true
    http.enabled: false
    index.number_of_shards: 50
    index.number_of_replicas: 1
    index.translog.flush_threshold_ops: 10000
    index.merge.policy.use_compound_files: false
    indices.memory.index_buffer_size: 30%
    index.refresh_interval: 30s
    index.store.type: mmapfs
    path.data: /data-xvdf,/data-xvdg
I'm starting the Elasticsearch nodes with the following command:

    /home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g
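For context, here is a rough back-of-the-envelope sketch of what the figures above imply per shard and per node. It assumes that shards are spread evenly across the 10 data nodes and that the 322 GB figure covers primary shards only; both are assumptions on my part, not measurements, and the function name is purely illustrative.

```python
# Figures taken from the description above.
TOTAL_DOCS = 240_000_000   # 210M risk items + 30M contracts
INDEX_SIZE_GB = 322        # ASSUMPTION: size of the primary shards only
PRIMARY_SHARDS = 50
REPLICAS = 1
DATA_NODES = 10            # 11 boxes minus the non-data load balancer node
NODE_RAM_GB = 68
HEAP_GB = 30               # from -Xms30g -Xmx30g


def cluster_estimates():
    """Return rough per-shard and per-node figures, assuming an even spread."""
    shard_copies = PRIMARY_SHARDS * (1 + REPLICAS)
    return {
        "shard_copies": shard_copies,
        "shards_per_node": shard_copies / DATA_NODES,
        "docs_per_primary_shard": TOTAL_DOCS / PRIMARY_SHARDS,
        "gb_per_primary_shard": INDEX_SIZE_GB / PRIMARY_SHARDS,
        "gb_on_disk_per_node": INDEX_SIZE_GB * (1 + REPLICAS) / DATA_NODES,
        # RAM not claimed by the JVM heap, available to the OS for mmapfs pages
        "ram_outside_heap_gb": NODE_RAM_GB - HEAP_GB,
    }


if __name__ == "__main__":
    for name, value in cluster_estimates().items():
        print(f"{name}: {value:,.2f}")
```

If the 322 GB figure already includes the replica copies, the per-shard and per-node sizes would be half of what this prints.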
My problem is that the following query on the risk item type takes about 10-15 seconds to return 20 records.
I'm running it under a load of 50 concurrent users, with a bulk index load of about 5000 risk items happening in parallel.
**Query (with parent/child join)**

http://:9200/contractindex/riskitem/_search

    {
        "query": {
            "has_parent": {
                "parent_type": "contract",
                "query": {
                    "range": {
                        "ContractDate": {
                            "gte": "2010-01-01"
                        }
                    }
                }
            }
        },
        "filter": {
            "and": [{
                "query": {
                    "bool": {
                        "must": [{
                            "query_string": {
                                "fields": ["RiskItemProperty1"],
                                "query": "abc"
                            }
                        },
                        {
                            "query_string": {
                                "fields": ["RiskItemProperty2"],
                                "query": "xyz"
                            }
                        }]
                    }
                }
            }]
        }
    }
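To make the structure of the request above easier to see, here is a minimal Python sketch that builds the same body as a plain dict. The function name and its parameters are hypothetical and only for illustration; the field names and nesting are copied from the query in this post. It only builds and prints the JSON and does not talk to the cluster.

```python
import json


def build_riskitem_search(contract_date_from, property1_query, property2_query):
    """Build the request body shown above as a plain dict (hypothetical helper)."""
    return {
        # has_parent join: match risk items whose parent contract
        # satisfies the ContractDate range
        "query": {
            "has_parent": {
                "parent_type": "contract",
                "query": {
                    "range": {"ContractDate": {"gte": contract_date_from}}
                },
            }
        },
        # top-level filter, a sibling of "query", wrapping two
        # query_string clauses on risk item fields
        "filter": {
            "and": [
                {
                    "query": {
                        "bool": {
                            "must": [
                                {"query_string": {"fields": ["RiskItemProperty1"],
                                                  "query": property1_query}},
                                {"query_string": {"fields": ["RiskItemProperty2"],
                                                  "query": property2_query}},
                            ]
                        }
                    }
                }
            ]
        },
    }


if __name__ == "__main__":
    body = build_riskitem_search("2010-01-01", "abc", "xyz")
    print(json.dumps(body, indent=2))
```

Note that the join lives in the `query` part while both risk item conditions sit in the separate top-level `filter`.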
**Queries from One Table**

**Query1** (this query takes around 8 seconds)

<!-- language: lang-json -->

    {
        "query": {
            "constant_score": {
                "filter": {
                    "and": [{
                        "term": {
                            "CommonCharacteristic_BuildingScheme": "BuildingScheme1"
                        }
                    },
                    {
                        "term": {
                            "Address_Admin2Name": "Admin2Name1"
                        }
                    }]
                }
            }
        }
    }
**Query2** (this query takes around 6.5 seconds for the top 10 records, but has a sort on top of it)

<!-- language: lang-json -->

    {
        "query": {
            "constant_score": {
                "filter": {
                    "and": [{
                        "term": {
                            "Insurer": "Insurer1"
                        }
                    },
                    {
                        "term": {
                            "Status": "Status1"
                        }
                    }]
                }
            }
        }
    }
Can somebody please help me improve the performance of these queries?
Xms30g -Xmx30g and not more? – jackdbernier