I have a query that runs well in single-instance setup. However, when I tried to run it on a sharded cluster, the performance dropped (4x longer execution time).
The query plan shows that practically all processing is done on Coordinator node, not on DbServer. How to push the query to be executed at DbServer?
To give a bit of a context: I have a collection of ~120k (will grow to several millions) of multi-level JSON documents with nested arrays. And the query needs to unnest these arrays before getting to the proper node.
AQL Query:
for doc IN doccollection
for arrayLevel1Elem in doc.report.container.children.container
for arrayLevel2Elem in arrayLevel1Elem.children.container.children.num
for arrayLevel3Elem in arrayLevel2Elem.children.code
filter doc.report.container.concept.simpleCodedValue == 'A'
filter arrayLevel1Elem.concept.codedValue == "B"
filter arrayLevel2Elem.concept.simpleCodedValue == "C"
filter arrayLevel3Elem.concept.simpleCodedValue == 'X'
filter arrayLevel3Elem.value.simpleCodedValue == 'Y'
collect studyUid = doc.report.study.uid, personId = doc.report.person.id, metricName = arrayLevel2Elem.concept.meaning, value = to_number(arrayLevel2Elem.value)
return {studyUid, personId, metricName, value}
Query Plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
2 EnumerateCollectionNode DBS 121027 - FOR doc IN doccollection /* full collection scan, projections: `report`, 2 shard(s) */ FILTER (doc.`report`.`container`.`concept`.`simpleCodedValue` == "A") /* early pruning */
3 CalculationNode DBS 121027 - LET #8 = doc.`report`.`container`.`children`.`container` /* attribute expression */ /* collections used: doc : doccollection */
19 CalculationNode DBS 121027 - LET #24 = doc.`report`.`study`.`uid` /* attribute expression */ /* collections used: doc : doccollection */
20 CalculationNode DBS 121027 - LET #26 = doc.`report`.`person`.`id` /* attribute expression */ /* collections used: doc : doccollection */
29 RemoteNode COOR 121027 - REMOTE
30 GatherNode COOR 121027 - GATHER /* parallel, unsorted */
4 EnumerateListNode COOR 12102700 - FOR arrayLevel1Elem IN #8 /* list iteration */
11 CalculationNode COOR 12102700 - LET #16 = (arrayLevel1Elem.`concept`.`codedValue` == "B") /* simple expression */
12 FilterNode COOR 12102700 - FILTER #16
5 CalculationNode COOR 12102700 - LET #10 = arrayLevel1Elem.`children`.`container`.`children`.`num` /* attribute expression */
6 EnumerateListNode COOR 1210270000 - FOR arrayLevel2Elem IN #10 /* list iteration */
13 CalculationNode COOR 1210270000 - LET #18 = (arrayLevel2Elem.`concept`.`simpleCodedValue` == "C") /* simple expression */
14 FilterNode COOR 1210270000 - FILTER #18
7 CalculationNode COOR 1210270000 - LET #12 = arrayLevel2Elem.`children`.`code` /* attribute expression */
21 CalculationNode COOR 1210270000 - LET #28 = arrayLevel2Elem.`concept`.`meaning` /* attribute expression */
22 CalculationNode COOR 1210270000 - LET #30 = TO_NUMBER(arrayLevel2Elem.`value`) /* simple expression */
8 EnumerateListNode COOR 121027000000 - FOR arrayLevel3Elem IN #12 /* list iteration */
15 CalculationNode COOR 121027000000 - LET #20 = ((arrayLevel3Elem.`concept`.`simpleCodedValue` == "X") && (arrayLevel3Elem.`value`.`simpleCodedValue` == "Y")) /* simple expression */
16 FilterNode COOR 121027000000 - FILTER #20
23 CollectNode COOR 96821600000 - COLLECT studyUid = #24, personId = #26, metricName = #28, value = #30 /* hash */
26 SortNode COOR 96821600000 - SORT studyUid ASC, personId ASC, metricName ASC, value ASC /* sorting strategy: standard */
24 CalculationNode COOR 96821600000 - LET #32 = { "studyUid" : studyUid, "personId" : personId, "metricName" : metricName, "value" : value } /* simple expression */
25 ReturnNode COOR 96821600000 - RETURN #32
Thanks a lot for any hint.