Problem: The database has 3 collections. Each collection contains 1,000,000 documents with 150 attributes per document, and in each collection the "id" attribute has a unique hash index.

I want to merge all three collections and return the documents whose "id" matches in all of them.

For example:

collection1 contains the document {"id":1,"firstname":"alex","gender":"male"}
collection2 contains the document {"id":1,"middlename":"wilson","age":23}
collection3 contains the document {"id":1,"lastname","alive":true}

So I want to return a document like {"id":1,"firstname":"alex","gender":"male","middlename":"wilson","age":23,"lastname","alive":true}
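
To make the intended result concrete, here is a minimal sketch of the join in plain Python (a model of the semantics, not AQL). It assumes each collection is a list of dicts and that a dict keyed on "id" stands in for the unique hash index. The function name `merge_by_id` and the unmatched document with id 2 are hypothetical, and the "lastname" field is left out because its value is missing in the example.

```python
def merge_by_id(coll1, coll2, coll3):
    """Return one merged document per id that occurs in all three collections."""
    # Dict lookups stand in for the unique hash index on "id" (an assumption).
    index2 = {doc["id"]: doc for doc in coll2}
    index3 = {doc["id"]: doc for doc in coll3}
    merged = []
    for doc1 in coll1:
        doc2 = index2.get(doc1["id"])
        doc3 = index3.get(doc1["id"])
        if doc2 is not None and doc3 is not None:
            # On duplicate keys the later document wins, as with MERGE(c1, c2, c3).
            merged.append({**doc1, **doc2, **doc3})
    return merged


coll1 = [
    {"id": 1, "firstname": "alex", "gender": "male"},
    {"id": 2, "firstname": "sam"},  # hypothetical: no partner in coll2/coll3
]
coll2 = [{"id": 1, "middlename": "wilson", "age": 23}]
coll3 = [{"id": 1, "alive": True}]

print(merge_by_id(coll1, coll2, coll3))
```

Only the document with id 1 is returned, because id 2 has no match in the other two collections.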

The query works and returns the expected result, but it takes too long on large collections. Is there another way to write the query, or does my AQL need some optimization?

AQL query:

Query String (219 chars, cacheable: true):

   FOR c1 IN coll1 
     sort c1.id ASC
         FOR c2 IN coll2
           FOR c3 IN coll3
              FILTER ((c1.id == c2.id ) && (c2.id == c3.id))
              limit 10000
     RETURN MERGE(c1, c2,c3)

AQL explain:

Execution plan:
 Id   NodeType             Est.   Comment
  1   SingletonNode           1   * ROOT
 15   IndexNode         1000000     - FOR c1 IN coll1   /* hash index scan */
 13   IndexNode         1000000       - FOR c2 IN coll2   /* hash index scan */
 12   IndexNode         1000000         - FOR c3 IN coll3   /* hash index scan */
  9   LimitNode           10000           - LIMIT 0, 10000
 10   CalculationNode     10000           - LET #7 = MERGE(c1, c2, c3)   /* simple expression */   /* collections used: c1 : coll1, c2 : coll2, c3 : coll3 */
 11   ReturnNode          10000           - RETURN #7

Indexes used:
 By   Name                      Type   Collection   Unique   Sparse   Selectivity   Fields              Ranges
 15   idx_1650897367446061056   hash   coll1        true     false       100.00 %   [ `id` ]   *
 13   idx_1650897883340210176   hash   coll2        true     false       100.00 %   [ `id` ]   (c1.`id` == c2.`id`)
 12   idx_1650895606437117952   hash   coll3        true     false       100.00 %   [ `id` ]   (c2.`id` == c3.`id`)

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 MERGE   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   use-index-for-sort
  4   remove-unnecessary-calculations-2
Your question lacks the execution plan (or better, query profiling) as well as a description of the dataset. - CodeManX

1 Answer

   FOR c1 IN coll1
     LIMIT 10000        // cap the documents taken from coll1 before the joins
     SORT c1.id ASC     // sorts only the documents that passed the LIMIT
     FOR c2 IN coll2
       FOR c3 IN coll3
         FILTER c1.id == c2.id && c2.id == c3.id
         RETURN MERGE(c1, c2, c3)

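This query takes less time to execute, presumably because LIMIT is applied to coll1 before the joins, so at most 10000 documents from coll1 are looked up in coll2 and coll3. Note that it is not equivalent to the original query: LIMIT now runs before SORT and before the FILTER, so it can return fewer than 10000 documents, and not necessarily those with the lowest ids.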
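
The effect of moving LIMIT can be illustrated with a small Python model (not AQL). It is only a sketch: collections are reduced to lists of ids, the function names and the sample data are hypothetical, and the list order stands in for the undefined order in which an unsorted FOR loop returns documents.

```python
from itertools import islice


def join(ids1, ids2, ids3):
    """Yield the ids from ids1 that also occur in ids2 and ids3 (the FILTER step)."""
    set2, set3 = set(ids2), set(ids3)
    return (i for i in ids1 if i in set2 and i in set3)


def limit_after_join(ids1, ids2, ids3, n):
    # Query from the question: SORT, join, then LIMIT n on the joined rows.
    return list(islice(join(sorted(ids1), ids2, ids3), n))


def limit_before_join(ids1, ids2, ids3, n):
    # Query from this answer: LIMIT n on coll1 first, then SORT, then join.
    return list(join(sorted(islice(ids1, n)), ids2, ids3))


# Hypothetical data: coll1 is read in unsorted order and id 9 is missing from coll2.
ids1 = [9, 5, 1, 3, 7]
ids2 = [1, 3, 5, 7]
ids3 = [1, 3, 5, 7, 9]

print(limit_after_join(ids1, ids2, ids3, 2))   # [1, 3]
print(limit_before_join(ids1, ids2, ids3, 2))  # [5]
```

With the limit applied first, only ids 9 and 5 enter the join and id 9 is dropped for lack of a match, so a single row comes back instead of the two lowest matching ids.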