We have huge data set in MarkLogic with documents spread across multiple collections. We are required to develop reports by searching document spread across these collections.
Sample data set:
Collection1 - Inventory [Contains 5 million Inventory documents]
URI: /inventory/inv1
<xml>
<INVENTORY>
<ItemName>10</ItemName>
<InventoryQuantity>100</InventoryQuantity>
.
.
</INVENTORY>
</xml>
Collection2 - Item[Contains 1 million Item documents]
URI: /item/item1
<xml>
<Item>
<ItemName>10</ItemName>
<ItemWmos>
<UnitPrice>895</UnitPrice>
.
.
<ItemWmos>
.
.
</Item>
</xml>
For each Inventory in the Inventory collection
Step 1 : Get "ItemName", its "InventoryQuantity" and for the same "ItemName" find "UnitPrice" from Item document in Item collection.
Step 2 : CurrentInventoryValue = InventoryQuantity * UnitPrice
Step 3 : TotalInventoryValue = TotalInventoryValue + CurrentInventoryValue;
Repeat;
I have achieved the above report requirement using XQuery below. However its taking lot of time to execute and showing timeout exception. I am not able to leverage Path Range Index here as I have to get "ItemName" and its "InventoryQuantity" together and search for the unit price in a different collection document having the same item name.
XQuery :
sum(for $doc in cts:search(doc(), cts:and-query((cts:collection-query("Inventory"))))
[(fn:string-length(//INVENTORY/ItemName/text()) > 0) and (fn:string-length(//INVENTORY/InventoryQuantity/text()) > 0)]
let $itemName := $doc//INVENTORY/ItemName/text()
let $inventoryQuantity := $doc//INVENTORY/InventoryQuantity/text()
return (
for $doc in cts:search(doc(), cts:and-query((cts:collection-query("Item"))))[//Item/ItemName/text()=$itemName
and (fn:string-length(//ItemWmos/UnitPrice/text()) > 0)]
return ($inventoryQuantity * $doc//ItemWmos/UnitPrice/text())
))
How do we accomplish such complex query requirement in MarkLogic in efficient way ?