
MarkLogic version - 9.0-6.2

In our data-hub-FINAL database, every entity has a property called "TransactionRequestDtTm", so literally every document in the database has this property.

For one specific collection, I have a requirement to fetch documents whose "TransactionRequestDtTm" is greater than an input timestamp. I am thinking of using a range index on the TransactionRequestDtTm property, but based on my understanding, MarkLogic would pull all documents that have the TransactionRequestDtTm property into memory upon initialization. In my case, that would mean the entire database is pulled into memory.

Please correct my understanding if the actual behavior is different. Is there a way to indicate that the range index is required only on a specific collection (perhaps by using a different property name)? Please suggest!


1 Answer


Without a range index, MarkLogic would indeed need to read every document to check the timestamp. With careful code it could probably do that in a streaming fashion, so it would not exhaust your memory, but it would not be fast either.

A range index is pre-loaded in memory, but it does not hold entire documents, only the indexed values that occur in each document together with a reference to that document. It is the fastest way to find matching documents, and it will prevent you from running out of memory, provided you do not then try to fetch every matching document at once after the search.

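You cannot tell MarkLogic to put a range index on only a subset of documents, but you usually don't need to. If you want the intersection (documents that are in the collection and also satisfy the timestamp condition), just ask for results that match both criteria, for example by combining a collection query and a range query in an and-query. MarkLogic can resolve such matches from its indexes very fast.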
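To make the idea concrete, here is a toy model in plain JavaScript. It is only a sketch and not MarkLogic's actual storage format: it assumes the range index can be pictured as value/document-id pairs sorted by value, and the collection as a set of document ids. All names and sample data (`rangeIndex`, `collections`, `matchIds`, `doc1` and so on) are made up for illustration.

```javascript
'use strict';

// Toy range index: [value, docId] pairs kept sorted by value.
// Only values and ids live here, never the document bodies.
const rangeIndex = [
  ['2019-01-01T08:00:00Z', 'doc1'],
  ['2019-01-02T09:30:00Z', 'doc2'],
  ['2019-01-03T10:15:00Z', 'doc3'],
  ['2019-01-04T11:45:00Z', 'doc4'],
];

// Toy collection index: collection name -> set of document ids.
const collections = { orders: new Set(['doc2', 'doc4']) };

// Binary search for the first entry whose value is strictly greater than `since`.
// ISO 8601 timestamps in the same format and zone compare correctly as strings.
function firstGreater(index, since) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] > since) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return lo;
}

// Ids of documents in `collection` whose value is later than `since`.
function matchIds(collection, since) {
  const members = collections[collection] || new Set();
  return rangeIndex
    .slice(firstGreater(rangeIndex, since))
    .map(([, id]) => id)
    .filter((id) => members.has(id));
}
```

In this model the whole match is resolved from the two small structures; a document would only be read once its id is in the result and you actually ask for its content.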

Experiment with this using cts:search, and make sure to fetch only the first 10 documents. You will see that it is very fast.
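A minimal sketch of such a search in MarkLogic Server-Side JavaScript (`cts.search` is the JavaScript counterpart of `cts:search`). It rests on several assumptions that are not in the question: that TransactionRequestDtTm is stored as an XML element with no namespace, that a dateTime element range index has been configured on it, and that the function name `recentTransactions` and the collection name used below are purely hypothetical. For JSON documents, `cts.jsonPropertyRangeQuery` would take the place of `cts.elementRangeQuery`.

```javascript
'use strict';

// Hypothetical helper: returns up to `limit` documents from `collection`
// whose TransactionRequestDtTm is later than `since` (an ISO 8601 string).
// Assumes a dateTime element range index on TransactionRequestDtTm (no namespace).
// cts, xs and fn are globals provided by MarkLogic Server-Side JavaScript.
function recentTransactions(collection, since, limit) {
  const query = cts.andQuery([
    // Restrict the search to one collection.
    cts.collectionQuery(collection),
    // Resolved from the range index: value strictly greater than `since`.
    cts.elementRangeQuery(
      xs.QName('TransactionRequestDtTm'), '>', xs.dateTime(since)),
  ]);
  // Take only the first `limit` results instead of fetching every match.
  return fn.subsequence(cts.search(query), 1, limit);
}
```

For example, evaluating `recentTransactions('Orders', '2019-01-01T00:00:00Z', 10)` in Query Console against the data-hub-FINAL database would return at most 10 documents, where `'Orders'` stands in for your real collection name.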

HTH!