0
votes

I am storing millions of documents in cosmos db with proper partitionkey. I need to retrieve say 500,000 documents to do some calculations and display the output in UI , this should happen with in say 10 second. Would this be possible? I have tried this but taking nearly a minute. So for this kind of requirement is this the correct approach?

"id": "Latest_100_Sku1_1496188800",
    "PartitionKey": "Latest_100_Sku1
    "SnapshotType": 2,
    "AccountCode": "100",
    "SkuCode": "Sku1",
    "Date": "2017-05-31T00:00:00",
    "DateEpoch": 1496188800,
    "Body": "rVNBa4MwFP4v72xHElxbvYkbo4dBwXaX0UOw6ZRFIyaBFfG/7zlT0EkPrYUcku+9fO/7kvca"

Size of one document : 825 byte

Am using autoscale 4000 Throughput

Query statistics - am using 2 queries.

Query 1 - select * from c where c.id in ({ids}) here i use PartitionKey in Query options.

Query Statistics METRIC VALUE Request Charge 102.11 RUs Showing Results 1 - 100 Retrieved document count More information 200 Retrieved document size More information 221672 bytes Output document count More information 200 Output document size More information 221972 bytes Index hit document count More information 200 Index lookup time More information 17.0499 ms Document load time More information 1.59 ms Query engine execution time More information 0.3401 ms System function execution time More information 0.060000000000000005 ms User defined function execution time More information 0 ms Document write time More information 0.16 ms Round Trips 1

Query 2 -- select * from c where c.PartitionKey in ({keys}) and c.DateEpoch>={startDate.ToEpoch()} and c.DateEpoch<={endDate.ToEpoch()}

Query Statistics METRIC VALUE Request Charge 226.32 RUs Showing Results 1 - 100 Retrieved document count More information 200 Retrieved document size More information 176580 bytes Output document count More information 200 Output document size More information 176880 bytes Index hit document count More information 200 Index lookup time More information 88.31 ms Document load time More information 4.2399000000000004 ms Query engine execution time More information 0.4701 ms System function execution time More information 0.060000000000000005 ms User defined function execution time More information 0 ms Document write time More information 0.19 ms Round Trips 1

1
There is not enough information necessary to answer this question. Please include a sample document in your container, the query you are trying to run, the partition key for your container, the amount of RU/s provisioned and the output from the Data Explorer showing the query stats.Mark Brown
I have edited the details with all the things you asked.Cij

1 Answers

0
votes

Query #1 looks fine. Query #2 most likely would benefit from a composite index on DateEpoch. I'm not sure what the UDF is but if you're converting dates to epoch you want to read a new blog post New date and time system functions in Azure Cosmos DB

Overall, retrieving 500K documents in 1-2 queries to do some calculations seems like a strange use case. Typically most people will pre-calculate values and persist them using a materialized view pattern using change feed. Depending on how often you run these two queries, this is often a more efficient use of compute resources.