1
votes

I have about 200MB of data stored in a S3 document collection (the highest one). Each document is about 300KB in size.

But when I ran my query, I am surprised to see that it requires 7245.81 RUs. Because S3 is at 2500 RUs/Second, this performance won't scale for my application.

I just want to ask if I am doing anything wrong and if there is anything that I can do to improve it?

My query is like this:

SELECT item.Id,item.Priority, va.something, wa.something... 11 fields in total
FROM model.Item item 
JOIN va in item.Content.Children 
JOIN wa in va.Children 
WHERE item.State != 5

I am using "Range" as the index type. Would that impact the performance of this query?

1
The docs say "index type of Range enables range queries (using >, <, >=, <=, !=)", but you might want to try WHERE item.State > 5 OR item.State < 5 Also try switching it to a hash index. For a single value, I would think that a hash index would be as efficient or more than a range index. - Larry Maccherone

1 Answers

2
votes

Queries using != will require a scan, since all index entries have to be looked up to eliminate the values that are not equal to 5. This will lead to a high RUs. Whenever possible, please try rewriting the query as an equality or a range query.

  1. If the number of possible values of item.State is limited, e.g., 1-10, then you can enumerate the valid values using IN, i.e., item.State IN (1,2,3,4,6,7,8,9,10)
  2. If that's not possible, the please use ranges e.g., item.State > 5 OR item.State < 5 like suggested by Larry above.

Hope this helps. Please email me at arramac at microsoft dot com in case you'd like to discuss in more detail.