Fastest Query on RowKey

Question

Every entity in our tables has its Kind encoded as a prefix of its RowKey.
For example, in the User table:

PK: yahoo.com  
RK: U_user1       ----------- the kind is 'U', meaning User

PK: yahoo.com  
RK: U_user2  

PK: yahoo.com  
RK: U_user3  

PK: Store1  
RK: M_user4       ----------- the kind is 'M', meaning Merchant  

PK: Store1  
RK: M_user5

PK: Store1  
RK: M_user6  

PK: Store2  
RK: M_user7

To find all Users without knowing the exact PartitionKey, I query like this:

In Azure Storage Explorer:

RowKey gt 'U_' and RowKey lt 'V_'

In Linq:

var list = from e in dao.Table()
   where string.Compare(e.RowKey, "U_") > 0 && string.Compare(e.RowKey, "V_") < 0
   select e;
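The range 'U_' to 'V_' works here because every Kind is a single letter followed by '_', but strictly it also matches keys such as 'Uabc' and excludes a RowKey that is exactly 'U_'. A tighter way to express "starts with U_" is ge 'U_' and lt 'U`', where the upper bound is the prefix with its last character incremented ('`' is the character right after '_'). The sketch below computes that bound. `PrefixRange` is a hypothetical helper, not part of any Azure SDK, and it assumes the service orders these ASCII keys by character code (mirrored here with string.CompareOrdinal). Values containing a single quote would also need escaping before going into a filter string.

```csharp
using System;

// Hypothetical helper: turns "RowKey starts with <prefix>" into a
// half-open range [prefix, upper) that can be written with ge / lt.
public static class PrefixRange
{
    // Upper bound = prefix with its last character incremented,
    // e.g. "U_" -> "U`" because '`' (0x60) follows '_' (0x5F).
    // Assumes the last character is not char.MaxValue.
    public static string UpperBound(string prefix)
    {
        if (string.IsNullOrEmpty(prefix))
            throw new ArgumentException("Prefix must be non-empty.", nameof(prefix));
        char last = prefix[prefix.Length - 1];
        return prefix.Substring(0, prefix.Length - 1) + (char)(last + 1);
    }

    // Filter string in the same syntax as the Storage Explorer example.
    public static string RowKeyFilter(string prefix) =>
        $"RowKey ge '{prefix}' and RowKey lt '{UpperBound(prefix)}'";

    // In-memory check with the same semantics, using ordinal comparison
    // (assumed to match how the service compares these ASCII keys).
    public static bool InRange(string rowKey, string prefix) =>
        string.CompareOrdinal(rowKey, prefix) >= 0 &&
        string.CompareOrdinal(rowKey, UpperBound(prefix)) < 0;
}
```

With this helper, PrefixRange.RowKeyFilter("U_") produces RowKey ge 'U_' and RowKey lt 'U`'. It narrows the range but does not change the cost discussed in this thread: without a PartitionKey, the query still has to look in every partition.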

My question is: will this still be fast as the number of records grows? Or should I put the Kind in the PartitionKey instead? Doing that would not be easy, though.

This article says:
http://blog.maartenballiauw.be/post/2012/10/08/What-PartitionKey-and-RowKey-are-for-in-Windows-Azure-Table-Storage.aspx

Less fast: querying on only RowKey. Doing this will give table storage no pointer on  
which partition to search in, resulting in a query that possibly spans multiple partitions,  
possibly multiple storage nodes as well. Within a partition, searching on RowKey is still  
pretty fast as it’s a unique index.

EDIT

I ran some tests on these two data sets:

PK: M_Sample  
RK: GUID  
500 records

And

PK: Sample  
RK: U_GUID  
500 records

With these queries:

PartitionKey gt 'M_' and PartitionKey lt 'N_'      --- 26 seconds  
RowKey gt 'U_' and RowKey lt 'V_'               ----- 36 seconds

This shows that I really must use the PartitionKey as the search key.

Gaurav Mantri · Accepted Answer · 2014-07-24T05:46:00

My question is: will this still be fast as the number of records grows? Or should I put the Kind in the PartitionKey instead? Doing that would not be easy, though.

No. Because your query does not specify a PartitionKey, it does a full table scan. For the fastest performance, you must include the PartitionKey in your queries.
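To make the difference concrete, here are three filter shapes side by side, written in the same syntax as the Storage Explorer filter in the question. `KeyFilters` is a hypothetical helper that only builds strings (no Azure SDK call is made); it doubles single quotes, which is how OData escapes them inside string literals. Only the first two name a partition, so the service can go straight to it instead of scanning the whole table.

```csharp
using System;

// Hypothetical helper that builds OData filter strings in the
// Storage Explorer syntax. It does not talk to Azure.
public static class KeyFilters
{
    // OData string literal: wrap in single quotes, double any embedded quote.
    static string Quote(string value) => "'" + value.Replace("'", "''") + "'";

    // Exact PartitionKey + exact RowKey: a point lookup.
    public static string Point(string partitionKey, string rowKey) =>
        $"PartitionKey eq {Quote(partitionKey)} and RowKey eq {Quote(rowKey)}";

    // Exact PartitionKey + RowKey range: only that one partition is read.
    public static string RangeInPartition(string partitionKey, string lower, string upper) =>
        $"PartitionKey eq {Quote(partitionKey)} and RowKey ge {Quote(lower)} and RowKey lt {Quote(upper)}";

    // RowKey range only (as in the question): no partition is named,
    // so every partition has to be examined.
    public static string RangeAcrossTable(string lower, string upper) =>
        $"RowKey ge {Quote(lower)} and RowKey lt {Quote(upper)}";
}
```

For example, KeyFilters.RangeInPartition("yahoo.com", "U_", "V_") yields PartitionKey eq 'yahoo.com' and RowKey ge 'U_' and RowKey lt 'V_', which reads only the yahoo.com partition.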

Not sure if this would help, but in our project we take a different approach. Using your example above, we store two records per user (in other words, we denormalize the data):

PartitionKey = yahoo.com; RowKey = U_user1
PartitionKey = U_user1; RowKey = yahoo.com

Depending on how we want to query users, we query using one of the two key combinations.
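The two-row approach can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the project's actual code: `DenormalizedUserStore` is hypothetical, and a dictionary of sorted dictionaries stands in for the table (PartitionKey → RowKey → payload). Each AddUser call writes the same data twice with the keys swapped, so either lookup reads a single partition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a table:
// PartitionKey -> (RowKey -> payload), RowKeys sorted ordinally.
public class DenormalizedUserStore
{
    private readonly Dictionary<string, SortedDictionary<string, string>> _partitions =
        new Dictionary<string, SortedDictionary<string, string>>();

    private void Put(string pk, string rk, string payload)
    {
        if (!_partitions.TryGetValue(pk, out var rows))
        {
            rows = new SortedDictionary<string, string>(StringComparer.Ordinal);
            _partitions[pk] = rows;
        }
        rows[rk] = payload;
    }

    // Store the same user twice, with the keys swapped.
    public void AddUser(string domain, string userKey, string payload)
    {
        Put(domain, userKey, payload);   // PK = yahoo.com, RK = U_user1
        Put(userKey, domain, payload);   // PK = U_user1,   RK = yahoo.com
    }

    // "All users of a domain": read the domain's partition only.
    public IEnumerable<string> UsersInDomain(string domain) =>
        _partitions.TryGetValue(domain, out var rows) ? rows.Keys : Enumerable.Empty<string>();

    // "Which domain(s) does this user belong to": read the user's partition only.
    public IEnumerable<string> DomainsOfUser(string userKey) =>
        _partitions.TryGetValue(userKey, out var rows) ? rows.Keys : Enumerable.Empty<string>();
}
```

The trade-off is extra storage and two writes per change. In Table storage an entity group transaction (batch) can only contain entities from a single partition, so the two rows here cannot be written atomically in one batch; the application has to handle a failure between the two writes.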
