2
votes

Say I have 2k partitions, i.e. 2k distinct partition keys. Each partition has 3 GUID row keys.

To illustrate:

Partition 1 - Guid 1(rowkey) - Guid 2(rowkey) - Guid 3(rowkey)

Partition 2 - Guid 4(rowkey) - Guid 5(rowkey) - Guid 6(rowkey)

.... etc etc.

If I were to query for an exact GUID across all partitions, what sort of query performance would I be looking at: a direct retrieval or a table scan?

More background info. I intend to have the following schema:

UserEntity
Partition Key - User Guid
Row Key - Username

OpenIdEntity
Partition Key - User Guid (Same as UserEntity)
Row Key - OpenId

Now, when a user logs in, I need to 1) find the OpenID (select the record with one distinct row key, regardless of partition), and 2) find the username (select the record with one distinct partition key, scanning that partition for a property or similar; since the partition key is known and the partition is small, the impact of that scan should be minimal).

My concern is that step 1 will be slow if Azure Table Storage scans the entire table to find one distinct row key.

Thanks in advance.

1 Answer

5
votes

Your concern is warranted. A query of the form "all entities with RowKey X" will result in a full table scan.
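To make the difference concrete, here is a rough sketch of the two filter shapes written as OData filter strings. The helper names are made up for illustration; only the `PartitionKey` and `RowKey` property names and the `eq`/`and` operators come from the table service itself.

```python
def quote(value):
    """Wrap a value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def row_key_only_filter(row_key):
    """Filter on RowKey alone: no partition is named, so every partition is scanned."""
    return f"RowKey eq {quote(row_key)}"


def point_query_filter(partition_key, row_key):
    """Filter on both keys: identifies a single entity, so it can be a direct lookup."""
    return f"PartitionKey eq {quote(partition_key)} and RowKey eq {quote(row_key)}"


scan_filter = row_key_only_filter("openid-b")
lookup_filter = point_query_filter("user-2", "openid-b")
```

Only the second form gives the service both keys, which is what lets it skip the scan.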

If you know the set of partition keys you're using, you could issue n parallel queries (one for each partition). E.g., "all entities with PartitionKey 1 and RowKey X," "all entities with PartitionKey 2 and RowKey X," etc. Issuing these in parallel would mean you're doing n direct lookups, which will generally be much faster than a table scan.
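A minimal sketch of that fan-out, assuming the partition keys are already known. The `TABLE` dict and `point_lookup` function are hypothetical stand-ins for the real table and a single point query against it; in a real client you would replace `point_lookup` with whatever call your SDK provides for fetching one entity by partition key and row key.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the table: (PartitionKey, RowKey) -> entity.
TABLE = {
    ("user-1", "openid-a"): {"PartitionKey": "user-1", "RowKey": "openid-a"},
    ("user-2", "openid-b"): {"PartitionKey": "user-2", "RowKey": "openid-b"},
    ("user-3", "openid-c"): {"PartitionKey": "user-3", "RowKey": "openid-c"},
}


def point_lookup(partition_key, row_key):
    """Stand-in for one point query; returns the entity, or None if absent."""
    return TABLE.get((partition_key, row_key))


def find_by_row_key(partition_keys, row_key, max_workers=16):
    """Run one point lookup per known partition key in parallel and keep the hits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda pk: point_lookup(pk, row_key), partition_keys)
        return [entity for entity in results if entity is not None]


matches = find_by_row_key(["user-1", "user-2", "user-3"], "openid-b")
```

With 2k partitions this means 2k small requests per login, so it is worth capping `max_workers` and measuring before relying on it.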