1
votes

Learning about Azure Storage Tables.
For a table of people with addresses and other fields, what would be the best way to set up the PartitionKey and RowKey to optimize a query like this?

SELECT * FROM {table} 
WHERE
((FIRST_NAME LIKE '%{Name}%'
 OR LAST_NAME LIKE '%{Name}%'
 OR NICK_NAME LIKE '%{Name}%')
AND
 (CITY LIKE '%{Loc}%'
  OR STATE LIKE '{Loc}'
  OR ZIP LIKE '%{Loc}%'))

I am looking for a way to store a large amount of data and be able to query it quickly while keeping the cost as low as possible. I've been looking at the storage table and CosmosDB tables. Pricing wise, it looks like CosmosDB could get expensive for very large tables.

Regarding the query, could I put "City", "State", "Zip" as PartitionKey and "First_Name", "Last_Name", and "Nick_Name" as RowKey. Ex: " Los Angeles CA 90045" as partitionKey and "John Doe JDoe" as RowKey. Will I be able to do a PartitionKey search that contains Los Angeles and KeySearch for John quickly?

1
There would be no best way for these kind of queries. It is not optimized for this. Consider another database like a document database. Why do you want to use azure table storage for this as this is in essence a key/value pair based store.Peter Bons
@PeterBons - the OP tagged this as azure-cosmosdb, which implies they're using the Premium Table support of Cosmos DB, and not Azure Table Storage. If that's the case, every property would be indexed and performance wouldn't be an issue working with non pk/rk properties. Hopefully the OP can clarify.David Makogon
CChang: Assuming you're working with Cosmos DB's Table Storage API, the partition key is going to effectively be your partition key of the underlying Cosmos DB storage, so you'll need to decide what's best for your data distribution.David Makogon
Thanks for the replies. I am looking at a way to store a large amount of data and be able to query it quickly while keeping the cost as low as possible. I've been looking at the storage table and CosmosDB table api. Pricing wise, it looks like for very large tables, CosmosDB could get expensive.CChang
@CChang Any update? If you feel my answer is useful /helpful.Please mark it as an answer so that other folks could benefit from it.Brando Zhang

1 Answers

0
votes

As far as I know, Azure Storage Tables is aimed at high capacity on a single region (optional secondary read only region but no failover), indexing by PK/RK and storage-optimized pricing.

Azure Cosmos DB Tables aims for high throughput (Single-digit millisecond latency for reads and writes, backed with <10 ms latency reads and <15 ms latency writes at the 99th percentile, at any scale, anywhere in the world), global distribution (multiple failover), SLA-backed predictive performance with automatic indexing of each attribute/property and a pricing model focused on throughput.

Pricing:

Azure table:

LRS First 1 TB / Month $0.07 per GB

Next 49 TB (1 to 50 TB) / Month $0.065 per GB

Access Prices:We charge $0.00036 per 10,000 transactions for Tables. Any type of operation against the storage is counted as a transaction, including reads, writes and deletes.

Details see this article

Azure cosmos table api:

SSD Storage (per GB) $0.25 GB/mo

Reserved RUs/second (per 100 RUs, 400 RUs minimum) $0.008/hr

Details see this article

You could see the cosmosdb is more expensive then the table storage for data store.

So if you have a lot of data, I suggest you could choose table storage.

If you want to keep your data query fast, I suggest you could choose cosmosdb.

Regarding the query, could I put "City", "State", "Zip" as PartitionKey and "First_Name", "Last_Name", and "Nick_Name" as RowKey. Ex: " Los Angeles CA 90045" as partitionKey and "John Doe JDoe" as RowKey. Will I be able to do a PartitionKey search that contains Los Angeles and KeySearch for John quickly?

As far as I know, Specify both PartitionKey and RowKey in your queries.

Point queries such as these are the most efficient table service queries. About how to design the table, you could refer to this article.

So if you set the "City_State_Zip" as PartitionKey, "First_Name_Last_Name_Nick_Name" as rowkey, this will be point query. It will be most efficient table service query.

Notice: The table query doesn't support like keyword, if you want to use the point query, you need send the right partitionkey and rowkey to query.