I want to experiment with using Azure table storage as a replacement for Azure SQL. Our application(s) are typically write-heavy (think event registration) and SQL has long been a performance bottleneck.
However, a typical entry may have duplicate restrictions on more than one set of columns. For example, a new entry cannot have a duplicate email address, but it may also not have a duplicate mailing address. I cannot use a combined key for this, since each must be checked for duplicates separately. In SQL, I typically create a unique index on email address, and then another unique index on the physical address columns. This serves my purpose from a data integrity perspective, but again, performance is the issue.
Were I to use Azure storage, I'd probably use the event ID + an hourly timestamp (UTC) as the partition key, and a unique ID for each user's partition key, e.g.:
Partition Key Row Key <columns...>
event101_201402130500 <user's unique ID> <user data...>
I thought to include the hourly timestamp in the partition key just to make aggregate reporting easier later. Each hour I could use a batch job to request all entities for the previous hour's partition key and perform whatever aggregates are necessary, storing the result in another table for further aggregation and reporting.
But how do I prevent duplicates on the other columns? I thought about using a hash of the user's email + mailing address, but then, if either one changes, uniqueness for each individual part is not enforced, since the hash would be unique.
I thought about using a separate table to track duplicates per column, something like:
Partition Key Row Key User Key
event101 email_<user's email> <user's unique ID>
event101 addr_<hash of NACSZ> <user's unique ID>
I would insert the duplicate entities within a single batch transaction, and if the transaction failed (because of a duplicate PK/RK), I would simply avoid inserting the user row. This should work, as long as the batch is <100 entities and <4MB (according to the ATS docs)
Overall, this seems like it would do the trick, but given my lack of experience with table storage in this manner, I want to be sure I'm not overlooking something or over-thinking the solution.
So my actual question here is: Is ATS really this simple and is my solution adequate, or should I consider other (as yet unknown to me) options?