6
votes

I am a total newbie with Azure! The purpose is to return the rows based on the timestamp stored in the RowKey. As there is a transaction cost with each query, I want to minimize the number of transactions/queries whilst maintain performance

These are the proposed Partition and Row Keys:

  • Partition Key: TextCache_(AccountID)_(ParentMessageId)
  • Row Key: (DateOfMessage)_(MessageId)

Legend:

  • AccountId - is an integer
  • ParentMessageId - The parent messageId if there is one, blank if it is the parent
  • DateOfMessage - Date the message was created - format will be DateTime.Ticks.ToString("d19")
  • MessageId - the unique Id of the message

I would like to get back from a single query the rows and any childrows that is > or < DateOfMessage_MessageId

Can this be done via my proposed PartitionKeys and RowKeys?

ie.. (in psuedo code)

var results = ctx.PartitionKey.StartsWith(TextCache_AccountId) 
   && ctx.RowKey > (TimeStamp)_MessageId

Secondly, if there I have a number of accounts, and only want to return back the first 10, could it be done via a single query

ie.. (in psuedo code)

var results = ( 
      ( 
        ctx.PartitionKey.StartsWith(TextCache_(AccountId1)) && 
            && ctx.RowKey > (TimeStamp1)_MessageId1 )
      )
      ||
      ( 
        ctx.PartitionKey.StartsWith(TextCache_(AccountId2)) && 
            && ctx.RowKey > (TimeStamp2)_MessageId2 )
      ) ... 
          )
         .Take(10)
1

1 Answers

5
votes

The short answer to your questions is yes, but there are some things you need to watch for.

Azure table storage doesn't have a direct equivalent of .StartsWith(). If you're using the storage library in combination with LINQ you can use .CompareTo() (> and < don't translate properly) which will mean that if you run a search for account 1 and you ask the query to return 1000 results, but there are only 600 results for account 1, the last 400 results will be for account 10 (the next account number lexically). So you'll need to be a bit smart about how you deal with your results.

If you padded out the account id with leading 0s you could do something like this (pseudo code here as well)

ctx.PartionKey > "TextCache_0000000001"
&& ctx.PartitionKey < "TextCache_0000000002"
&& ctx.RowKey > "123465798"

Something else to bear in mind is that queries to Azure Tables return their results in PartitionKey then RowKey order. So in your case messages without a ParentMessageId will be returned before messages with a ParentMessageId. If you're never going to query this table by ParentMessageId I'd move this to a property.

If TextCache_ is just a string constant, it's not adding anything by being included in the PartitionKey unless this will actually mean something to your code when it's returned.

While you're second query will run, I don't think it will produce what you're after. If you want the first ten rows in DateOfMessage order, then it won't work (see my point above about sort orders). If you ran this query as it is and account 1 had 11 messages it will return only the first 10 messages related to account 1 regardless if whether account 2 had an earlier message.

While trying to minimise the number of transactions you use is good practice, don't be too concerned about it. The cost of running your worker/web roles will dwarf your transaction costs. 1,000,000 transactions will cost you $1 which is less than the cost of running one small instance for 9 hours.