1
votes

We've got a Windows Azure Table Storage system where various entity types report values during the day, so we've got the following partition and row key scheme:

There are about 4000-5000 entities. There are 6 entity types, roughly evenly distributed, so around 800 of each.

PartitionKey: entityType-Date

RowKey: entityId

Each row records the values for an entity for that particular day. This is currently JSON serialized.
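Under this scheme, building the keys for one entity's daily row might look like the following sketch (the `entity_type` and `entity_id` names are illustrative, not from the question):

```python
from datetime import date

def make_keys(entity_type, entity_id, day):
    # PartitionKey groups all entities of one type for one day;
    # RowKey identifies the entity within that partition.
    partition_key = f"{entity_type}-{day.isoformat()}"
    row_key = str(entity_id)
    return partition_key, row_key

pk, rk = make_keys("sensor", 42, date(2012, 6, 1))
# pk == "sensor-2012-06-01", rk == "42"
```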

The data is quite verbose.

We will periodically want to look back at the values in these partitions over a month or two months depending on what our website users want to look at.

We are having a problem in that if we want to query a month of data for one entity, we have to query 31 partition keys, filtering each by entityId.

This is very slow initially but after the first call the result is cached.

Unfortunately the nature of the site is that there will be a varying number of different queries so it's unlikely the data will benefit much from caching.

We could obviously make the partitions bigger, e.g. a whole week of data per partition, and expand the row keys to entityId-date.
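That weekly-partition variant could be sketched like so, keying the partition on the ISO year and week (a sketch under the same illustrative naming as above):

```python
from datetime import date

def make_weekly_keys(entity_type, entity_id, day):
    # Partition on the ISO year/week instead of the exact date, so a
    # month of data spans ~5 partitions instead of ~31.
    iso_year, iso_week, _ = day.isocalendar()
    partition_key = f"{entity_type}-{iso_year}W{iso_week:02d}"
    # The date moves into the RowKey so rows stay unique per day.
    row_key = f"{entity_id}-{day.isoformat()}"
    return partition_key, row_key

pk, rk = make_weekly_keys("sensor", 42, date(2012, 6, 1))
# pk == "sensor-2012W22", rk == "42-2012-06-01"
```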

What other options are open to me, or is it simply the case that Windows Azure tables suffer fairly high latency?

1
Not sure if this would work in your scenario, but the way we are handling it in our application is that we store the data twice. The second copy of the data has the PartitionKey and RowKey values reversed, i.e. the RowKey value becomes the PartitionKey and vice versa. This way, if we want to search for a particular EntityId, we go directly to that partition and search there. – Gaurav Mantri
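The dual-write idea from the comment could be sketched as follows: write each day's values as two entities, one under each key orientation (the field name `Data` and the helper are illustrative assumptions, not from the comment):

```python
from datetime import date

def dual_entities(entity_type, entity_id, day, payload_json):
    day_str = day.isoformat()
    primary = {
        "PartitionKey": f"{entity_type}-{day_str}",
        "RowKey": str(entity_id),
        "Data": payload_json,
    }
    # Second copy with PartitionKey and RowKey swapped, so lookups
    # by entityId hit a single partition instead of 31.
    mirrored = {
        "PartitionKey": str(entity_id),
        "RowKey": f"{entity_type}-{day_str}",
        "Data": payload_json,
    }
    return primary, mirrored
```

The cost is doubled storage and a second write per day, traded for single-partition reads by entity.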

1 Answer

2
votes

Some options include:

  1. Make the 31 queries in parallel

  2. Make a single query on a partition key range, that is

    Partition key >= entityType-StartDate and Partition key <= entityType-EndDate and Row key = entityId.

It is possible that depending on your data, this query may have less latency than your current query.
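Both options can be sketched in a few lines. The filter below assumes the Table service's OData `$filter` comparison syntax (`ge`/`le`/`eq`); the `query_one_day` callable in the parallel version is a stand-in for whatever point-query call your SDK exposes:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def range_filter(entity_type, entity_id, start, end):
    # Option 2: one server-side filter spanning the partition key range.
    return (
        f"PartitionKey ge '{entity_type}-{start.isoformat()}' "
        f"and PartitionKey le '{entity_type}-{end.isoformat()}' "
        f"and RowKey eq '{entity_id}'"
    )

def query_month_parallel(query_one_day, entity_type, entity_id, start, days=31):
    # Option 1: issue the per-day point queries concurrently.
    # query_one_day(partition_key, row_key) is assumed, not a real SDK call.
    keys = [
        (f"{entity_type}-{(start + timedelta(days=d)).isoformat()}", str(entity_id))
        for d in range(days)
    ]
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda k: query_one_day(*k), keys))
```

Note that the range query is a partition scan rather than a set of point lookups, so whether it beats 31 parallel point queries depends on how much data sits between the two partition-key bounds.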