Azure Tables - Partition Key and Row Key - Correct Choice

Question

I am new to Azure Tables and have read a lot of articles, but I would like some reassurance on the above, given that it's fundamental.

I have data which is similar to this:

CustomerId, GUID
TripId, GUID
JourneyStep, GUID
Time, DateTime
AverageSpeed, int

Based on what I have read, is CustomerId a good PartitionKey? Where I get stuck is that the combination of CustomerId and TripId does not make a unique row. My justification for TripId as the RowKey is that every query will fetch a dataset based on CustomerId and TripId.

Just for context: the CustomerId is clearly unique, the TripId represents one journey in a vehicle, and within that journey each JourneyStep represents one unit of the trip. A trip may have 10 steps or 1000.

The intention is to aggregate the data into further tables, with each level being used for a different purpose. At the most aggregated level, the customer will be given some scores.

The amount of data will obviously be huge, so I need to think about query performance from the outset.

Updated:

As requested: the solution is for vehicle telematics, so think of yourself in your own car. A black box ships data to a server, which in turn passes it to Azure Tables. In relational DB terms, I would have a Customer table and a Trip table with a foreign key back to the Customer table.

The TripId is auto-generated by the black box. TripId does not need to be stored by date and time from a query point of view, but that may be relevant from a query performance point of view.

Queries will be split into two:

1. Display a map of a single journey for each customer: filter by customer and then by trip, then iterate over each row (JourneyStep) to plot it on a map.
2. Per customer, score each trip and then retrieve the trips for, let's say, the last month to aggregate a score. I do have a SQL Database to enrich the data with client records etc., but for the volume data (the trip data) I wish to use Azure Tables.

The aggregates from the second query will probably be stored in a separate table. If someone made 10 trips in one month, I would run the second query to score each trip, then produce a score for all trips that month, and store both results: potentially one table of trip aggregates and one table of monthly aggregates.

Gaurav, No - one customer will have multiple trips. The only unique combination would be CustomerId and Time but from a query point of view would rarely be used. 'the combination of CustomerId and TripId does not make a unique row' — Steve Newton
Then unfortunately you can't use TripId as RowKey. Within a Partition, RowKey has to be unique. — Gaurav Mantri
Ah, that is what I feared. If I use Time for the RowKey, which is unique, what is the best approach for query performance given that every query will include TripId? I could append something to the TripId to make a unique RowKey, but I would need to split them apart in every query. — Steve Newton

cilerler · Accepted Answer · 2013-12-26T13:59:37

Your design has to be driven by your queries. You can filter your data efficiently on two columns: PartitionKey and RowKey. PartitionKey is the most important one, since your queries will hit that column first.

In your case, CustomerId should be your PartitionKey, since most of the time you will reach your data by customer. (You may also need to keep another table for your client list.)

Now, RowKey can be your TripId or the time. If I were you, I would probably use a RowKey in the format yyyyMMddHHmm|tripId, which lets you query by time prefix (a startsWith-style match, expressed as a range comparison on RowKey) within a customer's partition.
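As a rough illustration of that key format, here is a minimal Python sketch that builds the RowKey and an OData-style filter string for one customer's rows in a given month. The helper names `make_row_key` and `month_range_filter` are hypothetical, and the sketch only builds strings; it does not call any Azure SDK. It also assumes minute resolution as in the format above, so if two steps of the same trip can fall within the same minute, a finer timestamp would be needed to keep RowKey unique within the partition.

```python
from datetime import datetime


def make_row_key(step_time: datetime, trip_id: str) -> str:
    """Build a RowKey in the yyyyMMddHHmm|tripId format (hypothetical helper)."""
    return f"{step_time.strftime('%Y%m%d%H%M')}|{trip_id}"


def month_range_filter(customer_id: str, year: int, month: int) -> str:
    """Build a filter matching every RowKey that starts with the given yyyyMM.

    The prefix match is expressed as a lexicographic range:
    RowKey >= 'yyyyMM' and RowKey < the yyyyMM of the following month.
    """
    start = f"{year:04d}{month:02d}"
    if month == 12:
        end = f"{year + 1:04d}01"
    else:
        end = f"{year:04d}{month + 1:02d}"
    return (
        f"PartitionKey eq '{customer_id}' "
        f"and RowKey ge '{start}' and RowKey lt '{end}'"
    )


if __name__ == "__main__":
    # Illustrative values only; real ids would be GUIDs.
    print(make_row_key(datetime(2013, 12, 26, 13, 59), "trip-1"))
    print(month_range_filter("customer-1", 2013, 12))
```

Because the timestamp comes first in the key, rows within a partition sort chronologically, which suits the "last month of trips" query. Filtering by TripId alone would then rely on a non-key comparison inside the customer's partition.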
