I have the following Azure Storage Table.
PositionData table:
PartitionKey: ClientID + VehicleID
RowKey: GUID
Properties: ClientID, VehicleID, DriverID, Date, GPSPosition
Each vehicle will log up to 1,000,000 entities per year per client. Each client could have thousands of vehicles. So, I decided to partition by ClientID + VehicleID so as to have small, manageable partitions. When querying by ClientID and VehicleID, the operation performs quickly because we are narrowing the search down to one partition.
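To make the fast case concrete, here is a minimal sketch of how the composite key and the matching filter could be built. The underscore separator, the string IDs and both function names are assumptions for illustration; the post does not say how the two IDs are actually concatenated.

```python
def make_partition_key(client_id: str, vehicle_id: str) -> str:
    # Assumed format "<ClientID>_<VehicleID>"; the separator is illustrative only.
    return f"{client_id}_{vehicle_id}"


def single_partition_filter(client_id: str, vehicle_id: str) -> str:
    # OData string literals escape a single quote by doubling it.
    key = make_partition_key(client_id, vehicle_id).replace("'", "''")
    # An exact match on PartitionKey confines the query to one partition.
    return f"PartitionKey eq '{key}'"


print(single_partition_filter("client42", "vehicle7"))
# prints: PartitionKey eq 'client42_vehicle7'
```

The resulting string is the kind of filter expression a Table service query accepts; because it pins PartitionKey exactly, only one partition has to be read.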
PROBLEM:
The problem here is that sometimes I need to query on only ClientID and DriverID. Because it's not possible to perform partial PartitionKey comparisons, every single partition will need to be scanned. This will kill performance.
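The difference between the two access patterns can be sketched with an in-memory stand-in for the table. This is a toy model, not the Table service itself: the dictionary, the sample IDs and the function names are all hypothetical, and it only counts how many partitions each query has to touch.

```python
from collections import defaultdict

# Toy stand-in for the table: PartitionKey -> list of entities.
table = defaultdict(list)
for vehicle, driver in [("v1", "d1"), ("v2", "d2"), ("v3", "d1")]:
    table[f"c1_{vehicle}"].append(
        {"ClientID": "c1", "VehicleID": vehicle, "DriverID": driver}
    )


def query_by_vehicle(client_id, vehicle_id):
    # The full PartitionKey is known, so exactly one partition is read.
    return table.get(f"{client_id}_{vehicle_id}", []), 1


def query_by_driver(client_id, driver_id):
    # DriverID is not part of the key, so every partition must be inspected.
    # The equivalent filter would be: ClientID eq 'c1' and DriverID eq 'd1'
    hits, partitions_read = [], 0
    for entities in table.values():
        partitions_read += 1
        hits += [
            e for e in entities
            if e["ClientID"] == client_id and e["DriverID"] == driver_id
        ]
    return hits, partitions_read


print(query_by_vehicle("c1", "v1")[1])  # prints: 1
print(query_by_driver("c1", "d1")[1])   # prints: 3
```

With thousands of vehicles per client, the second count grows with the number of partitions rather than with the number of matching rows.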
I can't have a PartitionKey made up of all three of ClientID, VehicleID and DriverID, because queries will only ever filter on VehicleID OR DriverID, never both.
SOLUTION 1:
I considered storing a value elsewhere that represents a VehicleID and DriverID pair, and then using ClientID + VehicleDriverPairID as the PartitionKey, but that would result in hundreds of thousands of partitions, and my code would have to union a lot of data across partitions.
SOLUTION 2:
Have one partition for ClientID + VehicleID and another partition for ClientID + DriverID. This means that updating the table is twice as much work (two writes per entity) and the data is stored redundantly, but both queries will be fast.
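A minimal sketch of the dual-write idea, assuming both copies share the same RowKey and that a "V"/"D" marker is embedded in the key so a vehicle ID and a driver ID can never collide. The marker, the separator, the function name and the sample values are all hypothetical; only the property names come from the table definition above.

```python
import uuid


def build_entities(client_id, vehicle_id, driver_id, date, gps_position):
    # One logical position record becomes two physical entities.
    row_key = str(uuid.uuid4())  # RowKey is a GUID, as in the table definition
    props = {
        "ClientID": client_id,
        "VehicleID": vehicle_id,
        "DriverID": driver_id,
        "Date": date,
        "GPSPosition": gps_position,
    }
    # Assumed key layout: "<ClientID>_V_<VehicleID>" and "<ClientID>_D_<DriverID>".
    by_vehicle = {"PartitionKey": f"{client_id}_V_{vehicle_id}", "RowKey": row_key, **props}
    by_driver = {"PartitionKey": f"{client_id}_D_{driver_id}", "RowKey": row_key, **props}
    return by_vehicle, by_driver


v_copy, d_copy = build_entities("c1", "v7", "d3", "2024-01-01T00:00:00Z", "51.5,-0.12")
print(v_copy["PartitionKey"])  # prints: c1_V_v7
print(d_copy["PartitionKey"])  # prints: c1_D_d3
```

Each copy would then be inserted separately. Since the two entities sit in different partitions, they cannot share a single entity group transaction, so the code has to cope with one of the two writes failing.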
Do any of these solutions sound viable? Are there other solutions?