How do I choose a partition key in such a way that I can efficiently query all my documents for a given time period?
Background:
I'm building an analytics tool for a chat application using Azure Cosmos DB. I have a separate container that stores incoming and outgoing messages. A typical Message document looks like this:
{
"version": "v1",
"partition_key": "user_id",
"timestamp": "2020-01-30 14:02:32.402+00:00",
"type": "incoming_message",
"message": "hi there",
"sender": "sender_id",
"receiver": "receiver_id"
}
I have considered the following options for my partition key:
- User ID: With this approach, I can easily query all of a user's messages. However, filtering by time requires cross-partition queries, and their RU cost will be high, especially once the container holds thousands of documents.
- A date-specific value: Following the random-suffix pattern, I can use the date plus a random number as the partition key (e.g. 2018-08-09.1, 2018-08-09.2, and so on). However, with this approach I would have to pass hundreds of partition keys into an IN clause to run a query over a long time interval (e.g. the last 6 months).
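To make the cost of the second option concrete, here is a minimal Python sketch of the date-plus-random-suffix scheme. It assumes a hypothetical `SUFFIX_COUNT` of 10 suffixes per day (the number actually used isn't stated above) and assumes the documents' `partition_key` field holds the synthetic key. `write_partition_key`, `query_partition_keys` and `build_in_query` are illustrative helpers only, not part of any Cosmos DB SDK.

```python
import random
from datetime import date, datetime, timedelta
from typing import List

# Hypothetical: how many random suffixes are spread across each day.
SUFFIX_COUNT = 10


def write_partition_key(timestamp: str, suffix_count: int = SUFFIX_COUNT) -> str:
    """Build a synthetic '<date>.<n>' key for a new message document."""
    day = datetime.fromisoformat(timestamp).date()
    return f"{day.isoformat()}.{random.randint(1, suffix_count)}"


def query_partition_keys(start: date, end: date,
                         suffix_count: int = SUFFIX_COUNT) -> List[str]:
    """List every key a query must name to cover the days start..end inclusive."""
    keys = []
    day = start
    while day <= end:
        keys.extend(f"{day.isoformat()}.{n}" for n in range(1, suffix_count + 1))
        day += timedelta(days=1)
    return keys


def build_in_query(keys: List[str]) -> str:
    """Render the keys as an IN clause (keys contain only digits, '-' and '.')."""
    quoted = ", ".join(f"'{k}'" for k in keys)
    return f"SELECT * FROM c WHERE c.partition_key IN ({quoted})"
```

For the first half of 2020 (182 days), `query_partition_keys(date(2020, 1, 1), date(2020, 6, 30))` returns 182 × 10 = 1,820 keys, all of which would have to go into the IN clause; the list grows linearly with both the length of the time range and the number of suffixes per day.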
Do you have any recommendations for choosing a partition key that supports single-partition queries when filtering documents by time?