I am migrating a non-partitioned Cloudant database to Cloudant's new partitioned databases to reduce the cost of my IBM Cloud account. The context can be summarized as follows:

  • I am dealing with email objects, each of which has a category name
  • I might receive more than 100 new entries (emails) per day
  • The UI can query the emails from date A to date B and also by categories C1, C2, ... C100, in any combination of categories.
  • The UI displays only 15 emails/page

The question is how to partition such a data model so as to avoid, as much as possible, global (cross-partition) queries, which are far more costly than partition-based queries.

My first thought was to partition per day. However, I can end up in a situation where the query filters emails on a specific category Cn over 4 months, but that category receives only 1 email per day. To display one page in the UI (15 emails), I would then need 15 queries, which is not acceptable.
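To make that arithmetic concrete, here is a rough sketch of the estimate. The function name is hypothetical, and it assumes the matching emails are spread evenly over the days and that each day partition costs one query:

```python
import math


def partition_queries_for_page(page_size: int, matches_per_day: float) -> int:
    """Estimate how many per-day partition queries are needed to fill one page.

    Assumes one query per day partition and an even spread of matching
    emails across days (an illustrative simplification).
    """
    if matches_per_day <= 0:
        raise ValueError("matches_per_day must be positive")
    return math.ceil(page_size / matches_per_day)


# The case described above: a page of 15 emails, 1 matching email per day.
queries = partition_queries_for_page(page_size=15, matches_per_day=1)
```

With a busy category (say 100 matching emails per day) the same estimate drops to a single query, so the cost of per-day partitioning depends heavily on how sparse the filtered categories are.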

Before partitioning arrived, I was just doing global queries with the Lucene query engine, but that is no longer an option because of the cost.

I also considered putting all the emails in a single partition, so that I could use the same old query within that partition; since it is a partition, I would pay the partition query price rather than the global query price. That works in theory, but I guess it has limits, since the documentation about partitions recommends not putting "too much data" in a single partition.

Do you have any recommendation for this kind of problem?

Thanks.

1 Answer

Given your design, it doesn't seem to me like there is a partition key that will allow you to avoid global queries completely. As a rule of thumb, pick a partition key that would allow you to retrieve all data that make up a logical grouping. For example, imagine an order system where you have a set of customers with associated orders -- the obvious partition key would be a unique customer id: you then have a logical grouping of all data associated with each customer.
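As a minimal sketch of the customer/order example: in a partitioned Cloudant database the document `_id` has the form `<partition key>:<document key>`, and partition-scoped queries go through a `/_partition/<partition key>/` path. The helper names, database name, and IDs below are made up for illustration:

```python
from urllib.parse import quote


def partitioned_id(partition_key: str, doc_key: str) -> str:
    """Build a document _id of the form '<partition key>:<document key>'."""
    # The partition key is everything before the first colon,
    # so it must be non-empty and must not contain a colon itself.
    if not partition_key or ":" in partition_key:
        raise ValueError("partition key must be non-empty and contain no ':'")
    return f"{partition_key}:{doc_key}"


def partition_find_path(db: str, partition_key: str) -> str:
    """Path of the partition-scoped _find endpoint for one partition."""
    return f"/{quote(db, safe='')}/_partition/{quote(partition_key, safe='')}/_find"


# Hypothetical customer and order identifiers.
order_id = partitioned_id("customer42", "order-0001")
find_path = partition_find_path("orders", "customer42")
```

Every order for `customer42` then lives in the same partition, so listing that customer's orders is a single partition query rather than a global one.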

Over on the Cloudant blog, there is a good article series on partitions:

https://blog.cloudant.com/2019/03/05/Partition-Databases-Data-Design.html