
The GCP docs say:

Because Cloud Bigtable tables are sparse, you can create as many column qualifiers as you need in each row. There is no space penalty for empty cells in a row. As a result, it often makes sense to treat column qualifiers as data. For example, if your table is storing user posts, you could use the unique identifier for each post as the column qualifier.

https://cloud.google.com/bigtable/docs/schema-design#column_families

Can anyone help me with an example? If I have 1M users and each user makes 1,000 posts, does it make sense to have 1B column qualifiers (1M * 1,000)?

Thanks!


1 Answer


There are a couple of constraints that are relevant here:

  1. There is a hard limit of 256 MB per row
  2. A row cannot be split across different nodes, which prevents parallelization

You would want to avoid storing data from multiple users in a single row, so you wouldn't put 1B posts in one row. However, 1M rows, each with 1,000 qualifiers, should be fine. You can think of the column qualifiers as keys in a hash map: unlike SQL columns or Bigtable column families, the qualifiers in one row are completely unrelated to the qualifiers in any other row.
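
Here is a minimal sketch of that layout using the Python client (google-cloud-bigtable). The project, instance, and table names are placeholders, and it assumes a table that already exists with a single column family called "posts":

```python
from google.cloud import bigtable

# Placeholder names; substitute your own project, instance, and table.
client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("posts")

# One row per user: the row key is the user ID, and each post becomes
# a column qualifier inside the "posts" column family.
row = table.direct_row(b"user#000042")
row.set_cell("posts", b"post#0001", b"First post body")
row.set_cell("posts", b"post#0002", b"Second post body")
row.commit()

# Reading the row back returns all of that user's posts; the qualifiers
# behave like keys in a per-row hash map.
result = table.read_row(b"user#000042")
for qualifier, cells in result.cells["posts"].items():
    print(qualifier.decode(), "->", cells[0].value.decode())
```

With this design, fetching one user's posts is a single-row read, and because the table is sparse, a user with few posts pays no storage cost for the qualifiers they don't have.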