2
votes

I am designing 2 tables.

user_table (userid username) 100K records expected

transaction_table(transactionid, date, description, userid) 5 billion records expected. Nearly 50K transactions per user is expected.

transaction_table stores transactions done by users. This table rarely have any UPDATE or DELETE operations. There are significant SELECT and INSERT operations, but all are on the basis of userid. So, either I am inserting a user's transaction into the table or I am returning the transactions made by a particular user.

Does partitioning transaction_table make sense?

This is because, there will be 100K partition tables, each for 1 user. Is this good in any sense?

What other options do I have, to make INSERT and SELECT operations fast on transaction_table?

1
What queries are you expecting on transaction_table? Partitioning by userid makes sense if you need to retrieve ALL transactions for a specific user. For queries like "100 latest transactions of a user" an index (userid,date) should suffice.Egor Rogov
I am expecting queries like -- (1). 'top X(say 100) transactions of a user'. (2) 'next X transactions of a user' (3) 'all transactions from 1-JAN-2015 to 1-MAY-2015 of a user' etcAni

1 Answers

0
votes

Postgresql does not recommend to create many partitions:

All constraints on all partitions of the master table are examined during constraint exclusion, so large numbers of partitions are likely to increase query planning time considerably. Partitioning using these techniques will work well with up to perhaps a hundred partitions; don't try to use many thousands of partitions.

On my opinion partitions created by date (monthly, for instance) could be much better.

Another option is to create partitions by userid%100 but not per every user.