I need to prepare some data to connect to tableau, and I'm struggling because the size of the data is too much for tableau to handle, so I'm looking for ideas to code this efficiently in SQL.
Setup:
- I have 2 million users
- There are 30 different categories, and each user can fall into many. For example:
- User 1 - Category A, B and C
- User 2 - Category F
- User 3 - Category A, B
What I want:
- Select three categories and assign priority 1, priority 2 and priority 3
- These selection is not static, so today I may choose A, B, C but tomorrow those categories can be D, G, A
So if I have:
- Priority 1: A
- Priority 2: B
- Priority 3: C
- I want the number of users who fall into category A
- I want the number of users who fall into category B AND are not in category A
- I want the number of users who fall into category C AND are not in category A or B
My original idea was to create a table with one row per user and one yes/no column per category, and then aggregate, but still the size of the final table is too huge for tableau to handle.
Any ideas?
Update: My idea is to prepare a table with aggregated numbers and a few thousand rows max, so that it can be processed with tableau