1
votes

I need to prepare some data to connect to tableau, and I'm struggling because the size of the data is too much for tableau to handle, so I'm looking for ideas to code this efficiently in SQL.

Setup:

  • I have 2 million users
  • There are 30 different categories, and each user can fall into many. For example:
    • User 1 - Category A, B and C
    • User 2 - Category F
    • User 3 - Category A, B

What I want:

  • Select three categories and assign priority 1, priority 2 and priority 3
  • These selection is not static, so today I may choose A, B, C but tomorrow those categories can be D, G, A

So if I have:

  • Priority 1: A
  • Priority 2: B
  • Priority 3: C

  • I want the number of users who fall into category A
  • I want the number of users who fall into category B AND are not in category A
  • I want the number of users who fall into category C AND are not in category A or B

My original idea was to create a table with one row per user and one yes/no column per category, and then aggregate, but still the size of the final table is too huge for tableau to handle.

Any ideas?

Update: My idea is to prepare a table with aggregated numbers and a few thousand rows max, so that it can be processed with tableau

1
Can you give us more info for the size? Can you even load 2M users? Cause you may need to update your tablaeu configuration and enviroment before anything elseA_kat
My idea is to prepare a table with aggregated numbers and a few thousand rows max, so that it can be processed with tableauSapehi

1 Answers

0
votes

You can assign each of the 30 categories a unique placeholder 1 to 30. Each user will be thereafter assigned a binary number of 30digits based on the categories he is falling in. This binary number can then be converted into decimal number the greatest of which can be 2^31-1 i.e. 10 digit number which can be stored without exp format.

Whenever you will have to see the categories user falling in that can be done by applying reverse conversion i.e. decimal to binary and thereafter to string with padding zeros on left side. From this string you can search places of 1s at desired place.

I think you can try this methodology.