I am storing user details in a longitudinal table with each attribute of a user corresponding to a separate row. There can obviously be multiple rows per user since the data is longitudinal.
I am trying to find all user details of a particular user based on the mobile number provided. I am using the following query:
select ws.*
from (select * from user_details) as vs
inner join
(select distinct ms.user_id
from (select <given mobile no> as Phone) as ls
inner join (select * from user_details) as ms on ls.Phone = ms.value
) as ws
on ws.user_id = vs.user_id
'key' column corresponds to the user attribute and 'value' corresponds to the value of that user attribute.
The sample table is:
|---------|--------|------------|---------------------|
| user_id | key | value | timestamp |
|---------|--------|------------|---------------------|
| 100 | mobile | 765783xxxx | 2018-09-09 13:40:00 |
| 100 | email | [email protected] | 2018-09-09 13:41:00 |
| 100 | name | johnny doe | 2018-09-09 13:42:00 |
| 101 | mobile | 456898xxxx | 2018-09-09 13:43:00 |
| 101 | email | [email protected] | 2018-09-09 13:44:00 |
| 101 | name | janey doe | 2018-09-09 13:45:00 |
|---------|--------|------------|---------------------|
I am first finding user_id based on given mobile number and joining it with the value column and then I want to find all rows corresponding to that user_id.
I have set up DISTKEY on user_id column and SORTKEY on timestamp column.
This table has close to 2 billion rows.
The cluster details are:
Cluster Properties:
- Cluster Type: Single Node
- Node Type: dc1.large
Capacity Details:
- Current Node Type: dc1.large
- CPU: 7 EC2 Compute Units (2 virtual cores) per node
- Memory: 15GiB per node
- Platform: 64-bit
The query with this configuration is currently taking 160 seconds to execute.
Appreciate any help in optimizing this query and reducing the run time.
DISTKEY
andSORTKEY
? – John Rotenstein