Optimizing AWS RedShift query

Question

I am storing user details in a longitudinal table with each attribute of a user corresponding to a separate row. There can obviously be multiple rows per user since the data is longitudinal.

I am trying to find all user details of a particular user based on the mobile number provided. I am using the following query:

select ws.*
from (select * from user_details) as vs
inner join 
  (select distinct ms.user_id 
   from (select <given mobile no> as Phone) as ls
   inner join (select * from user_details) as ms on ls.Phone = ms.value
  ) as ws
on ws.user_id = vs.user_id

'key' column corresponds to the user attribute and 'value' corresponds to the value of that user attribute.

The sample table is:

|---------|--------|------------|---------------------|
| user_id |   key  |   value    |       timestamp     |
|---------|--------|------------|---------------------|
|  100    | mobile | 765783xxxx | 2018-09-09 13:40:00 |
|  100    |  email | abc@te.com | 2018-09-09 13:41:00 |
|  100    |  name  | johnny doe | 2018-09-09 13:42:00 |
|  101    | mobile | 456898xxxx | 2018-09-09 13:43:00 |
|  101    | email  | hi@som.org | 2018-09-09 13:44:00 |
|  101    |  name  | janey doe  | 2018-09-09 13:45:00 |
|---------|--------|------------|---------------------|

I am first finding user_id based on given mobile number and joining it with the value column and then I want to find all rows corresponding to that user_id.

I have set up DISTKEY on user_id column and SORTKEY on timestamp column.

This table has close to 2 billion rows.

The cluster details are:

Cluster Properties:

Cluster Type: Single Node
Node Type: dc1.large

Capacity Details:

Current Node Type: dc1.large
CPU: 7 EC2 Compute Units (2 virtual cores) per node
Memory: 15GiB per node
Platform: 64-bit

The query with this configuration is currently taking 160 seconds to execute.

Appreciate any help in optimizing this query and reducing the run time.

I don't think that query will even run. Please post your actual query, and format it as code by adding four or more spaces to each line. — Tim Biegeleisen
Why that's too many nesting? Plan always goes bad when too many nesting work.. — dwir182
your query looks a little convoluted, please can you explain your requirements clearly. it will help if you provide example input and expected output — Jon Scott

John Rotenstein John Rotenstein · Accepted Answer · 2018-11-22T06:20:13

It is very difficult to suggest improvements on a query without understanding the contents of the table nor what you are trying to achieve.

It is quite possible that it could be as simple as:

SELECT DISTINCT user_id
FROM user_details
WHERE value = <given mobile no>

The other part of optimizing a query in Amazon Redshift is to wisely use DISTKEY and SORTKEY. Once again, it is difficult to recommend suitable values without knowing the data and how it is used, but this query would benefit from value being the SORTKEY. (But that doesn't necessarily mean it is best choice for all usage of the table.)

Optimizing AWS RedShift query

1 Answers