0
votes

I am storing user details in a longitudinal table with each attribute of a user corresponding to a separate row. There can obviously be multiple rows per user since the data is longitudinal.

I am trying to find all user details of a particular user based on the mobile number provided. I am using the following query:

select ws.*
from (select * from user_details) as vs
inner join 
  (select distinct ms.user_id 
   from (select <given mobile no> as Phone) as ls
   inner join (select * from user_details) as ms on ls.Phone = ms.value
  ) as ws
on ws.user_id = vs.user_id

'key' column corresponds to the user attribute and 'value' corresponds to the value of that user attribute.

The sample table is:

|---------|--------|------------|---------------------|
| user_id |   key  |   value    |       timestamp     |
|---------|--------|------------|---------------------|
|  100    | mobile | 765783xxxx | 2018-09-09 13:40:00 |
|  100    |  email | [email protected] | 2018-09-09 13:41:00 |
|  100    |  name  | johnny doe | 2018-09-09 13:42:00 |
|  101    | mobile | 456898xxxx | 2018-09-09 13:43:00 |
|  101    | email  | [email protected] | 2018-09-09 13:44:00 |
|  101    |  name  | janey doe  | 2018-09-09 13:45:00 |
|---------|--------|------------|---------------------|

I am first finding user_id based on given mobile number and joining it with the value column and then I want to find all rows corresponding to that user_id.

I have set up DISTKEY on user_id column and SORTKEY on timestamp column.

This table has close to 2 billion rows.

The cluster details are:

Cluster Properties:

  • Cluster Type: Single Node
  • Node Type: dc1.large

Capacity Details:

  • Current Node Type: dc1.large
  • CPU: 7 EC2 Compute Units (2 virtual cores) per node
  • Memory: 15GiB per node
  • Platform: 64-bit

The query with this configuration is currently taking 160 seconds to execute.

Appreciate any help in optimizing this query and reducing the run time.

1
I don't think that query will even run. Please post your actual query, and format it as code by adding four or more spaces to each line.Tim Biegeleisen
Why that's too many nesting? Plan always goes bad when too many nesting work..dwir182
your query looks a little convoluted, please can you explain your requirements clearly. it will help if you provide example input and expected outputJon Scott
Let me know if more details are required.makeshift-programmer
What is the table's DISTKEY and SORTKEY?John Rotenstein

1 Answers

1
votes

It is very difficult to suggest improvements on a query without understanding the contents of the table nor what you are trying to achieve.

It is quite possible that it could be as simple as:

SELECT DISTINCT user_id
FROM user_details
WHERE value = <given mobile no>

The other part of optimizing a query in Amazon Redshift is to wisely use DISTKEY and SORTKEY. Once again, it is difficult to recommend suitable values without knowing the data and how it is used, but this query would benefit from value being the SORTKEY. (But that doesn't necessarily mean it is best choice for all usage of the table.)