SELECT
    a.id,
    b.url AS codingurl
FROM fact_A a
INNER JOIN dim_B b
    ON strpos(a.url, b.url) > 0
- Record count in Fact_A: 2 million
- Record count in Dim_B: 1,500
- Time taken to execute: 10 minutes
- Number of nodes: 2
Could someone help me understand why the above query takes so long to execute?
We have declared a distribution key on Fact_A so that the records are distributed evenly across both nodes, and a sort key is defined on the URL column of Fact_A.
The Dim_B table is created with ALL distribution (DISTSTYLE ALL).
strpos is not an efficient way to join tables compared to normal comparisons (e.g. =, <, >=). – John Rotenstein
Can you give some examples of the a.url and b.url values that are being used with strpos? There might be a more efficient way to do the query. – John Rotenstein
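To make the point about strpos concrete: because the join condition is a function call rather than an equality, Redshift most likely cannot use a hash or merge join and falls back to a nested loop. With the row counts from the question, that would mean evaluating strpos for up to 2,000,000 × 1,500 = 3,000,000,000 row pairs. A minimal way to check this, assuming the table and column names from the question, is to inspect the query plan and look for a nested loop step:

```sql
-- Show the plan without running the query.
-- A "Nested Loop" step here would indicate that every row of fact_A
-- is being compared against every row of dim_B.
EXPLAIN
SELECT
    a.id,
    b.url AS codingurl
FROM fact_A a
INNER JOIN dim_B b
    ON strpos(a.url, b.url) > 0;
```

If the plan does show a nested loop, the sort key on a.url is unlikely to help, since strpos has to search inside each URL string rather than compare it against a sorted range.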