0
votes

Let's say I have three tables: t1 (a fact table with about 1 billion rows), t2 (an empty table, 0 rows), and t0 (a dimension table). All of them have properly collected statistics. In addition, there is a view v0:

REPLACE VIEW v0
AS SELECT * from t1
   union
   SELECT * from t2;

Let's look at these three queries:

1) Select * from t1 inner t0 join on t1.id = t0.id; -- Optimizer correctly estimates 1 bln rows

2) Select * from t2 inner t0 join on t1.id = t0.id; -- Optimizer correctly estimates 0 rows

3) Select * from v0 inner t0 join on v0.id = t0.id;  -- Optimizer locks t1 and t2 for read, then correctly estimates that it will get 1 bln rows from t1, but for no clear reason estimates the same number, 1 bln, from table t2.

What is going on here? Is it a bug or a feature?

PS. The original query, which is too big to show here, didn't finish in 35 minutes. After leaving just t1, it finished successfully in 15 minutes.

  • TD Release: 15.10.03.07

  • TD Version: 15.10.03.09

1
You are aware that the JOIN syntax is completely wrong? - David דודו Markovitz
The optimizer never estimates 0 rows - David דודו Markovitz
@DuduMarkovitz 0 or 1, it's not a big difference - Rocketq
(1) For the optimizer it is (2) It is an indication that you didn't describe the execution plan correctly - David דודו Markovitz
You're right, but it doesn't matter ) - Rocketq

1 Answer

3
votes

It's not the same number for the 2nd Select; it's the overall number of rows in spool after the 2nd Select, which is 1 billion plus 0.

And your query was running slowly because you used a UNION, which defaults to DISTINCT; running this on a billion rows is really expensive.

Better switch to UNION ALL instead.
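
As a rough sketch of that change, the view from the question could be redefined as below. This assumes you don't rely on the view to remove duplicate rows within or across t1 and t2; if you do, UNION ALL will return them where UNION would not.

REPLACE VIEW v0
AS SELECT * from t1
   union all   -- keeps every row, so no DISTINCT step over the billion rows
   SELECT * from t2;

You can then prefix the query against v0 with EXPLAIN and compare the plan with the one you got before, to check whether the duplicate-elimination step on the spool is gone.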