0
votes

I have been running a query of the format below

SELECT b.date as Date,COUNT(DISTINCT user_id) AS NewUsers FROM (
SELECT user_id,MIN(date) as min_date
FROM tableA
WHERE date >= '2018-10-10'
AND filter1 = "XYZ"
GROUP BY ) a
CROSS JOIN (
SELECT date FROM tableB
WHERE date >= '2018-10-19' AND date <= CURRENT_DATE()
GROUP BY 1) b
WHERE a.date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.date <= b.date
GROUP BY 1

Let's say the above is result1

SELECT b.date as Date,COUNT(DISTINCT user_id) AS NewUsers FROM (
SELECT user_id,MIN(date) as min_date
FROM tableA
WHERE date >= '2018-07-10'
AND filter1 = "XYZ"
GROUP BY ) a
CROSS JOIN (
SELECT date FROM tableB
WHERE date >= '2018-07-19' AND date <= CURRENT_DATE()
GROUP BY 1) b
WHERE a.date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.date <= b.date
GROUP BY 1

The above is result2

Here 2018-07-19 is the launch date.

Since I have the data till 2018-10-19, I want to run the query from the later date to optimize the cost and the data consumption by the query....but some how, I am getting incorrect data.

But, if I run the same query from the launch date, I am getting the correct results.

I mean the NewUsers from result1 for the corresponding dates (like date >= 2018-10-19) are more than the NewUsers from result2.

No sure, where I am missing something.

Any help would be greatly appreciated.

Thanks

1

1 Answers

1
votes

I think - it is because of use of 'MIN(date)' - You see shift in counts because you restricted dates so those users who were first seen in earlier dates - now those same "old" users are counted for recent days - thus the confusion