2
votes

A business analyst in our company was complaining about a slow running dashboard in Tableau. It processed quite a bit of data in a custom query on BigQuery.

Therefore I suggested he perform the query once and store the output as a temporary table, and then work directly on that table in Tableau to speed things up. The temporary table contains just 36 million rows now (instead of 3 billion), so it should be quite zippy.

Unfortunately after some time we got into trouble again. After improving the dashboard, the query that Tableau generated runs for over 30 minutes and receives a billing tier of 6. I'm at a loss to see what's so complicated about it. There's some date comparison inside a COUNT statement, but that's it.

Can someone explain what happens and what we can improve? We can off course move more of the calculations to a run-once job, but it kind of defeats Tableau's purpose.

Here's the job ID: vex-production:job_EDKjXtsTbMMb0Knc6ReNjEPaECk

And this is the query that Tableau generated:

SELECT
  (CASE WHEN (TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) = TIMESTAMP(DATE(TIMESTAMP([month])))) THEN '2.New Members' WHEN ((TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) < TIMESTAMP(DATE(TIMESTAMP([month]))))
      AND (TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL)) THEN '3.Reactivated Visitors' WHEN ((TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) IS NULL)
      AND (TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL)) THEN '3.Reactivated Visitors' WHEN (CASE WHEN TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL
      OR TIMESTAMP(DATE(TIMESTAMP([month]))) IS NULL THEN INTEGER(NULL) ELSE INTEGER(( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([month])))) ) -( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([previous_month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([previous_month])))) )) END = 1) THEN '1.Repeated Visitors' WHEN (CASE WHEN TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL
      OR TIMESTAMP(DATE(TIMESTAMP([month]))) IS NULL THEN INTEGER(NULL) ELSE INTEGER(( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([month])))) ) -( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([previous_month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([previous_month])))) )) END > 1) THEN '3.Reactivated Visitors' ELSE 'Error' END) AS [Calculation_1030479950946398210],
  COUNT((CASE WHEN (((CASE WHEN (TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) = TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month])))))) THEN '2.New Members' WHEN ((TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) < TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))))
              AND (TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL)) THEN '3.Reactivated Visitors' WHEN ((TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) IS NULL)
              AND (TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL)) THEN '3.Reactivated Visitors' WHEN (CASE WHEN TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL
              OR TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))) IS NULL THEN INTEGER(NULL) ELSE INTEGER(( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month])))))) ) -( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([previous_month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([previous_month])))) )) END = 1) THEN '1.Repeated Visitors' WHEN (CASE WHEN TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL
              OR TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))) IS NULL THEN INTEGER(NULL) ELSE INTEGER(( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month])))))) ) -( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([previous_month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([previous_month])))) )) END > 1) THEN '3.Reactivated Visitors' ELSE 'Error' END) = '3.Reactivated Visitors')
        OR ((CASE WHEN (TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) = TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month])))))) THEN '2.New Members' WHEN ((TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) < TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))))
              AND (TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL)) THEN '3.Reactivated Visitors' WHEN ((TIMESTAMP(DATE(TIMESTAMP([subscription_month]))) IS NULL)
              AND (TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL)) THEN '3.Reactivated Visitors' WHEN (CASE WHEN TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL
              OR TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))) IS NULL THEN INTEGER(NULL) ELSE INTEGER(( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month])))))) ) -( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([previous_month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([previous_month])))) )) END = 1) THEN '1.Repeated Visitors' WHEN (CASE WHEN TIMESTAMP(DATE(TIMESTAMP([previous_month]))) IS NULL
              OR TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))) IS NULL THEN INTEGER(NULL) ELSE INTEGER(( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month]))))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP(DATE(TIMESTAMP([month])))))) ) -( (12 * YEAR(TIMESTAMP(DATE(TIMESTAMP([previous_month]))))) + MONTH(TIMESTAMP(DATE(TIMESTAMP([previous_month])))) )) END > 1) THEN '3.Reactivated Visitors' ELSE 'Error' END) = '2.New Members')) THEN [user_id] ELSE NULL END)) AS [TEMP_Calculation_1133499793803542545__4248242405__0_],
  COUNT([user_id]) AS [TEMP_Calculation_1133499793848455199__2157635628__0_],
  MONTH(TIMESTAMP(DATE(TIMESTAMP([month])))) AS [mn_month_ok],
  QUARTER(TIMESTAMP(DATE(TIMESTAMP([month])))) AS [qr_month_ok],
  YEAR(TIMESTAMP(DATE(TIMESTAMP([month])))) AS [yr_month_ok]
FROM
  [vex-custom:samyUS.visitors_analysis] [visitors_analysis]
WHERE
  ((TIMESTAMP(DATE(TIMESTAMP([month]))) >= TIMESTAMP('2011-01-01 00:00:00'))
    AND (TIMESTAMP(DATE(TIMESTAMP([month]))) <= TIMESTAMP('2016-09-01 00:00:00')))
GROUP BY
  1,
  4,
  5,
  6
1
can you give a value example of [month] field. and what data type is itMikhail Berlyant
I think you just found my problem ... it's of STRING type instead of TIMESTAMP. We should probably fix that first (facepalm)Wouter
I'm afraid it made only a marginal difference in query run time. So the problem remainsWouter

1 Answers

0
votes

We have investigated this query - it absolutely should not be expensive. There is an internal problem inside BigQuery that makes it so. We are working on the fix, and it is already partially applied to your table, so the query in question should take about 25 seconds now. As more code changes roll out into production, it should improve even more.