Tableau executing all datasource joins with Spark

Question

I have a 2 tableau (v10.1.1) worksheets - 5 tables - 5 left outer joins in datasource - everything is same in both worksheets - Just that, one runs on SQL Sever (2012) and the other runs on Spark (v1.6).

The one on SQl Server runs ONLY those joins which are being referenced in the worksheet visualization. However, the Spark worksheet is executing all the 5 joins?

Bit surprised I am - Same tables, same model, same worksheet but different data-source generating different query.

Best Regards
Dev

Alex Blakemore Alex Blakemore · Accepted Answer · 2017-03-21T15:10:20

There is a setting called Assume Referential Integrity on each data source that tells Tableau whether or not you want it to avoid joining in unreferenced tables - an optimization known as join culling. Tableau gives you the option because it is possible the optimization would change the results in some cases.

I believe those problem edge cases only arise if your data violates normal referential integrity constraints, say by having a destination table that does not have a matching primary key for a foreign key in the source table. If your database does not have that problem, either because it is enforced by constraints, ETL processes, application logic etc, then checking the box allows Tableau to safely generate more efficient SQL.

Check whether the setting is the same in both data sources, under the data menu. I believe the default is false - which is the conservative but slower choice.

If both data sources have the same setting, then you may have found a bug in one of the drivers, probably the newer Spark SQL driver.

Tableau executing all datasource joins with Spark

1 Answers