Spark 2.2 introduced cost-based optimization (CBO, https://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html) which makes use of table statistics (as computed by ANALYZE TABLE COMPUTE STATISTICS....)
My question is: Are precomputed statistics also useful prior to Spark 2.2 (in my case 2.1) operating on (external hive) tables? Do statistics influence the optimizer? If yes, can I also compute the statistics in Impala instead of Hive?
UPDATE:
The only hint I have found so far is https://issues.apache.org/jira/browse/SPARK-15365
Apparently statistics are used to decide whether a broadcast-join is done are not