0
votes

The Spark Thrift Server tries to load the full dataset into memory before transmitting it over JDBC; on the JDBC client I receive this error:

SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)

The query is select * from table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to the Hadoop cluster through Spark SQL over a JDBC connection, but if the Thrift Server has to load the full dataset into memory before transmission, this approach will not work.


2 Answers

3
votes

Solution: set spark.sql.thriftServer.incrementalCollect=true. With this property enabled, the Thrift Server collects results one partition at a time instead of gathering the whole result set on the driver.
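
A minimal sketch of how the property might be passed when launching the Thrift Server (the sbin path is an assumption about a standard Spark layout; the script forwards spark-submit options such as --conf):

  # start the Thrift Server with incremental result collection enabled
  sbin/start-thriftserver.sh \
    --conf spark.sql.thriftServer.incrementalCollect=true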

1
votes

In your situation, increase the Spark driver memory and the maximum result size, i.e. spark.driver.memory=xG and spark.driver.maxResultSize=xG, according to https://spark.apache.org/docs/latest/configuration.html
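
A sketch of how these settings could be supplied when starting the Thrift Server (xG is a placeholder for a size that fits your results; the script path assumes a standard Spark installation):

  # raise driver heap and the serialized-result limit at launch
  sbin/start-thriftserver.sh \
    --driver-memory xG \
    --conf spark.driver.maxResultSize=xG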