0
votes

The Spark Thrift Server tries to load the full dataset into memory before transmitting it over JDBC; on the JDBC client I receive this error:

SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)

The query is select * from table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to the Hadoop cluster through Spark SQL over a JDBC connection, but if the Thrift Server has to load the full dataset into memory before transmission, this approach will not work.


2 Answers

3
votes

Solution: set spark.sql.thriftServer.incrementalCollect=true. With this property enabled, the Thrift Server collects results one partition at a time instead of gathering the whole result set on the driver.
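
A minimal sketch of how the property might be passed when launching the Thrift Server (the sbin path is an assumption about a standard Spark layout; the script forwards spark-submit options such as --conf):

  # start the Thrift Server with incremental result collection enabled
  sbin/start-thriftserver.sh \
    --conf spark.sql.thriftServer.incrementalCollect=true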

1
votes

In your situation, increase the Spark driver memory and the maximum result size, i.e. spark.driver.memory=xG and spark.driver.maxResultSize=xG, according to https://spark.apache.org/docs/latest/configuration.html
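
A sketch of how these settings could be supplied when starting the Thrift Server (xG is a placeholder for a size that fits your results; the script path assumes a standard Spark installation):

  # raise driver heap and the serialized-result limit at launch
  sbin/start-thriftserver.sh \
    --driver-memory xG \
    --conf spark.driver.maxResultSize=xG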