I am confused about configuring executor and driver memory in Spark 1.5.2.
My environment settings are as below:
3-node MapR cluster - each node: 256 GB memory, 16 CPUs
Hadoop 2.7.0
Spark 1.5.2 - Spark-on-Yarn
Input data:
460 GB Parquet-format table from Hive

I'm using spark-sql to query this table through the Hive context with Spark-on-YARN (roughly as shown below), but it's a lot slower than Hive itself, and I'm not sure I have the right memory configuration for Spark.
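For context, this is roughly how I submit the query. The database, table, and query here are placeholders, not my actual workload:

# Placeholder query against the Parquet-backed Hive table
spark-sql --master yarn-client \
  -e "SELECT some_col, COUNT(*) FROM my_db.my_parquet_table GROUP BY some_col"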
These are my configs:
export SPARK_DAEMON_MEMORY=1g
export SPARK_WORKER_MEMORY=88g
spark.executor.memory 2g
spark.logConf true
spark.eventLog.dir maprfs:///apps/spark
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.kryoserializer.buffer.max 1024m
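(The exports go in spark-env.sh and the properties in spark-defaults.conf.) The same per-job memory settings could also be passed explicitly on the command line, for example:

# Equivalent per-job overrides (query is again a placeholder)
spark-sql --master yarn-client \
  --driver-memory 5g \
  --executor-memory 2g \
  --conf spark.kryoserializer.buffer.max=1024m \
  -e "SELECT some_col, COUNT(*) FROM my_db.my_parquet_table GROUP BY some_col"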
How can I avoid Spark's java.lang.OutOfMemoryError: Java heap space and GC overhead limit exceeded exceptions?
I'd really appreciate your assistance with this!