
I am using a Spark Databricks cluster and want to add a customized Spark configuration.
There is Databricks documentation on this, but I am not getting any clue about how and what changes I should make. Can someone please share an example of how to configure a Databricks cluster?
Is there any way to see the default Spark configuration for a Databricks cluster?


1 Answer


To fine-tune Spark jobs, you can provide custom Spark configuration properties in a cluster configuration.

  1. On the cluster configuration page, click the Advanced Options toggle.
  2. Click the Spark tab and enter the configuration properties in the Spark config box, one key-value pair per line.


Alternatively:

When you configure a cluster using the Clusters API, set Spark properties in the spark_conf field in the Create cluster request or Edit cluster request.
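
For example, a sketch of the JSON body for a Create cluster request with custom Spark properties (the cluster name, Databricks runtime version, and node type below are only placeholder values):

{
  "cluster_name": "my-custom-config-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "spark_conf": {
    "spark.executor.memory": "4g",
    "spark.sql.sources.partitionOverwriteMode": "DYNAMIC"
  }
}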

To set Spark properties for all clusters, create a global init script:

%scala
// Write a global init script that adds custom Spark driver defaults on cluster startup.
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh","""
  |#!/bin/bash
  |
  |# Create a conf file that the Spark driver reads when the cluster starts.
  |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
  |[driver] {
  |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
  |}
  |EOF
  """.stripMargin, true)

Reference: Databricks - Spark Configuration

Example: You can pick any Spark configuration property you want to test. Here I want to specify spark.executor.memory 4g, and the custom configuration looks like this:

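In the Spark config box this is just a single line, with the property name and value separated by a space (4g is only an illustrative value):

spark.executor.memory 4g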

After the cluster is created, you can check the result of the custom configuration.
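
For example, from a notebook attached to the cluster you can read back the effective value of the property set above (a quick check, assuming the spark.executor.memory 4g setting from this example):

%scala
// spark is the SparkSession that Databricks pre-creates in notebooks
spark.conf.get("spark.executor.memory")  // returns "4g" if the custom config was applied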

Hope this helps.