Normally, there is a spark-defaults.conf file in /etc/spark/conf after I create a Spark cluster on EMR. If I provide no custom configuration, spark-defaults.conf is sitting happily in the conf directory:
[hadoop@ip-x-x-x-x ~]$ ls -la /etc/spark/conf/
total 64
drwxr-xr-x 2 root root 4096 Oct 4 08:08 .
drwxr-xr-x 3 root root 4096 Oct 4 07:41 ..
-rw-r--r-- 1 root root 987 Jul 26 21:56 docker.properties.template
-rw-r--r-- 1 root root 1105 Jul 26 21:56 fairscheduler.xml.template
-rw-r--r-- 1 root root 2373 Oct 4 07:42 hive-site.xml
-rw-r--r-- 1 root root 2024 Oct 4 07:42 log4j.properties
-rw-r--r-- 1 root root 2025 Jul 26 21:56 log4j.properties.template
-rw-r--r-- 1 root root 7239 Oct 4 07:42 metrics.properties
-rw-r--r-- 1 root root 7239 Jul 26 21:56 metrics.properties.template
-rw-r--r-- 1 root root 865 Jul 26 21:56 slaves.template
-rw-r--r-- 1 root root 2680 Oct 4 08:08 spark-defaults.conf
-rw-r--r-- 1 root root 1292 Jul 26 21:56 spark-defaults.conf.template
-rwxr-xr-x 1 root root 1563 Oct 4 07:42 spark-env.sh
-rwxr-xr-x 1 root root 3861 Jul 26 21:56 spark-env.sh.template
Following the instructions from http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html, I'm trying to add a JAR to the driver and executor extraClassPath properties:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": ":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar",
      "spark.executor.extraClassPath": ":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar"
    },
    "Configurations": []
  }
]
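For reference, a configuration file like this can be supplied at cluster creation time, e.g. via the AWS CLI's --configurations option. A rough sketch (the cluster name, release label, instance settings, key pair, and file name below are placeholders):

aws emr create-cluster \
  --name "spark-classpath-test" \
  --release-label emr-5.0.0 \
  --applications Name=Spark \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --configurations file://./spark-classpath.json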
I don't see any errors during cluster creation, but when I add this config the spark-defaults.conf file never appears. Here's an ls showing that the file does not exist on the filesystem:
[hadoop@ip-x-x-x-x ~]$ ls -la /etc/spark/conf/
total 64
drwxr-xr-x 2 root root 4096 Oct 4 08:08 .
drwxr-xr-x 3 root root 4096 Oct 4 07:41 ..
-rw-r--r-- 1 root root 987 Jul 26 21:56 docker.properties.template
-rw-r--r-- 1 root root 1105 Jul 26 21:56 fairscheduler.xml.template
-rw-r--r-- 1 root root 2373 Oct 4 07:42 hive-site.xml
-rw-r--r-- 1 root root 2024 Oct 4 07:42 log4j.properties
-rw-r--r-- 1 root root 2025 Jul 26 21:56 log4j.properties.template
-rw-r--r-- 1 root root 7239 Oct 4 07:42 metrics.properties
-rw-r--r-- 1 root root 7239 Jul 26 21:56 metrics.properties.template
-rw-r--r-- 1 root root 865 Jul 26 21:56 slaves.template
-rw-r--r-- 1 root root 1292 Jul 26 21:56 spark-defaults.conf.template
-rwxr-xr-x 1 root root 1563 Oct 4 07:42 spark-env.sh
-rwxr-xr-x 1 root root 3861 Jul 26 21:56 spark-env.sh.template
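Independent of whether the file exists on disk, the configuration that a running session actually picked up can be queried from spark-shell; a minimal check (output omitted):

[hadoop@ip-x-x-x-x ~]$ spark-shell
scala> sc.getConf.getOption("spark.driver.extraClassPath")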
What am I doing wrong?