
Normally there is a spark-defaults.conf file in /etc/spark/conf after I create a Spark cluster on EMR.

If I provide no custom configuration, spark-defaults.conf is sitting happily in the conf directory:

[hadoop@ip-x-x-x-x ~]$ ls -la /etc/spark/conf/
total 64
drwxr-xr-x 2 root root 4096 Oct  4 08:08 .
drwxr-xr-x 3 root root 4096 Oct  4 07:41 ..
-rw-r--r-- 1 root root  987 Jul 26 21:56 docker.properties.template
-rw-r--r-- 1 root root 1105 Jul 26 21:56 fairscheduler.xml.template
-rw-r--r-- 1 root root 2373 Oct  4 07:42 hive-site.xml
-rw-r--r-- 1 root root 2024 Oct  4 07:42 log4j.properties
-rw-r--r-- 1 root root 2025 Jul 26 21:56 log4j.properties.template
-rw-r--r-- 1 root root 7239 Oct  4 07:42 metrics.properties
-rw-r--r-- 1 root root 7239 Jul 26 21:56 metrics.properties.template
-rw-r--r-- 1 root root  865 Jul 26 21:56 slaves.template
-rw-r--r-- 1 root root 2680 Oct  4 08:08 spark-defaults.conf
-rw-r--r-- 1 root root 1292 Jul 26 21:56 spark-defaults.conf.template
-rwxr-xr-x 1 root root 1563 Oct  4 07:42 spark-env.sh
-rwxr-xr-x 1 root root 3861 Jul 26 21:56 spark-env.sh.template

Following the instructions at http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html, I'm trying to add a jar to the driver and executor extraClassPath properties:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": ":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar",
      "spark.executor.extraClassPath": ":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar"
    },
    "Configurations":[

    ]
  }
]
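For reference, a configuration like this is normally supplied when the cluster is created, for example via the AWS CLI's --configurations option. A rough sketch (the release label, instance settings, key name, and file path below are placeholders, not my actual setup):

aws emr create-cluster \
  --release-label emr-5.0.0 \
  --applications Name=Spark \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key \
  --configurations file://./spark-config.json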

I don't see any errors when the cluster is created, but the spark-defaults.conf file never appears when I add this configuration.

Here's an ls showing that the file does not exist on the filesystem:

[hadoop@ip-x-x-x-x ~]$ ls -la /etc/spark/conf/
total 64
drwxr-xr-x 2 root root 4096 Oct  4 08:08 .
drwxr-xr-x 3 root root 4096 Oct  4 07:41 ..
-rw-r--r-- 1 root root  987 Jul 26 21:56 docker.properties.template
-rw-r--r-- 1 root root 1105 Jul 26 21:56 fairscheduler.xml.template
-rw-r--r-- 1 root root 2373 Oct  4 07:42 hive-site.xml
-rw-r--r-- 1 root root 2024 Oct  4 07:42 log4j.properties
-rw-r--r-- 1 root root 2025 Jul 26 21:56 log4j.properties.template
-rw-r--r-- 1 root root 7239 Oct  4 07:42 metrics.properties
-rw-r--r-- 1 root root 7239 Jul 26 21:56 metrics.properties.template
-rw-r--r-- 1 root root  865 Jul 26 21:56 slaves.template
-rw-r--r-- 1 root root 1292 Jul 26 21:56 spark-defaults.conf.template
-rwxr-xr-x 1 root root 1563 Oct  4 07:42 spark-env.sh
-rwxr-xr-x 1 root root 3861 Jul 26 21:56 spark-env.sh.template

What am I doing wrong?

1 Answer

So I just tested this on EMR and the problem is that you have a : in front of your classpath specification:

"spark.driver.extraClassPath": ":/usr/lib/hadoop-lzo/...

needs to be

"spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/....

Note that AWS also puts its own entries on the classpath by setting extraClassPath, and whatever you specify in extraClassPath overwrites, rather than appends to, those defaults. In other words, you should make sure that your spark.driver.extraClassPath and spark.executor.extraClassPath include the entries that AWS puts there by default.
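As a minimal sketch of the corrected classification, keeping the same jar and the same AWS default paths as in the question, just without the leading colon:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar",
      "spark.executor.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar"
    }
  }
]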