I'm trying to write a DataFrame to a CSV file and push that CSV file to a remote machine. The Spark job runs on YARN in a Kerberized cluster.
Below is the error I get when the job tries to write the CSV file to the remote machine:
diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user=dev, access=WRITE, inode="/data/9/yarn/local/usercache/dev/appcache/application_1532962490515_15862/container_e05_1532962490515_15862_02_000001/tmp/spark_sftp_connection_temp178/_temporary/0":hdfs:hdfs:drwxr-xr-x
To write this CSV file, I'm using the following parameters in a method that writes the file over SFTP:
def writeToSFTP(df: DataFrame, path: String): Unit = {
  df.write
    .format("com.springml.spark.sftp")
    .option("host", "hostname.test.fr")
    .option("username", "test_hostname")
    .option("password", "toto")
    .option("fileType", "csv")
    .option("delimiter", ",")
    .save(path) // path is the target file on the remote SFTP server
}
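For completeness, this is how the method is called (a minimal sketch; the input DataFrame and the remote path below are placeholders, not my real values):

val df = spark.table("dev.my_table") // placeholder: any DataFrame built earlier in the job
writeToSFTP(df, "/home/test_hostname/export/data.csv") // placeholder remote path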
I'm using the Spark SFTP connector library, as described here: https://github.com/springml/spark-sftp
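From what I understand of the connector's README, it first stages the DataFrame as a file in a temporary directory (the spark_sftp_connection_temp178 directory in the error above) and only then uploads that file over SFTP. The README also lists a tempLocation option to control where that staging happens; here is the same write with it set, a sketch assuming the option behaves as documented (the staging path is a placeholder):

df.write
  .format("com.springml.spark.sftp")
  .option("host", "hostname.test.fr")
  .option("username", "test_hostname")
  .option("password", "toto")
  .option("fileType", "csv")
  .option("delimiter", ",")
  .option("tempLocation", "/tmp/sftp_staging") // assumption: a directory the YARN container user can write to
  .save(path)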
The script used to launch the job is:
#!/bin/bash
kinit -kt /home/spark/dev.keytab [email protected]
spark-submit --class fr.edf.dsp.launcher.LauncherInsertion \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 5g \
--executor-memory 5g \
--queue dev \
--files /home/spark/dev.keytab#user.keytab,\
/etc/krb5.conf#krb5.conf,\
/home/spark/jar/dev-application-SNAPSHOT.conf#app.conf \
--conf "spark.executor.extraJavaOptions=-Dapp.config.path=./app.conf -Djava.security.auth.login.config=./jaas.conf" \
--conf "spark.driver.extraJavaOptions=-Dapp.config.path=./app.conf -Djava.security.auth.login.config=./jaas.conf" \
/home/spark/jar/dev-SNAPSHOT.jar > /home/spark/out.log 2>&1&
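The jaas.conf referenced by the extraJavaOptions is not shown above; for context, it follows the standard JAAS / Krb5LoginModule layout, roughly like this (a minimal sketch: the entry name "Client" and the principal's realm are placeholders, not my real file):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./user.keytab"
  principal="dev@MY_REALM";
};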
The CSV files are not supposed to be written to HDFS: once the DataFrame is built, I try to send it straight to the remote machine. I suspect a Kerberos issue with the SFTP Spark connector: YARN can't contact a remote machine...
Any help is welcome, thanks.