I want to use Spring Boot and Spark together on a YARN cluster with Kerberos enabled. (I am a Spring Boot newbie.)
My prerequisite:
I can't use spark-submit; the application is launched this way:
java -jar <my_jar>
I build the jar with the spring-boot-maven-plugin.
Here is my simplified code:
@SpringBootApplication
public class A implements ApplicationRunner {

    @Autowired
    B b;

    public static void main(String[] args) {
        SpringApplication.run(A.class, args);
    }

    @Override
    public void run(ApplicationArguments args) {
        b.run();
    }
}
My B class:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.springframework.stereotype.Component;

@Component
public class B {

    public void run() {
        SparkSession ss = createSparkSession();
        Dataset<Row> csv = readCsvFromHDFS();
        // business logic here
        writeCsvToHdfs();
    }

    // createSparkSession(), readCsvFromHDFS() and writeCsvToHdfs() are private
    // helpers omitted for brevity; createSparkSession() is shown below
}
This works well on localhost with the master set to local[*]; the main problem appears when I try to set the SparkSession master to yarn.
My idea was to pass all the parameters that spark-submit would normally provide directly to my SparkSession, so that I can avoid using spark-submit.
My SparkSession is created this way:
SparkSession.builder()
.master("yarn")
.appName("appName")
.config("HADOOP_CONF_DIR", "/usr/hdp/current/hadoop-client/conf")
.config("SPARK_CONF_DIR", "/usr/hdp/current/spark2-client/conf")
.config("spark.driver.cores", "5")
.config("spark.driver.memory", "1g")
.config("spark.executor.memory", "1g")
.config("spark.logConf", "true")
.config("spark.submit.deployMode", "client")
.config("spark.executor.cores", "5")
.config("spark.hadoop.yarn.resourcemanager.address", "XXXX:8050")
.config("spark.hadoop.yarn.resourcemanager.hostname", "XXXX")
.config("spark.hadoop.security.authentication", "kerberos")
.config("hadoop.security.authorization","true")
.getOrCreate()
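One thing I am not sure about is whether the Hadoop security keys need their full names under the spark.hadoop. prefix (as far as I understand, Spark copies spark.hadoop.* entries into its Hadoop Configuration with the prefix stripped). An untested variant of the last two .config lines above would be:

SparkSession.builder()
        .master("yarn")
        // ... same settings as above ...
        .config("spark.hadoop.hadoop.security.authentication", "kerberos")
        .config("spark.hadoop.hadoop.security.authorization", "true")
        .getOrCreate()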
With the configuration above, the error I currently get is:
java.lang.IllegalStateException: Failed to execute ApplicationRunner
...
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
My Kerberos ticket is valid before launching the application.
I think my core-site.xml, hdfs-site.xml, yarn-site.xml, etc. are being ignored, since the SparkSession should be able to retrieve the information it needs from them on its own.
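What I was thinking of trying next is to load the cluster XMLs into a Hadoop Configuration explicitly and log in through UserGroupInformation before building the SparkSession. This is only an untested sketch (the principal and keytab path are placeholders; the conf paths are the ones from my cluster):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

private void loginToKerberos() throws IOException {
    // load the cluster XMLs explicitly instead of relying on HADOOP_CONF_DIR
    Configuration conf = new Configuration();
    conf.addResource(new Path("/usr/hdp/current/hadoop-client/conf/core-site.xml"));
    conf.addResource(new Path("/usr/hdp/current/hadoop-client/conf/hdfs-site.xml"));
    conf.addResource(new Path("/usr/hdp/current/hadoop-client/conf/yarn-site.xml"));

    UserGroupInformation.setConfiguration(conf);
    // placeholder principal and keytab; alternatively the kinit ticket cache
    // could be reused with UserGroupInformation.loginUserFromSubject(null)
    UserGroupInformation.loginUserFromKeytab("myuser@MY.REALM",
            "/etc/security/keytabs/myuser.keytab");
}

I would call this from B.run() before createSparkSession(), but I don't know if that is the right way to authenticate when spark-submit is not involved.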
I also tried exporting the following environment variables, but it changed nothing:
- export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
- export HADOOP_HOME=/usr/hdp/current/hadoop-client/
- export SPARK_HOME=/usr/hdp/current/spark2-client
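To check whether those files are actually picked up, I thought about printing what the session resolves (untested sketch; createSparkSession() is the helper shown above):

SparkSession ss = createSparkSession();
org.apache.hadoop.conf.Configuration hc = ss.sparkContext().hadoopConfiguration();
// if this prints "simple" instead of "kerberos", the *-site.xml files were not loaded
System.out.println("hadoop.security.authentication = " + hc.get("hadoop.security.authentication"));
System.out.println("fs.defaultFS = " + hc.get("fs.defaultFS"));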
Is there a better way to use Spark + Spring Boot + YARN + Kerberos together that respects my prerequisite?
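For example, would explicitly giving the session a principal and keytab be the right direction? Untested sketch (the principal and keytab path are placeholders, and I am not sure spark.yarn.principal / spark.yarn.keytab apply when the application is not launched through spark-submit):

SparkSession ss = SparkSession.builder()
        .master("yarn")
        .appName("appName")
        .config("spark.submit.deployMode", "client")
        // placeholders -- these would normally be passed to spark-submit
        .config("spark.yarn.principal", "myuser@MY.REALM")
        .config("spark.yarn.keytab", "/etc/security/keytabs/myuser.keytab")
        .getOrCreate();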
My versions:
Java 8
HDP : 2.6.4
Spark : 2.3.2
Spring-boot : 2.3.0.RELEASE