launch spark/spring-boot job on yarn cluster with kerberos enable

Question

I want to use spring-boot and spark together in a Yarn cluster with kerberos enable. (I am sprint-boot newbie)

my prerequisite :

I can't use spark-submit, the application is launch this way :
```
java -jar <my_jar>
```
I build the jar with spring-boot-maven-plugin.

Here is my simplified code :

@SpringBootApplication public class A implements ApplicationRunner {

    @Autowired
    B b;

public static void main(String[] args) {
        SpringApplication.run(A.class, args);
    }

    @Override
    public void run(ApplicationArguments args) {           
        b.run();    
    }    
}

my B class:

@Component
public class B{
 public void run() {
  SparkSession ss = createSparkSession();
  Dataset<Row> csv = readCsvFromHDFS()
  //  business logic here
  writeCsvToHdfs();
 }
}

This work well on localhost with master set to local[*], the main problem is when i try to set the sparkSession master to Yarn. My idea was to pass all parameters from spark-submit to my spark session to avoid using spark-submit. My sparkSession is created with this way :

SparkSession.builder()
.master("yarn")
.appName("appName")
.config("HADOOP_CONF_DIR", "/usr/hdp/current/hadoop-client/conf")
.config("SPARK_CONF_DIR", "/usr/hdp/current/spark2-client/conf")
.config("spark.driver.cores", "5")
.config("spark.driver.memory", "1g")
.config("spark.executor.memory", "1g")
.config("spark.logConf", "true")
.config("spark.submit.deployMode", "client")
.config("spark.executor.cores", "5")
.config("spark.hadoop.yarn.resourcemanager.address", "XXXX:8050")
.config("spark.hadoop.yarn.resourcemanager.hostname", "XXXX")
.config("spark.hadoop.security.authentication", "kerberos")
.config("hadoop.security.authorization","true")
.getOrCreate()

at the moment my error is :

java.lang.IllegalStateException: Failed to execute ApplicationRunner
...

Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]

my kerberos ticket is valid before launching application.

I think that my core-site, hdfs-site, yan-site ... is ignored because the SparkSession should be able to retrieve the information needed by himself.

I try to export it but it change nothing :

export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export HADOOP_HOME=/usr/hdp/current/hadoop-client/
export SPARK_HOME=/usr/hdp/current/spark2-client

There is a better way to use spark + spring-boot + yarn + kerberos together and that respect my prerequisite ?

my version :
Java 8
HDP : 2.6.4
Spark : 2.3.2
Spring-boot : 2.3.0.RELEASE

andreoss andreoss · Accepted Answer · 2020-06-24T14:05:30

There are several options to solve that

You can handle keytab explicitly with UserGroupInformation and run your main code in privileged context

private String principal;
private File keytab;

public UserGroupInformation ugi() {
    final org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
    conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, "Kerberos");
    UserGroupInformation.setConfiguration(conf);
    return UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab.getAbsolutePath());
}

And then

ugi.doAs(() -> {
    // Start Spring context here
});

Doing spark-submit from java

You can locate your jar via reflection and submit it through deploy.SparkSubmit class, with provided keytab and principal.

You can embed a script which will handle submit inside your jar with embeddedLaunchScript in spring-boot-maven-plugin Note: you would have to start it with ./app.jar not java -jar app.jar

<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<executions>
  <execution>
    <goals>
      <goal>repackage</goal>
      <goal>build-info</goal>
    </goals>
  </execution>
</executions>
<configuration>
  <mainClass>your.Main</mainClass>
  <embeddedLaunchScript>src/main/sh/spark-submit.sh</embeddedLaunchScript>
</configuration>
</plugin>

Where spark-submit.sh is your own implementation of spark-submit similar to 2)

launch spark/spring-boot job on yarn cluster with kerberos enable

1 Answers