I am trying to run a live (long-running) Spark session from Spring Boot. My aim is to run Spark in YARN mode from within the Spring Boot application.
- I would like to have only one jar file as the artifact, and I do not want to package the Spark dependencies separately.
- Apart from the code below, do I need to add any configuration? When I run it, it always tries to connect to localhost instead of the actual host. (RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 20/01/23 20:14:14 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032)
- Is any separate configuration required to capture the worker (executor) logs along with the driver logs?

    SparkConf conf = new SparkConf()
        .set("spark.driver.extraJavaOptions", "Dlog4j.configuration=file://src/main/resources/log4j.properties")
        .set("spark.executor.extraJavaOptions", "Dlog4j.configuration=file://src/main/resources/log4j.properties")
        .set("yarn.resoursemanager.address", "http://my-yarn-host")
        .set("spark.yarn.jars", "BOOT-INF/lib/spark-*.jar")
        .setAppName("NG-Workbench")
        .setMaster("yarn");

    JavaSparkContext sc = new JavaSparkContext(conf);

    List<String> word = new ArrayList<>();
    word.add("Sidd");

    JavaRDD<String> words = sc.parallelize(Arrays.asList("Michel", "Steve"));
    Map<String, Long> wordCounts = words.countByValue();
    wordCounts.forEach((k, v) -> System.out.println(k + " " + v));

    sc.close();
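
A note on the 0.0.0.0:8032 retries, offered as a sketch rather than a confirmed fix: 0.0.0.0 is the default value of YARN's yarn.resourcemanager.hostname, which suggests the ResourceManager setting in the code above is not reaching the Hadoop configuration. Spark copies properties prefixed with spark.hadoop. into the Hadoop Configuration, so the key would probably need that prefix (and the spelling "resourcemanager"). Java system properties also need a leading -D, which the Dlog4j... options above lack. The helper below only collects such settings as plain key/value pairs. The class YarnSparkSettings and the method build are hypothetical names, not part of Spark, and the log4j URI is an assumed placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class YarnSparkSettings {

    // Hypothetical helper (not a Spark API): gathers the settings as plain
    // key/value pairs so they can later be copied onto a SparkConf.
    public static Map<String, String> build(String resourceManagerHost, String log4jUri) {
        Map<String, String> settings = new LinkedHashMap<>();

        // "spark.hadoop." properties are forwarded by Spark to the Hadoop
        // Configuration; yarn.resourcemanager.hostname takes a bare host name.
        settings.put("spark.hadoop.yarn.resourcemanager.hostname", resourceManagerHost);

        // JVM system properties are passed with a leading "-D".
        String log4jOption = "-Dlog4j.configuration=" + log4jUri;
        settings.put("spark.driver.extraJavaOptions", log4jOption);
        settings.put("spark.executor.extraJavaOptions", log4jOption);

        return settings;
    }

    public static void main(String[] args) {
        // "my-yarn-host" is the host from the question; the URI is an assumption.
        build("my-yarn-host", "file:log4j.properties")
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting map could be applied with settings.forEach((k, v) -> conf.set(k, v)) before creating the JavaSparkContext. Two caveats: in client mode the driver JVM is already running when the SparkConf is built, so spark.driver.extraJavaOptions set this way may have no effect on the driver, and a path under src/main/resources is unlikely to exist on the executor nodes unless the file is shipped to them.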