I am trying to run the following code on my local Mac, where a Spark standalone cluster with a master and workers is running:
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import scala.Tuple2;

public class WordCountTask {
    // Logger assumed to be SLF4J; any logging facade works the same way.
    private static final Logger LOGGER = LoggerFactory.getLogger(WordCountTask.class);

    public void run(String inputFilePath) {
        String master = "spark://192.168.1.199:7077";
        SparkConf conf = new SparkConf()
                .setAppName(WordCountTask.class.getName())
                .setMaster(master);
        JavaSparkContext context = new JavaSparkContext(conf);
        context.textFile(inputFilePath)
                .flatMap(text -> Arrays.asList(text.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b)
                .foreach(result -> LOGGER.info(
                        String.format("Word [%s] count [%d].", result._1(), result._2())));
    }
}
However, I get the following exception in the master console:
Error while invoking RpcHandler#receive() on RPC id 5655526795459682754 java.io.EOFException
and in the program console:
18/07/01 22:35:19 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 192.168.1.199:7077 org.apache.spark.SparkException: Exception thrown in awaitResult
This runs fine when I set the master to "local[*]", as given in this example.
I have seen examples where the jar is submitted with the spark-submit command, but I am trying to run it programmatically.
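When the application is started programmatically rather than through spark-submit, the executors do not automatically receive the application classes, so the compiled jar usually has to be registered on the SparkConf. A minimal sketch of that, assuming the project is packaged as target/wordcount.jar (a hypothetical path, not from the question):

// Sketch: register the application jar when not launching via spark-submit.
// The jar path is an assumption; point it at your own build output.
SparkConf conf = new SparkConf()
        .setAppName(WordCountTask.class.getName())
        .setMaster("spark://192.168.1.199:7077")
        .setJars(new String[] { "target/wordcount.jar" });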
spark://localhost:7077. If you're using a Docker image, ensure that all required ports are exposed by the container. – Bartosz Konieczny
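For this kind of connection failure it can also help to check whether the workers can reach back to the driver. A sketch of pinning the driver address and ports on the SparkConf so they can be exposed or forwarded (the IP and port numbers below are assumptions, not values from the question):

// Sketch: make the driver reachable from the cluster by fixing its address and ports.
// Placeholder values; they must match your actual network/Docker setup.
SparkConf conf = new SparkConf()
        .setAppName(WordCountTask.class.getName())
        .setMaster("spark://192.168.1.199:7077")
        .set("spark.driver.host", "192.168.1.100")   // address the workers can reach
        .set("spark.driver.port", "51000")            // fixed so it can be exposed/forwarded
        .set("spark.blockManager.port", "51001");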