I have a Spark streaming application (Spark 2.4.4) running on AWS EMR 5.28.0. In the driver application on master node, besides setting up the spark streaming job, I am also running a http server (Akka-http 10.1.6) which can query the driver application for data, I bind to port 6161 like the following:
val bindingFuture: Future[ServerBinding] = Http().bindAndHandle(myapiroutes, "127.0.0.1", 6161)
try {
bindingFuture.map { serverBinding =>
log.info(s"AlertRestApi bound to ${serverBinding.localAddress}")
}
} catch {
case ex: Exception => {
log.error(s"Failed to bind to 127.0.0:6161")
system.terminate()
}
}
then I start spark streaming:
ssc.start()
When I test this on local spark, I am able to access http://localhost:6161/myapp/v1/data and get data from spark streaming, everything is good so far.
However, when I run this application in AWS EMR, I could not access port 6161. I ssh into the driver node and try to curl my url, it gives me error message:
[hadoop@ip-xxx-xx-xx-x ~]$ curl http://xxx.xx.xx.x:6161/myapp/v1/data
curl: (7) Failed to connect to xxx.xx.xx.x port 6161: Connection refused
when I look into the log in the driver node, I do see the port is bound (why the host shows 0:0:0:0:0:0:0:0? I don't know, that is the way in my dev testing, and it works, I see the same log and able to access the url):
20/04/13 16:53:26 INFO MyApp: MyRestApi bound to /0:0:0:0:0:0:0:0:6161
So my question is, what should I do so that I can access the api at port 6161 on the driver node? I realize Yarn resource manager may be involved but I know nothing about Yarn resource manager to point myself where to investigate.
Please help. Thanks