I have a Hortonworks cluster running on AWS EC2 machines, on which I would like to run a Spark Streaming job that ingests tweets about Game of Thrones. Before trying to run it on my cluster, I ran it locally. The code works; here it is:
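import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter._

object Twitter_Stream extends App {
  val consumerKey = "hidden"
  val consumerSecret = "hidden"
  val accessToken = "hidden"
  val accessTokenSecret = "hidden"

  // With auth = None, TwitterUtils falls back to twitter4j's default
  // configuration, which reads the credentials from these system properties.
  System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
  System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
  System.setProperty("twitter4j.oauth.accessToken", accessToken)
  System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

  val sparkConf = new SparkConf().setAppName("GotTweets").setMaster("local[2]")
  val ssc = new StreamingContext(sparkConf, Seconds(1)) // 1-second batches

  val myStream = TwitterUtils.createStream(ssc, None, Array("#GoT", "#WinterIsHere", "#GameOfThrones"))

  // Print up to 10 tweets from each batch on the driver.
  myStream.foreachRDD { rdd =>
    rdd.take(10).foreach(println)
  }

  ssc.start()
  ssc.awaitTermination()
}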
My question is specifically about this line:
val sparkConf = new SparkConf().setAppName("GotTweets").setMaster("local[2]")
I replaced "local[2]" with "spark://ip-address-EC2:7077", which corresponds to one of my EC2 machines, but I get a connection failure. I'm sure that port 7077 is open on this machine.
Also, when I run this code with the original configuration (setMaster("local[2]")) on one of my EC2 machines, will Spark use all the machines in the cluster, or will it run only on a single machine?
Here is the exception:
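To make "the port is open" concrete, it can help to test reachability from the machine where the driver actually runs, since a port allowed in the security group may still be unreachable from there. The following is a minimal sketch of such a check using only the JDK; `PortCheck` and `canConnect` are hypothetical names introduced for illustration, and "ip-address-EC2" is the same placeholder used above. A successful connection only shows that something is listening on that port, not that it is the Spark master.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  // Any failure (refused, timed out, unknown host) is reported as false.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // "ip-address-EC2" is a placeholder; pass the real master host as the first argument.
    val host = args.headOption.getOrElse("ip-address-EC2")
    println(s"$host:7077 reachable: ${canConnect(host, 7077)}")
  }
}
```

Running it on the driver machine with the master's host name as the argument prints whether a TCP connection to port 7077 could be established from there.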
17/07/24 11:53:42 INFO AppClient$ClientEndpoint: Connecting to master spark://ip-adress:7077...
17/07/24 11:53:44 WARN AppClient$ClientEndpoint: Failed to connect to master ip-adress:7077
java.io.IOException: Failed to connect to spark://ip-adress:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)