
I am following a class. I ran the code as shown in the class and I get the errors below. Any idea what I should do?

I have Spark 1.6.1 and Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74)

val datadir = "C:/Personal/V2Maestros/Courses/Big Data Analytics with Spark/Scala"

////   Building and saving the model

val tweetData = sc.textFile(datadir + "/movietweets.csv")

def convertToRDD(inStr : String) : (Double,String) = {
    val attList = inStr.split(",")
    val sentiment = attList(0).contains("positive") match {
            case  true => 0.0
            case  false    => 1.0
    }
    return (sentiment, attList(1))
}
val tweetText=tweetData.map(convertToRDD)

//val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
var ttDF = sqlContext.createDataFrame(tweetText).toDF("label","text")

The error is:

scala> ttDF.show()
[Stage 2:>                                                          (0 + 2) / 2]16/03/30 11:40:25 ERROR ExecutorClassLoader: Failed to check existence of class org.apache.spark.sql.catalyst.expressio
REPL class server at
java.net.ConnectException: Connection timed out: connect
        at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)
Can you show the line where you create the SparkContext, sc? (pagoda_5b)
I use the default one. (user2543622)

I'm no expert but the connection IP in the error message looks like a private node or even your router/modem local address.

As stated in the comment, it could be that you're running the context with a wrong configuration that tries to spread the work to a cluster that's not there, instead of running it in your local JVM process.

For further information you can read here and experiment with something like

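import org.apache.spark.{SparkConf, SparkContext}

// Run Spark inside the local JVM with 4 worker threads instead of contacting a cluster
val sc = new SparkContext(master = "local[4]", appName = "tweetsClass", conf = new SparkConf)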


Since you're using the interactive shell and the provided SparkContext available there, I guess you should pass the equivalent parameters to the shell command as in

<your-spark-path>/bin/spark-shell --master local[4]

This instructs the driver to use a master on the local machine, running on 4 threads.


I think the problem comes with connectivity and not from within the code.

Check if you can actually connect to this address and port (54595).
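One way to run that check from the same machine is a plain socket connect with a timeout. This is only a rough sketch: `PortCheck` and `canConnect` are names made up for illustration, and the host is a placeholder you would replace with the address from your own error message.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper, not part of Spark: tries to open a TCP connection
// and reports whether it succeeded within the timeout.
object PortCheck {
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    // connect throws on refusal or timeout; Try turns that into a failure
    val connected = Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    Try(socket.close()) // always release the socket, ignoring close errors
    connected
  }
}
```

For example, `PortCheck.canConnect("<address-from-the-error>", 54595)` returning false would suggest the problem is reachability (firewall, VPN, wrong network interface) rather than the code itself.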


Probably your Spark master is not accessible at the specified port. Use local[*] to validate with a smaller dataset and a local master. Then check whether the port is accessible, or change it based on the Spark port configuration (http://spark.apache.org/docs/latest/configuration.html).