1 vote

I want to run my code on a cluster. My code:

import java.util.Properties

import edu.stanford.nlp.ling.CoreAnnotations._
import edu.stanford.nlp.pipeline._
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.JavaConversions._
import scala.collection.mutable.ArrayBuffer

object Pre2 {

  def plainTextToLemmas(text: String, pipeline: StanfordCoreNLP): Seq[String] = {
    val doc = new Annotation(text)
    pipeline.annotate(doc)
    val lemmas = new ArrayBuffer[String]()
    val sentences = doc.get(classOf[SentencesAnnotation])
    for (sentence <- sentences; token <- sentence.get(classOf[TokensAnnotation])) {
      val lemma = token.get(classOf[LemmaAnnotation])
      if (lemma.length > 0 ) {
        lemmas += lemma.toLowerCase
      }
    }
    lemmas
  }
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("pre2")

    val sc = new SparkContext(conf)
      val plainText = sc.textFile("data/in.txt")
      val lemmatized = plainText.mapPartitions(p => {
        val props = new Properties()
        props.put("annotators", "tokenize, ssplit, pos, lemma")
        val pipeline = new StanfordCoreNLP(props)
        p.map(q => plainTextToLemmas(q, pipeline))
      })
      val lemmatized1 = lemmatized.map(l => l.head + l.tail.mkString(" "))
      val lemmatized2 = lemmatized1.filter(_.nonEmpty)
      lemmatized2.coalesce(1).saveAsTextFile("data/out.txt")
  }
}

and Cluster features:

2 nodes

each node has: 60 GB RAM

each node has: 48 cores

shared disk

I installed Spark on this cluster; one node acts as both master and worker, and the other node is a worker.

When I run my code with this command in the terminal:

./bin/spark-submit --master spark://192.168.1.20:7077 --class Main --deploy-mode cluster code/Pre2.jar

it shows:

15/08/19 15:27:21 WARN RestSubmissionClient: Unable to connect to server spark://192.168.1.20:7077.
Warning: Master endpoint spark://192.168.1.20:7077 was not a REST server. Falling back to legacy submission gateway instead.
15/08/19 15:27:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Driver successfully submitted as driver-20150819152724-0002
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150819152724-0002 is RUNNING
Driver running on 192.168.1.19:33485 (worker-20150819115013-192.168.1.19-33485)

How can I run the above code on a Spark standalone cluster?

Your message says RUNNING; it appears to be running correctly. – mattinbits

It doesn't return anything. In the UI the state is "failed". – AHAD

Does the UI give any more details on the reason for the failure? – mattinbits

No, it doesn't give any more detail. – AHAD

You're stating --class Main, but you don't seem to have a class called Main; also, you're hard-coding the master to be local. – mattinbits
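As the last comment points out, the jar hard-codes the master as "local" with setMaster, which overrides the --master flag passed to spark-submit, and the submit command names a class (Main) that does not exist in the code. A minimal sketch of the fix, keeping the object name Pre2 from the question and dropping setMaster so the cluster manager comes from the submit command:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Pre2 {

  def main(args: Array[String]): Unit = {
    // Do NOT call setMaster here: leaving it out lets spark-submit's
    // --master flag decide whether the job runs locally or on the
    // standalone cluster.
    val conf = new SparkConf().setAppName("pre2")
    val sc = new SparkContext(conf)

    // ... rest of the job as in the question ...
  }
}
```

The job would then be submitted with `--class Pre2` (the object's actual name) rather than `--class Main`.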

2 Answers

1 vote

Make sure you check the web UI on port 8080. In your example it would be 192.168.1.20:8080.

If you are running it in Spark standalone cluster mode, try it without --deploy-mode cluster, and set your nodes' memory explicitly by adding --executor-memory 60g.
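Putting those two suggestions together with the command from the question, a client-mode submission might look like this (a sketch, assuming the class name Pre2 from the question's code and the memory value suggested above):

```shell
./bin/spark-submit \
  --master spark://192.168.1.20:7077 \
  --class Pre2 \
  --executor-memory 60g \
  code/Pre2.jar
```

In client mode the driver runs in the terminal session itself, so any exception that kills the job is printed directly instead of being buried in a worker's driver log.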

0
votes

"Warning: Master endpoint spark://192.168.1.20:7077 was not a REST server" From the error, it also looks like the master rest url is different. The rest URL could be found on master_url:8080 UI