0
votes

I have installed a Hadoop + Spark cluster on our servers. It works fine when I write Scala code in the spark-shell on the master server.

I added the Spark library (the jar files) to my project and I'm writing my first Scala code on my own computer in IntelliJ.

When I run a simple program that just creates a SparkContext object in order to read a file from HDFS over the hdfs:// protocol, it prints error messages.

The test function:

import org.apache.spark.SparkContext

class SpcDemoProgram {

  def demoPrint(): Unit ={
    println("class spe demoPrint")
    test()
  }

  def test(){

    var spark = new SparkContext();
  }
}

The error messages are:

20/11/02 12:36:26 INFO SparkContext: Running Spark version 3.0.0
20/11/02 12:36:26 WARN Shell: Did not find winutils.exe: {}
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
    at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
    at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
    at org.apache.hadoop.util.Shell.(Shell.java:689)
    at org.apache.hadoop.util.StringUtils.(StringUtils.java:78)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1664)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:104)
    at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:88)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
    at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
    at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
    at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
    at org.apache.spark.SparkContext.(SparkContext.scala:303)
    at org.apache.spark.SparkContext.(SparkContext.scala:120)
    at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
    at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
    at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
    at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
    at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
    at org.apache.hadoop.util.Shell.(Shell.java:516)
    ... 19 more
20/11/02 12:36:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/02 12:36:27 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.(SparkContext.scala:380)
    at org.apache.spark.SparkContext.(SparkContext.scala:120)
    at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
    at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
    at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
    at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
20/11/02 12:36:27 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.(SparkContext.scala:380)
    at org.apache.spark.SparkContext.(SparkContext.scala:120)
    at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
    at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
    at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
    at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)

Does that error message mean that Hadoop and Spark must also be installed on my computer?

What configuration do I need?

1
Hello! In order to help you, could you share all the error messages being shown to you, the Scala program you mention, and the environment variables on the system you are operating? – Coursal
Hello @Coursal, thank you for your reply. I have edited the post to show the code and the error messages. I haven't installed Hadoop on my computer, so I have not set any Hadoop environment variables or configuration files on it. – Ryan Chen
If you need to run this program against an existing set of servers, you need to access and run it from there (with ssh or whatever is your tool of preference), where Spark and Hadoop/HDFS are already installed. – Coursal

1 Answer

1
votes

I assume you are trying to read the file with a path like hdfs://<FILE_PATH>. If so, then yes, you need to have Hadoop installed (or at least accessible to your client). If it's just a local file, you could try the path without the "hdfs://" prefix.
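The log also shows a second, independent problem: "A master URL must be set in your configuration", because the SparkContext was created with no configuration at all. A minimal sketch of both fixes is below; the object name, file path, and host placeholders are illustrative, not taken from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SpcDemoFixed {
  def main(args: Array[String]): Unit = {
    // A SparkContext needs at least an app name and a master URL.
    // "local[*]" runs Spark inside this JVM using all cores, so no
    // cluster (and no Hadoop installation) is needed for local files.
    val conf = new SparkConf()
      .setAppName("spc demo")
      .setMaster("local[*]") // or e.g. "spark://<MASTER_HOST>:7077" for the cluster

    val sc = new SparkContext(conf)
    try {
      // Local path: no "hdfs://" prefix.
      // For HDFS you would use a full URI such as
      // "hdfs://<NAMENODE_HOST>:<PORT>/<FILE_PATH>", which requires the
      // Hadoop client side to be set up (on Windows, this is where the
      // winutils.exe / HADOOP_HOME warning from the log comes in).
      val lines = sc.textFile("data/sample.txt") // hypothetical file
      println(lines.count())
    } finally {
      sc.stop() // always release the context
    }
  }
}
```

Note that on Windows the winutils.exe warning is often non-fatal for purely local runs; the master URL error is what actually stopped the program here.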