I have installed a Hadoop + Spark cluster on the servers, and it works fine when I write Scala code in the spark-shell on the master server.
I have added the Spark library (the jar files) to my project and I am writing my first Scala code on my own computer through IntelliJ.
When I run a simple program that just creates a SparkContext object to read a file from HDFS via the hdfs protocol, it outputs error messages.
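(For reference, if the jars were pulled in through sbt instead of copied by hand, my understanding is the build definition would look roughly like the sketch below. Spark 3.0.0 is taken from the log further down; Scala 2.12 is an assumption, since that is what Spark 3.0.0 is built against by default.)

// build.sbt -- sketch only; versions are assumptions based on the log output
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.0"
)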
The test function:
import org.apache.spark.SparkContext

class SpcDemoProgram {

  def demoPrint(): Unit = {
    println("class spe demoPrint")
    test()
  }

  def test() {
    var spark = new SparkContext()
  }
}
The error messages are:
20/11/02 12:36:26 INFO SparkContext: Running Spark version 3.0.0
20/11/02 12:36:26 WARN Shell: Did not find winutils.exe: {}
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
    at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
    at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
    at org.apache.hadoop.util.Shell.(Shell.java:689)
    at org.apache.hadoop.util.StringUtils.(StringUtils.java:78)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1664)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:104)
    at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:88)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
    at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
    at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
    at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
    at org.apache.spark.SparkContext.(SparkContext.scala:303)
    at org.apache.spark.SparkContext.(SparkContext.scala:120)
    at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
    at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
    at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
    at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
    at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
    at org.apache.hadoop.util.Shell.(Shell.java:516)
    ... 19 more
20/11/02 12:36:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/02 12:36:27 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.(SparkContext.scala:380)
    at org.apache.spark.SparkContext.(SparkContext.scala:120)
    at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
    at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
    at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
    at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
20/11/02 12:36:27 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.(SparkContext.scala:380)
    at org.apache.spark.SparkContext.(SparkContext.scala:120)
    at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
    at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
    at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
    at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
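The last stack trace says a master URL must be set. If I understand correctly, the SparkContext would normally be built from a SparkConf along these lines (the master URL and HDFS path below are placeholders I made up for my cluster, not values taken from the error):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: <master-host> and <namenode-host> are placeholders for my cluster.
val conf = new SparkConf()
  .setAppName("SpcDemoProgram")
  .setMaster("spark://<master-host>:7077")   // or "local[*]" for a purely local test

val sc = new SparkContext(conf)
val lines = sc.textFile("hdfs://<namenode-host>:9000/path/to/file.txt")
println(lines.count())
sc.stop()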
Do those error messages imply that Hadoop and Spark must also be installed on my computer?
What configuration do I need?
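For the winutils.exe warning, the Hadoop wiki page linked in the log (https://wiki.apache.org/hadoop/WindowsProblems) suggests that Hadoop's shell utilities expect a winutils.exe on Windows. If I understand it correctly, one workaround would be to point hadoop.home.dir at a folder containing bin\winutils.exe before creating the SparkContext (the path below is only an example, not my real setup):

// Workaround sketch for the Windows warning; the folder is an example and
// must contain a bin\winutils.exe matching the Hadoop version.
System.setProperty("hadoop.home.dir", "C:\\hadoop")
// Alternatively, set the HADOOP_HOME environment variable to the same folder
// before starting the JVM / IntelliJ run configuration.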
Comment (Coursal): "[...] ssh or whatever is your tool of preference) where Spark & Hadoop/HDFS are already installed."