EDITI: by removing the conf setting in the app for 'setMaster' I'm able to run yarn-cluster successfully - if anyone coudl help with spark master as cluster deploy - that'd be fantastic
I'm trying to set up spark on a local testmachine so that I can read from an s3 bucket and then write back to it.
Running the jar/application using client works fine, well, fine in that it goes off to the bucket and creates a file and comes back again.
However I need this to work in cluster mode so that it more closely resembles our prod environment yet it's constantly failing - no real sensible messages in the logs that I can see and little feedback to go on.
Any help is greatly appreciated - I'm very new to spark/hadoop so may have overlooked something obvious.
I also tried running with yarn-cluster as the master but that failed for a different reason (saying it couldn't find the s3Native classes - which I pass in as jars)
This is one a windows ev
The command I'm running:
c:\>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --deploy-mode cluster --master spark://127.0.0.1:7077 --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://bucket/jar/fileInputRename.txt"
The output from this on the console is:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master spark://127.0.0.1:7077
deployMode cluster
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
FileInputRename
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> FileInputRename
spark.jars -> file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar,file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
spark.submit.deployMode -> cluster
spark.master -> spark://127.0.0.1:7077
Classpath elements:
16/03/24 12:01:56 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://127.0.0.1:7077.
After a few more seconds it shows the c prompt and nothing else. The logs on 8080 :
Application ID Name Cores Memory per Node Submitted Time User State Duration
app-20160324120221-0016 FileInputRename 1 1024.0 MB 2016/03/24 12:02:21 Administrator FINISHED 3 s
where the error message only shows:
16/03/24 12:02:24 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
If I run the yarn-cluster as main so that this is my command:
c:>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --master yarn-cluster --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://SessionCam-Steve/jar/fileInputRename.txt"
The output and exception:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master yarn-cluster
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
FileInputRename
--addJars
file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
--jar
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
--class
FileInputRename
--arg
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.app.name -> FileInputRename
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:
16/03/24 12:05:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/03/24 12:05:23 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/03/24 12:05:23 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/03/24 12:05:23 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/03/24 12:05:23 INFO yarn.Client: Setting up container launch context for our AM
16/03/24 12:05:23 INFO yarn.Client: Setting up the launch environment for our AM container
16/03/24 12:05:23 INFO yarn.Client: Preparing resources for our AM container
16/03/24 12:05:24 WARN : Your hostname, WIN-EU4MXZ2GSIW resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a94:1d11%14, but we couldn't find any external IP address!
16/03/24 12:05:25 INFO yarn.Client: Uploading resource file:/C:/Spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/spark-assembly-1.6.1-had
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/sparkSubmit_NoJarSetInConf.j
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/hadoop-aws-2.
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/aws-java-sd
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/temp/2/spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1/__spark_conf__7363738392648975127.zip -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_14
16/03/24 12:05:28 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
16/03/24 12:05:28 INFO yarn.Client: Submitting application 4 to ResourceManager
16/03/24 12:05:29 INFO impl.YarnClientImpl: Submitted application application_1458817514983_0004
16/03/24 12:05:30 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: UNDEFINED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/proxy/application_1458817514983_0004/
user: Administrator
16/03/24 12:05:31 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:32 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:33 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:34 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:35 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:36 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:37 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:38 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:39 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:40 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:41 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:42 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:43 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:44 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:45 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:46 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:47 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:48 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:49 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:50 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:51 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:52 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:53 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:54 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:55 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:57 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:58 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:59 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:00 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:01 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:02 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:03 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:04 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:05 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:06 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:07 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:08 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:09 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:10 INFO yarn.Client: Application report for application_1458817514983_0004 (state: FAILED)
16/03/24 12:06:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1458817514983_0004 failed 2 times due to AM Container for appattempt_1458817514983_0004_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1458817514983_0004_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: 1 file(s) moved.
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: FAILED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004
user: Administrator
Exception in thread "main" org.apache.spark.SparkException: Application application_1458817514983_0004 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/24 12:06:10 INFO util.ShutdownHookManager: Shutdown hook called
16/03/24 12:06:10 INFO util.ShutdownHookManager: Deleting directory C:\temp\2\spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1
this creates two application ids in the gui:
Application ID Name Cores Memory per Node Submitted Time User State Duration
app-20160324120600-0018 FileInputRename 2 1024.0 MB 2016/03/24 12:06:00 Administrator FINISHED 9 s
app-20160324120543-0017 FileInputRename 2 1024.0 MB 2016/03/24 12:05:43 Administrator FINISHED 8 s
both of which have this as the exception:
16/03/24 12:05:49 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:84)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 16 more
it'd be fantastic and a huge relief if I could get one of these working - thank you in advance for any help.