13
votes

I am trying to set up a Spark cluster on k8s. I've managed to create and set up a cluster with three nodes by following this article: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

After that, when I tried to deploy Spark on the cluster, it failed at the spark-submit step. I used this command:

~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
--conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

And it gives me this error:

Exception in thread "main" org.apache.spark.SparkException: The Kubernetes mode does not yet support referencing application dependencies in the local file system.
    at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:122)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/lz/0bb8xlyd247cwc3kvh6pmrz00000gn/T/spark-3967f4ae-e8b3-428d-ba22-580fc9c840cd

Note: I followed this article for installing Spark on k8s: https://spark.apache.org/docs/latest/running-on-kubernetes.html

3 Answers

6
votes

The error message comes from commit 5d7c4ba4d73a72f26d591108db3c20b4a6c84f3f, which also includes the page you mention, "Running Spark on Kubernetes", with the note you quote. The relevant code is:

// TODO(SPARK-23153): remove once submission client local dependencies are supported.
if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
  throw new SparkException("The Kubernetes mode does not yet support referencing application " +
    "dependencies in the local file system.")
}
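
To see which arguments would trip that check before running spark-submit, a rough shell approximation can help. This is only a sketch: the function names `uri_scheme` and `has_submission_local_files` are made up for illustration, and it assumes that a path without any scheme is treated like a `file:` URI on the submission client.

```shell
#!/bin/sh
# Hypothetical helper: print the scheme of a URI. Paths without a valid
# scheme are reported as "file" (assumption: Spark resolves them as files
# on the submission client).
uri_scheme() {
  case "$1" in
    *:*)
      scheme=${1%%:*}
      case "$scheme" in
        ''|*[!A-Za-z0-9+.-]*) echo "file" ;;  # text before ':' is not a scheme
        *) echo "$scheme" ;;
      esac
      ;;
    *) echo "file" ;;
  esac
}

# Hypothetical helper: succeed (exit status 0) if any argument would count
# as a file on the submission client.
has_submission_local_files() {
  for uri in "$@"; do
    if [ "$(uri_scheme "$uri")" = "file" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `has_submission_local_files local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar` fails (nothing local to the client), while passing a bare path such as `/tmp/app.jar` succeeds.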

This is described in SPARK-18278:

it wouldn't accept running a local: jar file, e.g. local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar, on my spark docker image (allowsMixedArguments and isAppResourceReq booleans in SparkSubmitCommandBuilder.java get in the way).

This is linked to Kubernetes issue 34377.

The issue SPARK-22962 "Kubernetes app fails if local files are used" mentions:

This is the resource staging server use-case. We'll upstream this in the 2.4.0 timeframe.

In the meantime, that error message was introduced in PR 20320.

It includes the comment:

The manual tests I did actually use a main app jar located on gcs and http.
To be specific and for record, I did the following tests:

  • Using a gs:// main application jar and a http:// dependency jar. Succeeded.
  • Using a https:// main application jar and a http:// dependency jar. Succeeded.
  • Using a local:// main application jar. Succeeded.
  • Using a file:// main application jar. Failed.
  • Using a file:// dependency jar. Failed.

That issue should have been fixed by now, and the OP garfiny confirms in the comments:

I used the newest spark-kubernetes jar to replace the one in spark-2.3.0-bin-hadoop2.7 package. The exception is gone.
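
A minimal sketch of that jar swap, assuming the distribution keeps its jars under `jars/` and that you already have the newer spark-kubernetes jar on disk. The function name `swap_k8s_jar` and the `jars-backup` directory are hypothetical. Mixing jar versions inside one distribution can be fragile, so upgrading the whole Spark package is likely the cleaner route if that is an option.

```shell
#!/bin/sh
# Hypothetical helper: replace the spark-kubernetes jar in a Spark
# distribution, moving the original into a backup directory first.
swap_k8s_jar() {
  spark_home=$1   # e.g. the unpacked spark-2.3.0-bin-hadoop2.7 directory
  new_jar=$2      # path to the newer spark-kubernetes jar

  if [ ! -f "$new_jar" ]; then
    echo "missing jar: $new_jar" >&2
    return 1
  fi

  # Keep the old jar outside jars/ so it is no longer on the classpath.
  mkdir -p "$spark_home/jars-backup"
  for old in "$spark_home"/jars/spark-kubernetes_*.jar; do
    if [ -e "$old" ]; then
      mv "$old" "$spark_home/jars-backup/"
    fi
  done

  cp "$new_jar" "$spark_home/jars/"
}
```

Usage would be along the lines of `swap_k8s_jar ~/opt/spark/spark-2.3.0-bin-hadoop2.7 /path/to/spark-kubernetes_2.11-2.3.1.jar`, where the second path is a placeholder.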

2
votes

According to the mentioned documentation:

Dependency Management

If your application’s dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies can be added to the classpath by referencing them with local:// URIs and/or setting the SPARK_EXTRA_CLASSPATH environment variable in your Dockerfiles. The local:// scheme is also required when referring to dependencies in custom-built Docker images in spark-submit.

Note that using application dependencies from the submission client’s local file system is currently not yet supported.
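
In practice that means the main jar is referenced with local:// (it lives inside the image) and any extra dependency is given as a remote URI. The sketch below only prints such a command rather than running it. The function name `build_submit_cmd` is hypothetical, and the dependency URL you pass in is a placeholder you must replace; `--jars` is the standard spark-submit option for extra jars.

```shell
#!/bin/sh
# Dry-run sketch: print a spark-submit command that uses only image-local
# (local://) and remote dependencies. Nothing is submitted.
build_submit_cmd() {
  api_server=$1    # Kubernetes API server URL, e.g. https://host:6443
  image=$2         # Spark container image
  dep_jar_url=$3   # placeholder: a jar hosted on HTTP(S), HDFS, etc.

  echo bin/spark-submit \
    --master "k8s://$api_server" \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf "spark.kubernetes.container.image=$image" \
    --jars "$dep_jar_url" \
    "local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar"
}
```

Calling `build_submit_cmd https://206.189.126.172:6443 docker.io/garfiny/spark:v2.3.0 https://example.com/jars/my-dep.jar` prints the full command on one line so it can be reviewed before being run; the example.com URL is made up.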

1
votes

I have the same problem with Spark 2.3.0 and do not know how to fix it.

I copied spark-kubernetes_2.11-2.3.1.jar and renamed it to spark-kubernetes_2.11-2.3.0.jar.

Spark does not find the corresponding Kubernetes files.

bin/spark-submit \
--master k8s://https://lubernetes:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=spark \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/k8.crt \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///usr/spark-2.3.0/examples/jars/spark-examples_2.11-2.3.0.jar

Thanks for the help!