1
votes

When use Spark-submit in cluster mode(yarn-cluster),jars and packages configuration confused me: for jars, i can put them in HDFS, instead of in local directory . But for packages, because they build with Maven, with HDFS,it can't work. my way like below:

spark-submit --jars hdfs:///mysql-connector-java-5.1.39-bin.jar --driver-class-path /home/liac/test/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar --conf "spark.mongodb.input.uri=mongodb://192.168.27.234/test.myCollection2?readPreference=primaryPreferred" --conf "spark.mongodb.output.uri=mongodb://192.168.27.234/test.myCollection2"  --packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0 --py-files /home/liac/code/diagnose_disease/tool.zip main_disease_tag_spark.py --master yarn-client

error occur:

`Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0

Anyone can tell me how to use jars and packages in cluster mode? and what's wrong with my way?

1
In your script: --master yarn-client ?shuaiyuancn
yes, i also try '--master yarn-cluster'lac

1 Answers

0
votes

Your use of the --packages argument is wrong:

--packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0

It needs to be in the form of groupId:artifactId:version as the output suggests. You cannot use a URL with it.

An example for using mongoDB with spark with the built-in repository support:

$SPARK_HOME/bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0

If you insist on using your own jar you can provide it via --repositories. The value of the argument is

Comma-separated list of remote repositories to search for the Maven coordinates specified in packages.

For example, in your case, it could be

--repositories hdfs:///user/liac/package/jars/ --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0