I'm trying to run Spark-Wiki-Parser on a GCP Dataproc cluster. The code takes two arguments, "dumpfile" and "destloc". When I submit the following job, I get: [scallop] Error: Excess arguments provided: 'gs://enwiki-latest-pages-articles.xml.bz2 gs://output_dir/'
gcloud dataproc jobs submit spark --cluster $CLUSTER_NAME --project $CLUSTER_PROJECT \
--class 'com.github.nielsenbe.sparkwikiparser.wikipedia.sparkdbbuild.DatabaseBuildMain' \
--properties=^#^spark.jars.packages='com.databricks:spark-xml_2.11:0.5.0,com.github.nielsenbe:spark-wiki-parser_2.11:1.0' \
--region=$CLUSTER_REGION \
-- 'gs://enwiki-latest-pages-articles.xml.bz2' 'gs://output_dir/'
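The error message shows both paths inside a single quoted string, which suggests Scallop received more trailing arguments than it has declared (or one combined argument it cannot split). For reference, here is a minimal sketch of how two positional arguments like these are typically declared with Scallop; the class and field names are my assumptions, not necessarily what DatabaseBuildMain actually does:

```scala
import org.rogach.scallop._

// Hypothetical sketch: two required trailing (positional) arguments,
// matching the "dumpfile" and "destloc" names from the question.
class Conf(arguments: Seq[String]) extends ScallopConf(arguments) {
  val dumpfile = trailArg[String](required = true) // first positional arg
  val destloc  = trailArg[String](required = true) // second positional arg
  verify() // must be called, or Scallop won't validate/bind the args
}

object ArgCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Conf(args)
    println(conf.dumpfile())
    println(conf.destloc())
  }
}
```

With a declaration like this, Scallop raises "Excess arguments provided" whenever it sees more trailing arguments than are declared, so the question is effectively why Dataproc is not delivering the two paths as two separate arguments.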
How do I get the code to recognize the input arguments?