I have created an HDInsight cluster (v4, Spark 2.4) in Azure and want to run a Spark .NET app on this cluster through an Azure Data Factory v2 Spark activity. In the Spark activity it is possible to specify the path to the jar, the --class parameter, and arguments to pass to the Spark app. The arguments are automatically prefixed with "-args" when the activity runs. However, being able to set "--files" is also necessary, as it tells spark-submit which files need to be deployed to the worker nodes; in this case it is for distributing DLLs with UDF definitions, which the Spark app needs in order to run. Since UDFs are a key component of Spark apps, I would have thought that this would be possible.
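For reference, the relevant part of my pipeline looks roughly like the sketch below. This is a minimal, hand-trimmed HDInsightSpark activity definition, not my exact pipeline: the linked service names and the container/account in the paths are placeholders, and the property names are what I understand the activity to expose (rootPath/entryFilePath for the jar, className, arguments). As far as I can see, none of these maps to --files.

{
    "name": "RunSparkDotnetApp",
    "type": "HDInsightSpark",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "rootPath": "<container>/SparkJobs",
        "entryFilePath": "microsoft-spark-2.4.x-0.12.1.jar",
        "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
        "arguments": [
            "wasbs://<container>@<account>.blob.core.windows.net/SparkJobs/publish.zip",
            "mySparkApp"
        ],
        "sparkJobLinkedService": {
            "referenceName": "MyStorageLinkedService",
            "type": "LinkedServiceReference"
        }
    }
}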
If I SSH into the cluster and run the spark-submit command directly with the --files parameter, the Spark app works, since the files are distributed to the worker nodes:
spark-submit --deploy-mode cluster --master yarn \
  --files wasbs://[email protected]/SparkJobs/mySparkApp.dll \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  wasbs://[email protected]/SparkJobs/microsoft-spark-2.4.x-0.12.1.jar \
  wasbs://[email protected]/SparkJobs/publish.zip mySparkApp
These are the guides I have followed: