1) When submitting a job, Spark needs to know what it is connecting to. The configuration files are parsed, and the required properties are used to connect to the Hadoop cluster. Note that the documentation calls this client-side configuration (right in the first sentence), which means you do not actually need every one of the cluster's settings in these files. To connect to a non-secured Hadoop cluster with a minimal configuration, you will need at least the following properties present (a quick way to check them is sketched after the list):
- fs.defaultFS (in case you intend to read from HDFS)
- dfs.nameservices
- yarn.resourcemanager.hostname or yarn.resourcemanager.address
- yarn.application.classpath
- (others might be required, depending on the configuration)
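If you are not sure whether your client-side files actually contain these properties, one quick sanity check (only a sketch: the file paths are placeholders and hadoop-common is assumed to be on the classpath) is to load them with the plain Hadoop Configuration class and print the values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckClientConf {
    public static void main(String[] args) {
        Configuration hadoopConfiguration = new Configuration();
        // Placeholder locations of the client-side copies of the cluster configs
        hadoopConfiguration.addResource(new Path("conf/core-site.xml"));
        hadoopConfiguration.addResource(new Path("conf/hdfs-site.xml"));
        hadoopConfiguration.addResource(new Path("conf/yarn-site.xml"));

        String[] requiredKeys = {"fs.defaultFS", "dfs.nameservices",
                "yarn.resourcemanager.hostname", "yarn.resourcemanager.address",
                "yarn.application.classpath"};
        for (String key : requiredKeys) {
            // Unset keys fall back to the Hadoop defaults (or null if there is none)
            System.out.println(key + " = " + hadoopConfiguration.get(key));
        }
    }
}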
You can avoid having the files at all by setting the same properties in the code of the job you are submitting:
SparkConf sparkConfiguration = new SparkConf();
sparkConfiguration.set("spark.hadoop.fs.defaultFS", "...");
...
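Any other property from the list in 1) can be passed the same way, just prefixed with spark.hadoop. so that Spark copies it into the Hadoop configuration. Continuing the snippet above with a placeholder value:

// Placeholder host name; the "spark.hadoop." prefix makes Spark forward
// the property to the underlying Hadoop Configuration
sparkConfiguration.set("spark.hadoop.yarn.resourcemanager.hostname", "rm.example.com");
// yarn.application.classpath, dfs.nameservices, etc. follow the same pattern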
2) Spark submit can be located on any machine, not necessarily on the cluster, as long as it knows how to connect to the cluster (you can even run the submission from Eclipse without installing anything except the Spark-related project dependencies).
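To illustrate this, below is a minimal, self-contained sketch of a job started straight from an IDE in YARN client mode. It is only a sketch: the host names are placeholders, a non-secured cluster is assumed, spark-core and spark-yarn (plus the Hadoop client libraries) have to be on the project classpath, and depending on your Spark version you may additionally need to point spark.yarn.jars at the Spark libraries so the executors can find them.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SubmitFromIde {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("submit-from-ide")
                .setMaster("yarn")                           // driver runs locally, executors on the cluster
                .set("spark.submit.deployMode", "client")
                // Same minimal Hadoop client settings as in 1), with placeholder values
                .set("spark.hadoop.fs.defaultFS", "hdfs://namenode.example.com:8020")
                .set("spark.hadoop.yarn.resourcemanager.hostname", "rm.example.com");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
            System.out.println("count = " + count);
        }
    }
}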
3) You should populate the configuration folders with:
- core-site.xml
- yarn-site.xml
- hdfs-site.xml
- mapred-site.xml
Copying those files from the cluster is the easiest approach to start with. Afterwards you can remove any configuration that is not required by spark-submit or that may be security-sensitive.
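Once the files are in place, a quick way to confirm they are sufficient (before you start trimming them) is to list the HDFS root with nothing but those files; again a sketch, with placeholder paths and the Hadoop client libraries assumed on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckHdfsAccess {
    public static void main(String[] args) throws Exception {
        Configuration hadoopConfiguration = new Configuration();
        // Placeholder locations of the copied client configs
        hadoopConfiguration.addResource(new Path("conf/core-site.xml"));
        hadoopConfiguration.addResource(new Path("conf/hdfs-site.xml"));

        // If this prints the HDFS root listing, the copied configuration is enough for HDFS access
        try (FileSystem fs = FileSystem.get(hadoopConfiguration)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}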