I'm trying to use Google Cloud Dataproc to run a Spark ML job on CSV files stored in GCS, but I'm having trouble figuring out how to compile the fat JAR for submission.
I can tell from the docs that Cloud Dataproc nodes have the GCS connector pre-installed, but I can't figure out how to add the connector to my SBT config so I can develop and compile a fat JAR locally to submit to Dataproc. Is there a line I can add to my build.sbt so the connector is available locally (i.e. so the project compiles), marked as "provided" if necessary so that it doesn't conflict with the version pre-installed on the worker nodes?
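Here's roughly what I have in mind for build.sbt. I'm guessing at the Maven coordinates and version strings, so treat this as a sketch of what I'm asking about rather than something I know works:

```scala
// build.sbt -- my best guess; coordinates/versions may be wrong
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Spark is already on the Dataproc image, so "provided" keeps it
  // out of the fat JAR
  "org.apache.spark" %% "spark-core"  % "3.3.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.3.0" % "provided",

  // Is this the right coordinate for the GCS connector? And should
  // it also be "provided", since Dataproc pre-installs it on workers?
  "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.2.11" % "provided"
)
```

If the connector coordinate or the "provided" scoping above is wrong, that's exactly the part I'm unsure about.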
Any pointers or examples would be super appreciated.
TIA!