We're hoping to have a more thorough guide in the official documentation shortly, but to get started, visit the following API overview: https://developers.google.com/api-client-library/java/apis/dataproc/v1
It includes links to the Dataproc javadocs. If your server is making calls on behalf of your own project and not on behalf of your end-users' Google projects, then you probably want the keyfile-based service-account auth explained here to create the Credential object you use to initialize the Dataproc client stub.
As for the Dataproc-specific parts, this just means adding the google-api-services-dataproc artifact (groupId com.google.apis) as a dependency in your Maven pomfile, if using Maven.
And then you'll have code like:
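    Dataproc dataproc = new Dataproc.Builder(new NetHttpTransport(), new JacksonFactory(), credential)
        .setApplicationName(applicationName).build();
    // applicationName, clusterName, mainClass and jarUri are placeholders for your own values.
    dataproc.projects().regions().jobs().submit(
        projectId, "global", new SubmitJobRequest()
            .setJob(new Job()
                .setPlacement(new JobPlacement().setClusterName(clusterName))
                .setSparkJob(new SparkJob()
                    .setMainClass(mainClass)
                    .setJarFileUris(Arrays.asList(jarUri))
                    .setArgs(Arrays.asList("arg1", "arg2", "arg3")))))
        .execute();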
Since intermediary servers may do low-level retries, or your request may throw an IOException that leaves you not knowing whether the job submission succeeded, an additional step you may want to take is to generate your own jobId. Then you know which jobId to poll on to figure out whether the job got submitted, even if your request times out or throws some unknown exception:
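    import java.io.IOException;
    import java.util.Arrays;
    import java.util.UUID;

    // applicationName, clusterName, mainClass and jarUri are placeholders for your own values.
    Dataproc dataproc = new Dataproc.Builder(new NetHttpTransport(), new JacksonFactory(), credential)
        .setApplicationName(applicationName).build();

    // Generate the job ID on the client so we know what to look up if the submit call fails.
    String curJobId = "json-agg-job-" + UUID.randomUUID().toString();
    Job jobSnapshot = null;
    try {
      jobSnapshot = dataproc.projects().regions().jobs().submit(
          projectId, "global", new SubmitJobRequest()
              .setJob(new Job()
                  .setReference(new JobReference().setJobId(curJobId))
                  .setPlacement(new JobPlacement().setClusterName(clusterName))
                  .setSparkJob(new SparkJob()
                      .setMainClass(mainClass)
                      .setJarFileUris(Arrays.asList(jarUri))
                      .setArgs(Arrays.asList("arg1", "arg2", "arg3")))))
          .execute();
    } catch (IOException ioe) {
      try {
        // The submit outcome is unknown, so look the job up by the ID we chose.
        jobSnapshot = dataproc.projects().regions().jobs().get(
            projectId, "global", curJobId).execute();
        logger.info(ioe, "Despite exception, job was verified submitted");
      } catch (IOException ioe2) {
        // Handle differently; if it's a GoogleJsonResponseException you can inspect the error
        // code, and if it's a 404, then it means the job didn't get submitted; you can add retry
        // logic in that case.
      }
    }

    // We can poll on dataproc.projects().regions().jobs().get(...) until the job reports being
    // completed or failed now.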
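To make the retry and polling logic hinted at in those comments concrete, here is a minimal, library-free sketch of the same pattern. Everything in it is hypothetical and not part of the Dataproc client: JobApi stands in for the submit and get calls, JobNotFoundException stands in for a GoogleJsonResponseException with a 404 status, and FlakyFakeApi simulates a submission that succeeds on the server but fails on the client. The state names DONE, ERROR and CANCELLED are the terminal Dataproc job states; with the real client you would presumably read them from jobSnapshot.getStatus().getState().

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class IdempotentSubmitSketch {

    /** Hypothetical stand-in for the Dataproc submit/get calls; both return the job state. */
    interface JobApi {
        String submit(String jobId) throws IOException;
        String get(String jobId) throws IOException;
    }

    /** Hypothetical marker for "the server answered 404 for this job ID". */
    static class JobNotFoundException extends IOException {
        JobNotFoundException(String jobId) { super("No such job: " + jobId); }
    }

    /** Builds a client-side job ID so the caller always knows what to look up later. */
    static String newJobId(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    /** Submits; after an ambiguous failure, resubmits only if the job is confirmed missing. */
    static String submitOrVerify(JobApi api, String jobId, int maxAttempts) throws IOException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return api.submit(jobId);
            } catch (IOException submitError) {
                last = submitError;
                try {
                    return api.get(jobId); // the job exists, so the submit went through
                } catch (JobNotFoundException notSubmitted) {
                    // The server has no record of this ID, so retrying is safe.
                }
            }
        }
        throw last;
    }

    /** Polls until the job reaches a terminal state, or gives up after maxPolls lookups. */
    static String waitForTerminalState(JobApi api, String jobId, int maxPolls, long sleepMillis)
            throws IOException, InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            String state = api.get(jobId);
            if (state.equals("DONE") || state.equals("ERROR") || state.equals("CANCELLED")) {
                return state;
            }
            Thread.sleep(sleepMillis);
        }
        throw new IOException("Job " + jobId + " not finished after " + maxPolls + " polls");
    }

    /** In-memory fake: submit records the job but still throws, like a lost response. */
    static class FlakyFakeApi implements JobApi {
        final Map<String, Integer> lookups = new HashMap<>();
        int submitCalls = 0;

        public String submit(String jobId) throws IOException {
            submitCalls++;
            lookups.put(jobId, 0);
            throw new IOException("simulated timeout after the job was accepted");
        }

        public String get(String jobId) throws IOException {
            Integer seen = lookups.get(jobId);
            if (seen == null) throw new JobNotFoundException(jobId);
            lookups.put(jobId, seen + 1);
            return seen < 2 ? "RUNNING" : "DONE"; // finishes on the third lookup
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyFakeApi api = new FlakyFakeApi();
        String jobId = newJobId("json-agg-job");
        System.out.println("after submit: " + submitOrVerify(api, jobId, 3));
        System.out.println("final state: " + waitForTerminalState(api, jobId, 10, 10L));
    }
}
```

Reusing the same jobId on every attempt is what keeps the retry safe: if an earlier attempt did reach the server after all, the lookup finds it instead of a second copy of the job being launched. In real code you would probably also want a backoff between attempts and an overall deadline on the polling loop.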