I am writing an app to monitor and view Google dataflow jobs.

To get the metadata about Google Dataflow jobs, I am exploring the REST APIs listed here:

https://developers.google.com/apis-explorer/#search/dataflow/dataflow/v1b3/

I was wondering if there are any APIs that could do the following :

1) Get the job details if we provide a list of job IDs (there is an API for one individual job ID, but I wanted the same for a list of IDs)

2) Search or filter jobs by job name, or for that matter, filter jobs by any criteria other than the job state

3) Get the log messages associated with a Dataflow job

4) Get the records of "all" jobs, from the beginning of time. The current APIs seem to return only jobs from the last 30 days.

Any help would be greatly appreciated. Thank You

2 Answers

There is additional documentation about the Dataflow REST API at: https://cloud.google.com/dataflow/docs/reference/rest/

Addressing each of your questions separately:

1) Get the job details if we provide a list of job IDs (there is an API for one individual job ID, but I wanted the same for a list of IDs)

No, there is no batch method for a list of jobs. You'll need to query them individually with projects.jobs.get.
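Since there is no batch endpoint, a client has to fan out one `projects.jobs.get` call per ID. A minimal sketch, assuming a placeholder project ID and an already-authorized HTTP session (the helper names here are my own, not part of the API):

```python
from typing import Dict, List

# Base URL of the Dataflow v1b3 REST API.
BASE = "https://dataflow.googleapis.com/v1b3"

def job_get_urls(project: str, job_ids: List[str]) -> List[str]:
    """Build one projects.jobs.get request URL per job ID."""
    return [f"{BASE}/projects/{project}/jobs/{job_id}" for job_id in job_ids]

def fetch_jobs(project: str, job_ids: List[str], session) -> List[Dict]:
    """Issue one GET per job ID. `session` is assumed to be an authorized
    HTTP session (e.g. google-auth's AuthorizedSession); each response body
    is a Job resource."""
    return [session.get(url).json() for url in job_get_urls(project, job_ids)]
```

If the list of IDs is large, you may want to parallelize these GETs and watch for quota errors, but the one-request-per-job shape stays the same.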

2) Search or filter jobs by job name, or for that matter, filter jobs by any criteria other than the job state

The only other filter currently available is location.
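That means name-based filtering has to happen client-side: list the jobs, then filter the results yourself. A sketch, where `jobs` stands for the parsed `"jobs"` array of a `projects.jobs.list` response:

```python
from typing import Dict, List

def filter_jobs_by_name(jobs: List[Dict], substring: str) -> List[Dict]:
    """Keep only the listed jobs whose "name" field contains `substring`.
    Client-side workaround for the lack of a server-side name filter."""
    return [job for job in jobs if substring in job.get("name", "")]
```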

3)Get log messages associated with a dataflow job

In Dataflow there are two types of log messages:

"Job Logs" are generated by the Dataflow service and provide high-level information about the overall job execution. These are available via the projects.jobs.messages.list API.

There are also "Worker Logs" written by the SDK and user code running in the pipeline. These are generated on the distributed VMs associated with a pipeline and ingested into Stackdriver. They can be queried via the Stackdriver Logging entries.list API by including in your filter:

resource.type="dataflow_step"
resource.labels.job_id="<YOUR JOB ID>"
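When calling `entries.list` programmatically, that filter is just a string in the request body. A small sketch for assembling it (the helper name is mine, not part of the Logging API):

```python
def worker_log_filter(job_id: str) -> str:
    """Build the Cloud Logging filter string that selects Dataflow worker
    logs for a single job, for use in an entries.list request."""
    return f'resource.type="dataflow_step" resource.labels.job_id="{job_id}"'
```

You can further narrow the same filter with the usual Logging operators, e.g. appending `severity>=ERROR`.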

4) Get the records of "all" jobs, from the beginning of time. The current APIs seem to return only jobs from the last 30 days.

Dataflow jobs are only retained by the service for 30 days. Older jobs are deleted and thus not available in the UI or APIs.

In our case we implemented this functionality by tracking the job stages and using scheduled cron jobs to report the details of the running jobs into a single file. That file, stored in a bucket, is watched by another job, which reports all statuses to our application.
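The watcher side of that setup can be quite small. A sketch under assumptions: the cron job appends one JSON object per line (with hypothetical `job_id` and `state` fields) to the status file, and the file contents have already been read from the bucket:

```python
import json
from typing import Dict

def parse_status_file(contents: str) -> Dict[str, str]:
    """Map job_id -> latest reported state from a newline-delimited JSON
    status file. Later lines win, so the result reflects the most recent
    report for each job. The schema here is an assumption, not a standard."""
    statuses: Dict[str, str] = {}
    for line in contents.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        statuses[record["job_id"]] = record["state"]
    return statuses
```

Because jobs older than 30 days disappear from the Dataflow API, a file like this is also one way to keep your own long-term history.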