
We have approximately 100 Google Cloud Pub/Sub topics/subscriptions, Dataflow jobs, and BigQuery/Bigtable tables.

I can list Pub/Sub topics with gcloud beta pubsub topics list, and with xargs I could list each topic's subscriptions: gcloud beta pubsub topics list-subscriptions $topic_id.
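
Something like this should work as a quick topic/subscription inventory (a sketch; --format='value(name)' is gcloud's built-in scripting format, and I'm assuming list-subscriptions prints one subscription per line):

    # Walk every topic and indent its subscriptions underneath it
    for topic in $(gcloud beta pubsub topics list --format='value(name)'); do
      echo "topic: $topic"
      gcloud beta pubsub topics list-subscriptions "$topic" | sed 's/^/  subscription: /'
    done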

I can list all BigQuery tables with bq ls [project_id:][dataset_id], and all Bigtable tables with cbt -project $project -instance $instance ls.
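
For BigQuery, a loop over every dataset looks something like this (a sketch; jq is assumed to be installed, and I'm assuming bq ls --format=json exposes datasetReference.datasetId on each entry):

    # List every dataset in the default project, then the tables in each
    for dataset in $(bq ls --format=json | jq -r '.[].datasetReference.datasetId'); do
      echo "dataset: $dataset"
      bq ls "$dataset"
    done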

I can list all running Dataflow jobs with gcloud beta dataflow jobs list --status=active, but I CANNOT list their sources and sinks: gcloud beta dataflow jobs describe $job_id doesn't show this information.
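
The job IDs themselves are at least scriptable (a sketch; I'm assuming the job resource exposes an id field to --format, which is what backs the JOB_ID column in the default output):

    # Capture active job IDs, one per line, for use in later loops
    active_jobs=$(gcloud beta dataflow jobs list --status=active --format='value(id)')
    echo "$active_jobs"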

If we had 1,000 flows, queues, and tables, I don't see how we could easily track this complexity.

My question is: using Google Cloud tools (console and/or CLI), how can I get a bird's-eye map of our system's flow sources and sinks and avoid distributed spaghetti?


1 Answer


Some of this information is available in the console for each job.

If you click on, say, a PubSubIO.Read step, you can see the Pub/Sub topic there. In the pipeline's Summary, you can see the Pipeline Options, which can contain output table names and other options you specified.

The latter summary is also available via the CLI: it appears under "displayData" when you retrieve the job information with "--full".
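
In case it helps, here is the kind of loop I'd use to turn that into a rough source/sink inventory (a sketch, not tested against every job type; jq is assumed to be installed, and the recursive ".." walk is there because I'm not certain exactly where displayData sits in the JSON for every job):

    # For each active job, dump whatever displayData its full description exposes
    for job in $(gcloud beta dataflow jobs list --status=active --format='value(id)'); do
      echo "=== job: $job ==="
      # --full returns the complete job description, which includes displayData
      gcloud beta dataflow jobs describe "$job" --full --format=json \
        | jq '[.. | objects | select(has("displayData")) | .displayData] | flatten'
    done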