
I am planning to spin up a Dataflow instance on Google Cloud Platform to run some experiments. I want to get familiar with Apache Beam and experiment with using it to pull data from BigQuery, run some ETL and streaming jobs (in Python), and finally store the results back in BigQuery.

However, I am also concerned about sending my company's GCP bill through the roof. What are the main cost considerations, and are there any methods to estimate what the cost will be, so I don't get an earful from my boss?

Any help would be greatly appreciated, thanks!

Here are some links that could be helpful: cloud.google.com/dataflow/pricing and cloud.google.com/bigquery/pricing. - norbjd
Also if you want to try things out in a sandbox project, Qwiklabs is a good option. qwiklabs.com/focuses/3460?parent=catalog - Lak

1 Answer


You can use the Google Cloud Pricing Calculator to get an estimate of the price of the job. One of the most important resources on the Dataflow side is CPU per hour. To limit the CPU hours, you can cap the number of workers using the maxNumWorkers option (max_num_workers in the Python SDK) in your pipeline.
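As a sketch, that cap can be passed as a flag when launching a Beam Python pipeline. The script name, project, region, and bucket below are placeholders; the flag names are standard Beam Python pipeline options:

```shell
# Launch a Beam pipeline on Dataflow with autoscaling capped at 3 workers
# on a small machine type, to keep CPU-hours (and the bill) bounded.
python my_pipeline.py \
    --runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --temp_location=gs://my-bucket/tmp \
    --max_num_workers=3 \
    --machine_type=n1-standard-1
```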

Here are more pipeline options that you can set when running your Dataflow job: https://cloud.google.com/dataflow/docs/guides/specifying-exec-params
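If you want a back-of-envelope figure before touching the calculator, batch Dataflow jobs are billed per worker-hour of vCPU, memory, and persistent disk. The rates below are illustrative defaults I am assuming for an n1-standard-1 worker; check cloud.google.com/dataflow/pricing for the current numbers:

```python
# Rough estimate of a batch Dataflow job's resource cost.
# All rates are illustrative assumptions -- verify against the pricing page.

def estimate_batch_cost(worker_hours,
                        vcpus_per_worker=1,      # n1-standard-1 has 1 vCPU
                        mem_gb_per_worker=3.75,  # ... and 3.75 GB memory
                        pd_gb_per_worker=250,    # assumed batch PD size
                        vcpu_rate=0.056,         # USD per vCPU-hour (assumed)
                        mem_rate=0.003557,       # USD per GB-hour (assumed)
                        pd_rate=0.000054):       # USD per GB-hour (assumed)
    """Return the estimated cost in USD for the given total worker-hours."""
    per_worker_hour = (vcpus_per_worker * vcpu_rate
                       + mem_gb_per_worker * mem_rate
                       + pd_gb_per_worker * pd_rate)
    return worker_hours * per_worker_hour

# e.g. 3 workers running for 2 hours = 6 worker-hours
print(round(estimate_batch_cost(6), 2))
```

Multiplying a per-worker-hour rate by total worker-hours is also why capping max_num_workers is the simplest lever: it bounds the worst case directly.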

For BigQuery, you can do a similar estimate using the calculator.
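On-demand BigQuery queries are billed per tebibyte of data scanned, so the arithmetic is simple. The $5/TiB rate here is an assumption for illustration (see cloud.google.com/bigquery/pricing, which also describes a monthly free tier):

```python
# Rough on-demand query cost for BigQuery: billed per TiB scanned.
# The rate is an illustrative assumption -- check the pricing page.

TIB = 2 ** 40  # bytes in one tebibyte

def estimate_query_cost(bytes_scanned, usd_per_tib=5.0):
    """Return the estimated on-demand cost in USD of a query."""
    return bytes_scanned / TIB * usd_per_tib

# A query scanning 200 GiB:
print(round(estimate_query_cost(200 * 2 ** 30), 4))
```

In practice you can get the bytes-scanned figure for a specific query without running it by using BigQuery's dry-run mode, which reports the total bytes the query would process.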