0
votes

I'm planning a project whereby I'd be hitting the (rate-limited) Reddit API and storing data in GCS and BigQuery. Initially, Cloud Functions would be the choice, but I'd have to create a Datastore implementation to manage the "pseudo" queue of requests and GAE for cron jobs.

Doing everything in Dataflow wouldn't make sense because it's not advised the make external requests (i.e. hitting the Reddit API) and perpetually running a single job.

Could I use Cloud Composer to read fields from a Google Sheet, then create a queue of requests based on the Google Sheet, then have a task queue execute those requests, store them in GCS and load into BigQuery?

1

1 Answers

1
votes

Sounds like a legitimate use case for Composer, additionally you could also leverage the pool concept in Airflow to manage concurrent calls to the same endpoint (e.g., Reddit API).