As I dive into the world of Cloud Composer, Airflow, Google Kubernetes Engine, and Kubernetes I've not yet found a good answer to what exactly makes Cloud Composer better than Helm and GKE.
Here are some things I've found that could be unique to Composer but mostly seem like they could be handled by GKE.
On their homepage:
- End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipeline.
On the features page:
- Identity-Aware Proxy protects the interface
- Cloud Composer associates a Cloud Storage bucket with the environment. The associated bucket stores the DAGs, logs, custom plugins, and data for the environment.
The downsides of Composer I've seen include:
- It takes many hours to spin up a new instance
- It doesn't support Kubernetes Executor
- It is risky to change the underlying GKE config because it could be changed back by a composer update
- There are often errors that happen when auto-scaling often happen but are documented as known
- Upgrading environments is still beta
To be clear, I'm not saying Cloud Composer is bad. I'm just having trouble seeing why people like it. When I've asked folks why it is better than Helm + GKE they haven't had any compelling answers despite that they can tell many stories of Composer being unpredictable and having lots of issues.