
As I dive into the world of Cloud Composer, Airflow, Google Kubernetes Engine, and Kubernetes, I have not yet found a good answer to what exactly makes Cloud Composer better than running Airflow myself on GKE with Helm.

Here are some things I've found that could be unique to Composer but mostly seem like they could be handled by GKE.

On their homepage:

  • End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipeline.

On the features page:

  • Identity-Aware Proxy protects the interface
  • Cloud Composer associates a Cloud Storage bucket with the environment. The associated bucket stores the DAGs, logs, custom plugins, and data for the environment.

The downsides of Composer I've seen include:

  • It takes many hours to spin up a new instance
  • It doesn't support the Kubernetes Executor
  • It is risky to change the underlying GKE config because a Composer update could change it back
  • Errors often occur during autoscaling, although they are documented as known issues
  • Upgrading environments is still in beta

To be clear, I'm not saying Cloud Composer is bad. I'm just having trouble seeing why people like it. When I've asked people why it is better than Helm + GKE, they haven't had any compelling answers, even though they can tell many stories of Composer being unpredictable and having lots of issues.

Answer

Are you comparing the same things?

On one side, with GKE, you have a container orchestrator. You declare what you want, and it deploys it and keeps the cluster stable according to the declared configuration. This configuration can be packaged with Helm to make it easier to write. Because you deploy containers, you can use whatever language you want in your services.
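
To make the "declare what you want" idea concrete, here is a toy reconciliation loop. It is only a sketch under simplifying assumptions: `Cluster` and `reconcile` are hypothetical names, not Kubernetes or GKE APIs, and the "cluster" is just a dict mapping deployment names to replica counts. The point it illustrates is that the orchestrator compares the declared state with the observed state and corrects the difference, including drift such as a crashed replica.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical toy cluster: deployment name -> number of running replicas."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Drive the observed state toward the declared state; return the actions taken."""
    actions: list[str] = []
    # Create or rescale everything that is declared.
    for name, replicas in desired.items():
        current = cluster.running.get(name, 0)
        if current < replicas:
            actions.append(f"scale up {name}: {current} -> {replicas}")
        elif current > replicas:
            actions.append(f"scale down {name}: {current} -> {replicas}")
        cluster.running[name] = replicas
    # Remove anything that is running but no longer declared.
    for name in list(cluster.running):
        if name not in desired:
            actions.append(f"delete {name}")
            del cluster.running[name]
    return actions


cluster = Cluster(running={"web": 1, "old-job": 2})
desired = {"web": 3, "worker": 2}  # the "declared configuration"

first = reconcile(desired, cluster)   # converge from the initial state
cluster.running["web"] = 2            # simulate a replica crashing (drift)
second = reconcile(desired, cluster)  # only the drift is repaired
third = reconcile(desired, cluster)   # already converged: nothing to do
print(first, second, third, sep="\n")
```

Kubernetes controllers apply this idea continuously against real resources; Helm only helps you write and package the declared side.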

On the other side, you have a workflow manager, with a scheduler, retry policies, parallel tasks, and context forwarding between tasks. You write DAGs in Python (only!), and you have operators to interact with external products and services. It's mainly designed for data processing and is used a lot by data science and data engineering teams.
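
As an illustration of what scheduling, retry policies, and context forwarding mean in practice, the sketch below runs Python callables in dependency order, retries transient failures, and hands upstream results to downstream tasks through a shared context dict. It is a hypothetical toy (`run_dag` is not an Airflow function) and it runs tasks sequentially; Airflow itself expresses these ideas with DAG objects, operators, per-task `retries`, and XCom, and can run independent tasks in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable

# A task takes the shared context (upstream results) and returns its own result.
Task = Callable[[dict], object]


def run_dag(tasks: dict[str, Task], deps: dict[str, set[str]], retries: int = 2) -> dict:
    """Hypothetical toy runner: execute tasks in dependency order with retries.

    deps maps a task name to the names of the tasks it depends on.
    Results are stored in a shared context dict so downstream tasks can read them.
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    context: dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(retries + 1):
            try:
                context[name] = tasks[name](context)
                break  # success: move on to the next task
            except Exception:
                if attempt == retries:
                    raise  # retries exhausted: fail the whole run
    return context


calls = {"extract": 0}


def extract(ctx):
    calls["extract"] += 1
    if calls["extract"] < 2:  # simulate a flaky first attempt
        raise ConnectionError("transient failure")
    return [1, 2, 3]


def transform(ctx):
    return [x * 10 for x in ctx["extract"]]  # reads the upstream result


def load(ctx):
    return sum(ctx["transform"])


result = run_dag(
    tasks={"extract": extract, "transform": transform, "load": load},
    deps={"transform": {"extract"}, "load": {"transform"}},
)
print(result["load"])
```

Here the first `extract` attempt fails, the retry succeeds, and `load` receives the transformed data without any task knowing how the others are scheduled.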

Note: Cloud Composer is deployed on top of GKE (scheduler and workers), Redis, App Engine, and Cloud SQL.


You are comparing two different worlds: the Ops world (GKE/Helm) and the App/Data world (Composer/Airflow). Have a look at this new video.


Update 1:

My bad, I misunderstood the question! Anyway, personally I don't want to manage things myself: a cluster, K8s upgrades, VM patching, replicas, snapshots, backup/restore, ...

If someone else can do this for me, I prefer that, and managed services are perfect for it!

Do you ask yourself this question when comparing Cloud SQL with a database you manage yourself on a Compute Engine instance? If not (because Cloud SQL solves a lot of boring issues), my opinion is the same for Composer.

But it's only an opinion; I haven't tested both and compared performance, cost, and ease of use.