2
votes

I noticed that two Pub/Sub topics and subscriptions are created automatically when I create a Cloud Composer environment. What is Pub/Sub needed for here, and how is Composer's internal architecture related to Pub/Sub?

I need this conceptual clarification because I didn't find any documentation that explains it.

I understand that Cloud Composer uses these Pub/Sub subscriptions to communicate with its Kubernetes Engine service agent, but why does it create two topics by default instead of one? I also noticed that when I change the Kubernetes configuration from Cloud Composer (for example, the number of nodes in the cluster) or update cluster values, it creates another two topics and subscriptions. So I want to understand how this works internally: why does it create new topics and subscriptions after each update instead of reusing the existing ones? How do Composer and the Kubernetes Engine service agent communicate through Pub/Sub, and are any other GCP components deployed automatically for this? I would like to understand the whole internal architecture.

One more thing I want to understand: what is the function of the "airflow-redis-0" pod in the GKE cluster used by Composer? Is it only for message queuing, or does it handle communication between the scheduler and the workers? Is there any way to inspect or visualize (through redis-cli commands) what the Redis pod is doing?

Thanks in advance.


1 Answer

4
votes

According to the Cloud Composer documentation, Cloud Composer uses these topics/subscriptions to communicate with its Kubernetes Engine service agent and relies on Cloud Pub/Sub's default behavior to manage messages.

Two topics/subscriptions are needed for two-way communication, one per direction. If you check their names, you will notice that one is "composer-agent-to-backend-topic" and the other is "composer-backend-to-agent-topic". After each update, the Composer environment is re-initiated and cannot reuse an existing topic/subscription, so it creates new ones. How GKE and Composer communicate through Pub/Sub internally is not publicly documented, but Pub/Sub is also used to relay data from tenant projects, such as logs from the managed web server.

You should not delete these topics or subscriptions, as doing so will affect the functionality of your Composer environment.
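To see the two directions at a glance without modifying anything, a small read-only shell sketch like the following may help. It assumes the gcloud CLI is installed and authenticated for the project, and that the topic names contain the two substrings quoted above. The helper names `classify_topic` and `list_composer_topics` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a Pub/Sub topic name to the direction it serves,
# based only on the naming pattern quoted in the answer above.
classify_topic() {
  case "$1" in
    *composer-agent-to-backend*) echo "agent->backend" ;;
    *composer-backend-to-agent*) echo "backend->agent" ;;
    *) echo "other" ;;
  esac
}

# Hypothetical helper: list the topics in the current gcloud project (read-only)
# and print each name with its direction. Needs an authenticated gcloud CLI.
list_composer_topics() {
  gcloud pubsub topics list --format='value(name)' |
    while read -r topic; do
      printf '%s\t%s\n' "$(classify_topic "$topic")" "$topic"
    done
}
```

Running `list_composer_topics` after an environment update should show the newly created pair next to any older topics. Nothing in this sketch creates or deletes resources.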

Regarding Redis, this image from the documentation (Cloud Composer Architecture) is quite clear about its role.

Composer uses Redis as a backend between the scheduler and the workers. The Redis service acts as the message broker for the CeleryExecutor. It is provisioned as a StatefulSet and saves a snapshot every 60 seconds to a persistent disk to prevent message loss when containers restart (documentation reference).

You could use the following command to connect to the airflow-redis container inside the airflow-redis-0 pod:

kubectl exec -it airflow-redis-0 -c airflow-redis -- bash

and then run whichever redis-cli commands you want there. However, tampering with deep architecture components of a Composer environment is not recommended.
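If the goal is only to observe what Redis is doing, a few read-only commands are usually enough. The sketch below is a minimal example under stated assumptions: kubectl is already pointed at the environment's GKE cluster, the pod is reachable in the current namespace, and the Celery queue uses Airflow's default name `default` (this may differ in your environment). The wrapper names `redis_cmd` and `inspect_redis` are hypothetical.

```shell
#!/bin/sh
# Pod and container names are taken from the answer above.
POD="airflow-redis-0"
CONTAINER="airflow-redis"

# Hypothetical wrapper: run a single redis-cli command inside the Redis container.
redis_cmd() {
  kubectl exec "$POD" -c "$CONTAINER" -- redis-cli "$@"
}

# Hypothetical helper: read-only inspection; none of these commands modify data.
inspect_redis() {
  redis_cmd PING               # liveness check
  redis_cmd INFO keyspace      # number of keys per database
  redis_cmd CONFIG GET save    # configured snapshot schedule
  redis_cmd LLEN default       # ASSUMPTION: the Celery queue is named "default"
}
```

`LLEN` reports how many messages are waiting in a list, which for a Celery broker roughly corresponds to tasks queued but not yet picked up by a worker. Avoid write commands such as `DEL` or `FLUSHALL`, since they would drop queued tasks.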