
I'm using GCP Composer with the newest image version, composer-1.16.1-airflow-1.10.15.

My web servers die from time to time because of some missing cache files:

{cli.py:1050} ERROR - [Errno 2] No such file or directory


Does anybody know how to solve this?


Additional info:

Workers: node count 3, disk size 20 GB, machine type n1-standard-1

Web server configuration: Machine type composer-n1-webserver-8 (8 vCPU, 7.6 GB memory)

Configuration overrides:

[screenshot of the configuration overrides]


UPDATE 27.04.2021

I've managed to find the code responsible for killing the web server:

https://github.com/apache/airflow/blob/4aec433e48dcc66c9c7b74947c499260ab6be9e9/airflow/bin/cli.py#L1032-L1138

GCP Composer uses the Celery Executor underneath, so during the check, is it trying to read cache files that the workers have already removed?


1 Answer


I've found it! And I'll report the bug to the GCP Composer team.

If the config option webserver.reload_on_plugin_change=True is set, the CLI goes into this section: https://github.com/apache/airflow/blob/4aec433e48dcc66c9c7b74947c499260ab6be9e9/airflow/bin/cli.py#L1118-L1138

    # if we should check the directory with the plugin,
    if self.reload_on_plugin_change:
        # compare the previous and current contents of the directory
        new_state = self._generate_plugin_state()
        # If changed, wait until its content is fully saved.
        if new_state != self._last_plugin_state:
            self.log.debug(
                '[%d / %d] Plugins folder changed. The gunicorn will be restarted the next time the '
                'plugin directory is checked, if there is no change in it.',
                num_ready_workers_running, num_workers_running
            )
            self._restart_on_next_plugin_check = True
            self._last_plugin_state = new_state
        elif self._restart_on_next_plugin_check:
            self.log.debug(
                '[%d / %d] Starts reloading the gunicorn configuration.',
                num_ready_workers_running, num_workers_running
            )
            self._restart_on_next_plugin_check = False
            self._last_refresh_time = time.time()
            self._reload_gunicorn()

def _generate_plugin_state(self):
    """
    Generate dict of filenames and last modification time of all files in settings.PLUGINS_FOLDER
    directory.
    """
    if not settings.PLUGINS_FOLDER:
        return {}
    all_filenames = []
    for (root, _, filenames) in os.walk(settings.PLUGINS_FOLDER):
        all_filenames.extend(os.path.join(root, f) for f in filenames)
    plugin_state = {f: self._get_file_hash(f) for f in sorted(all_filenames)}
    return plugin_state

It builds the list of files to check by calling os.walk(settings.PLUGINS_FOLDER).

At the same time, gcsfuse decides to delete some of these files, so by the time a listed file is read for hashing it is already gone, and the "No such file or directory" error is raised.
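To illustrate the race, here is a minimal sketch (not the Airflow code, and not a patch for it) that separates the listing step from the hashing step and simply skips files that disappear in between. The function names list_plugin_files and hash_existing_files are hypothetical, and SHA-256 is my own stand-in for whatever _get_file_hash does internally.

```python
import hashlib
import os


def list_plugin_files(plugins_folder):
    """Return a sorted list of all file paths under plugins_folder (hypothetical helper)."""
    all_filenames = []
    for root, _, filenames in os.walk(plugins_folder):
        all_filenames.extend(os.path.join(root, f) for f in filenames)
    return sorted(all_filenames)


def hash_existing_files(paths):
    """Map each path to a content hash, skipping files removed after they were listed."""
    state = {}
    for path in paths:
        try:
            with open(path, "rb") as fh:
                # SHA-256 is an assumption here, standing in for _get_file_hash
                state[path] = hashlib.sha256(fh.read()).hexdigest()
        except FileNotFoundError:
            # The file was listed by os.walk but deleted before it could be read
            continue
    return state
```

Without the try/except, a file that vanishes between the two steps raises FileNotFoundError ([Errno 2]), which matches the error in the web server log.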

So disabling webserver.reload_on_plugin_change does the job, but this option is really convenient, so I'll create a bug ticket for Google.