10
votes

The Airflow docs give instructions for the systemd integration.

What I want is for the scheduler to be restarted on its own every time it stops working. Usually I start it manually with airflow scheduler -D, but sometimes it stops while I'm not available.

Reading the docs, I'm not sure about the configs.

The systemd folder in the Airflow GitHub repo contains the following files:

airflow
airflow-scheduler.service
airflow.conf

I'm running Ubuntu 16.04

Airflow is installed in:

/home/ubuntu/airflow

The path I have is:

/etc/systemd

The docs say to:

Copy (or link) them to /usr/lib/systemd/system

  1. Copy which of the files?

copy the airflow.conf to /etc/tmpfiles.d/

  1. What is tmpfiles.d ?

  2. What is # AIRFLOW_CONFIG= in the airflow file?

Or in other words... could someone give a more "down to earth" guide on how to do it?

Not an actual answer to your question, but I find it easier to run Airflow with supervisord. – Georgi Raychev

3 Answers

6
votes

Integrating Airflow with systemd makes watching your daemons easy, as systemd can take care of restarting a daemon on failure. It also lets the airflow webserver and scheduler start automatically when the system boots.

Edit the airflow file from the systemd folder in the Airflow GitHub repo to match your current configuration, i.e. set the environment variables for AIRFLOW_CONFIG, AIRFLOW_HOME & SCHEDULER.
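As an illustration only, with Airflow installed under /home/ubuntu/airflow as in the question, the edited environment file would look roughly like this (variable names and defaults can differ between Airflow versions, so check the file shipped with your version):

AIRFLOW_CONFIG=/home/ubuntu/airflow/airflow.cfg
AIRFLOW_HOME=/home/ubuntu/airflow
# optional: how many runs the scheduler performs before exiting (used by some versions of the shipped unit files)
SCHEDULER_RUNS=5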

Copy the service files (the files with the .service extension) to /usr/lib/systemd/system on the VM.
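A minimal sketch of that copy step, assuming you are inside the systemd folder of a checked-out Airflow repo (on Ubuntu, /lib/systemd/system or /etc/systemd/system also work if /usr/lib/systemd/system does not exist):

sudo cp airflow-webserver.service airflow-scheduler.service /usr/lib/systemd/system/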

Copy the airflow.conf file to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/. Copying airflow.conf ensures /run/airflow is created at boot with the right owner and permissions (0755 airflow airflow). Check whether /run/airflow exists and is owned by the airflow user and airflow group; if it doesn't, create the /run/airflow folder with those permissions.
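tmpfiles.d is the systemd mechanism for creating files and directories at boot (/run is a tmpfs, so anything under it disappears on reboot). The shipped airflow.conf is essentially a one-line tmpfiles.d entry along these lines (check the actual file in the repo):

D /run/airflow 0755 airflow airflow

You can apply it immediately, without rebooting, with:

sudo systemd-tmpfiles --create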

Enable these services by issuing systemctl enable <service> on the command line, as shown below.

sudo systemctl enable airflow-webserver
sudo systemctl enable airflow-scheduler
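To start them right away and check that they are running (standard systemctl usage):

sudo systemctl start airflow-webserver
sudo systemctl start airflow-scheduler
sudo systemctl status airflow-scheduler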

The airflow-scheduler.service file should look like this:

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
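Note that EnvironmentFile points to /etc/sysconfig/airflow, which is where the edited airflow environment file is expected to live. /etc/sysconfig does not exist by default on Ubuntu, so either create it and copy the file there, or change the path in the unit. A sketch of the first option:

sudo mkdir -p /etc/sysconfig
sudo cp airflow /etc/sysconfig/airflow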
1
vote

Your question is a bit old, but I just discovered it because I'm currently interested in the same subject. I think the answer to your question is here:

https://medium.com/@shahbaz.ali03/run-apache-airflow-as-a-service-on-ubuntu-18-04-server-b637c03f4722

0
votes

If your airflow library is installed in a Python 3 virtual environment (mine is installed in /opt/python3_venv/), the setup below works.

Set the Airflow home in a profile.d file:

sudo vi /etc/profile.d/airflow.sh

export AIRFLOW_HOME=/home/airflow/airflow
export AIRFLOW_CONFIG=/home/airflow/airflow/airflow.cfg 

sudo mkdir /run/airflow/

sudo chown -R airflow:airflow /run/airflow
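Since /run is a tmpfs that is wiped at every boot, you may also want a tmpfiles.d entry (as in the answer above) so the directory is recreated automatically. One way to add it:

echo 'D /run/airflow 0755 airflow airflow' | sudo tee /etc/tmpfiles.d/airflow.conf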

Scheduler systemd:

sudo vi /etc/systemd/system/airflow-scheduler.service

[Unit]
Description=Airflow Scheduler daemon

[Service]
User=airflow
Type=simple
ExecStart=/opt/python3_venv/bin/python /opt/python3_venv/bin/airflow scheduler --pid /run/airflow/scheduler.pid
Environment="PATH=/opt/python3_venv/bin:/sbin:/bin:/usr/sbin:/usr/bin"

[Install]
WantedBy=multi-user.target
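If the goal is to have systemd restart the scheduler automatically when it dies (as in the original question), you would also add a restart policy to the [Service] section, as in the unit from the answer above:

Restart=always
RestartSec=5s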

Webserver systemd:

sudo vi /etc/systemd/system/airflow-webserver.service

[Unit]
Description=Airflow Webserver daemon

[Service]
User=airflow
Type=simple
ExecStart=/opt/python3_venv/bin/python /opt/python3_venv/bin/airflow webserver  --pid /run/airflow/webserver.pid
Environment="PATH=/opt/python3_venv/bin:/sbin:/bin:/usr/sbin:/usr/bin"

[Install]
WantedBy=multi-user.target
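After creating or editing the unit files, reload systemd and enable/start both services (standard systemctl usage):

sudo systemctl daemon-reload
sudo systemctl enable airflow-scheduler airflow-webserver
sudo systemctl start airflow-scheduler airflow-webserver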