In the article series What If...? Running Airflow Tasks without the workers, I introduced one of the new features of Airflow 2.10.0. It's time to introduce a feature in Airflow 2.7.0 (no, it's not).
Even though Grammarly suggested such a fancy post title, this article will only discuss the simple pull request Add default_deferrable config #31712. This was the first major thing I did after joining Astronomer.io and the first pull request of mine that got voted PR of the month. Even though it turned out to be relatively easy, we tried some hacky things with Cluster Policy before it ended up looking like this. But I think that's a topic for another day.
So, what is this default_deferrable config?
It's a configuration that you can specify in the airflow.cfg file.
# airflow.cfg
[operators]
default_deferrable = true
or through the environment variable
AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE=true
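To make the two options above a bit more concrete, here is a minimal sketch of the lookup order, written with the standard library only. It is a toy stand-in, not Airflow's actual configuration code: the function name get_default_deferrable and the simplified boolean parsing are my own assumptions. The one thing it is meant to show is that the environment variable wins over the value in the file.

```python
import configparser

# Same content as the airflow.cfg snippet, including its section header.
CFG_TEXT = """
[operators]
default_deferrable = true
"""

ENV_KEY = "AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE"


def get_default_deferrable(cfg_text: str, environ: dict) -> bool:
    """Toy lookup (hypothetical helper): env var first, then the file, then False."""
    if ENV_KEY in environ:
        raw = environ[ENV_KEY]
    else:
        parser = configparser.ConfigParser()
        parser.read_string(cfg_text)
        raw = parser.get("operators", "default_deferrable", fallback="false")
    # Simplified parsing; the real parser is stricter about accepted values.
    return raw.strip().lower() in {"true", "t", "1"}


from_file = get_default_deferrable(CFG_TEXT, {})
overridden = get_default_deferrable(CFG_TEXT, {ENV_KEY: "false"})
```

Passing the environment in as a plain dict (instead of reading os.environ directly) just keeps the sketch easy to try with different values.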
By enabling this configuration, every operator that supports deferrable mode (see the complete list here) will run in deferrable mode by default. The primary advantage of deferrable operators is that part of the logic runs asynchronously in the Airflow triggerer, so the task does not hold a worker slot while it waits. For more details, please refer to Deferrable Operators & Triggers.
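If "run asynchronously" feels abstract, the idea can be illustrated with plain asyncio. This is only a toy model and not how the triggerer is implemented: wait_for_condition and toy_triggerer are hypothetical names, and asyncio.sleep stands in for whatever an operator would actually be waiting on.

```python
import asyncio


async def wait_for_condition(name: str, delay: float) -> str:
    # Stand-in for a trigger: awaiting hands control back to the event loop
    # instead of blocking while nothing is happening.
    await asyncio.sleep(delay)
    return f"{name} done"


async def toy_triggerer(waits: dict) -> list:
    # A single event loop services every pending wait concurrently.
    return await asyncio.gather(
        *(wait_for_condition(name, delay) for name, delay in waits.items())
    )


results = asyncio.run(toy_triggerer({"sensor_a": 0.02, "sensor_b": 0.01}))
```

Both waits overlap inside one loop, which is roughly why handing the waiting part to an async process is cheaper than keeping one worker busy per waiting task.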
How was it done?
Step 1: Add this configuration to airflow/config_templates/config.yml
default_deferrable:
  description: |
    The default value of attribute "deferrable" in operators and sensors.
  version_added: ~
  type: boolean
  example: ~
  default: "false"
Step 2: Change the default value of the argument deferrable in every operator/sensor to conf.getboolean("operators", "default_deferrable", fallback=False); this is how Airflow retrieves the value from airflow.cfg or environment variables.
deferrable: bool = conf.getboolean("operators", "default_deferrable", fallback=False),
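Here is roughly how that line sits inside an operator's signature. To keep the sketch runnable without Airflow installed, conf below is a standard-library ConfigParser standing in for Airflow's conf object, and ExampleSensor is a hypothetical class rather than a real sensor.

```python
import configparser

# Hypothetical stand-in for Airflow's `conf`, built on the standard library.
conf = configparser.ConfigParser()
conf.read_string("[operators]\ndefault_deferrable = true\n")


class ExampleSensor:
    """Hypothetical sensor; a real one would subclass Airflow's base classes."""

    def __init__(
        self,
        *,
        # Evaluated once, when the class body is executed (i.e. at import time).
        deferrable: bool = conf.getboolean("operators", "default_deferrable", fallback=False),
    ) -> None:
        self.deferrable = deferrable


from_config = ExampleSensor()                   # picks up the configured default
explicit = ExampleSensor(deferrable=False)      # an explicit argument still wins
```

Two things are worth noticing: passing deferrable explicitly always overrides the configuration, and because Python evaluates default arguments when the function is defined, the configured value is read once at import time rather than on every instantiation.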
Step 3: Kindly ask every contributor to add this default value to every operator/sensor, and every reviewer to check for it.
Step 4: Pray that everyone will follow 🙏...... at least that is what we did at that moment
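Praying does not scale very well, so here is a sketch of what an automated version of the reviewer's check in Step 3 could look like, using Python's ast module. To be clear, this is a hypothetical illustration and not something the steps above included: the function name find_bad_deferrable_defaults and the sample sources are assumptions made up for this example.

```python
import ast

EXPECTED = 'conf.getboolean("operators", "default_deferrable", fallback=False)'

GOOD = (
    "class A:\n"
    "    def __init__(self, *, deferrable: bool = "
    'conf.getboolean("operators", "default_deferrable", fallback=False)):\n'
    "        pass\n"
)
BAD = "class B:\n    def __init__(self, deferrable=False):\n        pass\n"


def find_bad_deferrable_defaults(source: str) -> list:
    """Return (function name, line) pairs whose `deferrable` default is not EXPECTED."""
    expected_dump = ast.dump(ast.parse(EXPECTED, mode="eval").body)
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        args = node.args
        # Positional defaults line up with the tail of the positional parameters.
        positional = args.posonlyargs + args.args
        pairs = list(zip(positional[len(positional) - len(args.defaults):], args.defaults))
        # Keyword-only parameters carry None when they have no default.
        pairs += list(zip(args.kwonlyargs, args.kw_defaults))
        for arg, default in pairs:
            if arg.arg != "deferrable":
                continue
            if default is None or ast.dump(default) != expected_dump:
                problems.append((node.name, arg.lineno))
    return problems


good_report = find_bad_deferrable_defaults(GOOD)
bad_report = find_bad_deferrable_defaults(BAD)
```

Comparing ast.dump output instead of raw text means quoting style and whitespace differences do not cause false alarms. A keyword-only deferrable parameter with no default at all is also reported, which may or may not be what you want.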