Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Lifecycle events:

- ``on_starting``
- ``before_stopping``

DagRun State Change Events:

- ``on_dag_run_running``
- ``on_dag_run_success``
- ``on_dag_run_failed``

TaskInstance State Change Events:

- ``on_task_instance_running``
- ``on_task_instance_success``
- ``on_task_instance_failed``
After a `discussion <https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4>`__
and a `voting process <https://lists.apache.org/thread/pgcgmhf6560k8jbsmz8nlyoxosvltph2>`__,
the Airflow PMC and Committers have reached a resolution to no longer maintain MsSQL as a supported database backend.
As of Airflow 2.9.0, support for MsSQL as an Airflow database backend has been removed.
A migration script that can help migrate the database before upgrading to Airflow 2.9.0 is available in the
`airflow-mssql-migration repo on GitHub <https://github.com/apache/airflow-mssql-migration>`_.

Note that the migration script is provided without support or warranty.
This does not affect the existing provider packages (operators and hooks); DAGs can still access and process data from MsSQL.
Datasets must use a URI that conforms to the rules laid down in AIP-60, and the value
will be automatically normalized when the DAG file is parsed. See the
`documentation on Datasets <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html>`_
for a more detailed description of the rules.

You may need to change your Dataset identifiers if they look like a URI but are used in a less mainstream way, such as relying on the URI's auth section, or having a case-sensitive protocol name.
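As an illustration only (this is not Airflow's actual implementation), AIP-60-style normalization lowercases the case-insensitive parts of a URI such as the scheme and host, while leaving the path untouched:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_uri(uri: str) -> str:
    # Sketch: lowercase scheme and authority, keep path/query/fragment as-is.
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )

print(normalize_uri("S3://My-Bucket/data.csv"))  # s3://my-bucket/data.csv
```

A case-sensitive protocol name would not survive this kind of normalization, which is why such identifiers may need to change.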
``get_permitted_menu_items`` in ``BaseAuthManager`` has been renamed to ``filter_permitted_menu_items`` (#37627).

The Audit Log event name for REST API events will be prepended with ``api.`` or ``ui.``, depending on whether the event came from the Airflow UI or externally.
There are a few caveats though:
- Pendulum 2 does not support Python 3.12. For Python 3.12 you need to use
  `Pendulum 3 <https://pendulum.eustace.io/blog/announcing-pendulum-3-0-0.html>`_.
- The minimum SQLAlchemy version supported when Pandas is installed for Python 3.12 is 1.4.36, released in
  April 2022. Airflow 2.9.0 increases the minimum supported version of SQLAlchemy to 1.4.36 for all
  Python versions.

Not all providers support Python 3.12. At the initial release of Airflow 2.9.0 the following providers are released without support for Python 3.12:

- ``apache.beam`` - pending on `Apache Beam support for 3.12 <https://github.com/apache/beam/issues/29149>`_
- ``papermill`` - pending on releasing a Python 3.12 compatible papermill client version,
  including `this merged issue <https://github.com/nteract/papermill/pull/771>`_
There's now a limit to the length of data that can be stored in the Rendered Template Fields.
The limit is set to 4096 characters. If the data exceeds this limit, it will be truncated. You can change this limit
by setting the ``[core] max_template_field_length`` configuration option in your Airflow config.
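For example, to double the limit in ``airflow.cfg`` (the value below is illustrative):

```ini
[core]
# Truncation threshold for rendered template fields, in characters (default: 4096).
max_template_field_length = 8192
```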
The XCom table's ``value`` column type has changed from ``blob`` to ``longblob``. This allows you to store relatively big data in XCom, but processing can take a significant amount of time if you have a lot of large data stored in XCom.

To downgrade from revision ``b4078ac230a1``, ensure that you don't have XCom values larger than 65,535 bytes. Otherwise, you'll need to clean those rows or run ``airflow db clean xcom`` to clean the XCom table.
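Before downgrading, you can check for oversized rows with a query along these lines (a sketch against Airflow's metadata database; adjust the length function to your database backend):

```sql
-- Count XCom values that would block the downgrade (larger than 65,535 bytes).
SELECT COUNT(*) FROM xcom WHERE LENGTH(value) > 65535;
```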
- ``Matomo`` as an option for ``analytics_tool`` (#38221)
- ``hashable`` (#37465)
- ``queuedEvent`` endpoint to get/delete DatasetDagRunQueue (#37176)
- ``DatasetOrTimeSchedule`` (#36710)
- ``on_skipped_callback`` to ``BaseOperator`` (#36374)
- ``@task.bash`` TaskFlow decorator (#30176, #37875)
- ``ExternalPythonOperator`` use version from ``sys.version_info`` (#38377)
- ``run_id`` column to log table (#37731)
- ``tryNumber`` to grid task instance tooltip (#37911)
- ``ExternalPythonOperator`` (#37409)
- ``Pathlike`` (#36947)
- ``nowait`` and ``skip_locked`` into ``with_row_locks`` (#36889)
- ``dag/dagRun`` in the REST API (#36641)
- ``Connexion`` from auth manager interface (#36209)
- ``total_entries`` count on the event logs endpoint (#38625)
- ``tz`` in next run ID info (#38482)
- ``chakra`` styles to keep dropdowns in filter bar (#38456)
- ``__exit__`` is called in decorator context managers (#38383)
- ``BaseAuthManager.is_authorized_custom_view`` abstract (#37915)
- ``/get_logs_with_metadata`` endpoint (#37756)
- ``encoding`` to the SQL engine in SQLAlchemy v2 (#37545)
- ``consuming_dags`` attr eagerly before dataset listener (#36247)
- ``importlib_metadata`` with compat to Python 3.10/3.12 stdlib (#38366)
- ``__new__`` magic method of ``BaseOperatorMeta`` to avoid bad mixing of classic and decorated operators (#37937)
- ``sys.version_info`` for determining Python major.minor (#38372)
- ``blinker`` add where it is required (#38140)
- ``> 39.0.0`` (#38112)
- ``assert`` outside of the tests (#37718)
- ``flask._request_ctx_stack`` (#37522)
- ``login`` attribute in ``airflow.__init__.py`` (#37565)
- ``datetime.datetime.utcnow`` by ``airflow.utils.timezone.utcnow`` in core (#35448)
- ``is_authorized_cluster_activity`` from auth manager (#36175)
- ``exception`` to templates ref list (#36656)

No significant changes.
- ``FixedTimezone`` (#38139)
- ``ObjectStoragePath`` (#37769)
- ``pytest_rewrites`` (#38095, #38139)
- ``pandas`` to ``<2.2`` (#37748)
- ``croniter`` to fix an issue with 29 Feb cron expressions (#38198)
- ``2.8.3`` (#38036)

The default Airflow image that is used with the Chart is now ``2.8.3``, previously it was ``2.8.2``.

- ``.Values.airflowPodAnnotations`` (#37917)
- ``multiNamespace`` releases with the same name (#37197)
- ``airflow_pre_installed_providers.txt`` artifact (#37679)
- ``BranchDayOfWeekOperator`` (#37813)
- ``ERD`` generating doc improvement (#37808)

The default Airflow image that is used with the Chart is now 2.8.2, previously it was 2.8.1.
The ``allowed_deserialization_classes`` flag now follows a glob pattern (#36147).

For example, if one wants to add the class ``airflow.tests.custom_class`` to the
``allowed_deserialization_classes`` list, it can be done by writing the full class
name (``airflow.tests.custom_class``) or a pattern such as the ones used in glob
search (e.g., ``airflow.*``, ``airflow.tests.*``).

If you currently use a custom regexp path, make sure to rewrite it as a glob pattern.
Alternatively, if you still wish to match it as a regexp pattern, add it under the new
list ``allowed_deserialization_classes_regexp`` instead.
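The glob matching can be illustrated with Python's ``fnmatch`` module (a sketch of the behavior, not Airflow's actual implementation):

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list using glob patterns, as the
# allowed_deserialization_classes option now accepts.
patterns = ["airflow.tests.*"]

def is_allowed(class_name: str) -> bool:
    # A class is allowed if it matches any configured glob pattern.
    return any(fnmatchcase(class_name, p) for p in patterns)

print(is_allowed("airflow.tests.custom_class"))  # True
print(is_allowed("mymodule.custom_class"))       # False
```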
This was done under the policy that we do not want users such as Viewer, Ops, and other non-admin users to have access to audit logs. The intention behind this change is to restrict users with fewer permissions from viewing user details like first name, email etc. in the audit logs when they are not permitted to.

The impact of this change is that existing users with non-admin rights won't be able to view or access the audit logs, either from the Browse tab or from the DAG run.
``AirflowTimeoutError`` is no longer caught by default through ``except Exception`` (#35653).

``AirflowTimeoutError`` now inherits from ``BaseException`` instead of
``AirflowException`` -> ``Exception``.
See https://docs.python.org/3/library/exceptions.html#exception-hierarchy

This prevents code catching ``Exception`` from accidentally
catching ``AirflowTimeoutError`` and continuing to run.
``AirflowTimeoutError`` is an explicit intent to cancel the task, and should not
be caught in attempts to handle the error and return some default value.

Catching ``AirflowTimeoutError`` is still possible by explicitly catching
``AirflowTimeoutError`` or ``BaseException``.
This is discouraged, as it may allow the code to continue running even after
such cancellation requests.

Code that previously depended on performing strict cleanup in every situation
after catching ``Exception`` is advised to use ``finally`` blocks or
context managers, to perform only the cleanup and then automatically
re-raise the exception.
See similar considerations about catching ``KeyboardInterrupt`` in
https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt
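The recommended pattern can be sketched as follows, using a hypothetical ``TimeoutCancel`` stand-in since ``AirflowTimeoutError`` now derives from ``BaseException``:

```python
class TimeoutCancel(BaseException):
    """Stand-in for AirflowTimeoutError, which now inherits BaseException."""

def run_task():
    raise TimeoutCancel("task timed out")

events = []
try:
    try:
        run_task()
    except Exception:
        # No longer reached: BaseException subclasses bypass `except Exception`.
        events.append("swallowed")
    finally:
        # `finally` still runs, so cleanup happens before cancellation propagates.
        events.append("cleanup")
except BaseException:
    events.append("propagated")

print(events)  # ['cleanup', 'propagated']
```

The cleanup runs, but the cancellation still propagates instead of being swallowed and replaced by a default value.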
- ``IMPORT_ERROR`` from DAG related permissions to view related permissions (#37292)
- ``AirflowTaskTimeout`` to inherit ``BaseException`` (#35653)
- ``namedtuple`` (#37168)
- ``Treeview`` function (#37162)
- ``access_entity`` is specified (#37290)
- ``dateTimeAttrFormat`` constant (#37285)
- ``@Sentry.enrich_errors`` (#37002)
- ``dryrun`` auto-fetch (#36941)
- ``/variables`` endpoint (#36820)
- ``pendulum.from_timestamp`` usage (#37160)
- ``CLI`` instead of specific one (#37651)
- ``undici`` from ``5.26.3`` to ``5.28.3`` in ``/airflow/www`` (#37493)
- ``3.12`` exclusions in ``providers/pyproject.toml`` (#37404)
- ``markdown`` from core dependencies (#37396)
- ``pageSize`` method (#37319)
- Python ``3.11`` and ``3.12`` deprecations (#37478)
- ``airflow_pre_installed_providers.txt`` into ``sdist`` distribution (#37388)
- ``universal-pathlib`` to ``< 0.2.0`` (#37311)
- ``queue_when`` (#36997)
- ``config.yml`` for environment variable ``sql_alchemy_connect_args`` (#36526)
- Alembic to ``1.13.1`` (#36928)
- ``flask-session`` to ``<0.6`` (#36895)
- ``CLI`` flags available (#37231)
- ``otel`` config descriptions (#37229)
- ``Objectstore`` tutorial with prerequisites section (#36983)
- ``package/module`` names (#36927)
- ``__init__`` of operators automatically (#33786)
- ``bitnami/postgresql`` dependency (#34817)

The version of the ``bitnami/postgresql`` subchart was upgraded from ``12.10.0`` to ``13.2.24``.
The version of the ``PostgreSQL`` binaries was upgraded from ``11`` to ``16.1.0``.

The change requires existing ``bitnami/postgresql`` subchart users to perform a manual major version upgrade using ``pg_dumpall`` or ``pg_upgrade``.

As a reminder, it is recommended to set up an `external database <https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#database>`_ in production.
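As an illustration, a minimal sketch of chart values that disables the bundled database and points Airflow at an external one (key names follow the chart's ``values.yaml``; verify them against your chart version, and the host/credentials below are placeholders):

```yaml
# Disable the bundled bitnami/postgresql subchart.
postgresql:
  enabled: false

# Point the metadata database connection at an externally managed database.
data:
  metadataConnection:
    user: airflow
    pass: airflow
    protocol: postgresql
    host: db.example.com
    port: 5432
    db: airflow
```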
- ``2.8.1`` (#36907)

The default Airflow image that is used with the Chart is now ``2.8.1``, previously it was ``2.7.1``.

The PgBouncer and PgBouncer Exporter images are based on newer software/OS:

- ``pgbouncer``: 1.21.0 based on alpine 3.14 (``airflow-pgbouncer-2024.01.19-1.21.0``)
- ``pgbouncer-exporter``: 0.16.0 based on alpine 3.19 (``apache/airflow:airflow-pgbouncer-exporter-2024.01.19-0.16.0``)

- ``v0.26.0`` (#37187)

The default StatsD image that is used with the Chart is now ``v0.26.0``, previously it was ``v0.22.8``.

- ``7-bookworm`` (#37187)

The default Redis image that is used with the Chart is now ``7-bookworm``, previously it was ``7-bullseye``.
- ``securityContexts`` in dag processors log groomer sidecar (#34499)
- ``securityContexts`` in dag processors wait-for-migrations container (#35593)
- ``storageClassName`` (#35581)
- ``volumeClaimTemplate`` for worker (#34986)
- ``priorityClassName`` on Redis pods (#34879)
- ``emptyDir`` config (#34837)
- ``AIRFLOW_HOME`` env var with ``airflowHome`` value (#34839)
- ``safeToEvict`` properly (#35130)
- ``useStandardNaming`` (#34825)
- ``usePgbouncer`` is false (#34741)
- ``useStandardNaming`` (#34787)
- ``bitnami/postgresql`` subchart to ``13.2.24`` (#36156)
- ``pgbouncer`` and ``pgbouncer-exporter`` images with newer versions (#36898)
- ``statsd`` and ``redis`` chart images (#37187)
- ``pendulum`` package set to 3 (#36281)

Support for pendulum 2.1.2 will be kept for a while, presumably until the next feature version of Airflow. It is advised to upgrade user code to use pendulum 3 as soon as possible.
We standardized the Airflow dependency configuration to follow the latest developments in Python packaging by
using ``pyproject.toml``. Airflow is now compliant with those accepted PEPs:

- `PEP-440 Version Identification and Dependency Specification <https://www.python.org/dev/peps/pep-0440/>`__
- `PEP-517 A build-system independent format for source trees <https://www.python.org/dev/peps/pep-0517/>`__
- `PEP-518 Specifying Minimum Build System Requirements for Python Projects <https://www.python.org/dev/peps/pep-0518/>`__
- `PEP-561 Distributing and Packaging Type Information <https://www.python.org/dev/peps/pep-0561/>`__
- `PEP-621 Storing project metadata in pyproject.toml <https://www.python.org/dev/peps/pep-0621/>`__
- `PEP-660 Editable installs for pyproject.toml based builds (wheel based) <https://www.python.org/dev/peps/pep-0660/>`__
- `PEP-685 Comparison of extra names for optional distribution dependencies <https://www.python.org/dev/peps/pep-0685/>`__

We also implement multiple license files support coming from a draft, not yet accepted (but supported by hatchling) PEP:

- `PEP 639 Improving License Clarity with Better Package Metadata <https://peps.python.org/pep-0639/>`__

This has almost no noticeable impact on users if they are using modern Python packaging and development tools; generally
speaking, Airflow should behave as it did before when installing it from PyPI, and it should be much easier to install
it for development purposes using ``pip install -e ".[devel]"``.
The differences from the user side are:

Extras installed with Airflow are now normalized to use ``-``
(following PEP-685) instead of ``_`` and ``.`` (as it was before in some extras). When you install Airflow with such extras (for example ``dbt.core`` or
``all_dbs``) you should use ``-`` instead of ``_`` and ``.``.

In most modern tools this will work in a backwards-compatible way, but in some old versions of those tools you might need to
replace ``_`` and ``.`` with ``-``. You can also get warnings that the extra you are installing does not exist, but usually
this warning is harmless and the extra is installed anyway. It is, however, recommended to use ``-``
in extras in your dependency specifications for all Airflow extras.
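The normalization follows the PEP 503 name-normalization rule that PEP 685 reuses for extras; a stdlib sketch:

```python
import re

def normalize_extra(name: str) -> str:
    # Lowercase the name and collapse runs of '-', '_' and '.'
    # into a single '-', as PEP 685 extra-name comparison does.
    return re.sub(r"[-_.]+", "-", name).lower()

print(normalize_extra("dbt.core"))  # dbt-core
print(normalize_extra("all_dbs"))   # all-dbs
```

This is why ``pip install "apache-airflow[dbt-core]"`` is the recommended spelling going forward.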
The released Airflow package does not contain the ``devel``, ``devel-*``, ``doc`` and ``doc-gen`` extras.
Those extras are only available when you install Airflow from sources in ``--editable`` mode. This is
because those extras are only used for development and documentation building purposes and are not needed
when you install Airflow for production use. Those dependencies had unspecified and varying behaviour for
released packages anyway, and you were not supposed to use them in released packages.

The ``all`` and ``all-*`` extras were not always working correctly when installing Airflow using constraints,
because they were also considered development-only dependencies. With this change, those extras
now properly handle constraints and will install correctly with constraints, pulling the right set
of providers and dependencies when constraints are used.
The ``graphviz`` dependency has been problematic as a required Airflow dependency, especially for
ARM-based installations. Graphviz packages require binary graphviz libraries, which is already a
limitation, but they also require the graphviz Python bindings to be built and installed.
This does not work for older Linux installations, but, more importantly, when you try to install
graphviz libraries for Python 3.8 or 3.9 on ARM M1 MacBooks, the packages fail to install because
Python bindings compilation for M1 can only work for Python 3.10+.

This is not technically a breaking change: the CLIs to render the DAGs are still there, and if you already have graphviz installed, it will continue working as it did before. It only fails when graphviz is not installed, in which case it will raise an error informing you that you need it.

Graphviz will remain installed for most users. The only change affects fresh installations of new versions of Airflow, where graphviz will need to be specified as an extra or installed separately in order to enable the DAG rendering option.
- ``taskinstance`` list (#36693)
- ``AUTH_ROLE_PUBLIC=admin`` (#36750)
- ``op`` subtypes (#35536)
- ``typing.Union`` in ``_infer_multiple_outputs`` for Python 3.10+ (#36728)
- ``multiple_outputs`` is inferred correctly even when using ``TypedDict`` (#36652)
- ``Dagrun.update_state`` (#36712)
- ``EventsTimetable`` schedule past events if ``catchup=False`` (#36134)
- ``tis_query`` in ``_process_executor_events`` (#36655)
- ``call_regular_interval`` (#36608)
- ``DagRun`` fails while running ``dag test`` (#36517)
- ``_manage_executor_state`` by refreshing TIs in batch (#36502)
- ``MAX_CONTENT_LENGTH`` (#36401)
- ``kubernetes`` decorator type annotation consistent with operator (#36405)
- ``api/dag/*/dagrun`` from anonymous user (#36275)
- ``DAG.is_fixed_time_schedule`` (#36370)
- ``httpx`` import in file_task_handler for performance (#36753)
- ``pyarrow-hotfix`` for ``CVE-2023-47248`` (#36697)
- ``graphviz`` dependency optional (#36647)
- ``pandas`` dependency to 1.2.5 for all providers and airflow (#36698)
- ``/airflow/www`` (#36700)
- ``docker`` decorator type annotations (#36406)
- ``batch_is_authorized_dag`` to check if user has permission to read DAGs (#36279)
- ``numpy`` example with practical exercise demonstrating top-level code (#35097)
- ``dags.rst`` with information on DAG pausing (#36540)
- ``metrics.rst`` for param ``dagrun.schedule_delay`` (#36404)

Raw HTML code in DAG docs and DAG params descriptions is disabled by default.
To ensure that no malicious javascript can be injected into DAG descriptions or trigger UI forms by DAG authors,
a new parameter ``webserver.allow_raw_html_descriptions`` was added, with a default value of ``False``.
If you trust your DAG authors' code and want to allow using raw HTML in DAG descriptions and params, you can restore the previous
behavior by setting the configuration value to ``True``.

To ensure Airflow is secure by default, the raw HTML support in the trigger UI has been superseded by markdown support via
the ``description_md`` attribute. If you have been using ``description_html``, please migrate to ``description_md``.

``custom_html_form`` is now deprecated. (#35460)
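If you trust your DAG authors and accept the risk, the previous behavior can be restored in ``airflow.cfg``:

```ini
[webserver]
# Re-enables raw HTML in DAG docs and params descriptions (default: False).
allow_raw_html_descriptions = True
```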
- ``prev_end_date_success`` method access (#34528)
- ``List Task Instances`` view (#34529)
- ``clear_number`` to track DAG run being cleared (#34126)
- ``multiselect`` to run state in grid view (#35403)
- ``Connection.get_hook`` in case of ImportError (#36005)
- ``taskinstance`` (#35810)
- ``AIRFLOW_CONFIG`` path (#35818)
- ``JSON-string`` connection representation generator (#35723)
- ``BaseOperatorLink`` into the separate module (#35032)
- ``cbreak`` in ``execute_interactive`` and handle ``SIGINT`` (#35602)
- ``synchronize_log_template`` function (#35366)
- ``BaseOperatorLink.operators`` (#35003)
- ``SA2-compatible`` syntax for TaskReschedule (#33720)
- ``EventScheduler`` (#34808)
- ``update_forward_refs`` (#34657)
- ``Dataset`` from ``airflow`` package in codebase (#34610)
- ``airflow.datasets.Dataset`` in examples and tests (#34605)
- ``version`` top-level element from docker compose files (#33831)
- ``NOT EXISTS`` subquery instead of ``tuple_not_in_condition`` (#33527)
- ``triggerer_heartbeat`` (#33320)
- ``airflow variables export`` to print to stdout (#33279)
- ``reset_user_sessions`` to work from either CLI or web (#36056)
- ``overscroll`` behaviour to auto (#35717)
- ``borderWidthRight`` to grid for Firefox scrollbar (#35346)
- ``processor_subdir`` in serialized_dag table (#35661)
- ``get_dag_by_pickle`` util function (#35339)
- ``mappedoperator`` (#35257)
- ``Literal`` from ``typing_extensions`` (#33794)
- ``4.3.10`` (#35991)
- ``Connection.to_json_dict`` to ``Connection.to_dict`` (#35894)
- ``moto`` version to ``>= 4.2.9`` (#35687)
- ``pyarrow-hotfix`` to mitigate CVE-2023-47248 (#35650)
- ``axios`` from 0.26.0 to 1.6.0 in ``/airflow/www/`` (#35624)
- ``navbar_text_color`` and ``rm`` condition in style (#35553)
- ``dag_next_execution`` (#35539)
- ``TCH004`` and ``TCH005`` rules (#35475)
- ``AirflowException`` from airflow (#34541)
- ``postcss`` from 8.4.25 to 8.4.31 in ``/airflow/www`` (#34770)
- ``airflow.models.dag.DAG`` in examples (#34617)
- ``re2`` regex engine in the .airflowignore documentation (#35663)
- ``best-practices.rst`` (#35692)
- ``dag-run.rst`` to mention Airflow's support for extended cron syntax through croniter (#35342)
- ``webserver.rst`` to include information on supported OAuth2 providers (#35237)
- ``rst`` code block format (#34708)

No significant changes.
- ``codemirror`` and extra (#35122)
- ``get_plugin_info`` for class based listeners (#35022)
- ``all_skipped`` trigger rule as ``skipped`` if any task is in ``upstream_failed`` state (#34392)
- ``pendulum`` requirement to ``<3.0`` (#35336)
- ``sentry_sdk`` to ``1.33.0`` (#35298)
- ``@babel/traverse`` from 7.16.0 to 7.23.2 in ``/airflow/www`` (#34988)
- ``undici`` from 5.19.1 to 5.26.3 in ``/airflow/www`` (#34971)
- ``SchedulerJobRunner`` (#34810)
- ``max_tis per query > parallelism`` (#34742)
- ``connexion<3.0`` upper bound (#35218)
- ``< 3.12`` (#35123)
- ``3.1.0`` (#34943)
- ``conn.extras`` (#35165)
- ``mysql-connector-python`` from recommended MySQL driver (#34287)
- ``set_downstream`` example (#35075)
- ``airflow_local_settings.py`` template (#34826)
- ``'>'`` in provider section name (#34813)