Apache Airflow: Workflow orchestration and scheduling platform
Python platform for DAG-based task orchestration and scheduling.
Apache Airflow is a workflow orchestration platform written in Python that manages complex data pipelines and task dependencies. It represents workflows as directed acyclic graphs (DAGs), where each node is a task and each edge defines a dependency between tasks. The platform includes a scheduler that triggers task execution based on defined intervals or external events, a web UI for monitoring and management, and an executor system that can run tasks locally or distribute them across multiple workers. Airflow is commonly deployed in data engineering, ETL/ELT operations, machine learning pipelines, and general automation scenarios that require task coordination and observability.
Python-Based DAG Definition
Workflows are Python code, enabling version control, unit testing, and programmatic task generation. Dependencies and conditional logic are expressed directly in code rather than GUI configuration or static YAML files.
Pluggable Executor Backends
A single codebase deploys across execution environments, from local processes to Celery workers or Kubernetes clusters. Executors can be switched without rewriting workflows, scaling from a development laptop to a distributed production system.
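Switching executors is a configuration change rather than a code change. A sketch of the relevant `airflow.cfg` entries (values are illustrative; the commented alternatives are the built-in distributed executors):

```
[core]
# Runs tasks as local subprocesses; suitable for development.
executor = LocalExecutor

# For distributed production, swap in one of:
# executor = CeleryExecutor
# executor = KubernetesExecutor
```

The DAG files themselves are untouched; only the deployment configuration decides where tasks run.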
Provider Package Ecosystem
Extends core functionality through provider packages for AWS, GCP, Azure, databases, and third-party services. Install only the integrations needed, keeping the base installation lean.
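Provider packages are published separately on PyPI under the `apache-airflow-providers-*` naming scheme, so each integration is an ordinary pip install. A sketch with a few common providers (pin versions against the release's constraints file in practice):

```shell
# Install only the integrations you need; the base install stays lean.
pip install apache-airflow-providers-amazon     # AWS operators and hooks
pip install apache-airflow-providers-google     # GCP operators and hooks
pip install apache-airflow-providers-postgres   # PostgreSQL hook and operator
```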
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='data_pipeline',
    start_date=datetime(2024, 1, 1),
    schedule='@daily'
) as dag:
    extract = BashOperator(task_id='extract', bash_command='echo "Extracting data"')
    transform = BashOperator(task_id='transform', bash_command='echo "Transforming data"')

    extract >> transform

Apache Airflow 3.1.6
- Protect against hanging thread in aiosqlite 0.22+ (#60217) (#60245)
- Fix log task instance sqlalchemy join query (#59973) (#60222)
- Fix invalid uri created when extras contains non string elements (#59339) (#60219)
- Fix operator template fields via callable serialization that causes unstable DAG serialization (#60065) (#60221)
- Fix real-time extra links updates for TriggerDagRunOperator (#59507) (#60225)
Apache Airflow Ctl (airflowctl) 0.1.1
- Make pause/unpause commands positional for improved CLI consistency (#59936)
- Remove deprecated export functionality from airflowctl (#59850)
- Add ``team_name`` to connection commands (#59336)
- Add ``team_id`` to variable commands (#57102)
- Add pre-commit checks for airflowctl test coverage (#58856)
Apache Airflow 3.1.5
- 📦 PyPI: https://pypi.org/project/apache-airflow/3.1.5/
- 📚 Docs: https://airflow.apache.org/docs/apache-airflow/3.1.5/
- 📚 Task SDK Docs: https://airflow.apache.org/docs/task-sdk/1.1.5/
- 🛠 Release Notes: https://airflow.apache.org/docs/apache-airflow/3.1.5/release_notes.html
- 🐳 Docker Image: "docker pull apache/airflow:3.1.5"
- 🚏 Constraints: https://github.com/apache/airflow/tree/constraints-3.1.5
- Significant Changes: No significant changes
- Bug Fixes: Handle invalid token in JWT
Related Repositories
Discover similar tools and frameworks used by developers
pandas
Labeled data structures for tabular data analysis.
ClickHouse
Column-oriented database for real-time analytics with SQL support and distributed computing capabilities.
Fiona
Python library for reading and writing geographic data files like GeoPackage and Shapefile.
posthog
Event tracking, analytics, and experimentation platform.
luigi
Build complex batch pipelines with dependency management.