Apache Airflow: Workflow orchestration and scheduling platform
Python platform for DAG-based task orchestration and scheduling.
Apache Airflow is a workflow orchestration platform written in Python that manages complex data pipelines and task dependencies. It represents workflows as directed acyclic graphs (DAGs), where each node is a task and edges define the dependencies between tasks. The platform includes a scheduler that triggers task execution based on defined intervals or external events, a web UI for monitoring and management, and an executor system that can run tasks locally or distribute them across multiple workers. Airflow is commonly deployed for data engineering, ETL/ELT operations, machine learning pipelines, and general automation scenarios where task coordination and observability are required.
Python-Based DAG Definition
Workflows are Python code, enabling version control, unit testing, and programmatic task generation. Dependencies and conditional logic are expressed directly in code rather than GUI configuration or static YAML files.
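For example, because dependencies live in code, a loop can generate one task per input and wire each one to a downstream step. The following is a minimal sketch using BashOperator; the DAG, task, and table names are chosen purely for illustration.
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='per_table_load',
    start_date=datetime(2024, 1, 1),
    schedule='@daily'
) as dag:
    # Final step that runs after every generated load task
    finalize = BashOperator(task_id='finalize', bash_command='echo "all tables loaded"')
    # One load task per table, created programmatically
    for table in ['orders', 'customers', 'payments']:
        BashOperator(task_id=f'load_{table}', bash_command=f'echo "loading {table}"') >> finalize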
Pluggable Executor Backends
Single codebase deploys across execution environments from local processes to Kubernetes clusters or Celery workers. Switch executors without rewriting workflows, scaling from development laptops to distributed production systems.
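Switching executors is a deployment setting rather than a code change. A minimal sketch, assuming configuration through airflow.cfg (the same value can also be supplied via the AIRFLOW__CORE__EXECUTOR environment variable):
[core]
# LocalExecutor runs tasks as subprocesses on a single machine;
# swap in CeleryExecutor or KubernetesExecutor to distribute work across workers.
executor = LocalExecutor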
Provider Package Ecosystem
Extends core functionality through provider packages for AWS, GCP, Azure, databases, and third-party services. Install only the integrations needed, keeping the base installation lean.
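As an illustration, installing a single provider package makes its hooks and operators importable. This sketch assumes the Amazon provider is installed (pip install apache-airflow-providers-amazon) and that an Airflow connection named aws_default exists; the bucket name is made up.
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# List object keys in a bucket using the AWS connection configured in Airflow
hook = S3Hook(aws_conn_id='aws_default')
keys = hook.list_keys(bucket_name='example-bucket')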
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime
with DAG(
    dag_id='data_pipeline',
    start_date=datetime(2024, 1, 1),
    schedule='@daily'
) as dag:
    extract = BashOperator(task_id='extract', bash_command='echo "Extracting data"')
    transform = BashOperator(task_id='transform', bash_command='echo "Transforming data"')
    extract >> transform
Patch release fixing 30+ bugs including DAG processor crashes, UI performance with large DAG counts, memory leaks in API client, and migration issues from Airflow 2 to 3.
- Upgrade resolves DAG processor crashes on MySQL tag renames, bundle callback conflicts, and triggerer errors post-migration.
- Apply fixes for memory leak from SSL context creation, n+1 query issues in XCom/Event APIs, and task retry failures after external kills.
Patch release fixing critical scheduler and DAG processor crashes during 3.0→3.1 upgrades, plus memory leaks in remote logging.
- Upgrade immediately if migrating from 3.0 or earlier to avoid scheduler crashes with NULL dag_run.conf or None retry_delay.
- Apply fix for remote logging memory leak and DAG processor crash when pre-import module optimization is enabled.
Adds Python 3.13 and SQLAlchemy 2.0 support, introduces HITL operators for human-in-the-loop workflows, and ships a React-based plugin system (AIP-68) along with UI internationalization in 16 languages.
- Upgrade to Python 3.13 and SQLAlchemy 2.0; psycopg3 driver now supported for Postgres connections.
- Use new HITLOperator/ApprovalOperator for human decision points; configure deadline alerts (AIP-86) for proactive DAG monitoring.
Related Repositories
Discover similar tools and frameworks used by developers
HikariCP
Zero-overhead JDBC connection pooling for production Java applications.
dbt-core
SQL-based transformation framework for analytics data warehouses.
docling
Fast document parser for RAG and AI workflows.
pdfplumber
Python library for extracting PDF text and tables.
patroni
Automates PostgreSQL failover using distributed consensus systems.