Apache Airflow: Workflow orchestration and scheduling platform
Python platform for DAG-based task orchestration and scheduling.
Learn more about Apache Airflow
Apache Airflow is a workflow orchestration platform written in Python that manages complex data pipelines and task dependencies. It represents workflows as DAGs, where each node is a task and edges define dependencies between tasks. The platform includes a scheduler that triggers task execution based on defined intervals or external events, a web UI for monitoring and management, and an executor system that can run tasks locally or distribute them across multiple workers. Airflow is commonly deployed in data engineering, ETL/ELT operations, machine learning pipelines, and general automation scenarios where task coordination and observability are required.
Python-Based DAG Definition
Workflows are Python code, enabling version control, unit testing, and programmatic task generation. Dependencies and conditional logic are expressed directly in code rather than GUI configuration or static YAML files.
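The `>>` dependency syntax used in Airflow DAG files is plain Python operator overloading. A minimal sketch of the mechanism, using illustrative stand-in classes rather than Airflow's actual operator implementation:

```python
# Sketch of how ">>" can express task dependencies in code.
# These classes are illustrative, not Airflow's own.
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # "a >> b" records b as a downstream dependency of a
        self.downstream.append(other)
        # returning the right-hand task allows chaining: a >> b >> c
        return other

extract = Task("extract")
transform = Task("transform")
load = Task("load")

extract >> transform >> load
```

Because dependencies are ordinary Python expressions, they can be built in loops or behind conditionals, which is what makes programmatic task generation possible.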
Pluggable Executor Backends
A single workflow codebase runs across execution environments, from local processes to Celery workers or Kubernetes clusters. Executors can be switched without rewriting workflows, scaling from a development laptop to a distributed production system.
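The executor is a deployment setting rather than a code change; for example, it can be set in `airflow.cfg` (or via the `AIRFLOW__CORE__EXECUTOR` environment variable) along these lines:

```ini
[core]
# Local development: run tasks as subprocesses on one machine
executor = LocalExecutor

# Production alternatives (set one instead):
# executor = CeleryExecutor
# executor = KubernetesExecutor
```

The DAG files themselves do not reference the executor, which is what lets the same workflows move between environments unchanged.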
Provider Package Ecosystem
Extends core functionality through provider packages for AWS, GCP, Azure, databases, and third-party services. Install only the integrations needed, keeping the base installation lean.
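Providers are distributed as separate PyPI packages following the `apache-airflow-providers-*` naming convention, so a deployment installs only what it uses. A typical selective setup might look like:

```shell
# Core Airflow plus only the integrations this deployment needs
pip install apache-airflow
pip install apache-airflow-providers-amazon    # AWS operators and hooks
pip install apache-airflow-providers-postgres  # Postgres hook and operators
```

Each provider package registers its operators, hooks, and connection types with the core installation on import.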
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime
with DAG(
    dag_id='data_pipeline',
    start_date=datetime(2024, 1, 1),
    schedule='@daily'
) as dag:
    extract = BashOperator(task_id='extract', bash_command='echo "Extracting data"')
    transform = BashOperator(task_id='transform', bash_command='echo "Transforming data"')

    extract >> transform
Bug fix release addressing DAG processing, UI functionality, API permissions, and scheduler stability issues.
- Fix JWT token generation with unset issuer/audience config
- Fix callback files losing priority during queue resort
- Fix Dag callback for versioned bundles in the processor
- Add 404 handling for non-existent Dag
- Add guardrail to handle Dag deserialization errors in scheduler
Bug fix release adding HITL task authorization, masking proxy configurations, and resolving database issues.
- Protect against hanging thread in aiosqlite 0.22+
- Fix log task instance sqlalchemy join query
- Fix invalid URI created when extras contain non-string elements
- Fix operator template fields via callable serialization causing unstable DAG serialization
- Fix real-time extra links updates for TriggerDagRunOperator
Improves CLI consistency with positional commands and adds team support for connections and variables.
- Make pause/unpause commands positional for improved CLI consistency
- Remove deprecated export functionality from airflowctl
- Add team_name to connection commands
- Add team_id to variable commands
- Add pre-commit checks for airflowctl test coverage
Related Repositories
Discover similar tools and frameworks used by developers
COVID-19 Data
Archived NYT dataset of coronavirus cases and deaths across U.S. counties and states (2020-2023).
Flyway
Version-controlled SQL migrations with automated execution tracking.
Neo4j
Open-source graph database storing data as nodes and relationships with Cypher query language.
PostHog
Event tracking, analytics, and experimentation platform.
Fiona
Python library for reading and writing geographic data files like GeoPackage and Shapefile.