
Apache Airflow: Workflow orchestration and scheduling platform

Python platform for DAG-based task orchestration and scheduling.

LIVE RANKINGS • 10:20 AM • STEADY
Overall rank: #163 • Data Engineering rank: #6
Stars: 44.4K (+140 in the last 7 days) • Forks: 16.6K (+65 in the last 7 days)

Learn more about Apache Airflow

Apache Airflow is a workflow orchestration platform written in Python that manages complex data pipelines and task dependencies. It represents workflows as DAGs, where each node is a task and edges define dependencies between tasks. The platform includes a scheduler that triggers task execution based on defined intervals or external events, a web UI for monitoring and management, and an executor system that can run tasks locally or distribute them across multiple workers. Airflow is commonly deployed in data engineering, ETL/ELT operations, machine learning pipelines, and general automation scenarios where task coordination and observability are required.
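To make the DAG model concrete, here is a small illustrative sketch, in plain Python rather than Airflow's actual scheduler code, of how dependency edges determine a valid execution order. The task names and the `deps` mapping are invented for the example; Airflow's real scheduler also handles timing, retries, and state.

```python
from graphlib import TopologicalSorter

# Invented example DAG: each task maps to the set of tasks it depends on.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A topological sort yields an execution order that respects every edge,
# which is the core ordering guarantee a DAG scheduler provides.
order = list(TopologicalSorter(deps).static_order())
print(order)  # → ['extract', 'transform', 'load', 'report']
```

Because each task here depends on exactly the previous one, the valid order is unique; with branching dependencies, any order that respects the edges would be acceptable.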

Key features of Apache Airflow

1. Python-Based DAG Definition

Workflows are Python code, enabling version control, unit testing, and programmatic task generation. Dependencies and conditional logic are expressed directly in code rather than GUI configuration or static YAML files.

2. Pluggable Executor Backends

Single codebase deploys across execution environments from local processes to Kubernetes clusters or Celery workers. Switch executors without rewriting workflows, scaling from development laptops to distributed production systems.
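As a sketch of how this selection works: the executor is chosen in Airflow's configuration (or via the equivalent AIRFLOW__CORE__EXECUTOR environment variable), not in the DAG files themselves. The values below are standard executor names; the surrounding comments are illustrative.

```ini
; airflow.cfg — executor selection; the workflows themselves do not change.
[core]
; LocalExecutor runs tasks in subprocesses on one machine; swapping in
; CeleryExecutor or KubernetesExecutor distributes the same DAGs.
executor = LocalExecutor
```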

3. Provider Package Ecosystem

Extends core functionality through provider packages for AWS, GCP, Azure, databases, and third-party services. Install only the integrations needed, keeping the base installation lean.
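A hedged install sketch: provider packages follow the apache-airflow-providers-&lt;name&gt; naming convention on PyPI, so a deployment pulls in only the integrations it uses. The two providers shown are examples, not requirements.

```shell
# Core Airflow first, then only the providers this deployment needs.
pip install apache-airflow
pip install apache-airflow-providers-amazon apache-airflow-providers-postgres
```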


from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

# A minimal two-task pipeline that runs once per day starting 2024-01-01.
with DAG(
    dag_id='data_pipeline',
    start_date=datetime(2024, 1, 1),
    schedule='@daily'
) as dag:
    extract = BashOperator(task_id='extract', bash_command='echo "Extracting data"')
    transform = BashOperator(task_id='transform', bash_command='echo "Transforming data"')

    # The >> operator declares a dependency: transform runs only after extract succeeds.
    extract >> transform
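A DAG like this can be exercised without a running scheduler. Assuming Airflow is installed and the file is on the configured DAGs folder path, the `airflow dags test` command performs a single local run for a given logical date:

```shell
# Execute one run of the DAG locally, without the scheduler or workers.
airflow dags test data_pipeline 2024-01-01
```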


v3.1.7

Bug fix release addressing DAG processing, UI functionality, API permissions, and scheduler stability issues.

  • Fix JWT token generation with unset issuer/audience config
  • Fix callback files losing priority during queue resort
  • Fix Dag callback for versioned bundles in the processor
  • Add 404 handling for non-existent Dag
  • Add guardrail to handle Dag deserialization errors in scheduler
v3.1.6

Bug fix release adding HITL (human-in-the-loop) task authorization and masking for proxy configurations, and resolving database issues.

  • Protect against hanging thread in aiosqlite 0.22+
  • Fix log task instance sqlalchemy join query
  • Fix invalid uri created when extras contains non string elements
  • Fix operator template fields via callable serialization causing unstable DAG serialization
  • Fix real-time extra links updates for TriggerDagRunOperator
airflow-ctl v0.1.1

Improves CLI consistency with positional commands and adds team support for connections and variables.

  • Make pause/unpause commands positional for improved CLI consistency
  • Remove deprecated export functionality from airflowctl
  • Add team_name to connection commands
  • Add team_id to variable commands
  • Add pre-commit checks for airflowctl test coverage
