Luigi: Python batch job pipeline orchestration
Build complex batch pipelines with dependency management.
Learn more about Luigi
Luigi is a Python framework for building and orchestrating complex batch job pipelines, with built-in dependency management and scheduling. Tasks are defined as Python classes that declare their dependencies, outputs, and execution logic; the framework resolves these into a directed acyclic graph to determine the correct execution order. A central scheduler coordinates task execution across workers, handling parallelization and failure recovery and skipping tasks whose outputs already exist. Luigi also provides a web-based visualization interface that displays the dependency graph and shows pipeline execution status in real time. The framework is designed for long-running batch processes and is commonly integrated with data processing systems such as Hadoop, supporting both local execution and distributed computing environments.
Declarative Dependency Resolution
Tasks declare their dependencies and outputs as Python objects, so Luigi can compute the execution order and determine which tasks still need to run. This removes manual workflow orchestration and prevents redundant task execution.
Atomic File Operations
File system abstractions for local and HDFS storage provide atomic writes: an output either appears complete or does not appear at all. This keeps pipelines from entering a corrupted state when a task fails midway and avoids manual cleanup.
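One common way to get all-or-nothing writes is to write to a temporary file and rename it into place only on success. The sketch below illustrates that idea with the standard library alone; atomic_write is a hypothetical helper written for this example and is not Luigi's API or implementation.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path):
    """Hypothetical helper: write to a temp file, rename into place on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            yield f
        # Reached only if the body raised nothing: publish the finished file.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


work_dir = tempfile.mkdtemp()
good_path = os.path.join(work_dir, "good.txt")
bad_path = os.path.join(work_dir, "bad.txt")

# Successful write: the file appears with its full content.
with atomic_write(good_path) as f:
    f.write("Processing complete")

# Simulated mid-task failure: no file is left at the destination.
try:
    with atomic_write(bad_path) as f:
        f.write("partial")
        raise RuntimeError("simulated mid-task failure")
except RuntimeError:
    pass
```

Because the destination path only ever holds a finished file, a later check for "does the output exist?" cannot be fooled by a half-written result.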
Integrated Hadoop Ecosystem
Built-in templates for MapReduce, Hive, Pig, and Spark jobs with native HDFS support. Run Hadoop workflows without external orchestration layers or custom integration code.
import luigi

class ProcessData(luigi.Task):
    def output(self):
        return luigi.LocalTarget('output.txt')

    def run(self):
        with self.output().open('w') as f:
            f.write('Processing complete')

luigi.build([ProcessData()], local_scheduler=True)

Luigi v3.7.2 focuses on improving type annotations and fixing MyPy compatibility issues for Parameter classes.
- refactor(parameter): Improve type stubs and standardize argument names
- fix(mypy): handle Parameter subclasses without 'default'
This patch release fixes the formatting of the built README RST file.
- fix: correct built readme rst format
Luigi v3.7.0 introduces enhanced type hinting support, improved Prometheus metrics control, and better mypy integration for parameter classes.
- Fix Task.worker_timeout type annotation.
- Allow more control over Prometheus metrics collection
- Fixes: Add support for custom parameter classes in mypy plugin