Luigi: Python batch job pipeline orchestration

Build complex batch pipelines with dependency management.

Overall rank: #361 · Data Engineering rank: #13 · Stars: 18.7K (+14 in 7 days) · Forks: 2.5K (+1 in 7 days)

Learn more about Luigi

Luigi is a Python framework for building and orchestrating complex batch job pipelines, with built-in dependency management and scheduling. Tasks are defined as Python classes that declare their dependencies, outputs, and execution logic, and the framework resolves these declarations into a directed acyclic graph to determine the correct execution order. A central scheduler coordinates task execution across workers, handling parallelization and failure recovery and skipping tasks whose outputs already exist. Luigi also provides a web-based visualization interface that displays the dependency graph and monitors pipeline execution status in real time. The framework is designed for long-running batch processes and is commonly integrated with data processing systems such as Hadoop, supporting both local execution and distributed computing environments.

1. Declarative Dependency Resolution

Tasks define inputs and outputs as Python objects, allowing Luigi to automatically compute execution order and determine which tasks need to run. This removes the need for manual workflow orchestration and prevents redundant task execution.
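
The idea can be illustrated with a minimal standard-library sketch. It is not Luigi's implementation: `SketchTask`, `Extract`, `Sort`, and `run_with_dependencies` are hypothetical names invented for this example, and only the method names `requires`, `output`, `complete`, and `run` mirror Luigi's task interface. The sketch assumes a task counts as complete when its output file exists.

```python
import os
import tempfile


class SketchTask:
    """Hypothetical stand-in for a task class; this is not luigi.Task."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Upstream tasks that must be complete before this one runs.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def complete(self):
        # Assumption for this sketch: complete means the output file exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


class Extract(SketchTask):
    def output(self):
        return os.path.join(self.workdir, "raw.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("3\n1\n2\n")


class Sort(SketchTask):
    def requires(self):
        return [Extract(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "sorted.txt")

    def run(self):
        # Read the upstream output, sort its lines, write the result.
        with open(self.requires()[0].output()) as f:
            lines = sorted(f.read().split())
        with open(self.output(), "w") as f:
            f.write("\n".join(lines) + "\n")


def run_with_dependencies(task, executed):
    """Depth-first walk: run upstream tasks first, skip completed tasks."""
    if task.complete():
        return
    for upstream in task.requires():
        run_with_dependencies(upstream, executed)
    task.run()
    executed.append(type(task).__name__)


workdir = tempfile.mkdtemp()

first_pass = []
run_with_dependencies(Sort(workdir), first_pass)

second_pass = []
run_with_dependencies(Sort(workdir), second_pass)

print(first_pass)   # ['Extract', 'Sort']
print(second_pass)  # []
```

The second pass runs nothing because both outputs already exist, which is the behavior the paragraph above describes as preventing redundant execution.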

2. Atomic File Operations

File system abstractions for local and HDFS storage ensure atomic writes that either complete fully or leave no output at all. This prevents pipelines from entering corrupted states when a failure occurs mid-task and avoids manual cleanup of partial files.
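
A common way to get this all-or-nothing behavior on a local file system is to write to a temporary file and rename it into place only on success. The sketch below shows that general technique using only the standard library; `atomic_write` is a hypothetical helper written for this example, not a Luigi function, and it is not a copy of Luigi's own code.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path):
    """Hypothetical helper: write to a temp file, then rename on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            yield f
        # Reached only if the caller's block finished without an exception.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, discard the partial file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "output.txt")

# A task that fails halfway through leaves no output behind.
try:
    with atomic_write(target) as f:
        f.write("partial")
        raise RuntimeError("simulated mid-task failure")
except RuntimeError:
    pass
failed_left_output = os.path.exists(target)

# A task that succeeds produces the complete file.
with atomic_write(target) as f:
    f.write("Processing complete")
with open(target) as f:
    final_content = f.read()

print(failed_left_output)  # False
print(final_content)       # Processing complete
```

Because the target path appears only after a successful write, a completeness check based on the file's existence never sees a half-written output.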

3. Integrated Hadoop Ecosystem

Built-in templates for MapReduce, Hive, Pig, and Spark jobs with native HDFS support. Run Hadoop workflows without external orchestration layers or custom integration code.


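import luigi

class ProcessData(luigi.Task):
    # The target whose existence marks this task as complete.
    def output(self):
        return luigi.LocalTarget('output.txt')

    # Write the result through the target rather than a raw file path.
    def run(self):
        with self.output().open('w') as f:
            f.write('Processing complete')

# Run the task in-process with the local scheduler (no central scheduler needed).
if __name__ == '__main__':
    luigi.build([ProcessData()], local_scheduler=True)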

v3.7.2

Luigi v3.7.2 focuses on improving type annotations and fixing MyPy compatibility issues for Parameter classes.

  • refactor(parameter): Improve type stubs and standardize argument names
  • fix(mypy): handle Parameter subclasses without 'default'

v3.7.1

This patch release fixes the formatting of the built README RST file.

  • fix: correct built readme rst format

v3.7.0

Luigi v3.7.0 introduces enhanced type hinting support, improved Prometheus metrics control, and better mypy integration for parameter classes.

  • Fix Task.worker_timeout type annotation.
  • Allow more control over Prometheus metrics collection
  • Fixes: Add support for custom parameter classes in mypy plugin

