
Luigi: Python batch job pipeline orchestration

Build complex batch pipelines with dependency management.

Rankings: #257 overall, #20 in Data Engineering
Stars: 18.6K (+2 over 7 days)
Forks: 2.4K (-1 over 7 days)

Learn more about Luigi

Luigi is a Python framework for building and orchestrating complex batch pipelines, with built-in dependency management and scheduling. Tasks are defined as Python classes that declare their dependencies, outputs, and execution logic; Luigi resolves these declarations into a directed acyclic graph to determine the execution order. A central scheduler coordinates task execution across workers, handling parallelization and failure recovery and skipping tasks whose outputs already exist. A web-based interface visualizes the dependency graph and shows pipeline status in real time. Luigi is designed for long-running batch processes, is commonly integrated with data processing systems such as Hadoop, and supports both local execution and distributed computing environments.


1. Declarative Dependency Resolution

Tasks declare their inputs and outputs as Python objects, so Luigi can compute the execution order and determine which tasks still need to run. This removes the need for hand-written orchestration and avoids re-running tasks whose outputs already exist.
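The idea can be illustrated with a toy resolver in plain Python. This is only a sketch of the general technique (depth-first traversal that skips completed work), not Luigi's actual scheduler; `ToyTask` and `resolve_and_run` are hypothetical names invented for this illustration.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a pipeline task (not Luigi's Task class)."""

    def __init__(self, name, out_dir, deps=()):
        self.name = name
        self.path = os.path.join(out_dir, name + ".txt")
        self.deps = list(deps)

    def complete(self):
        # A task counts as done when its output file already exists.
        return os.path.exists(self.path)

    def run(self):
        with open(self.path, "w") as f:
            f.write(self.name)


def resolve_and_run(task, executed=None):
    """Run dependencies depth-first, skipping tasks whose output exists.

    Returns the names of the tasks that were actually executed, in order.
    """
    if executed is None:
        executed = []
    if task.complete():
        return executed
    for dep in task.deps:
        resolve_and_run(dep, executed)
    task.run()
    executed.append(task.name)
    return executed
```

Calling `resolve_and_run` on the last task of a chain runs every missing upstream task first; calling it a second time runs nothing, because every output is already present.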

2. Atomic File Operations

File system abstractions for local and HDFS storage provide atomic writes: an output either appears in full or not at all. A failure mid-task therefore does not leave a half-written output behind, which keeps the pipeline out of corrupted states and reduces manual cleanup.
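A common way to get this all-or-nothing behavior on a local file system is to write to a temporary file and rename it into place once the write succeeds. The sketch below shows that general pattern using only the standard library; it is an illustration, not Luigi's own code, and `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text to path so that path is either fully written or absent."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final rename
    # stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the partial temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write raises partway through, the destination path is never created, so a later completeness check based on "does the output exist?" is not fooled by a partial file.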

3. Integrated Hadoop Ecosystem

Luigi ships with built-in task templates for MapReduce, Hive, Pig, and Spark jobs, along with native HDFS support, so Hadoop workflows can run without an external orchestration layer or custom integration code.


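import luigi


class ProcessData(luigi.Task):
    """Minimal task with no dependencies that writes a single output file."""

    def output(self):
        # Luigi treats the task as complete once this target exists.
        return luigi.LocalTarget('output.txt')

    def run(self):
        # Writing through the target's file handle keeps the write atomic.
        with self.output().open('w') as f:
            f.write('Processing complete')


if __name__ == '__main__':
    # Run with the in-process scheduler; no central scheduler daemon is needed.
    luigi.build([ProcessData()], local_scheduler=True)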

v3.6.0

Drops Python 3.5 and 3.6 support; fixes multiple security issues including sensitive logging, file permissions, and tarfile extraction vulnerabilities.

  • Upgrade to Python 3.7+ before deploying; Python 3.5 and 3.6 are no longer supported.
  • Review pai.py, lock.py, lsf.py, and runner modules for patched security flaws affecting credentials and file handling.
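
Before deploying, a deployment script could guard against an unsupported interpreter. The following is a minimal sketch that takes the 3.7 minimum from the note above; `MIN_PYTHON` and `python_is_supported` are hypothetical names, not part of Luigi.

```python
import sys

# Minimum interpreter version stated in the v3.6.0 notes.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=None):
    """Return True if the given (or current) interpreter meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair.
    return tuple(version_info[:2]) >= MIN_PYTHON
```

A script might call `python_is_supported()` at startup and exit with a clear message when it returns `False`.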

v3.5.2

Maintenance release updating Azure Blob Storage dependency to 12.x series and fixing batch email configuration documentation.

  • Upgrade azure.storage.blob to 12.x.y if using luigi.contrib.azureblob; verify compatibility with your Azure storage code.
  • Review batch email configuration docs for corrections; release notes do not specify other breaking changes or security fixes.
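
One way to confirm which major version of the Azure Blob Storage package is installed is to read the package metadata. This is a sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_major_version` is a hypothetical helper, and `azure-storage-blob` is the PyPI distribution name for `azure.storage.blob`.

```python
from importlib import metadata


def installed_major_version(dist_name):
    """Return the installed major version of a distribution, or None."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
    # Take the leading component of a version string such as "12.19.0".
    return int(version.split(".")[0])
```

For instance, `installed_major_version("azure-storage-blob")` would be expected to return `12` in an environment that meets the requirement above, and `None` where the package is absent.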

v3.5.1

Maintenance release adding Python 3.12 support and fixing parameter handling, error messages, and SVG visualization bugs.

  • Upgrade to Python 3.12 if needed; this release officially supports it alongside existing versions.
  • Review TupleParameter usage; str-to-tuple conversion bug is fixed, and optional parameter execution summaries now display correctly.

