Paperless-ngx: Document management system with OCR
Self-hosted OCR document archive with ML classification.
Paperless-ngx is a self-hosted document management system built with a Django backend and an Angular frontend. It processes scanned documents and PDFs through OCR pipelines to extract text and metadata, and it stores the indexed documents in a searchable database. The system supports document classification through machine learning models, custom tagging schemes, and metadata assignment. It is typically deployed with Docker Compose and can be integrated with document scanners for automated ingestion workflows.
Full-Text OCR Search
Documents are processed through OCR pipelines that extract and index all of their text. Searches therefore cover document bodies, not just filenames or manually entered metadata, so large archives can be queried by what the documents actually say.
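A minimal sketch of a full-text query over the REST API, assuming a local instance at `http://localhost:8000` and a placeholder token. Per the Paperless-ngx API documentation, the `query` parameter on `/api/documents/` runs a full-text search. The helper names below are hypothetical, and the standard library's `urllib` is used so the sketch has no third-party dependencies.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): adjust host, port, and token for your instance.
BASE_URL = "http://localhost:8000"
API_TOKEN = "YOUR_API_TOKEN"


def build_search_url(base_url: str, query: str) -> str:
    """Build a documents-endpoint URL carrying a full-text ``query`` parameter."""
    return f"{base_url}/api/documents/?" + urllib.parse.urlencode({"query": query})


def extract_hits(payload: dict) -> list[tuple[int, str]]:
    """Reduce a paginated API response to (id, title) pairs from its "results" list."""
    return [(doc["id"], doc["title"]) for doc in payload.get("results", [])]


def search_documents(
    query: str, base_url: str = BASE_URL, token: str = API_TOKEN
) -> list[tuple[int, str]]:
    """Query a running instance and return the first page of matches."""
    request = urllib.request.Request(
        build_search_url(base_url, query),
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_hits(json.load(response))
```

Against a running instance, `search_documents("invoice 2023")` would return `(id, title)` pairs for the first page of matches. The response is paginated, so further pages are reached through its `next` link.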
Docker-First Deployment
Ships with Docker Compose configurations for self-hosted deployment with minimal setup. Includes automated migration paths from legacy Paperless versions and supports multiple installation methods.
Community-Maintained Fork
Actively developed by an open community after its predecessor, Paperless-ng, ended maintenance. Regular releases incorporate user contributions, security patches, and feature requests through collaborative governance.
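```python
import requests

url = "http://localhost:8000/api/documents/post_document/"  # placeholder host
headers = {"Authorization": "Token YOUR_API_TOKEN"}  # placeholder token

# The context manager closes the file handle after the upload
with open("invoice.pdf", "rb") as pdf:
    response = requests.post(url, files={"document": pdf}, headers=headers, timeout=60)
response.raise_for_status()

# Consumption is asynchronous: the endpoint returns a task UUID, not a document id
print(f"Consumption task started: {response.json()}")
```

Security fixes and improvements for document management, tagging performance, and date calculations.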
- Fix: extract all ids for nested tags
- Fix: prevent note deletion outside doc
- Performance: reduce treenode inefficiencies
- Fix: change date calculation for 'this year' to include future documents
- Fix: running management scripts under rootless could fail
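Nested tags form a tree through parent links. Reading the nested-tag fix as "a tag should match together with every tag nested under it", collecting those ids amounts to a breadth-first walk. This is a hypothetical sketch: `collect_tag_ids` and the `parent_of` mapping are illustrative assumptions, not Paperless-ngx's implementation.

```python
from collections import deque
from typing import Optional


def collect_tag_ids(root_id: int, parent_of: dict[int, Optional[int]]) -> set[int]:
    """Return root_id plus the ids of every tag nested beneath it, at any depth."""
    # Invert the child -> parent links into a parent -> children index.
    children: dict[int, list[int]] = {}
    for tag_id, parent_id in parent_of.items():
        if parent_id is not None:
            children.setdefault(parent_id, []).append(tag_id)

    # Breadth-first walk from the root; "found" doubles as the visited set.
    found = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in found:  # also guards against cycles in malformed data
                found.add(child)
                queue.append(child)
    return found
```

For example, with `parent_of = {1: None, 2: 1, 3: 2}`, `collect_tag_ids(1, parent_of)` returns `{1, 2, 3}`. A filter that only looked at the direct children of tag 1 would miss tag 3.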
Fixes UI display issues with tag names and workflow action ordering.
- Fix: ensure horizontal scroll for long tag names in list, wrap tags without parent
- Fix: use explicit order field for workflow actions
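Sorting on an explicit order field makes action sequencing deterministic, whereas relying on the order in which rows happen to be returned does not. In this hypothetical sketch, the `WorkflowAction` dataclass and its fields are illustrative rather than Paperless-ngx's model, and the id tie-breaker is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkflowAction:
    # Hypothetical minimal model: database id, explicit position, action type.
    id: int
    order: int
    kind: str


def ordered_actions(actions: list[WorkflowAction]) -> list[WorkflowAction]:
    """Sort by the explicit order field; break ties by id so the result is stable."""
    return sorted(actions, key=lambda a: (a.order, a.id))
```

Two actions that share an `order` value still come out in a fixed sequence because of the id tie-breaker. Without it, their relative order would depend on the input list.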
Security fixes and improvements for metadata handling, database validation, and workflow functionality.
- Fix: propagate metadata override created value
- Fix: support ordering by storage path name
- Fix: validate custom field integer values within PostgreSQL range
- Fix: add error handling and retry when opening index
- Fix: make recurring workflow respect latest run time
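The custom field ("cf") integer fix can be illustrated under one assumption: that integer values are stored in a 4-byte PostgreSQL `integer` column, which accepts -2147483648 to 2147483647. A validator would then reject out-of-range values before they reach the database, so the insert does not fail there. `validate_integer_field` is a hypothetical name, not Paperless-ngx's code.

```python
# PostgreSQL's 4-byte ``integer`` type accepts -2147483648 to +2147483647.
PG_INT_MIN = -(2**31)
PG_INT_MAX = 2**31 - 1


def validate_integer_field(value: object) -> int:
    """Return value if it fits a PostgreSQL integer column, else raise ValueError."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"expected an integer, got {value!r}")
    if not PG_INT_MIN <= value <= PG_INT_MAX:
        raise ValueError(
            f"{value} is outside the PostgreSQL integer range [{PG_INT_MIN}, {PG_INT_MAX}]"
        )
    return value
```

Validating in application code gives users a clear error message. Without it, the same bad value would reach PostgreSQL and fail with a generic "integer out of range" error.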
Related Repositories
Discover similar tools and frameworks used by developers
LeRobot
PyTorch library for robot imitation learning and sim-to-real transfer.
Kimi-K2
Trillion-parameter MoE model with Muon-optimized training.
MediaPipe
Graph-based framework for streaming media ML pipelines.
OpenAI Python
Type-safe Python client for OpenAI's REST API.
LLaMA-Factory
Parameter-efficient fine-tuning framework for 100+ LLMs.