Paperless-ngx: Document management system with OCR
Self-hosted OCR document archive with ML classification.
Learn more about paperless-ngx
Paperless-ngx is a self-hosted document management system built with a Django backend and an Angular frontend. It runs scanned documents and PDFs through an OCR pipeline to extract text and metadata, then stores the indexed documents in a searchable database. The system supports document classification through machine learning models, custom tagging schemes, and metadata assignment. It is typically deployed using Docker Compose and can be integrated with document scanners for automated ingestion workflows.
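One common way to wire a scanner into ingestion is to have it save files into a folder that the Paperless-ngx consumer watches. The sketch below is a minimal illustration of that idea, not the project's own tooling: the function name drop_into_consume_dir and both paths are hypothetical, and it assumes the consume folder is already configured and reachable from the machine running the script.

```python
import shutil
from pathlib import Path


def drop_into_consume_dir(scan_path: str, consume_dir: str) -> Path:
    """Copy a scanned file into the folder the consumer is assumed to watch."""
    source = Path(scan_path)
    target_dir = Path(consume_dir)
    if not source.is_file():
        raise FileNotFoundError(f"No such scan: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"Consume folder missing: {target_dir}")
    # copy2 keeps the file's timestamps alongside its contents.
    return Path(shutil.copy2(source, target_dir / source.name))
```

A scanner hook could call drop_into_consume_dir("/scans/invoice.pdf", "/srv/paperless/consume"), where both paths are placeholders for whatever the local setup uses.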
Full-Text OCR Search
Documents are processed through OCR pipelines to extract and index all text content. This enables searching across document bodies, not just filenames or manually entered metadata, so even large archives can be queried by their contents.
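As a rough sketch of what querying the index might look like from a script, the example below builds a full-text search request against the documents endpoint using only the standard library. It assumes a local instance at http://localhost:8000, token authentication, a query parameter for full-text search, and a paginated JSON response with matches under "results"; the helper names build_search_request and search_documents are illustrative and not part of the project.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build a GET request for a full-text search over the document index."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def search_documents(base_url: str, token: str, query: str) -> list:
    """Run the search and return the 'results' list from the JSON response."""
    request = build_search_request(base_url, token, query)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumes a paginated response with matches under "results".
    return payload.get("results", [])
```

Calling search_documents("http://localhost:8000", "YOUR_API_TOKEN", "invoice 2024") would then return the matching document records, provided the instance and token are valid.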
Docker-First Deployment
Ships with Docker Compose configurations for self-hosted deployment with minimal setup. Includes automated migration paths from legacy Paperless versions and supports multiple installation methods.
Community-Maintained Fork
Actively developed by an open community after the original project ended maintenance. Regular releases incorporate user contributions, security patches, and feature requests through collaborative governance.
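import requests

# Assumes a local instance on port 8000 and a valid API token.
url = "http://localhost:8000/api/documents/post_document/"
headers = {"Authorization": "Token YOUR_API_TOKEN"}

# Close the file handle once the upload finishes.
with open("invoice.pdf", "rb") as f:
    response = requests.post(url, files={"document": f}, headers=headers)
response.raise_for_status()
# The endpoint is expected to return the ID of the queued consumption task.
print(f"Upload queued, task ID: {response.json()}")
Patch release fixing custom field query propagation and change detection; no breaking changes or new requirements.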
- Apply this release to resolve issues with custom field queries not propagating correctly in the UI.
- Docker base image bumps uv from 0.9.4 to 0.9.7; no action required unless pinning specific versions.
Maintenance release fixing workflow email attachments, UI rendering bugs, and improving logs/tag-list performance.
- Workflow emails now attach the original file instead of processed versions when triggered.
- Virtual scrolling re-enabled for large lists; Firefox print dialog errors and tag filter duplicates resolved.
Patch release fixing migration warnings from 2.19.2, email attachment handling, and UI bugs in tags and workflows.
- Upgrade to resolve the migration warning introduced in 2.19.2 and fix email endpoint permission checks.
- Django bumped to 5.2.7; workflow runs now use a soft-delete model for better data retention.