
Docling: Document parsing for generative AI

Fast document parser for RAG and AI workflows.

Rankings: #32 overall, #23 in AI & ML
Stars: 58.3K (+1.8K in the last 7 days)
Forks: 4.0K (+158 in the last 7 days)

Tags:

AI & ML

Learn more about Docling

Docling is a Python library for parsing and converting documents across multiple formats into structured representations suitable for AI applications. It uses layout analysis models and OCR to extract content from PDFs, scanned documents, and other file types, producing a unified DoclingDocument format. The tool supports local execution for sensitive data processing and includes integrations with frameworks like LangChain, LlamaIndex, and Haystack. Common deployment contexts include document preprocessing pipelines for retrieval-augmented generation, knowledge extraction workflows, and document conversion services.

Multi-Format Processing Pipeline

Handles PDF, DOCX, PPTX, XLSX, HTML, audio, and image formats through a single interface. Format-specific handling, such as PDF layout analysis and OCR for scanned documents, is built in, so there is no need to switch tools per format.
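As a rough illustration of the single-interface idea, the sketch below runs one shared converter over a mixed list of files. It uses only the `DocumentConverter.convert` and `export_to_markdown` calls shown in the example on this page. The `CANDIDATE_SUFFIXES` set and the helper names `select_inputs` and `convert_mixed` are hypothetical, chosen for illustration; the formats the library actually accepts may differ.

```python
from pathlib import Path

# Assumption: suffixes picked for illustration from the formats named above;
# this is not the library's authoritative list of supported formats.
CANDIDATE_SUFFIXES = {".pdf", ".docx", ".pptx", ".xlsx", ".html", ".png", ".jpg"}


def select_inputs(paths):
    """Keep only paths whose suffix is in CANDIDATE_SUFFIXES (case-insensitive)."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in CANDIDATE_SUFFIXES]


def convert_mixed(paths):
    """Convert each selected file with one shared converter.

    Returns a dict mapping file name to its Markdown export.
    """
    # Imported lazily so select_inputs() works even without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    results = {}
    for path in select_inputs(paths):
        result = converter.convert(str(path))
        results[path.name] = result.document.export_to_markdown()
    return results
```

Calling `convert_mixed(["report.pdf", "slides.pptx"])` would, under these assumptions, return one Markdown string per file without any per-format branching in the caller.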

Unified Document Structure

Converts all input formats into a consistent DoclingDocument representation with standardized extraction APIs. Exports to Markdown, HTML, JSON, or DocTags without format-specific parsing logic.
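A minimal sketch of what "export without format-specific logic" could look like: one helper that writes several renderings of an already-converted document. The helper name `export_all` is hypothetical. `export_to_markdown` appears in the example on this page; `export_to_html` and `export_to_dict` are assumed method names on the document object and should be checked against the installed version. DocTags export is left out because its method name is not confirmed here.

```python
import json
from pathlib import Path


def export_all(document, out_dir, stem="document"):
    """Write Markdown, HTML, and JSON renderings of `document` into out_dir.

    `document` is any object exposing export_to_markdown(), export_to_html(),
    and export_to_dict(); the last two names are assumptions, not confirmed
    by this page.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    renderings = {
        "md": document.export_to_markdown(),
        "html": document.export_to_html(),
        "json": json.dumps(document.export_to_dict(), indent=2),
    }
    written = {}
    for ext, text in renderings.items():
        target = out / f"{stem}.{ext}"
        target.write_text(text, encoding="utf-8")
        written[ext] = target
    return written
```

Because the helper only depends on the document's export methods, the same call works regardless of whether the input was a PDF, a DOCX file, or an image.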

Local-First Architecture

Processes documents locally without sending data to external services. Sensitive documents stay on-premises throughout the parsing pipeline, which can help organizations in regulated industries meet compliance and privacy requirements.

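Example code snippets

from docling.document_converter import DocumentConverter

# Create a converter with default settings
converter = DocumentConverter()

# Convert a local file; the parsed document is available on result.document
result = converter.convert("document.pdf")

# Export to Markdown
markdown_text = result.document.export_to_markdown()
print(markdown_text)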
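For the retrieval-augmented generation use case mentioned earlier, the exported Markdown usually needs to be cut into smaller pieces before indexing. The following is a generic, dependency-free sketch that splits Markdown at heading lines. It is not Docling's own chunking API, and the function name `split_by_heading` is hypothetical. It also does not special-case `#` lines inside fenced code blocks.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_by_heading(markdown_text):
    """Split Markdown into (heading, body) chunks at heading lines.

    Text before the first heading is returned with an empty heading.
    """
    chunks = []
    heading, body = "", []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk if it has a heading or non-blank body.
            if heading or any(s.strip() for s in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if heading or any(s.strip() for s in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks
```

Passing the `markdown_text` value from the example above into `split_by_heading` would give a list of heading and body pairs that could then be embedded and stored by whichever retrieval framework is in use.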

Recent Changes

v2.72.0

Add chart extraction models and improve Excel table bounds detection

–Add chart extraction models
–backend: Improve Excel table bounds detection and flatten merged cells
–pptx: Handle picture shapes with external image references
–Add granite vision for charts

v2.71.0

Add WebVTT support, Word document comments extraction, and Ollama presets

–WebVTT and source tracker
–Add support for Word document comments extraction
–Allow newer typer versions
–rapidocr: Use new model links for RapidOCR
–Presets for Ollama

v2.70.0

Drop Python 3.9 support and improve PPTX parsing with comprehensive documentation updates

–Drop support for Python 3.9
–md: Handle pipe symbols that are not table markers
–Remove direct vllm dependency
–PPTX parsing: bullet points not grouped correctly under subheadings
–Add comprehensive docstrings to PdfPipelineOptions
