PaddleOCR: Optical character recognition and document parsing

Multilingual OCR toolkit with document structure extraction.

Overall rank: #67
AI & ML rank: #34
Stars: 71.3K
Forks: 9.9K
7-day stars: +406
7-day forks: +36

Tags: AI & ML

Learn more about PaddleOCR

PaddleOCR is an optical character recognition system implemented in Python using the PaddlePaddle deep learning framework. It combines text detection and recognition models to process document images end-to-end, extracting both raw text and structured layout information. The toolkit includes pre-trained models for multiple languages, handwriting detection, and document structure analysis (tables, forms, key-value pairs). Common deployment scenarios include document digitization pipelines, PDF extraction for RAG systems, and integration with language models for document understanding tasks.

Multi-Language Pre-Trained Models

Ships with production-ready models for 100+ languages including CJK, Arabic, and Latin scripts. Eliminates cold-start training and dataset collection for most deployment scenarios.
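
A minimal sketch of how language selection might look in practice. The `lang` argument is the same one used in this page's own example; `LANG_CODES`, `resolve_lang`, and `make_reader` are hypothetical names introduced here, and the codes listed are only a small subset assumed from PaddleOCR's documented language abbreviations.

```python
# Hypothetical lookup table: human-readable names -> PaddleOCR `lang` codes.
# Only a small subset is listed; consult the PaddleOCR docs for the full table.
LANG_CODES = {
    "english": "en",
    "chinese": "ch",
    "japanese": "japan",
    "korean": "korean",
    "arabic": "arabic",
    "french": "french",
    "german": "german",
}


def resolve_lang(name: str) -> str:
    """Return the language code for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANG_CODES:
        raise ValueError(f"no pre-trained model mapped for {name!r}")
    return LANG_CODES[key]


def make_reader(name: str):
    """Build an OCR engine for the given language.

    The import is deferred so the lookup above works without PaddleOCR installed.
    """
    from paddleocr import PaddleOCR  # requires `pip install paddleocr`

    return PaddleOCR(lang=resolve_lang(name))
```

Calling `make_reader("japanese")` would then load the Japanese pre-trained models on first use, with no training step.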

Modular Detection-Recognition Pipeline

Decouples text localization from character recognition into swappable components. Enables per-region model selection and independent optimization of detection versus recognition accuracy.
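
The decoupling described above can be sketched in plain Python. This is an illustration of the design idea, not PaddleOCR's internal API: `TextLine`, `run_pipeline`, `by_region`, and the stub detector and recognizers are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class TextLine:
    box: Box
    text: str
    confidence: float


# A detector maps an image to boxes; a recognizer maps (image, box) to (text, score).
Detector = Callable[[object], List[Box]]
Recognizer = Callable[[object, Box], Tuple[str, float]]


def run_pipeline(image, detect: Detector, recognize: Recognizer) -> List[TextLine]:
    """Run detection once, then recognition on each detected region."""
    lines = []
    for box in detect(image):
        text, score = recognize(image, box)
        lines.append(TextLine(box, text, score))
    return lines


def by_region(small: Recognizer, large: Recognizer, min_height: int = 25) -> Recognizer:
    """Choose a recognizer per region based on the box height."""
    def recognize(image, box: Box) -> Tuple[str, float]:
        height = box[3] - box[1]
        chosen = large if height >= min_height else small
        return chosen(image, box)
    return recognize


# Stub components standing in for real models.
def fake_detector(image) -> List[Box]:
    return [(0, 0, 100, 20), (0, 30, 100, 70)]


def small_text(image, box: Box) -> Tuple[str, float]:
    return ("small", 0.8)


def large_text(image, box: Box) -> Tuple[str, float]:
    return ("large", 0.9)


lines = run_pipeline(None, fake_detector, by_region(small_text, large_text))
```

Because the detector and recognizer only meet at the `Box` boundary, either side can be swapped or tuned without touching the other.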

Document Structure Extraction

Parses tables, forms, and key-value pairs beyond raw text output. Produces structured JSON suitable for direct ingestion into RAG pipelines or database workflows.
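
As a rough sketch of what ingestion-ready output could look like, the snippet below turns parsed blocks into one JSON record per line. The block schema (`type`, `text`, `pairs`, `cells`) is an assumption made for illustration, not PaddleOCR's actual output format, and the sample data is hand-written.

```python
import json
from typing import Dict, Iterator, List


def table_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and return one dict per data row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def blocks_to_jsonl(blocks: List[dict]) -> Iterator[str]:
    """Yield one JSON string per block, normalising tables into records."""
    for index, block in enumerate(blocks):
        record = {"id": index, "type": block["type"]}
        if block["type"] == "table":
            record["rows"] = table_to_records(block["cells"])
        elif block["type"] == "key_value":
            record["fields"] = dict(block["pairs"])
        else:
            record["text"] = block["text"]
        yield json.dumps(record, ensure_ascii=False)


# Hand-written sample input in the assumed schema, not real OCR output.
sample_blocks = [
    {"type": "text", "text": "Invoice"},
    {"type": "key_value", "pairs": [["Invoice No", "A-17"], ["Date", "2025-11-13"]]},
    {"type": "table", "cells": [["Item", "Qty"], ["Widget", "2"]]},
]
jsonl_lines = list(blocks_to_jsonl(sample_blocks))
```

Each line in `jsonl_lines` is a standalone JSON object, which suits chunk-per-record indexing in a RAG store or a bulk database insert.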

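Example code snippets

from paddleocr import PaddleOCR

# PaddleOCR 2.x-style call; newer 3.x releases favor predict() and may not accept cls=True.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # enable the text-angle classifier
result = ocr.ocr('invoice.jpg', cls=True)

# result holds one entry per input image; that entry can be None when no text is found.
for line in result[0] or []:
    text = line[1][0]        # recognized string
    confidence = line[1][1]  # recognition score between 0 and 1
    print(f"{text} (confidence: {confidence:.2f})")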
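
A possible follow-up step, sketched under the assumption that each page has the `[box, (text, confidence)]` shape indexed in the example above: drop low-confidence reads before passing text downstream. `OcrLine`, `parse_page`, and the 0.5 threshold are hypothetical choices, and the sample page is hand-written rather than real OCR output.

```python
from typing import List, NamedTuple, Optional


class OcrLine(NamedTuple):
    text: str
    confidence: float


def parse_page(page: Optional[list], min_confidence: float = 0.5) -> List[OcrLine]:
    """Flatten one page of [box, (text, confidence)] entries, dropping weak reads."""
    lines = []
    for _box, (text, confidence) in page or []:
        if confidence >= min_confidence:
            lines.append(OcrLine(text, confidence))
    return lines


# Hand-written sample in the same shape as result[0]; not real OCR output.
sample_page = [
    [[[10, 10], [120, 10], [120, 30], [10, 30]], ("Invoice #A-17", 0.98)],
    [[[10, 40], [90, 40], [90, 60], [10, 60]], ("T0tal", 0.41)],
]
kept = parse_page(sample_page)
```

Passing `None` (a page with no detected text) yields an empty list, so callers do not need a separate guard.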

Recent Changes

v3.4.0

PaddleOCR-VL-1.5 is a new iteration of the PaddleOCR-VL series. Based on comprehensive optimization of the core capabilities of version 1.0, ...

–Released the PaddleOCR-VL-1.5 complex document parsing solution.
–Added support for calling MLX-VLM inference services.
–PaddleOCR-VL now supports cross-page table merging and multi-level heading reconstruction.
–PP-StructureV3 adds support for the `format_block_content` and `markdown_ignore_labels` parameters.
–Fixed an issue where accessing the `/docs` endpoint in the official PaddleOCR-VL image would result in an error.

v3.3.3

–PaddleOCR-VL now supports specifying custom model names and API keys, and can seamlessly integrate with inference services from third-party platforms such as SiliconFlow and Novita AI.
–The PP-StructureV3 MCP Server supports using hosted services on the Qianfan platform as the underlying inference engine.
–The documentation for PP-OCRv5 and PaddleOCR-VL has been comprehensively improved, with known errors fixed to enhance readability and accuracy.
–Added support for inference on Muxi GPUs, further expanding hardware compatibility and deployment flexibility.

v3.3.2

–2025.11.13: v3.3.2 released. Full changelog: https://github.com/PaddlePaddle/PaddleOCR/compare/v3.3.1...v3.3.2
