Tesseract OCR: Open source optical character recognition engine
LSTM-based OCR engine supporting 100+ languages.
Learn more about Tesseract
Tesseract is an open-source optical character recognition engine that converts images containing text into machine-readable character data. The system employs a Long Short-Term Memory neural network architecture as its primary recognition engine, processing text line images through multiple layers that analyze character patterns and linguistic context to produce accurate transcriptions. It maintains a modular design that supports over 100 languages through trained data models, processes standard image formats, and generates structured output in multiple document formats including searchable PDFs and XML-based representations. The engine implements a multi-stage pipeline that performs image preprocessing, layout analysis to detect text regions, line segmentation, and finally character recognition through the neural network. Originally developed at Hewlett-Packard and later maintained by Google, it balances recognition accuracy with processing speed by leveraging both statistical language models and neural network predictions.
Dual Recognition Engines
Includes both an LSTM neural network engine and the legacy pattern recognition engine, switchable at runtime via the --oem flag. This keeps modern accuracy available while maintaining compatibility with older trained models and specialized use cases.
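As a minimal sketch of runtime engine switching, the snippet below maps a readable engine name to an --oem value and passes it through the config argument of pytesseract.image_to_string. The helper names oem_config and recognize are hypothetical, and the legacy modes assume the installed traineddata files actually contain legacy model data, which is not true of every distribution.

```python
# Engine modes as listed by `tesseract --help-oem`.
OEM_MODES = {
    "legacy": 0,        # legacy engine only
    "lstm": 1,          # neural net LSTM engine only
    "legacy+lstm": 2,   # both engines combined
    "default": 3,       # based on what is available
}


def oem_config(engine: str) -> str:
    """Return the command-line fragment that selects a recognition engine."""
    if engine not in OEM_MODES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(OEM_MODES)}"
        )
    return f"--oem {OEM_MODES[engine]}"


def recognize(image_path: str, engine: str = "default") -> str:
    """OCR one image with the chosen engine (hypothetical helper)."""
    # Imported lazily so oem_config stays usable without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=oem_config(engine))
```

Calling recognize('document.png', engine='lstm') would then run only the neural network engine, while engine='legacy' falls back to the older recognizer where its data is present.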
100+ Language Support
Recognizes text in over 100 languages out of the box using pre-trained data files. Custom language training is supported through a documented pipeline for specialized fonts, domains, or historical scripts.
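Language models are selected per call. The sketch below, which assumes the pytesseract wrapper and that the requested traineddata files are installed locally, joins language codes with '+' (the form Tesseract's -l option accepts) and checks them against pytesseract.get_languages before running OCR. The helper names lang_spec and recognize_multilingual are hypothetical.

```python
from typing import Iterable


def lang_spec(codes: Iterable[str]) -> str:
    """Join language codes into the 'eng+deu' form expected by Tesseract."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def recognize_multilingual(image_path: str, codes: Iterable[str]) -> str:
    """OCR an image with several language models at once (hypothetical helper)."""
    # Imported lazily so lang_spec works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    wanted = lang_spec(codes)
    installed = set(pytesseract.get_languages(config=""))
    missing = [code for code in wanted.split("+") if code not in installed]
    if missing:
        raise RuntimeError(f"traineddata not installed for: {', '.join(missing)}")

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=wanted)
```

For a page mixing English and German, recognize_multilingual('document.png', ['eng', 'deu']) would load both models, provided both data files are present.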
Multiple Output Formats
Generates plain text, hOCR with positioning data, searchable PDFs, TSV structured output, and PAGE and ALTO XML. Integrates directly into document processing workflows without format conversion layers.
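To illustrate how several of these formats can be produced from one image, here is a rough sketch using the pytesseract wrapper. It writes a searchable PDF, hOCR, and ALTO XML, and parses the TSV output into word records with bounding boxes. The helper names parse_tsv and export_all and the output file stem are hypothetical; ALTO output assumes a Tesseract build recent enough to support it, and PAGE XML is not covered here.

```python
import csv
import io
from typing import Dict, List


def parse_tsv(tsv_text: str) -> List[Dict]:
    """Turn Tesseract TSV output into word records with bounding boxes."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; other levels describe layout.
        if row["level"] != "5" or not text:
            continue
        words.append({
            "text": text,
            "conf": float(row["conf"]),
            "box": (int(row["left"]), int(row["top"]),
                    int(row["width"]), int(row["height"])),
        })
    return words


def export_all(image_path: str, stem: str) -> List[Dict]:
    """Write PDF, hOCR, and ALTO files next to each other; return parsed words."""
    # Imported lazily so parse_tsv works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        outputs = {
            "pdf": pytesseract.image_to_pdf_or_hocr(image, extension="pdf"),
            "hocr": pytesseract.image_to_pdf_or_hocr(image, extension="hocr"),
            "xml": pytesseract.image_to_alto_xml(image),
        }
        tsv_text = pytesseract.image_to_data(image)

    for extension, payload in outputs.items():
        with open(f"{stem}.{extension}", "wb") as handle:
            handle.write(payload)
    return parse_tsv(tsv_text)
```

Parsing the TSV with quoting disabled is a deliberate choice: recognized text can contain stray quote characters that would otherwise confuse the CSV reader.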
import pytesseract
from PIL import Image

# Load the scanned page and run OCR with the default engine and language
image = Image.open('document.png')
text = pytesseract.image_to_string(image)
print(text)

Code simplification and build improvements with CMake optimizations.
- Simplify code for osdetect
- Fix and improve configuration for CMake builds
- Modernize some for loops and fix some signed/unsigned issues
- CMake optimization with warp2
Fixed random number generator and improved template classes with CLI enhancements.
- Fix linear congruential random number generator
- Make list classes templated
- Add CLI -c parameter(s) to init vectors
- Handle colormaps correctly
- Use constexpr for kDawgMagicNumber
Fixed static linking issues and improved installation with better XML output handling.
- Fix TARGETPDBFILE error for static linking
- Make regular usage of CMAKE_INSTALL_LIBDIR and GNUInstallDirs
- Ignore illegal TESSDATA_PREFIX (not existing filesystem entry)
- Fix confidence output for the PAGE XML renderer
- Set hOCR capabilities ocrp_dir and ocrp_lang unconditionally
Related Repositories
Discover similar tools and frameworks used by developers
ByteTrack
Multi-object tracker associating low-confidence detections across frames.
Stable Diffusion
Text-to-image diffusion in compressed latent space.
Docling
Fast document parser for RAG and AI workflows.
Mask2Former
Unified transformer architecture for multi-task image segmentation.
PaddleOCR
Multilingual OCR toolkit with document structure extraction.