Tesseract OCR: Open source optical character recognition engine
LSTM-based OCR engine supporting 100+ languages.
Learn more about Tesseract
Tesseract is an open-source optical character recognition (OCR) engine that converts images containing text into machine-readable text. Its primary recognition engine is a Long Short-Term Memory (LSTM) neural network that processes images of whole text lines, using both character shapes and linguistic context to produce a transcription. The design is modular: trained data models provide support for over 100 languages, standard image formats are accepted as input, and output can be written in several document formats, including searchable PDF and XML-based representations. Recognition runs as a multi-stage pipeline: image preprocessing, layout analysis to detect text regions, line segmentation, and finally character recognition by the neural network. Originally developed at Hewlett-Packard and later maintained by Google, Tesseract balances recognition accuracy against processing speed by combining statistical language models with neural network predictions.
Dual Recognition Engines
Includes both an LSTM neural network engine and a legacy pattern-recognition engine, selectable at runtime with the --oem flag. This gives access to the more accurate LSTM engine while keeping compatibility with older trained models and specialized use cases.
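As a rough illustration of engine switching, the sketch below builds a tesseract command line with an explicit --oem value and runs it through subprocess. The numeric modes (0 legacy only, 1 LSTM only, 2 both, 3 default) are the values the CLI accepts; the names EngineMode, build_command, and run_ocr are hypothetical helpers invented for this example. It assumes the tesseract binary is on the PATH, and that the installed traineddata file actually contains legacy models if a legacy mode is requested.

```python
import subprocess
from enum import IntEnum


class EngineMode(IntEnum):
    """Values accepted by Tesseract's --oem option."""
    LEGACY_ONLY = 0
    LSTM_ONLY = 1
    LEGACY_AND_LSTM = 2
    DEFAULT = 3  # let Tesseract pick based on what the traineddata provides


def build_command(image_path, oem=EngineMode.DEFAULT, lang="eng"):
    """Return the argument list for one OCR run that writes text to stdout."""
    # "stdout" as the output base tells the CLI to print instead of writing a file.
    return ["tesseract", image_path, "stdout", "--oem", str(int(oem)), "-l", lang]


def run_ocr(image_path, oem=EngineMode.DEFAULT, lang="eng"):
    """Run Tesseract and return the recognized text (needs the tesseract binary)."""
    result = subprocess.run(
        build_command(image_path, oem, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout
```

Calling run_ocr('document.png', EngineMode.LSTM_ONLY) would force the neural network engine; the same option can be passed to pytesseract through its config string, for example config='--oem 1'.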
100+ Language Support
Recognizes text in over 100 languages out of the box using pre-trained data files. Custom training is supported through a documented pipeline for specialized fonts, domains, or historical scripts.
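Languages are selected with the -l option, and several can be combined with a plus sign (for example eng+deu). The sketch below shows one way to build that value and to check it against the installed language files. The helper names are hypothetical, and the parser assumes that tesseract --list-langs prints one header line followed by one language code per line; the sample text is illustrative, not captured output.

```python
def language_argument(codes):
    """Join language codes into the value passed after -l, e.g. 'eng+deu'."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    Assumption: the first line is a header and every later non-empty
    line holds exactly one language code.
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines[1:] if line]


def missing_languages(requested, installed):
    """Return the requested codes that have no installed traineddata file."""
    return [code for code in requested if code not in installed]


# Illustrative sample of the assumed output shape (path and codes are made up).
SAMPLE_LIST_LANGS = (
    'List of available languages in "/usr/share/tessdata/" (3):\n'
    "eng\n"
    "deu\n"
    "osd\n"
)

if __name__ == "__main__":
    installed = parse_list_langs(SAMPLE_LIST_LANGS)
    print(language_argument(["eng", "deu"]))
    print(missing_languages(["eng", "fra"], installed))
```

Checking for missing codes before running OCR gives a clearer error than letting the engine fail while loading a language file.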
Multiple Output Formats
Generates plain text, hOCR with positioning data, searchable PDFs, TSV structured output, and PAGE/ALTO XML. Integrates directly into document processing workflows without format conversion layers.
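To show what the structured output looks like in practice, here is a minimal sketch that parses Tesseract's TSV output into word records with bounding boxes and confidence scores. The column names (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) follow the TSV renderer, where level 5 marks a word row. The function name parse_tsv_words, the min_conf threshold, and the sample rows are assumptions made up for this example.

```python
import csv
import io


def parse_tsv_words(tsv_text, min_conf=0.0):
    """Return word-level rows from Tesseract TSV output as dictionaries."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # recognized text may contain quote characters
    )
    words = []
    for row in reader:
        if row["level"] != "5":  # levels 1-4 are page, block, paragraph, line
            continue
        text = (row["text"] or "").strip()
        conf = float(row["conf"])
        if not text or conf < min_conf:
            continue
        words.append({
            "text": text,
            "conf": conf,
            "box": (int(row["left"]), int(row["top"]),
                    int(row["width"]), int(row["height"])),
        })
    return words


# Made-up sample in the TSV column layout, used only to exercise the parser.
_HEADER = ["level", "page_num", "block_num", "par_num", "line_num", "word_num",
           "left", "top", "width", "height", "conf", "text"]
_ROWS = [
    ["1", "1", "0", "0", "0", "0", "0", "0", "640", "480", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "36", "92", "60", "24", "96.5", "Hello"],
    ["5", "1", "1", "1", "1", "2", "104", "92", "70", "24", "41.2", "w0rld"],
]
SAMPLE_TSV = "\n".join("\t".join(fields) for fields in [_HEADER] + _ROWS) + "\n"

if __name__ == "__main__":
    for word in parse_tsv_words(SAMPLE_TSV, min_conf=60):
        print(word)
```

The TSV text itself can be produced with the tsv config on the command line, or with pytesseract.image_to_data, which returns the same tab-separated layout as a string by default.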
import pytesseract
from PIL import Image

# Load the page image, then run OCR with Tesseract's default settings
image = Image.open('document.png')
text = pytesseract.image_to_string(image)
print(text)

Code simplification and build improvements with CMake optimizations.
- Simplify code for osdetect
- Fix and improve configuration for CMake builds
- Modernize some for loops and fix some signed/unsigned issues
- CMake optimization with warp2
Fixed random number generator and improved template classes with CLI enhancements.
- Fix linear congruential random number generator
- Make list classes templated
- Add CLI -c parameter(s) to init vectors
- Handle colormaps correctly
- Use constexpr for kDawgMagicNumber
Fixed static linking issues and improved installation with better XML output handling.
- Fix TARGETPDBFILE error for static linking
- Make regular usage of CMAKE_INSTALL_LIBDIR and GNUInstallDirs
- Ignore illegal TESSDATA_PREFIX (not existing filesystem entry)
- Fix confidence output for the PAGE XML renderer
- Set hOCR capabilities ocr_pdir and ocr_plang unconditionally