MediaPipe: On-device machine learning for media
Graph-based framework for streaming media ML pipelines.
MediaPipe is a graph-based framework for building machine learning pipelines that process streaming media on mobile, web, desktop, and edge devices. The framework uses a calculator-based architecture where data flows through interconnected nodes as packets, enabling efficient on-device inference without cloud dependencies. Core components include pre-trained models, cross-platform task libraries for Android, iOS, web, and Python, and a customizable framework for building domain-specific pipelines. Common applications include pose estimation, hand tracking, object detection, face recognition, audio classification, and text processing.
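As a minimal sketch of the Python task library, here is on-device object detection with the Tasks API; the model file name and image path are illustrative, and the detection model (e.g. EfficientDet-Lite0) must be downloaded separately.

# Sketch: on-device object detection with the MediaPipe Tasks Python API.
# Assumes a locally downloaded detection model; the file names are illustrative.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path="efficientdet_lite0.tflite"),
    score_threshold=0.5,  # drop low-confidence detections
)
detector = vision.ObjectDetector.create_from_options(options)

image = mp.Image.create_from_file("example.jpg")  # illustrative input image
result = detector.detect(image)  # inference runs entirely on-device

for detection in result.detections:
    top = detection.categories[0]
    print(top.category_name, f"{top.score:.2f}", detection.bounding_box)
detector.close()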

Graph-based pipeline architecture
MediaPipe uses a directed acyclic graph model where calculators process packets of data, allowing complex media processing workflows to be composed from reusable components. This design separates pipeline logic from implementation details and enables efficient streaming processing.
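To make the packet-and-calculator model concrete, here is a minimal sketch at the Python framework layer, patterned on the framework's pass-through hello-world; the stream names and payloads are illustrative, and PassThroughCalculator is a stock framework calculator.

# Sketch: a two-node pass-through graph driven from Python.
import mediapipe as mp

config_text = """
  input_stream: 'in'
  output_stream: 'out'
  node {
    calculator: 'PassThroughCalculator'
    input_stream: 'in'
    output_stream: 'hop'
  }
  node {
    calculator: 'PassThroughCalculator'
    input_stream: 'hop'
    output_stream: 'out'
  }
"""
graph = mp.CalculatorGraph(graph_config=config_text)

output = []
graph.observe_output_stream(
    "out",
    lambda stream_name, packet: output.append(mp.packet_getter.get_str(packet)))

graph.start_run()
for ts in range(3):
    # Each packet carries a payload plus a timestamp that orders the stream.
    graph.add_packet_to_input_stream(
        "in", mp.packet_creator.create_string(f"packet {ts}").at(ts))
graph.close()
print(output)  # ['packet 0', 'packet 1', 'packet 2']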
Cross-platform deployment
Solutions built with MediaPipe run on Android, iOS, web browsers, desktop systems, and embedded devices with a single codebase. The framework handles platform-specific optimizations while maintaining consistent APIs across targets.
Layered abstraction model
MediaPipe offers both high-level task APIs for common problems and a lower-level framework for custom pipeline development. Users can start with pre-built solutions and progressively customize the underlying graph and calculator implementations as needed.
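As a sketch of the high-level layer, the hand-tracking task reduces to a few lines of Python; hand_landmarker.task is assumed to be downloaded locally, and the image path is illustrative.

# Sketch: the high-level Tasks layer.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.HandLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="hand_landmarker.task"),
    num_hands=2,
)
with vision.HandLandmarker.create_from_options(options) as landmarker:
    result = landmarker.detect(mp.Image.create_from_file("hands.jpg"))
    # One list of 21 landmarks per detected hand; landmark 0 is the wrist.
    for hand in result.hand_landmarks:
        wrist = hand[0]
        print(f"wrist at ({wrist.x:.2f}, {wrist.y:.2f})")

Under the hood, a task like this drives a calculator graph similar to the pass-through sketch above; when the pre-built task is no longer enough, the same pipeline can be extended at the graph level with custom calculators.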
Recent releases
Adds Android 16KB page size support and updates build tooling to Android SDK/NDK 28 and JDK 21; ARM v7 support lands in patch release 0.10.26.1.
- Upgrade build environments to Android SDK/NDK 28 and JDK 21 if building from source.
- Use audio modality in the LLM Inference API; ARM v7 (32-bit) requires patch release 0.10.26.1.
Incremental release focused on API3 graph builder foundations and internal refactoring; no breaking changes or new requirements documented.
- Review API3 graph builder changes if extending MediaPipe internals; the default timestamp offset now matches API2 at zero.
- Note that LlmTaskRunner is now internal and Android manifest duplication has been removed; no action is required for standard integrations.
Adds Gemma V2-2B and V3-1B model support via XNNPACK; expands LLM inference APIs across Android, iOS, and Web with vision modality and prompt templates.
- Enable Gemma V2-2B and V3-1B models using XNNPACK; update the LLM bundler to include vision models in .task files for multi-modal processing (a bundler sketch follows these notes).
- Use new async generation cancellation and prompt template support on Android; iOS and Web gain vision modality and improved error messaging.
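For the bundler note above, here is a sketch of the text-only bundling path as documented for the LLM Inference API; all file paths and token strings are placeholders, and the vision-model bundling added in this release involves additional fields not shown here.

# Sketch: packaging a converted LLM into a .task bundle for the LLM Inference API.
from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model="gemma_model.tflite",  # converted model checkpoint (placeholder path)
    tokenizer_model="tokenizer.model",  # SentencePiece tokenizer (placeholder path)
    start_token="<bos>",
    stop_tokens=["<eos>"],
    output_filename="gemma.task",       # bundle consumed by the LLM Inference API
)
bundler.create_bundle(config)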
Related Repositories
Discover similar tools and frameworks used by developers
open_clip
PyTorch library for contrastive language-image pretraining.
lerobot
PyTorch library for robot imitation learning and sim-to-real transfer.
xformers
Memory-efficient PyTorch components for transformer architectures.
deepface
Python library wrapping multiple face recognition deep learning models.
opencv
Cross-platform C++ library for real-time computer vision algorithms.