MediaPipe: On-device machine learning for media
Graph-based framework for streaming media ML pipelines.
MediaPipe is a graph-based framework for building machine learning pipelines that process streaming media on mobile, web, desktop, and edge devices. The framework uses a calculator-based architecture where data flows through interconnected nodes as packets, enabling efficient on-device inference without cloud dependencies. Core components include pre-trained models, cross-platform task libraries for Android, iOS, web, and Python, and a customizable framework for building domain-specific pipelines. Common applications include pose estimation, hand tracking, object detection, face detection, audio classification, and text processing.

Graph-based pipeline architecture
MediaPipe uses a directed acyclic graph model where calculators process packets of data, allowing complex media processing workflows to be composed from reusable components. This design separates pipeline logic from implementation details and enables efficient streaming processing.
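The calculator-and-packet idea can be sketched in a few lines of plain Python. This is an illustration only, not MediaPipe's actual API: the `Packet`, `Calculator`, and `run_graph` names below are hypothetical helpers invented for the sketch.

```python
# Toy sketch of MediaPipe's calculator/packet model.
# These classes are illustrative only, not the framework's real API.

class Packet:
    """A timestamped unit of data flowing through the graph."""
    def __init__(self, data, timestamp):
        self.data = data
        self.timestamp = timestamp

class Calculator:
    """A reusable node that transforms an input packet into an output packet."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def process(self, packet):
        return Packet(self.fn(packet.data), packet.timestamp)

def run_graph(calculators, packets):
    """Push each packet through the chain of calculators in stream order."""
    outputs = []
    for packet in packets:
        for calc in calculators:
            packet = calc.process(packet)
        outputs.append(packet)
    return outputs

# A two-node pipeline: normalize pixel values, then threshold them.
graph = [
    Calculator("Scale", lambda x: [v / 255.0 for v in x]),
    Calculator("Threshold", lambda x: [v > 0.5 for v in x]),
]
frames = [Packet([0, 128, 255], timestamp=t) for t in range(3)]
results = run_graph(graph, frames)
print(results[0].data)  # [False, True, True]
```

Because each node only sees packets on its input streams, pipeline logic (which calculators connect to which) stays separate from each calculator's implementation, which is the property the real framework exploits.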
Cross-platform deployment
Solutions built with MediaPipe run on Android, iOS, web browsers, desktop systems, and embedded devices with a single codebase. The framework handles platform-specific optimizations while maintaining consistent APIs across targets.
Layered abstraction model
MediaPipe offers both high-level task APIs for common problems and a lower-level framework for custom pipeline development. Users can start with pre-built solutions and progressively customize the underlying graph and calculator implementations as needed.
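At the lower level, a pipeline is declared as a `CalculatorGraphConfig` in protobuf text format. A minimal sketch, using the stock `PassThroughCalculator` that ships with the framework (custom pipelines swap in their own calculator nodes):

```
# Minimal CalculatorGraphConfig: one node copying input to output.
input_stream: "input_video"
output_stream: "output_video"

node {
  calculator: "PassThroughCalculator"
  input_stream: "input_video"
  output_stream: "output_video"
}
```

Customization typically starts here: replacing or chaining `node` entries changes the pipeline without touching the high-level task APIs built on top.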
```python
import cv2
import mediapipe as mp

# Initialize the MediaPipe Hands solution
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# Create the hands detector
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

# Initialize the webcam
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # MediaPipe expects RGB input; OpenCV captures BGR
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Run hand detection and tracking on the frame
    results = hands.process(rgb_frame)
    # Draw any detected hand landmarks on the original frame
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    cv2.imshow('Hand Tracking', frame)
    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

This release focuses on migrating components to MediaPipe API3, updating build dependencies including Bazel and Protobuf, and adding new platform support.
- Enabled the ML Drift Metal delegate as an inference calculator backend
- Added support for armv7 (32-bit) in MediaPipe Tasks
- Fixed canvas BGRA assumption in RenderToWebGpuCanvas
- Fixed sampling logic in ImageToTensorConverterWebGpu
- Migrated GlShaderCalculator to API3
MediaPipe v0.10.26 adds Android 16 KB page size support and includes numerous Bazel improvements, API migrations, and calculator updates.
- All the latest Android packages on Google Maven now support the Android 16 KB page size
- Added support for Arm v7 CPUs (32-bit) in version 0.10.26.1
- Introduced a new variant of TFLiteModelLoader::LoadFromPath that allows specifying the mmap mode
- Added DefaultSidePacketCalculator unit test under calculators/core
This release focuses on API3 development with graph builder improvements, Java/Kotlin ThreadPool support, and web LLM inference enhancements.
- Made ThreadPoolExecutorOptions callable from Java/Kotlin
- Added contract validator for API3
- Extracted reusable part of API2 graph builder for API3
Related Repositories
LivePortrait
PyTorch implementation for animating portraits by transferring expressions from driving videos.
Awesome Nano Banana
Curated collection of images and prompts from Google's Gemini-2.5-Flash-Image model with model comparisons.
AI-Trader
LLM agent benchmarking framework for autonomous market trading.
Megatron-LM
Library for training large transformer models with distributed computing and GPU-optimized building blocks.
Codex CLI
OpenAI's command-line coding assistant that runs locally with ChatGPT integration for terminal use.