OpenVINO: Toolkit for optimizing and deploying AI inference
Convert and deploy deep learning models across Intel hardware.
Learn more about OpenVINO
OpenVINO is a toolkit designed to optimize and deploy deep learning models for inference workloads. It accepts trained models from frameworks including PyTorch, TensorFlow, ONNX, Keras, PaddlePaddle, and JAX/Flax, converting them to an optimized intermediate representation. The toolkit includes runtime components that execute inference on diverse hardware including x86 and ARM CPUs, Intel integrated and discrete GPUs, and Intel NPU accelerators. Common deployment scenarios include computer vision tasks, natural language processing with large language models, generative AI applications, speech recognition, and recommendation systems.
Multi-Framework Model Support
Accepts models from PyTorch, TensorFlow, ONNX, Keras, PaddlePaddle, and JAX/Flax without requiring the original training frameworks at deployment time. Direct Hugging Face Hub integration through Optimum Intel removes the need for a separate conversion step.
Heterogeneous Hardware Targeting
Single optimized model runs across x86/ARM CPUs, Intel integrated and discrete GPUs, and NPU accelerators. Runtime hardware selection enables deployment flexibility without recompiling or maintaining separate model variants.
Optimized Intermediate Representation
Converts models from various frameworks into a unified intermediate representation optimized for Intel hardware. This abstraction layer enables hardware-specific optimizations while maintaining model portability across CPU, GPU, and NPU accelerators.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "CPU")
input_data = np.random.rand(1, 3, 224, 224)
result = compiled_model(input_data)
print(result[0].shape)
Bug fix release with continued MoE model preview support for CPUs and GPUs
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for the GPT-OSS 20B model
New model support including Qwen3-Embedding, Mistral-Small-24B, and MoE model preview
- On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B, Mistral-Small-24B-Instruct-2501
- On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for Qwen3-30B-A3B
New model support for Phi-4, Gemma-3 series, and NPU optimization for Qwen3 models
- New models supported: Phi-4-mini-reasoning, AFM-4.5B, Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B
- NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B
- LLMs optimized for NPU are now available in the OpenVINO Hugging Face collection
- Preview: Intel Core Ultra processors and Windows-based AI PCs can now leverage the OpenVINO Execution Provider for Windows ML
See how people are using OpenVINO
Related Repositories
Discover similar tools and frameworks used by developers
Crush
LLM-powered coding agent with LSP and MCP integration.
SGLang
High-performance inference engine for LLMs and VLMs.
Segment Anything
Transformer-based promptable segmentation with zero-shot generalization.
Civitai
Community platform for sharing Stable Diffusion models, embeddings, and AI generation assets.
xFormers
Memory-efficient PyTorch components for transformer architectures.