OpenVINO: Toolkit for optimizing and deploying AI inference
Convert and deploy deep learning models across Intel hardware.
OpenVINO is a toolkit designed to optimize and deploy deep learning models for inference workloads. It accepts trained models from frameworks including PyTorch, TensorFlow, ONNX, Keras, PaddlePaddle, and JAX/Flax, converting them to an optimized intermediate representation. The toolkit includes runtime components that execute inference on diverse hardware including x86 and ARM CPUs, Intel integrated and discrete GPUs, and Intel NPU accelerators. Common deployment scenarios include computer vision tasks, natural language processing with large language models, generative AI applications, speech recognition, and recommendation systems.
Multi-Framework Model Support
Accepts models from PyTorch, TensorFlow, ONNX, Keras, PaddlePaddle, and JAX/Flax without requiring original training frameworks. Direct Hugging Face Hub integration through Optimum Intel eliminates conversion dependencies.
Heterogeneous Hardware Targeting
Single optimized model runs across x86/ARM CPUs, Intel integrated and discrete GPUs, and NPU accelerators. Runtime hardware selection enables deployment flexibility without recompiling or maintaining separate model variants.
Optimized Intermediate Representation
Converts models from various frameworks into a unified intermediate representation optimized for Intel hardware. This abstraction layer enables hardware-specific optimizations while maintaining model portability across CPU, GPU, and NPU accelerators.
import numpy as np
import openvino as ov

core = ov.Core()
# Load an OpenVINO IR model and compile it for the CPU device
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "CPU")
# Run inference on a random input batch of shape (1, 3, 224, 224)
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled_model(input_data)
print(result[0].shape)
Bug fix release with continued MoE model preview support for CPUs and GPUs
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for the GPT-OSS 20B model
New model support including Qwen3-Embedding, Mistral-Small-24B, and MoE model preview
- On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B, Mistral-Small-24B-Instruct-2501
- On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for Qwen3-30B-A3B
New model support for Phi-4, Gemma-3 series, and NPU optimization for Qwen3 models
- New models supported: Phi-4-mini-reasoning, AFM-4.5B, Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B
- NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B
- LLMs optimized for NPU now available in the OpenVINO Hugging Face collection
- Preview: Intel Core Ultra processors and Windows-based AI PCs can now leverage the OpenVINO Execution Provider for Windows ML
Related Repositories
Discover similar tools and frameworks used by developers
LLaMA-Factory
Parameter-efficient fine-tuning framework for 100+ LLMs.
Qwen
Alibaba Cloud's pretrained LLMs supporting Chinese/English with up to 32K context length.
DALL-E
Official PyTorch package implementing the discrete VAE component for image tokenization used in OpenAI's DALL-E system.
Civitai
Community platform for sharing Stable Diffusion models, embeddings, and AI generation assets.
vLLM
Fast, memory-efficient LLM inference engine with PagedAttention for production deployments at scale.