ONNX Runtime: ML inference and training accelerator
Cross-platform engine for optimized ONNX model execution.
ONNX Runtime is a runtime engine that executes machine learning models in the ONNX format across different platforms and hardware configurations. It consumes models exported from frameworks such as PyTorch, TensorFlow, and scikit-learn, loading them into an in-memory graph representation for optimized execution. The runtime applies graph transformations, operator fusions, and hardware-specific optimizations to improve performance. Common deployment scenarios include inference serving in production environments and distributed training on GPU clusters.
Multi-framework model support
Accepts trained models from PyTorch, TensorFlow, scikit-learn, LightGBM, XGBoost, and other libraries through the ONNX format, allowing a single runtime to handle models from different ecosystems.
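As a minimal sketch of how a model typically enters the ONNX format, assuming PyTorch and a hypothetical toy two-layer network as a stand-in for a real trained model:

import torch

# Toy network standing in for any trained PyTorch model (hypothetical).
model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.ReLU())
model.eval()

# torch.onnx.export traces the model with a dummy input and writes ONNX.
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

Converters such as skl2onnx and onnxmltools play the same role for scikit-learn and the gradient-boosting libraries.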
Hardware abstraction layer
Provides consistent inference and training APIs across CPUs, GPUs, and specialized accelerators while handling platform-specific optimizations internally, reducing the need for framework-specific deployment code.
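A short sketch of what this looks like through the Python API; the provider list below assumes a CUDA-capable build and falls back to CPU otherwise:

import onnxruntime as ort

# Providers are tried in order; ONNX Runtime assigns each graph node to
# the first listed provider that supports it, falling back to CPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # the providers actually in effect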
Graph-level optimization
Performs operator fusion, constant folding, and other graph transformations at the ONNX level before execution, independent of the original training framework used to create the model.
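These transformations can be tuned per session; a minimal sketch using the documented SessionOptions knobs (the file names are placeholders):

import onnxruntime as ort

# Enable the full optimization pipeline (constant folding, fusions, etc.)
# and optionally serialize the optimized graph to disk for inspection.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.optimized_model_filepath = "model.opt.onnx"
session = ort.InferenceSession("model.onnx", sess_options=opts)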
import onnxruntime as ort
import numpy as np

# Load the serialized ONNX model into an inference session.
session = ort.InferenceSession("model.onnx")

# Build a random input shaped like one 224x224 RGB image.
input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference; None requests all model outputs.
outputs = session.run(None, {input_name: input_data})
print(outputs[0])

Release notes do not specify changes, breaking issues, or fixes included in this patch release.
- Review commit history or changelog separately to identify any critical fixes or dependency updates.
- Test thoroughly in staging before upgrading production workloads due to the lack of documented changes.
Patch release fixing CPU Attention GQA, TensorRT subgraph generation, and WebAssembly macOS/arm64 builds; adds MoE kernels and new Python/C# APIs.
- Update if using Attention GQA on CPU or the TensorRT RTX EP, as critical bugs in both implementations are now fixed.
- Leverage new Python and C# APIs for plugin execution providers with MemcpyFromHost/MemcpyToHost support.
Next release drops macOS x86_64 binaries, raises the minimum macOS version to 14.0, and removes Python 3.10 wheels; the ROCm EP is removed from source.
- Plan migration off the ROCm EP to the AMD MIGraphX or Vitis AI EPs before upgrading to the next release; a provider-swap sketch follows this list.
- Cross-compilation may fail due to KleidiAI build.py changes; see issue #26175 for workarounds.
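For the ROCm migration, a minimal sketch of the provider swap, assuming a build that ships the MIGraphX EP; the model file itself is unchanged:

import onnxruntime as ort

# Before: the ROCm EP, which is removed from source in the next release.
# session = ort.InferenceSession("model.onnx", providers=["ROCMExecutionProvider"])

# After: MIGraphX with CPU fallback for any unsupported operators.
session = ort.InferenceSession(
    "model.onnx",
    providers=["MIGraphXExecutionProvider", "CPUExecutionProvider"],
)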
Related Repositories
Discover similar tools and frameworks used by developers
llama_index
Connect LLMs to external data via RAG workflows.
AI-Trader
LLM agent benchmarking framework for autonomous market trading.
YOLOX
PyTorch anchor-free object detector with scalable model variants.
adk-python
Modular Python framework for building production AI agents.
stablediffusion
Text-to-image diffusion in compressed latent space.