Bringing Deep Learning Models to the Edge Efficiently Using ONNX
Book Description
ONNX has emerged as the de facto standard for deploying portable, framework-agnostic machine learning models across diverse hardware platforms.
Ultimate ONNX for Deep Learning Optimization provides a structured, end-to-end guide to the ONNX ecosystem, starting with ONNX fundamentals, model representation, and framework integration. You will learn how to export models from PyTorch, TensorFlow, and scikit-learn, inspect and modify ONNX graphs, and leverage ONNX Runtime and ONNX Simplifier for inference optimization. Each chapter builds technical depth, equipping you with the tools required to move models beyond experimentation.
The book focuses on performance-critical optimization techniques, including quantization, pruning, and knowledge distillation, followed by practical deployment on edge devices such as Raspberry Pi. Through complete, real-world case studies covering object detection, speech recognition, and compact language models, you will implement custom operators, follow deployment best practices, and understand production constraints. By the end of this book, you will be capable of designing, optimizing, and deploying efficient ONNX-based AI systems for edge environments.
Table of Contents
Introduction to ONNX and Edge Computing
Getting Started with ONNX
ONNX Integration with Deep Learning Frameworks
Model Optimization Using ONNX Simplifier and ONNX Runtime
Model Quantization Using ONNX Runtime
Model Pruning in PyTorch and Exporting to ONNX
Knowledge Distillation for Edge AI
Deploying ONNX Models on Edge Devices
End-to-End Execution of YOLOv12
End-to-End Execution of the Whisper Speech Recognition Model
End-to-End Execution of the SmolLM Model
ONNX Model from Scratch and Custom Operators
Real-World Applications, Best Practices, Security, and Future Trends in ONNX for Edge AI
Index