ONNX Runtime GenAI: Portable Inference Across CPU, GPU, and Edge - Trex Team

ONNX Runtime GenAI

Portable Inference Across CPU, GPU, and Edge

By: Trex Team

eBook | 7 May 2026

$13.76

Instant Digital Delivery to your Kobo Reader App

"ONNX Runtime GenAI: Portable Inference Across CPU, GPU, and Edge"

Modern AI systems rarely live on a single hardware target. They must run correctly and efficiently across CPUs, GPUs, mobile devices, and specialized edge accelerators—often while serving increasingly demanding generative workloads. This book is written for experienced practitioners who already know the basics of model deployment and now need a rigorous, architecture-level guide to making ONNX Runtime and ONNX Runtime GenAI work reliably in heterogeneous environments.

Readers will build a deep understanding of ONNX Runtime's execution model, inference sessions, execution providers, graph partitioning, and optimization pipeline, then apply that knowledge to real deployment decisions: choosing between ONNX and ORT formats, tuning CPU and GPU backends, diagnosing fallback and operator gaps, engineering reproducible environments, and optimizing performance under production constraints. The book also covers edge and mobile deployment, quantization, and the full GenAI packaging and generation workflow for portable large-model inference.

Rather than treating portability as a marketing promise, this book treats it as an engineering discipline shaped by compatibility boundaries, hardware capabilities, memory behavior, and version-sensitive tooling. Its focus is practical and advanced: not just how to run a model, but how to design, measure, debug, and evolve portable inference systems that remain dependable as runtimes, providers, and model artifacts change.
