Hands-On LLM Serving and Optimization : Hosting LLMs at Scale - Chi Wang

Hands-On LLM Serving and Optimization

Hosting LLMs at Scale

By: Chi Wang, Peiheng Hu

eText | 28 April 2026 | Edition Number 1

At a Glance

eText


$85.79

Large language models (LLMs) are the reasoning engines of modern AI. Today, a major inflection point has arrived: as the world races to deploy AI at scale, model inference has moved to the center of the stack. Welcome to the inference era.

Without proper optimization, however, LLMs can be expensive and slow to serve. Hands-On LLM Serving and Optimization is a comprehensive guide to the complexities of deploying and optimizing LLMs at scale.

In this hands-on, engineering-focused book, authors Chi Wang and Peiheng Hu combine practical examples, code, and strategies for building robust, performant, and cost-efficient AI token factories. Whether you're building the LLM inference infrastructure or the applications that consume it, a deep understanding of LLM serving will make you a more effective, future-ready engineer as AI transforms how we work and build.

  • Learn the foundations of model serving with core concepts, design paradigms, and industry best practices
  • Understand the common challenges of hosting LLMs at scale
  • Balance latency and throughput to meet the demands of AI applications and business requirements
  • Host LLMs cost-effectively with practical, code-backed techniques