Mastering Site Reliability Engineering in Enterprise : A Complete Guide to Resilient Systems & Chaos Engineering - Florian Hoeppner

By: Florian Hoeppner, Francesco Sbaraglia

eText | 7 October 2025

Transform enterprise IT by adopting site reliability engineering (SRE) practices that reduce downtime, build resilience, and drive business value. This book is a comprehensive guide designed to help site reliability engineers, DevOps teams, and platform engineers identify, address, and mitigate system weaknesses before they become critical failures.

Authors Francesco Sbaraglia and Florian Hoeppner highlight the paradigm shift from IT as a cost center to a core business function, emphasizing the central role of developers and the need for speed and reliability. They detail the challenges of transitioning to SRE, including overcoming cultural resistance and legacy infrastructure limitations, while bringing to the forefront the importance of building resilience in systems and processes. Specific SRE capabilities like chaos engineering, observability, and toil management are explored, along with strategies for successful implementation, including building a Center of Excellence, selecting the right tools, and fostering a culture of collaboration and continuous improvement.

Looking ahead, the book examines emerging trends such as agentic AI SRE agents, the use of generative AI (GenAI) in SRE, and the future evolution of chaos engineering. You'll learn how to embed SRE practices into your existing enterprise tech operating model and unlock tangible business outcomes: reduced downtime, increased resilience, and measurable gains in stability. Additionally, discover how GenAI can support SRE teams in planning, executing, and optimizing reliability experiments and in automating toil reduction and continuous improvement efforts.

By the end of this book, you'll know how to apply core SRE practices to strengthen reliability: establishing a chaos engineering practice led by SREs, running reliability-focused "game days," improving observability, troubleshooting failure scenarios, and fortifying the digital resilience of your systems and teams.

What You Will Learn

  • Understand the key terms and history of SRE and its guiding principles
  • Get insights into the SRE role and its evolution
  • Overcome the challenges in adopting SRE at any level of the organisation
  • Assess the maturity and readiness of your site reliability building blocks to improve digital resilience

Who This Book Is For

Professionals, architects, engineers, and practitioners eager to design, plan, and implement enterprise system resilience with proven SRE practices.

More in Computer Networking & Communications

Think Distributed Systems - Dominik Tornow

eBOOK

Network Security : A Systems Approach - Larry L Peterson

eBOOK

Cyberethics 8E - Richard A. Spinello

eTEXT