Must Read DevOps & SRE Books for all Engineers

March 24, 2020

It has been a while since the entire world went under lockdown in light of the COVID-19 situation. Being under quarantine or practicing social distancing isn’t easy for anyone, especially those who are used to the hustle and bustle of rushing in and out of work. The best way to tackle it is to remain calm and take the necessary preventive measures.

And, while you are at it, we thought we could help a little by providing some essential DevOps and SRE reading suggestions for all you tech folks out there.

We spoke to a number of our good friends in the SRE/DevOps community to understand what changed their perspective on software engineering as a role. Based on their suggestions, we’ve put together a “Must Read” list of books specific to the Incident Management space.

We hope this keeps you good company.

1. The Phoenix Project

The Phoenix Project is a novel about a business that’s tanking because of several technical and cultural (mostly cultural) issues within the organization. It paints a very real picture of how bad things can get if you don’t fix the culture. The book weaves several DevOps, Agile and Lean practices into the story and is super useful for people trying to understand why DevOps came about.

Bonus: Beyond The Phoenix Project is a podcast where the practices mentioned in the book are discussed more deeply.

2. The Unicorn Project

The Unicorn Project is a follow-up to The Phoenix Project. It takes a similar approach to the first book but is organized around “The Five Ideals” of software development and culture.

“The First Ideal of Locality and Simplicity; 
The Second Ideal of Focus, Flow, and Joy; 
The Third Ideal of Improvement of Daily Work; 
The Fourth Ideal of Psychological Safety; 
and the Fifth Ideal of Focus on the Customer.”

3. The Goal: A Process of Ongoing Improvement

The Goal is a business-management novel. It is not specific to DevOps or IT; it walks you through a story that illustrates the Theory of Constraints. It’s a great read for anyone in any line of business. You can draw a lot of parallels between the situation of Alex Rogo, the plant manager, and your own production practices, and see how you can drive continuous improvement for yourself and your teams.

Bonus: If you’re like me and love more pictures in your books, there’s a graphic novel edition of this too!

4. Effective DevOps

Effective DevOps brings out the fundamentals of implementing DevOps practices in your organization.

“Some companies think that adopting devops means bringing in specialists or a host of new tools. With this practical guide, you’ll learn why devops is a professional and cultural movement that calls for change from inside your organization.”

5. Site Reliability Engineering

Site Reliability Engineering is the go-to book for understanding what SRE is all about. It is a collection of the principles, practices, and examples that enable Google to be more scalable, reliable and efficient.

Each chapter covers a core function of the role and explains what you can do to make it better. While not everything from the book can be copied directly into your organization, it gives a big-picture idea of what the culture demands and why it is a necessity in today’s always-on world.

Bonus: You can check out the SRE Workbook, which takes a hands-on approach to implementing these principles and practices in your organization.

6. Practical DevOps

Practical DevOps is a primer on DevOps. It helps you understand how CI/CD, along with other good DevOps practices, can accelerate your software development lifecycle.

"A few things you’ll learn from the book:

  • Understand how all deployment systems fit together to form a larger system
  • Set up and familiarize yourself with all the tools you need to be efficient with DevOps
  • Design an application suitable for continuous deployment systems with DevOps in mind
  • Store and manage your code effectively using Git, Gerrit, Gitlab, and more
  • Configure a job to build a sample CRUD application
  • Test your code using automated regression testing with Jenkins Selenium
  • Deploy your code using tools such as Puppet, Ansible, Palletops, Chef, and Vagrant"

7. Real World SRE

Real World SRE provides a step-by-step guide to handling a system outage. It lays out tools, strategies, and practices that’ll help you build a more proactive SRE culture. It will also help you build your own incident response practice so that you bounce back from an outage as quickly as possible.

"A few things you’ll learn from the book:

  • Monitor for approaching catastrophic failure
  • Alert your team to an outage emergency 
  • Dissect your incident response strategies
  • Test automation tools and build your own software
  • Predict bottlenecks and fight for user experience
  • Eliminate the competition in an SRE interview"

8. Accelerate: Building & Scaling High Performing Technology Organizations

Accelerate is a data-heavy book on the factors that influence high-performing tech teams. The data comes from several DevOps reports and studies. You will learn which metrics were used to measure performance and how implementing DevOps practices can affect these metrics positively.

9. Seeking SRE: Conversations About Running Production Systems at Scale

Seeking SRE is a curated collection of conversations about running production systems at scale, in a wide variety of settings rather than at a single company.

"A few things you’ll learn from this book:

  • Different ways of implementing SRE and SRE principles in a wide variety of settings
  • How SRE relates to other approaches such as DevOps
  • Specialties on the cutting edge that will soon be commonplace in SRE
  • Best practices and technologies that make practicing SRE easier
  • The important but rarely explored human side of SRE"

10. A Seat at the Table: IT Leadership in the Age of Agility

A Seat at the Table explores the role of IT leadership in the world today and lays out what that role should be in a healthy, successful IT organization.

While this seems to target CIOs specifically, anyone in the IT industry would benefit from reading this book!

11. The Human Side of Postmortems

The Human Side of Postmortems is a report on how stress and cognitive biases affect the way we handle outages. You will learn how our mental models work under these conditions and what can be done to make things better: Mindful Ops.

“Mindful Ops — can reduce the effects of stress and cognitive biases, ultimately help us build more resilient systems and teams, and reduce the duration and severity of outages.” 

12. Thinking in Systems

Thinking in Systems offers a problem-solving methodology that breaks everyday things down into simple, functioning systems. It gives you a different perspective on how the things around you function, both on their own and as part of a larger whole. It was also on the recommended reading list in our SRESpeak series.

“It has tools, heuristics and approaches for understanding systems and interconnected components, which I’ve found especially relevant for Site Reliability Engineering. When errors happen they aren’t one off events but have many interconnected dependencies and relationships. Thinking in Systems is a toolkit for understanding these relationships and reasoning about the effects of them in a structured way.”

You can find more DevOps and SRE resources like these at Site Reliability Engineering Resources. Meanwhile, we’d love to hear about other books that should make it to this list! Leave us a comment or send us a DM on Twitter and let us know your thoughts.

Written By:
Prakya Vasudevan
SRE
DevOps
Copyright © Squadcast Inc. 2017-2024

Last Updated:
October 4, 2024
Here's a curated list of “Must Read” books specific to the Incident Management space, suggested by folks from the SRE and DevOps community to help you understand what changed their perspective of software engineering as a role.

