Must Read DevOps & SRE Books for all Engineers

Here's a curated list of “Must Read” books specific to the Incident Management space, suggested by folks from the SRE and DevOps community to help you understand what changed their perspective of software engineering as a role.

It has been a while since the entire world went under lockdown in light of the COVID-19 situation. Being under quarantine or social distancing isn’t easy for everyone, especially those who are used to the hustle and bustle of rushing in and out of work. The best way to tackle it is to remain calm and take the necessary preventive measures.

And, while you are at it, we thought we could help a little by providing some essential DevOps and SRE reading suggestions for all you tech folks out there.

We’ve spoken to a bunch of our good friends, a.k.a. SRE/DevOps folks in the community, to understand what changed their perspective of software engineering as a role. Based on their suggestions, we’ve put together a “Must Read” list of books specific to the Incident Management space.

We hope this keeps you good company.

1. The Phoenix Project

The Phoenix Project is a novel about a business that’s tanking because of several technical and cultural (mostly cultural) issues within the organization. It paints a very real picture of how bad things can get if you don’t fix the culture at hand. The book weaves several DevOps, Agile and Lean practices into the story and is super useful for people trying to understand why DevOps came about.

Bonus: Beyond The Phoenix Project is a podcast where the practices mentioned in the book are discussed more deeply.

2. The Unicorn Project

The Unicorn Project came out as a follow-up to The Phoenix Project. It takes a similar approach to its predecessor but frames the story around “The Five Ideals” of software development and culture.

“The First Ideal of Locality and Simplicity; 
The Second Ideal of Focus, Flow, and Joy; 
The Third Ideal of Improvement of Daily Work; 
The Fourth Ideal of Psychological Safety; 
and the Fifth Ideal of Focus on the Customer.”

3. The Goal: A Process of Ongoing Improvement

The Goal is a business management work of fiction. It is not specific to DevOps or IT; it walks you through a story that outlines the Theory of Constraints. It’s a great read for anyone in any line of business. You can draw a lot of parallels between the experiences of Alex Rogo, the plant manager, and your own production practices, and see how you can drive continuous improvement for yourself and your teams.

Bonus: If you’re like me and love more pictures in your books, there’s a graphic novel edition of this too!

4. Effective DevOps

Effective DevOps brings out the fundamentals of implementing DevOps practices in your organization.

“Some companies think that adopting devops means bringing in specialists or a host of new tools. With this practical guide, you’ll learn why devops is a professional and cultural movement that calls for change from inside your organization.”

5. Site Reliability Engineering

Site Reliability Engineering is the holy grail for understanding what SRE is all about. This book is a collection of principles, practices, and examples of everything that enables Google to be more scalable, reliable and efficient.

Each chapter covers a core functionality of the role and explains what you can do to make it better. While not everything from the book can be copy-paste implemented in your organization, it gives a big-picture idea of what the culture demands and why it is a necessity in today’s always-on world.

Bonus: You can check out the SRE Workbook, which provides a hands-on approach to understanding how you can implement these principles and practices in your organization.

6. Practical DevOps

Practical DevOps is a primer on DevOps that helps you understand how CI/CD, along with other good DevOps practices, can accelerate your software development lifecycle.

"A few things you’ll learn from the book:

  • Understand how all deployment systems fit together to form a larger system
  • Set up and familiarize yourself with all the tools you need to be efficient with DevOps
  • Design an application suitable for continuous deployment systems with DevOps in mind
  • Store and manage your code effectively using Git, Gerrit, Gitlab, and more
  • Configure a job to build a sample CRUD application
  • Test your code using automated regression testing with Jenkins Selenium
  • Deploy your code using tools such as Puppet, Ansible, Palletops, Chef, and Vagrant"

7. Real World SRE

Real World SRE provides a stepwise guide to handling a system outage. It lays down tools, strategies, and practices that’ll help you implement a more proactive SRE culture. It will also help you build your own incident response practice to ensure that you jump back up from an outage as soon as possible.

"A few things you’ll learn from the book:

  • Monitor for approaching catastrophic failure
  • Alert your team to an outage emergency 
  • Dissect your incident response strategies
  • Test automation tools and build your own software
  • Predict bottlenecks and fight for user experience
  • Eliminate the competition in an SRE interview"

8. Accelerate: Building & Scaling High Performing Technology Organizations

Accelerate is a data-heavy book on the factors that influence high-performance tech teams. The data comes from several DevOps reports and studies. You will learn which metrics were used to measure performance and how implementing DevOps practices can improve those metrics.

9. Seeking SRE: Conversations About Running Production Systems at Scale

Seeking SRE is a curated collection of conversations about running production systems at scale, covering SRE as it is practiced in a wide variety of settings.

"A few things you’ll learn from this book:

  • Different ways of implementing SRE and SRE principles in a wide variety of settings
  • How SRE relates to other approaches such as DevOps
  • Specialties on the cutting edge that will soon be commonplace in SRE
  • Best practices and technologies that make practicing SRE easier
  • The important but rarely explored human side of SRE"

10. A Seat at the Table: IT Leadership in the Age of Agility

A Seat at the Table explores the role of IT leadership in the world today and lays out what that role should look like in a healthy, successful IT organization.

While this seems to target CIOs specifically, anyone in the IT industry would benefit from reading this book!

11. The Human Side of Postmortems

The Human Side of Postmortems is a report on how stress and cognitive biases affect the way we handle outages. You will learn how our mental models work under these conditions and what can be done to improve them through an approach called Mindful Ops.

“Mindful Ops — can reduce the effects of stress and cognitive biases, ultimately help us build more resilient systems and teams, and reduce the duration and severity of outages.” 

12. Thinking in Systems

Thinking in Systems provides a methodology for problem solving by breaking everyday things down into simple functioning systems. It offers a different perspective on how the things around you function, both on their own and in relation to each other. It was also on the recommended reading list in our SRESpeak series.

“It has tools, heuristics and approaches for understanding systems and interconnected components, which I’ve found especially relevant for Site Reliability Engineering. When errors happen they aren’t one off events but have many interconnected dependencies and relationships. Thinking in Systems is a toolkit for understanding these relationships and reasoning about the effects of them in a structured way.”

You can also find more such awesome DevOps and SRE resources at Site Reliability Engineering Resources. Meanwhile, we’d love to hear from you on other books that should make it to this list! Leave us a comment or reach out over a DM via Twitter and let us know your thoughts.

March 24, 2020
Prakya Vasudevan