How to improve your influence as an SRE

Nov 10, 2021
Last Updated:
May 2, 2024

Improving your influence within the company will help you deliver high-quality work, because your goals will be closely aligned with those of the company. In this blog piece, Ricardo explains how to improve your influence as an SRE.

    Balancing fast-paced business requirements with the demands of keeping production services stable is not an easy task. SRE is an opinionated implementation of DevOps, defined by Ben Treynor Sloss, VP of Engineering at Google, as “what happens when you ask a software engineer to design an operations function”. It even comes with a completely free manual and workbook.

    Although SRE aims to be a “prescription” on how to run complex systems the right way, reliability can mean different things in different contexts. And, usually, unless things go wrong it’s hard to prioritize reliability work ahead of features and bug fixes.

    How can SREs encourage teams to think about their operational excellence? How can SREs go about making reliability part of everyone’s daily practices? How can SREs effectively influence people to take reliability seriously and incorporate SRE concepts and practices into their routines? It turns out these are some of the most important questions every SRE faces.

    Influence vs Authority

    When trying to disseminate a practice and cultivate change, you can usually go one of two routes: influence or authority. Influence is “the capacity to have an effect on the character, development, or behavior of someone or something, or the effect itself”. In your context, the goal is to provide best practices, resources, and tools in the hope teams adopt them.

    In contrast, authority is “the power or right to give orders, make decisions, and enforce obedience”. Applied to your context, it would be to dictate that teams must follow and adopt several operational practices.

    Both approaches can be valuable, depending on the context. For example, for critical systems (e.g. healthcare, aviation) authority might be required to ensure a certain level of safety. On the other hand, authority can be a detriment for teams: it can leave them feeling excluded from the decision process, ignore their unique context, and alienate them. With that in mind, you want to be as influential as possible regarding reliability.

    Improving SRE influence

    There are several ways to improve your influence. In an SRE context, the tactics below will help you make reliability part of the discussion when building complex systems.

    Make sure people understand your purpose and motivations

    When people don’t understand your motivations and goals it’s natural for them to become defensive; it’s human nature. It will be a lot easier to spread a reliability mindset if people understand what you’re trying to achieve.

    If you’re a central SRE team, make it crystal clear what your team does. Clarify whether it’s an operational team that takes care of services in production or a team more focused on guidelines and tooling. Create artifacts that people can access asynchronously (e.g. internal documentation, blog posts) and that make it easy to understand what you’re working towards. Give internal presentations about your work and roadmap, and make your team available (e.g. mailing list, Slack channel) for consultation or even just for informal chats.

    In general, people will be more receptive and address your concerns if they know what you’re all about and that they can reach you when necessary.

    Drive an understanding that SRE work ties directly to business goals

    A system’s reliability is determined, fundamentally, by its ability to do what its users need it to do. In other words, it is measured by how happy users are, and happy users are good for business. Reliability is one of the most important requirements of any service, and it is the users who determine whether that requirement is met.

    SRE work will be intimately tied to business goals. Satisfied users will generate value (e.g. revenue, product popularity) and reliability is a huge contributor to that perception. It will be important to drive this understanding that you’re not focusing on reliability just to be picky but because that concern will make your business prosper.

    Create a common language to talk about reliability

    You’ve probably been in a situation where you’re trying to communicate with someone who does not speak the same language as you. Maybe you’re in a foreign country, asking for directions without speaking the local language. Eventually you make yourself understood: you just want directions to that awesome attraction, and the other person does their best to point you the right way. More often than not, though, it’s a difficult exchange.

    Similarly, if you approach product development teams without a clear way to talk about and measure reliability, it will be hard to reason about it. Creating a shared language to talk about reliability, assess it, and prioritize work will be instrumental to the success of your quest. The reliability stack gives you the basis for a framework that makes reliability conversations a lot easier. SLIs provide the necessary reliability measurements, while SLOs let you assess, over a certain period of time, how reliable your system is. With those pieces in place, Error Budgets make it easier to prioritize work that addresses reliability concerns.
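
To make the reliability stack concrete, here is a minimal Python sketch of how an SLI, an SLO, and an Error Budget relate to each other. It assumes a simple request-based SLI (good events divided by total events); the names `SloReport` and `evaluate_slo`, as well as the numbers in the usage example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float               # observed fraction of good events in the window
    slo_target: float        # target fraction of good events (e.g. 0.999)
    budget_total: float      # bad events the SLO tolerates in the window
    budget_remaining: float  # tolerated bad events left (negative = overspent)

    @property
    def slo_met(self) -> bool:
        return self.sli >= self.slo_target


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compute the SLI and the error budget for one evaluation window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")

    sli = good_events / total_events
    # The error budget is whatever the SLO leaves over: (1 - target) of all events.
    budget_total = (1.0 - slo_target) * total_events
    bad_events = total_events - good_events
    return SloReport(sli, slo_target, budget_total, budget_total - bad_events)


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 500 of them failed, 99.9% target.
    report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
    print(f"SLI: {report.sli:.4%}, SLO met: {report.slo_met}")
    print(f"Error budget remaining: {report.budget_remaining:.0f} bad events")
```

A remaining budget that is close to zero or negative is the kind of shared, objective signal a team could use to put reliability work ahead of new features.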

    Get buy-in

    Because there will always be bugs to fix and features to deliver, reliability will often be an afterthought. Getting buy-in will ensure teams have reliability in mind and will advocate for it on your behalf.

    Identify key stakeholders who will help you “spread the message” and treat reliability like an obvious requirement. This will be highly dependent on your organization, but before diving into processes and tools, you should first focus on people. Maybe you need to get product development teams on board first, so that management sees not only that reliability is important but also that teams are taking it seriously and are receptive to prioritizing reliability work. Or maybe you need to address it the other way around, getting buy-in from executives so that development teams understand the need for reliability and feel safe spending time working on it.

    Whichever route makes more sense in your context, reliability awareness will improve when key stakeholders are “on your side” and drive those discussions themselves, with peers as well as with management.

    Make your work visible

    SRE work is, effectively, work. There’s engineering work but there’s also a lot of work related to advocating, coaching, or consulting. And it should be clear to everyone what you’re working on, what your goals are, what you’re trying to achieve, what your roadmap is.

    Whatever tools and processes product development teams are using, you should be using similar ones. This will help standardize how work is tracked and prioritized. It will allow teams to easily understand what you’re focusing on, if they require something from you that you’re not prioritizing or if your shared goals might be compromised.

    Making sure your work is visible will assure teams you don’t have any hidden agenda and that you’re all working with the same goals in mind.

    Communicate extensively; always be in sales mode; build bridges

    It’s very important to build bridges. People are a lot more receptive to hearing your thoughts and ideas if they trust you. If you start, out of the gate, telling people what they need to fix and prioritize, you’ll be met with a lot of resistance.

    Start by sitting down with teams and understanding what they do, what their products are, what pain points they have, and what they would like to improve. Active listening can take you a long way. Ask questions, clarify issues, and understand what’s at stake. You’ll want teams to see you as a partner who wants to make their lives easier, not as an adversary.

    An “us vs them” mentality can take hold quickly if you start imposing reliability gates. Whether through lengthy manual processes or automated blocking checks, enforcement might come at the expense of the team’s goodwill. Instead, work with teams to make sure those concerns are valid, share engineering work that will address them, and communicate well in advance about whether and when any blocking gate will be put in place.

    Be self-service, not a bottleneck

    SRE work is never-ending. It’s a day-to-day practice that needs to scale with the organization. And one of the best ways to scale it is to be self-service.

    You should be building tools that make it easy for teams to incorporate reliability concerns into their work. For example, you could build and maintain libraries that export the necessary metrics to your monitoring system or that standardize logging. You could build automation that helps diagnose problems in your systems, or tools that automate manual tasks and address known issues. Most importantly, you don’t want to be a bottleneck. You want teams to be as independent as possible to do their work and deliver value to the business.
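
As one illustration of a self-service library that standardizes logging, here is a minimal Python sketch built only on the standard `logging` and `json` modules. The helper name `get_logger`, the `JsonFormatter` class, and the field names in the JSON payload are assumptions made for this example, not an established convention.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON object with fixed field names."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that emits the shared JSON format for the given service."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.propagate = False  # keep output out of the root logger's handlers
    return logger


if __name__ == "__main__":
    # Hypothetical service name; a team would call this once at startup.
    log = get_logger("checkout")
    log.info("order %s placed", 42)
```

With a helper like this, a product team gets consistent, machine-parseable logs with a single call and never needs to ask the SRE team how logs should be formatted.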

    Conclusion

    SRE involves a lot of engineering work and, at the same time, a lot of communication. Much of that communication is aimed at influencing teams to take reliability seriously and make it part of their work.

    Making sure that teams understand what you're working towards is critical to getting people on board. You should communicate extensively, create artifacts, give internal talks, partner with teams, and make your work visible. Make sure teams understand that reliability is measured by how happy users are with your services, that reliability work is focused on making sure they are satisfied, and that the business will thrive on that.

    Get people on board and get their buy-in. Your message and goals will be easier to spread if you have advocates on your side. Identify key stakeholders, work with them, understand their concerns, and build with them a common understanding of what reliability looks like, one they would happily share with others. Having a shared language or framework, like the reliability stack, will make it a lot easier to talk about, assess, and prioritize reliability work.

    At the end of the day, you want to deliver value and enable teams to focus on delivering value to the business. Make yourself self-service. Build tools that help improve reliability and are a no-brainer to use. Help teams solve pain points through automation. Build self-remediation tools to help address known issues in your system. And make sure you don’t become just another 'pain' or 'gate' that teams complain about and dread. You should be seen as a partner, as a team that works with the same goals in mind and that is there to help when necessary.


    Written By:
    Ricardo Castro