How to configure services in Squadcast: Best practices to reduce MTTR

Mar 31, 2021
Last Updated: Mar 31, 2021

With the rise of digital platforms, IT infrastructure has grown complex enough that many interdependent applications coexist across varied architectures and on-call team types. This blog looks at how you can model your infrastructure in Squadcast to reduce the time it takes to respond to and resolve incidents.

    As an SRE at an organization whose infrastructure is growing rapidly and has many interdependencies, you may have struggled to configure an incident management platform. If you have a smaller team with a monolithic architecture, it is relatively easy to connect the infrastructure to your incident management platform and create rules for escalations and alerting. But what happens if you have a large on-call team spread across time zones, looking after infrastructure with hundreds of microservices running concurrently? How do you configure it all in your incident management platform while keeping in mind the load your on-call team will be under?

    Since most platforms let you create services that accept alerts from monitoring tools, should you create 100 such services, one for every component of your infrastructure?

    We will be tackling questions like these in this blog. But before we dive deeper, here are a few things to be aware of.

    Q: What are the key aspects this article will address?

    A: In this blog, we look at ways your team can configure an incident management platform, in particular Squadcast, to ensure that you don't waste precious time responding to incidents.

    Q: What won't this article cover?

    A: Unfortunately, no single solution will work for every type of situation, so this post does not attempt to provide one. Instead, it seeks to bring some clarity to the problem: we have put together a set of best practices that should cover most production systems out there.

    Some of the concerns you may have while modelling your services are:

    • Will I be alerted on time?
    • How do I avoid irrelevant alerts?
    • Is the alert getting routed to the right person?
    • Am I getting alerts for the most critical pieces of my infrastructure?

    As a modern incident management platform, Squadcast aggregates and routes alerts from monitoring tools and provides a centralised dashboard for tracking and prioritising alerts, taking action and ultimately resolving the incident (the latter part will be covered in our blog titled “Intelligent Incident Response Plan”). Owing to its flexible configuration capabilities, there are many ways you can set up alerting for services within Squadcast.

    This blog takes into account the different kinds of infrastructure (monolithic/microservices or distributed) and types of on-call teams that are present.

    Before we get started with the best practices, here are some Squadcast-specific features that you need to know about when configuring the platform.

    • Squads: These are groups of on-call engineers and non-technical users that can be organized by business function or technology.
    • Services: Services are a logical group of alert sources that can be tagged, deduplicated or routed to the right person/team. They are most commonly used to represent individual parts of your infrastructure. Please note that services can receive alerts from more than one monitoring tool.
    • Tags: Tags in Squadcast can be auto-created to attach context-rich information to alerts. You can create your own rules for tagging alerts.
    • Routing: Routing in Squadcast is used when you want alerts to be sent to someone who is not the default recipient. This is helpful when a specific part of your infrastructure is facing issues that require more specialised knowledge.
    • Escalation Policies: These policies see to it that a critical alert is never missed. You can configure them to ensure that the right users and squads are alerted at the right time.
    • On-call Rotations: On-call schedules are used to determine who will be notified when an incident is triggered. This helps you build a balanced on-call culture and ensures that no critical alerts are missed.
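
    To make the relationship between these concepts concrete, here is a minimal sketch in Python. It is not the Squadcast API: the class names, field names, squad names and timings below are all hypothetical, chosen only to show how a service that groups alert sources might carry an escalation policy that decides who has been notified after a given number of minutes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EscalationStep:
    after_minutes: int   # minutes since the incident was triggered
    notify: list[str]    # users or squads notified at this step (hypothetical names)


@dataclass
class Service:
    name: str
    alert_sources: list[str]  # a service may receive alerts from several tools
    escalation: list[EscalationStep] = field(default_factory=list)


def who_to_notify(service: Service, minutes_unacknowledged: int) -> list[str]:
    """Return everyone who should have been notified so far, in escalation order."""
    notified: list[str] = []
    for step in sorted(service.escalation, key=lambda s: s.after_minutes):
        if step.after_minutes <= minutes_unacknowledged:
            notified.extend(step.notify)
    return notified


# Illustrative service with a three-step escalation policy.
payments = Service(
    name="payments-api",
    alert_sources=["prometheus", "sentry"],
    escalation=[
        EscalationStep(after_minutes=0, notify=["primary-on-call"]),
        EscalationStep(after_minutes=15, notify=["secondary-on-call"]),
        EscalationStep(after_minutes=30, notify=["payments-squad"]),
    ],
)

print(who_to_notify(payments, 20))
```

    In this sketch an incident that has gone unacknowledged for 20 minutes has reached the primary and secondary responders but not yet the whole squad.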

    These features provide the backbone for the best practices in alerting for your organisation. While the solutions described in this blog are generic, with a little tweaking, chances are they will work for you. We have tried to be as inclusive as possible while creating these best practices. Before we get started on modelling your system in Squadcast, here are the assumptions we are making about the alerting systems you have in place.

    Monitoring: We are assuming that you are already monitoring all the important aspects of your infrastructure. This includes alerting, metric collection, log aggregation and tracing/instrumentation practices. We are also assuming that you have a good mix of proactive, reactive and investigative alerts in place. Further, we assume you have categorised the alerts based on whether they relate to the infrastructure or to the application side (business dependent).

    Relevant Alerting: The alerts you have in place are linked to important parts of your infrastructure and are already optimised. This means the alerts are actionable and not oversensitive (set at the right threshold). It also means having the right deduplication rules in place to mitigate alert noise. We are also assuming that you can add identifying information to your alert payloads.
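
    As an illustration of why identifying information in the payload matters for deduplication, here is a small sketch. It does not use Squadcast's deduplication rules; the payload fields (`service`, `host`, `check`) and the five-minute window are assumptions made for the example. Alerts that share the same identifying fields within the window are treated as repeats of the same problem.

```python
from typing import Dict, Tuple

DEDUP_WINDOW_SECONDS = 300  # hypothetical window; tune to your own alert cadence


def dedup_key(alert: dict) -> Tuple[str, str, str]:
    """Build a key from the identifying fields assumed to be in the payload."""
    return (alert["service"], alert["host"], alert["check"])


class Deduplicator:
    def __init__(self, window_seconds: int = DEDUP_WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self._last_seen: Dict[Tuple[str, str, str], float] = {}

    def is_duplicate(self, alert: dict, now: float) -> bool:
        """True if an alert with the same key was seen within the window.

        Every alert refreshes the timestamp for its key, so a steady
        stream of repeats keeps being suppressed.
        """
        key = dedup_key(alert)
        last = self._last_seen.get(key)
        self._last_seen[key] = now
        return last is not None and now - last <= self.window_seconds


dedup = Deduplicator()
disk_db1 = {"service": "orders", "host": "db-1", "check": "disk_usage"}
disk_db2 = {"service": "orders", "host": "db-2", "check": "disk_usage"}

results = [
    dedup.is_duplicate(disk_db1, now=0),    # first occurrence
    dedup.is_duplicate(disk_db1, now=60),   # repeat within the window
    dedup.is_duplicate(disk_db2, now=90),   # different host, so a different key
    dedup.is_duplicate(disk_db1, now=500),  # window has expired since t=60
]
print(results)
```

    Without the `host` field, the alerts from the two database hosts would collapse into one incident, which is exactly the kind of lost context the assumption above is meant to avoid.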

    Our recommendations assume that the alerting system you have in place presently is well suited to the type of business and tech stack that you are using.

    The way you model your system will depend on several factors. First we will be looking at the kind of architecture you have in place.

    Architecture

    For the purpose of this blog post, we will consider the following types of architecture that you may be using:

    • Monolithic Architecture: All of your core functionality is concentrated in a single executable application with related infrastructure dependencies such as app servers, databases and load balancers. Your SRE team is responsible for maintaining this part of the infrastructure.
    • Distributed: A distributed architecture has multiple interdependent executable applications that intercommunicate with their related infrastructure dependencies. These may or may not be replicated. We will assume that the number of internal units is low enough that they can be committed to memory.
    • Microservices: A distributed architecture with a very large number of components. Due to the sheer number of these services, it is not feasible to create individual Squadcast services for each component.
    • Multiple Unrelated Applications: Though less commonly found, these can be treated as a special case of the types of architecture mentioned above. This scenario may come into being when you need an incident management system with a proprietary application framework that doesn't fit into any of the above. This kind of architecture may be seen in organisations that require compartmentalised applications for security or compliance reasons.
    • Kubernetes-based architecture: Some types of alerts from this kind of infrastructure are eliminated or automatically resolved by Kubernetes itself. Other than this, there is no significant difference from a common microservice architecture.

    Response Team Organisation

    • All-in-One Incident Response Team: In this type of setup, all responders are organised into one team. Because of this, you may need little or no alert routing in your incident management platform.
    • Service based: For larger organisations with more complex infrastructure, each application may have a dedicated team. Each team maintains their application and the infrastructure it depends on. Some examples are:
      • Public API Team
      • Inventory Service Team
    • Infrastructure Layer based: This type of team organisation can be found in larger companies. In addition to application teams, there are teams that specialise in managing certain kinds of technology. Examples include:
      • Inventory System Team
      • Database Team
      • Load balancer Team
      • Networking Team
    • L1/L2/L3 Teams: In this system, teams are organised into first responders and escalation teams. This type of team organisation can be considered a special case of the types mentioned above, and for the sake of simplicity we will not discuss it separately.

    Recommendations for Configuring Services

    The best way to configure Squadcast for your organization depends on two things: the kind of architecture your application has (monolithic, distributed or microservice) and the kind of on-call team you have (all-in-one incident response team, service based teams, business specific teams or infrastructure layer based teams).

    The recommended Squadcast configuration for each combination of architecture and on-call team type is described below.

    Monolithic Architecture with an All-in-One Incident Response Team

    Squads: One squad in Squadcast is sufficient for this kind of architecture. This squad will contain the members of the on-call team and, if required, any non-technical stakeholders.
    Services: Creation of a single service in Squadcast will suffice and all alerts from monitoring tools can be sent to this service.
    Tagging: Event tagging is optional in this scenario.
    Routing: Alert Routing is not strictly necessary, unless you have an on-call team with varying levels of expertise.

    Monolithic Architecture for Service Based Teams

    Squads: Each business specific team will need their own squad with relevant escalation policies. One additional cross-functional team may be required for handling infrastructure related issues.
    Services: Individual services have to be created in Squadcast for each function/team specific area and one for infrastructure related issues.
    Tagging: Event tagging is optional in this scenario.
    Routing: Alert Routing is optional in this scenario.

    Monolithic Architecture for Infrastructure Layer Based Teams

    Squads: One team will be required for each infrastructure layer being monitored in the backend. Alerts will be sent to the team responsible for handling the incident.
    Services: You will need to create separate services in Squadcast for each layer of infrastructure being monitored.
    Tagging: Event Tagging is optional in this scenario.
    Routing: Alert Routing is optional in this scenario.

    Distributed Architecture with an All-in-One Incident Response Team

    Squads: One squad has to be created in Squadcast. This squad will include all members of the on-call team or any other non-technical stakeholder.
    Services: One service needs to be created in Squadcast for each critical application service being monitored.
    Tagging: Event tagging is optional in this scenario.
    Routing: Alert routing is optional in this scenario.

    Distributed Architecture for Service Based Teams

    Squads: Multiple squads have to be created in Squadcast for respective business teams.
    Services: One service needs to be created in Squadcast for each application in the distributed system.
    Tagging: Event Tagging is optional in this scenario.
    Routing: Alert Routing is optional in this scenario.

    Distributed Architecture for Infrastructure Layer Based Teams

    Squads: One squad has to be created in Squadcast for each infrastructure team. This is required since each team needs separate routing and escalation policies.
    Services: You have two options to choose from while configuring services for this type:

    • One Squadcast service for each source. This will require information in the alert payload to distinguish whether the alert is application related or infra layer related.
    • Multiple services, one for each infrastructure layer and one for application related alerts.

    Tagging: Event Tagging is required if you are using one Squadcast service per infra layer.
    Routing: Alert Routing is required to send alerts to specific engineers in charge of respective infrastructure layers.
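
    The idea of using payload information to tell application alerts from infra layer alerts, and then routing on the result, can be sketched as follows. This is illustrative Python rather than Squadcast's tagging or routing rule syntax, and the `layer` payload field, the layer names and the squad names are all assumptions.

```python
from typing import Dict

# Hypothetical mapping from infrastructure layer to the squad that owns it.
LAYER_TO_SQUAD: Dict[str, str] = {
    "database": "database-squad",
    "load-balancer": "load-balancer-squad",
    "network": "networking-squad",
}
DEFAULT_SQUAD = "application-squad"  # receives anything not tied to a known layer


def tag_alert(payload: dict) -> Dict[str, str]:
    """Derive tags from the payload; `layer` is an assumed payload field."""
    layer = payload.get("layer")
    if layer in LAYER_TO_SQUAD:
        return {"kind": "infrastructure", "layer": layer}
    return {"kind": "application"}


def route_alert(tags: Dict[str, str]) -> str:
    """Pick the squad for an alert based on its tags."""
    return LAYER_TO_SQUAD.get(tags.get("layer", ""), DEFAULT_SQUAD)


db_alert = {"message": "replication lag high", "layer": "database"}
app_alert = {"message": "checkout returning HTTP 500"}

for alert in (db_alert, app_alert):
    tags = tag_alert(alert)
    print(alert["message"], "->", tags, "->", route_alert(tags))
```

    The database alert is tagged as infrastructure and goes to the database squad, while the alert with no `layer` field falls through to the application squad.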

    Microservice Architecture with an All-in-One Incident Response Team

    Squads: One squad has to be created in Squadcast for the on-call team. Escalation policies will need to be created depending on the nature of your application.
    Services: Alert payloads will need to be customised to have information regarding the affected service and instance/node. This information will be used to add visible contextual information to the incident tags.
    Tagging: Tags will need to include information about the affected service and the instance or node.
    Routing: Routing is optional in this scenario.

    Microservice Architecture for Business Specific Teams

    Squads: One squad to be created in Squadcast for each team.
    Services: One service for each team. Alert payloads will need to be customised to have information regarding the affected service and instance or node. This information will be used to add visible contextual information to the incident tags.
    Tagging: Tags will need to include information about the affected service and the instance or node.
    Routing: Alerts will need to be routed to each specific squad. For example, if it’s an error related to the payment gateway monitoring service it has to be routed to that specific business team.
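
    A sketch of this tagging and routing step is shown below. It assumes, purely for illustration, that the customised payload carries a `labels` object with `service` and `instance` keys; the rule structure and squad names are hypothetical and do not reflect Squadcast's own rule format. Rules are checked in order and the first match wins.

```python
from typing import Dict, List, NamedTuple


class RoutingRule(NamedTuple):
    tag: str    # tag name to inspect
    value: str  # value the tag must have for the rule to match
    squad: str  # squad that receives matching alerts


def extract_tags(payload: dict) -> Dict[str, str]:
    """Pull the affected service and instance/node out of the payload."""
    labels = payload.get("labels", {})
    return {
        "service": labels.get("service", "unknown"),
        "instance": labels.get("instance", "unknown"),
    }


def route(tags: Dict[str, str], rules: List[RoutingRule], default_squad: str) -> str:
    """Return the squad of the first matching rule, or the default squad."""
    for rule in rules:
        if tags.get(rule.tag) == rule.value:
            return rule.squad
    return default_squad


RULES = [
    RoutingRule(tag="service", value="payment-gateway", squad="payments-squad"),
    RoutingRule(tag="service", value="inventory", squad="inventory-squad"),
]

payload = {
    "message": "payment gateway error rate above threshold",
    "labels": {"service": "payment-gateway", "instance": "node-7"},
}

tags = extract_tags(payload)
print(tags, "->", route(tags, RULES, default_squad="platform-squad"))
```

    Because the instance is carried in the tags as well, the responder in the payments squad can see which node is affected without opening the raw payload.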

    Microservice Architecture for Infrastructure Layer Based Teams

    Squads: One squad to be created in Squadcast for each team looking after specific parts of the infrastructure.
    Services: One service connected to the infra layer for each team. Alert payloads will need to be customised to have information regarding the affected infra layer, and instance or node. Without this identifying information it will be much harder to fix issues.
    Tagging: Tags will need to include information about the affected service and the instance or node.
    Routing: Alerts will need to be routed to each specific squad.
    Note: Other scenarios involving this kind of setup can also be modelled this way, but this is not recommended, as you may lose some analytics and access control capabilities.
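
    The recommendations in this section can be condensed into a lookup table. The sketch below is one possible summary in Python; the key spellings and the "optional" / "conditional" / "required" labels are our own shorthand for the wording used above, not Squadcast settings. "Conditional" marks the distributed, infrastructure layer based case, where tagging is only required for one of the two service layouts.

```python
from typing import Dict, NamedTuple, Tuple


class Recommendation(NamedTuple):
    squads: str
    tagging: str  # "optional", "conditional" or "required"
    routing: str  # "optional" or "required"


# Keys are (architecture, team type); values summarise the sections above.
RECOMMENDATIONS: Dict[Tuple[str, str], Recommendation] = {
    ("monolithic", "all-in-one"): Recommendation("one squad", "optional", "optional"),
    ("monolithic", "service-based"): Recommendation(
        "one squad per team, plus one for infrastructure if needed", "optional", "optional"
    ),
    ("monolithic", "infrastructure-layer"): Recommendation(
        "one squad per infrastructure layer", "optional", "optional"
    ),
    ("distributed", "all-in-one"): Recommendation("one squad", "optional", "optional"),
    ("distributed", "service-based"): Recommendation(
        "one squad per business team", "optional", "optional"
    ),
    ("distributed", "infrastructure-layer"): Recommendation(
        "one squad per infrastructure team", "conditional", "required"
    ),
    ("microservice", "all-in-one"): Recommendation("one squad", "required", "optional"),
    ("microservice", "business-specific"): Recommendation(
        "one squad per team", "required", "required"
    ),
    ("microservice", "infrastructure-layer"): Recommendation(
        "one squad per team", "required", "required"
    ),
}


def recommend(architecture: str, team_type: str) -> Recommendation:
    """Look up the summary for an architecture and team type combination."""
    key = (architecture.strip().lower(), team_type.strip().lower())
    if key not in RECOMMENDATIONS:
        raise KeyError(f"No recommendation for {key!r}")
    return RECOMMENDATIONS[key]


print(recommend("Microservice", "business-specific"))
```

    A table like this is only a starting point: the per-scenario notes on services and payload customisation above still apply.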

    Conclusion: Depending on the nature of your infrastructure as well as the size and composition of your on-call staff, combinations of the above guidelines would be ideal for your organization. Initially, you may need to do several tests to determine the best way to model services in Squadcast depending on your specific needs. If you are a large organization with multiple interconnected services, our recommendations will assist you in implementing a framework that will optimize your alerting processes and help reduce your MTTR (Mean Time To Resolve).

    Our next blog in this series, titled “Intelligent Incident Response”, will help you understand what needs to be done to mitigate impact or fix the issue with the help of Squadcast, while ensuring that you learn from every incident, which should be the biggest takeaway from your incident response process.

    Squadcast is an incident management tool that’s purpose-built for SRE. Your team can get rid of unwanted alerts, receive relevant notifications, work in collaboration using the virtual incident war rooms, and use automated tools like runbooks to eliminate toil.

    Written By: Biju Chacko

    How to configure services in Squadcast: Best practices to reduce MTTR

    How to configure services in Squadcast: Best practices to reduce MTTR
    Mar 31, 2021
    Last Updated:
    Mar 31, 2021

    With a rise in digital platforms, IT infrastructure has grown exponentially complex to a level where multiple application interdependencies coexist with varied architecture & oncall team types. This blog looks at how you can model your infrastructure in Squadcast to reduce your time to respond & resolve incidents.

    As an SRE of an organization with a rapidly growing infrastructure with several interdependencies, you may have struggled with configuring things on an incident management platform. If you have a smaller team with a monolithic architecture in place it is still relatively easier to connect the infrastructure to your incident management platform and create rules for escalations and alerting. But what happens if you have a large on-call team spread across time zones looking after the infrastructure that has hundreds of microservices running concurrently? How do you configure it all in your incident management platform while keeping in mind the load your on-call team will be under?

    Since most platforms let you create services that accept alerts from monitoring tools, should you create 100 such services for every component of your infrastructure?

    We will be tackling similar questions in this blog. But before we dive deeper, here are few things to be aware of.

    Q: What are the key aspects this article would be addressing?

    A: In this blog, we look at ways your team can configure incident management platform, in particular Squadcast, to ensure that you don't waste precious time responding to incidents.

    Q: What this article won't cover?

    A: Unfortunately, we cannot have a single solution that will work for every type of situation. This post seeks to provide some clarity to this problem. We have put together a set of best practices that should cover most production systems out there.

    Some of the concerns you may have while modelling your services are

    • Will I be alerted on time?
    • How to avoid irrelevant alerts?
    • Is the alert getting routed to the right person?
    • Am I getting alerts for the most critical pieces of my infrastructure?

    As a modern incident management platform, Squadcast aggregates and routes alerts from monitoring tools and provides a centralised dashboard for tracking and prioritising alerts along with taking action and ultimately resolving the incident (the latter part will be covered in our blog titled “Intelligent Incident Response Plan”). Owing to its flexible configuration capabilities, there are many ways you can set-up alerting for services within Squadcast.

    This blog takes into account the different kinds of infrastructure (monolithic/microservices or distributed) and types of on-call teams that are present.

    Unified Incident Response Platform
    Try for free
    Seamlessly integrate On-Call Management, Incident Response and SRE Workflows for efficient operations.
    Automate Incident Response, minimize downtime and enhance your tech teams' productivity with our Unified Platform.
    Manage incidents anytime, anywhere with our native iOS and Android mobile apps.
    Try for free

    Before we get started with the best practices, here are some Squadcast specific features that you need to know while configuring the platform.

    • Squads: These are groups of on-call engineers and non-technical users that can be organized by business function or technology.
    • Services: Services are a logical group of alert sources that can be tagged, deduplicated or routed to the right person/team. They are most commonly used to represent individual parts of your infrastructure. Please note that services can receive alerts from more than one monitoring tool.
    • Tags: Tags in Squadcast can be auto-created to include context rich information with alerts. You can create your own rules for tagging alerts.
    • Routing: Routing in Squadcast is used when you want alerts to be sent to someone who is not the default recipient. This is helpful when a specific part of your infrastructure is facing issues that require more specialised knowledge.
    • Escalation Policies: These policies see to it that a critical alert is never missed. You can configure them to ensure that the right users and squads are alerted at the right time.
    • On-call Rotations: On-call schedules are used to determine who will be notified when an incident is triggered. This helps you build a balanced on-call culture and ensures that no critical alerts are missed.

    These features provide the backbone for the best practices in alerting for your organisation. While the solutions described in this blog are generic, with a little tweaking, chances are they will work for you. We have tried to be as inclusive as possible while creating these best practices. Before we get started on modelling your system in Squadcast, here are the assumptions we are making about the alerting systems you have in place.

    Monitoring: We are assuming that you are already monitoring all the important aspects of your infrastructure. This includes alerting, metric collection, log aggregation and tracing/instrumentation practices. We are also assuming that you have a good mix of proactive, reactive and investigative alerts in place. Further, you have also categorised the alerts based on whether they are related to the infrastructure or to the application side(business dependent).

    Relevant Alerting: The alerts you have in place are linked to important parts of your infrastructure and are already optimised. This includes alerts that are actionable and not over sensitive (the right threshold). This also includes having the right deduplication rules in place to mitigate alert noise. We are also assuming that you can add identifying information to your alert payloads.

    Our recommendations assume that the alerting system you have in place presently is well suited to the type of business and tech stack that you are using.

    The way you model your system will depend on several factors. First we will be looking at the kind of architecture you have in place.

    Architecture

    For the purpose of this blog post, we will consider the following as different types of architecture that you may be using :

    • Monolithic Architecture: All of your core functionality is concentrated to a single executable application with related infrastructure dependencies like app server, databases, load balancers etc. Your SRE team is responsible for maintenance of this part of the infrastructure.
    • Distributed: A distributed architecture has multiple interdependent executable applications that intercommunicate with their related infrastructure dependencies. These may or may not be replicated. We will assume that the number of internal units is low enough, that they can be committed to memory
    • Microservices: A distributed architecture with a very large number of components. Due to the sheer number of these services, it is not feasible to create individual Squadcast services for each component.
    • Multiple Unrelated Applications: Though less commonly found, these can be treated as a special case of the types of architecture mentioned above. This scenario may come into being when you need an incident management system with a proprietary application framework that doesn't fit into any of the above. This kind of architecture may be seen in organisations that require compartmentalised applications for security or compliance reasons.
    • Kubernetes based architecture: Some types of alerts from this kind of infrastructure are eliminated or automatically resolved by Kubernetes itself. Other than this, there is no significant difference from a common microservice architecture.

    Response Team Organisation

    • All-in-One Incident Response Team: In this type of setup, all responders are organised into one team. Due to the nature of this setup it is possible to have lesser or negligible routing for alerts in your incident management platform.
    • Service based: For larger organisations with more complex infrastructure, each application may have a dedicated team. Each team maintains their application and the infrastructure it depends on. Some examples are:
      • Public API Team
      • Inventory Service Team
    • Infrastructure Layer based: This type of team organisation can be found in larger companies. In addition to application teams, there are teams that specialise in managing certain kinds of technology. Examples include
      • Inventory System Team
      • Database Team
      • Load balancer Team
      • Networking Team
    • L1/L2/L3 Teams:  In this system, teams are organised into first responders and escalation teams. This type or team organisation can be considered a special case of the types mentioned above and for the sake of simplicity, we will not be discussing these separately.
    Integrated Reliability Automation Platform
    Platform
    PagerDuty
    FireHydrant
    Squadcast
    Incident Retrospectives
    APM, Monitoring, ITSM,Ticketing Integrations
    Incident
    Notes
    On Call Rotations
    Built-In Public and Private Status Page
    Advanced Error Budget Tracking
    Try For free
    Platform
    Incident Retrospectives
    APM, Monitoring, ITSM,Ticketing Integrations
    Incident
    Notes
    On Call Rotations
    Built-In Public and Private Status Page
    Advanced Error Budget Tracking
    PagerDuty
    FireHydrant
    Squadcast
    Try For free

    Recommendations for Configuring Services

    Before we recommend the best way to configure Squadcast for your organization, please select the type of architecture and on-call team you have.

    1: What kind of architecture does your application have?
    Monolithic
    Distributed
    Microservice
    2: What kind of on-call team do you have in your organization?
    All-in-One Incident Response Teams
    Service Based Teams
    Business Specific Teams
    Infrastructure Layer Based Teams

    For the above choices, this is the ideal Squadcast configuration for your architecture and on-call team type.

    Monolithic Architecture with an All-in-One Incident Response Team

    Squads: One squad in Squadcast is sufficient for this kind of architecture. This squad includes the members of the on-call team and, if required, any non-technical stakeholders.
    Services: A single service in Squadcast will suffice, and all alerts from monitoring tools can be sent to it.
    Tagging: Event tagging is optional in this scenario.
    Routing: Alert Routing is not strictly necessary, unless you have an on-call team with varying levels of expertise.

    Monolithic Architecture for Service Based Teams

    Squads: Each business specific team will need their own squad with relevant escalation policies. One additional cross-functional team may be required for handling infrastructure related issues.
    Services: Individual services have to be created in Squadcast for each function/team specific area and one for infrastructure related issues.
    Tagging: Event tagging is optional in this scenario.
    Routing: Alert Routing is optional in this scenario.

    Monolithic Architecture for Infrastructure Layer Based Teams

    Squads: One team will be required for each infrastructure layer being monitored in the backend. Alerts will be sent to the team responsible for handling the incident.
    Services: You will need to create separate services in Squadcast for each layer of infrastructure being monitored.
    Tagging: Event Tagging is optional in this scenario.
    Routing: Alert Routing is optional in this scenario.

    Distributed Architecture for All-in-One Incident Response Teams

    Squads: One squad has to be created in Squadcast. This squad will include all members of the on-call team or any other non-technical stakeholder.
    Services: One service needs to be created in Squadcast for each critical application service being monitored.
    Tagging: Event tagging is optional in this scenario.
    Routing: Alert routing is optional in this scenario.

    Distributed Architecture for Service Based Teams

    Squads: Multiple squads have to be created in Squadcast for respective business teams.
    Services: One service needs to be created in Squadcast for each application in the distributed system.
    Tagging: Event Tagging is optional in this scenario.
    Routing: Alert Routing is optional in this scenario.

    Distributed Architecture for Infrastructure Layer Based Teams

    Squads: One squad has to be created in Squadcast for each infrastructure team. This is required since each team needs separate routing and escalation policies.
    Services: You have two options to choose from while configuring services for this type:

    • One Squadcast service for each source. This will require information in the alert payload to distinguish whether an alert is application related or infra layer related.
    • Multiple services, one for each infrastructure layer and one for application related alerts.

    Tagging: Event Tagging is required if you are using one Squadcast service per infra layer.
    Routing: Alert Routing is required to send alerts to specific engineers in charge of respective infrastructure layers.
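The tagging and routing logic described for this scenario can be sketched in a few lines of Python. This is only an illustration of the idea, not Squadcast's rule syntax: the `layer` payload field, the squad names, and the `tag_and_route` helper are all assumptions made for the example.

```python
# Hypothetical mapping from infrastructure layer to the squad that owns it.
LAYER_TO_SQUAD = {
    "database": "Database Squad",
    "load_balancer": "Load Balancer Squad",
    "network": "Networking Squad",
}
APPLICATION_SQUAD = "Application Squad"  # hypothetical owner of application alerts


def tag_and_route(payload: dict) -> dict:
    """Derive tags from the alert payload, then pick the owning squad."""
    # "layer" is an assumed payload field; application alerts do not carry it.
    layer = payload.get("layer")
    if layer in LAYER_TO_SQUAD:
        tags = {"category": "infrastructure", "layer": layer}
        squad = LAYER_TO_SQUAD[layer]
    else:
        tags = {"category": "application"}
        squad = APPLICATION_SQUAD
    return {"tags": tags, "squad": squad}


alert = {"message": "Replication lag above threshold", "layer": "database"}
print(tag_and_route(alert)["squad"])  # Database Squad
```

The point of the sketch is that routing can only be as precise as the payload: if the monitoring tool does not send a field that identifies the layer, every alert falls through to the application squad.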

    Microservice Architecture for All-in-One Incident Response Teams

    Squads: One squad has to be created in Squadcast for the on-call team. Escalation policies will need to be created depending on the nature of your application.
    Services: Alert payloads will need to be customised to have information regarding the affected service and instance/node. This information will be used to add visible contextual information to the incident tags.
    Tagging: Tags will need to include information about the affected service and the instance or node.
    Routing: Routing is optional in this scenario.
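As a rough illustration of the tagging requirement above, the following Python sketch pulls the affected service and instance out of a customised alert payload. The payload shape (a `labels` object with `service` and `instance` keys) and the `build_tags` helper are assumptions for the example, not a Squadcast or monitoring-tool format.

```python
def build_tags(payload: dict) -> dict:
    """Extract the affected service and instance/node as incident tags."""
    # "labels", "service" and "instance" are assumed field names.
    labels = payload.get("labels", {})
    return {
        "service": labels.get("service", "unknown"),
        "instance": labels.get("instance", "unknown"),
    }


alert = {
    "message": "High memory usage",
    "labels": {"service": "cart", "instance": "node-7"},
}
print(build_tags(alert))  # {'service': 'cart', 'instance': 'node-7'}
```

Falling back to "unknown" instead of failing keeps the incident visible even when a monitoring tool sends an incomplete payload, and makes the gaps easy to spot.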

    Microservice Architecture for Business Specific Teams

    Squads: One squad has to be created in Squadcast for each team.
    Services: One service for each team. Alert payloads will need to be customised to have information regarding the affected service and instance or node. This information will be used to add visible contextual information to the incident tags.
    Tagging: Tags will need to include information about the affected service and the instance or node.
    Routing: Alerts will need to be routed to each specific squad. For example, an error related to the payment gateway monitoring service has to be routed to the business team that owns it.
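The payment gateway example can be pictured as a small rule table that maps a service tag to a squad. The sketch below assumes a `service` tag has already been attached to the incident; the rule list, squad names, and `route` function are hypothetical and do not represent Squadcast's routing rule format.

```python
# Ordered routing rules: (service name prefix, squad). All names are hypothetical.
ROUTING_RULES = [
    ("payment-gateway", "Payments Squad"),
    ("inventory", "Inventory Squad"),
    ("public-api", "Public API Squad"),
]
DEFAULT_SQUAD = "Platform Squad"  # hypothetical catch-all squad


def route(tags: dict) -> str:
    """Return the squad for the first rule whose prefix matches the service tag."""
    service = tags.get("service", "")
    for prefix, squad in ROUTING_RULES:
        if service.startswith(prefix):
            return squad
    return DEFAULT_SQUAD


print(route({"service": "payment-gateway-monitor", "instance": "node-3"}))  # Payments Squad
```

A catch-all squad is worth having in any such scheme, so that an alert from a new or mislabelled service still reaches someone.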

    Microservice Architecture for Infrastructure Layer Based Teams

    Squads: One squad has to be created in Squadcast for each team looking after specific parts of the infrastructure.
    Services: One service connected to the infra layer for each team. Alert payloads will need to be customised to have information regarding the affected infra layer and the instance or node. Without this identifying information, it will be much harder to fix issues.
    Tagging: Tags will need to include information about the affected service and the instance or node.
    Routing: Alerts will need to be routed to each specific squad.
    Note: All other scenarios involving this kind of setup can be modelled like this, but doing so is not recommended, as you may lose some analytics and access control capabilities.
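For quick reference, the tagging and routing guidance from the nine scenarios above can be collected into a lookup table. The Python sketch below is only a summary aid: the labels "optional", "required", and "conditional" are simplifications of the wording used in each scenario, and the `recommend` helper is hypothetical.

```python
# (architecture, team type) -> (tagging, routing), summarising the scenarios above.
RECOMMENDATIONS = {
    ("monolithic", "all-in-one"): ("optional", "optional"),
    ("monolithic", "service based"): ("optional", "optional"),
    ("monolithic", "infrastructure layer based"): ("optional", "optional"),
    ("distributed", "all-in-one"): ("optional", "optional"),
    ("distributed", "service based"): ("optional", "optional"),
    # Tagging here depends on which of the two service layouts is chosen.
    ("distributed", "infrastructure layer based"): ("conditional", "required"),
    ("microservice", "all-in-one"): ("required", "optional"),
    ("microservice", "business specific"): ("required", "required"),
    ("microservice", "infrastructure layer based"): ("required", "required"),
}


def recommend(architecture: str, team_type: str) -> dict:
    """Look up the tagging and routing guidance for one combination."""
    key = (architecture.strip().lower(), team_type.strip().lower())
    if key not in RECOMMENDATIONS:
        raise KeyError(f"No recommendation recorded for {key}")
    tagging, routing = RECOMMENDATIONS[key]
    return {"tagging": tagging, "routing": routing}


print(recommend("Microservice", "All-in-One"))
# {'tagging': 'required', 'routing': 'optional'}
```

The table makes the overall pattern visible: tagging and routing move from optional to required as the architecture becomes more granular and as more teams share responsibility for it.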

    Conclusion: Depending on the nature of your infrastructure as well as the size and composition of your on-call staff, combinations of the above guidelines would be ideal for your organization. Initially, you may need to do several tests to determine the best way to model services in Squadcast depending on your specific needs. If you are a large organization with multiple interconnected services, our recommendations will assist you in implementing a framework that will optimize your alerting processes and help reduce your MTTR (Mean Time To Resolve).

    Our next blog in this series, titled “Intelligent Incident Response”, will help you understand how to mitigate impact or fix an issue with the help of Squadcast, while ensuring that you learn from every incident, which should be the biggest takeaway from your incident response process.

    Squadcast is an incident management tool that’s purpose-built for SRE. Your team can get rid of unwanted alerts, receive relevant notifications, work in collaboration using the virtual incident war rooms, and use automated tools like runbooks to eliminate toil.

      Copyright © Squadcast Inc. 2017-2024