Better Incident Response: Incident Classification & Setting Severities with Tags


Prakya Vasudevan
February 20, 2020

Implementing an incident classification step in your incident management software and process can significantly reduce MTTR and the stress involved in the first few minutes of an incident.

How to implement Incident classification? 

Apart from setting up on-call schedules and adopting best practices on how to handle various kinds of incidents, incident management also has to do with constantly refining processes and benchmarks to ultimately achieve higher system reliability. One way of refining processes is to use incident classification, such as incident severities.

Every team has its own way of defining severities, and those definitions usually evolve once the team has a basic classification framework in place. The most common starting point is the SEV-1 to SEV-5 scale, outlined below:

  • SEV-1 incidents are critical and have a very large impact on the customer experience. These are typically major incidents that cause outages, hindering product or service usability for a large percentage of customers.
  • SEV-2 incidents are also critical but less severe than SEV-1 incidents. They impact a smaller percentage of customers but still impede product usage.
  • SEV-3 incidents may be minor but can have a significant impact if not addressed immediately. They may involve degradation of product stability without impacting product usage right away.
  • SEV-4 incidents are minor incidents indicating that the product is not performing to the required standard, without necessarily impacting product usability.
  • SEV-5 incidents are minor bugs that need to be fixed but don’t affect product usability.
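
One way to make the scale above usable in tooling is to model it as an ordered type, so that a triage queue can be sorted with the most severe incidents first. The following is a minimal Python sketch, not how any particular product stores severities; the `Severity` enum and the incident summaries are hypothetical and exist only for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """SEV-1 is the most severe level, SEV-5 the least (hypothetical model)."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


# Made-up incidents, listed in arrival order.
incidents = [
    ("Typo on the pricing page", Severity.SEV5),
    ("Checkout outage for most customers", Severity.SEV1),
    ("Elevated API latency", Severity.SEV3),
]

# Sorting by severity puts the most critical incident at the front of the queue.
triage_queue = sorted(incidents, key=lambda item: item[1])

for summary, severity in triage_queue:
    print(f"{severity.name}: {summary}")
```

Because `IntEnum` members compare as integers, any additional level (for example a SEV0, if a team uses one) would slot into the same ordering without changes to the sorting logic.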

However, severity alone may not capture other factors, such as how urgently the incident needs to be solved or how it can affect other parts of the system. Some incident management tools attempt to solve this by adding other forms of classification, like incident urgency and incident criticality. Many solutions allow incident severity as the only form of classification, and in some cases severity is assigned manually rather than automatically from the incoming alert context.

There’s a clear opportunity to improve incident response processes with better incident classification. If implemented the right way, this can bring down MTTR significantly, reduce the toil involved in routing manually, and add more context to an incident during the primary analysis.

At Squadcast, we chose to add more flexibility to this process by creating a custom rule-based auto-tagging system instead of having just a dropdown to manually select or assign tags. We define tags as key-value pairs. For example, the key could be severity and the possible values could be SEV0, SEV1, SEV2, etc., or the key could be Team and the possible values could be Backend, Frontend, Database, etc. With the Tagging and Routing features in Squadcast, you can set pretty much any kind of custom tag, which will be automatically assigned based on the rules you define on top of the attributes passed in the incident payload. You can then use these tags to set routing rules, ensuring that the right responder is notified at the right time to bring down the resolution time.

In Part 2 of the Kevin Series, we illustrate how to use tags to set severities in Squadcast. We have more use-case based articles lined up to show you other ways to implement incident classification using tags - stay tuned!

P.S. In case you were wondering, Kevin has previously also set up his own alert deduplication rules to reduce alert noise in Squadcast. 

Severities and Auto-Routing with Incident Tags

It's February 13th on a warm afternoon and Kevin is lazily dreaming about how his date is going to pan out the next day. His dream is suddenly disrupted by a torrent of database incidents that pour in. What's more annoying is that most of them are not particularly critical or even related to the class of issues he generally handles. 

Kevin’s got a new ringtone for incidents. Love Me Do, in keeping with the Valentine spirit.

Also, he works with Kai, who is expected to handle all the low-severity incidents and typically everything that comes in with regard to query optimization.

Kevin realised that he could be spending his time more effectively by:

  • Classifying his incidents by assigning the type or class of incidents that they fit into
  • Assigning severity to get to critical incidents faster
  • Automatically routing incidents based on tags to ensure that the right responder is alerted

This would allow more time for Kevin’s daydreaming!

Given that they work in a relatively small company where on-call rotations are rather erratic, or handled by both of them when fires happen, he decided to make this process a whole lot better by simply routing more efficiently.

Plus, anticipating the same barrage of incidents while he’s on his date tomorrow, he decides to take matters into his own hands. He sees that the database incident is a query optimisation incident, and not even a severe one at that, based on the visited_returned_ratio value in the payload.

	
    {
      "payload": {
        "id": 23,
        "issue": "SLOW_QUERY_PERF",
        "metric": {
          "visited_returned_ratio": 1300.2334,
          "time_interval": 10
        },
        "summary": "Slow query performance",
        "cluster_name": "cluster-prod-0-awsumdb",
        "cluster_id": 9,
        "hostnames": [
          "rpl0-awsumdb.cluster-prod-0-awsumdb.db.com",
          "rpl2-awsumdb.cluster-prod-0-awsumdb.db.com"
        ],
        "link": "",
        "created": "2020-02-13T13:00:00.116Z",
        "status": "open"
      }
    }
  

He then writes a rule that automatically adds tags to matching incidents, giving them more context and classifying them better:

Rule: re(payload.issue, "QUERY") && payload.metric.visited_returned_ratio < 5000


Tags assigned:

  • issueType : optimisation
  • severity : low
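
To see what the rule above does with this payload, here is a minimal Python sketch of the same check. It is not Squadcast's rule engine: it assumes that `re(field, pattern)` behaves like a regular-expression search, and the function names `kevin_rule` and `auto_tag` are hypothetical.

```python
import re

# Trimmed copy of the incident payload shown above.
incident = {
    "payload": {
        "id": 23,
        "issue": "SLOW_QUERY_PERF",
        "metric": {"visited_returned_ratio": 1300.2334, "time_interval": 10},
        "summary": "Slow query performance",
        "status": "open",
    }
}


def kevin_rule(incident):
    """Mirror of: re(payload.issue, "QUERY") && payload.metric.visited_returned_ratio < 5000

    Assumption: re(...) means "the pattern matches somewhere in the field".
    """
    payload = incident["payload"]
    matches_query = re.search("QUERY", payload["issue"]) is not None
    below_threshold = payload["metric"]["visited_returned_ratio"] < 5000
    return matches_query and below_threshold


def auto_tag(incident):
    """Return the tags to attach when the rule matches, otherwise no tags."""
    if kevin_rule(incident):
        return {"issueType": "optimisation", "severity": "low"}
    return {}


tags = auto_tag(incident)
print(tags)
```

With the payload above, the issue name contains "QUERY" and the ratio (1300.2334) is below 5000, so both tags are assigned. A payload with a ratio of 5000 or more would not match this rule and would be left untagged by it.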

Finally, he's done ensuring that at least the incidents are classified. With a smug, satisfied smile, he sits back and admires his work of art. A quick thought jumps through his head and he rubs his hands in devious mischief.

He now uses routing rules and the issueType tag to automatically route such incidents to the right person going forward (in this case, Kai), so that Kevin does not get disturbed for these kinds of issues anymore.
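
Tag-based routing of this kind can be sketched as a lookup from required tags to a responder. This is an illustrative Python sketch rather than Squadcast's routing configuration; the `ROUTING_RULES` structure, the `route` function, and the choice of Kevin as the fallback responder are all assumptions made for the example.

```python
# Each routing rule pairs the tags an incident must carry with a responder.
# Hypothetical structure: the first matching rule wins.
ROUTING_RULES = [
    ({"issueType": "optimisation"}, "Kai"),
]

# Assumption: anything that matches no rule still goes to Kevin.
DEFAULT_RESPONDER = "Kevin"


def route(tags):
    """Return the responder for an incident, based on its tags."""
    for required_tags, responder in ROUTING_RULES:
        if all(tags.get(key) == value for key, value in required_tags.items()):
            return responder
    return DEFAULT_RESPONDER


print(route({"issueType": "optimisation", "severity": "low"}))
print(route({"issueType": "outage", "severity": "high"}))
```

Keeping the rules as data rather than as branching code means a new tag-to-responder mapping is a one-line addition, which is the property that makes routing on tags attractive in the first place.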

Kevin thoughtfully arrives at the conclusion that this is quite possibly the best gift he could give to his single friend on Valentine's day.

In fact, he believes he has cracked the "gifting" secret code for any occasion, for his on-call team members (flaunts an evil grin).



Feb 20, 2020
Last Updated:
July 15, 2024

What you absolutely must know when responding to an incident is what kind of impact it has on customers and how negatively it can affect your team. This is typically addressed by following some kind of incident classification, usually “incident severity levels”, to indicate the importance of every incident - that is, to understand how seriously various stakeholders are affected and to route the incident differently if necessary. Note that incident classification is not used to determine the root cause or find the resolution.

