For organizations running numerous microservices and projects, managing the alerts they generate can be a daunting task. Many of our customers come to us with infrastructures that include a large number of microservices, so we set out to make alert source management easier for them.
Enter Global Event Rulesets (GER). This feature is designed to redefine the way you manage alerts. With GER, we're on a mission to simplify and streamline the setup process, allowing you to create rules that effortlessly direct alerts to the right service without the hassle of maintaining separate webhooks for each one.
Say goodbye to manual configurations and the complexities associated with alert management; GER's centralized approach is set to make your life easier and your alerts more efficient.
This feature is available to everyone who takes Squadcast for a spin via our 14-day free trial!
At present, alert sources in Squadcast are linked to services through service webhooks. As a result, all events from the different alert sources are aggregated under a single service. Once an alert source is integrated into a service, the automation pipeline kicks in and sets a series of actions in motion, including tagging, routing, deduplication, and suppression, all of which are executed at the service level.
The Global Event Rulesets (GER) feature redefines alert management by introducing a unified global ruleset. GER employs a unique endpoint featuring a shared routing key for each alert source. This innovation empowers users to establish a rule encompassing multiple alert sources, along with their corresponding configurations, ensuring that alerts are seamlessly routed to the designated service when specific criteria are fulfilled.
Here’s an example of a sample endpoint:
Example for Datadog:
After creating a ruleset, users gain the capability to incorporate alert sources and link individual rules to each specific alert source. When an event is received, the routing pipeline determines the most suitable service for the alert. Subsequently, automation rules are triggered upon the alert's arrival at the designated service, provided they have been configured.
Here's an illustrative example of how to establish a ruleset featuring multiple alert sources and their respective rules. Once the ruleset is associated with an alert source, users enjoy the flexibility of adding multiple rules for that source, each tied to a corresponding service for routing. Whenever a rule's expression evaluates as true, alerts are routed to the designated service.
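The routing behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Squadcast's implementation: the `Rule` class, the `route` function, the payload fields, and the service names are all hypothetical, rule expressions are stood in for by plain Python callables, and the sketch assumes that the first matching rule wins.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Payload = dict[str, Any]


@dataclass
class Rule:
    description: str
    # Stand-in for a rule expression: a callable that inspects the payload.
    expression: Callable[[Payload], bool]
    # Service that receives the alert when the expression is true.
    service: str


def route(rules: list[Rule], payload: Payload) -> Optional[str]:
    """Return the service of the first rule whose expression is true (assumed first-match)."""
    for rule in rules:
        if rule.expression(payload):
            return rule.service
    return None  # no rule matched


# Hypothetical rules for a single alert source.
datadog_rules = [
    Rule("Payments alerts", lambda p: p.get("service") == "payments", "Payments Service"),
    Rule("Critical alerts", lambda p: p.get("priority") == "P1", "Core On-Call"),
]

routed_to = route(datadog_rules, {"service": "payments", "priority": "P1"})
print(routed_to)
```

Here the payload satisfies both rules, so under the first-match assumption the alert goes to the service of the rule listed first.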
To add new rulesets,
Step1: Navigate to Global Event Rulesets and Add New Ruleset.
Step2: Next, add the Ruleset Name, optional Description, and select the Ruleset Owner.
Step3: Click Save, and you're done.
This creates a new ruleset, and the next step is to add alert sources and start creating rules for your ruleset. If you would like to create multiple such rulesets, each with individual endpoints, repeat the above steps as needed.
To add alert sources to a ruleset,
Step1: Navigate to Global Event Rulesets and select the relevant ruleset from the list.
Step2: Click Add Alert Source. In the side panel, search for and select the alert source you wish to create a rule for, then click Add.
1. You can only add one alert source at a time.
2. Deleting an added alert source from the ruleset will result in all its rules getting deleted.
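As a rough mental model of the two notes above, a ruleset can be pictured as a mapping from each alert source to its list of rules. The sketch below is hypothetical (the `Ruleset` class and its methods are not part of any Squadcast API); it only shows that sources are added one at a time and that removing a source removes its rules with it.

```python
from dataclasses import dataclass, field


@dataclass
class Ruleset:
    name: str
    owner: str
    description: str = ""
    # Alert source name -> ordered list of rule descriptions (hypothetical model).
    sources: dict[str, list[str]] = field(default_factory=dict)

    def add_alert_source(self, source: str) -> None:
        """Add a single alert source with an empty rule list."""
        if source in self.sources:
            raise ValueError(f"{source} is already part of this ruleset")
        self.sources[source] = []

    def add_rule(self, source: str, rule: str) -> None:
        """Append a rule to a source that was added earlier."""
        self.sources[source].append(rule)

    def delete_alert_source(self, source: str) -> list[str]:
        """Remove the source; its rules are removed with it and returned."""
        return self.sources.pop(source)


ruleset = Ruleset(name="Production alerts", owner="platform-team")
ruleset.add_alert_source("datadog")
ruleset.add_rule("datadog", "payments -> Payments Service")
ruleset.add_rule("datadog", "P1 -> Core On-Call")

removed = ruleset.delete_alert_source("datadog")
print(removed, ruleset.sources)
```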
Event rules allow you to set actions that should be taken on events that meet your designated rule criteria. In the current version, the only action that the system takes is routing of incoming alerts.
To add rules for an alert source,
Step1: Navigate to Global Event Rulesets. Select the relevant ruleset from the list.
Step2: For your added alert source, click Add Rule.
Step3: In the side panel, provide a Rule Description and create the Rule Expression, referring to the payload data available on the right.
Step4: Lastly, designate the Service for routing when the rule expression is met. Click Save.
Please Note: You can create and manage up to 1000 rules for each alert source.
To manage the order of rule execution, simply use the arrows to rearrange the priority of these rules.
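To make the ordering and the per-source limit concrete, here is a small sketch, again with hypothetical names. It treats the rule list as a priority-ordered Python list, where moving a rule up or down is a swap with its neighbor, and it enforces the 1000-rule limit mentioned above.

```python
MAX_RULES_PER_SOURCE = 1000  # limit stated in the note above


def add_rule(rules: list[str], rule: str) -> None:
    """Append a rule unless the per-source limit has been reached."""
    if len(rules) >= MAX_RULES_PER_SOURCE:
        raise ValueError("rule limit reached for this alert source")
    rules.append(rule)


def move_up(rules: list[str], index: int) -> None:
    """Raise a rule's priority by one position; no-op if it is already first."""
    if 0 < index < len(rules):
        rules[index - 1], rules[index] = rules[index], rules[index - 1]


def move_down(rules: list[str], index: int) -> None:
    """Lower a rule's priority by one position; no-op if it is already last."""
    if 0 <= index < len(rules) - 1:
        rules[index], rules[index + 1] = rules[index + 1], rules[index]


rules = ["route payments", "route checkout", "route everything else"]
move_up(rules, 1)  # "route checkout" now has the highest priority
print(rules)
```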
Important: If you intend to delete a Service in Squadcast that is associated with a Global Event Ruleset, please ensure that you delete the rule first. Otherwise, you will receive a warning message similar to the one described below.
Example Rule Expression:
Alerts that are sent through event rules but do not match any of them are routed to the Service configured in the Catch All Rule. If the Catch All Rule is empty, the unmatched alert is simply dropped from the system. Configuring a Catch All Rule ensures that no alerts are missed, that is, every incoming alert ends up reaching a Service.
Please Note: This is not mandatory, but we highly recommend having this configured.
Step1: Navigate to Global Event Rulesets and select the relevant ruleset from the list.
Step2: For your added alert source, click Add Catch All Rule and select a Service.
Step3: Click Save.
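The effect of the Catch All Rule can be pictured with the following sketch. It is an assumption-laden illustration rather than Squadcast's code: the function name, payload fields, and service names are made up, and rule expressions are modelled as Python callables. An unmatched alert goes to the catch-all Service if one is set and is dropped (represented by `None`) otherwise.

```python
from typing import Any, Callable, Optional

# A rule is modelled as (expression, service); both are hypothetical stand-ins.
Rule = tuple[Callable[[dict[str, Any]], bool], str]


def route_with_catch_all(
    rules: list[Rule],
    payload: dict[str, Any],
    catch_all: Optional[str] = None,
) -> Optional[str]:
    """Return the matching service, else the catch-all service, else None (dropped)."""
    for expression, service in rules:
        if expression(payload):
            return service
    return catch_all


rules: list[Rule] = [(lambda p: p.get("env") == "prod", "Prod On-Call")]

matched = route_with_catch_all(rules, {"env": "prod"}, catch_all="Triage")
fallback = route_with_catch_all(rules, {"env": "staging"}, catch_all="Triage")
dropped = route_with_catch_all(rules, {"env": "staging"})  # no catch-all configured
print(matched, fallback, dropped)
```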
In conclusion, Global Event Rulesets play a vital role in enhancing alert management. They offer a more efficient, time-saving, and user-friendly way to handle alerts, catering to the needs of organizations dealing with a complex web of microservices. Embracing Global Event Rulesets means embracing simplicity and efficiency in alert routing.
We trust that our Global Event Rulesets feature will bring us one step closer to our objective of offering you a user-friendly, seamless experience. We encourage you to take Squadcast for a spin via our 14-day free trial and give this new feature a try! Do share your thoughts or feedback in the comments or with our support team. Cheers!
Squadcast is a Reliability Workflow platform that integrates On-Call alerting and Incident Management along with SRE workflows in one offering. Designed for a zero-friction setup, ease of use and clean UI, it helps developers, SREs and On-Call teams proactively respond to outages and create a culture of learning and continuous improvement.