Storage systems are an integral part of IT infrastructure. Because modern markets are highly competitive and demanding, businesses strive for 24/7 availability, which in turn raises the expectation that storage systems stay operational at all times. But like any other IT component, storage systems are prone to incidents. Hence, it is important to have an efficient communication process for managing alerts during system failures and disasters.
In this blog post, we will discuss how OSNexus QuantaStor and Squadcast integrate to help route critical alerts to the right users for timely action and ensure minimum downtime and high availability.
Traditionally, businesses have relied heavily on email as the medium for alerts during a crisis, since email is central to numerous business operations. Today, however, email-based alerting is used primarily for non-urgent communication, and it is not a reliable notification channel for critical infrastructure issues.
Today’s IT environments are complex and therefore more prone to system failures and downtime, which may have a serious impact on business operations. This is where email as an alerting medium may struggle. Here are some common challenges that are associated with email-based alerting:
Hence, businesses are investing heavily in better alerting and monitoring mechanisms. This is where OSNexus QuantaStor can help. The storage management platform uses an internal component, the Alert Manager, to handle all alert communication, and integrating it with Squadcast gives users the added benefit of numerous incident response features.
Before we jump into the integration benefits, here’s a quick overview of the capabilities of each platform.
Squadcast is an end-to-end incident response tool, built with an SRE mindset. It streamlines all the incident response activities and aligns teams towards a common organizational goal of better reliability.
With its configurable features, the platform helps on-call teams streamline high-priority alerts and stay productive. Here are some of the features:
OSNexus QuantaStor is a storage management platform that enables organizations to replace traditional SAN/NAS systems with standard servers to deliver robust, reliable, and highly scalable object, file, and block storage solutions that are easy to manage.
The QuantaStor storage platform monitors the entire stack, from the software to the hardware, so that IT departments are quickly notified when maintenance is required, whether for a failed hot-swap power supply, bad media, or thermal issues. QuantaStor notifies IT departments through an internal component called the Alert Manager, which is in charge of communicating alerts to IT staff via ITSM (IT Service Management) modules and/or email.
With Squadcast, you can create multiple on-call schedules in minutes, add users to them, and set up rotation policies that align with your business needs. Squadcast also allows you to take actions directly from within the platform, like acknowledging or resolving an incident.
All this with just a tap, thus making it easy to do tasks that are otherwise manual and repetitive.
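To make the idea of a rotation policy concrete, here is a minimal sketch of a round-robin on-call rotation. It does not use the Squadcast API; the function name `on_call_for`, the user names, and the seven-day shift length are all assumptions chosen for illustration.

```python
from datetime import date

def on_call_for(day, users, start, shift_days=7):
    """Return the user on call on `day` in a simple round-robin rotation.

    `users`, `start`, and `shift_days` are hypothetical inputs, not
    Squadcast settings.
    """
    if day < start:
        raise ValueError("day is before the rotation start")
    # Count how many full shifts have elapsed since the rotation began.
    shifts_elapsed = (day - start).days // shift_days
    # Wrap around the user list so the rotation repeats indefinitely.
    return users[shifts_elapsed % len(users)]

users = ["alice", "bob", "carol"]   # hypothetical team members
start = date(2024, 1, 1)            # hypothetical rotation start

first_week = on_call_for(date(2024, 1, 3), users, start)
second_week = on_call_for(date(2024, 1, 8), users, start)
fourth_week = on_call_for(date(2024, 1, 22), users, start)
```

With three users and weekly shifts, the fourth week wraps back to the first user in the list.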
An important step towards better incident management is adding enough context to incident alerts when the incidents get detected.
Severity alone may not capture other factors, such as how urgently an incident must be resolved or how it can affect other parts of the system. Some incident management tools attempt to solve this by adding other forms of classification, like incident urgency and incident priority. Most solutions, however, only allow incident severity as the form of classification, and in some cases severity is assigned manually rather than automatically from the context of the incoming alert.
With Squadcast, there is an added layer of flexibility that lets you define rules to classify alerts as Sev-1, Sev-2, or Sev-3. This rule-based auto-tagging system allows you to classify incidents as and when they are raised, thus making the alert notification more context-rich.
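The rule-based classification described above can be sketched in a few lines. This is only an illustration of the concept, not Squadcast's rule syntax or engine; the patterns, the alert messages, and the `classify` function are assumptions made up for this example.

```python
import re

# Hypothetical tagging rules: the first matching pattern decides the severity.
TAGGING_RULES = [
    (re.compile(r"disk failure|pool degraded", re.IGNORECASE), "Sev-1"),
    (re.compile(r"temperature|power supply", re.IGNORECASE), "Sev-2"),
]

def classify(alert_message, default="Sev-3"):
    """Return a severity tag for an alert message, checking rules in order."""
    for pattern, severity in TAGGING_RULES:
        if pattern.search(alert_message):
            return severity
    # Nothing matched, so fall back to the lowest severity.
    return default

critical = classify("Disk failure detected on node-2")
warning = classify("Power supply 2 removed")
info = classify("Snapshot schedule completed")
```

Evaluating rules in order keeps the behavior predictable: more specific or more severe patterns go first, and anything unmatched falls through to the default.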
Teams can level up their reliability and transparency with Squadcast’s free public Status Page. Status Pages (either Public or Private) can help communicate the status of your services internally to other teams or externally to your customers/stakeholders at all times. This can be done by configuring your services and their dependencies to show their status in real-time on the Status Page.
Users can customize and refine tagging rules to prioritize alerts by attaching severities to each incident. After tagging, each alert can then be routed to a specific user group or escalated to the concerned team, enabling a faster response.
Sometimes it is critical to alert specific personnel. By using tags, alerts can be immediately forwarded to a specific team or person with the help of Routing Rules. These flexible, conditional routing rules are based on incident properties and, combined with multiple notification modes, help reduce alert fatigue, resulting in faster time to detect and resolve. For instance, when a firewall breach is detected, the Infrastructure Security team can be notified immediately.
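As a rough sketch of how tag-based conditional routing works in principle, the snippet below matches an incident's tags against an ordered list of rules. The tag names, team names, and the `route` function are hypothetical and do not reflect how Routing Rules are configured in Squadcast.

```python
# Hypothetical routing rules: each rule lists the tags an incident must
# carry and the team that should be notified. The first match wins.
ROUTING_RULES = [
    ({"category": "firewall-breach"}, "infrastructure-security"),
    ({"severity": "Sev-1"}, "storage-on-call"),
]

def route(tags, default="general-triage"):
    """Return the team for an incident, given its tags as a dict."""
    for required, team in ROUTING_RULES:
        # A rule matches only if every required tag has the expected value.
        if all(tags.get(key) == value for key, value in required.items()):
            return team
    return default

security_team = route({"category": "firewall-breach", "severity": "Sev-1"})
storage_team = route({"category": "disk", "severity": "Sev-1"})
fallback_team = route({"category": "disk", "severity": "Sev-3"})
```

In this sketch the firewall rule is listed first, so a Sev-1 firewall breach goes to the security team rather than the general storage on-call.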
Squadcast offers a reporting and analytics feature to help track a team’s performance in acknowledging and resolving alerts. You can visualize and analyze the distribution of incidents across services for a specified period of time, along with the service status.
As the number of incidents grows, patterns emerge that point to frequently recurring issues, which you can then prioritize. You can do exploratory data analysis using graphical representations and understand more about past incidents. This data can also be exported by filtering based on Tags that the incidents carry, such as Severity Tags, Alert Source, Status, Date & Time, etc.
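For readers who export incident data and want to explore it themselves, here is a minimal sketch of the kind of analysis described above: counting incidents per service, filtering by a tag, and computing the mean time to acknowledge. The record layout and field names are assumptions for illustration and are not the format of a Squadcast export.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"service": "storage", "severity": "Sev-1",
     "created": datetime(2024, 1, 1, 10, 0),
     "acknowledged": datetime(2024, 1, 1, 10, 4)},
    {"service": "storage", "severity": "Sev-2",
     "created": datetime(2024, 1, 2, 9, 0),
     "acknowledged": datetime(2024, 1, 2, 9, 10)},
    {"service": "network", "severity": "Sev-1",
     "created": datetime(2024, 1, 3, 14, 0),
     "acknowledged": datetime(2024, 1, 3, 14, 1)},
]

def count_by(records, key):
    """Count incidents grouped by a field such as service or severity."""
    return Counter(record[key] for record in records)

def filter_by_tag(records, key, value):
    """Keep only the incidents whose field `key` equals `value`."""
    return [record for record in records if record[key] == value]

def mean_minutes_to_ack(records):
    """Average time between creation and acknowledgement, in minutes."""
    minutes = [(r["acknowledged"] - r["created"]).total_seconds() / 60
               for r in records]
    return sum(minutes) / len(minutes)

per_service = count_by(incidents, "service")
sev1_only = filter_by_tag(incidents, "severity", "Sev-1")
overall_ack = mean_minutes_to_ack(incidents)
```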
Step 1: From the navigation bar in Squadcast, pick the applicable Team from the Team-picker in the top left corner and select Services. Next, click on Alert Sources for the applicable Service.
Step 2: Search for OSNexus QuantaStor in the Alert Source drop-down and copy the Webhook URL.
Step 1: In the QuantaStor dashboard, click on Alert Manager.
Step 2: Click on ITSM Integrations. In the Module drop-down, select Squadcast. Paste the previously copied Squadcast Webhook URL and click on Add.
Step 3: Click on Generate Test Alert and click OK.
This should generate a test alert from OSNexus QuantaStor, in turn triggering an incident in Squadcast. This will confirm that the integration is working as expected.
That’s it, you are good to go! Whenever an alert is generated in OSNexus QuantaStor, an incident will be created in Squadcast. When the alert is closed in OSNexus QuantaStor, the corresponding incident will automatically be marked as resolved in Squadcast as well.
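The alert-to-incident lifecycle described above can be modeled with a short sketch. This is not the actual webhook payload or either product's implementation; the `alert_id` and `state` fields and the `handle_alert` function are hypothetical, used only to show the open-creates, close-resolves behavior.

```python
def handle_alert(incidents, alert):
    """Apply one alert event to an in-memory incident table.

    `incidents` maps a hypothetical alert ID to an incident status.
    The payload fields used here are assumptions, not the real schema.
    """
    alert_id = alert["alert_id"]
    if alert["state"] == "open":
        # A new alert creates an incident; a repeat does not duplicate it.
        incidents.setdefault(alert_id, "triggered")
    elif alert["state"] == "closed" and alert_id in incidents:
        # Closing the alert resolves the corresponding incident.
        incidents[alert_id] = "resolved"
    return incidents

incidents = {}
handle_alert(incidents, {"alert_id": "a-1", "state": "open"})
handle_alert(incidents, {"alert_id": "a-2", "state": "open"})
handle_alert(incidents, {"alert_id": "a-1", "state": "closed"})
```

After these three events, the first incident is resolved while the second remains open.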
The combination of OSNexus QuantaStor and Squadcast will help drive critical alerts to users in a timely manner and improve your overall incident response.
If you are interested in leveraging these two tools, have other best practices to share, or just need help with the integration setup, feel free to drop a line to the Support Team at Squadcast or OSNexus QuantaStor. To try QuantaStor for free, visit osnexus.com.
Squadcast is an incident management tool that’s purpose-built for SRE. Get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. Work in collaboration using virtual incident war rooms and use automation to eliminate toil.