On-call On-boarding Checklist
A humane on-call process is the mark of a good engineering culture. Access our free on-call onboarding checklist, which can proactively help your on-call team & improve your overall on-call experience.
May 20, 2020
Best Practices in Incident Management
In an always-on world, companies look to systems & processes to keep their services up & running at all times. Squadcast's latest post outlines a few best practices in incident management to restore services during unplanned downtime.
May 7, 2020
Configure an Intuitive Service Dashboard & Reduce Response Time
Leverage Multiple Alert Sources in Squadcast to reflect your actual system infrastructure on your Service Dashboard
April 30, 2020
What you should know about Squadcast + Grafana Integration
At Squadcast, we use Grafana and absolutely love it! This blog post talks about how you can use your Grafana data to set off alert triggers in Squadcast. Turbocharge your observability data in Grafana by making it actionable.
April 2, 2020
Incident Response in the time of Remote Work
The unexpected and sudden shift to remote working introduces a new set of problems within the incident response space. While each organization needs to take its own unique circumstances into account, this post outlines best practices and steps toward keeping operations both productive and proactive.
March 26, 2020
Must Read DevOps & SRE Books for all Engineers
Here's a curated list of “Must Read” books specific to the Incident Management space, suggested by folks from the SRE and DevOps community, to help you understand what changed their perspective on software engineering as a role.
March 24, 2020
Top Monitoring Tools for DevOps Engineers and SREs
Monitoring has moved from a simple proactive practice to a necessity on any product launch checklist. It is crucial to pick a tool that meets your observability needs & ensures reliability of your service to your customers.
March 18, 2020
Hrushikesh shares his journey into SRE and his thoughts on the future of this space
Hrushikesh is passionate about solving complex design problems with simple and reliable solutions. He is technology and platform agnostic and doesn’t believe in limiting himself to just a few. He started his career in 2006 with a media company, where he was responsible for introducing new technologies and driving a team to deliver quickly. He does not limit his role to just development and operations and loves exploring everything in the tech space. He believes that SRE principles will revolutionize the way classic operations-driven organizations think.
March 5, 2020
Better Incident Response: Incident Classification & Setting Severities with Tags
Implementing an incident classification step in your incident management software and process can significantly bring down the MTTR and stress involved in the first few minutes of an incident.
February 20, 2020
Scheduling IT and Engineering on-call rotations just got easier
Introducing UI improvements to the on-call schedules and rotations feature on Squadcast.
February 13, 2020
Things to do to make on-call less stressful
How to do on-call management in a way that’s better, less stressful and actually improves your incident response processes, uptime & reliability.
January 30, 2020
Hiteshwar shares his thoughts on being an SRE
Hiteshwar is an SRE based out of Mumbai, India. He specializes in distributed systems. He works on Kubernetes, running his own custom clusters, maintaining them and creating tools to manage and monitor them. He is an active speaker at meetups and developer groups and also teaches DevOps and SRE practices at learning centers.
January 24, 2020
Arild Jensen from Upwork shares his thoughts on being an SRE
Arild Jensen, SRE Manager at Upwork, talks about his journey into SRE and some best practices he picked up along the way including implementing a blameless culture, code review and making decisions based on hard data.
January 17, 2020
What you can show on your status page
When something goes down, the first thing a customer does is check whether something is wrong with their own systems or whether it is an issue with one of their service providers. So it’s important to make sure that your status page has all the information they need, so that they don’t feel the need to raise an issue or create a ticket, which adds to your support costs.
January 14, 2020
Using a Status Page in your Incident response process
Status pages can be used in different forms for internal or external communication, aligning all teams towards a culture of transparency with your customers and outside stakeholders as well as with your colleagues and peers.
January 10, 2020
Reducing On-call Alert Fatigue with Deduplication
Alert noise is a very common on-call complaint, leading to fatigue and on-call burnout. This article is an attempt at helping folks address this problem.
January 8, 2020
Squadcast's Year in Review, 2019
It’s the end of a decade, and this year has been nothing short of great, with accelerated product adoption, a team that grew 2x in size, a platform full of features and a heart full of happiness!
December 31, 2019
How to avoid on-call burnout
Incident management is stressful, even more so during the holidays. This is a checklist of things to watch out for to make sure your on-call team remains calm if an incident occurs.
December 20, 2019
Danny Mican on his experience as an SRE at Auth0
Danny Mican, an SRE from Auth0, shares his thoughts on SRE and being SLO driven to deliver outstanding customer experiences. Danny currently manages the reliability of systems that authenticate over 2.5 billion logins per month and are expected to have 99.9% (3 Nines) availability.
December 2, 2019
Pavlos Ratis shares his experience on being an SRE
Pavlos is a Site Reliability Engineer based in Munich, Germany. He likes building software and expanding his knowledge around the reliability of services and their infrastructure. He has created a few open-source SRE projects, such as awesome-sre, Wheel of Misfortune, Availability Calculator, and awesome-chaos-engineering, to assist teams and individuals in getting on board with the SRE culture. Recently, he was invited to be a technical reviewer of the "Real World SRE" book by Nat Welch, where he offered suggestions regarding the content of the book.
November 13, 2019
Managing technical risk effectively with Error Budgets
Tradeoffs are hard. Think about the time when you had to choose between two equally compelling options: (a) addressing technical debt or (b) pushing out that long-awaited feature release and risking breaking production. Or when your team couldn’t agree on where to draw the line on improving request latency versus shipping a major new update.
October 14, 2019
Mark Henderson from Stack Overflow shares his experience on being an SRE
Mark Henderson has been a Site Reliability Engineer at Stack Overflow since 2015. Before this, he worked as the sole systems administrator at a small software company in Sydney, Australia. These days, he lives in South Australia and works from home with his wife and two children.
July 11, 2019
Copyright © Squadcast Inc. 2017-2021