On-call engineers are the first line of defense when an outage occurs, ensuring that customer-impacting issues are quickly noticed and resolved. This blog outlines some use cases that can help you avoid pitfalls in your organization’s on-call rotation. Also, get access to our Free On-call Onboarding Checklist.
Typically an organization’s first step towards committing to reliability for customers and users is the practice of having an on-call rotation wherein on-call engineers are the first line of defense, ensuring customer-impacting outages are quickly noticed and resolved.
At Squadcast, we run our own on-call rotation on our platform, which supports many custom rotation types, including follow-the-sun, daily, weekly, and split-shift rotations, along with the ability to create multiple scheduling layers within a single schedule.
Below, we’ve highlighted some use cases showing how on-call responsibilities can be viewed with on-call schedule scripts. These scripts help you find out who is on-call right now and which Slack channel to send on-call notifications to for a particular schedule.
Get the list of people who are on-call for a given Squadcast Schedule across all shifts at the time the script is executed, and send it to a Slack channel.
Please note that this script doesn’t categorize users by shift name, i.e., it doesn’t say which user is on-call for which shift.
It uses Squadcast’s Who is On-Call API to get the list of people on-call at a given time.
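To make the "across all shifts, without shift names" behaviour concrete, here is a minimal Go sketch of merging per-shift user lists into one flat list. The Shift type and the flattenOnCall function are hypothetical names invented for this illustration. The real Who is On-Call API response may be shaped differently, and this post does not say whether the actual script removes duplicates.

```go
package main

import (
	"fmt"
	"sort"
)

// Shift is a hypothetical shape for one shift of a schedule; the real
// Who is On-Call API response may be structured differently.
type Shift struct {
	Name  string
	Users []string
}

// flattenOnCall merges the users of every shift into one sorted,
// de-duplicated list, discarding the shift names.
func flattenOnCall(shifts []Shift) []string {
	seen := make(map[string]bool)
	var users []string
	for _, s := range shifts {
		for _, u := range s.Users {
			if !seen[u] {
				seen[u] = true
				users = append(users, u)
			}
		}
	}
	sort.Strings(users)
	return users
}

func main() {
	shifts := []Shift{
		{Name: "day", Users: []string{"alice", "bob"}},
		{Name: "night", Users: []string{"bob", "carol"}},
	}
	fmt.Println(flattenOnCall(shifts)) // prints [alice bob carol]
}
```

Because the shift names are dropped during the merge, the resulting message can say who is on-call but not for which shift.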
The script takes a configuration JSON file as its command-line argument which looks something like the following:
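As a rough illustration of what such a configuration could contain, the Go sketch below parses a params.json-style document into a struct. Only the REFRESH_TOKEN key is named in this post. The SCHEDULE_NAME and SLACK_WEBHOOK_URL key names, the Params type, and the parseParams function are assumptions made for this example, so check the actual script for the real key names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Params mirrors a hypothetical params.json. Only REFRESH_TOKEN is named in
// this post; the other two key names are assumptions for illustration.
type Params struct {
	RefreshToken string `json:"REFRESH_TOKEN"`
	ScheduleName string `json:"SCHEDULE_NAME"`     // assumed key name
	WebhookURL   string `json:"SLACK_WEBHOOK_URL"` // assumed key name
}

// parseParams decodes the JSON and rejects a config with any empty field.
func parseParams(data []byte) (Params, error) {
	var p Params
	if err := json.Unmarshal(data, &p); err != nil {
		return Params{}, fmt.Errorf("invalid params JSON: %w", err)
	}
	for _, v := range []string{p.RefreshToken, p.ScheduleName, p.WebhookURL} {
		if strings.TrimSpace(v) == "" {
			return Params{}, errors.New("params: all three fields are required")
		}
	}
	return p, nil
}

func main() {
	// Placeholder values; replace them with your own token, schedule, and URL.
	raw := []byte(`{
		"REFRESH_TOKEN": "<your API token>",
		"SCHEDULE_NAME": "<your schedule name>",
		"SLACK_WEBHOOK_URL": "<your webhook URL>"
	}`)
	p, err := parseParams(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("schedule:", p.ScheduleName)
}
```

Validating the file up front means a missing token or webhook URL fails fast instead of surfacing later as a confusing API or Slack error.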
Open a terminal window and execute the following command to clone the repo.
NOTE: If you are the Account Owner or an Admin of the Squadcast account, then follow the steps to obtain a refresh token. If you are a Stakeholder or a User, get in touch with the Account Owner or Admin to get the token.
i. Log in to your Squadcast Dashboard and go to the profile page.
ii. Here, you’ll find an API Access Token. If you haven’t already created one, click on Generate new API token.
iii. Copy the generated token into the value of the REFRESH_TOKEN key in the params.json file.
i. Select Schedules from the Sidebar within the platform.
ii. On the right side of the screen, you can see the list of different schedules created. Copy the name of the Schedule for which you want to get the list of on-call people to the params.json file.
i. Open your Slack app.
ii. Scroll the sidebar on the left to get to the Apps section. Click on the ‘+’ button beside Apps.
iii. Search “Webhook” in the search box. Click the Add button next to Incoming WebHooks.
iv. You’ll be redirected to the app’s webpage. Click on the Add to Slack button.
v. You’ll be redirected to a screen where you’ll be asked to select the Slack channel the webhook will send messages to. Click on Add Incoming WebHook Integration once done.
vi. In the next screen, you’ll get your Webhook URL. Copy it to the params.json file.
Now that the params.json file is ready, we’ll build the script and set up the cron job.
You’ll need Go installed on your system in order to build this script.
If you don’t have it already, you can download it from the official Go downloads page.
To build the script, go to the directory of the script and execute the command
Once the build is successful, you’ll see a binary named oncall in the current working directory. To execute this binary, you’ll have to specify the path to the params.json file containing all the parameters we configured in the previous steps. The command to run would be
The script itself doesn’t decide when it should be executed. That needs to be handled externally. So, in this blog, we’ll go over how you can set up a cron job on a Linux system to do so.
Let’s say that the on-call hand-off at an organization happens every Monday at 12 pm. It makes sense to run this script at that time. To do so, the steps would be:
i. Open a terminal and execute the command crontab -e.
ii. A file editor will open in the terminal. Add a line that starts with the schedule 0 12 * * 1 (every Monday at 12:00), followed by the command that runs the oncall binary with the path to your params.json file, and save it.
For more details on how to use the crontab command, you can check out its man page by running the command man crontab.
You may also refer to these blogs.
NOTE: However you automate the invocation of the script, make sure that you don’t invoke the script very frequently. As a rule of thumb, avoid invoking the script more than once in a time window of 1 hour.
Once the cron job is set up, you’ll get notifications in the selected Slack channel about the new on-call users each time the script is invoked.
In the script, you just have to replace the NotifySlack function with your own NotifyHangouts or NotifyTeams function and you’re good to go. Refer to the API documentation of the respective chat-ops platform for details on setting it up.
Currently, our API doesn’t support grouping by shift names. Later when we support it in our API, we’ll update the script as well.
You can simply have multiple params.json files with different configurations and configure the cron-job to call the script with multiple different configurations.
For example, if you have two params.json files with the absolute paths /home/params1.json and /home/params2.json, and the binary at /home/oncall, then you can configure the crontab file as
With this, you have the flexibility to do a lot of things:
As is probably apparent by now, the notifications sent to Slack are produced by polling our Squadcast API; they are not event-driven by a hook. With event-hook-driven notifications, each change to the schedule would generate an event, and the hook would be notified and act on it. Supporting event hooks is part of our product roadmap.
Once event-hook notifications are available, the advantages will be:
i. The user won’t have to worry about when to run the script.
ii. In case of sudden, irregular changes to the schedule, you would still get notified about the new on-call users. With polling, checks happen only at regular intervals, so any change made between two runs is missed until the next run.
Streamline on-call management for any kind of rotation type or team with Squadcast. Here’s a simple checklist we created while first starting off our on-call on-boarding process. If you’re planning this for your on-call team, it’s best to get them involved from the start to pitch in and improve as it is being built. Access our Free On-call Onboarding Checklist here. Please note that you can choose to use this directly or tweak it to fit your current processes and needs.
We hope these scripts enable you to formalize your on-call rotation process and make it as easy as possible for your team to respond to issues. We’d love to hear from you about other best practices for an improved on-call experience.
Squadcast is an incident management tool that’s purpose-built for SRE. Your team can get rid of unwanted alerts, receive relevant notifications, work in collaboration using the virtual incident war rooms, and use automated tools like runbooks to eliminate toil.