And it starts with the company culture. Irrespective of how small or large your team is, it’s wise to invest some time in creating a good on-call onboarding plan. A humane on-call is the mark of a good engineering culture.
Being on-call means that you’re expected to be reachable for any issues that may occur during your shift. It’s easy to lose any and all motivation by just anxiously anticipating that mid-dinner ping.
While on-call is not meant to be a fun activity, it definitely doesn’t have to be so dreadful. In our conversations with on-call teams, both small and large, we found that only a very small percentage of teams had any idea what it means to be on-call before they first went on-call.
This doesn’t necessarily strike one as an important thing to fix, simply because the effects of a bad on-call rotation almost always slip through the cracks while seemingly more important engineering, product and business tasks take priority. However, it is neither sustainable nor humane to push on-call duty on someone without first onboarding them to set the right expectations and provide useful information. Doing so can immensely help the on-call team tackle what can otherwise be, as mentioned above, dreadful.
It can get overwhelming when you have to design an onboarding plan from scratch. Always start small. You can continue to iterate on this to make it better fit your evolving processes.
Here’s a simple checklist we created when we first started our on-call onboarding process. If you’re planning this for your on-call team, it’s best to involve them from the start so they can pitch in and improve it as it is being built.
Access our Free On-call Onboarding Checklist here. Please note that you can choose to use this directly or tweak it to fit your current processes and needs.
Squadcast is an incident management tool that’s purpose-built for SRE. Create a blameless culture by reducing the need for physical war rooms, unify internal & external SLIs, automate incident resolution, and create a knowledge base to effectively handle incidents.