Most frequently asked questions surrounding Google’s Cloud Operations Sandbox

July 29, 2021

The Google SRE sandbox provides an easy way to get started with the core skills you need to become an SRE. It simulates the behavioural complexities of a real Google Cloud Platform (GCP) environment, so that budding SREs can practice hands-on while learning SRE best practices.

The core skills you need to become a good SRE are:

  1. Observability of complex microservice-based cloud environments
  2. Performing quick root-cause analysis when things go wrong
  3. Automating rollbacks and monitoring deployments
  4. Tracking SLOs and SLIs over a period of time

Figure: Architecture of the demo application provided with the sandbox

With Cloud Operations Sandbox, you can take your first steps towards SRE expertise and answer the question, 'Will it work in my production environment?' We have compiled a list of FAQs related to the Google SRE Sandbox and answered them below.

Q: What are the major features of the sandbox?

While the sandbox has many features, in this blog we focus on observability, root cause analysis, simulating user traffic and SLO/SLI tracking. The sandbox features used for learning about these are Cloud Trace, the Locust artificial load generator, Cloud Profiler, Cloud Debugger and SRE recipes.

Q: Can I track custom SLOs and SLAs with the sandbox?

The demo application that comes with the sandbox has microservices that are pre-instrumented with logging, monitoring, tracing, debugging, and profiling capabilities. The screenshot below shows how Service Level Indicators (SLIs) can be defined for the demo app.

Figure: Defining SLIs in the Google Sandbox

You can pick SLIs based on availability or latency, or define your own custom metric for the demo application.

If you have instead chosen to track SLIs for your replicated production environment, you will need to instrument the services separately.
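
To make the availability SLI concrete, here is a minimal Python sketch of the arithmetic behind it. It does not use the sandbox or any Cloud Monitoring API; the `WindowCounts` type, the function names, the request counts and the 99.9% target are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts for one measurement window (hypothetical numbers)."""
    total: int
    good: int  # requests that met the success criterion


def availability_sli(counts: WindowCounts) -> float:
    """Fraction of requests that were good; 1.0 when there was no traffic."""
    if counts.total == 0:
        return 1.0
    return counts.good / counts.total


def error_budget_remaining(counts: WindowCounts, slo_target: float) -> float:
    """Share of the error budget still unspent (negative once the SLO is missed)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if counts.total == 0:
        return 1.0
    allowed_bad = (1.0 - slo_target) * counts.total
    actual_bad = counts.total - counts.good
    return 1.0 - actual_bad / allowed_bad


# Assumed example: 100,000 requests in the window, 50 of them failed.
window = WindowCounts(total=100_000, good=99_950)
sli = availability_sli(window)
budget = error_budget_remaining(window, slo_target=0.999)
print(f"SLI: {sli:.4%}, error budget remaining: {budget:.0%}")
```

A latency SLI can be computed the same way by counting a request as good when it completes under a chosen threshold.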

Q: Which module is used to simulate traffic in the sandbox?

Locust is the artificial load generator used by the sandbox. It is mainly used to test how much load your infrastructure can bear. With Locust, you define artificial user behaviour in Python code, and you can run load tests that simulate up to millions of concurrent users.

Figure: User interface of the Locust load generator

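Below is a Python snippet that simulates the behaviour of a user.

from locust import HttpUser, between, task


class WebsiteUser(HttpUser):
    # Each simulated user waits 5 to 15 seconds between tasks.
    wait_time = between(5, 15)

    def on_start(self):
        # Runs once per simulated user before any task is scheduled: log in.
        self.client.post("/login", {
            "username": "test_user",
            "password": ""
        })

    @task
    def index(self):
        # Illustrative task, not part of the original snippet: Locust needs
        # at least one task to run, and "/" is an assumed path.
        self.client.get("/")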

Q: What is 'Google Cloud Debugger' and how does it work in the sandbox?

You may have seen many cases where an issue found in production cannot be reproduced in the test environment for root cause analysis. To discover the underlying cause, you must either dig into the source code or add more logging to the program while it runs in production. Cloud Debugger lets developers debug code during execution using real-time request data.

With Cloud Debugger, you can define breakpoints and logpoints while viewing the project's source. When a breakpoint is hit, a snapshot of the process state is taken, so you can examine what went wrong.

Cloud Debugger also lets you add a log statement to a running application without noticeably slowing it down. Typically, adding a log statement would mean redeploying the code, with all the risks that a production deployment involves.

Q: What is 'Google Cloud Profiler' and how can it help me?

Cloud Profiler is a statistical profiler for your application. Depending on the programming language, it collects statistical information on CPU usage, heap size, threads and so on. You can use the charts in the Profiler UI to identify performance bottlenecks in your application code.

Once the Profiler library is installed, you do not have to write any profiling code in your application; you only need to make the library available (the method depends on the language). The library then generates reports that you can analyse.

Note that if you are not using the demo application, the profiler has to be configured to work with the microservice you want to profile.
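
To show what "statistical" means here, the following Python sketch aggregates a handful of stack samples into per-function shares, which is roughly the kind of data a sampling profiler's charts are built from. This is not the Cloud Profiler library or its API; the sample stacks and function names are hypothetical.

```python
from collections import Counter

# Hypothetical stack samples, as a sampling profiler might capture them at
# fixed intervals. Each sample lists the call stack from outermost to innermost.
samples = [
    ("handle_request", "render_cart", "format_price"),
    ("handle_request", "render_cart", "format_price"),
    ("handle_request", "load_products"),
    ("handle_request", "render_cart", "format_price"),
    ("handle_request", "render_cart"),
]


def self_time_share(stack_samples):
    """Share of samples in which each function was the innermost frame."""
    leaf_counts = Counter(stack[-1] for stack in stack_samples)
    total = len(stack_samples)
    return {name: count / total for name, count in leaf_counts.items()}


def total_time_share(stack_samples):
    """Share of samples in which each function appeared anywhere on the stack."""
    counts = Counter()
    for stack in stack_samples:
        counts.update(set(stack))  # count a function once per sample
    total = len(stack_samples)
    return {name: count / total for name, count in counts.items()}


self_share = self_time_share(samples)
total_share = total_time_share(samples)
hottest = max(self_share, key=self_share.get)
print(f"hottest function by self time: {hottest} ({self_share[hottest]:.0%})")
```

Because the result is built from samples rather than from timing every call, the overhead stays low, at the cost of the shares being estimates.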

Q: What tools are available to learn tracing in the sandbox?

Cloud Trace allows developers to examine distributed traces by graphically revealing request latency bottlenecks. Developers gather the trace information by instrumenting the application code. Traces also include environmental information added to the Cloud Logging records. The sandbox provides OpenCensus and OpenTelemetry for learning tracing within the platform.

The solution the sandbox uses for instrumentation is OpenCensus. The OpenCensus project is open source and offers trace instrumentation in many languages. It also enables the trace data to be exported to the Google Cloud Operations dashboard. To examine the data, you can use the Cloud Trace UI.

Clicking on a trace in the timeline will give you a more detailed view and breakdown of the traced call and the subsequent calls that were made.
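
The breakdown of a traced call into subsequent calls can be pictured as a tree of spans. The Python sketch below models that idea with plain data classes; it does not use OpenCensus, OpenTelemetry or Cloud Trace, and the `Span` type, span names and timings are hypothetical. It also assumes the child spans of a call run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One timed operation in a trace (a hypothetical, simplified model)."""
    name: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def children_of(spans: List[Span], parent_id: str) -> List[Span]:
    """Direct child spans of the span with the given id."""
    return [s for s in spans if s.parent_id == parent_id]


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Time spent in the span itself, excluding its direct children."""
    child_time = sum(c.duration_ms for c in children_of(spans, span.span_id))
    return span.duration_ms - child_time


# Assumed example trace: one request fanning out to two services.
trace = [
    Span("frontend: GET /cart", "a", None, 0, 120),
    Span("cartservice: GetCart", "b", "a", 5, 35),
    Span("productcatalog: ListProducts", "c", "a", 40, 110),
    Span("database: query", "d", "c", 45, 105),
]

slowest = max(trace, key=lambda s: self_time_ms(s, trace))
print(f"largest self time: {slowest.name} ({self_time_ms(slowest, trace)} ms)")
```

Looking at self time rather than total duration points at the span where the latency is actually spent, instead of at its callers.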

Q: Can I replicate my production/staging environment in the sandbox?

Your production/staging environment can be replicated if it is hosted on Google Cloud Platform (GCP).

Q: Can I check for observability of my replicated environment?

The sandbox has a demo application (Hipster Shop) that comes pre-instrumented for observability. If you are using your own environment, you will need to instrument your microservices accordingly.

Q: Can I send alerts to an external platform?

As of now, the demo sandbox has an inbuilt incident management system with basic functionality. To send alerts to an external platform, you first need to create a custom module.

Q: How much does the Sandbox cost?

The sandbox is provided free of charge. However, since it can only be used on Google Cloud Platform (GCP), any computing resources consumed will be billed.

Q: Can I improve my MTTR (Mean Time to Respond) with the sandbox?

The sandbox has a feature called "SRE recipes" that auto-generates issues in your environment. It is a good way to learn the skills needed to fix things in production. Note that SRE recipes only work in the demo application provided with the sandbox. You will need to create your own scripts to auto-generate problems in your custom setup. By practicing, SREs can get better at fixing issues in production and reduce the MTTR (Mean Time to Respond) to incidents.
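
To track whether practice is paying off, MTTR can be computed from incident timestamps. The Python sketch below is a minimal illustration; the `mean_time_to_respond` function and the incident records are hypothetical, and each record is assumed to be a pair of (time the alert fired, time someone responded).

```python
from datetime import datetime, timedelta

# Hypothetical practice incidents: (alert triggered, first response).
incidents = [
    (datetime(2021, 7, 1, 10, 0), datetime(2021, 7, 1, 10, 12)),
    (datetime(2021, 7, 8, 2, 30), datetime(2021, 7, 8, 2, 38)),
    (datetime(2021, 7, 15, 16, 5), datetime(2021, 7, 15, 16, 15)),
]


def mean_time_to_respond(records):
    """Average gap between an alert firing and the first response."""
    if not records:
        raise ValueError("at least one incident is required")
    total = sum(
        (responded - triggered for triggered, responded in records),
        timedelta(),
    )
    return total / len(records)


mttr = mean_time_to_respond(incidents)
print(f"MTTR: {mttr}")
```

Comparing this figure across practice sessions gives a rough view of whether response times are improving.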

Q: Can I test the performance of my production environment in the sandbox?

Yes. The sandbox can be used to load-test a replica of your production environment, since it has a tool to generate synthetic traffic. However, it does not include tools for thorough unit testing or in-depth performance testing.

Q: What new features will be added to the sandbox?

Runbooks are expected to be added to the sandbox in the near future. Creating effective runbooks is an important skill all SREs need to acquire.

Conclusion

The SRE sandbox is a great place to test out your skills for becoming a better SRE. To be effective in their work, SREs need expertise in the areas of observability, performance testing and distributed architecture. The sandbox provides a way for budding SREs to test out different scenarios. Some possible scenarios include checking the performance of your application under different user loads, getting better at resolving critical issues and testing out different on-call strategies.

Written By: Nir Sharma
Last Updated: May 2, 2024
