As one of the most popular open-source Kubernetes monitoring solutions, Prometheus leverages a multidimensional data model of time-stamped metric data and labels. The platform uses a pull-based architecture to collect metrics from various targets. It stores the metrics in a time-series database and provides the powerful PromQL query language for efficient analysis and data visualization.
Despite its powerful capabilities, several key considerations determine how effectively Prometheus can observe a Kubernetes cluster. These considerations include choosing the right installation method, configuring scrape targets and alerting rules, provisioning persistent storage, and managing resources and access securely.
This article walks you through each step to install and configure Prometheus in detail.
Prometheus supports multiple installation options, which you can choose from depending on the complexity of the deployment, the need for customization, and the resources available to manage and maintain the installation. Installation options include:
You can leverage a Kubernetes operator to simplify the installation and management of Prometheus by abstracting recurrent configuration tasks through a high-level interface. Although operators require additional technical expertise to set up and manage, they also offer significant benefits, including automatic backups, scaling, and self-healing, making them ideal for large-scale production environments.
You can create YAML files that define the desired state of the Prometheus deployment, including the configuration of core components such as the Prometheus server, alert manager, and exporters. This installation option requires a greater degree of manual effort to maintain and update the deployment, as any changes to the configuration or components require YAML file changes and redeployment. However, the approach is beneficial for simpler deployments or for SREs who prefer granular control over the deployment configuration.
You can leverage the Helm package manager for easier configuration of Prometheus and underlying Kubernetes resources. Compared to manifest-based installation, Helm charts provide more customization options, allowing for fine-grained control over the Prometheus deployment. The Helm templating engine also simplifies the upgrade and management process for complex implementations that require intricate configuration of dependency management, versioning, and rollback support.
When choosing between Helm charts and manifests for installing Prometheus on Kubernetes, there are several important factors to consider, such as the degree of customization required, the manual effort needed to maintain and upgrade the deployment, and your team's familiarity with Helm.
In the following sections, we go through the steps to install and configure Prometheus on a Kubernetes cluster using Helm.
Before installing Prometheus, it is recommended to create a plan for resource allocation, persistent storage, and data retention.
Because Prometheus consumes significant CPU and memory, it is also recommended to verify that your Kubernetes cluster has sufficient resources available to support the installation and the subsequent configuration steps.
As the first step, create a namespace that defines a logical boundary to isolate resources of the Prometheus setup from other services of your Kubernetes cluster.
For this demo, we create the namespace darwin and use it for the Prometheus deployment.
Check for output:
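Assuming kubectl is configured to point at your cluster, the namespace can be created as follows:

```shell
kubectl create namespace darwin
# namespace/darwin created
```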
Run the following command to install Prometheus using Helm:
Check for output:
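A typical sequence, using the prometheus-community chart repository and the release name prometheus (both illustrative; adjust to your setup), looks like this:

```shell
# Add the Prometheus community chart repository and refresh the local index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the chart into the darwin namespace under the release name "prometheus"
helm install prometheus prometheus-community/prometheus --namespace darwin
```

Helm prints the release name, namespace, and deployment status on success.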
To verify the installation, run the command:
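A sketch of the verification command:

```shell
kubectl get pods --namespace darwin
```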
The output returns the list of different pods that are running in the darwin namespace:
In our case, the pods are:
This step is optional and should be used only when you want to interact with the Prometheus server running in a Kubernetes cluster from a local machine.
Once you have the details of the pods running different services, create a port-forward from your local device to the primary pod (on which the Prometheus server is deployed) for accessing the Prometheus UI.
To achieve this, use the command below.
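A sketch of the port-forward command, assuming the server pod is named prometheus-prometheus-0 as in this demo (Prometheus listens on its default port 9090 inside the pod):

```shell
kubectl port-forward pod/prometheus-prometheus-0 9091:9090 --namespace darwin
```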
The above command forwards traffic from port 9091 of your local machine to port 9090 of the prometheus-prometheus-0 pod. After running this command, you can access the Prometheus UI by navigating to http://localhost:9091 in your web browser.
Quick note: The port-forward command will continue running in the foreground until you interrupt it manually (e.g., by pressing Ctrl-C). If you stop port-forwarding, you can no longer access the Prometheus UI until you re-initiate a new port-forward.
Alternatively, if you want to keep the port-forward running in the background, you can start it as a background process by adding & at the end of the command:
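Using the same illustrative pod name as above:

```shell
kubectl port-forward pod/prometheus-prometheus-0 9091:9090 --namespace darwin &
```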
Although port-forwarding is a helpful option to forward and test traffic access from the UI locally, it is not advisable to expose traffic to the wider network. Instead, use a Prometheus service (covered in step 5 below) to expose pods as a network service for external clients.
ConfigMaps in Kubernetes store the configuration data of your cluster workloads outside of the container image. They allow you to manage configuration files independently from the application code. For a Prometheus instance, a ConfigMap typically holds the configuration file prometheus.yml, which stores the specifications of scrape targets, alerting rules, and other settings of the Prometheus server.
To create a ConfigMap named prometheus-config in the darwin namespace that contains the configuration file prometheus.yml, use the command:
This command returns the output:
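Assuming prometheus.yml exists in your working directory, the command and its expected output look like this:

```shell
kubectl create configmap prometheus-config \
  --from-file=prometheus.yml \
  --namespace darwin
# configmap/prometheus-config created
```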
Modify the configuration file with specifications similar to:
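A minimal prometheus.yml matching this description might look like the following (the job name darwin is illustrative):

```yaml
global:
  scrape_interval: 15s   # default interval for all jobs

scrape_configs:
  - job_name: darwin           # illustrative job name
    scrape_interval: 5s        # overrides the global interval for this job
    static_configs:
      - targets: ["darwin-service:8080"]
```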
In this case, the scrape interval is set to 15 seconds, with one target (darwin-service:8080) defined with a scrape interval of 5 seconds.
Quick note: Be sure to set the targets field to only the IP or hostname of the service you want to scrape.
Apply the configuration with the command:
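One way to apply changes to an existing ConfigMap is to regenerate its manifest from the updated file and pipe it to kubectl apply:

```shell
# Re-render the ConfigMap from the updated prometheus.yml and apply it in place
kubectl create configmap prometheus-config \
  --from-file=prometheus.yml \
  --namespace darwin \
  --dry-run=client -o yaml | kubectl apply -f -
```

Note that the Prometheus server must reload its configuration (for example, via a pod restart) before the changes take effect.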
Once you create the Prometheus deployment, the next step is to expose the Prometheus server within the Kubernetes cluster and allow other pods and services (within or external to the cluster) to communicate with it. In our example, we create a Kubernetes service object to include a stable IP address and port number that other pods and services can use to access the Prometheus server.
Create a service named darwin-prometheus-service with specifications similar to:
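A manifest matching these specifications might look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: darwin-prometheus-service
  namespace: darwin
spec:
  type: NodePort
  selector:
    app: darwin-prometheus    # matches the pods of the darwin-prometheus deployment
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30000
```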
Quick note: The specification above creates a new service named darwin-prometheus-service in the darwin namespace with the NodePort type and configures it to forward traffic to the darwin-prometheus deployment using the app: darwin-prometheus selector. The service listens on port 9090 with a targetPort of 9090 and is accessible from outside the cluster via a node's IP address (for example, 10.0.0.100) on port 30000.
Instead of using NodePort, you can also use other service types, like ClusterIP (default), LoadBalancer, or ExternalName, based on your use case. Details of the various Kubernetes service types can be found in the Kubernetes documentation.
Apply the service file using the command:
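Assuming the manifest above was saved as darwin-prometheus-service.yaml (filename illustrative):

```shell
kubectl apply -f darwin-prometheus-service.yaml
```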
Verify that the service has been created using the following command:
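A sketch of the verification command:

```shell
kubectl get service darwin-prometheus-service --namespace darwin
```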
The output returns details of darwin-prometheus-service with the NodePort type and the port number assigned to it.
When you install Prometheus using a Helm chart, both the prometheus.yml and values.yaml files are generated. During installation, Helm reads the values.yaml file and renders the Kubernetes manifests for deploying Prometheus, including the prometheus.yml configuration file.
In the following steps, we use the prometheus.yml file to configure key touchpoints of the Prometheus deployment.
Prometheus periodically scrapes metrics from target endpoints, including the kube-state-metrics service. To modify Prometheus scrape configurations, you can modify the prometheus.yml configuration file to specify scrape targets and related parameters.
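For example, a scrape configuration with two jobs (job names are illustrative) might look like this:

```yaml
scrape_configs:
  - job_name: darwin-job-1       # illustrative job name
    scrape_interval: 5s
    static_configs:
      - targets: ["darwin-service-1:80"]

  - job_name: darwin-job-2       # illustrative job name
    scrape_interval: 10s
    static_configs:
      - targets: ["darwin-service-2:80"]
```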
In this example, two scrape jobs are defined, each with a different job_name and its own static_configs block. The first job scrapes a single target, darwin-service-1:80, every 5 seconds, while the second scrapes a single target, darwin-service-2:80, every 10 seconds.
Relabeling allows you to transform or modify the scraped data labels before storing them in the time-series database. This is useful for modifying labels to match your naming conventions or adding additional metadata to the scraped data.
For instance, if you want to rewrite the job label of darwin-service-1 so that scraped metrics are stored under a new value, say darwin-new-service, you can add a relabeling rule to the prometheus.yml configuration file as follows.
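A sketch of such a rule, attached to the scrape job for darwin-service-1:

```yaml
scrape_configs:
  - job_name: darwin-service-1
    static_configs:
      - targets: ["darwin-service-1:80"]
    relabel_configs:
      - source_labels: [job]
        action: replace
        target_label: job
        replacement: darwin-new-service   # metrics are stored under this job label
```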
Additional details of internal relabeling and supported actions of the relabel_config block can be found on the Grafana blog.
In a default Prometheus configuration, containers are deployed without resource limits, which can lead to suboptimal performance of the cluster. Instead, you can cap scrape-level resource consumption (for example, with sample_limit) at the job or global level in the prometheus.yml configuration file, and define CPU and memory limits for the Prometheus containers in the deployment specification.
In this case, containers are allocated as below:
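A sketch of container-level requests and limits for the server, set via the chart's values.yaml (all numbers are illustrative, not recommendations):

```yaml
server:
  resources:
    requests:
      cpu: 500m        # illustrative baseline allocation
      memory: 512Mi
    limits:
      cpu: "1"         # illustrative ceiling
      memory: 2Gi
```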
To increase the scalability and availability of your Prometheus deployment, you can configure a federated cluster setup, where multiple Prometheus instances share information and query each other to consolidate metrics data. This can be useful in large-scale deployments where you have multiple clusters with Prometheus servers and intend to analyze metrics data for all of them.
Configuring cluster federation involves designating one Prometheus instance as the global server, configuring it to scrape the /federate endpoint of each downstream Prometheus server, using match[] parameters to select which time series to pull, and setting honor_labels so that the original labels are preserved.
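A minimal federation scrape job on the global Prometheus instance might look like this (the target hostname and match[] selector are illustrative):

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true           # keep the labels set by the downstream server
    metrics_path: /federate
    params:
      'match[]':
        - '{job="darwin"}'       # pull only series belonging to this job
    static_configs:
      - targets: ["prometheus-cluster-2:9090"]   # downstream Prometheus server
```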
Prometheus provides a variety of ways to expose metrics, including through its web UI, an HTTP endpoint, or a push gateway. For the collection and monitoring of specific application or service metrics by the Prometheus server, you can expose those metrics in a format that Prometheus can parse.
The steps to expose Prometheus metrics include instrumenting your application with a Prometheus client library, serving the metrics over an HTTP endpoint (commonly /metrics) in a format Prometheus can parse, and registering that endpoint as a scrape target.
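If you rely on the community Helm chart's default scrape configuration, pods can opt in to scraping through annotations in their pod template, for example (port and path are illustrative):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"   # opt this pod in to scraping
    prometheus.io/path: /metrics   # endpoint serving the metrics
    prometheus.io/port: "8080"     # port the application exposes metrics on
```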
Alerting rules are used to define conditions under which alerts are triggered. As an essential part of monitoring and reliability engineering, you can set up notifications via various channels such as email, Slack, or Squadcast to help detect and resolve issues before they become critical.
In this case, the rule_files field points to a directory containing alert rules, which define the conditions under which alerts are triggered. Triggered alerts get sent to the specified Alertmanager targets, which you can further configure to send notifications to various channels, such as email or the Squadcast platform.
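A sketch of the relevant prometheus.yml sections together with an example rule file (paths, target names, and thresholds are illustrative):

```yaml
# prometheus.yml
rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # illustrative Alertmanager address

---
# /etc/prometheus/rules/darwin-alerts.yml (illustrative rule file)
groups:
  - name: darwin-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0          # target failed its scrape
        for: 5m                # must persist for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 5 minutes"
```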
StorageClass provisioning is an important aspect of Prometheus configuration that ensures metrics data is stored according to a set of properties, such as the volume provisioner, the reclaim policy, and the volume binding mode.
It is important to note that unlike other configuration settings covered in the steps above, StorageClass provisioning is a Kubernetes configuration, and is done by configuring the values.yaml file.
Provisioning a StorageClass for Prometheus involves the following steps:
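A sketch of a StorageClass definition (the name and provisioner are illustrative; use the provisioner appropriate to your cluster):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-storage
provisioner: kubernetes.io/no-provisioner   # replace with your cloud provisioner
reclaimPolicy: Retain                       # keep metrics data if the claim is deleted
volumeBindingMode: WaitForFirstConsumer
```

The Prometheus server can then be pointed at this class by setting server.persistentVolume.storageClass in the chart's values.yaml.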
Some important points to watch out for are given below.
As Prometheus stores time-series data over time, it can grow to consume a significant amount of storage. To size persistent storage appropriately for your Prometheus installation, estimate the metric ingestion rate (number of series and scrape frequency), multiply it by the configured retention period, and provision headroom for growth.
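Retention and volume size can be set in the chart's values.yaml (numbers are illustrative):

```yaml
server:
  retention: 15d          # drop time-series data older than 15 days
  persistentVolume:
    enabled: true
    size: 50Gi            # size the volume against ingestion rate x retention
```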
Efficient resource management prevents Prometheus from consuming too many or too few resources of a Kubernetes cluster. As a recommended practice, you should enforce resource requests and limits according to the expected workload of Prometheus. This helps to ensure optimal performance of the Prometheus instance and prevents cost escalation by limiting resource usage.
To prevent Prometheus from becoming a target of cyberattacks, it is crucial to restrict public access to it. With network policies, you can define how traffic is allowed to flow between pods in a Kubernetes cluster. When applying network policies at the component level, ensure that only trusted components (such as Grafana or the Alertmanager) can reach the Prometheus server, and that Prometheus itself can still reach its scrape targets.
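A sketch of a policy that limits ingress to the Prometheus server (the pod labels are illustrative and must match your deployment):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-restrict-ingress
  namespace: darwin
spec:
  podSelector:
    matchLabels:
      app: prometheus          # illustrative label on the Prometheus server pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: grafana     # only allow traffic from pods labeled as Grafana
      ports:
        - protocol: TCP
          port: 9090
```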
For robust security, it is best to use dedicated service accounts for Prometheus components to access Kubernetes resources. Make sure to grant these accounts the necessary permissions to access the required resources by binding a cluster role based on the component’s scope. With this, you can limit the access of Kubernetes resources to only what is actually needed for Prometheus.
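A sketch of a dedicated service account with read-only access to the resources Prometheus typically discovers (names are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: darwin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-read
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]   # read-only: enough for service discovery
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-read
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: darwin
```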
ConfigMaps store Prometheus configuration information as key-value pairs and string data. Using ConfigMaps to centrally store configuration makes it easier to update Prometheus settings without having to modify individual Kubernetes deployment files. Recommended steps include mounting the ConfigMap into the Prometheus pod, keeping the configuration files under version control, and reloading the server whenever the configuration changes.
Application performance in a Kubernetes cluster typically depends on the performance of containers, pods, and services. Undeniably, monitoring core components of a Kubernetes cluster is an essential aspect of reliability engineering that helps gain proactive insights into cluster health, workloads, and underlying infrastructure.
For a distributed ecosystem of containerized applications and related dependencies, monitoring Kubernetes using Prometheus can be a complex undertaking. It is crucial to adopt configuration best practices that ensure core Kubernetes components expose metrics securely. Prometheus can then scrape them in real time for rapid analysis and visualization.