
In this article by Gigi Sayfan, author of the book Mastering Kubernetes, we will learn how to monitor Kubernetes with Heapster.


Monitoring Kubernetes with Heapster

Heapster is a Kubernetes project that provides a robust monitoring solution for Kubernetes clusters. It runs as a pod, so it can be managed by Kubernetes itself. Heapster supports Kubernetes and CoreOS clusters and has a modular, flexible design. It collects both operational metrics and events from every node in the cluster, stores them in a persistent backend (with a well-defined schema), and allows visualization and programmatic access. Heapster can be configured to use different backends (or sinks, in Heapster’s parlance) and their corresponding visualization frontends. The most common combination is InfluxDB as the backend and Grafana as the frontend. The Google Cloud platform integrates Heapster with the Google monitoring service. The full list of supported backends is as follows:

  • Log
  • InfluxDB
  • Google Cloud monitoring
  • Google Cloud logging
  • Hawkular-Metrics (metrics only)
  • OpenTSDB
  • Monasca (metrics only)
  • Kafka (metrics only)
  • Riemann (metrics only)
  • Elasticsearch

You can use multiple backends by specifying several sinks on the command line:

--sink=log --sink=influxdb:http://monitoring-influxdb:80/
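If you generate the Heapster command line from a script, the repeated flag can be assembled from a list of sink specifications. The following is a minimal Python sketch; the helper name build_sink_args is hypothetical and not part of Heapster:

```python
def build_sink_args(sinks):
    """Return one --sink=<spec> argument per sink specification."""
    return ["--sink=" + spec for spec in sinks]


# The two sinks from the example above.
args = build_sink_args(["log", "influxdb:http://monitoring-influxdb:80/"])
print(" ".join(args))
```

Each sink gets its own --sink flag, so adding another backend only means appending another specification to the list.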


cAdvisor

cAdvisor is part of the kubelet, which runs on every node. It collects information about the CPU and core usage, memory, network, and file systems of each container. It provides a basic UI on port 4194, but, most importantly for Heapster, it provides all this information through the kubelet. Heapster records the information collected by cAdvisor on each node and stores it in its backend for analysis and visualization.

The cAdvisor UI is useful if you want to quickly verify that a particular node is set up correctly, for example, while creating a new cluster when Heapster is not hooked up yet.

Here is what the cAdvisor UI looks like:

[Screenshot: cAdvisor UI]

InfluxDB backend

InfluxDB is a modern and robust distributed time-series database. It is well suited to, and broadly used for, centralized metrics and logging. It is also the preferred Heapster backend (outside the Google Cloud platform). The one caveat is that InfluxDB clustering and high availability are part of the enterprise offering.

The storage schema

The InfluxDB storage schema defines the information that Heapster stores in InfluxDB and that is available for querying and graphing later. The metrics are divided into multiple categories, called measurements. You can treat and query each metric separately, or you can query a whole category as one measurement and receive the individual metrics as fields. The naming convention is <category>/<metric name> (except for uptime, which has a single metric). If you have a SQL background, you can think of measurements as tables. Each metric is stored per container and is labeled with the following information:

  • pod_id – Unique ID of a pod
  • pod_name – User-provided name of a pod
  • pod_namespace – The namespace of a pod
  • container_base_image – Base image for the container
  • container_name – User-provided name of the container or full cgroup name for system containers
  • host_id – Cloud-provider-specified or user-specified identifier of a node
  • hostname – Hostname where the container ran
  • labels – Comma-separated list of user-provided labels; format is ‘key:value’
  • namespace_id – UID of the namespace of a pod
  • resource_id – A unique identifier used to differentiate multiple metrics of the same type, for example, FS partitions under filesystem/usage
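The naming convention and the labels format are easy to handle in code. Here is a minimal Python sketch that splits a metric name into its category and metric parts and turns a labels string into a dictionary. Both function names and the sample label values are hypothetical:

```python
def split_metric_name(name):
    """Split '<category>/<metric name>' into (category, metric).

    A name without a slash, such as 'uptime', is its own category.
    """
    category, sep, metric = name.partition("/")
    return (category, metric) if sep else (name, name)


def parse_labels(labels):
    """Parse a comma-separated 'key:value' labels string into a dict."""
    result = {}
    for item in labels.split(","):
        if not item:
            continue  # skip empty entries, for example from an empty string
        key, _, value = item.partition(":")
        result[key] = value
    return result


print(split_metric_name("filesystem/usage"))  # ('filesystem', 'usage')
print(split_metric_name("uptime"))            # ('uptime', 'uptime')
print(parse_labels("app:web,tier:frontend"))  # {'app': 'web', 'tier': 'frontend'}
```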

Here are all the metrics grouped by category. As you can see, it is quite extensive.

CPU

  • cpu/limit – CPU hard limit in millicores
  • cpu/node_capacity – CPU capacity of a node
  • cpu/node_allocatable – CPU allocatable of a node
  • cpu/node_reservation – Share of CPU that is reserved on the node allocatable
  • cpu/node_utilization – CPU utilization as a share of node allocatable
  • cpu/request – CPU request (the guaranteed amount of resources) in millicores
  • cpu/usage – Cumulative CPU usage on all cores
  • cpu/usage_rate – CPU usage on all cores in millicores

File system

  • filesystem/usage – Total number of bytes consumed on a filesystem
  • filesystem/limit – The total size of the filesystem in bytes
  • filesystem/available – The number of available bytes remaining in the filesystem

Memory

  • memory/limit – Memory hard limit in bytes
  • memory/major_page_faults – Number of major page faults
  • memory/major_page_faults_rate – Number of major page faults per second
  • memory/node_capacity – Memory capacity of a node
  • memory/node_allocatable – Memory allocatable of a node
  • memory/node_reservation – Share of memory that is reserved on the node allocatable
  • memory/node_utilization – Memory utilization as a share of memory allocatable
  • memory/page_faults – Number of page faults
  • memory/page_faults_rate – Number of page faults per second
  • memory/request – Memory request (the guaranteed amount of resources) in bytes
  • memory/usage – Total memory usage
  • memory/working_set – Total working set usage; working set is the memory being used and not easily dropped by the kernel

Network

  • network/rx – Cumulative number of bytes received over the network
  • network/rx_errors – Cumulative number of errors while receiving over the network
  • network/rx_errors_rate – Number of errors per second while receiving over the network
  • network/rx_rate – Number of bytes received over the network per second
  • network/tx – Cumulative number of bytes sent over the network
  • network/tx_errors – Cumulative number of errors while sending over the network
  • network/tx_errors_rate – Number of errors per second while sending over the network
  • network/tx_rate – Number of bytes sent over the network per second

Uptime

  • uptime – Number of milliseconds since the container was started
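Several metrics come in pairs: a cumulative counter (such as network/rx) and a matching rate (such as network/rx_rate). The Python sketch below only illustrates the arithmetic that relates the two. It is not Heapster's implementation, the sample values are made up, and it does not handle counter resets:

```python
def rate_per_second(earlier, later):
    """Average per-second rate between two (timestamp_seconds, cumulative_value) samples."""
    (t0, v0), (t1, v1) = earlier, later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / elapsed


# Two hypothetical network/rx samples taken 60 seconds apart.
rx_rate = rate_per_second((0, 1_000_000), (60, 1_600_000))
print(rx_rate)  # 10000.0 bytes per second
```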

You can work with InfluxDB directly if you’re familiar with it. You can either connect to it using its own API or use its web interface. Type the following command to find its port:

kubectl describe service monitoring-influxdb --namespace=kube-system | grep NodePort

Type:                   NodePort
NodePort:               http    32699/TCP
NodePort:               api     30020/TCP
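If you want to pick up these ports in a script rather than read them by eye, you can parse the filtered output. The following Python sketch assumes the output has the same shape as the sample above; the function name parse_node_ports is hypothetical:

```python
def parse_node_ports(describe_output):
    """Map each port name to its node port number.

    Expects lines such as 'NodePort:   http    32699/TCP'.
    """
    ports = {}
    for line in describe_output.splitlines():
        parts = line.split()
        # A port line has three fields: 'NodePort:', the name, and '<port>/<protocol>'.
        if len(parts) == 3 and parts[0] == "NodePort:":
            name, port_and_protocol = parts[1], parts[2]
            ports[name] = int(port_and_protocol.split("/")[0])
    return ports


sample = """\
Type:                   NodePort
NodePort:               http    32699/TCP
NodePort:               api     30020/TCP
"""
print(parse_node_ports(sample))  # {'http': 32699, 'api': 30020}
```

The "Type: NodePort" line has only two fields, so it is skipped.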

Now you can browse to the InfluxDB web interface using the HTTP port. You’ll need to configure it to point to the API port. The username and password are root and root by default:

[Screenshot: InfluxDB web interface]

Once you’re set up, you can select which database to use (see the top-right corner). The Kubernetes database is called k8s. You can now query the metrics using the InfluxDB query language.
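The same queries can also be sent to the API port over HTTP. The Python sketch below only builds the request URL for InfluxDB's /query endpoint and does not contact a server. The host name node.example.com and the pod name my-pod are placeholders, the port is the api NodePort from the sample output, and the query assumes that each measurement stores its number in a field named value:

```python
from urllib.parse import urlencode


def build_query_url(host, api_port, database, query):
    """Build a URL for InfluxDB's HTTP /query endpoint."""
    params = urlencode({"db": database, "q": query})
    return "http://{}:{}/query?{}".format(host, api_port, params)


# Assumption: the metric value lives in a field named "value".
# The measurement name contains a slash, so it must be double-quoted.
query = (
    'SELECT mean("value") FROM "cpu/usage_rate" '
    "WHERE \"pod_name\" = 'my-pod' AND time > now() - 1h "
    "GROUP BY time(5m)"
)
url = build_query_url("node.example.com", 30020, "k8s", query)
print(url)
```

You can paste the resulting URL into a browser or pass it to an HTTP client, supplying credentials if your InfluxDB instance requires them.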

Grafana visualization

Grafana runs in its own container and serves a sophisticated dashboard that works well with InfluxDB as a data source. To locate the port, type the following command:

kubectl describe service monitoring-grafana --namespace=kube-system | grep NodePort

Type:                   NodePort
NodePort:               <unset> 30763/TCP

Now you can access the Grafana web interface on that port. The first thing you need to do is set up the data source to point to the InfluxDB backend:

[Screenshot: Grafana data source configuration]

Make sure to test the connection and then explore the various options in the dashboards. There are several default dashboards, but you can customize them to your preferences. Grafana is designed to let you adapt it to your needs.

Summary

In this article, we have learned how to monitor Kubernetes with Heapster.
