Monitoring, Logging, and Troubleshooting

June 20, 2017 - 12:00 am

In this article by Gigi Sayfan, the author of the book Mastering Kubernetes, we will learn how to monitor Kubernetes with Heapster.

Monitoring Kubernetes with Heapster

Heapster is a Kubernetes project that provides a robust monitoring solution for Kubernetes clusters. It runs as a pod (of course), so it can be managed by Kubernetes itself. Heapster supports Kubernetes and CoreOS clusters. It has a very modular and flexible design. Heapster collects both operational metrics and events from every node in the cluster, stores them in a persistent backend (with a well-defined schema), and allows visualization and programmatic access. Heapster can be configured to use different backends (or sinks, in Heapster’s parlance) and their corresponding visualization frontends. The most common combination is InfluxDB as the backend and Grafana as the frontend. The Google Cloud platform integrates Heapster with the Google monitoring service. Heapster supports the following backends, most of them less common:

Log
InfluxDB
Google Cloud monitoring
Google Cloud logging
Hawkular-Metrics (metrics only)
OpenTSDB
Monasca (metrics only)
Kafka (metrics only)
Riemann (metrics only)
Elasticsearch

You can use multiple backends by specifying sinks on the command-line:

--sink=log --sink=influxdb:http://monitoring-influxdb:80/
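The repeated flag shown above can be generated from a list of sink specifications. The following Python sketch is only an illustration; the helper name sink_flags is hypothetical and not part of Heapster.

```python
def sink_flags(sinks):
    """Return one --sink=<spec> flag per sink specification."""
    return [f"--sink={spec}" for spec in sinks]


# The two sinks from the command line shown above.
flags = sink_flags(["log", "influxdb:http://monitoring-influxdb:80/"])
print(" ".join(flags))
```

Each entry becomes its own --sink flag, which matches the way multiple backends are specified on the command line.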

cAdvisor

cAdvisor is part of the kubelet, which runs on every node. It collects information about the CPU/core usage, memory, network, and file systems of each container. It provides a basic UI on port 4194, but, most importantly for Heapster, it provides all this information through the kubelet. Heapster records the information collected by cAdvisor on each node and stores it in its backend for analysis and visualization.

The cAdvisor UI is useful if you want to quickly verify that a particular node is set up correctly, for example, while creating a new cluster when Heapster is not hooked up yet.

[Screenshot: the cAdvisor UI]

InfluxDB backend

InfluxDB is a modern and robust distributed time-series database. It is well suited to, and broadly used for, centralized metrics and logging. It is also the preferred Heapster backend (outside the Google Cloud platform). The one caveat is that InfluxDB clustering and high availability are part of the enterprise offering.

The storage schema

The InfluxDB storage schema defines the information that Heapster stores in InfluxDB and that is available for querying and graphing later. The metrics are divided into multiple categories, called measurements. You can treat and query each metric separately, or you can query a whole category as one measurement and receive the individual metrics as fields. The naming convention is <category>/<metric name> (except for uptime, which has a single metric). If you have a SQL background, you can think of measurements as tables. Metrics are stored per container. Each metric is labeled with the following information:

pod_id – Unique ID of a pod
pod_name – User-provided name of a pod
pod_namespace – The namespace of a pod
container_base_image – Base image for the container
container_name – User-provided name of the container or full cgroup name for system containers
host_id – Cloud-provider-specified or user-specified identifier of a node
hostname – Hostname where the container ran
labels – Comma-separated list of user-provided labels; format is key:value
namespace_id – UID of the namespace of a pod
resource_id – A unique identifier used to differentiate multiple metrics of the same type, for example, FS partitions under filesystem/usage
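The labels entry above packs several key:value pairs into one comma-separated string. A minimal Python sketch of how such a string could be unpacked follows; the function name parse_labels and the sample label values are hypothetical, and the sketch assumes that keys contain no colon.

```python
def parse_labels(raw):
    """Split a 'key:value,key:value' string into a dictionary."""
    labels = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        # Split on the first colon only, so values may contain colons.
        key, _, value = item.partition(":")
        labels[key] = value
    return labels


print(parse_labels("app:web,tier:frontend"))
```

Splitting on the first colon keeps a value such as a host:port pair intact.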

Here are all the metrics grouped by category. As you can see, the list is quite extensive.

CPU

cpu/limit – CPU hard limit in millicores
cpu/node_capacity – CPU capacity of a node
cpu/node_allocatable – CPU allocatable of a node
cpu/node_reservation – Share of CPU that is reserved on the node allocatable
cpu/node_utilization – CPU utilization as a share of node allocatable
cpu/request – CPU request (the guaranteed amount of resources) in millicores
cpu/usage – Cumulative CPU usage on all cores
cpu/usage_rate – CPU usage on all cores in millicores
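Millicores are thousandths of a core, so the CPU values above are easy to convert. The Python sketch below illustrates the arithmetic; the function names are hypothetical, and it assumes that the allocatable value is expressed in millicores as well, which the list above does not state.

```python
def millicores_to_cores(millicores):
    """Convert millicores to cores (1000 millicores = 1 core)."""
    return millicores / 1000.0


def cpu_share(usage_rate_millicores, allocatable_millicores):
    """Return CPU usage as a fraction of the allocatable CPU.

    Assumption: both arguments are in millicores.
    """
    if allocatable_millicores <= 0:
        raise ValueError("allocatable CPU must be positive")
    return usage_rate_millicores / allocatable_millicores


print(millicores_to_cores(250))  # a quarter of one core
print(cpu_share(500, 2000))      # half a core used out of two allocatable
```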

File system

filesystem/usage – Total number of bytes consumed on a filesystem
filesystem/limit – The total size of the filesystem in bytes
filesystem/available – The number of available bytes remaining in the filesystem

Memory

memory/limit – Memory hard limit in bytes
memory/major_page_faults – Number of major page faults
memory/major_page_faults_rate – Number of major page faults per second
memory/node_capacity – Memory capacity of a node
memory/node_allocatable – Memory allocatable of a node
memory/node_reservation – Share of memory that is reserved on the node allocatable
memory/node_utilization – Memory utilization as a share of memory allocatable
memory/page_faults – Number of page faults
memory/page_faults_rate – Number of page faults per second
memory/request – Memory request (the guaranteed amount of resources) in bytes
memory/usage – Total memory usage
memory/working_set – Total working set usage; working set is the memory being used and not easily dropped by the kernel

Network

network/rx – Cumulative number of bytes received over the network
network/rx_errors – Cumulative number of errors while receiving over the network
network/rx_errors_rate – Number of errors per second while receiving over the network
network/rx_rate – Number of bytes received over the network per second
network/tx – Cumulative number of bytes sent over the network
network/tx_errors – Cumulative number of errors while sending over the network
network/tx_errors_rate – Number of errors per second while sending over the network
network/tx_rate – Number of bytes sent over the network per second
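The cumulative metrics and the per-second rate metrics above are related: a rate is the change in a cumulative counter divided by the elapsed time. The following Python sketch shows that relationship under simple assumptions; it is not Heapster's actual implementation, and the function name and the sample values are hypothetical.

```python
def rate_per_second(previous, current):
    """Derive a per-second rate from two (timestamp_seconds, counter) samples.

    Returns None if the counter went backwards, which is treated here
    as a counter reset (for example, after a container restart).
    """
    (t0, v0), (t1, v1) = previous, current
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        return None
    return (v1 - v0) / elapsed


# 6000 bytes received over 60 seconds.
print(rate_per_second((0.0, 1000), (60.0, 7000)))
```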

Uptime

uptime – Number of milliseconds since the container was started
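The <category>/<metric name> convention makes it straightforward to group metric names programmatically. Here is a minimal Python sketch; the helper names are hypothetical, and uptime, which has no slash, is treated as a category with a single metric of the same name.

```python
from collections import defaultdict


def split_metric(name):
    """Split 'category/metric' into its parts; 'uptime' maps to itself."""
    category, separator, metric = name.partition("/")
    if not separator:
        return name, name
    return category, metric


def group_by_category(names):
    """Group metric names by their category prefix."""
    groups = defaultdict(list)
    for name in names:
        category, metric = split_metric(name)
        groups[category].append(metric)
    return dict(groups)


print(group_by_category(["cpu/limit", "cpu/usage", "memory/usage", "uptime"]))
```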

You can work with InfluxDB directly if you’re familiar with it. You can either connect to it using its own API or use its web interface. Type the following command to find its port:

kubectl describe service monitoring-influxdb --namespace=kube-system | grep NodePort

Type:                   NodePort
NodePort:               http    32699/TCP
NodePort:               api     30020/TCP
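If you want to pick the port numbers out of this output in a script, a few lines of Python are enough. The sketch below is an illustration only; the function name parse_node_ports is hypothetical, and it assumes the three-column layout of the NodePort lines shown above.

```python
SAMPLE_OUTPUT = """\
Type:                   NodePort
NodePort:               http    32699/TCP
NodePort:               api     30020/TCP
"""


def parse_node_ports(text):
    """Map each port name to its node port number."""
    ports = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: ['NodePort:', '<name>', '<port>/<protocol>']
        if len(parts) == 3 and parts[0] == "NodePort:":
            port, _, _protocol = parts[2].partition("/")
            ports[parts[1]] = int(port)
    return ports


print(parse_node_ports(SAMPLE_OUTPUT))
```

The Type line is skipped because it has only two columns.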

Now you can browse to the InfluxDB web interface using the HTTP port. You’ll need to configure it to point to the API port. The username and password are root and root by default:

[Screenshot: the InfluxDB web interface]

Once you’re set up, you can select which database to use (see the top-right corner). The Kubernetes database is called k8s. You can now query the metrics using the InfluxDB query language.
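Queries can also be sent to the API port programmatically. The Python sketch below only builds the request URL for InfluxDB's HTTP /query endpoint and does not contact a server. The host name node.example is a placeholder, the port is the API node port from the earlier output, and the helper name influx_query_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def influx_query_url(host, api_port, query, database="k8s"):
    """Build a URL for InfluxDB's HTTP /query endpoint."""
    params = urlencode({"db": database, "q": query})
    return f"http://{host}:{api_port}/query?{params}"


# Measurement names contain a slash, so they are double-quoted in the query.
query = 'SELECT * FROM "cpu/usage_rate" LIMIT 10'
url = influx_query_url("node.example", 30020, query)
print(url)

# Round-trip check: decoding the URL gives back the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["db"], decoded["q"])
```

The resulting URL can be fetched with any HTTP client. If authentication is enabled, credentials have to be supplied as well.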

Grafana visualization

Grafana runs in its own container and serves a sophisticated dashboard that works well with InfluxDB as a data source. To locate the port, type the following command:

kubectl describe service monitoring-grafana --namespace=kube-system | grep NodePort

Type:                   NodePort
NodePort:               <unset> 30763/TCP

Now you can access the Grafana web interface on that port. The first thing you need to do is set up the data source to point to the InfluxDB backend:

[Screenshot: the Grafana data source configuration]

Make sure to test the connection and then go explore the various options in the dashboards. There are several default dashboards, but you should be able to customize them to your preferences. Grafana is designed to let you adapt it to your needs.
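The data source can also be described as data rather than entered by hand. The Python sketch below builds a JSON document in the shape accepted by Grafana's HTTP API for creating data sources (POST /api/datasources); it does not send anything. The data source name, the InfluxDB URL with port 8086, and the helper name are assumptions for illustration; the database name and the root/root credentials come from the text above.

```python
import json


def grafana_influxdb_datasource(name, influxdb_url, database="k8s",
                                user="root", password="root"):
    """Return a data source definition for an InfluxDB backend."""
    return {
        "name": name,
        "type": "influxdb",
        "url": influxdb_url,
        "access": "proxy",  # Grafana's server, not the browser, calls InfluxDB
        "database": database,
        "user": user,
        "password": password,
        "isDefault": True,
    }


# Assumed in-cluster service URL; adjust it to your deployment.
payload = grafana_influxdb_datasource("influxdb-k8s",
                                      "http://monitoring-influxdb:8086")
print(json.dumps(payload, indent=2))
```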

Summary

In this article, we learned how to monitor Kubernetes with Heapster.
