[box type="note" align="" class="" width=""]This article is an excerpt from a book written by David Blomquist and Tomasz Janiszewski, titled Apache Mesos Cookbook. Throughout the course of the book, you will get to know tips and tricks along with best practices to follow when working with Mesos.[/box]
In this article, we will learn about configuring logging options, setting up a monitoring ecosystem, and upgrading your Mesos cluster.
Here we will configure logging options that will allow us to debug the state of Mesos.
We will assume Mesos is available on localhost port 5050 (the default master port; agents listen on port 5051 by default). The steps provided here work for both masters and agents.
When Mesos is installed from pre-built packages, logs are stored in /var/log/mesos/ by default. When Mesos is built from source, storing logs is disabled by default. To change where logs are stored, edit /etc/default/mesos and set the LOGS variable to the desired destination. For some reason, mesos-init-wrapper does not pass the contents of /etc/mesos/log_dir to the --log_dir flag, which is why the log destination has to be set through this environment variable.
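The LOGS change can be scripted. The following is a minimal sketch, assuming /etc/default/mesos is a plain KEY=value shell file as in the pre-built packages and that GNU sed is available; set_mesos_log_dir is a hypothetical helper name, and the file path is a parameter so the helper can be tried on a scratch copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point LOGS in a Mesos defaults file at a new directory.
# Assumes plain KEY=value lines, as in /etc/default/mesos from the packages.
# Note: log_dir must not contain '|' or '&', which are special to sed here.
set_mesos_log_dir() {
  local defaults_file="$1" log_dir="$2"
  if grep -q '^LOGS=' "$defaults_file" 2>/dev/null; then
    # Replace the existing LOGS line in place (GNU sed syntax).
    sed -i "s|^LOGS=.*|LOGS=${log_dir}|" "$defaults_file"
  else
    # No LOGS line yet: append one (this creates the file if it is missing).
    printf 'LOGS=%s\n' "$log_dir" >> "$defaults_file"
  fi
}
```

Run it as root, for example set_mesos_log_dir /etc/default/mesos /data/mesos/logs; the new value should be picked up the next time the Mesos service is started through mesos-init-wrapper.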
Remember that only Mesos logs will be stored there. Logs from third-party applications (for example, ZooKeeper) will still be sent to STDERR.
The default logging level can be changed in one of two ways: by specifying the --logging_level flag, or by sending a request that changes the logging level at runtime for a specific period of time.
For example, to set the logging level to INFO, write it to the /etc/mesos/logging_level file:
echo INFO > /etc/mesos/logging_level
The possible levels are INFO, WARNING, and ERROR.
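Because this value ends up in the --logging_level flag, a small guard against typos can help. This sketch assumes, as above, that the init wrapper reads one flag per file from the configuration directory; set_logging_level is a hypothetical name, and it accepts only the three levels listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist the --logging_level value for the init wrapper,
# rejecting anything other than the three documented levels.
set_logging_level() {
  local conf_dir="$1" level="$2"
  case "$level" in
    INFO|WARNING|ERROR) ;;
    *)
      echo "unsupported logging level: $level" >&2
      return 1
      ;;
  esac
  # Overwrite the file with the single validated value.
  printf '%s\n' "$level" > "${conf_dir}/logging_level"
}
```

For example, run set_logging_level /etc/mesos WARNING as root, then restart the Mesos service.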
For example, to switch to the most verbose logging (verbosity level 3) for 15 minutes for debugging purposes, send the following request to the /logging/toggle endpoint:
curl -v -X POST "localhost:5050/logging/toggle?level=3&duration=15mins"
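The URL has to be quoted: without quotes, the shell treats & as a control operator, runs curl in the background, and never sends the duration parameter. A minimal wrapper, where toggle_verbose_logging is a hypothetical name and the defaults mirror the request above:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the /logging/toggle endpoint.
# Usage: toggle_verbose_logging [host:port] [level] [duration]
toggle_verbose_logging() {
  local endpoint="${1:-localhost:5050}" level="${2:-3}" duration="${3:-15mins}"
  # -f: fail on HTTP errors; -sS: no progress bar, but still print errors.
  # The URL is quoted so that '&' reaches curl instead of the shell.
  curl -fsS -X POST "http://${endpoint}/logging/toggle?level=${level}&duration=${duration}"
}
```

For an agent, pass its address explicitly, for example toggle_verbose_logging localhost:5051.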
Mesos uses Google's glog library for logging, but third-party dependencies such as ZooKeeper have their own logging solutions. All of the configuration options above are backed by glog and apply only to the Mesos core code.
Now, we will set up monitoring for Mesos.
We need a running monitoring ecosystem. Metrics storage can be a simple time-series database such as Graphite, InfluxDB, or Prometheus. In the following example, we use Graphite, and our metrics are published with Diamond (http://diamond.readthedocs.io/en/latest/).
Monitoring is enabled by default. Mesos does not provide any way to automatically push metrics to the registry. However, it exposes them as JSON that can be periodically pulled and saved into the metrics registry. Diamond is written in Python and can be installed with pip (Pip Installs Packages), the Python package manager. To send metrics to Graphite, configure a Graphite handler in Diamond's configuration:
[handler_graphite]
class = handlers.GraphiteHandler
host = <graphite.host>
port = <graphite.port>
Remember to replace <graphite.host> and <graphite.port> with your Graphite server's host name and port.
Mesos exposes metrics via its HTTP API. Diamond is a small process that periodically pulls metrics, parses them, and sends them to the metrics registry, in this case Graphite. Diamond's default Mesos collector does not gather all the available metrics, so it's recommended to write a custom collector that gathers all the information you are interested in.
Metrics can be read from the following endpoints (each link points to the endpoint's documentation):
http://mesos.apache.org/documentation/latest/endpoints/metrics/snapshot/
http://mesos.apache.org/documentation/latest/endpoints/slave/monitor/statistics/
http://mesos.apache.org/documentation/latest/endpoints/slave/state/
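Diamond performs this pull, parse, and send loop for you. To make the cycle concrete, here is a minimal sketch of the last two steps, assuming the /metrics/snapshot JSON (a flat object mapping metric names to numbers) has already been flattened into "name value" lines, for example with jq, and that Graphite's Carbon daemon accepts its plaintext protocol ("path value timestamp") on the default port 2003. The helper name to_graphite and the metric prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "metric_name value" lines into Graphite plaintext
# lines of the form "path value timestamp".
# Mesos metric names use '/' as a separator; Graphite paths use '.'.
to_graphite() {
  local prefix="$1" timestamp="$2"
  awk -v prefix="$prefix" -v ts="$timestamp" '
    NF == 2 {
      name = $1
      gsub("/", ".", name)
      print prefix "." name " " $2 " " ts
    }'
}
```

A one-off push might then look like curl -s localhost:5050/metrics/snapshot | jq -r 'to_entries[] | "\(.key) \(.value)"' | to_graphite mesos "$(date +%s)" | nc <graphite.host> 2003, although netcat options for closing the connection differ between variants.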
In this recipe, you will learn how to upgrade your Mesos cluster.
Mesos has a release cadence of at least one release per quarter. Minor releases are backward compatible, although there may be small incompatibilities, or deprecated methods may be dropped. The recommended way to upgrade is to apply every intermediate version. For example, to upgrade from 0.27.2 to 1.0.0, we should apply 0.28.0, 0.28.1, 0.28.2, and finally 1.0.0.
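Working out the chain of intermediate versions is simple enough to script. This sketch assumes you supply the complete, ordered list of released versions yourself (the function does not look anything up), and upgrade_path is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every version to apply, in order, to get from
# 'from' to 'to'. The remaining arguments are the caller-supplied, ordered
# list of released versions.
upgrade_path() {
  local from="$1" to="$2" v started=0
  local -a steps=()
  shift 2
  for v in "$@"; do
    if (( started )); then
      steps+=("$v")
      if [[ $v == "$to" ]]; then
        printf '%s\n' "${steps[@]}"
        return 0
      fi
    elif [[ $v == "$from" ]]; then
      started=1
    fi
  done
  # 'from' or 'to' is missing from the list, or 'to' comes before 'from'.
  echo "no upgrade path from $from to $to in the given list" >&2
  return 1
}
```

For the example above, upgrade_path 0.27.2 1.0.0 0.27.2 0.28.0 0.28.1 0.28.2 1.0.0 prints 0.28.0, 0.28.1, 0.28.2, and 1.0.0, one per line.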
If an agent's configuration changes, its metadata directory must be cleared. You can do this with the following command:
rm -rv {MESOS_DIR}/meta
Here, {MESOS_DIR} should be replaced with the agent's configured work directory (the value of its --work_dir flag). A rolling upgrade is the preferred method of upgrading a cluster: upgrade the masters first, then the agents.
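Because this is a recursive delete usually run as root, it is worth guarding against an unset or mistyped path. A minimal sketch, where clear_agent_metadata is a hypothetical helper that takes the full path of the metadata directory and should only be run while the agent is stopped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove an agent's checkpointed metadata directory.
# Run only while the agent is stopped.
clear_agent_metadata() {
  local meta_dir="$1"
  # Refuse empty input, the filesystem root, and anything that is not an
  # existing directory, so an unset variable cannot turn this into 'rm -rv /'.
  if [[ -z $meta_dir || $meta_dir == / || ! -d $meta_dir ]]; then
    echo "refusing to remove '$meta_dir'" >&2
    return 1
  fi
  rm -rv -- "$meta_dir"
}
```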
To minimize the impact on running tasks, an agent whose configuration changes, and which will therefore be inaccessible for a while, should first be switched to maintenance mode.
Configuration changes may require clearing the metadata because the changes may not be backward compatible. For example, if an agent is restarted with different isolators, it shouldn't reattach to processes that are already running without those isolators. The Mesos architecture guarantees that executors that fail to reconnect to the Mesos agent terminate themselves after a configurable amount of time (--recovery_timeout).
Maintenance mode allows you to declare a time window during which the agent will be inaccessible. Mesos then sends an inverse offer to the frameworks, asking them to drain that particular agent. The frameworks are responsible for shutting down their tasks and starting them on another agent. Maintenance mode is applied even if a framework does not implement the HTTP API or explicitly declines the inverse offer. Using maintenance mode can prevent tasks from being restarted multiple times.
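Maintenance windows are declared by posting a schedule to the master. The sketch below builds a one-window schedule for a single machine, assuming the JSON layout described in the Mesos maintenance documentation (a windows array whose entries list machine_ids by hostname and IP, with an unavailability start and duration in nanoseconds); maintenance_schedule_json is a hypothetical name, and hostnames and IPs are not JSON-escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a one-window maintenance schedule for a single
# machine. The JSON layout is assumed from the Mesos maintenance docs; check
# it against your Mesos version. Hostname and IP are not JSON-escaped.
maintenance_schedule_json() {
  local hostname="$1" ip="$2" start_s="$3" duration_s="$4"
  # Convert seconds to nanoseconds (fits in bash's 64-bit integers).
  local start_ns=$(( start_s * 1000000000 ))
  local duration_ns=$(( duration_s * 1000000000 ))
  printf '{"windows":[{"machine_ids":[{"hostname":"%s","ip":"%s"}],' "$hostname" "$ip"
  printf '"unavailability":{"start":{"nanoseconds":%d},"duration":{"nanoseconds":%d}}}]}\n' \
    "$start_ns" "$duration_ns"
}
```

For example, maintenance_schedule_json agent1.example.com 10.0.0.11 "$(date +%s)" 3600 | curl -X POST -H 'Content-Type: application/json' -d @- localhost:5050/maintenance/schedule would announce a one-hour window starting now; per the maintenance documentation, the machine is then marked down and back up through the master's /machine/down and /machine/up endpoints.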
Consider the following example with five agents and one task, X. We schedule a rolling upgrade of all the agents. Task X is deployed on agent 1. When agent 1 goes down, the task is moved to agent 2, then to 3, and so on. This approach is extremely inefficient: the task is restarted five times, although it only needs to be restarted twice. Maintenance mode enables the framework to schedule the task optimally, moving it to agent 5 when agent 1 goes down and then back to agent 1 when agent 5 goes down:
Figure: Worst-case scenario of a rolling upgrade without maintenance mode, and the optimal solution of a rolling upgrade with maintenance mode.
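The restart counts in this example can be checked with a toy model. It is not how Mesos or any framework actually places tasks: agents 1 to N are upgraded in order, the task starts on agent 1, and count_restarts (a hypothetical name) applies one of two placement rules each time the task's agent goes down.

```shell
#!/usr/bin/env bash
# Toy model of the rolling-upgrade example: agents 1..N are upgraded one at a
# time in order, and a single task starts on agent 1. Each time the agent
# hosting the task goes down, the task is restarted somewhere else.
#   naive       - move to the next agent in upgrade order (wrapping to agent 1)
#   maintenance - move to an already-upgraded agent (agent 1) if one exists,
#                 otherwise to the agent whose maintenance window comes last
count_restarts() {
  local policy="$1" agents="$2" location=1 restarts=0 down
  for (( down = 1; down <= agents; down++ )); do
    # Upgrading an agent that does not host the task costs nothing.
    (( location == down )) || continue
    restarts=$(( restarts + 1 ))
    if [[ $policy == naive ]]; then
      location=$(( down % agents + 1 ))
    elif (( down > 1 )); then
      location=1
    else
      location=$agents
    fi
  done
  echo "$restarts"
}
```

count_restarts naive 5 prints 5 and count_restarts maintenance 5 prints 2, matching the five and two restarts described above.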
We have learned how to run and maintain Mesos. To learn more about managing containers and the scheduler API, check out the book Apache Mesos Cookbook.