In this article by Fabrizio Soppelsa and Chanwit Kaewkasi, the authors of Native Docker Clustering with Swarm, we're going to see how to administer a running Swarm cluster. The topics include scaling the cluster size (adding and removing nodes), updating cluster and node information, handling node status (promotion and demotion), troubleshooting, and graphical interfaces (UIs).
In standalone mode, cluster operations must be performed directly through the swarm container.
We're not going to cover every option in detail. Swarm standalone is not yet deprecated and is still in use, which is why we discuss it here, but it will probably be declared deprecated soon, as it has been superseded by Swarm Mode.
The commands to administer a Docker Swarm standalone cluster are:
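Though we won't go through each of them, the main standalone operations map to subcommands of the swarm image. A quick sketch (the token and addresses are illustrative placeholders):

```shell
# Generate a cluster token using the hosted discovery service
docker run --rm swarm create

# Join a node to the cluster (replace the placeholders with real values)
docker run -d swarm join --addr=<node_ip>:2375 token://<cluster_token>

# Start a Swarm manager for the cluster
docker run -d -p 3376:3376 swarm manage token://<cluster_token>

# List the nodes currently in the cluster
docker run --rm swarm list token://<cluster_token>
```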
You can create Docker hosts whichever way you prefer.
If you plan to use Docker Machine, you will probably hit Machine's limits very soon: even listing machines can take several seconds, as Machine fetches and prints all the information for every host.
My favorite method is to use Machine with the generic driver, delegating host provisioning (operating system installation, network and security group configuration, and so on) to something else (for example, Ansible), and then using Machine to install Docker the proper way:
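For example, once a host has been provisioned by your configuration tool, Machine can take it over through the generic driver; the IP address, SSH user, and key path below are illustrative:

```shell
docker-machine create -d generic \
  --generic-ip-address=203.0.113.10 \
  --generic-ssh-user=ubuntu \
  --generic-ssh-key=~/.ssh/id_rsa \
  mymachine
```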
Then handle hosts with the tool in part 2, or even others.
If you use Machine's generic driver, it will install the latest stable Docker binaries. While we were writing this article, in order to use Docker 1.12 we had to work around this by passing Machine a special option to get the latest, unstable, version of Docker:
docker-machine create -d DRIVER --engine-install-url https://test.docker.com mymachine
For a production Swarm (Mode) cluster, by the time you read this article, 1.12 will already be stable, so this trick will no longer be necessary, unless you need some of the very latest Docker features.
The theory of high availability (HA) suggests that the number of managers must be odd and equal to or greater than 3. This is to guarantee a quorum, that is, a majority of nodes agreeing on which nodes are leading the operations.
If there were two managers, and one went down and came back, it's possible that both would believe themselves to be the leader. This causes a logical crash in the cluster organization called split brain.
The more managers you have, the higher the resistance to failures. Refer to the following table:
Number of managers | Quorum (majority) | Maximum possible failures
------------------ | ----------------- | -------------------------
3                  | 2                 | 1
5                  | 3                 | 2
7                  | 4                 | 3
9                  | 5                 | 4
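The pattern in the table follows a simple rule: the quorum is floor(N/2) + 1, and the cluster tolerates N minus quorum manager failures. A quick sketch:

```shell
#!/bin/bash
# For N managers, quorum = floor(N/2) + 1;
# the cluster survives N - quorum manager failures.
for N in 3 5 7 9; do
  QUORUM=$(( N / 2 + 1 ))
  FAILURES=$(( N - QUORUM ))
  echo "$N managers: quorum=$QUORUM, tolerates $FAILURES failure(s)"
done
```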
Also, in Swarm Mode, an overlay network is created automatically and associated with the nodes as the ingress network. Its purpose is to be used with containers:
You will want your containers to be attached to an internal overlay (VXLAN meshed) network to communicate with each other, rather than using public or other networks. Swarm creates this network for you, ready to use.
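You can inspect the automatically created ingress network, and add further overlay networks for your own services, with commands like these (the network name mynet is just an example):

```shell
# List overlay networks; 'ingress' is created by Swarm Mode itself
docker network ls --filter driver=overlay

# Create an additional VXLAN-meshed overlay network for containers
docker network create --driver overlay mynet
```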
We further recommend distributing managers geographically. If an earthquake hit the datacenter where all the managers are serving, the cluster would go down, wouldn't it? So, consider placing each manager, or group of managers, in a different physical location. With the advent of cloud computing this is really easy: you can spawn each manager in a different AWS region, or better still run each manager on a different provider in a different region, that is, on AWS, DigitalOcean, Azure, and also on a private cloud, such as OpenStack.
[Image: a world map with managers scattered across continents]
You can add an arbitrary number of workers. This is the elastic part of Swarm: it's totally fine to have 5, 15, 200, or 2,300 running workers. This is the easiest part to handle: you can add and remove workers with no burden, at any time, at any scale.
The very easiest way to add nodes, if you don't plan to exceed 100 nodes in total, is to use basic scripting.
When you run docker swarm init, just copy and paste the join command printed in its output.
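For reference, docker swarm init prints the exact join command to paste on workers; the node ID, token, and IP in this sketch are illustrative:

```shell
$ docker swarm init --advertise-addr 172.31.10.250
Swarm initialized: current node (dxn1zf6l61qsb1josjja83ngz) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-<token> 172.31.10.250:2377
```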
Then, create a bunch of workers:
#!/bin/bash
for i in `seq 0 9`; do
docker-machine create -d amazonec2 --engine-install-url https://test.docker.com --amazonec2-instance-type "t2.large" swarm-worker-$i
done
After that, you only need to go through the list of machines, SSH into each, and join the nodes:
#!/bin/bash
SWARMWORKER="swarm-worker-"
for machine in `docker-machine ls --format '{{.Name}}' | grep $SWARMWORKER`;
do
docker-machine ssh $machine sudo docker swarm join --token SWMTKN-1-5c3mlb7rqytm0nk795th0z0eocmcmt7i743ybsffad5e04yvxt-9m54q8xx8m1wa1g68im8srcme 172.31.10.250:2377
done
This script loops through the machines and, for each whose name starts with swarm-worker-, SSHes into it and joins the node to the existing Swarm through the leader manager, here 172.31.10.250.
Refer to https://github.com/swarm2k/swarm2k/tree/master/amazonec2 for some further details or to download these one liners.
Belt is another tool for massively provisioning Docker Engines. It is basically an SSH wrapper on steroids, and it requires you to prepare provider-specific images as well as provisioning templates before provisioning at scale. In this section, we'll learn how to do so.
You can compile Belt yourself by getting its source from GitHub:
# Set $GOPATH here
go get github.com/chanwit/belt
Currently, Belt supports the DigitalOcean driver. We can prepare our provisioning template in config.yml, such as the following:
digitalocean:
  image: "docker-1.12-rc4"
  region: nyc3
  ssh_key_fingerprint: "your SSH ID"
  ssh_user: root
Then we can create a hundred nodes basically with a couple of commands.
First, we create three boxes of 16 GB each, namely mg0, mg1, and mg2:
$ belt create 16gb mg[0:2]
NAME IPv4 MEMORY REGION IMAGE STATUS
mg2 104.236.231.136 16384 nyc3 Ubuntu docker-1.12-rc4 active
mg1 45.55.136.207 16384 nyc3 Ubuntu docker-1.12-rc4 active
mg0 45.55.145.205 16384 nyc3 Ubuntu docker-1.12-rc4 active
Then we can use the status command to wait for all nodes to become active:
$ belt status --wait active=3
STATUS #NODES NAMES
active 3 mg2, mg1, mg0
We'll do this again for 10 worker nodes.
$ belt create 512mb node[1:10]
$ belt status --wait active=13
STATUS   #NODES   NAMES
active   13       node10, node9, node8, node7, ...
Alternatively, you can use Ansible (which is becoming very popular) to make things more repeatable. I (Fabrizio) created some Ansible modules to work directly with Machine and Swarm Mode, compatible with Docker 1.12 (https://github.com/fsoppelsa/ansible-swarm). They require Ansible 2.2+, the first version of Ansible compatible with binary modules.
You will need to compile the modules (written in Go) and then pass them to ansible-playbook with the -M parameter:
git clone https://github.com/fsoppelsa/ansible-swarm
cd ansible-swarm/library
go build docker_machine.go
go build docker_swarm.go
cd ..
There are some example plays in playbooks/. Ansible's play syntax is so easy to understand that it's almost superfluous to explain it in detail.
I used this play to join 10 workers to the Swarm2k experiment:
---
- name: Join the Swarm2k project
  hosts: localhost
  connection: local
  gather_facts: False

  #mg0 104.236.18.183
  #mg1 104.236.78.154
  #mg2 104.236.87.10

  tasks:
    - name: Load shell variables
      shell: >
        eval $(docker-machine env "{{ machine_name }}") &&
        echo $DOCKER_TLS_VERIFY &&
        echo $DOCKER_HOST &&
        echo $DOCKER_CERT_PATH &&
        echo $DOCKER_MACHINE_NAME
      register: worker

    # stdout_lines: [0]=TLS_VERIFY, [1]=HOST, [2]=CERT_PATH, [3]=MACHINE_NAME
    - name: Set facts
      set_fact:
        whost: "{{ worker.stdout_lines[1] }}"
        wcert: "{{ worker.stdout_lines[2] }}"

    - name: Join a worker to Swarm2k
      docker_swarm:
        role: "worker"
        operation: "join"
        join_url: ["tcp://104.236.78.154:2377"]
        secret: "d0cker_swarm_2k"
        docker_url: "{{ whost }}"
        tls_path: "{{ wcert }}"
      register: swarm_result

    - name: Print final msg
      debug: msg="{{ swarm_result.msg }}"
Basically, after loading some host facts from Machine, it invokes the docker_swarm module.
After having docker_swarm.go compiled in library/, adding workers to the swarm is as easy as:
#!/bin/bash
SWARMWORKER="swarm-worker-"
for machine in `docker-machine ls --format '{{.Name}}' | grep $SWARMWORKER`;
do
ansible-playbook -M library --extra-vars "{machine_name: $machine}" playbook.yaml
done
We'll now operate a little with this example, made of 3 managers and 10 workers.
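From a manager, you can list the cluster members with docker node ls; the hostnames match our example, while the other IDs are elided as illustrative:

```shell
$ docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
ctv03nq6cjmbkc4v1tc644fsi *   manager1   Ready    Active         Leader
...                           worker0    Ready    Active
...                           worker9    Ready    Drain
```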
You can reference the nodes by their hostname (manager1) or by their ID (ctv03nq6cjmbkc4v1tc644fsi). The other columns in the node listing describe the properties of the cluster nodes.
The docker node command comes with several subcommands and options.
Promotion is possible for worker nodes (transforming them into managers), while demotion is possible for manager nodes (transforming them into workers).
When managing the number of managers and workers, always keep in mind the high availability table (an odd number of managers, greater than or equal to 3).
Use the following syntax to promote worker0 and worker1 to managers:
docker node promote worker0
docker node promote worker1
There is no magic behind the curtain: Swarm simply attempts to change the node role with an on-the-fly instruction.
Demotion works the same way (docker node demote worker1). But be careful not to demote the node you're working from, otherwise you'll be locked out.
What happens if you try to demote a leader manager? In this case, the Raft algorithm will start an election, and a new leader will be selected among the active managers.
You may have noticed that worker9 in our example is in Drain availability. This means that the node is in the process of evacuating its tasks (if any), which will be rescheduled elsewhere on the cluster.
You can change the availability of a node by updating its status using the docker node update command:
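For example, to drain a node before maintenance and then bring it back, using worker9 from our example:

```shell
# Evacuate tasks from worker9; they are rescheduled elsewhere
docker node update --availability drain worker9

# Restore worker9 so it accepts tasks again
docker node update --availability active worker9
```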
The --availability option can take either active, pause, or drain. Here we just restored worker9 to the active state.
Another powerful update argument concerns labels. There are --label-add and --label-rm, which respectively allow us to add labels to and remove labels from Swarm nodes.
Docker Swarm labels do not affect Engine labels. It's possible to specify labels when starting the Docker Engine (dockerd […] --label "staging" --label "dev" […]), but Swarm has no power to edit or change them. The labels we see here only affect the Swarm behavior.
Labels are useful to categorize nodes. When you start services, you can then use labels to filter and decide where to physically spawn containers. For instance, if you want to dedicate a bunch of nodes with SSD disks to hosting MySQL, you can do this:
docker node update --label-add disk=ssd --label-add type=mysql worker1
docker node update --label-add disk=ssd --label-add type=mysql worker2
docker node update --label-add disk=ssd --label-add type=mysql worker3
Later, when you start a service with a replica factor, say 3, you can be sure that it will start MySQL containers exactly on worker1, worker2, and worker3, by filtering on the node.labels.type constraint:
docker service create --replicas 3 --constraint 'node.labels.type == mysql' --name mysql-service mysql:5.5
In this article, we went through the typical Swarm administration procedures and options. After showing how to add managers and workers to the cluster, we explained in detail how to update cluster and node properties, how to check the Swarm's health, and we encountered Shipyard as a UI. After this focus on infrastructure, it's now time to use our Swarms.