In this article by Jesse Keating, the author of the book Mastering Ansible - Second Edition, we will cover the following topics in order to lay the foundation for mastering Ansible:

Ansible version and configuration

Inventory parsing and data sources

Variable types and locations

Variable precedence

(For more resources related to this topic, see here.)

This article provides a exploration of the architecture and design of how Ansible goes about performing tasks on your behalf. We will cover basic concepts of inventory parsing and how the data is discovered. We will also cover variable types and find out where variables can be located, the scope they can be used in, and how precedence is determined when variables are defined in more than one location.

Ansible version and configuration

There are many documents out there that cover installing Ansible in a way that is appropriate for the operating system and version that you might be using. This article will assume the use of the Ansible 2.2 version. To discover the version in use on a system with Ansible already installed, make use of the version argument, that is, either ansible or ansible-playbook:

system-architecture-and-design-ansible-img-0

Inventory parsing and data sources

In Ansible, nothing happens without an inventory. Even ad hoc actions performed on localhost require an inventory, even if that inventory consists just of the localhost. The inventory is the most basic building block of Ansible architecture. When executing ansible or ansible-playbook, an inventory must be referenced. Inventories are either files or directories that exist on the same system that runs ansible or ansible-playbook. The location of the inventory can be referenced at runtime with the inventory file (-i) argument, or by defining the path in an Ansible config file.

Inventories can be static or dynamic, or even a combination of both, and Ansible is not limited to a single inventory. The standard practice is to split inventories across logical boundaries, such as staging and production, allowing an engineer to run a set of plays against their staging environment for validation, and then follow with the same exact plays run against the production inventory set.

Static inventory

The static inventory is the most basic of all the inventory options. Typically, a static inventory will consist of a single file in the ini format. Here is an example of a static inventory file describing a single host, mastery.example.name:

mastery.example.name

That is all there is to it. Simply list the names of the systems in your inventory. Of course, this does not take full advantage of all that an inventory has to offer. If every name were listed like this, all plays would have to reference specific host names, or the special all group. This can be quite tedious when developing a playbook that operates across different sets of your infrastructure. At the very least, hosts should be arranged into groups. A design pattern that works well is to arrange your systems into groups based on expected functionality. At first, this may seem difficult if you have an environment where single systems can play many different roles, but that is perfectly fine. Systems in an inventory can exist in more than one group, and groups can even consist of other groups! Additionally, when listing groups and hosts, it's possible to list hosts without a group. These would have to be listed first, before any other group is defined.

Let's build on our previous example and expand our inventory with a few more hosts and some groupings:

[web]
mastery.example.name

[dns]
backend.example.name

[database]
backend.example.name

[frontend:children]
web

[backend:children]
dns
database

What we have created here is a set of three groups with one system in each, and then two more groups, which logically group all three together. Yes, that's right; you can have groups of groups. The syntax used here is [groupname:children], which indicates to Ansible's inventory parser that this group by the name of groupname is nothing more than a grouping of other groups. The children in this case are the names of the other groups. This inventory now allows writing plays against specific hosts, low-level role-specific groups, or high-level logical groupings, or any combination.

By utilizing generic group names, such as dns and database, Ansible plays can reference these generic groups rather than the explicit hosts within. An engineer can create one inventory file that fills in these groups with hosts from a preproduction staging environment and another inventory file with the production versions of these groupings. The playbook content does not need to change when executing on either staging or production environment because it refers to the generic group names that exist in both inventories. Simply refer to the right inventory to execute it in the desired environment.

Dynamic inventories

A static inventory is great and enough for many situations. But there are times when a statically written set of hosts is just too unwieldy to manage. Consider situations where inventory data already exists in a different system, such as LDAP, a cloud computing provider, or an in-house CMDB (inventory, asset tracking, and data warehousing) system. It would be a waste of time and energy to duplicate that data, and in the modern world of on-demand infrastructure, that data would quickly grow stale or disastrously incorrect.

Another example of when a dynamic inventory source might be desired is when your site grows beyond a single set of playbooks. Multiple playbook repositories can fall into the trap of holding multiple copies of the same inventory data, or complicated processes have to be created to reference a single copy of the data. An external inventory can easily be leveraged to access the common inventory data stored outside of the playbook repository to simplify the setup. Thankfully, Ansible is not limited to static inventory files.

A dynamic inventory source (or plugin) is an executable script that Ansible will call at runtime to discover real-time inventory data. This script may reach out into external data sources and return data, or it can just parse local data that already exists but may not be in the Ansible inventory ini format. While it is possible and easy to develop your own dynamic inventory source, Ansible provides a number of example inventory plugins, including but not limited to:

OpenStack Nova

Rackspace Public Cloud

DigitalOcean

Linode

Amazon EC2

Google Compute Engine

Microsoft Azure

Docker

Vagrant

Many of these plugins require some level of configuration, such as user credentials for EC2 or authentication endpoint for OpenStack Nova. Since it is not possible to configure additional arguments for Ansible to pass along to the inventory script, the configuration for the script must either be managed via an ini config file read from a known location or environment variables read from the shell environment used to execute ansible or ansible-playbook.

When ansible or ansible-playbook is directed at an executable file for an inventory source, Ansible will execute that script with a single argument, --list. This is so that Ansible can get a listing of the entire inventory in order to build up its internal objects to represent the data. Once that data is built up, Ansible will then execute the script with a different argument for every host in the data to discover variable data. The argument used in this execution is --host <hostname>, which will return any variable data specific to that host.

Variable types and location

Variables are a key component to the Ansible design. Variables allow for dynamic play content and reusable plays across different sets of inventory. Anything beyond the very basic of Ansible use will utilize variables. Understanding the different variable types and where they can be located, as well as learning how to access external data or prompt users to populate variable data is the key to mastering Ansible.

Variable types

Before diving into the precedence of variables, we must first understand the various types and subtypes of variables available to Ansible, their location, and where they are valid for use.

The first major variable type is inventory variables. These are the variables that Ansible gets by way of the inventory. These can be defined as variables that are specific to host_vars to individual hosts or applicable to entire groups as group_vars. These variables can be written directly into the inventory file, delivered by the dynamic inventory plugin, or loaded from the host_vars/<host> or group_vars/<group> directories.

These types of variables might be used to define Ansible behavior when dealing with these hosts or site-specific data related to the applications that these hosts run. Whether a variable comes from host_vars or group_vars, it will be assigned to a host's hostvars. Accessing a host's own variables can be done just by referencing the name, such as {{ foobar }}, and accessing another host's variables can be accomplished by accessing hostvars. For example, to access the foobar variable for examplehost: {{ hostvars['examplehost']['foobar'] }}. These variables have global scope.

The second major variable type is role variables. These are variables specific to a role and are utilized by the role tasks and have scope only within the role that they are defined in, which is to say that they can only be used within the role. These variables are often supplied as a role default, which are meant to provide a default value for the variable but can easily be overridden when applying the role. When roles are referenced, it is possible to supply variable data at the same time, either by overriding role defaults or creating wholly new data. These variables apply to all hosts within the role and can be accessed directly much like a host's own hostvars.

The third major variable type is play variables. These variables are defined in the control keys of a play, either directly by the vars key or sourced from external files via the vars_files key. Additionally, the play can interactively prompt the user for variable data using vars_prompt. These variables are to be used within the scope of the play and in any tasks or included tasks of the play. The variables apply to all hosts within the play and can be referenced as if they are hostvars.

The fourth variable type is task variables. Task variables are made from data discovered while executing tasks or in the facts gathering phase of a play. These variables are host-specific and are added to the host's hostvars and can be used as such, which also means they have global scope after the point in which they were discovered or defined. Variables of this type can be discovered via gather_facts and fact modules (modules that do not alter state but rather return data), populated from task return data via the register task key, or defined directly by a task making use of the set_fact or add_host modules. Data can also be interactively obtained from the operator using the prompt argument to the pause module and registering the result:

name: get the operators name
pause:
  prompt: "Please enter your name"
register: opname

There is one last variable type, the extra variables, or extra-vars type. These are variables supplied on the command line when executing ansible-playbook via --extra-vars. Variable data can be supplied as a list of key=value pairs, a quoted JSON data, or a reference to a YAML-formatted file with variable data defined within:

--extra-vars "foo=bar owner=fred"
--extra-vars '{"services":["nova-api","nova-conductor"]}'
--extra-vars @/path/to/data.yaml

Extra variables are considered global variables. They apply to every host and have scope throughout the entire playbook.

Accessing external data

Data for role variables, play variables, and task variables can also come from external sources. Ansible provides a mechanism to access and evaluate data from the control machine (the machine running ansible-playbook). The mechanism is called a lookup plugin, and a number of them come with Ansible. These plugins can be used to lookup or access data by reading files, generate and locally store passwords on the Ansible host for later reuse, evaluate environment variables, pipe data in from executables, access data in the Redis or etcd systems, render data from template files, query dnstxt records, and more. The syntax is as follows:

lookup('<plugin_name>', 'plugin_argument')

For example, to use the mastery value from etcd in a debug task:

name: show data from etcd
debug: msg: "{{ lookup('etcd', 'mastery') }}"

Lookups are evaluated when the task referencing them is executed, which allows for dynamic data discovery. To reuse a particular lookup in multiple tasks and reevaluate it each time, a playbook variable can be defined with a lookup value. Each time the playbook variable is referenced, the lookup will be executed potentially providing different values over time.

Variable precedence

There are a few major types of variables that can be defined in a myriad of locations. This leads to a very important question, what happens when the same variable name is used in multiple locations? Ansible has a precedence for loading variable data, and thus it has an order and a definition to decide which variable will win. Variable value overriding is an advanced usage of Ansible, so it is important to fully understand the semantics before attempting such a scenario.

Precedence order

Ansible defines the precedence order as follows:

Extra vars (from command line) always win

Task vars (only for the specific task)

Block vars (only for the tasks within the block)

Role and include vars

Vars created with set_fact

Vars created with the register task directive

Play vars_files

Play vars_prompt

Play vars

Host facts

Playbook host_vars

Playbook group_vars

Inventory host_vars

Inventory group_vars

Inventory vars

Role defaults

Merging hashes

We focused on the precedence in which variables will override each other. The default behavior of Ansible is that any overriding definition for a variable name will completely mask the previous definition of that variable. However, that behavior can be altered for one type of variable, the hash. A hash variable (a dictionary in Python terms) is a dataset of keys and values. Values can be of different types for each key, and can even be hashes themselves for complex data structures.

In some advanced scenarios, it is desirable to replace just one bit of a hash or add to an existing hash rather than replacing the hash all together. To unlock this ability, a configuration change is necessary in an Ansible config file. The config entry is hash_behavior, which takes one of replace, or merge. A setting of merge will instruct Ansible to merge or blend the values of two hashes when presented with an override scenario rather than the default of replace, which will completely replace the old variable data with the new data.

Let's walk through an example of the two behaviors. We will start with a hash loaded with data and simulate a scenario where a different value for the hash is provided as a higher priority variable:

Starting data:

hash_var:
  fred:
    home: Seattle
    transport: Bicycle

New data loaded via include_vars:

hash_var:
  fred:
    transport: Bus

With the default behavior, the new value for hash_var will be:

hash_var:
  fred:
    transport: Bus

However, if we enable the merge behavior we would get the following result:

hash_var:
  fred:
    home: Seattle
    transport: Bus

There are even more nuances and undefined behaviors when using merge, and as such, it is strongly recommended to only use this setting if absolutely needed.

Summary

In this article, we covered key design and architecture concepts of Ansible, such as version and configuration, variable types and locations, and variable precedence.