Improving Ansible performance


I've been using Ansible for years and it is great. It lets you manage your servers and even keep them under some sort of version control. The fact that it depends only on SSH, with no agent on the servers you manage, is also very convenient. However, this design has a drawback: it can be painfully slow.

In this lengthy post I will show you some good practices, configuration optimizations and modifications that improve Ansible performance so your playbooks take much less time to finish.

Good practices

The way you set up your roles and playbooks has an impact on performance. Here are some good practices to gain some seconds in your execution times.

Bundle package installations

By default, if you use a loop (or with_items) with a package module like apt, Ansible will execute the package manager once per loop iteration, which is very slow. To prevent this, you can set the squash_actions parameter in the Ansible configuration; however, this option is deprecated because there is a better way.

Since Ansible 2.3, most package modules accept a list of packages in the name parameter. This way, the package manager runs only once and installs all the packages at the same time.

Try to add as many packages as you can to a single Ansible task. You can add them to a basic role instead of installing them in different roles, each with its own package (or apt) statement.

For example, I have something like this in a role called common, which is added to all my servers:

---
- name: Install basic packages
  apt:
    state: present
    name:
      - apt-config-auto-update
      - apt-transport-https
      - ca-certificates
      - less
      - python3-pip
      - python3-setuptools
      - rsync
      - sudo
      - tmux
      - unattended-upgrades
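
For contrast, the per-item pattern described earlier looks roughly like the sketch below. The package names are simply borrowed from the list above; with this form the package manager is invoked once for every item in the loop.

```yaml
---
# Slow pattern, shown only for comparison: apt runs once per loop item.
- name: Install basic packages one by one
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - less
    - rsync
    - tmux
```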

Use fast repository mirrors

When installing packages you want to download them as fast as possible, so configure your servers to use the mirrors closest to them.

Tools like netselect-apt for Debian will help you with that. If you have servers in different regions, you should configure a different mirror for each region.

You can also consider using mirrors behind a geolocated CDN, so the URL always resolves to a nearby server.
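
One way to handle per-region mirrors is to keep the mirror URL in a group variable and reference it from a single task. This is only a sketch: the group names, the apt_mirror variable and the chosen mirrors are assumptions made for illustration, and the generated file is deliberately minimal (a real sources.list would normally also carry security and updates entries).

```yaml
---
# group_vars/europe.yml (hypothetical inventory group)
# apt_mirror: http://ftp.de.debian.org/debian
#
# group_vars/us.yml (hypothetical inventory group)
# apt_mirror: http://ftp.us.debian.org/debian

# Task in a shared role: each host gets the mirror defined for its group.
# ansible_distribution_release is a gathered fact, so facts must be available.
- name: Point apt at the regional mirror
  copy:
    dest: /etc/apt/sources.list
    mode: '0644'
    content: |
      deb {{ apt_mirror }} {{ ansible_distribution_release }} main
```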

Use a high package cache TTL

Package modules like apt allow you to specify a cache_valid_time, so that even when update_cache is requested, the update will not run while the cache is still valid.

It is generally a good idea to set cache_valid_time to one or several hours.

---
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 7200

NOTE: On Debian, make sure apt-config-auto-update is installed so that a timestamp file is created when the package catalog is updated. Check this bug for more information.
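
The same parameters can also be set directly on an installation task, so the cache is refreshed only when it is older than the given number of seconds. A small sketch, with an arbitrary package chosen for illustration:

```yaml
---
# Refreshes the apt cache only if it is older than 2 hours (7200 seconds).
- name: Install rsync with a cached package catalog
  apt:
    name: rsync
    state: present
    update_cache: yes
    cache_valid_time: 7200
```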

Avoid downloading from 3rd parties

It is not uncommon to need to install something from an external URL during a playbook run. For example, if you want to install wp-cli, you could run:

---
- name: Install wp-cli
  get_url:
    url: https://raw.githubusercontent.com/wp-cli/builds/gh-pages/phar/wp-cli.phar
    dest: /usr/bin/wp
    mode: '0755'

However, in my experience, it is better to have the file stored locally so you can guarantee its availability, version and download speed.

---
- name: Install wp-cli
  copy:
    src: wp-cli.phar
    dest: /usr/bin/wp
    mode: '0755'

Use free strategy

You may not know that Ansible allows you to use strategies in your playbooks. By default, the linear strategy is used, which runs each task on all hosts before moving on, so no host starts the next task until all hosts have finished the previous one.

If your playbook does not require this synchronization or simply if your servers are completely independent from each other, you can use the free strategy in your playbook so your servers won't wait for each other.

---
- hosts: all
  strategy: free
  tasks:
  ...

Use asynchronous tasks

All the tasks are executed in a sequence, but we can break this sequence and run tasks asynchronously using async, poll and until. This can be a complex setup so be sure to check the documentation about it.

Here is a small example from the Ansible documentation to give you an idea of how it works.

---
# Requires ansible 1.8+
- name: 'YUM - async task'
  yum:
    name: docker-io
    state: present
  async: 1000
  poll: 0
  register: yum_sleeper

- name: 'YUM - check on async task'
  async_status:
    jid: "{{ yum_sleeper.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 30

Gathering facts

Gathering facts is time-consuming, so make sure you know whether you need them and when.

Smart gathering

You can configure Ansible to gather facts only once so if you include a different playbook they are not gathered again. You can do this by setting the gathering to smart in the Ansible configuration file.

[defaults]
gathering = smart

Caching facts

You can cache facts so they do not have to be gathered again in subsequent runs. There are several cache backends that you can configure. Using redis in your ansible.cfg would look like this:

[defaults]
# Use 'redis' as backend
fact_caching = redis
# Prefix for 'redis' keys
fact_caching_prefix = ansible_facts_
# Connection to 'redis'
fact_caching_connection = localhost
# Cache for 6 hours
fact_caching_timeout = 21600

Don't gather facts if you don't have to

If you are not using facts in your playbook, you can skip the fact gathering by setting gather_facts to False.

---
- hosts: databases
  gather_facts: false
  tasks:
  ...
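
If a play needs only a handful of facts, a middle ground is to leave automatic gathering off and call the setup module yourself with a restricted subset. The sketch below assumes that only network facts are needed; the host group is reused from the example above.

```yaml
---
- hosts: databases
  gather_facts: false
  tasks:
    # Collect only the network-related facts instead of the full set.
    - name: Gather network facts only
      setup:
        gather_subset:
          - '!all'
          - network
```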

General configuration

Here are some general Ansible configuration options that will boost performance. Remember that the configuration file is /etc/ansible/ansible.cfg or, in your home directory, ~/.ansible.cfg.

SSH configuration

As SSH connections are the backbone of the communications with the hosts, we should be sure that we have an optimal configuration for this. There are several settings that we must include for better performance.

First of all we must configure ControlPersist so connections to servers can be recycled. Be sure to also set control_path to store the persistent sockets.

If you are using SSH public keys for authentication, I suggest also setting PreferredAuthentications to publickey so you do not run into delays on servers that have GSSAPIAuthentication enabled.

The other important setting is pipelining, which reduces the number of SSH connections required to run some modules.

After the changes, your SSH settings should look like this:

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=3600s -o PreferredAuthentications=publickey
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
pipelining = True

Forks

If you are running a playbook against many servers and you have enough processing power on your Ansible server, you might want to increase the number of forks.

Depending on your resources, you can test different values for forks in your configuration file. The default is 5, so you might want to test higher values.

[defaults]
forks = 20

Mitogen

There is a strategy plugin for Ansible called Mitogen. This plugin is able to speed up the performance of your playbooks like magic.

There are some things to take into account, though. There might be conflicts with the strategies currently configured in your playbooks, and some tasks may not work with the mitogen_linear strategy (e.g., raw tasks).

To configure it, you only have to download it from the Mitogen website, making sure to get the right release for your Ansible version, and uncompress it wherever you want. Then add this to the defaults section of your configuration file.

[defaults]
strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
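
If a particular play does not get along with Mitogen, for example because it relies on raw tasks, one option is to override the strategy for that play only. The sketch below uses a hypothetical bootstrap host group and task; the strategy keyword on a play takes precedence over the default set in the configuration file.

```yaml
---
# Hypothetical play that keeps the stock linear strategy.
- hosts: bootstrap
  strategy: linear
  gather_facts: false
  tasks:
    # raw does not need Python on the target, so it can be used to install it.
    - name: Make sure Python is present before other modules run
      raw: test -e /usr/bin/python3 || (apt-get update && apt-get install -y python3)
```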

Debugging

If you want to know which tasks take more time and have a nice summary, you can add this to your configuration.

[defaults]
callback_whitelist = profile_tasks
stdout_callback = debug

Some results

I've tested the very same playbook against the very same hosts, around 50 servers, with and without all the above optimizations. The difference is incredible!

Before the above optimizations, running ansible-playbook -D playbook.yml -C took around 2 hours and 15 minutes to complete. Yes, it is a complex playbook with hundreds of tasks.

With the above optimizations, the same command took less than 15 minutes!

The tests were run from very similar machines using the same network connection; the non-optimized machine was the one with slightly better system resources. I also ran the test twice with the same results, so it is consistent.

I hope you find this post useful and that you can save some of your time by putting these tips into practice.