Ansible Playbook Patterns I Use in Production — Idempotency, Handlers & Roles
The difference between a junior Ansible playbook and one you can run safely against 500 production servers is not length or complexity — it is discipline. Idempotent tasks that can be rerun without side effects. Handlers that restart services only when they need to. Roles that make your playbooks composable. Check mode for dry runs. Tags for surgical reruns. These are the patterns I use every day on real infrastructure, and they are what separate "works on my machine" from "works on 500 machines every night at 3 AM."
Ansible gets a bad reputation for being fragile at scale, but that is almost always a playbook problem, not an Ansible problem. This guide shows you the patterns I use to keep playbooks safe, fast, and debuggable in production.
Pattern 1 — Every Task Must Be Idempotent
Idempotency means running the playbook twice produces the same result as running it once. The first run changes things. The second run reports ok everywhere because nothing needed to change. This is the single most important property of a production playbook.
Ansible's built-in modules are idempotent by design. The problem starts when you reach for shell or command.
# BAD: runs every time, not idempotent
- name: Install Node.js
  ansible.builtin.shell: |
    curl -fsSL https://rpm.nodesource.com/setup_20.x | bash -
    yum install -y nodejs

# GOOD: uses creates to make it idempotent
- name: Install Node.js
  ansible.builtin.shell: |
    curl -fsSL https://rpm.nodesource.com/setup_20.x | bash -
    yum install -y nodejs
  args:
    creates: /usr/bin/node
The creates argument tells Ansible: "skip this task if /usr/bin/node already exists." One line turns a dangerous task into a safe one. Similarly, removes skips a task if a file does not exist.
Even better — use native modules whenever possible:
# BEST: let the module handle idempotency
- name: Ensure Node.js is installed
  ansible.builtin.package:
    name: nodejs
    state: present
Rules I follow: shell and command are a last resort. If a module exists, use it. If you must use shell, always add creates, removes, or a changed_when condition that tells Ansible when the task actually changed something.
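The changed_when rule is easiest to see on a command that has no file to key creates or removes on. A minimal sketch, assuming a hypothetical migration script at /opt/app/bin/migrate that prints "Applied" only when it actually ran a migration (both the path and the output string are illustrative, not from a real tool):

```yaml
# Sketch only: /opt/app/bin/migrate and its "Applied" output are hypothetical.
- name: Apply pending database migrations
  ansible.builtin.command: /opt/app/bin/migrate --apply
  register: migrate_result
  # Report "changed" only when the script says it did something;
  # otherwise the task shows "ok" on reruns.
  changed_when: "'Applied' in migrate_result.stdout"

- name: Read the running kernel version (never changes anything)
  ansible.builtin.command: uname -r
  register: kernel_version
  changed_when: false
```

With changed_when in place, a second run reports ok for both tasks, and any handler notified by the first task only fires when a migration was really applied.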
Pattern 2 — Handlers for Service Restarts
Restarting NGINX on every playbook run is wasteful and disruptive. You only want to restart it if the config actually changed. That is exactly what handlers are for.
- name: Configure NGINX
  hosts: webservers
  become: yes
  tasks:
    - name: Deploy nginx.conf
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
        validate: 'nginx -t -c %s'
      notify: reload nginx

    - name: Deploy site config
      ansible.builtin.template:
        src: templates/site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
      notify: reload nginx

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
Three things to notice:
- notify only fires on change. If the template task reports ok (no change), the handler does not run. If it reports changed, the handler queues up.
- Handlers run once per play. Both templates can notify reload nginx, but NGINX reloads exactly once at the end.
- validate prevents broken configs from being written. Ansible renders the template, runs nginx -t against it, and only replaces the live file if the syntax check passes. This is a huge safety net: a typo in your template cannot take down all your web servers.
Reload vs restart: use reloaded when the service supports hot config reload (NGINX, HAProxy, rsyslog). Use restarted only when you actually need a process restart (after a binary upgrade). Reloads are invisible to traffic; restarts drop connections.
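To make the reload/restart split concrete, here is a minimal sketch of a play that defines both handlers and notifies each from the kind of task that needs it. The use of state: latest for the upgrade is an assumption for illustration; pin versions however your environment requires.

```yaml
# Sketch: package upgrades need a process restart, config edits only a reload.
- name: Upgrade and configure NGINX
  hosts: webservers
  become: yes
  tasks:
    - name: Upgrade NGINX package (new binary, so restart)
      ansible.builtin.package:
        name: nginx
        state: latest
      notify: restart nginx

    - name: Deploy nginx.conf (config only, so reload)
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: 'nginx -t -c %s'
      notify: reload nginx

  handlers:
    # Handlers run in the order they are defined here,
    # not in the order they were notified.
    - name: restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted

    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

On a routine config push only the reload handler fires. The restart handler stays idle until the package task actually reports changed.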
Pattern 3 — Roles for Reusable Logic
Once a playbook grows past 50 tasks, it becomes unreadable. That is when you convert it to roles. A role is a standardized directory structure for packaging tasks, handlers, templates, variables, and files together.
roles/
  nginx/
    tasks/
      main.yml          # task entry point
      install.yml
      configure.yml
    handlers/
      main.yml          # reload nginx, restart nginx
    templates/
      nginx.conf.j2
      site.conf.j2
    defaults/
      main.yml          # default variables (overridable)
    vars/
      main.yml          # role variables (higher priority)
    files/
      ssl-params.conf   # static files to copy
    meta/
      main.yml          # dependencies on other roles
Your playbook becomes a thin wrapper:
- name: Configure production web servers
  hosts: webservers
  become: yes
  roles:
    - common
    - nginx
    - node_exporter   # for Prometheus metrics
    - fail2ban
Each role is independently testable, reusable across projects, and has a clear contract through its defaults/main.yml. Want to override the NGINX port? Set nginx_port: 8080 in your playbook. No need to touch the role itself.
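As a sketch of that contract, the role declares its defaults in one file and the playbook overrides a single value. The default of 80 is an assumption for illustration; nginx_port and nginx_worker_processes are the variable names used in this guide.

```yaml
# roles/nginx/defaults/main.yml
# Lowest-precedence values, meant to be overridden (the values are assumed).
nginx_port: 80
nginx_worker_processes: auto
---
# site.yml
# Overrides one default for this play without editing the role.
- name: Configure production web servers
  hosts: webservers
  become: yes
  roles:
    - role: nginx
      vars:
        nginx_port: 8080
```

The same override could instead live in group_vars, which keeps the playbook itself free of environment-specific values.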
I keep all my roles in a dedicated repo and pull them in with requirements.yml using ansible-galaxy. That way multiple projects can share the same battle-tested nginx or docker role, and a fix in one place benefits every project.
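A requirements.yml for that setup might look like the sketch below. The Git URLs and version tags are placeholders, not real repositories.

```yaml
# requirements.yml
# Sketch only: the repository URLs and version tags are hypothetical.
roles:
  - name: nginx
    src: https://git.example.com/infra/ansible-role-nginx.git
    scm: git
    version: v1.4.0   # pin a tag so every project gets a known release

  - name: docker
    src: https://git.example.com/infra/ansible-role-docker.git
    scm: git
    version: v2.0.1
```

Running ansible-galaxy install -r requirements.yml pulls the pinned roles into your roles path. Pinning a tag rather than a branch means a fix reaches each project when that project bumps the version, not by surprise.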
Pattern 4 — Check Mode Before Every Production Run
Ansible has a dry-run flag that tells you what would change without actually changing anything. I never run a production playbook without it first.
# Dry run — shows what would change
ansible-playbook -i inventory/prod.yml site.yml --check
# Dry run with diffs — shows exactly what lines would change in files
ansible-playbook -i inventory/prod.yml site.yml --check --diff
# Run for real
ansible-playbook -i inventory/prod.yml site.yml
--check --diff is the closest Ansible equivalent to terraform plan. It shows you the unified diff of every file that would change, which packages would be installed, and which services would be restarted. Review the output, commit to the change, then run for real.
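Caveat: check mode is only as good as each module's support for it. Tasks whose effect Ansible cannot predict, such as shell, command, and uri calls to external APIs, are skipped or only partly simulated, so later tasks that depend on their registered output can fail or behave differently in a dry run. Test your specific playbook's check-mode behavior before relying on it for critical changes.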
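Two task-level controls help make dry runs more trustworthy: check_mode: false forces a read-only task to run even under --check, and the ansible_check_mode variable lets you skip tasks that have real side effects. A minimal sketch, assuming a hypothetical version file and deployment-tracker URL:

```yaml
# Sketch only: /opt/app/VERSION and the tracker URL are hypothetical.
- name: Read current app version (read-only, safe to run during --check)
  ansible.builtin.command: cat /opt/app/VERSION
  register: current_version
  changed_when: false
  check_mode: false   # always execute, so later tasks have real data

- name: Record the deployment in an external tracker
  ansible.builtin.uri:
    url: "https://deploy-tracker.example.com/api/events"
    method: POST
    body_format: json
    body:
      host: "{{ inventory_hostname }}"
      version: "{{ current_version.stdout }}"
  when: not ansible_check_mode   # never call the API from a dry run
```

Only set check_mode: false on tasks that genuinely change nothing, because they will run for real even when you asked for a dry run.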
Pattern 5 — Tags for Surgical Reruns
When your playbook has 500 tasks and you just need to push an NGINX config change, rerunning the entire playbook is overkill. Tags let you run only the tasks you need.
- name: Install NGINX
  ansible.builtin.package:
    name: nginx
    state: present
  tags: [packages, nginx]

- name: Deploy nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: reload nginx
  tags: [config, nginx]

- name: Ensure nginx is running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: yes
  tags: [service, nginx]
Now you can run precisely what you need:
# Only run nginx-related tasks
ansible-playbook site.yml --tags nginx
# Only run config tasks across all services
ansible-playbook site.yml --tags config
# Skip package installs (you are just tweaking config)
ansible-playbook site.yml --skip-tags packages
My tagging convention: tag by component (nginx, postgres, redis), by phase (packages, config, service), and occasionally by risk (disruptive for tasks that cause downtime). With those three axes you can always run exactly what you need.
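For the risk axis, Ansible's special never tag is a useful companion: a task tagged never is skipped unless one of its other tags is requested explicitly. A minimal sketch, using the disruptive tag name from the convention above:

```yaml
# Sketch: this task is skipped on normal runs and only executes
# when the "disruptive" tag is requested on the command line.
- name: Restart NGINX to pick up a new binary
  ansible.builtin.service:
    name: nginx
    state: restarted
  tags: [never, disruptive]
```

A plain ansible-playbook site.yml leaves this task alone, while ansible-playbook site.yml --tags disruptive runs it. Note that adding a component tag such as nginx to the same task would make --tags nginx trigger it too, so keep disruptive tasks tagged narrowly.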
Pattern 6 — Serial Rollouts for Safety
By default, Ansible runs each task on all hosts in parallel, limited only by forks (which defaults to 5). That is fine for reads and config pushes, but dangerous for disruptive changes. For rolling deployments, use serial.
- name: Rolling NGINX config update
  hosts: webservers
  become: yes
  serial: "25%"             # update 25% of hosts at a time
  max_fail_percentage: 10   # abort if more than 10% fail
  tasks:
    - name: Drain this host from load balancer
      ansible.builtin.uri:
        url: "http://lb/drain/{{ inventory_hostname }}"
        method: POST

    - name: Deploy new nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: reload nginx

    # Handlers normally run at the end of the play for each batch.
    # Flush them here so the health check tests the new config.
    - name: Reload nginx now if the config changed
      ansible.builtin.meta: flush_handlers

    - name: Wait for nginx health check
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 3

    - name: Re-add this host to load balancer
      ansible.builtin.uri:
        url: "http://lb/enable/{{ inventory_hostname }}"
        method: POST

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
This is a proper rolling update: drain, deploy, health check, re-enable — 25% at a time, aborting if more than 10% fail. A change that would have taken the whole fleet offline with a config typo now fails fast on a small batch and stops before damage spreads.
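serial also accepts a list of batch sizes, which gives you a canary stage before the percentage batches. A minimal sketch of that variation (the batch sizes are illustrative; once the list runs out, the last value repeats for the remaining hosts):

```yaml
# Sketch: one canary host first, then the rest of the fleet in 25% batches.
- name: Rolling NGINX config update with a canary batch
  hosts: webservers
  become: yes
  serial:
    - 1       # canary: a single host
    - "25%"   # then a quarter of the fleet per batch
  max_fail_percentage: 10
  tasks:
    - name: Deploy new nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: 'nginx -t -c %s'
      notify: reload nginx

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

If the canary host fails, the failure rate for that batch is 100%, which exceeds max_fail_percentage, so the play stops before touching any other host.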
Pattern 7 — Variables Live in Group Vars, Not Playbooks
Never hardcode environment-specific values in your playbooks. Use group_vars/ and host_vars/ for everything that changes between environments.
inventory/
  prod/
    hosts.yml
    group_vars/
      all.yml           # defaults for everything
      webservers.yml    # web-tier specific
      databases.yml     # db-tier specific
    host_vars/
      db1.prod.yml      # host-specific overrides
  staging/
    hosts.yml
    group_vars/
      all.yml
Example group_vars/webservers.yml:
nginx_worker_processes: auto
nginx_worker_connections: 4096
nginx_client_max_body_size: 100M
app_version: "2.14.3"
deploy_user: deployer
Your playbook stays environment-agnostic. Switching from staging to prod is a matter of pointing to a different inventory directory. No playbook edits, no if statements, no drift between environments.
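A play that consumes those variables contains no environment names at all. A minimal sketch, assuming the deploy_user and app_version variables from the group_vars example above (the tasks themselves are illustrative):

```yaml
# Sketch: every environment-specific value arrives from the inventory.
- name: Prepare web servers for a release
  hosts: webservers
  become: yes
  tasks:
    - name: Ensure the deploy user exists
      ansible.builtin.user:
        name: "{{ deploy_user }}"
        state: present

    - name: Show which release this inventory pins
      ansible.builtin.debug:
        msg: "Deploying app version {{ app_version }} as {{ deploy_user }}"
```

Running it with -i inventory/staging or -i inventory/prod picks up the matching group_vars automatically, so the same file serves both environments.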
Pattern 8 — Secrets Go Through Ansible Vault
Never put passwords, API tokens, or private keys in plain-text YAML. Use Ansible Vault.
# Encrypt a variable file
ansible-vault encrypt group_vars/prod/secrets.yml
# Edit it later
ansible-vault edit group_vars/prod/secrets.yml
# Run a playbook that uses vault-encrypted variables
ansible-playbook site.yml --vault-password-file ~/.vault-pass
In CI/CD, store the vault password in a secrets manager (AWS Secrets Manager, HashiCorp Vault, or just a protected environment variable) and pass it to the playbook run. Plain-text secrets should never hit Git, period.
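One common convention that works well with Vault is to keep encrypted values under vault_-prefixed names and reference them from a plain-text vars file, so variable names stay searchable while values stay encrypted. A minimal sketch; the file names, variable names, and target path are illustrative assumptions:

```yaml
# group_vars/prod/vars.yml
# Plain text: safe to grep and review. The vault_db_password variable is
# assumed to be defined in the encrypted group_vars/prod/secrets.yml.
db_password: "{{ vault_db_password }}"
---
# A task that uses the secret (path and file format are hypothetical).
- name: Write database credentials file
  ansible.builtin.copy:
    content: "DB_PASSWORD={{ db_password }}\n"
    dest: /etc/myapp/db.env
    owner: root
    group: root
    mode: '0600'
  no_log: true   # keep the secret out of task output and logs
```

no_log: true matters as much as the encryption: without it, a failed or verbose run can print the decrypted value straight into your CI logs.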
The Playbook Checklist I Use Before Production
Before I run any playbook against production, I work through this list:
- Is every task idempotent? Can I run this twice without side effects? If there is a shell/command, does it have creates, removes, or changed_when?
- Do service restarts go through handlers? No direct service: restarted in tasks.
- Do config templates validate before replacing? Use the validate argument where supported.
- Have I run --check --diff and reviewed the output?
- Are secrets in Vault, not in plain YAML?
- For disruptive changes, is serial set to a safe percentage with max_fail_percentage?
- Are environment-specific values in group_vars, not hardcoded?
- Can I rerun just this change with --tags if I need to fix something?
Every "yes" is a reason the playbook is safer to run. Every "no" is a scar waiting to happen.
Frequently Asked Questions
What does idempotent mean in Ansible?
Idempotent means running a playbook twice produces the same state as running it once. The first run makes changes; the second detects no changes are needed. This is what makes playbooks safe to rerun on a schedule or in CI.
What are Ansible handlers?
Handlers are tasks that run only when notified, and only once per play regardless of how many notifications. Classic use: restart NGINX only if its config actually changed, not on every run.
What is an Ansible role?
A role is a standard directory structure bundling tasks, handlers, templates, variables, and files for a specific purpose. Roles make playbooks reusable, testable, and composable across projects.
What is check mode?
Check mode (--check) is a dry-run flag that simulates changes without applying them. Combined with --diff, it shows exactly what would change on each host — Ansible's equivalent of terraform plan.
Should I use tags in production playbooks?
Yes. Tags let you run a subset of tasks, which is essential for playbooks with hundreds of tasks. Tag by component (nginx, postgres), phase (packages, config, service), or risk level.
Next Steps
If you want to go further with Ansible and the broader DevOps toolchain:
- Ansible vs Terraform — When to Use Which — how Ansible fits alongside Terraform in production
- NGINX Reverse Proxy on EC2 with SSL — a great first target for an Ansible role
- GitHub Actions OIDC for AWS — run Ansible playbooks from CI without access keys
- Free DevOps resources — including the standalone Ansible 50Q interview PDF