ANSIBLE

Ansible Playbook Patterns I Use in Production — Idempotency, Handlers & Roles

By Akshay Ghalme·April 12, 2026·11 min read

The difference between a junior Ansible playbook and one you can run safely against 500 production servers is not length or complexity — it is discipline. Idempotent tasks that can be rerun without side effects. Handlers that restart services only when they need to. Roles that make your playbooks composable. Check mode for dry runs. Tags for surgical reruns. These are the patterns I use every day on real infrastructure, and they are what separate "works on my machine" from "works on 500 machines every night at 3 AM."

Ansible gets a bad reputation for being fragile at scale, but that is almost always a playbook problem, not an Ansible problem. This guide shows you the patterns I use to keep playbooks safe, fast, and debuggable in production.

Pattern 1 — Every Task Must Be Idempotent

Idempotency means running the playbook twice produces the same result as running it once. The first run changes things. The second run reports ok everywhere because nothing needed to change. This is the single most important property of a production playbook.

Ansible's built-in modules are idempotent by design. The problem starts when you reach for shell or command.

# BAD — runs every time, not idempotent
- name: Install Node.js
  ansible.builtin.shell: |
    curl -fsSL https://rpm.nodesource.com/setup_20.x | bash -
    yum install -y nodejs

# GOOD — uses creates to make it idempotent
- name: Install Node.js
  ansible.builtin.shell: |
    curl -fsSL https://rpm.nodesource.com/setup_20.x | bash -
    yum install -y nodejs
  args:
    creates: /usr/bin/node

The creates argument tells Ansible: "skip this task if /usr/bin/node already exists." One line turns a dangerous task into a safe one. Similarly, removes skips a task if a file does not exist.

Even better — use native modules whenever possible:

# BEST — let the module handle idempotency
- name: Ensure Node.js is installed
  ansible.builtin.package:
    name: nodejs
    state: present

Rules I follow: shell and command are a last resort. If a module exists, use it. If you must use shell, always add creates, removes, or a changed_when condition that tells Ansible when the task actually changed something.
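When a command has no file you can key creates on, changed_when does the job. A sketch, assuming a hypothetical migration script that prints what it did:

```yaml
# Hypothetical script and output string — adjust to your command
- name: Apply database migrations
  ansible.builtin.command: /opt/app/bin/migrate --apply
  register: migrate_result
  changed_when: "'Applied' in migrate_result.stdout"
```

Now a run where the script has nothing to do reports ok instead of changed, and your play recap stays honest.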

Pattern 2 — Handlers for Service Restarts

Restarting NGINX on every playbook run is wasteful and disruptive. You only want to restart it if the config actually changed. That is exactly what handlers are for.

- name: Configure NGINX
  hosts: webservers
  become: yes
  tasks:
    - name: Deploy nginx.conf
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
        validate: 'nginx -t -c %s'
      notify: reload nginx

    - name: Deploy site config
      ansible.builtin.template:
        src: templates/site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
      notify: reload nginx

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded

Three things to notice:

  1. notify only fires on change. If the template task reports ok (no change), the handler does not run. If it reports changed, the handler queues up.
  2. Handlers run once per play. Both templates can notify reload nginx, but NGINX reloads exactly once at the end.
  3. validate prevents broken configs from being written. Ansible renders the template, runs nginx -t against it, and only replaces the live file if the syntax check passes. This is a huge safety net — a typo in your template cannot take down all your web servers.

Reload vs restart: use reloaded when the service supports hot config reload (NGINX, HAProxy, rsyslog). Use restarted only when you actually need a process restart (after a binary upgrade). Reloads are invisible to traffic; restarts drop connections.
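One scheduling detail worth knowing: handlers normally fire at the end of the play. If a later task depends on the reload having already happened, you can flush the handler queue mid-play with meta: flush_handlers. A sketch, assuming a /health endpoint exists:

```yaml
    - name: Deploy nginx.conf
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: reload nginx

    - name: Run queued handlers now, not at end of play
      ansible.builtin.meta: flush_handlers

    - name: Verify nginx serves the new config
      ansible.builtin.uri:
        url: http://localhost/health
        status_code: 200
```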

Pattern 3 — Roles for Reusable Logic

Once a playbook grows past 50 tasks, it becomes unreadable. That is when you convert it to roles. A role is a standardized directory structure for packaging tasks, handlers, templates, variables, and files together.

roles/
  nginx/
    tasks/
      main.yml          # task entry point
      install.yml
      configure.yml
    handlers/
      main.yml          # reload nginx, restart nginx
    templates/
      nginx.conf.j2
      site.conf.j2
    defaults/
      main.yml          # default variables (overridable)
    vars/
      main.yml          # role variables (higher priority)
    files/
      ssl-params.conf   # static files to copy
    meta/
      main.yml          # dependencies on other roles

Your playbook becomes a thin wrapper:

- name: Configure production web servers
  hosts: webservers
  become: yes
  roles:
    - common
    - nginx
    - node_exporter    # for Prometheus metrics
    - fail2ban

Each role is independently testable, reusable across projects, and has a clear contract through its defaults/main.yml. Want to override the NGINX port? Set nginx_port: 8080 in your playbook. No need to touch the role itself.

I keep all my roles in a dedicated repo and pull them in with requirements.yml using ansible-galaxy. That way multiple projects can share the same battle-tested nginx or docker role, and a fix in one place benefits every project.
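A minimal requirements.yml for that setup might look like this (repo URLs and version tags are placeholders):

```yaml
# requirements.yml
roles:
  - name: nginx
    src: https://github.com/example/ansible-role-nginx.git
    scm: git
    version: v1.4.2
  - name: docker
    src: https://github.com/example/ansible-role-docker.git
    scm: git
    version: v2.0.1
```

Install into the project with ansible-galaxy install -r requirements.yml -p roles/. Pinning version tags keeps every project on a known-good release of each role.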


Pattern 4 — Check Mode Before Every Production Run

Ansible has a dry-run flag that tells you what would change without actually changing anything. I never run a production playbook without it first.

# Dry run — shows what would change
ansible-playbook -i inventory/prod.yml site.yml --check

# Dry run with diffs — shows exactly what lines would change in files
ansible-playbook -i inventory/prod.yml site.yml --check --diff

# Run for real
ansible-playbook -i inventory/prod.yml site.yml

--check --diff is the closest Ansible equivalent to terraform plan. It shows you the unified diff of every file that would change, which packages would be installed, and which services would be restarted. Review the output, commit to the change, then run for real.

Caveat: check mode is only as good as each module's support for it. Tasks that fetch data or call external APIs (like uri) are often skipped in check mode, so downstream tasks that depend on their registered results can behave differently in a dry run. Test your specific playbook's check-mode behavior before relying on it for critical changes.
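For tasks like that, Ansible gives you per-task control: check_mode: false forces a task to run for real even under --check, which is safe for read-only lookups that later tasks depend on. A sketch with a placeholder API URL:

```yaml
- name: Fetch latest release metadata (read-only, safe during a dry run)
  ansible.builtin.uri:
    url: https://api.example.com/releases/latest   # placeholder URL
    return_content: true
  register: release_info
  check_mode: false   # run even under --check

- name: Show what would be deployed
  ansible.builtin.debug:
    msg: "Would deploy version {{ (release_info.content | from_json).version }}"
```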

Pattern 5 — Tags for Surgical Reruns

When your playbook has 500 tasks and you just need to push an NGINX config change, rerunning the entire playbook is overkill. Tags let you run only the tasks you need.

- name: Install NGINX
  ansible.builtin.package:
    name: nginx
    state: present
  tags: [packages, nginx]

- name: Deploy nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: reload nginx
  tags: [config, nginx]

- name: Ensure nginx is running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: yes
  tags: [service, nginx]

Now you can run precisely what you need:

# Only run nginx-related tasks
ansible-playbook site.yml --tags nginx

# Only run config tasks across all services
ansible-playbook site.yml --tags config

# Skip package installs (you are just tweaking config)
ansible-playbook site.yml --skip-tags packages

My tagging convention: tag by component (nginx, postgres, redis), by phase (packages, config, service), and occasionally by risk (disruptive for tasks that cause downtime). With those three axes you can always run exactly what you need.
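For the risk axis, the built-in never tag pairs well with disruptive: a task tagged never is skipped unless one of its tags is requested explicitly. A sketch:

```yaml
- name: Reboot after kernel upgrade
  ansible.builtin.reboot:
    reboot_timeout: 600
  tags: [disruptive, never]   # skipped unless explicitly requested
```

A plain ansible-playbook site.yml will never touch this task; only ansible-playbook site.yml --tags disruptive runs it.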

Pattern 6 — Serial Rollouts for Safety

By default, Ansible runs each task on all hosts in parallel (up to the forks setting, which defaults to 5). That is fine for reads and config pushes, but dangerous for disruptive changes. For rolling deployments, use serial.

- name: Rolling NGINX config update
  hosts: webservers
  become: yes
  serial: "25%"           # update 25% of hosts at a time
  max_fail_percentage: 10 # abort if more than 10% fail
  tasks:
    - name: Drain this host from load balancer
      ansible.builtin.uri:
        url: "http://lb/drain/{{ inventory_hostname }}"
        method: POST

    - name: Deploy new nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: reload nginx

    - name: Wait for nginx health check
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 3

    - name: Re-add this host to load balancer
      ansible.builtin.uri:
        url: "http://lb/enable/{{ inventory_hostname }}"
        method: POST

This is a proper rolling update: drain, deploy, health check, re-enable — 25% at a time, aborting if more than 10% fail. A change that would have taken the whole fleet offline with a config typo now fails fast on a small batch and stops before damage spreads.
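serial also accepts a list of batch sizes, which gives you a canary pattern: one host first, then progressively larger waves (the last entry repeats until all hosts are done):

```yaml
- name: Canary-style rolling update
  hosts: webservers
  serial:
    - 1        # one canary host first
    - "10%"    # then a small wave
    - "100%"   # then everyone else
```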

Pattern 7 — Variables Live in Group Vars, Not Playbooks

Never hardcode environment-specific values in your playbooks. Use group_vars/ and host_vars/ for everything that changes between environments.

inventory/
  prod/
    hosts.yml
    group_vars/
      all.yml          # defaults for everything
      webservers.yml   # web-tier specific
      databases.yml    # db-tier specific
    host_vars/
      db1.prod.yml     # host-specific overrides
  staging/
    hosts.yml
    group_vars/
      all.yml

Example group_vars/webservers.yml:

nginx_worker_processes: auto
nginx_worker_connections: 4096
nginx_client_max_body_size: 100M
app_version: "2.14.3"
deploy_user: deployer

Your playbook stays environment-agnostic. Switching from staging to prod is a matter of pointing to a different inventory directory. No playbook edits, no if statements, no drift between environments.
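On the template side, the role's nginx.conf.j2 just consumes those variables — a trimmed sketch using the group_vars above:

```jinja2
# templates/nginx.conf.j2 (trimmed)
worker_processes {{ nginx_worker_processes }};

events {
    worker_connections {{ nginx_worker_connections }};
}

http {
    client_max_body_size {{ nginx_client_max_body_size }};
}
```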

Pattern 8 — Secrets Go Through Ansible Vault

Never put passwords, API tokens, or private keys in plain-text YAML. Use Ansible Vault.

# Encrypt a variable file
ansible-vault encrypt group_vars/prod/secrets.yml

# Edit it later
ansible-vault edit group_vars/prod/secrets.yml

# Run a playbook that uses vault-encrypted variables
ansible-playbook site.yml --vault-password-file ~/.vault-pass

In CI/CD, store the vault password in a secrets manager (AWS Secrets Manager, HashiCorp Vault, or just a protected environment variable) and pass it to the playbook run. Plain-text secrets should never hit Git, period.
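A typical CI step looks like this (the ANSIBLE_VAULT_PASSWORD variable name is illustrative — use whatever your secrets manager injects):

```shell
# Materialize the vault password from a CI secret, run, then clean up
printf '%s' "$ANSIBLE_VAULT_PASSWORD" > /tmp/vault-pass
chmod 600 /tmp/vault-pass
ansible-playbook -i inventory/prod.yml site.yml --vault-password-file /tmp/vault-pass
rm -f /tmp/vault-pass
```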

The Playbook Checklist I Use Before Production

Before I run any playbook against production, I work through this list:

  1. Is every task idempotent? Can I run this twice without side effects? If there is a shell/command, does it have creates, removes, or changed_when?
  2. Do service restarts go through handlers? No direct service: restarted in tasks.
  3. Do config templates validate before replacing? Use the validate argument where supported.
  4. Have I run --check --diff and reviewed the output?
  5. Are secrets in Vault, not in plain YAML?
  6. For disruptive changes, is serial set to a safe percentage with max_fail_percentage?
  7. Are environment-specific values in group_vars, not hardcoded?
  8. Can I rerun just this change with --tags if I need to fix something?

Every "yes" is a reason the playbook is safer to run. Every "no" is a scar waiting to happen.

Frequently Asked Questions

What does idempotent mean in Ansible?

Idempotent means running a playbook twice produces the same state as running it once. The first run makes changes; the second detects no changes are needed. This is what makes playbooks safe to rerun on a schedule or in CI.

What are Ansible handlers?

Handlers are tasks that run only when notified, and only once per play regardless of how many notifications. Classic use: restart NGINX only if its config actually changed, not on every run.

What is an Ansible role?

A role is a standard directory structure bundling tasks, handlers, templates, variables, and files for a specific purpose. Roles make playbooks reusable, testable, and composable across projects.

What is check mode?

Check mode (--check) is a dry-run flag that simulates changes without applying them. Combined with --diff, it shows exactly what would change on each host — Ansible's equivalent of terraform plan.

Should I use tags in production playbooks?

Yes. Tags let you run a subset of tasks, which is essential for playbooks with hundreds of tasks. Tag by component (nginx, postgres), phase (packages, config, service), or risk level.


Next Steps

If you want to go further with Ansible and the broader DevOps toolchain:

  1. Ansible vs Terraform — When to Use Which — how Ansible fits alongside Terraform in production
  2. NGINX Reverse Proxy on EC2 with SSL — a great first target for an Ansible role
  3. GitHub Actions OIDC for AWS — run Ansible playbooks from CI without access keys
  4. Free DevOps resources — including the standalone Ansible 50Q interview PDF

Akshay Ghalme

AWS DevOps Engineer with 3+ years building production cloud infrastructure. AWS Certified Solutions Architect. Currently managing a multi-tenant SaaS platform serving 1000+ customers.
