
How to Set Up Prometheus + Grafana on AWS EC2 with Terraform

By Akshay Ghalme·April 8, 2026·18 min read

Prometheus collects and stores metrics. Grafana visualizes them. Together, they give you complete infrastructure monitoring — CPU, memory, disk, network, and custom application metrics — with powerful alerting. This guide sets up both on AWS EC2 using Docker Compose, with Terraform for the infrastructure.

Using this exact setup, I’ve caught production issues hours before customers noticed them. Monitoring is not optional; it’s how you sleep at night while running production infrastructure.

What Prometheus and Grafana Actually Do

Prometheus is a pull-based monitoring system. It scrapes metrics from your applications and infrastructure at regular intervals, stores them as time-series data, and provides PromQL — a powerful query language for analyzing metrics.

Grafana is a visualization platform. It connects to Prometheus (and many other data sources), lets you build dashboards, and sends alerts when things go wrong.

Node Exporter runs on each server and exposes system metrics (CPU, memory, disk, network) that Prometheus scrapes.

Prerequisites

  • AWS account with EC2 permissions
  • Terraform installed locally
  • Basic understanding of EC2 and security groups
  • SSH key pair in your AWS region

Infrastructure with Terraform

First, create the EC2 instance with the right security group:

resource "aws_security_group" "monitoring" {
  name        = "monitoring-stack"
  description = "Prometheus + Grafana"
  vpc_id      = var.vpc_id

  ingress {
    description = "Grafana"
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = [var.my_ip]
  }

  ingress {
    description = "Prometheus"
    from_port   = 9090
    to_port     = 9090
    protocol    = "tcp"
    cidr_blocks = [var.my_ip]
  }

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.my_ip]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "monitoring" {
  ami                    = "ami-0abcdef1234567890"  # Ubuntu 22.04
  instance_type          = "t3.small"
  key_name               = var.key_name
  vpc_security_group_ids = [aws_security_group.monitoring.id]
  subnet_id              = var.public_subnet_id

  root_block_device {
    volume_size = 30
    volume_type = "gp3"
  }

  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y docker.io docker-compose
    systemctl enable docker
    systemctl start docker
    usermod -aG docker ubuntu
  EOF

  tags = { Name = "monitoring-stack" }
}

Important: Restrict Prometheus (9090) and Grafana (3000) to your IP only. Never expose them to 0.0.0.0/0.
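The snippets above reference several input variables; a minimal variables.tf to match, plus an output for the instance's public IP (descriptions are illustrative, adjust to your environment):

```hcl
variable "vpc_id" {
  description = "VPC to deploy the monitoring stack into"
  type        = string
}

variable "public_subnet_id" {
  description = "Public subnet for the EC2 instance"
  type        = string
}

variable "key_name" {
  description = "Name of an existing EC2 key pair in the region"
  type        = string
}

variable "my_ip" {
  description = "Your IP in CIDR notation, e.g. 203.0.113.7/32"
  type        = string
}

output "monitoring_public_ip" {
  description = "Public IP for reaching Grafana and Prometheus"
  value       = aws_instance.monitoring.public_ip
}
```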

Installing Prometheus

SSH into the instance and create the config files:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

Create docker-compose.yml:

version: "3.8"

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
    restart: unless-stopped

volumes:
  prometheus_data:

Start it: docker-compose up -d. Verify at http://<your-ip>:9090. You should see the Prometheus UI with the “prometheus” target showing as UP.

Installing Grafana

Add Grafana to your docker-compose.yml:

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

Restart: docker-compose up -d. Access Grafana at http://<your-ip>:3000. Login with admin / your password.

Add Prometheus as a data source: Connections → Data sources → Add data source → Prometheus → URL: http://prometheus:9090 → Save & Test. (Older Grafana versions put this under Configuration → Data Sources.)
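If you'd rather not click through the UI, Grafana can also provision the data source from a file. A sketch (the filename is arbitrary; mount it into the container at the path shown via a volume in docker-compose.yml):

```yaml
# datasource.yml — mount to /etc/grafana/provisioning/datasources/datasource.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```

Grafana reads this directory at startup, so the data source exists on first login.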

Adding Node Exporter for System Metrics

Add node-exporter to docker-compose.yml:

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    command:
      - "--path.rootfs=/host"
    volumes:
      - /:/host:ro,rslave
    ports:
      - "9100:9100"
    restart: unless-stopped

Mounting the host filesystem read-only at /host (with --path.rootfs pointed at it) makes node-exporter report the host's metrics instead of the container's. This exposes CPU, memory, disk, filesystem, and network metrics. Prometheus scrapes it automatically (we already configured the scrape target).

Restart and verify: docker-compose up -d. Check Prometheus targets page — node-exporter should show as UP.

Building Your First Dashboard

The fastest way: import a community dashboard.

  1. In Grafana: Dashboards → Import
  2. Enter dashboard ID: 1860 (Node Exporter Full)
  3. Select your Prometheus data source
  4. Click Import

You instantly get CPU, memory, disk, network, and system load panels.

To create a custom CPU usage panel:

100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

For memory usage:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

For disk usage:

(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
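These expressions are plain arithmetic over gauge values. A quick Python sanity check of the memory formula, using made-up sample values:

```python
def memory_used_percent(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Mirror the PromQL expression: (1 - avail/total) * 100."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100

# Hypothetical sample: 4 GiB available out of 16 GiB total.
GIB = 1024 ** 3
print(memory_used_percent(4 * GIB, 16 * GIB))  # → 75.0
```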

Setting Up Alerts

In Grafana: Alerting → Alert Rules → New Alert Rule.

CPU alert example:

  • Query: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
  • Condition: Is above 80
  • For: 5 minutes (avoids false positives from brief spikes)
  • Summary: “High CPU usage on monitoring server”

Set up notification channels: Alerting → Contact Points → Add. Options: Email, Slack webhook, PagerDuty, etc.

What to alert on:

  • CPU > 80% for 5 min
  • Memory > 85% for 5 min
  • Disk > 90%
  • Instance down (up == 0)

What NOT to alert on: Brief spikes, cosmetic metrics, things you can’t act on at 3 AM. Alert fatigue kills monitoring.
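This guide uses Grafana's built-in alerting. If you prefer to keep alert definitions in version control, the same thresholds can be expressed as Prometheus alerting rules instead. A sketch (you would also need to list this file under rule_files in prometheus.yml and route notifications through Alertmanager, which this guide does not cover):

```yaml
# rules.yml — illustrative Prometheus alerting rules
groups:
  - name: host-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is unreachable"
      - alert: HighCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
```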

Monitoring Your Application

System metrics are just the start. For application monitoring, instrument your code with Prometheus client libraries:

  • Python: prometheus_client
  • Node.js: prom-client
  • Go: prometheus/client_golang
  • Java: micrometer

Track the 4 golden signals:

  1. Latency — how long requests take
  2. Traffic — requests per second
  3. Errors — error rate (5xx responses)
  4. Saturation — how full your system is
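With the Python prometheus_client library named above, tracking traffic and latency looks roughly like this; the metric names and the handler are illustrative, not from the original guide:

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

registry = CollectorRegistry()

# Traffic and errors: count requests by method and status code.
REQUESTS = Counter(
    "http_requests", "Total HTTP requests handled",
    ["method", "status"], registry=registry,
)

# Latency: observe request duration in seconds.
LATENCY = Histogram(
    "http_request_duration_seconds", "HTTP request latency",
    registry=registry,
)

@LATENCY.time()
def handle_request(method="GET"):
    # ... real handler logic would go here ...
    REQUESTS.labels(method=method, status="200").inc()
    return "ok"

handle_request()

# Prometheus text exposition format, normally served from a /metrics endpoint.
print(generate_latest(registry).decode())
```

In a real service you would expose these via prometheus_client's start_http_server() (or your web framework's middleware) and add a scrape job for the service in prometheus.yml.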

Production Hardening

  • Reverse proxy with SSL: Put Nginx in front of Grafana with Let’s Encrypt. Never run Grafana on plain HTTP in production.
  • Authentication: Disable anonymous access (GF_AUTH_ANONYMOUS_ENABLED=false). Enable OAuth if your team uses Google/GitHub SSO.
  • Prometheus retention: Set --storage.tsdb.retention.time=30d (or whatever fits your disk). Default 15 days.
  • Backup Grafana dashboards: Export as JSON regularly, or use Grafana’s built-in backup. Losing dashboards after weeks of tuning is painful.
  • Resource limits: Add mem_limit and cpus to Docker Compose to prevent runaway processes.
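For the resource-limits point, a sketch of what that looks like in docker-compose.yml (values are illustrative; the Compose v2 CLI accepts mem_limit and cpus at the service level, while v3 files under docker-compose v1 need deploy.resources.limits with --compatibility instead):

```yaml
services:
  prometheus:
    mem_limit: 1g
    cpus: "1.0"
  grafana:
    mem_limit: 512m
    cpus: "0.5"
```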

Common Mistakes to Avoid

  • Alert fatigue: Alerting on everything means you ignore everything. Be selective.
  • No retention limits: Prometheus will fill your disk. Always set --storage.tsdb.retention.time.
  • Exposing ports to the internet: Prometheus has no built-in auth. Restrict with security groups.
  • No persistent volumes: Without Docker volumes, you lose all data on container restart.
  • Skipping memory metrics: CPU is not enough. Install node-exporter for the full picture.

Frequently Asked Questions

How much does it cost to run Prometheus and Grafana on AWS?

A t3.small instance runs about $15/month. Add $2.40/month for 30 GB gp3 storage. Total: under $20/month for a complete monitoring stack.

Can I use Prometheus with ECS or EKS?

Yes. For ECS, use Prometheus ECS service discovery. For EKS, the kube-prometheus-stack Helm chart is the standard — it bundles Prometheus, Grafana, and Alertmanager with Kubernetes-native service discovery.

Prometheus vs CloudWatch — which should I use?

Use CloudWatch for native AWS service metrics (RDS, ALB, Lambda). Use Prometheus for custom application metrics, PromQL queries, and multi-cloud setups. Many teams use both.

How long does Prometheus store data?

Default: 15 days. Configure with --storage.tsdb.retention.time=30d (or any duration). For long-term storage beyond months, look at Thanos or Cortex.

Is Grafana free?

Yes. Grafana OSS is fully open-source. Self-host it for free. Grafana Cloud also has a free tier (10k metrics, 50 GB logs, 50 GB traces).


Akshay Ghalme

AWS DevOps Engineer with 3+ years building production cloud infrastructure. AWS Certified Solutions Architect. Currently managing a multi-tenant SaaS platform serving 1000+ customers.
