How to Set Up Prometheus + Grafana on AWS EC2 with Terraform
Prometheus collects and stores metrics. Grafana visualizes them. Together, they give you complete infrastructure monitoring — CPU, memory, disk, network, and custom application metrics — with powerful alerting. This guide sets up both on AWS EC2 using Docker Compose, with Terraform for the infrastructure.
I’ve caught production issues hours before customers noticed them using this exact setup. Monitoring is not optional — it’s how you sleep at night while running production infrastructure.
What Prometheus and Grafana Actually Do
Prometheus is a pull-based monitoring system. It scrapes metrics from your applications and infrastructure at regular intervals, stores them as time-series data, and provides PromQL — a powerful query language for analyzing metrics.
Grafana is a visualization platform. It connects to Prometheus (and many other data sources), lets you build dashboards, and sends alerts when things go wrong.
Node Exporter runs on each server and exposes system metrics (CPU, memory, disk, network) that Prometheus scrapes.
Prerequisites
- AWS account with EC2 permissions
- Terraform installed locally
- Basic understanding of EC2 and security groups
- SSH key pair in your AWS region
Infrastructure with Terraform
First, create the EC2 instance with the right security group:
resource "aws_security_group" "monitoring" {
  name        = "monitoring-stack"
  description = "Prometheus + Grafana"
  vpc_id      = var.vpc_id

  ingress {
    description = "Grafana"
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = [var.my_ip]
  }

  ingress {
    description = "Prometheus"
    from_port   = 9090
    to_port     = 9090
    protocol    = "tcp"
    cidr_blocks = [var.my_ip]
  }

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.my_ip]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_instance" "monitoring" {
  ami                    = "ami-0abcdef1234567890" # Ubuntu 22.04 (look up the current AMI ID for your region)
  instance_type          = "t3.small"
  key_name               = var.key_name
  vpc_security_group_ids = [aws_security_group.monitoring.id]
  subnet_id              = var.public_subnet_id

  root_block_device {
    volume_size = 30
    volume_type = "gp3"
  }

  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y docker.io docker-compose
    systemctl enable docker
    systemctl start docker
    usermod -aG docker ubuntu
  EOF

  tags = { Name = "monitoring-stack" }
}
Important: Restrict Prometheus (9090) and Grafana (3000) to your IP only. Never expose them to 0.0.0.0/0.
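The configuration above references four input variables. A minimal variables.tf to go with it might look like the following sketch (descriptions and the example CIDR are illustrative; substitute your own values), plus an output so Terraform prints the instance IP:

```hcl
variable "vpc_id" {
  description = "VPC to deploy the monitoring instance into"
  type        = string
}

variable "public_subnet_id" {
  description = "Public subnet for the instance"
  type        = string
}

variable "key_name" {
  description = "Name of an existing EC2 key pair in the region"
  type        = string
}

variable "my_ip" {
  description = "Your IP in CIDR form, e.g. 203.0.113.10/32"
  type        = string
}

output "monitoring_public_ip" {
  value = aws_instance.monitoring.public_ip
}
```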
Installing Prometheus
SSH into the instance and create the config files:
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
Create docker-compose.yml:
version: "3.8"

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
    restart: unless-stopped

volumes:
  prometheus_data:
Start it: docker-compose up -d. Verify at http://<your-ip>:9090. You should see the Prometheus UI with the “prometheus” target showing as UP.
Installing Grafana
Add Grafana to your docker-compose.yml:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
Restart: docker-compose up -d. Access Grafana at http://<your-ip>:3000. Login with admin / your password.
Add Prometheus as a data source: Settings → Data Sources → Add → Prometheus → URL: http://prometheus:9090 → Save & Test.
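If you would rather not click through the UI, Grafana can also provision the data source from a file on startup. A sketch (mount it into the container at /etc/grafana/provisioning/datasources/; the file name is arbitrary):

```yaml
# datasource.yml — Grafana data source provisioning
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```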
Adding Node Exporter for System Metrics
Add node-exporter to docker-compose.yml:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    # Mount the host filesystem read-only so node-exporter reports
    # the host's metrics rather than the container's
    volumes:
      - /:/host:ro,rslave
    command:
      - "--path.rootfs=/host"
    restart: unless-stopped
This exposes CPU, memory, disk, filesystem, and network metrics. Prometheus scrapes it automatically (we already configured the scrape target).
Restart and verify: docker-compose up -d. Check Prometheus targets page — node-exporter should show as UP.
Building Your First Dashboard
The fastest way: import a community dashboard.
- In Grafana: Dashboards → Import
- Enter dashboard ID: 1860 (Node Exporter Full)
- Select your Prometheus data source
- Click Import
You instantly get CPU, memory, disk, network, and system load panels.
To create a custom CPU usage panel:
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
For memory usage:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
For disk usage:
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
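The memory and disk expressions are both just "fraction used, as a percent". A quick sanity check of the arithmetic behind the memory query, in Python (the byte counts are made-up sample values, not real node_exporter output):

```python
# Mirrors the PromQL expression:
# (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

def memory_used_percent(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Percentage of memory in use, same arithmetic as the PromQL query."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100

# Illustrative values: 4 GiB available out of 16 GiB total
GIB = 1024 ** 3
pct = memory_used_percent(4 * GIB, 16 * GIB)
print(f"{pct:.1f}% used")  # 75.0% used
```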
Setting Up Alerts
In Grafana: Alerting → Alert Rules → New Alert Rule.
CPU alert example:
- Query: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
- Condition: Is above 80
- For: 5 minutes (avoids false positives from brief spikes)
- Summary: “High CPU usage on monitoring server”
Set up notification channels: Alerting → Contact Points → Add. Options: Email, Slack webhook, PagerDuty, etc.
What to alert on:
- CPU > 80% for 5 min
- Memory > 85% for 5 min
- Disk > 90%
- Instance down (up == 0)
What NOT to alert on: Brief spikes, cosmetic metrics, things you can’t act on at 3 AM. Alert fatigue kills monitoring.
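This guide manages alerts in Grafana, but the same thresholds can also live in a Prometheus rules file under version control (referenced via rule_files in prometheus.yml; routing the notifications additionally requires Alertmanager, which this guide does not cover). A sketch:

```yaml
# alert-rules.yml — load it from prometheus.yml with:
#   rule_files:
#     - "alert-rules.yml"
groups:
  - name: host-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        annotations:
          summary: "Target {{ $labels.instance }} is down"

      - alert: HighCpuUsage
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        annotations:
          summary: "High CPU usage on monitoring server"
```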
Monitoring Your Application
System metrics are just the start. For application monitoring, instrument your code with Prometheus client libraries:
- Python: prometheus_client
- Node.js: prom-client
- Go: prometheus/client_golang
- Java: micrometer
Track the 4 golden signals:
- Latency — how long requests take
- Traffic — requests per second
- Errors — error rate (5xx responses)
- Saturation — how full your system is
Production Hardening
- Reverse proxy with SSL: Put Nginx in front of Grafana with Let’s Encrypt. Never run Grafana on plain HTTP in production.
- Authentication: Disable anonymous access (GF_AUTH_ANONYMOUS_ENABLED=false). Enable OAuth if your team uses Google/GitHub SSO.
- Prometheus retention: Set --storage.tsdb.retention.time=30d (or whatever fits your disk). The default is 15 days.
- Backup Grafana dashboards: Export as JSON regularly, or use Grafana’s built-in backup. Losing dashboards after weeks of tuning is painful.
- Resource limits: Add mem_limit and cpus to Docker Compose to prevent runaway processes.
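For that last point, a sketch of service-level limits in the Compose file (the values are illustrative; tune them to your instance size, and note that support for these keys depends on your Compose version: the newer docker compose v2 CLI accepts them directly, while the legacy docker-compose v1 binary only honors them with compose file format 2.x):

```yaml
services:
  prometheus:
    # ...existing configuration...
    mem_limit: 1g
    cpus: 0.5
  grafana:
    # ...existing configuration...
    mem_limit: 512m
    cpus: 0.5
```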
Common Mistakes to Avoid
- Alert fatigue: Alerting on everything means you ignore everything. Be selective.
- No retention limits: Prometheus will fill your disk. Always set --storage.tsdb.retention.time.
- Exposing ports to the internet: Prometheus has no built-in auth. Restrict with security groups.
- No persistent volumes: Without Docker volumes, you lose all data on container restart.
- Skipping memory metrics: CPU is not enough. Install node-exporter for the full picture.
Frequently Asked Questions
How much does it cost to run Prometheus and Grafana on AWS?
A t3.small instance runs about $15/month. Add $2.40/month for 30 GB gp3 storage. Total: under $20/month for a complete monitoring stack.
Can I use Prometheus with ECS or EKS?
Yes. For ECS, use Prometheus ECS service discovery. For EKS, the kube-prometheus-stack Helm chart is the standard — it bundles Prometheus, Grafana, and Alertmanager with Kubernetes-native service discovery.
Prometheus vs CloudWatch — which should I use?
Use CloudWatch for native AWS service metrics (RDS, ALB, Lambda). Use Prometheus for custom application metrics, PromQL queries, and multi-cloud setups. Many teams use both.
How long does Prometheus store data?
Default: 15 days. Configure with --storage.tsdb.retention.time=30d (or any duration). For long-term storage beyond months, look at Thanos or Cortex.
Is Grafana free?
Yes. Grafana OSS is fully open-source. Self-host it for free. Grafana Cloud also has a free tier (10k metrics, 50 GB logs, 50 GB traces).