
AWS EC2 Right-Sizing Guide — How to Find and Fix Oversized Instances

By Akshay Ghalme · April 8, 2026 · 15 min read

Most AWS accounts are running EC2 instances 2–4x larger than needed. Right-sizing is the single highest-impact cost optimization you can make — I’ve seen it save 40–60% on compute costs alone. This guide walks through the exact process: collecting metrics, identifying candidates, safely resizing, and monitoring after.

Why Most EC2 Instances Are Oversized

It happens the same way everywhere: someone provisions an m5.xlarge “just to be safe,” the app works fine, and nobody revisits the decision. Six months later, you’re paying for 4 vCPUs and 16 GB RAM while your app uses 8% CPU and 3 GB memory.

Common causes: fear-driven provisioning, copy-paste from Stack Overflow, “we might need it someday,” and the absence of memory metrics (EC2 doesn’t report memory to CloudWatch by default — so people guess high).

The Right-Sizing Process

Right-sizing is not a one-time task. It’s a cycle:

  1. Collect metrics (CPU, memory, network, disk)
  2. Analyze utilization over 2+ weeks
  3. Identify candidates using thresholds
  4. Test the new size in staging first
  5. Monitor after resizing
  6. Repeat quarterly

Step 1: Enable Detailed CloudWatch Monitoring

Default CloudWatch gives you 5-minute intervals. Enable detailed monitoring for 1-minute granularity (note that detailed monitoring incurs a small per-instance charge):

aws ec2 monitor-instances --instance-ids i-0abc123def456

Critical: EC2 does NOT report memory metrics by default. Install the CloudWatch Agent:

sudo yum install -y amazon-cloudwatch-agent

# Create config
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

# Start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s

Without memory metrics, you’re making decisions blind. This is the #1 mistake in right-sizing.
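
If you'd rather not click through the wizard, you can generate a minimal agent config yourself. This sketch (assuming the standard agent config path from the command above) collects just the memory metric right-sizing needs:

```python
import json

# Minimal CloudWatch Agent config: collect memory utilization only.
# "mem_used_percent" is the measurement right-sizing decisions depend on.
AGENT_CONFIG = {
    "metrics": {
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent"],
                "metrics_collection_interval": 60,  # seconds between samples
            }
        }
    }
}

def render_agent_config(cfg: dict) -> str:
    """Serialize the config as the plain JSON the agent loads via -c file:..."""
    return json.dumps(cfg, indent=2)

# Write render_agent_config(AGENT_CONFIG) to
# /opt/aws/amazon-cloudwatch-agent/bin/config.json, then restart the agent.
```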

Step 2: Collect 2 Weeks of Data

Don’t make decisions on 1 day of data. Workloads vary by day of week, time of month, and business cycles. Collect at minimum:

  • CPU Utilization — average and peak
  • Memory Utilization — from CloudWatch Agent
  • Network In/Out — are you hitting bandwidth limits?
  • Disk Read/Write Ops — I/O-bound workloads need different instance families
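
The CPU numbers above can be pulled programmatically. A boto3 sketch (the instance ID is a placeholder; the pure `summarize` helper works without AWS credentials):

```python
from datetime import datetime, timedelta, timezone

def summarize(datapoints: list[dict]) -> dict:
    """Reduce CloudWatch datapoints to the two numbers that matter:
    average utilization and peak utilization over the window."""
    if not datapoints:
        return {"average": 0.0, "peak": 0.0}
    avg = sum(p["Average"] for p in datapoints) / len(datapoints)
    peak = max(p["Maximum"] for p in datapoints)
    return {"average": round(avg, 1), "peak": round(peak, 1)}

def fetch_cpu_stats(instance_id: str, days: int = 14) -> dict:
    """Pull hourly CPU datapoints for the last `days` days (needs AWS creds)."""
    import boto3  # imported here so summarize() stays dependency-free
    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average", "Maximum"],
    )
    return summarize(resp["Datapoints"])
```

Run the same query with `MetricName="mem_used_percent"` and `Namespace="CWAgent"` once the agent is reporting memory.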

Step 3: Use AWS Compute Optimizer

Enable it for free in the AWS Console: AWS Compute Optimizer → Get started. It analyzes your last 14 days of CloudWatch data and recommends instance types.

Key findings to look for:

  • Over-provisioned — you’re paying for resources you don’t use
  • Under-provisioned — your instance is struggling (rare, but important)
  • Optimized — current size is appropriate

Compute Optimizer is a starting point, not gospel. It doesn’t know about your deployment patterns, burst behavior, or upcoming load changes.

Step 4: Identify Right-Sizing Candidates

Use these thresholds as a starting point:

Threshold                        Finding              Action
Avg CPU < 20%, peak < 50%        Over-provisioned     Downsize instance type
Memory < 40% consistently        Over-provisioned     Consider smaller family
Network < 10% of baseline        Over-provisioned     Downsize, but note smaller instances have less bandwidth
Avg CPU > 80%                    Under-provisioned    Upsize or optimize application
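
These thresholds are easy to encode and run against your whole fleet. A small helper (thresholds straight from the table; tune them for your workloads):

```python
def classify(avg_cpu: float, peak_cpu: float, avg_mem: float) -> str:
    """Apply the right-sizing thresholds to one instance's two-week stats.
    All arguments are utilization percentages (0-100)."""
    if avg_cpu > 80:
        return "under-provisioned: upsize or optimize the application"
    if avg_cpu < 20 and peak_cpu < 50 and avg_mem < 40:
        return "over-provisioned: candidate for downsizing"
    return "optimized: leave it alone"

# Example: the kind of profile Step 2 typically surfaces
# classify(8, 35, 26) -> "over-provisioned: candidate for downsizing"
```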

Step 5: Execute the Change

Standalone instance (stop/start required):

aws ec2 stop-instances --instance-ids i-0abc123
aws ec2 modify-instance-attribute \
  --instance-id i-0abc123 \
  --instance-type Value=t3.medium
aws ec2 start-instances --instance-ids i-0abc123

Auto Scaling Group (zero downtime):

Update your launch template in Terraform:

resource "aws_launch_template" "app" {
  image_id      = "ami-0abcdef1234567890"
  instance_type = "t3.medium"  # was m5.xlarge

  # ... rest of config
}

resource "aws_autoscaling_group" "app" {
  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }

  # ... min_size, max_size, vpc_zone_identifier, etc.
}

The instance refresh gradually replaces old instances with the new type. Zero downtime.
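
If the ASG isn't managed by Terraform, you can kick off the same rolling replacement directly. A boto3 sketch (the ASG name is a placeholder; preferences mirror the Terraform block):

```python
def refresh_request(asg_name: str, min_healthy: int = 90) -> dict:
    """Build start_instance_refresh arguments: rolling strategy,
    keep at least min_healthy percent of capacity in service."""
    return {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        "Preferences": {"MinHealthyPercentage": min_healthy},
    }

def start_refresh(asg_name: str) -> str:
    """Start the refresh and return its ID (needs AWS credentials)."""
    import boto3
    asg = boto3.client("autoscaling")
    resp = asg.start_instance_refresh(**refresh_request(asg_name))
    return resp["InstanceRefreshId"]
```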

Step 6: Monitor After Resizing

Set up CloudWatch alarms for the first week:

  • CPU > 80% for 5 minutes → alarm
  • Memory > 85% for 5 minutes → alarm
  • Application response time > 2x baseline → alarm

Have a rollback plan: for ASG, revert the launch template. For standalone, stop and change back.
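
The CPU alarm from the list above can be scripted so it goes up the moment the resize lands. A boto3 sketch (alarm name and SNS topic are placeholders):

```python
def cpu_alarm(instance_id: str, topic_arn: str) -> dict:
    """put_metric_alarm arguments for the 'CPU > 80% for 5 minutes' rule.
    topic_arn is a placeholder SNS topic that receives notifications."""
    return {
        "AlarmName": f"post-resize-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

def create_alarm(instance_id: str, topic_arn: str) -> None:
    """Create the alarm in CloudWatch (needs AWS credentials)."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm(instance_id, topic_arn))
```

The memory alarm is the same shape with `Namespace="CWAgent"`, `MetricName="mem_used_percent"`, and `Threshold=85.0`.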

Instance Family Cheat Sheet

Family     Best For                       vCPU:Memory Ratio    Example Use
t3/t3a     Burstable, low-moderate CPU    1:2 GB               Web servers, small apps, dev/staging
m5/m6i     General purpose, balanced      1:4 GB               Application servers, mid-size databases
c5/c6i     Compute optimized              1:2 GB               Batch processing, ML inference, gaming
r5/r6i     Memory optimized               1:8 GB               In-memory caches, large databases

Pro tip: t3 instances are massively underrated. For bursty workloads (which most web apps are), t3 with unlimited credits often costs less than m5 while performing identically.
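
You can sanity-check whether a bursty workload stays credit-positive before committing. The constants below are t3.medium's published figures (2 vCPUs, 24 CPU credits earned per hour; one credit = one vCPU at 100% for one minute) — verify them against current AWS docs for your instance size:

```python
def net_credits_per_hour(avg_cpu_pct: float, vcpus: int = 2,
                         earn_rate: float = 24.0) -> float:
    """Credits earned minus credits spent per hour at a given average CPU.
    Positive means the credit balance grows; negative means it drains."""
    spend = (avg_cpu_pct / 100.0) * vcpus * 60  # vCPU-minutes used per hour
    return earn_rate - spend

# At 8% average CPU on a t3.medium:
# spend = 0.08 * 2 * 60 = 9.6 credits/hr vs 24 earned -> net +14.4/hr,
# so the balance grows and deployment spikes draw on the surplus.
```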

Real Example: How I Right-Sized a Production Fleet

At my company, we had 12 m5.xlarge instances (4 vCPU, 16 GB) running our application servers. CloudWatch showed:

  • Average CPU: 8%
  • Peak CPU: 35% (during deployments)
  • Memory: 4.2 GB average (26% utilization)

We moved to t3.medium (2 vCPU, 4 GB). Results:

  • m5.xlarge: $0.192/hr × 12 = $2.304/hr ≈ $1,659/month (720-hour month)
  • t3.medium: $0.0416/hr × 12 = $0.499/hr ≈ $359/month
  • Savings: $1,300/month (78%)

No performance impact. The t3 burst credits easily handled deployment spikes. We monitored for 2 weeks before committing.
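
The arithmetic generalizes to any fleet. A sketch using the on-demand rates from the example above (720-hour month; check current pricing for your region):

```python
def monthly_savings(old_rate: float, new_rate: float, count: int,
                    hours: int = 720) -> tuple[int, int]:
    """Return (dollars saved per month, percent saved), rounded."""
    old_cost = old_rate * count * hours
    new_cost = new_rate * count * hours
    saved = old_cost - new_cost
    return round(saved), round(100 * saved / old_cost)

# m5.xlarge -> t3.medium, 12 instances:
# monthly_savings(0.192, 0.0416, 12) -> (1299, 78), i.e. the ~$1,300/month above
```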

Common Mistakes to Avoid

  • Sizing for peak only — if peak is 50% but happens 1% of the time, you’re wasting 99% of the time
  • Forgetting memory metrics — CPU looks fine but memory is at 90%? Don’t downsize
  • Skipping staging tests — always test the new size with realistic load first
  • Ignoring burstable instances — t3 is perfect for 80%+ of web workloads
  • One-time exercise — right-sizing is quarterly, not annual

Frequently Asked Questions

How often should I right-size EC2 instances?

Quarterly at minimum. Workloads change, and AWS launches new instance types regularly. Set a calendar reminder.

Can I right-size without downtime?

For Auto Scaling Groups, yes — use instance refresh with rolling updates. For standalone instances, you need a brief stop/start (typically under 2 minutes).

What if my app is memory-bound, not CPU-bound?

Install the CloudWatch Agent first. EC2 doesn’t report memory by default. Once you have memory data, you can make informed decisions about whether to change instance family (e.g., m5 → r5 for memory-heavy workloads, or m5 → t3 if memory is low).

Should I use Spot Instances instead of right-sizing?

Both. Right-size first to find the correct instance type, then evaluate Spot for fault-tolerant workloads. Right-sizing is risk-free; Spot requires handling interruptions.

Does right-sizing affect Reserved Instance savings?

Yes. Always right-size first, then purchase RIs for the correct size. Buying RIs for oversized instances locks in waste for 1–3 years.


Akshay Ghalme

AWS DevOps Engineer with 3+ years building production cloud infrastructure. AWS Certified Solutions Architect. Currently managing a multi-tenant SaaS platform serving 1000+ customers.

More Guides & Terraform Modules

Every guide comes with a matching open-source Terraform module you can deploy right away.