AWS EC2 Right-Sizing Guide — How to Find and Fix Oversized Instances
Most AWS accounts are running EC2 instances 2–4x larger than needed. Right-sizing is the single highest-impact cost optimization you can make — I’ve seen it save 40–60% on compute costs alone. This guide walks through the exact process: collecting metrics, identifying candidates, safely resizing, and monitoring after.
Why Most EC2 Instances Are Oversized
It happens the same way everywhere: someone provisions an m5.xlarge “just to be safe,” the app works fine, and nobody revisits the decision. Six months later, you’re paying for 4 vCPUs and 16 GB RAM while your app uses 8% CPU and 3 GB memory.
Common causes: fear-driven provisioning, copy-paste from Stack Overflow, “we might need it someday,” and the absence of memory metrics (EC2 doesn’t report memory to CloudWatch by default — so people guess high).
The Right-Sizing Process
Right-sizing is not a one-time task. It’s a cycle:
- Collect metrics (CPU, memory, network, disk)
- Analyze utilization over 2+ weeks
- Identify candidates using thresholds
- Test the new size in staging first
- Monitor after resizing
- Repeat quarterly
Step 1: Enable Detailed CloudWatch Monitoring
Default CloudWatch gives you 5-minute intervals. Enable detailed monitoring for 1-minute granularity (note that detailed monitoring adds a small per-instance CloudWatch charge):
aws ec2 monitor-instances --instance-ids i-0abc123def456
Critical: EC2 does NOT report memory metrics by default. Install the CloudWatch Agent:
sudo yum install -y amazon-cloudwatch-agent
# Create config
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
# Start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
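If you'd rather skip the wizard, a minimal config that collects memory (and root-volume disk) utilization looks roughly like this; save it as the config.json referenced above. This is a sketch using the agent's documented metric names, not a complete production config:

```json
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      }
    }
  }
}
```

Metrics land in the CWAgent namespace by default, keyed by instance ID.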
Without memory metrics, you’re making decisions blind. This is the #1 mistake in right-sizing.
Step 2: Collect 2 Weeks of Data
Don’t make decisions on 1 day of data. Workloads vary by day of week, time of month, and business cycles. Collect at minimum:
- CPU Utilization — average and peak
- Memory Utilization — from CloudWatch Agent
- Network In/Out — are you hitting bandwidth limits?
- Disk Read/Write Ops — I/O-bound workloads need different instance families
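One way to pull that history is the CloudWatch CLI. A sketch for daily average and peak CPU over the last 14 days; the instance ID is a placeholder, and the GNU `date` flags are an assumption (on macOS, use `date -v-14d` instead):

```shell
# Daily average + peak CPU for one instance over 14 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time "$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 \
  --statistics Average Maximum \
  --query 'Datapoints[].{Day:Timestamp,Avg:Average,Peak:Maximum}' \
  --output table
```

Run the same query for `mem_used_percent` against the CWAgent namespace once the agent is reporting.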
Step 3: Use AWS Compute Optimizer
Enable it for free in the AWS Console: AWS Compute Optimizer → Get started. It analyzes your last 14 days of CloudWatch data and recommends instance types.
Key findings to look for:
- Over-provisioned — you’re paying for resources you don’t use
- Under-provisioned — your instance is struggling (rare, but important)
- Optimized — current size is appropriate
Compute Optimizer is a starting point, not gospel. It doesn’t know about your deployment patterns, burst behavior, or upcoming load changes.
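You can also pull its findings from the CLI instead of clicking through the console. A sketch, assuming Compute Optimizer is already enabled and your fleet is in us-east-1:

```shell
# List each instance's finding and the top recommended type
aws compute-optimizer get-ec2-instance-recommendations \
  --region us-east-1 \
  --query 'instanceRecommendations[].{Instance:instanceArn,Finding:finding,Recommended:recommendationOptions[0].instanceType}' \
  --output table
```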
Step 4: Identify Right-Sizing Candidates
Use these thresholds as a starting point:
| Metric | Threshold | Action |
|---|---|---|
| Avg CPU < 20%, Peak < 50% | Over-provisioned | Downsize instance type |
| Memory < 40% consistently | Over-provisioned | Consider smaller family |
| Network < 10% of instance bandwidth | Over-provisioned | Downsize, but check the smaller size's bandwidth cap first |
| Avg CPU > 80% | Under-provisioned | Upsize or optimize application |
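The table above can be sketched as a small shell function. Inputs are integer percentages (average CPU, peak CPU, average memory utilization); the cutoffs mirror the table and are starting points, not AWS-defined values:

```shell
#!/bin/sh
# classify AVG_CPU PEAK_CPU AVG_MEM — apply the right-sizing thresholds
classify() {
  avg_cpu=$1; peak_cpu=$2; avg_mem=$3
  if [ "$avg_cpu" -gt 80 ]; then
    echo "under-provisioned: upsize or optimize the application"
  elif [ "$avg_cpu" -lt 20 ] && [ "$peak_cpu" -lt 50 ] && [ "$avg_mem" -lt 40 ]; then
    echo "over-provisioned: downsize candidate"
  else
    echo "appropriately sized: leave as-is"
  fi
}

classify 8 35 26    # prints "over-provisioned: downsize candidate"
classify 85 95 60   # prints "under-provisioned: upsize or optimize the application"
```

Feed it the 14-day averages from Step 2, one instance per call.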
Step 5: Execute the Change
Standalone instance (stop/start required):
aws ec2 stop-instances --instance-ids i-0abc123
aws ec2 wait instance-stopped --instance-ids i-0abc123
aws ec2 modify-instance-attribute --instance-id i-0abc123 --instance-type Value=t3.medium
aws ec2 start-instances --instance-ids i-0abc123
Auto Scaling Group (zero downtime):
Update your launch template in Terraform:
resource "aws_launch_template" "app" {
  image_id      = "ami-0abcdef1234567890"
  instance_type = "t3.medium" # was m5.xlarge
  # ... rest of config
}

resource "aws_autoscaling_group" "app" {
  # ... min_size, max_size, vpc_zone_identifier, etc.
  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }
}
The instance refresh gradually replaces old instances with the new type. Zero downtime.
Step 6: Monitor After Resizing
Set up CloudWatch alarms for the first week:
- CPU > 80% for 5 minutes → alarm
- Memory > 85% for 5 minutes → alarm
- Application response time > 2x baseline → alarm
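The first alarm in that list might look like this with the CLI; the alarm name, instance ID, and SNS topic ARN are placeholders:

```shell
# Alarm when average CPU exceeds 80% over a 5-minute period
aws cloudwatch put-metric-alarm \
  --alarm-name "post-resize-cpu-high" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
```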
Have a rollback plan: for ASG, revert the launch template. For standalone, stop and change back.
Instance Family Cheat Sheet
| Family | Best For | vCPU:Memory Ratio | Example Use |
|---|---|---|---|
| t3/t3a | Burstable, low-moderate CPU | 1:2 GB | Web servers, small apps, dev/staging |
| m5/m6i | General purpose, balanced | 1:4 GB | Application servers, mid-size databases |
| c5/c6i | Compute optimized | 1:2 GB | Batch processing, ML inference, gaming |
| r5/r6i | Memory optimized | 1:8 GB | In-memory caches, large databases |
Pro tip: t3 instances are massively underrated. For bursty workloads (which most web apps are), t3 with unlimited credits often costs far less than m5 with no practical performance difference.
Real Example: How I Right-Sized a Production Fleet
At my company, we had 12 m5.xlarge instances (4 vCPU, 16 GB) running our application servers. CloudWatch showed:
- Average CPU: 8%
- Peak CPU: 35% (during deployments)
- Memory: 4.2 GB average (26% utilization)
We moved to t3.large (2 vCPU, 8 GB); t3.medium's 4 GB would not have covered the 4.2 GB average memory footprint. Results (assuming a 720-hour billing month):
- m5.xlarge: $0.192/hr × 12 = $2.304/hr = $1,659/month
- t3.large: $0.0832/hr × 12 = $0.998/hr = $719/month
- Savings: $940/month (57%)
No performance impact. The t3 burst credits easily handled deployment spikes. We monitored for 2 weeks before committing.
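The arithmetic is worth sanity-checking before any resize. A one-liner assuming a 720-hour billing month, using the m5.xlarge on-demand rate quoted above (rates vary by region; swap in your own):

```shell
# monthly_cost HOURLY_RATE INSTANCE_COUNT — fleet cost per 720-hour month
monthly_cost() {
  awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.2f\n", rate * n * 720 }'
}

monthly_cost 0.192 12   # m5.xlarge fleet: prints 1658.88
```

Run it for the current and candidate instance types, then compare before committing.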
Common Mistakes to Avoid
- Sizing for peak only — if peak is 50% but happens 1% of the time, you’re wasting 99% of the time
- Forgetting memory metrics — CPU looks fine but memory is at 90%? Don’t downsize
- Skipping staging tests — always test the new size with realistic load first
- Ignoring burstable instances — t3 is perfect for 80%+ of web workloads
- One-time exercise — right-sizing is quarterly, not annual
Frequently Asked Questions
How often should I right-size EC2 instances?
Quarterly at minimum. Workloads change, and AWS launches new instance types regularly. Set a calendar reminder.
Can I right-size without downtime?
For Auto Scaling Groups, yes — use instance refresh with rolling updates. For standalone instances, you need a brief stop/start (typically under 2 minutes).
What if my app is memory-bound, not CPU-bound?
Install the CloudWatch Agent first. EC2 doesn’t report memory by default. Once you have memory data, you can make informed decisions about whether to change instance family (e.g., m5 → r5 for memory-heavy workloads, or m5 → t3 if memory is low).
Should I use Spot Instances instead of right-sizing?
Both. Right-size first to find the correct instance type, then evaluate Spot for fault-tolerant workloads. Right-sizing is risk-free; Spot requires handling interruptions.
Does right-sizing affect Reserved Instance savings?
Yes. Always right-size first, then purchase RIs for the correct size. Buying RIs for oversized instances locks in waste for 1–3 years.