How to Set Up a Production RDS Database on AWS with Terraform
A production RDS database on AWS needs private subnet placement with no public access, encryption at rest using KMS, gp3 storage (20% cheaper than gp2), automated backups with 14+ days retention, Performance Insights for query monitoring, and deletion protection. Use db.r6g or db.m6g instances — never burstable db.t3 for production. This guide covers the complete setup with Terraform.
Spinning up an RDS instance through the AWS console takes about five minutes. Click through a few screens, pick your engine, hit create, and you have a database. The problem is that five-minute setup skips most of the things that matter in production — encryption, proper network isolation, backup configuration, and storage optimization.
I have set up RDS databases for production workloads that handle thousands of requests per minute. The setup I am going to walk you through is the same one I use — it covers security, performance, cost, and reliability. None of these steps are optional for production.
What You Will Build
- RDS instance in private subnets — zero public exposure, no internet access
- Encryption at rest — using AWS KMS, enabled from day one
- gp3 storage — cheaper and more flexible than gp2
- Automated backups — with point-in-time recovery
- Performance Insights — catch slow queries before users complain
- Deletion protection — prevent accidental destruction
- Custom parameter group — tunable settings without rebooting
Prerequisites
- A VPC with database subnets and a DB subnet group (see my VPC guide)
- Terraform 1.5 or later
- AWS CLI configured
Step 1: Choose Your Engine and Instance Class
For most applications, PostgreSQL is my default recommendation. It is feature-rich, has excellent JSON support, and the AWS community around it is strong. MySQL works well too, especially if your application was built for it.
For instance classes, the choice matters more than most people think:
- db.t3 / db.t4g — burstable CPU. Fine for dev and staging. Dangerous for production. If your database consistently uses more CPU than the baseline, it runs out of burst credits and performance tanks with no warning.
- db.r6g / db.r7g — memory-optimized. Best for production database workloads. Consistent performance, no surprises.
- db.m6g / db.m7g — general purpose. Good middle ground if your workload is balanced between CPU and memory.
I have seen production databases on db.t3.medium that worked fine for months and then suddenly became unusable during a traffic spike because they ran out of CPU credits. Use db.r6g for production. The cost difference is worth the reliability.
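If you want Terraform itself to enforce this rule, a variable validation can reject burstable classes at plan time. This is an illustrative sketch — the variable name is my own and is not referenced by the resource blocks later in this guide:

```hcl
# Illustrative guard — fail `terraform plan` on burstable (db.t*) classes.
variable "instance_class" {
  type    = string
  default = "db.r6g.large"

  validation {
    condition     = !startswith(var.instance_class, "db.t")
    error_message = "Burstable db.t* instance classes are not allowed for production."
  }
}
```

With this in place, setting instance_class = "db.t3.medium" fails immediately at plan time instead of failing quietly during the next traffic spike.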
Step 2: Place RDS in Private Subnets
Your database should never be accessible from the internet. Not through a security group, not through a public IP, not through anything. It lives in private subnets with no route to the internet.
resource "aws_db_instance" "main" {
  identifier     = "${var.name}-db"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = "db.r6g.large"

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  # Network — private subnets only
  db_subnet_group_name   = var.db_subnet_group_name
  publicly_accessible    = false
  multi_az               = true
  vpc_security_group_ids = [aws_security_group.db.id]

  # (encryption, storage, backup, and monitoring settings added in the steps below)
}
The critical setting here is publicly_accessible = false. Even if someone accidentally puts the database in a public subnet, it prevents AWS from assigning a public IP to the instance.
Step 3: Enable Encryption at Rest
  # Encryption
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn
Create a dedicated KMS key for your database encryption:
resource "aws_kms_key" "rds" {
  description             = "KMS key for ${var.name} RDS encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  tags = {
    Name = "${var.name}-rds-kms"
  }
}
You cannot encrypt an existing unencrypted RDS instance. You would need to create an encrypted snapshot, restore from it, and switch your application over — which means downtime. Enable encryption from the very first terraform apply. There is zero performance impact and no reason to skip it.
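For reference, the snapshot-based migration path for an instance that already exists unencrypted looks roughly like this. This is a sketch: it assumes a manual snapshot named pre-migration has already been taken, and the identifiers are illustrative.

```hcl
# Copy the unencrypted snapshot, encrypting the copy with the KMS key...
resource "aws_db_snapshot_copy" "encrypted" {
  source_db_snapshot_identifier = "pre-migration" # hypothetical manual snapshot
  target_db_snapshot_identifier = "pre-migration-encrypted"
  kms_key_id                    = aws_kms_key.rds.arn
}

# ...then restore a new, encrypted instance from the copy and cut the
# application over to it (this is where the downtime happens).
resource "aws_db_instance" "restored" {
  identifier          = "${var.name}-db-encrypted"
  instance_class      = "db.r6g.large"
  snapshot_identifier = aws_db_snapshot_copy.encrypted.target_db_snapshot_identifier
}
```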
Step 4: Switch to gp3 Storage
  # Storage
  allocated_storage     = 100
  max_allocated_storage = 500
  storage_type          = "gp3"
  iops                  = 3000
  storage_throughput    = 125
Why gp3 over gp2:
- 20% cheaper per GB compared to gp2
- 3,000 IOPS baseline included at no extra cost (gp2 gives you only 3 IOPS per GB, so you need 1 TB of storage just to get 3,000 IOPS)
- Independent scaling — you can increase IOPS without increasing storage size
The max_allocated_storage setting enables storage autoscaling. When the database runs low on free space, AWS automatically expands storage, up to the 500 GB ceiling, without downtime.
Step 5: Configure Automated Backups
  # Backups
  backup_retention_period   = 14
  backup_window             = "03:00-04:00" # UTC — low traffic period
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.name}-final-snapshot"
14 days of retention means you can restore your database to any point in time within the last two weeks. If someone runs a bad migration on Tuesday, you can restore to the state just before it ran.
The backup window should be during your lowest-traffic period. For India-based teams, 03:00 UTC (8:30 AM IST) might not be ideal — adjust this to your actual low-traffic hours.
skip_final_snapshot = false means if you ever run terraform destroy, AWS takes a final snapshot before deleting the database. This is your last safety net.
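Point-in-time restore can itself be driven from Terraform. A sketch of restoring a copy of the database to just before that bad Tuesday migration — the timestamp is illustrative:

```hcl
resource "aws_db_instance" "pitr" {
  identifier     = "${var.name}-db-pitr"
  instance_class = "db.r6g.large"

  restore_to_point_in_time {
    source_db_instance_identifier = aws_db_instance.main.identifier
    restore_time                  = "2025-01-07T02:55:00Z" # illustrative — just before the bad migration ran
  }
}
```

The restore creates a new instance; you then point the application (or just your investigation tooling) at its endpoint.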
Step 6: Enable Performance Insights
  # Monitoring
  performance_insights_enabled          = true
  performance_insights_retention_period = 7 # Free tier
  monitoring_interval                   = 60
  monitoring_role_arn                   = aws_iam_role.rds_monitoring.arn
Performance Insights shows you exactly which queries are consuming the most resources. It breaks down database load by wait events, SQL statements, and users. The 7-day retention is free. You can upgrade to longer retention if needed.
Enhanced Monitoring (monitoring_interval = 60) gives you OS-level metrics like memory, swap, and disk I/O at the instance level — things CloudWatch basic monitoring does not show.
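The aws_iam_role.rds_monitoring referenced above still needs to be defined. One way to do it — the role name is my own choice; the attached policy is the AWS-managed Enhanced Monitoring policy:

```hcl
resource "aws_iam_role" "rds_monitoring" {
  name = "${var.name}-rds-monitoring"

  # Allow the RDS monitoring service to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role       = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
```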
Step 7: Set Up a Custom Parameter Group
resource "aws_db_parameter_group" "main" {
  name   = "${var.name}-pg16"
  family = "postgres16"

  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # Log queries slower than 1 second
  }

  parameter {
    name         = "shared_preload_libraries"
    value        = "pg_stat_statements"
    apply_method = "pending-reboot" # static parameter — takes effect after a reboot
  }

  parameter {
    name         = "max_connections"
    value        = "200"
    apply_method = "pending-reboot"
  }

  tags = {
    Name = "${var.name}-parameter-group"
  }
}
Add this to your RDS instance:
parameter_group_name = aws_db_parameter_group.main.name
Using a custom parameter group instead of the default means you can change settings without affecting other databases. log_min_duration_statement = 1000 logs any query that takes more than 1 second — this is how you find performance problems before they become outages.
Step 8: Configure Security Groups
resource "aws_security_group" "db" {
  name_prefix = "${var.name}-db-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
    description     = "PostgreSQL from application servers"
  }

  # No egress rules needed — database does not initiate outbound connections

  tags = {
    Name = "${var.name}-db-sg"
  }

  lifecycle {
    create_before_destroy = true
  }
}
The security group only allows inbound traffic on port 5432 from your application's security group. Nothing else. No SSH access, no HTTPS, no other ports. The database does not need outbound internet access either.
Step 9: Enable Deletion Protection
  # Protection
  deletion_protection = true
With deletion protection enabled, running terraform destroy or deleting from the console will fail. You have to explicitly disable deletion protection first. This prevents the worst possible accident — someone accidentally destroying a production database.
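If the same Terraform code serves dev, staging, and production, one pattern is to derive the flag from the environment. This is a sketch assuming a var.environment variable like the one already used in the tags:

```hcl
  # Keep dev/staging stacks destroyable; lock down production.
  deletion_protection = var.environment == "production"
  skip_final_snapshot = var.environment != "production"
```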
Here is the complete resource block with everything together:
resource "aws_db_instance" "main" {
  identifier     = "${var.name}-db"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = "db.r6g.large"

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  # Network
  db_subnet_group_name   = var.db_subnet_group_name
  publicly_accessible    = false
  multi_az               = true
  vpc_security_group_ids = [aws_security_group.db.id]

  # Encryption
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn

  # Storage
  allocated_storage     = 100
  max_allocated_storage = 500
  storage_type          = "gp3"
  iops                  = 3000
  storage_throughput    = 125

  # Backups
  backup_retention_period   = 14
  backup_window             = "03:00-04:00"
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.name}-final-snapshot"

  # Monitoring
  performance_insights_enabled          = true
  performance_insights_retention_period = 7
  monitoring_interval                   = 60
  monitoring_role_arn                   = aws_iam_role.rds_monitoring.arn

  parameter_group_name = aws_db_parameter_group.main.name
  deletion_protection  = true

  tags = {
    Name        = "${var.name}-db"
    Environment = var.environment
  }
}
Multi-AZ vs Single-AZ
Multi-AZ maintains a synchronous standby replica in another availability zone. If the primary goes down, AWS automatically promotes the standby within 60-120 seconds. Your application reconnects to the same endpoint — no DNS changes needed. It costs roughly 2x the single-AZ price.
Single-AZ is fine for dev and staging. For production, always use Multi-AZ. The question is not if your primary will fail, but when.
Common Mistakes to Avoid
- Using burstable instances (db.t3) for production. They work fine until they run out of CPU credits during a traffic spike. Then your database becomes unresponsive. Use db.r6g or db.m6g.
- Putting the database in a public subnet. Even with security groups blocking external access, having a route to the internet is an unnecessary risk. Use private subnets with no internet route.
- Skipping encryption. You cannot add encryption later without downtime. Enable it from the start. There is no performance cost.
- Using gp2 instead of gp3. You are paying 20% more for the same or worse performance. There is no reason to use gp2 for new databases.
- No deletion protection. One accidental terraform destroy or console click, and your production database is gone. Deletion protection takes one line of Terraform to enable.
Frequently Asked Questions
Should I use gp2 or gp3 for RDS storage?
Use gp3. It is 20% cheaper than gp2, gives you 3,000 IOPS baseline included, and lets you scale IOPS independently from storage size. There is no reason to choose gp2 for new databases.
Can I encrypt an existing unencrypted RDS database?
Not directly. You need to create an encrypted snapshot, restore a new encrypted instance from it, and switch your application over. This involves downtime. Always enable encryption from day one.
What is the difference between Multi-AZ and Read Replicas?
Multi-AZ is for high availability with automatic failover. You cannot read from the standby. Read Replicas are for scaling read traffic to separate instances. They do not provide automatic failover.
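For completeness, a read replica of the instance from this guide would look roughly like this — a sketch, not part of the main setup:

```hcl
resource "aws_db_instance" "replica" {
  identifier          = "${var.name}-db-replica"
  replicate_source_db = aws_db_instance.main.identifier
  instance_class      = "db.r6g.large"

  # The replica serves reads at its own endpoint; point read-heavy
  # workloads (reports, analytics) here instead of at the primary.
}
```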
How long should I keep automated backups?
Minimum 7 days for production, 14 days recommended. This gives you point-in-time recovery within the retention window. The storage cost is minimal compared to the protection it provides.
Should I use db.t3 instances for production?
No. Burstable instances run out of CPU credits under sustained load, causing sudden performance drops. Use db.r6g (memory-optimized) or db.m6g (general purpose) for production workloads.
Skip the Manual Setup — Use the Terraform Module
Everything in this guide — private subnet placement, encryption, gp3 storage, automated backups, Performance Insights, parameter groups, deletion protection — is packaged into one module.
module "rds" {
  source = "github.com/akshayghalme/terraform-rds-production"

  name           = "my-app"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = "db.r6g.large"

  allocated_storage = 100
  db_name           = "myapp"
  username          = "admin"
  password          = var.db_password

  db_subnet_group_name  = module.vpc.db_subnet_group_name
  vpc_id                = module.vpc.vpc_id
  app_security_group_id = module.ecs.app_security_group_id
}
Production-ready database with all the settings covered in this guide. Run terraform apply and your database is deployed securely.