How to Set Up a Production RDS Database on AWS with Terraform
A production RDS database on AWS needs private subnet placement with no public access, encryption at rest using KMS, gp3 storage (20% cheaper than gp2), automated backups with 14+ days retention, Performance Insights for query monitoring, and deletion protection. Use db.r6g or db.m6g instances — never burstable db.t3 for production. This guide covers the complete setup with Terraform.
Spinning up an RDS instance through the AWS console takes about five minutes. Click through a few screens, pick your engine, hit create, and you have a database. The problem is that five-minute setup skips most of the things that matter in production — encryption, proper network isolation, backup configuration, and storage optimization.
I have set up RDS databases for production workloads that handle thousands of requests per minute. The setup I am going to walk you through is the same one I use — it covers security, performance, cost, and reliability. None of these steps are optional for production.
What You Will Build
- RDS instance in private subnets — zero public exposure, no internet access
- Encryption at rest — using AWS KMS, enabled from day one
- gp3 storage — cheaper and more flexible than gp2
- Automated backups — with point-in-time recovery
- Performance Insights — catch slow queries before users complain
- Deletion protection — prevent accidental destruction
- Custom parameter group — tunable settings without rebooting
Prerequisites
- A VPC with database subnets and a DB subnet group (see my VPC guide)
- Terraform 1.5 or later
- AWS CLI configured
Step 1: Choose Your Engine and Instance Class
For most applications, PostgreSQL is my default recommendation. It is feature-rich, has excellent JSON support, and the AWS community around it is strong. MySQL works well too, especially if your application was built for it.
For instance classes, the choice matters more than most people think:
- db.t3 / db.t4g — burstable CPU. Fine for dev and staging. Dangerous for production. If your database consistently uses more CPU than the baseline, it runs out of burst credits and performance tanks with no warning.
- db.r6g / db.r7g — memory-optimized. Best for production database workloads. Consistent performance, no surprises.
- db.m6g / db.m7g — general purpose. Good middle ground if your workload is balanced between CPU and memory.
I have seen production databases on db.t3.medium that worked fine for months and then suddenly became unusable during a traffic spike because they ran out of CPU credits. Use db.r6g for production. The cost difference is worth the reliability.
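If you want Terraform itself to enforce this rule, a variable validation can reject burstable classes at plan time. This is an illustrative sketch — the variable name is my own and is not referenced by the resource blocks later in this guide:

```hcl
# Illustrative guard — fail `terraform plan` on burstable (db.t*) classes.
variable "instance_class" {
  type    = string
  default = "db.r6g.large"

  validation {
    condition     = !startswith(var.instance_class, "db.t")
    error_message = "Burstable db.t* instance classes are not allowed for production."
  }
}
```

With this in place, setting instance_class = "db.t3.medium" fails immediately at plan time instead of failing quietly during the next traffic spike.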
Step 2: Place RDS in Private Subnets
Your database should never be accessible from the internet. Not through a security group, not through a public IP, not through anything. It lives in private subnets with no route to the internet.
resource "aws_db_instance" "main" {
  identifier     = "${var.name}-db"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = "db.r6g.large"

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  # Network — private subnets only
  db_subnet_group_name   = var.db_subnet_group_name
  publicly_accessible    = false
  multi_az               = true
  vpc_security_group_ids = [aws_security_group.db.id]

  # (encryption, storage, backup, and monitoring settings added in the steps below)
}
The critical setting here is publicly_accessible = false. Even if someone accidentally puts the database in a public subnet, it prevents AWS from assigning a public IP to the instance.
Step 3: Enable Encryption at Rest
  # Encryption
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn
Create a dedicated KMS key for your database encryption:
resource "aws_kms_key" "rds" {
  description             = "KMS key for ${var.name} RDS encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  tags = {
    Name = "${var.name}-rds-kms"
  }
}
You cannot encrypt an existing unencrypted RDS instance. You would need to create an encrypted snapshot, restore from it, and switch your application over — which means downtime. Enable encryption from the very first terraform apply. There is zero performance impact and no reason to skip it.
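For reference, the snapshot-based migration path for an instance that already exists unencrypted looks roughly like this. This is a sketch: it assumes a manual snapshot named pre-migration has already been taken, and the identifiers are illustrative.

```hcl
# Copy the unencrypted snapshot, encrypting the copy with the KMS key...
resource "aws_db_snapshot_copy" "encrypted" {
  source_db_snapshot_identifier = "pre-migration" # hypothetical manual snapshot
  target_db_snapshot_identifier = "pre-migration-encrypted"
  kms_key_id                    = aws_kms_key.rds.arn
}

# ...then restore a new, encrypted instance from the copy and cut the
# application over to it (this is where the downtime happens).
resource "aws_db_instance" "restored" {
  identifier          = "${var.name}-db-encrypted"
  instance_class      = "db.r6g.large"
  snapshot_identifier = aws_db_snapshot_copy.encrypted.target_db_snapshot_identifier
}
```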
Step 4: Switch to gp3 Storage
  # Storage
  allocated_storage     = 100
  max_allocated_storage = 500
  storage_type          = "gp3"
  iops                  = 3000
  storage_throughput    = 125
Why gp3 over gp2:
- 20% cheaper per GB compared to gp2
- 3,000 IOPS baseline included at no extra cost (gp2 gives you only 3 IOPS per GB, so you need 1 TB of storage just to get 3,000 IOPS)
- Independent scaling — you can increase IOPS without increasing storage size
The max_allocated_storage setting enables storage autoscaling. When the database runs low on free space, AWS automatically expands storage, up to the 500 GB ceiling, without downtime.
Step 5: Configure Automated Backups
  # Backups
  backup_retention_period   = 14
  backup_window             = "03:00-04:00" # UTC — low traffic period
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.name}-final-snapshot"
14 days of retention means you can restore your database to any point in time within the last two weeks. If someone runs a bad migration on Tuesday, you can restore to the state just before it ran.
The backup window should be during your lowest-traffic period. For India-based teams, 03:00 UTC (8:30 AM IST) might not be ideal — adjust this to your actual low-traffic hours.
skip_final_snapshot = false means if you ever run terraform destroy, AWS takes a final snapshot before deleting the database. This is your last safety net.
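Point-in-time restore can itself be driven from Terraform. A sketch of restoring a copy of the database to just before that bad Tuesday migration — the timestamp is illustrative:

```hcl
resource "aws_db_instance" "pitr" {
  identifier     = "${var.name}-db-pitr"
  instance_class = "db.r6g.large"

  restore_to_point_in_time {
    source_db_instance_identifier = aws_db_instance.main.identifier
    restore_time                  = "2025-01-07T02:55:00Z" # illustrative — just before the bad migration ran
  }
}
```

The restore creates a new instance; you then point the application (or just your investigation tooling) at its endpoint.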
Step 6: Enable Performance Insights
  # Monitoring
  performance_insights_enabled          = true
  performance_insights_retention_period = 7 # Free tier
  monitoring_interval                   = 60
  monitoring_role_arn                   = aws_iam_role.rds_monitoring.arn
Performance Insights shows you exactly which queries are consuming the most resources. It breaks down database load by wait events, SQL statements, and users. The 7-day retention is free. You can upgrade to longer retention if needed.
Enhanced Monitoring (monitoring_interval = 60) gives you OS-level metrics like memory, swap, and disk I/O at the instance level — things CloudWatch basic monitoring does not show.
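The aws_iam_role.rds_monitoring referenced above still needs to be defined. One way to do it — the role name is my own choice; the attached policy is the AWS-managed Enhanced Monitoring policy:

```hcl
resource "aws_iam_role" "rds_monitoring" {
  name = "${var.name}-rds-monitoring"

  # Allow the RDS monitoring service to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role       = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
```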
Step 7: Set Up a Custom Parameter Group
resource "aws_db_parameter_group" "main" {
  name   = "${var.name}-pg16"
  family = "postgres16"

  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # Log queries slower than 1 second
  }

  parameter {
    name         = "shared_preload_libraries"
    value        = "pg_stat_statements"
    apply_method = "pending-reboot" # static parameter — takes effect after a reboot
  }

  parameter {
    name         = "max_connections"
    value        = "200"
    apply_method = "pending-reboot"
  }

  tags = {
    Name = "${var.name}-parameter-group"
  }
}
Add this to your RDS instance:
parameter_group_name = aws_db_parameter_group.main.name
Using a custom parameter group instead of the default means you can change settings without affecting other databases. log_min_duration_statement = 1000 logs any query that takes more than 1 second — this is how you find performance problems before they become outages.
Step 8: Configure Security Groups
resource "aws_security_group" "db" {
  name_prefix = "${var.name}-db-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
    description     = "PostgreSQL from application servers"
  }

  # No egress rules needed — database does not initiate outbound connections

  tags = {
    Name = "${var.name}-db-sg"
  }

  lifecycle {
    create_before_destroy = true
  }
}
The security group only allows inbound traffic on port 5432 from your application's security group. Nothing else. No SSH access, no HTTPS, no other ports. The database does not need outbound internet access either.
Step 9: Enable Deletion Protection
  # Protection
  deletion_protection = true
With deletion protection enabled, running terraform destroy or deleting from the console will fail. You have to explicitly disable deletion protection first. This prevents the worst possible accident — someone accidentally destroying a production database.
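If the same Terraform code serves dev, staging, and production, one pattern is to derive the flag from the environment. This is a sketch assuming a var.environment variable like the one already used in the tags:

```hcl
  # Keep dev/staging stacks destroyable; lock down production.
  deletion_protection = var.environment == "production"
  skip_final_snapshot = var.environment != "production"
```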
Here is the complete resource block with everything together:
resource "aws_db_instance" "main" {
  identifier     = "${var.name}-db"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = "db.r6g.large"

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  # Network
  db_subnet_group_name   = var.db_subnet_group_name
  publicly_accessible    = false
  multi_az               = true
  vpc_security_group_ids = [aws_security_group.db.id]

  # Encryption
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn

  # Storage
  allocated_storage     = 100
  max_allocated_storage = 500
  storage_type          = "gp3"
  iops                  = 3000
  storage_throughput    = 125

  # Backups
  backup_retention_period   = 14
  backup_window             = "03:00-04:00"
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.name}-final-snapshot"

  # Monitoring
  performance_insights_enabled          = true
  performance_insights_retention_period = 7
  monitoring_interval                   = 60
  monitoring_role_arn                   = aws_iam_role.rds_monitoring.arn

  parameter_group_name = aws_db_parameter_group.main.name
  deletion_protection  = true

  tags = {
    Name        = "${var.name}-db"
    Environment = var.environment
  }
}
Multi-AZ vs Single-AZ
Multi-AZ maintains a synchronous standby replica in another availability zone. If the primary goes down, AWS automatically promotes the standby within 60-120 seconds. Your application reconnects to the same endpoint — no DNS changes needed. It costs roughly 2x the single-AZ price.
Single-AZ is fine for dev and staging. For production, always use Multi-AZ. The question is not if your primary will fail, but when.
Common Mistakes to Avoid
- Using burstable instances (db.t3) for production. They work fine until they run out of CPU credits during a traffic spike. Then your database becomes unresponsive. Use db.r6g or db.m6g.
- Putting the database in a public subnet. Even with security groups blocking external access, having a route to the internet is an unnecessary risk. Use private subnets with no internet route.
- Skipping encryption. You cannot add encryption later without downtime. Enable it from the start. There is no performance cost.
- Using gp2 instead of gp3. You are paying 20% more for the same or worse performance. There is no reason to use gp2 for new databases.
- No deletion protection. One accidental terraform destroy or console click, and your production database is gone. Deletion protection takes one line of Terraform to enable.
Frequently Asked Questions
Should I use gp2 or gp3 for RDS storage?
Use gp3. It is 20% cheaper than gp2, gives you 3,000 IOPS baseline included, and lets you scale IOPS independently from storage size. There is no reason to choose gp2 for new databases.
Can I encrypt an existing unencrypted RDS database?
Not directly. You need to create an encrypted snapshot, restore a new encrypted instance from it, and switch your application over. This involves downtime. Always enable encryption from day one.
What is the difference between Multi-AZ and Read Replicas?
Multi-AZ is for high availability with automatic failover. You cannot read from the standby. Read Replicas are for scaling read traffic to separate instances. They do not provide automatic failover.
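For completeness, a read replica of the instance from this guide would look roughly like this — a sketch, not part of the main setup:

```hcl
resource "aws_db_instance" "replica" {
  identifier          = "${var.name}-db-replica"
  replicate_source_db = aws_db_instance.main.identifier
  instance_class      = "db.r6g.large"

  # The replica serves reads at its own endpoint; point read-heavy
  # workloads (reports, analytics) here instead of at the primary.
}
```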
How long should I keep automated backups?
Minimum 7 days for production, 14 days recommended. This gives you point-in-time recovery within the retention window. The storage cost is minimal compared to the protection it provides.
Should I use db.t3 instances for production?
No. Burstable instances run out of CPU credits under sustained load, causing sudden performance drops. Use db.r6g (memory-optimized) or db.m6g (general purpose) for production workloads.
Skip the Manual Setup — Use the Terraform Module
Everything in this guide — private subnet placement, encryption, gp3 storage, automated backups, Performance Insights, parameter groups, deletion protection — is packaged into one module.
module "rds" {
  source = "github.com/akshayghalme/terraform-rds-production"

  name           = "my-app"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = "db.r6g.large"

  allocated_storage = 100
  db_name           = "myapp"
  username          = "admin"
  password          = var.db_password

  db_subnet_group_name  = module.vpc.db_subnet_group_name
  vpc_id                = module.vpc.vpc_id
  app_security_group_id = module.ecs.app_security_group_id
}
Production-ready database with all the settings covered in this guide. Run terraform apply and your database is deployed securely.