How to Set Up a Production VPC on AWS with Terraform

By Akshay Ghalme · March 30, 2026 · 10 min read

A production VPC on AWS needs three subnet tiers — public subnets for load balancers, private subnets for application servers, and isolated database subnets with no internet access — spread across at least two availability zones for high availability. This guide walks you through building this architecture with Terraform, including NAT Gateway, route tables, and security group lockdown.

Most VPC tutorials I have seen online give you a single public subnet, attach an internet gateway, and call it done. That works for a weekend project. It does not work for anything you plan to run in production.

The moment you need a database that should not be reachable from the internet, or application servers that need outbound internet access without being publicly exposed, that single-subnet setup falls apart. I learned this the hard way early in my career when I had an RDS instance sitting in a public subnet because I did not know any better. Nothing happened, but looking back, it could have been a disaster.

This guide walks you through the VPC architecture I use for every production deployment. It is the same design that runs a multi-tenant SaaS platform serving 1000+ customers with 99.9% uptime.

What You Will Build

A three-tier VPC spread across multiple availability zones:

  • Public subnets — for load balancers, bastion hosts, and NAT Gateways. These have direct internet access.
  • Private subnets — for your application servers, ECS tasks, or EKS pods. These can reach the internet through NAT but are not directly accessible from outside.
  • Database subnets — completely isolated. No internet access at all. Only your application servers can talk to them.

Each tier exists in at least two availability zones, so if one AZ goes down, your entire stack keeps running.

Why Three Tiers

The tiers exist for security. Each layer has different exposure requirements:

Your load balancer needs to be reachable from the internet — that is its job. Your application server should not be directly reachable — it sits behind the load balancer. Your database should not even be able to reach the internet — it only talks to your application.

By separating these into different subnets with different route tables, you enforce this at the network level. Even if someone misconfigures a security group, the route table still prevents your database from being exposed.

Prerequisites

  • An AWS account
  • Terraform 1.5 or later
  • AWS CLI configured with credentials
  • A rough idea of how many IP addresses you will need
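Every snippet below references three input variables: var.name, var.vpc_cidr, and var.availability_zones. A minimal variables.tf matching the names used throughout this guide might look like this (the defaults are illustrative):

```hcl
variable "name" {
  description = "Prefix for resource names, e.g. \"my-app\""
  type        = string
}

variable "vpc_cidr" {
  description = "Primary CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "AZs to spread subnets across (at least two)"
  type        = list(string)
  default     = ["ap-south-1a", "ap-south-1b"]
}
```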

Step 1: Define the VPC CIDR Block

The CIDR block determines how many IP addresses your VPC has and what range they fall in. For most production workloads, a /16 block gives you 65,536 addresses, which is more than enough.

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.name}-vpc"
  }
}

Two things to get right here:

  1. Do not overlap with other VPCs. If you ever need to peer VPCs or connect to an on-premises network, overlapping CIDRs will block you. Use 10.0.0.0/16 for production, 10.1.0.0/16 for staging, 10.2.0.0/16 for dev.
  2. Enable DNS hostnames and support. Without these, RDS endpoints and other AWS services will not resolve properly inside your VPC.

Step 2: Create Subnets Across Availability Zones

You need three types of subnets in each AZ. Here is how I carve up the /16 CIDR:

# Public subnets — /20 gives 4,096 IPs each
resource "aws_subnet" "public" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index)
  availability_zone = var.availability_zones[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.name}-public-${var.availability_zones[count.index]}"
    Tier = "public"
  }
}

# Private subnets — /20 gives 4,096 IPs each
resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index + length(var.availability_zones))
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.name}-private-${var.availability_zones[count.index]}"
    Tier = "private"
  }
}

# Database subnets — /20 gives 4,096 IPs each
resource "aws_subnet" "database" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index + 2 * length(var.availability_zones))
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.name}-database-${var.availability_zones[count.index]}"
    Tier = "database"
  }
}

Using cidrsubnet keeps the math clean and avoids manual CIDR calculations. Note that AWS reserves five addresses in every subnet (the first four and the last one), so each /20 actually yields 4,091 usable IPs. The key detail: only public subnets have map_public_ip_on_launch = true. Private and database subnets should never auto-assign public IPs.
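If you want to sanity-check the carving before applying, terraform console shows exactly which /20 each index produces. With vpc_cidr = "10.0.0.0/16" and two AZs:

```hcl
# In `terraform console`:
#   > cidrsubnet("10.0.0.0/16", 4, 0)   # public, first AZ
#   "10.0.0.0/20"
#   > cidrsubnet("10.0.0.0/16", 4, 2)   # private, first AZ
#   "10.0.32.0/20"
#   > cidrsubnet("10.0.0.0/16", 4, 4)   # database, first AZ
#   "10.0.64.0/20"
```

Indexes 0-1 go to public, 2-3 to private, and 4-5 to database, leaving indexes 6-15 free for future subnet tiers.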

Step 3: Set Up the Internet Gateway

The Internet Gateway gives your public subnets access to the internet. Without it, nothing in your VPC can reach the outside world.

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.name}-igw"
  }
}

A VPC can have only one Internet Gateway attached. It is highly available by default; AWS manages redundancy across AZs for you.

Step 4: Set Up NAT Gateway

Your private subnets need outbound internet access for things like pulling Docker images, downloading OS updates, and calling external APIs. But they should not be reachable from the internet. That is exactly what a NAT Gateway provides: outbound-only access.

# Elastic IP for NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"

  tags = {
    Name = "${var.name}-nat-eip"
  }
}

# NAT Gateway in the first public subnet
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.name}-nat"
  }

  depends_on = [aws_internet_gateway.main]
}

Cost reality check: Each NAT Gateway costs around $32 per month plus $0.045 per GB of data processed. For high availability, you would put one in each AZ, but for most setups a single NAT Gateway is fine. If that AZ has issues, your private subnets temporarily lose outbound internet, but your application keeps running.
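If you do opt for one NAT Gateway per AZ, a sketch of how the two resources above change (this roughly doubles the fixed cost for two AZs):

```hcl
# One EIP and one NAT Gateway per AZ instead of a single shared pair
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = {
    Name = "${var.name}-nat-eip-${var.availability_zones[count.index]}"
  }
}

resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.name}-nat-${var.availability_zones[count.index]}"
  }

  depends_on = [aws_internet_gateway.main]
}
```

With this layout you would also create one private route table per AZ, each routing 0.0.0.0/0 to aws_nat_gateway.main[count.index], so each private subnet's outbound traffic stays inside its own AZ.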

Step 5: Configure Route Tables

This is where the three-tier security model comes together. Each tier gets its own route table with different rules.

# Public route table — internet via IGW
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.name}-public-rt"
  }
}

# Private route table — internet via NAT
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }

  tags = {
    Name = "${var.name}-private-rt"
  }
}

# Database route table — NO internet route
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.name}-database-rt"
  }
}

# Associate subnets with route tables
resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "database" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}

Notice the database route table has no routes to the internet at all. It only has the local VPC route that AWS adds automatically. This is by design.

Step 6: Create DB Subnet Group

RDS requires a DB subnet group to know which subnets it can launch database instances in. You create this from your database subnets:

resource "aws_db_subnet_group" "main" {
  name       = "${var.name}-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "${var.name}-db-subnet-group"
  }
}

When you create an RDS instance later, you reference this subnet group and AWS places the database in one of these isolated subnets.
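For illustration, here is a hypothetical RDS instance wired to that subnet group; the engine, sizing, and credential handling are placeholders, not recommendations:

```hcl
# Illustrative only — engine, class, and var.db_password are assumptions
resource "aws_db_instance" "example" {
  identifier           = "${var.name}-db"
  engine               = "postgres"
  instance_class       = "db.t3.micro"
  allocated_storage    = 20
  db_subnet_group_name = aws_db_subnet_group.main.name
  publicly_accessible  = false # stays inside the isolated database subnets
  multi_az             = true

  username            = "appuser"
  password            = var.db_password # assumed variable; use a secrets manager in practice
  skip_final_snapshot = true
}
```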

Step 7: Lock Down the Default Security Group

AWS creates a default security group for every VPC that allows all inbound traffic between members of the group and unrestricted outbound traffic to anywhere. This is a security risk most people do not know about.

resource "aws_default_security_group" "default" {
  vpc_id = aws_vpc.main.id

  # No ingress rules — deny all inbound
  # No egress rules — deny all outbound

  tags = {
    Name = "${var.name}-default-sg-DO-NOT-USE"
  }
}

By declaring it in Terraform with no rules, you strip all permissions from it. The name tag makes it clear that nobody should attach this to any resource. Always create purpose-specific security groups instead.

Common Mistakes to Avoid

  1. Using a single availability zone. Your entire application goes down when that AZ has issues. Always use at least two AZs. AWS AZ outages are rare but they happen, and they always happen at the worst time.
  2. Putting databases in public subnets. I have seen this in production more times than I would like to admit. Even if the security group blocks external access, the database should never have a route to the internet. Defense in depth.
  3. CIDR blocks that are too small. Starting with a /24 (256 IPs) feels fine until you need to add more subnets. You cannot expand a VPC CIDR block — you can only add secondary blocks, which gets messy. Start with /16.
  4. Not planning for NAT Gateway costs. NAT data processing charges add up fast if you pull large Docker images or transfer a lot of data. Use VPC endpoints for S3 and ECR so AWS service traffic bypasses the NAT Gateway, and consider stopping non-production resources outside working hours to cut dev and staging costs.
  5. Leaving the default security group open. Most security audits flag this. Lock it down on day one.
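On point 4: a gateway endpoint for S3 routes that traffic over the AWS network instead of through the NAT Gateway, and gateway endpoints themselves are free. A sketch, assuming the ap-south-1 region used elsewhere in this guide (ECR needs interface endpoints, which are billed per hour):

```hcl
# S3 traffic from private subnets bypasses the NAT Gateway entirely
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.ap-south-1.s3"
  route_table_ids = [aws_route_table.private.id]

  tags = {
    Name = "${var.name}-s3-endpoint"
  }
}
```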

Frequently Asked Questions

How many availability zones should I use for a production VPC?

Use at least two availability zones. This gives you high availability — if one AZ goes down, your application keeps running in the other. Three AZs is even better for critical workloads, but two is the minimum for production.

What CIDR block should I use for my VPC?

A /16 CIDR block like 10.0.0.0/16 gives you 65,536 IP addresses, which is more than enough for most workloads. Avoid 172.31.0.0/16 since that is the default VPC CIDR. If you plan to peer VPCs or connect to on-premises networks, make sure your CIDR blocks do not overlap.

Do I need a NAT Gateway in every availability zone?

For true high availability, yes — one NAT Gateway per AZ. But NAT Gateways cost around $32 per month each plus data processing charges. For non-critical workloads, a single NAT Gateway works fine. If that AZ goes down, private subnet resources lose outbound internet temporarily.

Why should database subnets have no internet route?

Database subnets should be completely isolated from the internet for security. Your database should never be reachable from outside your VPC. It only needs to communicate with your application servers in the private subnets.

Can I change my VPC CIDR block after creation?

You can add secondary CIDR blocks to an existing VPC, but you cannot change the primary one. This is why planning your CIDR allocation upfront matters. If you get it wrong, you will need a new VPC and a full migration.


Skip the Manual Setup — Use the Terraform Module

Everything in this guide — the three-tier subnets, NAT Gateway, route tables, DB subnet group, locked-down default security group — I have packaged into an open-source Terraform module.

module "vpc" {
  source = "github.com/akshayghalme/terraform-vpc-production"

  name               = "my-app"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["ap-south-1a", "ap-south-1b"]
}

One module block with three inputs. Run terraform apply. You get a production-ready VPC with everything configured correctly.

Get the Terraform VPC module on GitHub →

Akshay Ghalme

AWS DevOps Engineer with 3+ years building production cloud infrastructure. AWS Certified Solutions Architect. Currently managing a multi-tenant SaaS platform serving 1000+ customers.
