How to Set Up AWS EKS with Terraform — Production-Ready Kubernetes Cluster
Amazon EKS gives you a managed Kubernetes control plane so you can focus on deploying workloads instead of babysitting etcd clusters. In this guide, we’ll build a production-ready EKS cluster from scratch using Terraform — complete with managed node groups, IAM Roles for Service Accounts (IRSA), OIDC provider, Nginx Ingress Controller, and monitoring. Everything is copy-paste ready.
Prerequisites
- AWS account with admin or sufficient IAM permissions
- Terraform >= 1.5 installed
- kubectl installed and configured
- AWS CLI v2 configured with aws configure
- A VPC with public and private subnets — see our VPC setup guide if you need one
- Helm v3 installed (for Ingress Controller and Prometheus)
EKS Architecture Overview
Here’s what we’re building:
- EKS Control Plane — managed by AWS across multiple AZs (you never see the master nodes)
- Managed Node Groups — EC2 instances that AWS auto-provisions, patches, and drains during updates
- OIDC Provider — enables IRSA so pods can assume IAM roles without storing credentials
- Private subnets for worker nodes, public subnets for load balancers
- Cluster Autoscaler to scale nodes based on pending pod demand
The control plane costs $0.10/hour ($73/month). Worker nodes are billed as regular EC2 instances.
VPC Setup for EKS
EKS requires specific subnet tags. If you followed our VPC guide, add these tags:
# Public subnets — for external load balancers
resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name                                        = "public-${count.index + 1}"
    "kubernetes.io/role/elb"                    = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

# Private subnets — for worker nodes
resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name                                        = "private-${count.index + 1}"
    "kubernetes.io/role/internal-elb"           = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}
The kubernetes.io/role/elb and kubernetes.io/role/internal-elb tags tell the AWS Load Balancer Controller which subnets to use.
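If the subnets already exist and you'd rather not modify their resource blocks, Terraform's aws_ec2_tag resource can attach the tags out-of-band. A minimal sketch, assuming the same aws_subnet.public and aws_subnet.private resources as above:

```hcl
# Attach the ELB role tags to existing subnets without
# editing the aws_subnet resources themselves.
resource "aws_ec2_tag" "public_elb" {
  count       = 3
  resource_id = aws_subnet.public[count.index].id
  key         = "kubernetes.io/role/elb"
  value       = "1"
}

resource "aws_ec2_tag" "private_internal_elb" {
  count       = 3
  resource_id = aws_subnet.private[count.index].id
  key         = "kubernetes.io/role/internal-elb"
  value       = "1"
}
```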
EKS Cluster Terraform Code
IAM Role for the Cluster
resource "aws_iam_role" "eks_cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role_policy_attachment" "eks_vpc_resource_controller" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks_cluster.name
}
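The cluster resource in the next section references aws_security_group.eks_cluster, which isn't defined elsewhere in this guide. A minimal sketch if you don't already have one (assumes aws_vpc.main from the VPC guide; note that EKS also creates its own cluster security group automatically):

```hcl
# Additional security group attached to the EKS control plane ENIs.
resource "aws_security_group" "eks_cluster" {
  name_prefix = "${var.cluster_name}-cluster-"
  vpc_id      = aws_vpc.main.id

  # Allow all outbound traffic from the control plane ENIs.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.cluster_name}-cluster-sg"
  }
}
```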
EKS Cluster with OIDC Provider
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  version  = "1.31"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids              = concat(aws_subnet.public[*].id, aws_subnet.private[*].id)
    endpoint_private_access = true
    endpoint_public_access  = true
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller,
  ]

  tags = {
    Environment = var.environment
  }
}

# OIDC Provider for IRSA
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer

  tags = {
    Name = "${var.cluster_name}-oidc"
  }
}
Managed Node Group
resource "aws_iam_role" "eks_nodes" {
  name = "${var.cluster_name}-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "ecr_read_only" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-main"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.private[*].id

  instance_types = ["t3.medium"]
  capacity_type  = "ON_DEMAND"

  scaling_config {
    desired_size = 3
    max_size     = 6
    min_size     = 2
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.ecr_read_only,
  ]

  tags = {
    Environment = var.environment
  }
}
IAM Roles for Service Accounts (IRSA)
IRSA lets Kubernetes pods assume IAM roles without using access keys. This is the right way to grant AWS permissions to pods.
# Example: Create an IRSA role for a pod that needs S3 access
locals {
  oidc_provider     = replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")
  oidc_provider_arn = aws_iam_openid_connect_provider.eks.arn
}

resource "aws_iam_role" "s3_reader" {
  name = "${var.cluster_name}-s3-reader"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = local.oidc_provider_arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${local.oidc_provider}:aud" = "sts.amazonaws.com"
          "${local.oidc_provider}:sub" = "system:serviceaccount:default:s3-reader-sa"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "s3_reader" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  role       = aws_iam_role.s3_reader.name
}
Then create the Kubernetes ServiceAccount that references this role:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-cluster-s3-reader
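Any pod that runs under this ServiceAccount receives temporary credentials for the role through a projected web-identity token — no access keys involved. A quick sketch to verify it works (pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader-test
  namespace: default
spec:
  serviceAccountName: s3-reader-sa   # IRSA: pod assumes the s3-reader role
  restartPolicy: Never
  containers:
    - name: awscli
      image: amazon/aws-cli:2.17.0
      args: ["s3", "ls"]             # should list buckets via the assumed role
```

Check the pod logs with kubectl logs s3-reader-test — a bucket listing confirms the role assumption worked.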
kubectl Configuration
After terraform apply, configure kubectl:
aws eks update-kubeconfig \
  --region ap-south-1 \
  --name my-cluster
# Verify connectivity
kubectl get nodes
kubectl get pods -A
You should see 3 nodes in the Ready state and the system pods running in the kube-system namespace.
Deploying a Test Application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  labels:
    app: nginx-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:1.27-alpine
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
spec:
  type: LoadBalancer
  selector:
    app: nginx-test
  ports:
    - port: 80
      targetPort: 80
kubectl apply -f test-app.yaml
kubectl get svc nginx-test # Wait for EXTERNAL-IP
Nginx Ingress Controller Setup
Instead of creating a LoadBalancer per service, use an Ingress Controller:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"=nlb \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"=internet-facing
Then define Ingress resources:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-test
                port:
                  number: 80
Monitoring with Prometheus
Deploy the kube-prometheus-stack for full cluster observability. See our monitoring guide for the CloudWatch side.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=your-secure-password \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
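Note: the 50Gi storage claim only binds if the cluster can provision EBS volumes. On recent EKS versions that requires the EBS CSI driver, which is not installed by default — one option is the managed add-on. A sketch (the IRSA role for the driver, aws_iam_role.ebs_csi here, is not defined in this guide; it follows the same trust-policy pattern as the s3_reader role above):

```hcl
# Managed EBS CSI driver add-on so PersistentVolumeClaims can provision EBS volumes
resource "aws_eks_addon" "ebs_csi" {
  cluster_name             = aws_eks_cluster.main.name
  addon_name               = "aws-ebs-csi-driver"
  service_account_role_arn = aws_iam_role.ebs_csi.arn # IRSA role, not shown here
}
```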
Access Grafana:
kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80
Production Best Practices
Multi-AZ Deployment
Always spread nodes across at least 3 availability zones. Our node group above uses aws_subnet.private[*].id which includes all 3 AZ subnets. EKS automatically distributes nodes.
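Spreading nodes across AZs doesn't by itself spread your pods; the scheduler can still pack replicas into one zone. A topologySpreadConstraints stanza in the pod template keeps replicas balanced — a sketch using the nginx-test labels from earlier:

```yaml
# Fragment of a Deployment's pod template spec — balance replicas across AZs
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: nginx-test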
Spot Instances for Non-Critical Workloads
resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-spot"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.private[*].id

  instance_types = ["t3.medium", "t3.large", "t3a.medium", "t3a.large"]
  capacity_type  = "SPOT"

  scaling_config {
    desired_size = 2
    max_size     = 10
    min_size     = 0
  }

  labels = {
    role     = "spot"
    workload = "non-critical"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
Use tolerations in your pod specs to schedule non-critical workloads on spot nodes.
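For example, a pod template that opts in to the spot pool would carry a toleration matching the taint above, plus a nodeSelector for the spot label — a sketch:

```yaml
# Fragment of a pod template spec — tolerate the spot taint and target spot nodes
tolerations:
  - key: spot
    operator: Equal
    value: "true"
    effect: NoSchedule
nodeSelector:
  role: spot
```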
Cluster Autoscaler
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/cluster-autoscaler-role
Common Mistakes
- Missing subnet tags — without kubernetes.io/role/elb tags, load balancers won't provision
- Public endpoint only — enable both private and public access, then restrict public by CIDR
- No IRSA — using instance roles or access keys in pods is a security risk
- Single AZ node groups — if that AZ has issues, your entire cluster goes down
- Skipping resource requests/limits — leads to noisy neighbors and OOM kills
- Not enabling cluster logging — you’ll need audit logs when something goes wrong
- Using latest tag for images — pin specific image versions for reproducibility
Frequently Asked Questions
How much does EKS cost compared to self-managed Kubernetes?
EKS charges $0.10/hour ($73/month) for the control plane. Self-managed K8s has no control plane fee but requires managing your own master nodes, etcd, and upgrades — which typically costs far more in engineering time.
What is the total pricing breakdown for EKS?
Control plane: $0.10/hour ($73/month). Worker nodes: standard EC2 pricing. A 3-node cluster with t3.medium costs roughly $73 + $90 = $163/month. Add NAT Gateway costs ($32+/month) for private subnets.
Should I use Fargate or managed node groups?
Managed node groups for most workloads — they're cheaper, support DaemonSets and GPUs, and give you more control. Use Fargate for bursty workloads or when you want zero node management. Fargate doesn't support DaemonSets or privileged containers.
How do I handle EKS version upgrades?
Update the cluster version in Terraform first, then update node groups. EKS supports in-place rolling upgrades. Always test in staging, review the Kubernetes changelog for deprecations, and ensure your add-ons are compatible.
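In Terraform terms, that means two applies — bump the control plane first, then the node group. A sketch (fragments of the resources defined earlier; aws_eks_node_group also accepts a version attribute, which defaults to the cluster version when unset):

```hcl
# Step 1: bump the control plane and apply
resource "aws_eks_cluster" "main" {
  # ...existing arguments...
  version = "1.32" # was "1.31"
}

# Step 2 (separate apply): roll the node group to the matching version
resource "aws_eks_node_group" "main" {
  # ...existing arguments...
  version = "1.32"
}
```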
What is the minimum cluster size for production?
Minimum 3 nodes across 3 availability zones. Use t3.medium or larger. This gives fault tolerance if an entire AZ goes down and enough capacity for system pods plus your workloads.