KUBERNETES · AWS

How to Set Up AWS EKS with Terraform — Production-Ready Kubernetes Cluster

By Akshay Ghalme·April 8, 2026·~16 min read

Amazon EKS gives you a managed Kubernetes control plane so you can focus on deploying workloads instead of babysitting etcd clusters. In this guide, we’ll build a production-ready EKS cluster from scratch using Terraform — complete with managed node groups, IAM Roles for Service Accounts (IRSA), OIDC provider, Nginx Ingress Controller, and monitoring. Everything is copy-paste ready.

Prerequisites

  • AWS account with admin or sufficient IAM permissions
  • Terraform >= 1.5 installed
  • kubectl installed and configured
  • AWS CLI v2 configured with aws configure
  • A VPC with public and private subnets — see our VPC setup guide if you need one
  • Helm v3 installed (for Ingress Controller and Prometheus)

EKS Architecture Overview

Here’s what we’re building:

  • EKS Control Plane — managed by AWS across multiple AZs (you never see the master nodes)
  • Managed Node Groups — EC2 instances that AWS auto-provisions, patches, and drains during updates
  • OIDC Provider — enables IRSA so pods can assume IAM roles without storing credentials
  • Private subnets for worker nodes, public subnets for load balancers
  • Cluster Autoscaler to scale nodes based on pending pod demand

The control plane costs $0.10/hour ($73/month). Worker nodes are billed as regular EC2 instances.

VPC Setup for EKS

EKS requires specific subnet tags. If you followed our VPC guide, add these tags:

# Public subnets — for external load balancers
resource "aws_subnet" "public" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name                                        = "public-${count.index + 1}"
    "kubernetes.io/role/elb"                    = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

# Private subnets — for worker nodes
resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name                                        = "private-${count.index + 1}"
    "kubernetes.io/role/internal-elb"           = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

The kubernetes.io/role/elb and kubernetes.io/role/internal-elb tags tell the AWS Load Balancer Controller which subnets to use.
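The snippets in this guide reference a few input variables and a data source that aren't shown. A minimal variables.tf sketch under those assumptions (names and defaults are illustrative, adjust to your setup):

```hcl
# Input variables assumed by the snippets in this guide
variable "cluster_name" {
  type    = string
  default = "my-cluster"
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "environment" {
  type    = string
  default = "production"
}

# AZs available in the current region, used to place the subnets
data "aws_availability_zones" "available" {
  state = "available"
}
```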

EKS Cluster Terraform Code

IAM Role for the Cluster

resource "aws_iam_role" "eks_cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role_policy_attachment" "eks_vpc_resource_controller" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks_cluster.name
}

EKS Cluster with OIDC Provider

resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  version  = "1.31"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids              = concat(aws_subnet.public[*].id, aws_subnet.private[*].id)
    endpoint_private_access = true
    endpoint_public_access  = true
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller,
  ]

  tags = {
    Environment = var.environment
  }
}

# OIDC Provider for IRSA
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer

  tags = {
    Name = "${var.cluster_name}-oidc"
  }
}
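Downstream wiring (IRSA roles, kubeconfig, Helm values) needs a few of these values, so it helps to export them. A small outputs.tf sketch (output names are assumptions):

```hcl
output "cluster_name" {
  value = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

# Needed when building IRSA trust policies outside this module
output "oidc_provider_arn" {
  value = aws_iam_openid_connect_provider.eks.arn
}
```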

Managed Node Group

resource "aws_iam_role" "eks_nodes" {
  name = "${var.cluster_name}-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "ecr_read_only" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-main"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.private[*].id

  instance_types = ["t3.medium"]
  capacity_type  = "ON_DEMAND"

  scaling_config {
    desired_size = 3
    max_size     = 6
    min_size     = 2
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.ecr_read_only,
  ]

  tags = {
    Environment = var.environment
  }
}

IAM Roles for Service Accounts (IRSA)

IRSA lets Kubernetes pods assume IAM roles without using access keys. This is the right way to grant AWS permissions to pods.

# Example: Create an IRSA role for a pod that needs S3 access
locals {
  oidc_provider     = replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")
  oidc_provider_arn = aws_iam_openid_connect_provider.eks.arn
}

resource "aws_iam_role" "s3_reader" {
  name = "${var.cluster_name}-s3-reader"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = local.oidc_provider_arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${local.oidc_provider}:aud" = "sts.amazonaws.com"
          "${local.oidc_provider}:sub" = "system:serviceaccount:default:s3-reader-sa"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "s3_reader" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  role       = aws_iam_role.s3_reader.name
}

Then create the Kubernetes ServiceAccount that references this role:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-cluster-s3-reader
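A pod that runs under this ServiceAccount receives temporary AWS credentials through a projected web identity token, and the AWS SDK inside the container picks them up automatically. A minimal sketch to verify it works (image and command are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader-test
  namespace: default
spec:
  serviceAccountName: s3-reader-sa
  containers:
  - name: awscli
    image: amazon/aws-cli  # pin a specific tag in practice
    command: ["aws", "s3", "ls"]  # should succeed via the IRSA role, no keys needed
```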

kubectl Configuration

After terraform apply, configure kubectl:

aws eks update-kubeconfig \
  --region ap-south-1 \
  --name my-cluster

# Verify connectivity
kubectl get nodes
kubectl get pods -A

You should see three nodes in the Ready state and system pods running in the kube-system namespace.

Deploying a Test Application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  labels:
    app: nginx-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
      - name: nginx
        image: nginx:1.27-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
spec:
  type: LoadBalancer
  selector:
    app: nginx-test
  ports:
  - port: 80
    targetPort: 80
Save the manifest above as test-app.yaml, then apply it:

kubectl apply -f test-app.yaml
kubectl get svc nginx-test  # wait for EXTERNAL-IP

Nginx Ingress Controller Setup

Instead of creating a LoadBalancer per service, use an Ingress Controller:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"=nlb \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"=internet-facing

Then define Ingress resources:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-test
            port:
              number: 80

Monitoring with Prometheus

Deploy the kube-prometheus-stack for full cluster observability. See our monitoring guide for the CloudWatch side.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=your-secure-password \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

Access Grafana:

kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80

Production Best Practices

Multi-AZ Deployment

Always spread nodes across at least three availability zones. The node group above uses aws_subnet.private[*].id, which includes all three AZ subnets, and EKS distributes nodes across them automatically.
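Spreading nodes is only half of it; pods need to spread too. A hedged sketch of topologySpreadConstraints you could drop into a Deployment's pod template (the app label matches the test app from earlier):

```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone  # spread replicas across AZs
  whenUnsatisfiable: ScheduleAnyway         # prefer spread, don't block scheduling
  labelSelector:
    matchLabels:
      app: nginx-test
```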

Spot Instances for Non-Critical Workloads

resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-spot"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.private[*].id

  instance_types = ["t3.medium", "t3.large", "t3a.medium", "t3a.large"]
  capacity_type  = "SPOT"

  scaling_config {
    desired_size = 2
    max_size     = 10
    min_size     = 0
  }

  labels = {
    role     = "spot"
    workload = "non-critical"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}

Use tolerations in your pod specs to schedule non-critical workloads on spot nodes.
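For the spot taint above, a matching toleration (plus a nodeSelector for the role label) in the pod spec might look like this:

```yaml
spec:
  nodeSelector:
    role: spot            # matches the node group label
  tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"  # matches the NO_SCHEDULE taint
```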

Cluster Autoscaler

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/cluster-autoscaler-role
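Auto-discovery works by looking for specific tags on the node groups' underlying Auto Scaling Groups, and managed node groups don't add them for you. One way to attach them in Terraform, assuming the main node group from earlier (a sketch; the IRSA role referenced in the Helm command above also has to exist with autoscaling permissions):

```hcl
# Tag the node group's ASG so Cluster Autoscaler auto-discovery can find it
resource "aws_autoscaling_group_tag" "ca_enabled" {
  autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name

  tag {
    key                 = "k8s.io/cluster-autoscaler/enabled"
    value               = "true"
    propagate_at_launch = false
  }
}

resource "aws_autoscaling_group_tag" "ca_owned" {
  autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name

  tag {
    key                 = "k8s.io/cluster-autoscaler/${var.cluster_name}"
    value               = "owned"
    propagate_at_launch = false
  }
}
```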

Common Mistakes

  • Missing subnet tags — without kubernetes.io/role/elb tags, load balancers won’t provision
  • Public endpoint only — enable both private and public access, then restrict public by CIDR
  • No IRSA — using instance roles or access keys in pods is a security risk
  • Single AZ node groups — if that AZ has issues, your entire cluster goes down
  • Skipping resource requests/limits — leads to noisy neighbors and OOM kills
  • Not enabling cluster logging — you’ll need audit logs when something goes wrong
  • Using latest tag for images — pin specific image versions for reproducibility

Frequently Asked Questions

How much does EKS cost compared to self-managed Kubernetes?

EKS charges $0.10/hour ($73/month) for the control plane. Self-managed K8s has no control plane fee but requires managing your own master nodes, etcd, and upgrades — which typically costs far more in engineering time.

What is the total pricing breakdown for EKS?

Control plane: $0.10/hour ($73/month). Worker nodes: standard EC2 pricing. A 3-node cluster with t3.medium costs roughly $73 + $90 = $163/month. Add NAT Gateway costs ($32+/month) for private subnets.
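The arithmetic above can be sketched as a quick shell calculation (the per-node and NAT prices are rough on-demand assumptions, not quotes):

```shell
# Rough monthly cost estimate for a small EKS cluster (assumed prices, USD)
control_plane=73   # $0.10/hour * ~730 hours
node_price=30      # ~one t3.medium, on-demand
nodes=3
nat=32             # single NAT Gateway, excluding data processing

total=$((control_plane + node_price * nodes + nat))
echo "estimated total: \$${total}/month"  # → estimated total: $195/month
```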

Should I use Fargate or managed node groups?

Managed node groups for most workloads — they're cheaper, support DaemonSets and GPUs, and give you more control. Use Fargate for bursty workloads or when you want zero node management. Fargate doesn't support DaemonSets or privileged containers.

How do I handle EKS version upgrades?

Update the cluster version in Terraform first, then update node groups. EKS upgrades one minor version at a time, and managed node groups perform rolling node replacements in place. Always test in staging, review the Kubernetes changelog for deprecations, and ensure your add-ons are compatible.

What is the minimum cluster size for production?

Minimum 3 nodes across 3 availability zones. Use t3.medium or larger. This gives fault tolerance if an entire AZ goes down and enough capacity for system pods plus your workloads.


Akshay Ghalme

AWS DevOps Engineer with 3+ years building production cloud infrastructure. AWS Certified Solutions Architect. Currently managing a multi-tenant SaaS platform serving 1000+ customers.

More Guides & Terraform Modules

Every guide comes with a matching open-source Terraform module you can deploy right away.