How to Deploy Containers on AWS with ECS Fargate and Terraform
To deploy containers on AWS with ECS Fargate, you need an ECS cluster, a task definition specifying CPU/memory/image, an ALB with target type "ip", an ECS service with circuit breaker rollback, and auto-scaling policies. Fargate runs your containers without managing any EC2 instances — you define resources, AWS handles the rest.
You have dockerized your application. It runs perfectly on your local machine. Now you need somewhere to actually run it in the cloud — and you do not want to spend your weekends patching EC2 instances or figuring out Kubernetes cluster upgrades.
That is exactly what ECS Fargate solves. You tell AWS how much CPU and memory your container needs, give it your Docker image, and Fargate handles everything else. No servers to manage. No cluster capacity to plan. I have been using Fargate for my personal projects and it is one of the simplest ways to run containers in production on AWS.
What You Will Build
- ECS cluster — logical grouping for your services
- Task definition — your container configuration (image, CPU, memory, ports, logs)
- ECS service — keeps your desired number of tasks running with rolling deployments
- Application Load Balancer — distributes traffic across your containers
- Auto-scaling — automatically adds or removes containers based on CPU/memory usage
- Secrets management — database credentials and API keys from AWS Secrets Manager
Why Fargate
With standard ECS on EC2, you manage the underlying instances yourself. You pick the instance types, you handle patching, you deal with capacity — making sure there is enough room on your instances for new containers.
Fargate removes all of that. You define the CPU and memory for each task, and AWS provisions the compute on demand. You pay only for the resources your containers actually use, billed per second.
The trade-off is that Fargate costs more per unit of compute compared to EC2. But for most teams, the operational simplicity more than makes up for it.
Prerequisites
- A Docker image pushed to Amazon ECR (or any container registry)
- A VPC with private subnets (see my VPC guide)
- Terraform 1.5 or later
- AWS CLI configured
Step 1: Create the ECS Cluster
An ECS cluster is just a logical grouping. With Fargate, there is no infrastructure to provision — the cluster is essentially a namespace.
resource "aws_ecs_cluster" "main" {
  name = "${var.name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "${var.name}-cluster"
  }
}
Container Insights gives you CPU, memory, and network metrics at the cluster and service level. It costs extra but is worth it for visibility.
Step 2: Create the Task Definition
The task definition tells ECS exactly how to run your container. This is where most of the configuration lives.
resource "aws_ecs_task_definition" "app" {
  family                   = "${var.name}-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "${var.ecr_repo_url}:${var.image_tag}"
      essential = true

      portMappings = [
        {
          containerPort = var.container_port
          protocol      = "tcp"
        }
      ]

      environment = [
        { name = "NODE_ENV", value = "production" },
        { name = "PORT", value = tostring(var.container_port) }
      ]

      secrets = [
        {
          name      = "DATABASE_URL"
          valueFrom = aws_secretsmanager_secret.db_url.arn
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/${var.name}"
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "app"
        }
      }
    }
  ])
}
Important: Fargate only supports specific CPU and memory combinations. 512 CPU (.5 vCPU) supports 1024 MB to 4096 MB of memory. Using an invalid combination will fail with a cryptic error message. Check the AWS docs for the full list.
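One thing the task definition references but never creates is the CloudWatch log group. With the awslogs driver, the group must already exist when the first task starts (unless you set the awslogs-create-group option, which also needs logs:CreateLogGroup on the execution role). A minimal definition:

```hcl
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/${var.name}"
  retention_in_days = 30 # the default is never-expire; set an explicit retention
}
```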
Execution Role vs Task Role
This confuses almost everyone at first:
- Execution role — used by the ECS agent, not your code. It needs permission to pull images from ECR, fetch secrets from Secrets Manager, and push logs to CloudWatch.
- Task role — used by YOUR application code running inside the container. If your app reads from S3 or writes to DynamoDB, those permissions go here.
# Execution role — for the ECS agent
resource "aws_iam_role" "ecs_execution" {
  name = "${var.name}-ecs-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task role — for your application code
resource "aws_iam_role" "ecs_task" {
  name = "${var.name}-ecs-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}
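The task role starts with no permissions attached. If your application itself calls AWS, for example reading objects from S3, a policy like this goes on the task role (the bucket name is a placeholder, not part of the setup above):

```hcl
# Example task-role policy: only needed if your app code calls AWS directly.
resource "aws_iam_role_policy" "task_s3_read" {
  name = "s3-read"
  role = aws_iam_role.ecs_task.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = ["arn:aws:s3:::my-app-assets/*"] # placeholder bucket
    }]
  })
}
```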
Step 3: Set Up Secrets with Secrets Manager
Never hardcode database passwords or API keys in your task definition. Store them in Secrets Manager and reference them:
resource "aws_secretsmanager_secret" "db_url" {
  name = "${var.name}/database-url"
}

resource "aws_secretsmanager_secret_version" "db_url" {
  secret_id     = aws_secretsmanager_secret.db_url.id
  secret_string = var.database_url
}
The task definition references the secret ARN, and ECS injects the value as an environment variable at runtime. Your container sees it as a normal env var.
You also need to give the execution role permission to read the secret:
resource "aws_iam_role_policy" "secrets_access" {
  name = "secrets-access"
  role = aws_iam_role.ecs_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.db_url.arn]
    }]
  })
}
Step 4: Create the ALB and Target Group
resource "aws_lb" "app" {
  name               = "${var.name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids

  tags = {
    Name = "${var.name}-alb"
  }
}

resource "aws_lb_target_group" "app" {
  name        = "${var.name}-tg"
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Must be "ip" for Fargate, not "instance"

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    matcher             = "200"
  }

  deregistration_delay = 30
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
The most common mistake here: using target_type = "instance" instead of "ip". Fargate tasks do not run on traditional instances — they get their own ENI with their own IP address. The target group must be set to "ip" or your tasks will never receive traffic.
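One gap worth flagging: the ALB and the service below reference aws_security_group.alb and aws_security_group.app, which this guide never defines. A minimal sketch, assuming HTTPS-only ingress on the ALB and open egress (tighten egress if your environment requires it):

```hcl
# ALB security group: HTTPS in from the internet, all traffic out.
resource "aws_security_group" "alb" {
  name   = "${var.name}-alb"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# App security group: container port in from the ALB only,
# egress open so tasks can reach ECR and Secrets Manager via NAT.
resource "aws_security_group" "app" {
  name   = "${var.name}-app"
  vpc_id = var.vpc_id

  ingress {
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Restricting the app security group's ingress to the ALB security group (rather than a CIDR) means only the load balancer can reach your containers.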
Step 5: Create the ECS Service
resource "aws_ecs_service" "app" {
  name            = "${var.name}-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = var.container_port
  }

  depends_on = [aws_lb_listener.app]
}
With deployment_minimum_healthy_percent = 100 and deployment_maximum_percent = 200, ECS starts new tasks before stopping old ones during a rolling deployment, so you never drop below your desired count.
The circuit breaker with rollback is critical. If your new container fails health checks, ECS automatically rolls back to the previous working version instead of leaving you with a broken deployment.
Step 6: Set Up Auto-Scaling
resource "aws_appautoscaling_target" "app" {
  max_capacity       = var.max_count
  min_capacity       = var.min_count
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.name}-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  service_namespace  = aws_appautoscaling_target.app.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

resource "aws_appautoscaling_policy" "memory" {
  name               = "${var.name}-memory-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  service_namespace  = aws_appautoscaling_target.app.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
Scale-out cooldown is 60 seconds — react quickly to traffic spikes. Scale-in cooldown is 300 seconds — do not remove containers too fast during brief dips.
Deploying a New Version
The workflow is straightforward:
- Build your new Docker image
- Push it to ECR with a new tag
- Update the task definition with the new image tag
- ECS automatically performs a rolling deployment
# Build and push
docker build -t my-app:v2 .
docker tag my-app:v2 123456789.dkr.ecr.ap-south-1.amazonaws.com/my-app:v2
docker push 123456789.dkr.ecr.ap-south-1.amazonaws.com/my-app:v2
# Update service (via Terraform or CLI)
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --force-new-deployment
Common Mistakes to Avoid
- Wrong target group type. Fargate requires target_type = "ip". Using "instance" means your containers never receive traffic and health checks fail.
- Missing execution role permissions. If the execution role cannot pull from ECR or read secrets, your task fails to start. The error messages are not always clear about this.
- Too little CPU or memory. If your container needs more resources than what you allocated, it gets killed by the OOM killer with no useful error. Start generous and scale down after monitoring.
- Not enabling circuit breaker. Without it, a bad deployment keeps trying to start failing containers indefinitely. With rollback enabled, ECS automatically reverts to the last working version.
- Assigning public IP in private subnets. Fargate tasks in private subnets should not have public IPs. They access the internet through the NAT Gateway in your VPC.
Frequently Asked Questions
What is the difference between ECS EC2 and ECS Fargate?
With ECS EC2, you manage the underlying EC2 instances — patching, scaling, capacity planning. With Fargate, AWS manages all of that. You define CPU and memory, and Fargate handles the rest. Fargate costs more per compute unit but saves significant operational work.
What CPU and memory combinations are valid for Fargate?
Fargate has specific valid combinations. For example:
- 256 CPU (.25 vCPU): 512-2048 MB
- 512 CPU (.5 vCPU): 1024-4096 MB
- 1024 CPU (1 vCPU): 2048-8192 MB
- 2048 CPU (2 vCPU): 4096-16384 MB
- 4096 CPU (4 vCPU): 8192-30720 MB
Invalid combinations cause deployment failures.
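These ranges can be turned into a plan-time guard so an invalid pair fails with a readable message instead of a deploy error. A sketch using a Terraform check block (available since 1.5); the cpu and memory variable names are assumptions:

```hcl
locals {
  # Valid Fargate memory ranges (MB) per CPU size. This checks the range
  # only, not the fixed increments within each range.
  fargate_memory = {
    "256"  = { min = 512, max = 2048 }
    "512"  = { min = 1024, max = 4096 }
    "1024" = { min = 2048, max = 8192 }
    "2048" = { min = 4096, max = 16384 }
    "4096" = { min = 8192, max = 30720 }
  }
}

check "fargate_cpu_memory" {
  assert {
    # try() returns false if var.cpu is not a valid Fargate CPU size at all.
    condition = try(
      var.memory >= local.fargate_memory[tostring(var.cpu)].min &&
      var.memory <= local.fargate_memory[tostring(var.cpu)].max,
      false
    )
    error_message = "Invalid Fargate CPU/memory combination for cpu=${var.cpu}."
  }
}
```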
What is the difference between execution role and task role?
The execution role is used by ECS itself to pull images, fetch secrets, and push logs. The task role is used by your application code to access AWS services like S3 or DynamoDB. They serve different purposes and should have different permissions.
How do I deploy a new version of my container?
Push your new image to ECR, update the task definition with the new tag, and update the ECS service. ECS performs a rolling deployment — starting new tasks before stopping old ones. With circuit breaker enabled, it rolls back automatically if health checks fail.
Why does my Fargate task keep failing to start?
Most common reasons: execution role cannot pull from ECR, container exits immediately (check CloudWatch logs), security group blocks outbound access to ECR, or invalid CPU/memory combination.
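A good first stop is the stoppedReason that ECS records on each stopped task. The cluster name and task ARN below are placeholders:

```shell
# Find recently stopped tasks, then read the reason ECS recorded for them.
aws ecs list-tasks --cluster my-cluster --desired-status STOPPED
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-arn> \
  --query 'tasks[].{reason: stoppedReason, containers: containers[].reason}'
```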
Skip the Manual Setup — Use the Terraform Module
Everything covered here — ECS cluster, task definition, ALB, auto-scaling, secrets, circuit breaker — is packaged into one Terraform module.
module "ecs_service" {
  source = "github.com/akshayghalme/terraform-ecs-fargate-service"

  name           = "my-app"
  image          = "123456789.dkr.ecr.ap-south-1.amazonaws.com/my-app:latest"
  container_port = 3000
  cpu            = 512
  memory         = 1024
  desired_count  = 2

  vpc_id          = module.vpc.vpc_id
  private_subnets = module.vpc.private_subnet_ids
  public_subnets  = module.vpc.public_subnet_ids
}
One module call. Run terraform apply. Your containerized app is live with auto-scaling, rolling deployments, and circuit breaker protection.