
AWS DevOps Cheatsheet

The single-page reference you bookmark and keep open in a tab. Copy-paste-ready IAM policies, Terraform snippets, kubectl commands, CLI one-liners, cost tips, and the errors you'll actually hit in production.

Written by an AWS DevOps engineer for AWS DevOps engineers. Free, printable, no signup. Updated as things break in real infra.


🔐 IAM Policies

The five templates you'll copy 90% of the time. Replace ARNs, account IDs, and conditions to match your environment.

Least-privilege S3 read on one prefix

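{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket",
      "Condition": {"StringLike": {"s3:prefix": ["reports/*"]}}
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/reports/*"
    }
  ]
}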

Cross-account AssumeRole with ExternalId (confused-deputy safe)

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": {"sts:ExternalId": "unique-per-vendor-value"}
    }
  }]
}

Enforce MFA on privileged actions

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["iam:*", "kms:Decrypt"],
    "Resource": "*",
    "Condition": {
      "Bool": {"aws:MultiFactorAuthPresent": "true"}
    }
  }]
}
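
Requests signed with long-term access keys carry no MFA context, so from the CLI a policy conditioned on aws:MultiFactorAuthPresent only matches after those keys are exchanged for temporary credentials. A minimal sketch, assuming an IAM user with an MFA device; the function name mfa_session and the device ARN in the usage comment are placeholders, not part of the policy above:

```shell
# mfa_session MFA_DEVICE_ARN TOKEN_CODE
# Prints export lines for temporary credentials that carry MFA context.
mfa_session() {
  creds=$(aws sts get-session-token \
    --serial-number "$1" \
    --token-code "$2" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  # --output text separates the three values with tabs
  set -- $creds
  printf 'export AWS_ACCESS_KEY_ID=%s\n' "$1"
  printf 'export AWS_SECRET_ACCESS_KEY=%s\n' "$2"
  printf 'export AWS_SESSION_TOKEN=%s\n' "$3"
}

# Usage (placeholder ARN and code):
#   eval "$(mfa_session arn:aws:iam::123456789012:mfa/alice 123456)"
```

The function only prints the export lines; wrapping the call in eval, as in the usage comment, is what loads them into the current shell.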

Deny dangerous actions via SCP (Org-wide guardrail)

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": ["iam:DeleteRole", "iam:DeleteUser", "cloudtrail:StopLogging"],
    "Resource": "*"
  }]
}

Permissions boundary for developer roles

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "NotAction": ["iam:*", "organizations:*", "account:*"],
    "Resource": "*"
  }]
}

🛡 Security Groups

Stateful, allow-only. Stack rules tightly. Never open 22/3389/database ports to 0.0.0.0/0.

Rules you should actually use

Purpose | Security group | Port | Source
HTTPS from internet | ALB SG | 443 | 0.0.0.0/0
HTTP redirect | ALB SG | 80 | 0.0.0.0/0 → redirect to 443
App from ALB | App SG | 8080 | ALB SG id
DB from app | DB SG | 5432 | App SG id
SSH admin | EC2 SG | 22 | SSM Session Manager (no SG rule needed)

Rules to NEVER use

  • 22 / 0.0.0.0/0 — SSH to world = brute-force bot heaven
  • 3389 / 0.0.0.0/0 — RDP to world = ransomware vector
  • 5432 / 0.0.0.0/0 or any DB port — credential theft
  • -1 / 0.0.0.0/0 egress — allows data exfiltration from compromised host
  • Any other rule open to 0.0.0.0/0 (beyond the public ALB listeners on 443/80) unless it is a time-bound break-glass rule
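
To check an account against the list above, one option is to query for security groups that have an ingress rule covering a given port from 0.0.0.0/0. A rough sketch: the function name is made up for this example, it only inspects IPv4 ranges, and rules with protocol -1 (all traffic) carry no port fields, so they need a separate check.

```shell
# open_to_world PORT
# Lists security groups with an ingress rule that covers PORT from 0.0.0.0/0.
open_to_world() {
  port=$1
  # JMESPath: keep groups where some rule spans the port AND has a world-open IPv4 range
  query="SecurityGroups[?IpPermissions[?FromPort<=\`${port}\` && ToPort>=\`${port}\` && IpRanges[?CidrIp=='0.0.0.0/0']]].[GroupId,GroupName]"
  aws ec2 describe-security-groups --query "$query" --output text
}

# Usage:
#   for p in 22 3389 5432; do echo "port $p:"; open_to_world "$p"; done
```

An empty result for 22, 3389, and your database ports is the goal; a public ALB security group will legitimately show up for 443 and 80.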

Terraform Snippets

The blocks you paste into every new project. Pre-configured for security + encryption.

Remote state with S3 + DynamoDB lock + encryption

terraform {
  required_providers { aws = { source = "hashicorp/aws", version = "~> 5.0" } }
  backend "s3" {
    bucket         = "mycompany-tfstate"
    key            = "prod/main.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "tfstate-locks"
    kms_key_id     = "alias/tfstate"
  }
}

Production VPC (multi-AZ, private/public)

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"
  name    = "prod"
  cidr    = "10.0.0.0/16"
  azs             = ["us-east-1a","us-east-1b","us-east-1c"]
  private_subnets = ["10.0.1.0/24","10.0.2.0/24","10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24","10.0.102.0/24","10.0.103.0/24"]
  enable_nat_gateway   = true
  single_nat_gateway   = false  # prod = 1 per AZ; dev = true to save $
  enable_vpn_gateway   = false
  enable_dns_hostnames = true
}

RDS Postgres — encrypted, Multi-AZ, private, no final skip

resource "aws_db_instance" "prod" {
  identifier               = "prod-postgres"
  engine                   = "postgres"
  engine_version           = "16.3"
  instance_class           = "db.t3.medium"
  allocated_storage        = 100
  storage_type             = "gp3"
  storage_encrypted        = true
  kms_key_id               = aws_kms_key.rds.arn
  db_subnet_group_name     = aws_db_subnet_group.private.name
  vpc_security_group_ids   = [aws_security_group.db.id]
  publicly_accessible      = false
  multi_az                 = true
  backup_retention_period  = 7
  deletion_protection      = true
  skip_final_snapshot      = false
  final_snapshot_identifier = "prod-postgres-final"
  performance_insights_enabled = true
}

S3 bucket — private, encrypted, versioned, block-public

resource "aws_s3_bucket" "data" { bucket = "mycompany-data" }
resource "aws_s3_bucket_public_access_block" "data" {
  bucket                  = aws_s3_bucket.data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id
  versioning_configuration { status = "Enabled" }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
  }
}

ALB + Target Group + HTTP→HTTPS redirect

resource "aws_lb" "app" {
  name = "app"
  load_balancer_type = "application"
  subnets = module.vpc.public_subnets
  security_groups = [aws_security_group.alb.id]
}
resource "aws_lb_target_group" "app" {
  name     = "app"
  port     = 8080 # app port, matching the App SG rule
  protocol = "HTTP"
  vpc_id   = module.vpc.vpc_id
}
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.cert.arn
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"
  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

Lambda + CloudWatch role (bare minimum)

data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}
resource "aws_iam_role" "lambda" {
  name = "app-lambda"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}
resource "aws_iam_role_policy_attachment" "basic_exec" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

🌐 VPC & Networking

Pick CIDRs that won't collide with your on-prem or partners.

CIDR sizing quick reference

Prefix | IPs | Usable | Typical use
/16 | 65,536 | 65,531 | VPC
/20 | 4,096 | 4,091 | Large subnet
/22 | 1,024 | 1,019 | Medium subnet
/24 | 256 | 251 | Standard subnet
/26 | 64 | 59 | Small subnet
/28 | 16 | 11 | Minimum AWS subnet

AWS reserves 5 IPs per subnet: network, VPC router, DNS, future use, broadcast.
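
The "Usable" column follows directly from that rule: 2^(32 - prefix) addresses minus the 5 reserved ones. A small sketch for prefixes not listed in the table (the function name is just illustrative):

```shell
# usable_ips PREFIX_LENGTH -> usable addresses in an AWS subnet of that size
usable_ips() {
  total=$(( 1 << (32 - $1) ))   # 2^(32 - prefix) addresses in the block
  echo $(( total - 5 ))         # minus the 5 addresses AWS reserves per subnet
}

usable_ips 24   # prints 251
usable_ips 28   # prints 11
```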

RFC 1918 private ranges (don't collide)

  • 10.0.0.0/8 — 16.7M IPs. Most common for AWS VPCs.
  • 172.16.0.0/12 — 1M IPs. Default Docker bridge is here (172.17.0.0/16).
  • 192.168.0.0/16 — 65K IPs. Home routers use this. Avoid for corp.
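
Before settling on a VPC CIDR, it helps to test it against every range you already route to (on-prem, partners, Docker bridges). A minimal sketch of an overlap check in plain shell arithmetic, assuming IPv4 and well-formed input; the helper names are made up for this example:

```shell
# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range A.B.C.D/N -> "first last" addresses of the block, as integers
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base / size * size ))   # align to the network boundary
  echo "$first $(( first + size - 1 ))"
}

# cidrs_overlap CIDR_A CIDR_B -> exit status 0 if the two blocks share any address
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Usage:
#   cidrs_overlap 10.0.0.0/16 172.17.0.0/16 && echo "collision" || echo "ok"
```

Two ranges overlap exactly when each one starts no later than the other one ends, which is the comparison in the last function.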

Security Groups vs NACLs — 30-sec version

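Property | Security Group | NACL
Stateful? | Yes (return traffic allowed) | No (rules needed in both directions)
Scope | ENI / instance | Subnet
Rules | Allow only (implicit deny) | Allow + explicit Deny
Order | All rules evaluated | Lowest-numbered matching rule wins
Default | Deny all in / allow all out | Allow all (default NACL); a custom NACL denies all until you add rules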

🖥 EC2 Instance Families

The one-line rule for picking the right instance.

Family | Purpose | Use for
t3 / t4g | Burstable | Dev, low-traffic web, CI runners
m5 / m6i / m7g | General | Most production web apps, small DBs
c5 / c6i / c7g | Compute-optimized | Video encoding, batch, game servers
r5 / r6i / r7g | Memory-optimized | Redis, in-memory DBs, analytics
x2iedn | High memory | SAP HANA, large in-memory
i4i / im4gn | Storage-optimized | NoSQL, search, data warehouse nodes
g5 / p5 | GPU | ML inference / training
inf2 / trn1 | AWS Inferentia / Trainium | ML at optimized cost

Graviton (ARM) = t4g / m7g / c7g / r7g: up to 40% better price/performance than x86 for most modern Linux workloads. Skip it if you ship x86-only binaries (Windows workloads, some proprietary software).

🪣 S3 Storage Classes

Match storage class to access pattern — can cut object-storage costs by 70%+.

Class | Price /GB/mo | Min duration | Retrieval time | Use for
Standard | $0.023 | none | ms | Active data, <30d access pattern
Intelligent-Tiering | $0.023 → $0.004 | none | ms | Unknown/unpredictable access
Standard-IA | $0.0125 | 30 days | ms | Monthly-access backups
One Zone-IA | $0.01 | 30 days | ms | Re-creatable / secondary copies
Glacier Instant | $0.004 | 90 days | ms | Quarterly-access archives
Glacier Flexible | $0.0036 | 90 days | 1 min → 12 h | Rarely-accessed archives
Glacier Deep Archive | $0.00099 | 180 days | 12 → 48 h | Compliance retention 7+ yrs
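
To see where the "70%+" comes from, compare per-GB rates from the table. A quick sketch using awk for the decimal math; it looks at the storage rate only and ignores retrieval, request, and minimum-duration charges, and the function name is illustrative:

```shell
# savings_pct FROM_PRICE TO_PRICE -> whole-percent saving on the per-GB storage rate
savings_pct() {
  awk -v from="$1" -v to="$2" 'BEGIN { printf "%.0f\n", (from - to) / from * 100 }'
}

savings_pct 0.023 0.0125    # Standard -> Standard-IA:          prints 46
savings_pct 0.023 0.004     # Standard -> Glacier Instant:      prints 83
savings_pct 0.023 0.00099   # Standard -> Glacier Deep Archive: prints 96
```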

Lifecycle rule: hot → cold → delete

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id
  rule {
    id     = "logs-tiering"
    status = "Enabled"
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }
    transition {
      days          = 90
      storage_class = "GLACIER"
    }
    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"
    }
    expiration { days = 2555 } # 7 years
  }
}

kubectl — the 20 commands you use daily

Aliases, context switching, troubleshooting.

Setup

# Connect to EKS cluster
aws eks update-kubeconfig --name prod-cluster --region us-east-1

# Check current context + namespace
kubectl config current-context
kubectl config view --minify --output 'jsonpath={..namespace}'

# Switch namespace (install kubens first: brew install kubectx)
kubens production

Inspect

kubectl get pods -A # all namespaces
kubectl get pods -o wide -l app=api # filter by label
kubectl describe pod <pod> # events + status
kubectl get events --sort-by=.lastTimestamp | tail -20
kubectl top pods # needs metrics-server
kubectl top nodes

Logs & debug

kubectl logs -f <pod> # tail
kubectl logs <pod> -c sidecar # specific container
kubectl logs <pod> --previous # crashed pod
kubectl logs -l app=api --tail=50 # all pods of a label
kubectl exec -it <pod> -- sh
kubectl debug -it <pod> --image=busybox --target=<container>

Apply & rollout

kubectl apply -f deployment.yaml
kubectl rollout status deployment/api
kubectl rollout undo deployment/api # rollback
kubectl rollout restart deployment/api
kubectl scale deployment api --replicas=5
kubectl set image deployment/api api=myapp:v1.2.4

Port-forward & copy

kubectl port-forward svc/api 8080:80
kubectl cp <pod>:/var/log/app.log ./app.log
kubectl cp ./fix.sh <pod>:/tmp/fix.sh

RBAC quick check

kubectl auth can-i create deployments -n prod
kubectl auth can-i '*' '*' --as=system:serviceaccount:prod:app-sa
kubectl get rolebindings,clusterrolebindings -A -o wide | grep <user>

AWS CLI One-Liners

Paste, replace placeholders, done.

S3

# Sync local → S3 with delete + SSE-KMS
aws s3 sync ./site s3://mybucket --delete --sse aws:kms

# Presigned GET URL valid 1 hour
aws s3 presign s3://mybucket/key --expires-in 3600

# Size of a bucket
aws s3 ls s3://mybucket --recursive --summarize | tail -2

# Delete all versions in versioned bucket (max 1,000 keys per call; delete markers need a second pass over DeleteMarkers[])
aws s3api delete-objects --bucket mybucket --delete "$(aws s3api list-object-versions --bucket mybucket --output=json --query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"

EC2

# Running instances w/ names
aws ec2 describe-instances --query 'Reservations[].Instances[?State.Name==`running`].[InstanceId,Tags[?Key==`Name`].Value|[0],InstanceType]' --output table

# Stop all non-prod instances (filter by tag)
aws ec2 stop-instances --instance-ids $(aws ec2 describe-instances --filters Name=tag:Env,Values=dev Name=instance-state-name,Values=running --query 'Reservations[].Instances[].InstanceId' --output text)

# SSM start-session (SSH replacement)
aws ssm start-session --target i-0abc123

IAM

# Who am I?
aws sts get-caller-identity

# List users with last-used info
aws iam list-users --query 'Users[].[UserName,PasswordLastUsed]' --output table

# Rotate access key
aws iam create-access-key --user-name alice
aws iam update-access-key --user-name alice --access-key-id AKIA... --status Inactive
aws iam delete-access-key --user-name alice --access-key-id AKIA...

CloudFormation / Terraform helpers

# Active profile, credential source + region (for the account ID use: aws sts get-caller-identity)
aws configure list
aws configure get region

# Assume role quickly
eval "$(aws sts assume-role --role-arn arn:aws:iam::123:role/Admin --role-session-name me | jq -r '.Credentials|"export AWS_ACCESS_KEY_ID=\(.AccessKeyId) AWS_SECRET_ACCESS_KEY=\(.SecretAccessKey) AWS_SESSION_TOKEN=\(.SessionToken)"')"

📊 CloudWatch Logs Insights

Query your logs like SQL. Save these in the console for one-click access.

Top 10 slowest Lambda invocations

fields @timestamp, @message, @duration
| filter @type = "REPORT"
| sort @duration desc
| limit 10

Lambda errors grouped by function

fields @timestamp, @message
| filter @message like /ERROR|Exception|Traceback/
| stats count() as errors by @log
| sort errors desc

VPC Flow Logs — rejected connections by source IP

fields @timestamp, srcAddr, dstAddr, dstPort, action
| filter action = "REJECT"
| stats count() as attempts by srcAddr
| sort attempts desc
| limit 25

💰 Cost Optimization — Top 10 Wins

In rough order of highest impact for least effort.

  1. Commit to Savings Plans: up to 72% off EC2 with EC2 Instance Savings Plans, up to 66% with Compute Savings Plans (which also cover Fargate and Lambda). Start with a low 1-year Compute SP.
  2. Right-size EC2 and RDS: AWS Compute Optimizer runs ML on your CloudWatch data and recommends the right instance size.
  3. Turn off dev at night: Lambda + EventBridge to stop dev EC2 / RDS from 7pm to 7am, plus weekends, saves roughly 65% of instance hours.
  4. S3 lifecycle rules: move logs and old data to IA / Glacier automatically.
  5. NAT Gateway → VPC Endpoints: S3 + DynamoDB gateway endpoints are FREE and bypass NAT data-processing charges.
  6. EBS gp2 → gp3: 20% cheaper, more throughput. Migrate with zero downtime via modify-volume.
  7. Delete unattached EIPs: $3.60/mo per unused EIP. Trusted Advisor flags them.
  8. Spot for stateless batch: up to 90% off for fault-tolerant workloads.
  9. CloudFront in front of S3: cheaper egress + global performance + free SSL.
  10. Graviton (ARM): up to 40% better price-performance on t4g / m7g / c7g / r7g.
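
The night-time shutdown figure in item 3 is easy to sanity-check as a share of the 168 hours in a week. A back-of-the-envelope sketch (the function name is illustrative; it counts instance-hours only, since EBS and RDS storage keep billing while an instance is stopped):

```shell
# schedule_savings HOURS_ON_PER_DAY DAYS_ON_PER_WEEK -> whole percent of instance-hours saved
schedule_savings() {
  on=$(( $1 * $2 ))
  echo $(( (168 - on) * 100 / 168 ))   # integer division rounds down
}

schedule_savings 12 7   # off 7pm-7am every night
schedule_savings 12 5   # same schedule, plus off all weekend
```

The first call prints 50 and the second prints 64, so a figure near 65% only holds when weekends are switched off as well.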

🔒 Security Checklist — 12 rules

Go through this on every new AWS account before handing it to developers.

  1. Root user: hardware MFA, no access keys, locked in a safe.
  2. Enable MFA for every IAM user; require it for privileged actions.
  3. S3 Block Public Access at the account level — master switch.
  4. EBS encryption by default at the region level.
  5. CloudTrail multi-region, log-file validation, delivered to a separate logs account.
  6. GuardDuty enabled in every region; findings → Security Hub.
  7. IAM Access Analyzer to find unintended public/cross-account access.
  8. Use roles, not users — federate via IAM Identity Center; workloads via instance profiles / OIDC.
  9. Permissions boundaries for developer-created roles; SCPs at the OU level.
  10. AWS Config enabled, with the Security Hub "aws-foundational-security-best-practices" standard turned on.
  11. No SSH to the world — use SSM Session Manager instead.
  12. Secrets in Secrets Manager with rotation; never in env vars, Lambda config, or code.

🔧 Common Errors → Fixes

The errors you'll actually hit. Straight to the fix.

Terraform "Error acquiring the state lock"

# Check who holds the lock (LockID is <bucket>/<key>; the "-md5" item only stores the state digest)
aws dynamodb get-item --table-name tfstate-locks --key '{"LockID":{"S":"mycompany-tfstate/prod/main.tfstate"}}'

# If you're SURE nothing is running (check CI + teammates first)
terraform force-unlock <lock-id>

"AccessDenied" on S3 despite IAM allow

  • Check the bucket policy — resource policy can explicitly Deny.
  • Check SCPs — parent OU may block the action.
  • Check Permissions boundary on the role — may cap permissions.
  • Check KMS key policy if the object is SSE-KMS.
  • Test with IAM Policy Simulator before debugging further.
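
Several of the checks above can be run from the CLI in one go. A hedged sketch, to be run with credentials that may read IAM and the bucket; the function name and its three arguments are illustrative, and SCPs and permissions boundaries still need to be reviewed separately:

```shell
# s3_access_checks ROLE_ARN BUCKET KEY
s3_access_checks() {
  role_arn=$1 bucket=$2 key=$3

  echo "== identity policy decision =="
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names s3:GetObject \
    --resource-arns "arn:aws:s3:::${bucket}/${key}" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' --output text

  echo "== bucket policy (look for an explicit Deny) =="
  aws s3api get-bucket-policy --bucket "$bucket" --query Policy --output text

  echo "== object encryption (SSE-KMS means the KMS key policy matters too) =="
  aws s3api head-object --bucket "$bucket" --key "$key" \
    --query '[ServerSideEncryption,SSEKMSKeyId]' --output text
}

# Usage (placeholder values):
#   s3_access_checks arn:aws:iam::123456789012:role/app my-bucket reports/2024.csv
```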

Pod stuck in "Pending" or "ImagePullBackOff"

kubectl describe pod <pod>   # read Events at the bottom
# Common fixes:
# - Pending → nodes full; kubectl top nodes + kubectl describe node
# - ImagePullBackOff → ECR auth; kubectl create secret docker-registry regcred ...
# - CreateContainerConfigError → secret/configmap referenced doesn't exist

"Too many open connections" on RDS

  • Put RDS Proxy in front — pools connections across Lambda/app instances.
  • Lower idle timeouts in application DB client config.
  • If Lambda: init the client outside the handler so it's reused between invocations.

High NAT Gateway bill

  • Add S3 and DynamoDB Gateway VPC Endpoints: free, and they remove the bulk of NAT data charges.
  • Use Interface Endpoints for Secrets Manager, STS, ECR, CloudWatch Logs: about $7/mo per endpoint per AZ, but they cut NAT data-processing charges.
  • In dev: use single-AZ NAT (1 gateway instead of 3) by setting single_nat_gateway = true in the TF VPC module.
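
Whether an interface endpoint pays for itself depends on how much traffic it takes off the NAT gateway. A rough break-even sketch; the prices passed in below are assumptions (approximate us-east-1 list prices, which change over time), and the function name is illustrative:

```shell
# endpoint_break_even_gb ENDPOINT_FIXED_PER_MONTH NAT_PER_GB ENDPOINT_PER_GB
# -> GB per month above which the endpoint is cheaper than sending that traffic through NAT
endpoint_break_even_gb() {
  awk -v fixed="$1" -v nat="$2" -v ep="$3" 'BEGIN { printf "%.0f\n", fixed / (nat - ep) }'
}

# Assumed prices: $7.30/mo per endpoint per AZ, $0.045/GB NAT processing, $0.01/GB endpoint processing
endpoint_break_even_gb 7.30 0.045 0.01   # prints 209
```

Under those assumed prices, a service that moves less than about 209 GB a month per AZ through NAT is cheaper left on the NAT gateway.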

EKS IAM auth failing ("You must be logged in to the server")

# Re-fetch kubeconfig
aws eks update-kubeconfig --name mycluster --region us-east-1

# Check aws-auth ConfigMap — does the role/user exist?
kubectl -n kube-system get cm aws-auth -o yaml

# Add a new IAM role to mapRoles (requires cluster admin access)
kubectl -n kube-system edit cm aws-auth
