Scaling from 1K to 1M Users on AWS — What Breaks at Each Stage
Scaling an AWS workload from 1,000 users to 1 million is not one problem. It is a sequence of five or six very specific problems, each with a different fix, and each one happens at a roughly predictable point. This post walks through what actually breaks at 1K, 10K, 100K, 500K, and 1M users, why it breaks, and the exact AWS architecture change that gets you to the next stage. Every stage is also a place where teams over-engineer and waste a year of engineering time solving the wrong problem — I will call out those traps as they come up.
I have watched several startups walk this path. The thing nobody tells you is that the bottlenecks are almost always the same, in almost the same order. It is not random. The database will hit a wall before the web tier does. The cache will save you twice before it stops being enough. The CI pipeline will break at 10,000 users because you suddenly have more than one engineer deploying. Each stage has a signature failure, and recognizing it early is half the battle.
This is the map I wish I had when I started. No hand-waving, no "it depends," just the actual sequence.
Stage 1: 0 to 1,000 Users — The Single Instance
At this stage, one properly-sized EC2 instance with a local or managed database is enough. Do not overthink it. You are not Netflix. You will not need Kubernetes, a service mesh, or multi-region failover for months.
A reasonable starting architecture:
```mermaid
flowchart LR
    R53[Route 53] --> EC2[EC2 t3.medium<br/>your app runs here]
    EC2 --> RDS[(RDS db.t4g.small<br/>Postgres / MySQL)]
    EC2 --> S3[(S3<br/>static assets + uploads)]
    classDef aws fill:#6C3CE1,stroke:#00D4AA,stroke-width:2px,color:#fff;
    class R53,EC2,RDS,S3 aws;
```
That is the entire stack. A single t3.medium can comfortably serve 1,000 concurrent users for a typical CRUD web application. RDS t4g.small handles the database. Uploads go to S3 with a CloudFront distribution in front for static assets.
What Breaks First
Almost nothing, computationally. What breaks first at this stage is deploys. The moment you have paying customers, every deploy that restarts the EC2 instance causes a visible outage. The second thing that breaks is the blast radius of a single server dying — AWS will occasionally retire an instance, and your site goes down until you notice.
The Fix
Put a load balancer (ALB) in front, run two EC2 instances in an Auto Scaling Group across two AZs. That is it. You now have zero-downtime deploys (rolling replacement through the ASG) and you survive an AZ failure. This is the point where you also want a proper production VPC with public and private subnets, because you are about to start caring about security boundaries.
The Over-Engineering Trap
Teams frequently introduce Kubernetes, microservices, or a message queue at this stage because a blog post said to. Do not. A monolith on two EC2 instances will outperform a 6-service K8s cluster in every metric that matters until you are well past 100,000 users. The cost of premature complexity is measured in engineering months lost to infrastructure that did not need to exist yet.
Stage 2: 1,000 to 10,000 Users — The Deploy Pipeline Breaks
At 10,000 users you typically have more than one engineer, and that is where the second set of problems hits. The infrastructure can still handle the load — a pair of t3.large instances behind an ALB will serve 10,000 concurrent users without breaking a sweat — but the process around the infrastructure starts failing.
What Breaks
- Manual deploys — someone SSH-ing into the box to `git pull` does not survive two engineers pushing changes at the same time.
- Secrets scattered in `.env` files — they end up in git, Slack, or nowhere, depending on who set them up.
- No rollback — when a deploy breaks, there is no "undo" button, just a git revert and a nervous re-deploy.
- Logs live on the instance — when the instance is replaced, you lose the logs you need to debug the incident that replaced it.
The Fix
This is the stage where you invest in the boring infrastructure that pays for itself for the next five years:
- CI/CD pipeline — GitHub Actions or GitLab CI building Docker images and deploying via ECS or a simple Auto Scaling Group rolling update. See GitHub Actions to AWS with OIDC for the pattern to use.
- Secrets Manager or Parameter Store — every secret fetched at runtime, nothing committed to git.
- Centralized logging — CloudWatch Logs or a third-party like Datadog. The instance becomes disposable.
- IaC for everything — Terraform in git. No more console clicks for production changes.
Compute-wise, you are still on 2-4 EC2 instances or a small ECS cluster. Database is still a single RDS instance, maybe db.t4g.medium now. The architecture has barely changed. The practices around it have changed completely.
Stage 3: 10,000 to 100,000 Users — The Database Becomes the Bottleneck
This is where it gets interesting. At 100,000 users, your compute layer is still almost embarrassingly cheap — three or four t3.large instances can easily serve that traffic. The database is always what breaks first.
What Breaks
The signals, roughly in the order they appear:
- RDS CPU climbs past 70% during peak. Queries that used to be fast are suddenly slow.
- Connection pool exhaustion. Each app server holds 20-50 connections; with six app servers, that is up to 300 connections against an RDS instance that only allows 200 total. Requests start queuing and then timing out.
- The "N+1 query" problem becomes visible. Code that issued 50 extra queries per request was fine at 1K users. At 100K users, it is burning 90% of your database CPU.
- Table locks during migrations. Adding a column to a large table takes the whole site down for 15 minutes.
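To make the N+1 failure mode concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users/posts schema, the row counts, and the query shapes are all invented for illustration; the point is the round-trip count, not the schema.

```python
import sqlite3

# Toy schema: users and their posts (names are illustrative only).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [(i, f"user{i}") for i in range(50)])
db.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
               [(i, f"post by {i}") for i in range(50)])

# N+1: one query for the users, then one query per user.
users = db.execute("SELECT id FROM users").fetchall()
queries = 1
for (uid,) in users:
    db.execute("SELECT title FROM posts WHERE user_id = ?", (uid,)).fetchall()
    queries += 1

# Batched: the same data with a single IN clause (or a JOIN).
ids = [u[0] for u in users]
placeholders = ",".join("?" * len(ids))
rows = db.execute(
    f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders})",
    ids,
).fetchall()

print(queries)    # 51 round trips the N+1 way
print(len(rows))  # same rows, fetched in 2 queries total
```

The batched version issues two queries no matter how many users are on the page; the N+1 version issues one per user, which is exactly the load that shows up as database CPU at this stage.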
The Fix — Read Replicas, Caching, and Connection Pooling
Three architectural changes, in this order:
1. Add a read replica. Most web apps are 80% reads. Point your analytics, search, and heavy read queries at a replica. This usually cuts primary database CPU by 50-70% overnight. Cost is roughly another $100-200/month for a matching replica instance, which is trivial compared to an outage.
2. Add RDS Proxy or PgBouncer. Connection pooling in the middle means your app servers can each hold 100 connections without actually consuming 100 real RDS connections. The proxy multiplexes the traffic. This alone buys you another 5-10x headroom on the database.
3. Add Redis (ElastiCache). Identify the 10-20 queries that run most often and cache their results. User sessions, feed generation, search results, anything computed from joins. A well-tuned Redis layer absorbs 80-95% of the read traffic for the hot path. Your database suddenly looks idle again.
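The cache-aside pattern behind point 3 can be sketched in a few lines. This is a hedged illustration: a plain dict with a TTL stands in for ElastiCache, and the feed query is a made-up placeholder for an expensive JOIN.

```python
import time

class CacheAside:
    """Cache-aside with a TTL. A dict stands in for Redis here; in
    production you would use redis-py's get/setex with the same shape."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self.store = {}   # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        # Miss: run the expensive query once, then cache the result.
        self.misses += 1
        value = compute()
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

def expensive_feed_query(user_id):
    # Stands in for a heavy multi-join query against the primary.
    return [f"item-{user_id}-{i}" for i in range(3)]

cache = CacheAside(ttl_seconds=30)
for _ in range(100):
    feed = cache.get_or_compute("feed:42", lambda: expensive_feed_query(42))

print(cache.hits, cache.misses)  # 99 1 — the database ran the query once
```

One hundred requests for the same hot key touched the database once; that ratio is what "the database suddenly looks idle again" means in practice.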
Architecture at this stage:
```mermaid
flowchart LR
    R53[Route 53] --> CF[CloudFront] --> ALB[Application<br/>Load Balancer]
    ALB --> E1[ECS Task 1]
    ALB --> E2[ECS Task 2]
    ALB --> E3[ECS Task 3]
    ALB --> E4[ECS Task N]
    E1 --> REDIS[(ElastiCache<br/>Redis)]
    E2 --> REDIS
    E3 --> REDIS
    E4 --> REDIS
    E1 --> PROXY[RDS Proxy<br/>connection pooling]
    E2 --> PROXY
    E3 --> PROXY
    E4 --> PROXY
    PROXY --> PRIMARY[(RDS Primary<br/>writes)]
    PROXY --> REPLICA[(Read Replica<br/>reads)]
    classDef aws fill:#6C3CE1,stroke:#00D4AA,stroke-width:2px,color:#fff;
    classDef cache fill:#4A1DB5,stroke:#00D4AA,stroke-width:2px,color:#fff;
    classDef db fill:#047857,stroke:#00D4AA,stroke-width:2px,color:#ffffff;
    class R53,CF,ALB,E1,E2,E3,E4 aws;
    class REDIS,PROXY cache;
    class PRIMARY,REPLICA db;
```
Also at this stage: turn on CloudFront in front of your API if you have not already. Even with a 30-second cache TTL on read endpoints, you will absorb a huge chunk of traffic at the edge before it ever touches your origin. And right-size your compute instances — the defaults are almost always wrong.
The Over-Engineering Trap
This is where teams split their monolith into microservices because they think that is what scaling looks like. It is not. The database is still the bottleneck, and splitting your app into 10 services means 10 services all fighting for the same database connections. The correct move is to fix the database layer first — caching, replicas, query optimization, proper indexes — and leave the application monolith alone for now.
Stage 4: 100,000 to 500,000 Users — Background Work and Real Observability
Your web tier has auto-scaling. Your database has replicas and a cache in front. The obvious bottlenecks are gone. So what breaks next?
What Breaks
- Synchronous work in the request path. Sending an email, resizing an image, posting to a third-party API — each one adds 500ms to 2s of latency to a user request. At 100K users this was annoying. At 500K users it is the bottleneck.
- The deploy takes an hour. Your monolith is now big enough that the build, test, and rollout cycle is slow enough to matter.
- You have no idea why the site is slow. CloudWatch gives you CPU graphs but not traces. You cannot see which endpoint is slow or which query is the culprit.
- One bad query can still take down the whole site. There is no circuit breaker, no isolation.
The Fix
1. Introduce a background job queue. SQS plus a worker fleet, or SQS plus Lambda, or Celery/Sidekiq if you are in Python/Ruby. Every non-critical piece of work moves out of the request path. Emails, image processing, webhook deliveries, analytics writes — all of it goes on a queue. User response time drops dramatically because the request now just enqueues a job and returns.
2. Real observability. This is the stage where you stop pretending CloudWatch metrics are enough and pay for a real APM tool — Datadog, New Relic, or set up OpenTelemetry to a backend like Honeycomb. You need distributed tracing, not just dashboards. See my Prometheus and Grafana setup guide if you want the self-hosted path.
3. Database partitioning or archiving. Your biggest tables are now tens of millions of rows. Partition them by date, or start archiving old data to S3 and querying it with Athena when needed. Keeping 5 years of data in the hot database is what kills RDS performance at this stage.
4. Feature flags. LaunchDarkly, Unleash, or a homegrown flag table. You need to be able to ship code dark and turn features on for 1% of users at a time. Big-bang deploys are too risky at this size.
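The shape of point 1 can be sketched with the standard library alone. Here `queue.Queue` stands in for SQS, the sleep stands in for a slow email send, and every name is illustrative; the point is that the request path only enqueues and returns.

```python
import queue
import threading
import time

jobs = queue.Queue()
sent = []

def worker():
    # Worker fleet stand-in: in production this loop would poll SQS.
    while True:
        job = jobs.get()
        if job is None:
            break
        time.sleep(0.05)   # pretend this is a slow email-provider call
        sent.append(job)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(email):
    # Request path: enqueue and return immediately, never send inline.
    jobs.put(("welcome_email", email))
    return {"status": "ok"}

start = time.monotonic()
resp = handle_signup("alice@example.com")
request_ms = (time.monotonic() - start) * 1000

jobs.join()  # the test waits for the worker; the real request never does
print(resp, round(request_ms, 1))
```

The request returns in well under a millisecond while the 50ms "email send" happens off the request path, which is the entire latency argument for the queue.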
Stage 5: 500,000 to 1,000,000 Users — Breaking Up the Monolith (Maybe)
This is the stage where splitting services finally starts to make sense — but only for the parts that have genuinely different scaling profiles. Splitting for the sake of splitting is still wrong. You split a service off when one of these is true:
- One part of the application needs to scale independently (e.g., a video encoding service that needs GPUs sometimes, not all the time).
- One part has completely different reliability requirements (e.g., payments must stay up even when the feed is down).
- Two teams are stepping on each other in the same codebase and deploys are blocking each other.
Typical services to split off first:
- Media processing (uploads, thumbnails, transcoding) — bursty, expensive, isolated.
- Notifications (email, push, SMS) — high fan-out, different scaling, different failure domain.
- Payments — different reliability budget, different compliance requirements.
- Search — often wants a dedicated index (OpenSearch/Elasticsearch), not your main database.
What Breaks at This Scale
- Single database write capacity. Even with a beefy `db.r6g.4xlarge` primary, write throughput becomes a ceiling. This is when sharding (or moving specific high-write tables to DynamoDB) starts being a real conversation.
- Cross-AZ data transfer costs. You suddenly notice that your bill has a line item for "Regional Data Transfer" that is bigger than your compute. Time to audit which services are unnecessarily chatty across AZs.
- Cold cache events. When Redis restarts or fails over, the stampede of cold requests hitting the database can take the site down. You need request coalescing, graceful degradation, and probably a warm standby cache.
- The CI/CD pipeline is a bottleneck. 30-minute test runs and 15-minute deploys stack up when you have 50 engineers pushing changes. Invest in faster pipelines, parallel test runs, and canary deploys.
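The request coalescing mentioned for cold-cache events can be sketched as a "single-flight" wrapper: the first caller for a key does the database work, and every concurrent caller for the same key waits for that one result instead of stampeding the primary. This is an illustrative toy built on threads, not a production implementation.

```python
import threading
import time

class SingleFlight:
    """Coalesce concurrent misses per key: one leader computes, the
    rest wait on its result (a stampede-control sketch)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.inflight = {}   # key -> Event set once the result is ready
        self.results = {}

    def do(self, key, fn):
        with self.lock:
            ev = self.inflight.get(key)
            if ev is None:
                ev = threading.Event()
                self.inflight[key] = ev
                leader = True
            else:
                leader = False
        if leader:
            self.results[key] = fn()  # only the leader hits the database
            ev.set()
        else:
            ev.wait()
        return self.results[key]

calls = 0
def rebuild_from_db():
    global calls
    calls += 1
    time.sleep(0.05)   # pretend this is the expensive cold-cache query
    return "feed-page-1"

sf = SingleFlight()
out = []
threads = [
    threading.Thread(target=lambda: out.append(sf.do("feed", rebuild_from_db)))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(calls, len(out))  # 1 20 — twenty concurrent requests, one database hit
```

Twenty concurrent requests produce a single database query; without coalescing, a Redis failover would have sent all twenty to the primary at once.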
The Fix
Canary deployments (deploy to 1% of traffic, watch metrics, roll forward), EKS or ECS with fine-grained service boundaries, a dedicated observability stack with SLO-based alerting, and for the database, either vertical scaling to the biggest RDS instance you can afford or starting the sharding conversation. If you need the K8s playbook, see my production EKS cluster guide.
The Universal Lessons
A few things that held true at every stage:
- The database breaks before the web tier. Always. Every single time. If you are optimizing your application servers before your database, you are solving the wrong problem.
- Caching is the single highest-leverage intervention. A well-placed Redis layer buys you more headroom than any other architectural change, and it costs less than one extra database instance.
- Premature microservices are the most common scaling mistake. A monolith is fine until it is not, and "not fine" usually happens around 500K+ users — not at 10K.
- Observability pays for itself. The tools feel expensive until you have your first production incident where you cannot see what happened. After that, they feel cheap.
- Cost discipline matters at every stage. Hidden charges like NAT Gateway data processing or data transfer between AZs compound fast at scale. Audit early, audit often.
- Boring technology wins. Postgres, Redis, ALB, ECS, SQS. These are the tools that still work at 1 million users. The shiny new thing is almost always the wrong bet.
Frequently Asked Questions
At what user count does a single EC2 instance stop being enough?
A single properly-sized EC2 instance can comfortably serve 1,000 to 10,000 users for most web applications. The real limit is rarely CPU or memory — it is usually the database connection pool, single points of failure, and deploy downtime. Most teams move off a single-instance setup well before they actually hit the compute ceiling, simply because they need zero-downtime deploys and an SLA.
When should I add read replicas to RDS?
Add read replicas when your database CPU stays above 60% during peak and most of the load is read traffic. For most web applications, the 80/20 rule holds — 80% of queries are reads. A single read replica usually buys you another 3 to 5x headroom before you need to think about caching or sharding. If your load is write-heavy, read replicas will not save you — you need caching or a different database.
When do I actually need Redis or a cache layer?
You need a cache when the same expensive query runs over and over again — typically user sessions, feed generation, search results, or anything computed from multiple joins. The signal is that your database CPU is climbing even though read replicas are already in place. A properly tuned Redis layer usually absorbs 80 to 95% of read traffic for the hot path, which means your database suddenly looks idle again.
Do I need microservices to scale past 100,000 users?
No. A well-built monolith on AWS can comfortably serve millions of users. Stack Overflow famously runs on a small number of physical servers and handles over 100 million pageviews a month. You only need to split services when different parts of your application have different scaling profiles, different deploy cadences, or different teams owning them. Splitting too early is one of the most common and expensive scaling mistakes.
What is usually the first thing to break at 100,000 users?
The database. Specifically, either the write capacity of a single primary instance, or the connection pool getting exhausted by too many application servers each holding their own pool. Almost every scaling problem at this level traces back to the database — and the fix is usually a combination of read replicas, connection pooling via PgBouncer or RDS Proxy, and an aggressive cache layer in front.
Related Reading From This Site
Every stage in this post maps to a deeper guide. If you want the specific how-to for any of them:
- Production VPC on AWS with Terraform — the networking foundation for everything here.
- Production RDS Database on AWS — the database setup that holds up as you scale.
- CI/CD from GitHub Actions to AWS with OIDC — the Stage 2 pipeline fix.
- Production EKS Cluster with Terraform — the Stage 5 K8s option.
- Prometheus + Grafana on AWS — the self-hosted observability stack.
- AWS NAT Gateway Cost Optimization — the hidden bill killer that hits every stage.
- AWS EC2 Right-Sizing — the other half of keeping your compute cost sane.