Design distributed systems the right way.
The open-source High-Level Design handbook. 159 modules, 727K+ words, 653 diagrams. From TCP to LLM serving. Read the chapters on the web or on GitHub; no sign-up required.
Social Media Feed.
The fan-out decision, and how to survive a celebrity.
Every timeline is a lie. When you open Instagram you are not seeing "the latest posts from your friends." You are seeing a snapshot pre-stitched seconds ago by a fan-out service running across a farm of machines. This chapter is about when the stitching should happen, and how to keep Justin Bieber from taking down the write path.
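To make the fan-out decision concrete, here is a minimal in-memory sketch of one common hybrid: ordinary authors are fanned out on write into follower inboxes, while authors above a follower threshold are merged in at read time. The `FeedService` class, its method names, and the tiny threshold are all hypothetical, chosen for illustration only; this is not the chapter's implementation, and a real system would use durable stores and a far larger cutoff.

```python
from collections import defaultdict


class FeedService:
    """Toy hybrid fan-out: push for ordinary authors, pull for celebrities."""

    def __init__(self, celebrity_threshold=3):
        # Hypothetical cutoff: authors with at least this many followers
        # are treated as celebrities and skipped on the write path.
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)  # author -> users who follow them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> posts pushed at write time
        self.outbox = defaultdict(list)    # author -> every post by that author
        self.clock = 0                     # logical timestamp for ordering

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author, text):
        """Store a post and return how many inbox writes it caused."""
        self.clock += 1
        item = (self.clock, author, text)
        self.outbox[author].append(item)
        if self.is_celebrity(author):
            return 0  # fan-out on read: no per-follower writes
        for follower in self.followers[author]:
            self.inbox[follower].append(item)  # fan-out on write
        return len(self.followers[author])

    def timeline(self, user, limit=10):
        """Merge the precomputed inbox with celebrity posts pulled at read time."""
        items = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.outbox[author])
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        return sorted(items, reverse=True)[:limit]


svc = FeedService(celebrity_threshold=3)
svc.follow("ann", "bob")
for fan in ("ann", "cat", "dan"):
    svc.follow(fan, "star")
print(svc.post("bob", "hi"))      # one follower, so one inbox write
print(svc.post("star", "hello"))  # celebrity, so zero inbox writes
print(svc.timeline("ann"))
```

The point of the split is that the cost of a celebrity post moves from the write path (one write per follower) to the read path (one extra lookup per followed celebrity), which is the trade-off the chapter weighs.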
Every other system design repo has a problem.
You've searched. You've starred five repositories. You still do not have a coherent place to learn. Here is why.
Link dumps
Awesome-lists that redirect to scattered blog posts of uneven quality. Great for curation, useless as teaching.
Teaser-and-redirect
READMEs that tease substance then steer you to ByteByteGo, DesignGurus, Educative, or InterviewReady. The real explanations are somewhere else.
Frozen in 2022
No AI system design, no LLM serving, no vector search, no CRDT-based collaboration. The field moved on; the repos did not.
Monolithic README
A single 110 KB file with no order, no search, no progress tracking. You scroll with Cmd-F and hope.
Dead links
Half the citations are 404s. The other half are behind a login wall the maintainer never noticed.
A handbook, not a list.
Every chapter is a complete teaching article with diagrams, trade-offs, exercises, and real citations. Ordered, opinionated, and openly licensed.
100% content, 0% links
Every concept is explained here with diagrams we drew, trade-offs we defended, and exercises we designed. External links are supplementary, never required.
Ordered curriculum
159 modules in 12 parts, sequenced by dependency. Every chapter declares its prerequisites. You always know what to read next.
Recommendations, not lists
We pick an approach and justify it. "It depends" is always followed by what it depends on. No fence-sitting.
Post-2024 topics included
LLM serving, RAG, AI agents, multi-agent orchestration, LLM evaluation, cost optimisation, safety, ML fundamentals, feature stores, recommendations, multimodal, voice agents.
Diagrams everywhere
653 Mermaid flowcharts, sequence diagrams, and state machines, plus hand-drawn ASCII architecture diagrams. All render on both GitHub and this site.
CC BY-SA 4.0, irrevocable
Fork it, translate it, teach from it, improve it. The ShareAlike clause is deliberate — the chapter text stays open to read and remix.
12 parts, 159 modules, 2424 pages.
Comparable in scope to Designing Data-Intensive Applications (562 pages) or Alex Xu's two volumes combined. Openly licensed.
Prerequisites
Networking, OS, data structures, databases, APIs. The foundation.
Core Fundamentals
Scalability, CAP, estimation, interview framework. The vocabulary.
Building Blocks
Load balancers, caches, queues, databases, rate limiters.
Distributed Systems Theory
Consensus, CRDTs, clocks, consistent hashing. The theory.
Data Systems
Storage engines, OLAP, streams, search, vectors.
Architecture Patterns
Microservices, event-driven, CQRS, multi-region.
Reliability and Operations
Observability, SLOs, chaos, deployment strategies.
Security at Scale
OAuth2, JWT, mTLS, DDoS.
Case Studies
56 end-to-end system designs.
AI & ML System Design
LLM serving, RAG, agents, multi-agent orchestration, evaluation, cost, safety, ML fundamentals, feature stores, recommendations, multimodal, voice.
Emerging Patterns
Green computing and forward-looking topics that have not yet settled into a canonical home. Slim by design: new primitives land here first, then graduate into the relevant Part once they mature.
Interview Framework
RESHADED, diagramming, trade-off articulation, company-specific flavours.
Our most-read chapters.
Engineers at every level open these first, from a first standalone design round to a Staff+ loop.
Scalability
Vertical vs horizontal, stateless services, when scaling is the wrong answer.
CAP and PACELC
What CAP actually says. Why most people get it wrong. How to classify real databases.
Back-of-envelope estimation
Powers of two, storage and bandwidth math, Twitter and YouTube worked examples.
URL Shortener
Bitly-scale: 3.5 trillion short codes, 100:1 read-heavy, sub-50ms redirects.
Chat System (WhatsApp)
500M DAU, 50B messages/day. Connection management, ordering, end-to-end encryption.
Social Media Feed
The fan-out-on-write vs fan-out-on-read decision. Solving the celebrity problem.
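The headline numbers on the cards above can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the 3.5 trillion figure refers to the keyspace of 7-character base-62 codes (the card does not state the encoding), and it computes averages only, not peak load.

```python
# Back-of-envelope checks for the URL Shortener and Chat System cards.

BASE62_ALPHABET_SIZE = 62  # a-z, A-Z, 0-9
CODE_LENGTH = 7            # assumption: not stated on the card

# Number of distinct short codes of exactly CODE_LENGTH characters.
keyspace = BASE62_ALPHABET_SIZE ** CODE_LENGTH

SECONDS_PER_DAY = 24 * 60 * 60
MESSAGES_PER_DAY = 50_000_000_000  # 50B messages/day, from the card
DAILY_ACTIVE_USERS = 500_000_000   # 500M DAU, from the card

avg_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
messages_per_user_per_day = MESSAGES_PER_DAY / DAILY_ACTIVE_USERS

print(f"7-char base-62 keyspace: {keyspace:,}")
print(f"average messages/second: {avg_messages_per_second:,.0f}")
print(f"messages per user per day: {messages_per_user_per_day:.0f}")
```

Under these assumptions the keyspace is about 3.52 trillion codes, and 50B messages/day works out to roughly 579K messages per second on average, or 100 messages per daily active user.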
Honest about where we stand.
There are good paid resources and good free ones. This is what you're trading off.
The principles.
These are not aspirational. They are enforced in code review and CI.
One hundred percent inline content
Every chapter is a full teaching article. No "read this external blog for the details." If a concept matters, it is explained here, with diagrams and trade-offs.
Progressive, opinionated curriculum
Chapters are numbered and ordered. Each one declares its prerequisites. "It depends" is always followed by "on what."
Honest about complexity
We do not use the words "simply" or "just." Distributed systems are hard. We say so, and walk through the hard parts.
Modern and maintained
Every file has a date_updated. Part 9 is a dedicated AI & ML System Design track with agents, evaluation, and cost covered end-to-end. Out-of-date numbers are flagged in review.
Peer-reviewed
Every chapter is reviewed by someone other than the author. For case studies, a reviewer with production experience is strongly preferred.
Open-licensed, readable without a sign-up
CC BY-SA 4.0 for content, MIT for code. The ShareAlike clause keeps the handbook text itself open to read and remix — nobody can lock the chapters behind a paywall. Any future companion tools live alongside the handbook, not on top of it.
Ready to read?
Start from the beginning, or jump straight to your target. The chapters are open to read — no sign-up, no paywall — so you can go as deep as you like.