Design a File Sync Service (Dropbox / Google Drive)

TL;DR. A file sync service splits into two layers: a content-addressable block store holding immutable chunks keyed by SHA-256, and a metadata service tracking the file tree, versioning, and permissions. Content-defined chunking ensures local edits only re-upload local chunks. Block-level deduplication means the same 10 MB PDF across 100 million inboxes is stored once. Conflict resolution uses version vectors and dual-copy semantics because silent last-writer-wins destroys data^[1]. Dropbox runs this at ~5 exabytes across 600,000+ drives with 12+ nines of durability^[2]^[3].

Learning Objectives#

Design a two-layer file sync architecture that separates metadata from content at exabyte scale
Compare fixed-size chunking vs content-defined chunking and justify when each applies
Implement block-level deduplication with the need_blocks negotiation protocol
Apply rsync-style delta sync to minimize wire bytes on file edits
Resolve conflicts from concurrent offline edits without silent data loss
Estimate capacity for 700M+ users storing petabytes of daily writes

Intuition#

A file sync service looks like a trivial CRUD app. Upload a file, download it elsewhere. Handles 10 users fine. At 700 million users storing 5 exabytes, it collapses, and the reasons are three-fold.

First, clients spend most of their time offline or on flaky networks. Two laptops editing the same spreadsheet on separate airplanes cannot run a consensus protocol. When they reconnect, the server faces divergent state and must reconcile without losing either user's work^[4].

Second, the same 10 MB PDF circulates across millions of inboxes. Storing one copy per user wastes petabytes. Cross-user deduplication becomes economically mandatory at scale^[5]^[2:1].

Third, files mutate in place. An editor rewriting one cell in a 2 GB spreadsheet must not re-upload 2 GB. The system must identify which bytes actually changed and transfer only those^[6]^[7].

The insight that unlocks the design: separate what a file IS (its metadata: name, path, permissions, version history) from what a file CONTAINS (immutable byte chunks keyed by their cryptographic hash). Once you make this split, versioning lives in metadata, dedup lives in content, conflicts live in metadata, and blocks are dumb. Every other design choice follows.

Requirements#

Clarifying Questions#

Q: What file sizes must we support? Assume: 1 KB text notes to 50 GB video files. Median file ~2 MB, p99 ~500 MB.
Q: How many devices per user? Assume: Average 3 devices (phone, laptop, desktop). Max 10.
Q: Do we need real-time collaborative editing? Assume: No. That is a separate system (OT/CRDT). We handle offline-first sync with conflict detection.
Q: What consistency model for cross-device sync? Assume: Eventual consistency with causal ordering. Changes propagate within seconds when online.
Q: End-to-end encryption required? Assume: Optional tier. Standard mode allows server-side dedup; E2E mode sacrifices cross-user dedup for privacy.
Q: Multi-region? Assume: Yes. Three zones (West, Central, East) with cross-zone replication in ~1 second^[5:1].

Functional Requirements#

Upload files from any device; sync to all linked devices within seconds when online
Resume interrupted uploads and downloads without restarting from zero
Detect and surface conflicts from concurrent offline edits (no silent data loss)
Support file versioning with configurable retention (30-day default)
Share files and folders with other users with granular permissions
LAN sync: transfer blocks peer-to-peer when devices share a local network^[8]

Non-Functional Requirements#

Users: 700M+ registered, ~18M paying^[2:2]
Storage: ~5 exabytes total customer data^[2:3]
Durability: 12+ nines (>99.9999999999%)^[3:1]
Availability: 99.99% for reads and writes^[2:4]
Latency: Change notification to peer devices p99 < 1 second when online
Consistency: Eventual with causal ordering per namespace; strong consistency between client and server views^[4:1]

Capacity Estimation#

Metric	Value	Derivation
Registered users	700M	Dropbox public disclosure^[2:5]
Active devices (peak)	~200M	700M users x 0.3 DAU ratio x 1 device avg active
Daily file changes	~2B	200M active devices x 10 changes/day avg
Write QPS (avg)	~23K	2B / 86,400
Write QPS (peak, 3x)	~70K	Burst during business hours
Avg change size	4 MB	One block (Dropbox block size)^[6:1]
Daily ingress (pre-dedup)	8 PB	2B changes x 4 MB
Daily ingress (post-dedup)	~3 PB	~60% dedup ratio (estimated)
Total storage	~5 EB	Dropbox FY2025^[2:6]
Storage drives	600,000+	Across all zones^[2:7]
Block Index entries	~1.5T	5 EB / 4 MB = ~1.25T full-size blocks; rounded up because many blocks are smaller than 4 MB

Key ratios: read:write is roughly 3:1 (each change fans out to ~3 devices). Block dedup eliminates ~40-60% of raw ingress. Cold data (>1 year old) represents >90% of bytes but <5% of access^[9].
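
The derivations in the table can be re-checked with a few lines of arithmetic. This is only a back-of-envelope sketch: every constant is one of the table's own assumptions, decimal units are used for simplicity, and the variable names are illustrative.

```python
# Back-of-envelope capacity model. All inputs are the table's assumptions.
ACTIVE_DEVICES = 200e6        # ~700M users x 0.3 DAU ratio, rounded as in the table
CHANGES_PER_DEVICE = 10       # assumed average changes per active device per day
BLOCK_BYTES = 4e6             # 4 MB block, decimal units
DEDUP_RATIO = 0.6             # assumed share of ingress bytes already stored
TOTAL_BYTES = 5e18            # ~5 EB of customer data
SECONDS_PER_DAY = 86_400

daily_changes = ACTIVE_DEVICES * CHANGES_PER_DEVICE
avg_write_qps = daily_changes / SECONDS_PER_DAY
peak_write_qps = 3 * avg_write_qps
raw_ingress = daily_changes * BLOCK_BYTES
deduped_ingress = raw_ingress * (1 - DEDUP_RATIO)
full_block_entries = TOTAL_BYTES / BLOCK_BYTES   # lower bound: assumes every block is full

print(f"daily changes      : {daily_changes:.2e}")
print(f"write QPS avg/peak : {avg_write_qps:,.0f} / {peak_write_qps:,.0f}")
print(f"ingress raw/dedup  : {raw_ingress / 1e15:.1f} PB / {deduped_ingress / 1e15:.1f} PB")
print(f"block index entries: at least {full_block_entries:.2e}")
```

Changing a single assumption (for example the DAU ratio) and re-running shows how sensitive the QPS and ingress figures are to it.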

API and Data Model#

API Design#

POST /v1/files/commit
  Body: { "namespace_id": "ns123", "path": "/docs/report.pdf",
          "blocklist": ["sha256_h1", "sha256_h2", "sha256_h3"],
          "parent_version": 42 }
  Returns: 200 { "version": 43, "need_blocks": [] }       (committed)
           200 { "need_blocks": ["sha256_h2"] }            (not committed; upload, then retry)
           409 conflict (parent_version stale)

PUT /v1/blocks/{block_hash}
  Headers: Content-Type: application/octet-stream, Block-Format: broccoli
  Body: <compressed block bytes, max 4 MB>
  Returns: 201 stored | 200 already exists

GET /v1/blocks/{block_hash}
  Headers: Accept-Encoding: broccoli, identity
  Returns: 200 <block bytes>

GET /v1/files/list?namespace_id=ns123&cursor=41
  Returns: 200 { "entries": [...], "cursor": 43, "has_more": false }

GET /v1/notifications/longpoll?namespace_id=ns123&cursor=41
  Returns: 200 { "new_cursor": 43 } (blocks until change or timeout)

The commit endpoint is the core protocol: client sends the blocklist, server replies with need_blocks indicating which hashes are missing. Client uploads only those, then retries commit^[6:2]. Idempotency is inherent: uploading the same block hash twice is a no-op.
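
The negotiation can be sketched with an in-memory stand-in for the server. This is a minimal illustration, not Dropbox's implementation: `SyncServer`, `upload`, and the exact response shapes are hypothetical, and the commit is assumed to succeed only once every block in the blocklist is present.

```python
import hashlib

class SyncServer:
    """Hypothetical in-memory stand-in for the metadata service plus block store."""

    def __init__(self):
        self.blocks = {}     # block hash -> bytes
        self.versions = {}   # path -> (version, blocklist)

    def commit(self, path, blocklist, parent_version):
        current, _ = self.versions.get(path, (0, []))
        if parent_version != current:
            return 409, {"error": "parent_version stale"}
        # dict.fromkeys de-duplicates while keeping blocklist order.
        missing = [h for h in dict.fromkeys(blocklist) if h not in self.blocks]
        if missing:
            return 200, {"need_blocks": missing}          # nothing committed yet
        self.versions[path] = (current + 1, list(blocklist))
        return 200, {"version": current + 1, "need_blocks": []}

    def put_block(self, block_hash, data):
        if hashlib.sha256(data).hexdigest() != block_hash:
            return 400                                     # hash does not match content
        created = block_hash not in self.blocks
        self.blocks.setdefault(block_hash, data)           # second upload is a no-op
        return 201 if created else 200

def upload(server, path, data, parent_version, block_size=4 * 1024 * 1024):
    """Commit, upload only what the server asks for, then retry the commit."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    blocklist = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_hash = dict(zip(blocklist, chunks))
    bytes_sent = 0
    status, body = server.commit(path, blocklist, parent_version)
    if status == 200 and body["need_blocks"]:
        for block_hash in body["need_blocks"]:
            server.put_block(block_hash, by_hash[block_hash])
            bytes_sent += len(by_hash[block_hash])
        status, body = server.commit(path, blocklist, parent_version)
    return status, body, bytes_sent
```

Uploading the same bytes under a second path sends zero block bytes, which is the dedup property the protocol is designed around.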

Data Model#

-- Metadata layer (sharded MySQL / Panda)
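CREATE TABLE file_journal (
    namespace_id   BIGINT NOT NULL,
    journal_id     BIGINT NOT NULL,         -- monotonic per namespace; assigned at commit time
                                            -- (AUTO_INCREMENT is table-wide and cannot be the
                                            --  second column of an InnoDB primary key)
    path           VARCHAR(4096) NOT NULL,
    file_id        CHAR(36) NOT NULL,       -- UUID as text (MySQL has no UUID type); stable across renames
    blocklist      JSON NOT NULL,           -- ordered list of SHA-256 hashes
    size_bytes     BIGINT NOT NULL,
    version_vector JSON,                    -- {client_id: counter}
    is_deleted     BOOLEAN NOT NULL DEFAULT FALSE,
    created_at     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (namespace_id, journal_id)
);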

-- Content layer (Magic Pocket / S3)
CREATE TABLE block_index (
    block_hash    CHAR(64) PRIMARY KEY,   -- SHA-256 hex
    cell_id       INT,
    bucket_id     BIGINT,
    checksum      INT,                    -- corruption detection
    ref_count     INT DEFAULT 1
);

The file journal is append-only: every mutation (create, edit, rename, delete) appends a new row with an incremented journal_id. Clients poll by cursor (their last-seen journal_id)^[7:1]. The block index maps hash to physical location; it has no concept of files or users^[5:2].
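
Cursor-based polling over an append-only journal can be sketched for a single namespace as follows. The `Journal` class and its field names are hypothetical; the response shape mirrors the list endpoint shown earlier.

```python
from dataclasses import dataclass, field

@dataclass
class Journal:
    """Append-only journal for one namespace (illustrative, in-memory)."""
    entries: list = field(default_factory=list)   # index i holds journal_id i + 1

    def append(self, path, blocklist, is_deleted=False):
        journal_id = len(self.entries) + 1        # monotonic within the namespace
        self.entries.append({"journal_id": journal_id, "path": path,
                             "blocklist": blocklist, "is_deleted": is_deleted})
        return journal_id

    def list_since(self, cursor, limit=100):
        """Return entries with journal_id > cursor, plus the cursor to resume from."""
        page = self.entries[cursor:cursor + limit]
        new_cursor = page[-1]["journal_id"] if page else cursor
        return {"entries": page, "cursor": new_cursor,
                "has_more": new_cursor < len(self.entries)}

journal = Journal()
journal.append("/docs/report.pdf", ["h1", "h2"])
journal.append("/docs/report.pdf", ["h1", "h3"])          # edit: new row, old row kept
journal.append("/docs/report.pdf", [], is_deleted=True)   # delete is also just a row
print(journal.list_since(cursor=1))
```

Because rows are never rewritten, a client that was offline simply resumes from its last cursor and replays what it missed.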

High-Level Architecture#

The architecture splits cleanly: metadata handles correctness (file tree, versioning, permissions) while the content layer handles bytes (immutable, content-addressed, globally deduplicated). Neither knows the other's internals.

Write path: Device A chunks the file into 4 MB blocks, computes SHA-256 per block, calls commit with the blocklist. The metadata service checks which hashes exist in the block index, returns need_blocks. Device A uploads missing blocks via gRPC PutBlock^[10], retries commit, and the metadata service appends a new journal entry. Kafka propagates the change event to the notification service.

Read path: Device B holds a long-poll connection to the notification service. On cursor advance, it calls list to fetch new journal entries, then GetBlock for any blocks not in local cache. It reconstructs the file by concatenating blocks in blocklist order and atomically renames the temp file over the old one.
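
The reconstruct-then-rename step on the read path might look like the sketch below. `materialize` and `fetch_block` are hypothetical names; the sketch assumes blocks arrive uncompressed and that the temp file lives in the destination directory so the rename stays on one filesystem.

```python
import hashlib
import os
import tempfile

def materialize(path, blocklist, fetch_block):
    """Rebuild `path` from its blocks, verify each hash, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            for block_hash in blocklist:
                data = fetch_block(block_hash)             # local cache or GetBlock
                if hashlib.sha256(data).hexdigest() != block_hash:
                    raise ValueError(f"corrupt block {block_hash}")
                out.write(data)
            out.flush()
            os.fsync(out.fileno())                         # persist before the rename
        os.replace(tmp, path)                              # atomic swap over the old file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)                                 # never leave a partial file behind
        raise
```

A reader of `path` sees either the old contents or the complete new contents, never a half-written file.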

LAN sync: Devices on the same network discover each other via UDP broadcast on port 17500^[8:1]. Block transfers happen peer-to-peer over HTTPS on TCP ports 17599-17609 (the first available), authenticated with per-namespace SSL certificates^[11]. Metadata always comes from the server; only opaque block bytes transfer locally^[8:2].

Deep Dives#

Deep dive 1: Block-level deduplication with content-defined chunking#

The problem: Naive per-user storage at 700M users is economically impossible. The same stock image, OS installer, or PDF exists in millions of accounts. Without dedup, storage costs scale linearly with user count.

Content-addressing: Every block is keyed by SHA-256 of its uncompressed bytes. The block store is a pure key-value mapping: hash -> encrypted compressed bytes^[5:3]. It has no concept of files, users, or permissions. If two users upload the same 4 MB block, it is stored once.

The need_blocks protocol: On commit, the client sends the full blocklist. The server checks each hash against the block index. Hashes already present are skipped; only missing hashes require upload^[6:3]. For a fully duplicated file (the 10 MB PDF in 100M inboxes), zero bytes cross the wire after the first upload.

Fixed vs content-defined chunking: Dropbox uses fixed 4 MB blocks^[6:4]^[7:2]. This is simple and aligns with erasure-coding stripe sizes, but inserting one byte at offset 0 shifts every subsequent boundary and invalidates all downstream hashes^[12]. Content-defined chunking (CDC) uses a rolling hash (Rabin fingerprint or FastCDC) to declare boundaries where content matches a pattern. Boundaries track content, not position, so edits only disturb local chunks^[12:1]^[13]. Modern systems like Hugging Face xet use ~64 KB average CDC chunks for ML model versioning^[14].
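
The boundary-shift effect is easy to reproduce with a toy chunker. The sketch below uses a Gear-style rolling hash (the family FastCDC builds on), not FastCDC itself; the ~1 KB average, the min/max clamps, and all function names are assumptions chosen to keep the demo fast.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]   # one random 64-bit value per byte
MASK64 = (1 << 64) - 1

def fixed_chunks(data, size=1024):
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, min_size=256, avg_bits=10, max_size=4096):
    """Cut where the top `avg_bits` bits of the rolling hash are zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h depends on the last 64 bytes only.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        at_boundary = length >= min_size and (h >> (64 - avg_bits)) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def shared(old_chunks, new_chunks):
    """Count chunks of the new version whose hash already exists in the old one."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(hashlib.sha256(c).digest() in known for c in new_chunks)

original = bytes(_rng.getrandbits(8) for _ in range(200_000))
edited = b"!" + original                       # insert one byte at offset 0

fixed_new = fixed_chunks(edited)
cdc_new = cdc_chunks(edited)
fixed_kept = shared(fixed_chunks(original), fixed_new)
cdc_kept = shared(cdc_chunks(original), cdc_new)
print(f"fixed: {fixed_kept}/{len(fixed_new)} chunks reusable")
print(f"cdc  : {cdc_kept}/{len(cdc_new)} chunks reusable")
```

With fixed boundaries every chunk after the insertion is new; with content-defined boundaries the chunker resynchronizes at the first cut point and the remaining chunks hash to values the server already has.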

The dedup protocol: client sends hashes, server identifies novelty, client uploads only new blocks. For fully duplicated files, zero bytes cross the wire.

Scale numbers: Magic Pocket stores ~5 EB across 600,000+ drives^[2:8]. Blocks are compressed with Broccoli (Dropbox's Rust Brotli encoder with minor tweaks that enable O(1) concatenation while still emitting standard Brotli output) before wire transfer, cutting median upload size from 3.5 MB to 1.6 MB^[10:1]. FastCDC achieves ~10x the throughput of classic Rabin CDC for systems that choose variable-size chunks^[13:1].

Privacy risk: Client-side dedup reveals whether a hash exists on the server. An attacker can probe for specific files by observing upload timing^[15]. Mitigations: server-side-only dedup (client always uploads, server dedupes on receive) or randomized thresholds. Apple's Advanced Data Protection side-steps this entirely with per-user E2E keys, but cross-user dedup collapses^[16].

Deep dive 2: Delta sync with the rsync algorithm#

The problem: Even with chunking, a 4 MB block that changes by 100 bytes still uploads 4 MB. For large files with small edits (database files, Photoshop scratch), this wastes bandwidth.

The rsync algorithm (Tridgell and Mackerras, 1996) completes in one round trip^[17]. The receiver splits its copy into S-byte blocks, computes a cheap rolling 32-bit checksum and a 128-bit MD4 hash for each, and sends the signature to the sender. The sender rolls a window across its version computing the rolling checksum at every byte offset. On a match, it verifies with the MD4 hash. The sender emits instructions: "use receiver's block k" or "here are N literal bytes."

Wire bytes scale with the actual difference, not the file size. For a 100 MB file with a 1 KB edit, rsync transfers ~1 KB of literal data plus overhead (the block signatures and copy instructions)^[17:1].
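
A compact sketch of the signature/delta/patch cycle follows. It is a teaching version under stated assumptions: SHA-256 stands in for MD4 as the strong hash, the weak checksum is a simplified Adler-style pair, and the function names are illustrative rather than taken from rsync or fast_rsync.

```python
import hashlib

MOD = 1 << 16

def weak_sum(block):
    """(a, b) rolling checksum pair for one window."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def strong_sum(block):
    return hashlib.sha256(block).digest()    # assumption: SHA-256 instead of MD4

def make_signature(old, size):
    """Receiver side: weak checksum -> [(strong hash, block index)] for full blocks."""
    sig = {}
    for k in range(0, len(old) - size + 1, size):
        block = old[k:k + size]
        sig.setdefault(weak_sum(block), []).append((strong_sum(block), k // size))
    return sig

def make_delta(new, sig, size):
    """Sender side: emit ("copy", block_index) or ("data", literal_bytes)."""
    ops, literal, i, ab = [], bytearray(), 0, None
    while i + size <= len(new):
        if ab is None:
            ab = weak_sum(new[i:i + size])
        hit = None
        if ab in sig:                                   # cheap check first
            digest = strong_sum(new[i:i + size])        # strong check only on a weak match
            hit = next((idx for s, idx in sig[ab] if s == digest), None)
        if hit is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", hit))
            i, ab = i + size, None
        else:
            literal.append(new[i])
            if i + size < len(new):                     # roll the window forward one byte
                a, b = ab
                a = (a - new[i] + new[i + size]) % MOD
                b = (b - size * new[i] + a) % MOD
                ab = (a, b)
            i += 1
    literal.extend(new[i:])
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def apply_delta(old, ops, size):
    """Receiver side: rebuild the new file from its own blocks plus literals."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)
```

Only the `("data", ...)` instructions carry file bytes over the wire; `("copy", k)` costs a few bytes regardless of block size.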

Dropbox's implementation: The open-source fast_rsync crate (Rust, SIMD-accelerated) computes signatures at 3.6 GiB/s on an Intel i7-6820HQ, roughly 6x faster than librsync^[18]. Caveat: MD4 is cryptographically broken; callers must separately verify integrity with SHA-256^[18:1].

Streaming sync: Dropbox overlaps upload and download. The server accepts not-yet-committed blocklists in memcache keyed by uploader. Downloader clients prefetch those blocks before the formal metadata commit lands. A 500 MB sync dropped from 383 seconds to 293 seconds in Dropbox's tests^[7:3].

Streaming sync overlaps upload and download via a prefetch cache. The downloader pulls blocks before the metadata commit lands, cutting end-to-end sync time by roughly a quarter in Dropbox's 500 MB test (383 s to 293 s).

Deep dive 3: Conflict resolution with version vectors#

The problem: Two clients editing offline produce divergent versions. Last-writer-wins silently destroys one user's work. This happens every day on shared folders^[1:1].

Version vectors: Each file version carries a vector: a map from client ID to logical counter. On reconnect, the server compares the incoming vector against the current head. If one dominates (every component >=), it is a clean fast-forward. If neither dominates, the versions are concurrent^[1:2]^[19].
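
The dominance check reduces to a component-wise comparison. A minimal sketch, assuming vectors are plain dicts from client ID to counter and that a missing client counts as zero; `compare` is a hypothetical helper name.

```python
def compare(a: dict, b: dict) -> str:
    """Classify two version vectors as equal, ordered, or concurrent."""
    clients = a.keys() | b.keys()
    a_ge = all(a.get(c, 0) >= b.get(c, 0) for c in clients)
    b_ge = all(b.get(c, 0) >= a.get(c, 0) for c in clients)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_dominates"      # b is an ancestor of a: clean fast-forward
    if b_ge:
        return "b_dominates"
    return "concurrent"           # neither saw the other's edit: conflict

head = {"laptop": 3, "phone": 1}
print(compare({"laptop": 4, "phone": 1}, head))   # laptop edited on top of head
print(compare({"laptop": 3, "phone": 2}, {"laptop": 4, "phone": 1}))
```

The second call compares two edits made offline from the same head, which is exactly the case where neither vector dominates.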

Dual-copy resolution: On detected concurrency, the server keeps both versions: report.docx (the first to arrive) and report (conflicted copy 2026-05-04).docx (the second). Both are visible to the user, who reconciles manually^[1:3]. This looks ugly but is strictly better than silent data loss for binary files where 3-way merge is impossible (Photoshop, Sketch, video).
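
Deriving the second file's name is a small string operation. The sketch below reproduces the naming pattern from the example above; `conflicted_copy_name` is a hypothetical helper, and real clients may include extra detail (such as a user or device name) that is not modeled here.

```python
import os
from datetime import date

def conflicted_copy_name(path: str, day: date) -> str:
    """Insert a conflict marker before the extension so the file type is preserved."""
    stem, ext = os.path.splitext(path)
    return f"{stem} (conflicted copy {day.isoformat()}){ext}"

print(conflicted_copy_name("report.docx", date(2026, 5, 4)))
```

Keeping the extension last means the conflicted copy still opens in the same application as the original.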

Why LWW is forbidden: Unlike a cache or a counter, a file represents hours of human work. Silently discarding one version is unacceptable. Every production file sync service (Dropbox, Google Drive, OneDrive) uses dual-copy or a variant^[1:4]^[4:2].

Two offline clients produce divergent version vectors. The server detects concurrency (neither vector dominates) and preserves both versions as a dual-copy, surfacing the conflict to users.

Nucleus (Dropbox's sync engine rewrite): Assigns globally unique file identifiers preserved across renames and moves. Atomic moves are independent of subtree size. Strong consistency checks where "any discrepancy is a bug"^[4:3]. The edge case: Alberto moves Archives into January while Beatrice moves Drafts into Archives. Applied naively, this creates a directory cycle. Nucleus detects this via globally unique IDs and rejects the cycle^[4:4].
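
Cycle rejection on moves can be illustrated with parent pointers keyed by stable IDs. This is a sketch of the general technique, not Nucleus's code; the `Tree` class is hypothetical, and the starting layout (January inside Drafts) is an assumption made so that the two moves actually conflict.

```python
class Tree:
    """Directory tree stored as node_id -> parent_id, with stable node IDs."""

    def __init__(self):
        self.parent = {"root": None}

    def add(self, node, parent):
        self.parent[node] = parent

    def is_ancestor(self, ancestor, node):
        """Walk upward from `node`; True if `ancestor` is on the path (or is `node`)."""
        while node is not None:
            if node == ancestor:
                return True
            node = self.parent[node]
        return False

    def move(self, node, new_parent):
        # Moving a directory underneath itself or one of its descendants would form a cycle.
        if self.is_ancestor(node, new_parent):
            raise ValueError(f"moving {node} into {new_parent} would create a cycle")
        self.parent[node] = new_parent        # O(1) regardless of subtree size

tree = Tree()
tree.add("Archives", "root")
tree.add("Drafts", "root")
tree.add("January", "Drafts")                 # assumed starting layout

tree.move("Archives", "January")              # Alberto's move applies cleanly
try:
    tree.move("Drafts", "Archives")           # Beatrice's move arrives second
except ValueError as err:
    print(err)
```

Because a move only rewrites one parent pointer, it stays atomic and cheap even for a directory with millions of descendants.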

Real-World Example#

Dropbox Magic Pocket: from S3 to exabyte-scale in-house storage#

Dropbox started on Amazon S3. By 2015, they stored over 500 PB and the economics no longer worked. The migration to Magic Pocket, their in-house immutable block storage system, peaked at over 0.5 Tbps of network traffic and reportedly saved ~$74.6 million net over two years^[20]^[3:2]^[21].

Cell architecture: Magic Pocket is a three-zone system (West Coast, Central, East Coast). Inside a zone, Frontends consult a Block Index (sharded MySQL: hash -> cell, bucket, checksum) to locate blocks. Each zone contains Cells, self-contained 50 to 100 PB units holding Object Storage Devices on SMR drives^[5:4]^[2:9]. Writes go into 4x-replicated volumes; when a volume fills and closes, it gets erasure-coded. Reed-Solomon 6+3 carries a 1.5x storage overhead (9 fragments stored per 6 data fragments); Local Reconstruction Codes drive that down to roughly 1.33x with the same single-failure read cost, at the cost of losing coverage for rarer 3-failure patterns^[5:5]^[2:10].

Engineering philosophy: "Boring tech wins." Sharded MySQL for the Block Index rather than a novel KV store, "more than capable" and "leverages operational competency"^[5:6]. Centralized Master per cell rather than Paxos. Immutable blocks rather than mutable blocks. The Master caps cell size at ~100 PB due to memory/CPU, but cells are independent so the system scales horizontally.

Panda: The metadata layer evolved from raw sharded MySQL to Panda, a custom petabyte-scale transactional key-value store providing ACID transactions, range transfers, MVCC, and hybrid-logical clocks^[22]. This powers the file journal, permissions, and sharing graph.

Cold storage optimization: Data older than one year (>90% of bytes, <5% of access) uses XOR-parity striping across 3 regions (2+1), reducing replication from 2x to 1.5x for 25% disk savings^[9:1]. Cross-zone replication completes in ~1 second^[5:7]. For comparison, Google Drive sits on Colossus (cluster file system scaling to exabytes per cluster with tens of thousands of machines) and Spanner for metadata^[23].
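
The 2+1 XOR scheme and its overhead arithmetic can be shown in a few lines. This is a toy sketch with equal-length fragments; the fragment contents and helper name are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("fragments must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Two data fragments live in two regions; the third region stores only their XOR.
frag_a = b"block-in-region-1"
frag_b = b"block-in-region-2"
parity = xor_bytes(frag_a, frag_b)

# Losing any one region is survivable: XOR the two survivors to rebuild the third.
recovered_a = xor_bytes(parity, frag_b)
recovered_b = xor_bytes(parity, frag_a)

full_copies_overhead = 2.0            # one full copy in each of two regions
xor_overhead = 3 / 2                  # 3 fragments stored per 2 fragments of data
savings = 1 - xor_overhead / full_copies_overhead
print(recovered_a == frag_a, recovered_b == frag_b, f"{savings:.0%} disk saved")
```

The cost of the saving is that a degraded read has to fetch from two regions instead of one, which is why the source applies it only to rarely accessed data.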

Trade-offs#

Approach	Pros	Cons	When to Use	Our Pick
Fixed 4 MB chunks	Simple, deterministic, aligns with stripe sizes	Misaligns on insert/delete; no delta savings	Append-only workloads; pair with compression	Use at scale with Broccoli compression
Content-defined chunking (8 KB avg)	Robust to edits; dedup-friendly across files	Variable size; CPU-heavy rolling hash; min/max clamps needed	General-purpose sync with frequent edits	Default for new systems
Per-user dedup only	Strong privacy; no side channel	No cross-user savings; huge storage cost	E2E encrypted tiers (iCloud ADP)	Only when privacy mandates it
Global cross-user dedup	Massive storage savings at scale	Side-channel leak^[15:1]; incompatible with per-user E2E keys	Consumer sync with privacy mitigations	Default for consumer products
Dual-copy conflict resolution	Safe, user-resolvable, no data loss	Folder clutter when conflicts are frequent	Default for all file sync	Always
Erasure coding (Reed-Solomon 6+3, 1.5x overhead)	Storage-efficient vs 3x replication; tolerates 3 drive failures	Higher read latency (reconstruction); CPU for encode/decode	Closed volumes; open volumes still taking writes use replication	Warm tier, once a volume fills and closes
Local Reconstruction Codes (~1.33x overhead)	Cheaper than Reed-Solomon 6+3; single-failure read cost unchanged	Loses coverage for rarer 3-failure patterns vs RS(6+3)	Warm/cold tiers where 1-failure repair dominates	When per-byte cost outweighs failure-pattern coverage

The single biggest trade-off: dedup vs end-to-end encryption. When each user encrypts with a unique key, identical plaintext produces different ciphertext and cross-user dedup collapses^[16:1]. Apple's Advanced Data Protection makes this trade-off explicit: standard mode allows server-side dedup via metadata checksums; ADP mode encrypts everything with user-held keys and accepts higher storage costs^[16:2].

Scaling and Failure Modes#

At 10x load (7B users): The Block Index becomes the bottleneck. Sharded MySQL handles it today at 1.5T entries, but 15T entries requires either wider sharding or migration to a distributed KV. Cell count grows from ~100 to ~1,000. Cross-zone replication bandwidth approaches network saturation.

At 100x load (~500 EB stored): The metadata layer needs a global database (Spanner-class). The block store needs multi-region erasure coding by default, not just for cold data. Client-side CDC becomes mandatory to reduce ingress.

At 1000x load: The architecture shifts to edge-first: blocks cached at CDN edge, metadata replicated to edge PoPs, sync decisions made locally with lazy server reconciliation.

Failure modes:

Zone outage: Cross-zone replication ensures blocks exist in 2+ zones. Metadata failover to a secondary zone within seconds. Clients retry against the surviving zone. RPO: ~1 second of replication lag^[5:8].
Block corruption (bit rot): Each block has a checksum in the Block Index. Background scrubbers verify checksums continuously. Corrupted blocks are repaired from erasure-coded parity or cross-zone replicas. Repair SLA: <48 hours; ~4 extents repaired per second, each 1 to 2 GB^[2:11].
Metadata inconsistency: A file must not appear in metadata before its blocks land, or reads return 404. The blocks-first ordering (upload every missing block, then commit metadata) prevents this; it is a two-step protocol, not a distributed two-phase commit. Streaming sync's prefetch cache adds complexity: memcache entries must expire safely^[7:4].
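
The scrubbing loop from the bit-rot item above can be sketched as follows. CRC32 is an assumption (the source only says the Block Index stores a checksum), and `scrub`, `read_block`, and `repair` are hypothetical names for the index scan, the disk read, and the rebuild-from-parity-or-replica step.

```python
import zlib

def scrub(block_index, read_block, repair):
    """Verify each stored block against its indexed checksum; return repaired hashes."""
    repaired = []
    for block_hash, expected_crc in block_index.items():
        data = read_block(block_hash)
        if data is None or zlib.crc32(data) != expected_crc:
            repair(block_hash)               # rebuild from parity or another zone
            repaired.append(block_hash)
    return repaired

index = {"h1": zlib.crc32(b"intact block"), "h2": zlib.crc32(b"original bytes")}
disk = {"h1": b"intact block", "h2": b"original bytez"}     # h2 has a flipped byte
print(scrub(index, disk.get, lambda h: print("repairing", h)))
```

A missing block is treated the same as a corrupted one, since both need the same repair path.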

Common Pitfalls#

Warning

Fixed-size chunking on mutable files. Insert one byte at offset 0 of a 10 MB file. Every subsequent 4 MB block has a new hash. The sync engine uploads the entire file. Use CDC, or compensate with compression and streaming sync^[12:2]^[10:2].

Warning

Silent last-writer-wins on conflicts. Without a version vector, the server cannot distinguish concurrent from sequential writes and silently discards one user's work^[1:5]. Always detect divergence and preserve both versions.

Warning

Full re-upload on rename or move. If the client represents moves as delete + add pairs, a transient failure can drop the add and the file disappears^[4:5]. Use globally unique stable file IDs preserved across moves.

Warning

No resumable uploads. A user on hotel Wi-Fi uploads a 2 GB file and the connection drops at 1.8 GB. With whole-file upload, the transfer restarts from zero. With block-level upload, a drop loses at most the 4 MB blocks in flight, and on retry need_blocks identifies only the missing chunks^[6:5]^[7:5].

Warning

Cross-user dedup side channel. If client-side dedup skips upload when the hash exists, an attacker can probe whether specific files are on the server^[15:2]. Use server-side-only dedup or randomized thresholds.

Follow-up Questions#

1 How do you handle end-to-end encryption without losing dedup?

You cannot have both. Apple's Advanced Data Protection makes this explicit: standard mode keeps server-accessible checksums for dedup; ADP mode encrypts with user-held keys and accepts higher storage costs^[16:3]. A middle ground: convergent encryption (hash the plaintext to derive the key), but this leaks whether two users have the same file, which is the same side channel^[15:3].

2 How does LAN sync work for devices on the same network?

Each client broadcasts UDP on port 17500 announcing supported namespaces. Peers with matching namespaces establish HTTPS connections on TCP ports 17599-17609 (the first available), authenticated with per-namespace SSL certificates. Block transfers happen peer-to-peer; metadata always comes from the server^[8:3]^[11:1]. Dropbox caps the number of connections made to any single peer and the number of peers asked for a given block to prevent hotspots^[8:4].

3 How would you add AI-powered search over cloud files?

Index file content server-side (for non-E2E users) using an async pipeline off the change event stream. For E2E users, index on-device only. Dropbox Dash and Google Gemini in Drive take the server-side approach for non-E2E content.

4 What versioning retention policy makes sense?

30-day rolling window for free tier (store all journal entries for 30 days, then GC). Paid tier: 180 days or unlimited. Each version is just a blocklist pointer; storage cost is only the delta of new blocks.

5 How do you handle GDPR right-to-erasure across deduplicated blocks?

Decrement ref_count on user deletion. Only physically delete a block when ref_count reaches zero. The user's metadata (file journal entries, permissions) is hard-deleted immediately. Blocks shared with other users remain.
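
Reference-counted deletion can be sketched with an in-memory store. `BlockStore` and its methods are hypothetical; a production system would also need the decrement and the physical delete to be safe against a concurrent commit that re-references the same hash.

```python
from collections import Counter

class BlockStore:
    """Toy content store with per-block reference counts."""

    def __init__(self):
        self.blocks = {}
        self.ref_count = Counter()

    def add_file(self, blocklist, data_by_hash):
        for block_hash in blocklist:
            self.blocks.setdefault(block_hash, data_by_hash[block_hash])
            self.ref_count[block_hash] += 1

    def delete_file(self, blocklist):
        """Drop one reference per block; physically free blocks nobody references."""
        freed = []
        for block_hash in blocklist:
            self.ref_count[block_hash] -= 1
            if self.ref_count[block_hash] <= 0:
                del self.ref_count[block_hash]
                self.blocks.pop(block_hash, None)
                freed.append(block_hash)
        return freed

store = BlockStore()
data = {"h_shared": b"popular pdf", "h_private": b"tax return"}
store.add_file(["h_shared", "h_private"], data)   # the user requesting erasure
store.add_file(["h_shared"], data)                # another user with the same PDF
print(store.delete_file(["h_shared", "h_private"]))
```

Only the block unique to the erased user is freed; the shared block survives because another account still references it.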

6 How do you support offline-first mobile clients?

The client maintains a local SQLite journal mirroring the server's file journal. On reconnect, it replays local mutations against the server, handling conflicts via version vectors. Selective sync (pin specific folders) keeps mobile storage bounded.

7 How would you handle large-file streaming (50 GB video) vs chunked sync?

Same protocol, but with parallel block uploads (8 to 16 concurrent PutBlock RPCs) and HTTP/2 multiplexing. The client prioritizes blocks sequentially so a partial upload is still useful. Resume is free: need_blocks on retry skips already-uploaded chunks.
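
Bounded parallel upload with retry-friendly failure reporting might look like this sketch. `upload_missing`, `read_block`, and `put_block` are hypothetical; a thread pool stands in for concurrent PutBlock RPCs, and the worker count follows the 8 to 16 range mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_missing(need_blocks, read_block, put_block, workers=8):
    """Upload blocks concurrently; return the hashes that failed, in request order."""
    def send(block_hash):
        try:
            put_block(block_hash, read_block(block_hash))
            return None
        except Exception:
            return block_hash                 # leave it for the next need_blocks round

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send, need_blocks))   # map preserves input order
    return [h for h in results if h is not None]
```

Nothing needs to be tracked for resume: the next commit attempt returns whichever hashes are still absent, whether they failed here or were never sent.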

Exercise#

Exercise 1: Quantify wire bytes for a Photoshop edit#

A user has a 2 GB Photoshop file. They edit it, touching 200 scattered locations. Compare wire bytes for fixed 4 MB chunks vs 8 KB CDC chunks.

Hint

Calculate how many chunks each edit pattern invalidates under each scheme. With fixed chunks, each edit that lands in a different 4 MB block invalidates that entire block. With CDC, each edit invalidates only the ~8 KB chunk containing it.

Solution

Fixed 4 MB chunks (Dropbox model):

2 GB / 4 MB = 512 blocks total.
200 scattered edits could touch up to 200 distinct blocks (worst case).
Wire cost: 200 x 4 MB = 800 MB uploaded.
With Broccoli compression (~33% savings^[10:3]): ~530 MB on the wire.

8 KB CDC chunks:

2 GB / 8 KB = ~262,144 chunks total.
200 scattered edits disturb 200 to 600 chunks (CDC boundaries track content, so each edit invalidates 1 to 3 chunks).
Wire cost: 200 x 8 KB = ~1.6 MB uploaded in the best case, up to ~4.8 MB if every edit disturbs 3 chunks.
This is a ~170x to ~500x improvement for this workload.

Verdict: For Photoshop-class workloads with scattered edits, CDC with small average chunk size dominates. Fixed 4 MB chunks only make sense when paired with aggressive compression and when the workload is predominantly append-only or whole-file replacement.
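
The solution's arithmetic can be checked directly. This sketch uses binary units and the exercise's own assumptions (worst case of one distinct 4 MB block per edit, 1 to 3 CDC chunks per edit, ~33% compression savings); the variable names are illustrative.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3
file_size = 2 * GB
edits = 200

# Fixed 4 MB blocks: worst case is one distinct block per edit.
fixed_blocks_total = file_size // (4 * MB)
fixed_wire = edits * 4 * MB
fixed_wire_compressed = fixed_wire * (1 - 0.33)

# 8 KB CDC chunks: each edit disturbs between 1 and 3 chunks.
cdc_chunks_total = file_size // (8 * KB)
cdc_wire_best = edits * 1 * 8 * KB
cdc_wire_worst = edits * 3 * 8 * KB

print(f"fixed: {fixed_blocks_total} blocks, {fixed_wire / MB:.0f} MB raw, "
      f"{fixed_wire_compressed / MB:.0f} MB compressed")
print(f"cdc  : {cdc_chunks_total} chunks, "
      f"{cdc_wire_best / MB:.2f} to {cdc_wire_worst / MB:.2f} MB")
print(f"ratio: {fixed_wire / cdc_wire_worst:.0f}x to {fixed_wire / cdc_wire_best:.0f}x")
```

The best-case ratio is simply the ratio of the two chunk sizes (4 MB / 8 KB), which is why shrinking the average chunk pays off so directly for scattered edits.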

Key Takeaways#

Metadata/content split is the architectural keystone. Versioning, permissions, and conflicts live in metadata. Dedup lives in content. Blocks are dumb.
Content-defined chunking is the single most important algorithmic choice. Without it, a one-byte insertion near the start of a file shifts every later block boundary and re-uploads almost the whole file.
Global block dedup buys enormous storage wins (5 EB with fewer physical bytes than user-visible bytes^[2:12]) but requires careful privacy analysis for the side-channel risk^[15:4].
Conflicts are inevitable. Design for them visibly (dual-copy) rather than pretending they will not happen. LWW is never acceptable for files^[1:6].
"Boring tech wins": Sharded MySQL for the Block Index, centralized Master per cell, immutable blocks. Operational simplicity at exabyte scale^[5:9].

Flashcards#

Q: What are the two layers in a file sync architecture?

A: A metadata layer (sharded SQL, file tree, versioning, permissions, blocklists) and a content layer (immutable content-addressed block store, keyed by SHA-256, globally deduplicated). The content layer has no concept of users or files.

Q: Why does fixed-size chunking fail on byte insertions?

A: Inserting one byte at offset 0 shifts every subsequent block boundary. Every block hash changes, so the sync engine re-uploads the entire file. CDC boundaries track content, not position, so only the affected chunk changes.

Q: What is the need_blocks protocol?

A: The client sends a blocklist (ordered SHA-256 hashes) to the metadata server. The server replies with the subset of hashes it does not already have. The client uploads only those blocks, achieving zero-byte transfer for fully duplicated files.

Q: Why is last-writer-wins forbidden for file sync?

A: Two users editing offline produce concurrent versions. Without a version vector, the server cannot distinguish concurrent from sequential writes and silently discards one user's work. Dual-copy conflict resolution preserves both versions visibly.

Q: How does streaming sync reduce latency?

A: The server accepts not-yet-committed blocklists in a prefetch cache. Downloader clients pull blocks before the formal metadata commit lands, overlapping upload and download. Dropbox measured a 500 MB sync dropping from 383s to 293s.

Q: What is the dedup side-channel attack?

A: If client-side dedup skips upload when the hash already exists on the server, an attacker can probe whether specific files are stored by observing upload timing or bandwidth. Mitigations include server-side-only dedup or randomized thresholds.

Q: How does Dropbox handle a rename without re-uploading?

A: Content-addressable storage means a rename is just a metadata path change. The block hashes are identical, so the client short-circuits. Nucleus uses globally unique stable file IDs preserved across moves.

Q: What does Apple's Advanced Data Protection sacrifice for E2E encryption?

A: Cross-user dedup collapses because identical plaintext encrypted with different per-user keys produces different ciphertext. Storage costs increase, but only the user can decrypt their data.

Q: What is Dropbox Magic Pocket's durability target?

A: 12+ nines (>99.9999999999%) with 99.99% availability. Achieved via 4x replication for hot data, Reed-Solomon erasure coding (6+3) for warm data, and cross-zone XOR-parity for cold data.

Q: How does LAN sync work?

A: Clients broadcast UDP on port 17500 announcing supported namespaces. Peers with matching namespaces transfer blocks peer-to-peer over HTTPS on TCP ports 17599-17609 (the first available), authenticated with per-namespace SSL certificates. Metadata always comes from the server; only opaque block bytes transfer locally.

References#

[1] "Version vector," Wikipedia (based on Parker et al., 1983). https://en.wikipedia.org/wiki/Version_vector
[2] Facundo Agriel, "Magic Pocket: Dropbox's Exabyte-Scale Blob Storage System," InfoQ, May 15, 2023. https://www.infoq.com/articles/dropbox-magic-pocket-exabyte-storage
[3] Akhil Gupta, "Scaling to exabytes and beyond," Dropbox Tech Blog, March 14, 2016. https://dropbox.tech/infrastructure/magic-pocket-infrastructure
[4] Sujay Jayakar, "Rewriting the heart of our sync engine," Dropbox Tech Blog, March 9, 2020. https://dropbox.tech/infrastructure/rewriting-the-heart-of-our-sync-engine
[5] James Cowling, "Inside the Magic Pocket," Dropbox Tech Blog, May 6, 2016. https://dropbox.tech/infrastructure/inside-the-magic-pocket
[6] Nipunn Koorapati, "Streaming File Synchronization," Dropbox Tech Blog, July 10, 2014. https://dropbox.tech/infrastructure/streaming-file-synchronization
[7] Nipunn Koorapati, "Streaming File Synchronization," Dropbox Tech Blog, July 10, 2014. https://dropbox.tech/infrastructure/streaming-file-synchronization
[8] Matt Dee, "Inside LAN Sync," Dropbox Tech Blog, October 13, 2015. https://dropbox.tech/infrastructure/inside-lan-sync
[9] Preslav Le, "How we optimized Magic Pocket for cold storage," Dropbox Tech Blog, May 6, 2019. https://dropbox.tech/infrastructure/how-we-optimized-magic-pocket-for-cold-storage
[10] Rishabh Jain and Daniel Reiter Horn, "Broccoli: Syncing faster by syncing less," Dropbox Tech Blog, August 4, 2020. https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less
[11] Dropbox, "How to enable LAN sync," Dropbox Help Center. https://help.dropbox.com/installs-integrations/sync-uploads/lan-sync-overview
[12] Athicha Muthitacharoen, Benjie Chen, and David Mazieres, "A Low-bandwidth Network File System," Proc. 18th Symposium on Operating Systems Principles (SOSP), October 2001. https://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf
[13] Wen Xia et al., "FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication," USENIX ATC 2016. https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia
[14] Hugging Face xet documentation, "Content-Defined Chunking Algorithm." https://huggingface.co/docs/xet/main/en/chunking
[15] Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg, "Side Channels in Cloud Services: Deduplication in Cloud Storage," IEEE Security and Privacy, 2010. https://ieeexplore.ieee.org/document/5655241
[16] Apple, "iCloud data security overview," updated January 5, 2026. https://support.apple.com/en-us/102651
[17] Andrew Tridgell and Paul Mackerras, "The rsync algorithm," Technical Report TR-CS-96-05, Australian National University, 1996. https://www.samba.org/rsync/tech_report/
[18] Dropbox, "fast_rsync: An optimized implementation of librsync in pure Rust," GitHub README. https://github.com/dropbox/fast_rsync
[19] MIT 6.824 Lecture Notes, "Consistency with Timestamps / Version Vectors," 2006. https://pdos.csail.mit.edu/archive/6.824-2006/notes/l10.txt
[20] Sujeet Jaiswal, "Dropbox Magic Pocket: Building Exabyte-Scale Blob Storage," 2024. https://sujeet.pro/articles/dropbox-magic-pocket
[21] Dropbox, Inc., Form S-1 Registration Statement, SEC Filing, February 23, 2018. https://www.sec.gov/Archives/edgar/data/1467623/000119312518055809/d451946ds1.htm
[22] Preslav Le, Tanay Lathia, and Mehant Baid, "Future-proofing our metadata stack with Panda," Dropbox Tech Blog, November 3, 2022. https://dropbox.tech/infrastructure/panda-metadata-stack-petabyte-scale-transactional-key-value-store
[23] Dean Hildebrand and Denis Serenyi, "Colossus under the hood: a peek into Google's scalable storage system," Google Cloud Blog, April 20, 2021. https://cloud.google.com/blog/products/storage-data-transfer/a-peek-behind-colossus-googles-file-system
