
Systems Glossary

A guide to the concepts, patterns, and trade-offs that power scalable systems.

42 terms

Apache Kafka

Also known as: event-streaming, message-broker, distributed-log
infrastructure
A distributed append-only log that turns writes into an event stream you can rewind, replay, and process in real time.

Kafka is a distributed event streaming platform. Instead of sending data directly to consumers, you write events to Kafka topics (append-only logs). Consumers read from these logs at their own pace. Events are durably stored (days, weeks, forever), so consumers can rewind and replay.

Key concepts: (1) Topics: named streams (e.g., "ride-requests", "driver-status-changes"). (2) Partitions: topics split into parallel logs for scale. (3) Consumer groups: multiple consumers share partitions for load balancing. (4) Retention: events persist for configured time, enabling replay.

Why it's powerful: decouples producers from consumers. Your app writes "ride requested" to Kafka. Multiple consumers can process it—one for matching, one for analytics, one for fraud detection—without coordinating. If analytics crashes, it resumes from its last checkpoint.
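
A minimal sketch of the write and read halves using the kafka-python client; the broker address, topic name, and consumer group here are placeholders, not a recommended setup:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: append a "ride requested" event to the topic (an append-only log).
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("ride-requests", {"ride_id": "r-123", "rider_id": "u-42"})
producer.flush()

# Consumer: one member of the "matching" consumer group, reading at its own pace.
consumer = KafkaConsumer(
    "ride-requests",
    bootstrap_servers="localhost:9092",
    group_id="matching",
    auto_offset_reset="earliest",   # replay from the start if no committed offset
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.partition, message.offset, message.value)
```

A second group (say, "analytics") subscribed to the same topic would read the same events independently, with its own offsets.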

[→] WHEN TO USE
When you need to broadcast events to multiple systems, process events asynchronously, or maintain an audit log of everything that happened.
[!] TRADEOFFS
Operational complexity: running a Kafka cluster takes expertise (replication, partition rebalancing, and ZooKeeper on older versions; newer releases use KRaft instead). At-least-once delivery (duplicates possible). But: millions of events/sec, durability, and replay come essentially for free.

At-Least-Once Delivery

Also known as: duplicate delivery, guaranteed delivery
reliability
Messages will arrive, maybe twice—receivers must handle duplicates gracefully.

At-least-once delivery guarantees every message is delivered to the receiver at least once, but possibly more than once. If the sender isn't sure the message arrived (timeout, network blip), it retries—potentially sending duplicates.

This is simpler and more reliable than exactly-once (which is effectively impossible in distributed systems). The trade-off: receivers must be idempotent—processing a message twice shouldn't corrupt state.

Example: "charge user $5." If this runs twice, you double-charge. Fix: use idempotency keys (transaction IDs) to detect and ignore duplicates.

[→] WHEN TO USE
Default delivery guarantee for most message queues and distributed systems. Combine with idempotent receivers.
[!] TRADEOFFS
Receivers must handle duplicates (deduplication logic, idempotency). But: simpler than exactly-once, and more reliable than at-most-once.

Backpressure

Also known as: flow-control, load-shedding
reliability
When downstream is drowning, slow upstream—don't accept work you can't finish.

Backpressure is the ability of a system to push back on producers when consumers are overwhelmed. Think of it as saying "slow down" instead of accepting unbounded work and crashing.

Example: API accepts 10K requests/sec, but database can only handle 1K writes/sec. Without backpressure, requests queue up, memory fills, system crashes. With backpressure: return 429 Too Many Requests or 503 Service Unavailable when queue depth exceeds limits. Clients back off and retry.

This keeps the system stable under overload. Better to reject some requests cleanly than accept everything and fail catastrophically.
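
A minimal sketch of queue-depth backpressure in an application server; the bound, the queue, and the response shape are all illustrative:

```python
import queue

MAX_QUEUE_DEPTH = 1000                      # illustrative bound on in-flight work
work_queue = queue.Queue(maxsize=MAX_QUEUE_DEPTH)

def handle_request(job):
    """Accept work only if there is room; otherwise push back on the client."""
    try:
        work_queue.put_nowait(job)          # raises queue.Full when at capacity
    except queue.Full:
        # Signal "slow down" instead of buffering unbounded work.
        return 429, {"Retry-After": "1"}, "Too Many Requests"
    return 202, {}, "Accepted"
```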

[→] WHEN TO USE
Any system with bounded resources (queue depth, worker capacity, database connections). Critical for graceful degradation under load.
[!] TRADEOFFS
Requires clients that respect rate limits and retry with backoff. May reject legitimate requests during spikes. But: prevents cascading failures.

Cache Invalidation

caching
When data changes, delete the stale cache entry—next read refills with fresh data.

Cache invalidation is the art of removing stale cached data when the underlying data changes. The classic approach: on write, DELETE the cache key. Next read will be a cache miss and refill from the database with fresh data.

This is simple and explicit but requires discipline: every write path must invalidate relevant keys. Miss one, and users see stale data indefinitely (until TTL expires).

Alternative: use short TTLs and tolerate bounded staleness (cache expires after 30s, data is at most 30s stale). Or: write-through (update cache and database together). Each has trade-offs.
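
The write path might look like the following sketch (redis-py for the cache; the `db` handle and schema are placeholders):

```python
import redis

r = redis.Redis()

def update_profile(db, user_id: int, new_name: str):
    """Write to the source of truth, then invalidate the cached copy."""
    db.execute("UPDATE users SET name = %s WHERE id = %s", (new_name, user_id))
    # Delete rather than update: the next read misses and refills with fresh data,
    # so the cache never holds a value the database doesn't.
    r.delete(f"user:{user_id}")
```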

[→] WHEN TO USE
Any cache-aside setup where data can change. Critical for user-facing edits (profile updates, post edits).
[!] TRADEOFFS
Requires invalidation on every write path (easy to forget). Slightly higher miss rate (refills after invalidation). But: keeps cache honest without tight coupling.

Cache Stampede

Also known as: thundering-herd, dogpile
reliability
When a popular cache entry expires, thousands of requests notice simultaneously and stampede the database like concert-goers rushing the stage.

Picture this: a celebrity's profile is cached with a 1-hour TTL. At exactly 3:00:00 PM, the cache entry expires. At 3:00:01, 50,000 fans refresh the page. All 50,000 requests miss the cache and sprint to the database at once.

The database, which was calmly serving cached reads, suddenly faces 50,000 identical queries in one second. It chokes. Latency spikes to 5+ seconds. More requests time out and retry, making it worse.

The fix: coordination. Use cache locking (only the first requester fetches; others wait and reuse the result) or probabilistic early refresh (start refreshing the cache *before* it expires, so the herd never gathers). Both approaches ensure only one or a few requests hit the database, no matter how popular the key.
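
One way to sketch the locking approach with redis-py; `fetch_from_db`, the key names, and the timeouts are placeholders:

```python
import time
import redis

r = redis.Redis()

def get_profile(user_id: str, fetch_from_db, ttl: int = 3600):
    key, lock = f"profile:{user_id}", f"lock:profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return cached
    # Cache miss: only the request that wins this lock recomputes the value.
    if r.set(lock, "1", nx=True, ex=10):
        try:
            value = fetch_from_db(user_id)
            r.set(key, value, ex=ttl)
            return value
        finally:
            r.delete(lock)
    # Everyone else waits briefly and reuses the winner's result.
    for _ in range(50):
        time.sleep(0.1)
        cached = r.get(key)
        if cached is not None:
            return cached
    return fetch_from_db(user_id)   # fallback if the winner was too slow
```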

[→] WHEN TO USE
Any time you have popular cached data with synchronized expirations. Especially important for viral content or celebrity accounts.
[!] TRADEOFFS
Locking adds a small wait for non-first requests. Early refresh increases background work. But both beat the alternative: your database melting.

Cache-Aside Pattern

Also known as: lazy-loading, on-demand caching
caching
Like asking a librarian before the stacks—check the cache first; on miss, fetch from the source and leave a copy for the next reader.

Cache-aside (also called "lazy-loading") is the simplest caching strategy: your application checks the cache before querying the database. On a hit, serve the cached value. On a miss, fetch from the database, write it to the cache, and return it.

The database remains the source of truth. The cache is just a convenient memory. When data changes, you explicitly delete (invalidate) the cached key so the next read refills with fresh data. This keeps the cache honest without tightly coupling it to every write.

It's coordination-free and works beautifully for read-heavy workloads where misses are rare. The main downside: the first reader after invalidation pays the "refill tax"—a slightly slower request while the cache warms up.
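
The read path, sketched with redis-py (the `db.fetch_user` call and the TTL are placeholders):

```python
import json
import redis

r = redis.Redis()

def get_user(db, user_id: int, ttl: int = 3600):
    """Cache-aside read: check the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit
    row = db.fetch_user(user_id)                # placeholder database call
    r.set(key, json.dumps(row), ex=ttl)         # leave a copy for the next reader
    return row
```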

[→] WHEN TO USE
When reads vastly outnumber writes, and you want simple, explicit control over what gets cached.
[!] TRADEOFFS
Slightly higher latency immediately after invalidations. Requires discipline to invalidate on writes. But: battle-tested, easy to reason about, and doesn't couple cache to your write path.

Chunking / Multipart Upload

Also known as: resumable-upload, multipart-upload
data
Split large files into bite-sized pieces—upload fails at 98%? Resume from chunk 245, not byte zero.

Chunking divides large files into fixed-size blocks (chunks) that can be uploaded independently. Each chunk gets an ID and position. The server tracks which chunks arrived. If the connection drops, the client resumes from the last successful chunk, not the beginning.

Implementation: client splits a 5GB file into 10MB chunks (500 chunks). POST /upload/session → get session_id. Then POST /upload/session/123/chunk/0, chunk/1, etc. Server stores chunks in temporary storage. When all chunks arrive, concatenate them into the final file.

This is how Dropbox, Google Drive, and S3's multipart uploads work. Without chunking, a 5GB upload that fails at 99% must restart from scratch—brutal on mobile networks.
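
A client-side sketch of the flow above; the endpoints, response shape, and chunk size are illustrative, and retries and checksums are omitted:

```python
import requests

CHUNK_SIZE = 10 * 1024 * 1024          # 10MB chunks, as in the example above

def upload_file(path: str, base_url: str = "https://api.example.com"):
    session_id = requests.post(f"{base_url}/upload/session").json()["session_id"]
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            # Each chunk is addressed by its index, so a failed upload can
            # resume from the last acknowledged chunk instead of byte zero.
            requests.post(
                f"{base_url}/upload/session/{session_id}/chunk/{index}",
                data=chunk,
            ).raise_for_status()
            index += 1
    requests.post(f"{base_url}/upload/session/{session_id}/complete")
```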

[→] WHEN TO USE
For any file upload > 100MB, or when users have unreliable networks (mobile, remote). Essential for video uploads, backups, and large media files.
[!] TRADEOFFS
Adds complexity—must track upload sessions, handle out-of-order chunks, clean up abandoned uploads. More HTTP requests (overhead). But: dramatically improves user experience for large files.

Conflict Resolution

Also known as: merge-strategy, crdt
realtime
Two users edit the same file offline—whose changes win? Last-write-wins? Manual merge? Or operational transforms?

Conflict resolution strategies handle concurrent edits to the same data:

1. **Last-Write-Wins (LWW)**: Simplest. Most recent timestamp wins. Loses data silently—not great for user files.
2. **Detect and prompt**: Dropbox approach. Detect conflicts (using version vectors), save both versions as "file.txt" and "file (conflicted copy).txt", let user merge manually.
3. **Operational Transforms (OT)**: Google Docs approach. Transform concurrent operations so they converge to the same result. Complex but seamless UX.
4. **CRDTs**: Conflict-free replicated data types. Data structures designed to merge automatically without conflicts.

Most file sync systems use strategy #2: detect conflicts, preserve both versions, notify user.

[→] WHEN TO USE
Whenever multiple clients can edit the same data offline or concurrently. Essential for collaborative editing, file sync, and distributed databases.
[!] TRADEOFFS
LWW is simple but loses data. Manual merge disrupts UX but is safe. OT/CRDTs provide seamless experience but are complex to implement and debug.

Consistent Hashing

Also known as: hash ring, distributed hashing
scaling
A clever circle trick: when you add or remove a server, only nearby keys move—not the entire universe.

Traditional hashing (key % N) falls apart when N (the number of servers) changes: adding one server rehashes every key, causing a full data migration. Consistent hashing fixes this by arranging servers on a virtual ring (0 to 2^32).

Each key hashes to a position on the ring and is assigned to the next clockwise server. When you add a server, only keys between the new server and the previous one move—typically 1/N of your data. Remove a server? Only its keys move to the next server.

Virtual nodes (each physical server owns 100-200 ring positions) smooth out imbalances and hotspots. It's the foundation of systems like DynamoDB, Cassandra, and distributed caches.
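
A toy hash ring with virtual nodes, roughly as described above (MD5 is used here only as a convenient uniform hash, not for security):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy hash ring with virtual nodes; a sketch, not production code."""
    def __init__(self, servers, vnodes: int = 150):
        self.ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise: the first ring position at or after the key's hash.
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))   # the same key always maps to the same server
```

Adding "cache-d" and rebuilding the ring moves only the keys that now fall on cache-d's positions; everything else stays put.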

[→] WHEN TO USE
When building distributed systems (caches, databases, storage) that need to add/remove nodes dynamically without full reshuffles.
[!] TRADEOFFS
Slightly more complex than modulo hashing. Membership changes still require some data movement. Virtual nodes add overhead. But: minimal disruption during scaling.

Content Delivery Network (CDN)

Also known as: edge-caching, edge nodes
infrastructure
A global army of caching servers near your users—Tokyo user hits Tokyo server, not Virginia, and physics finally smiles.

A CDN is a geographically distributed network of servers (edge nodes) that cache your content close to users. When a user in Tokyo requests an image, the CDN serves it from a Tokyo edge node (50ms away) instead of your origin in Virginia (250ms away).

The first request from a region is a cache miss—the edge fetches from your origin, caches it, and serves it. Subsequent requests hit the cache. Popular files (viral images, shared videos) are served locally without touching your origin.

CDNs shine for immutable, publicly-shared content (images, videos, CSS, JS). They're less useful for dynamic, personalized, or frequently-changing data. Set long TTLs (cache lifetimes) and use versioned URLs for invalidation.

[→] WHEN TO USE
For static assets (images, videos, scripts) and any immutable content served to geographically distributed users.
[!] TRADEOFFS
Adds cost (CDN fees). Cache invalidation requires versioning or purge APIs. First-request-per-region still hits origin. But: massive latency wins and origin bandwidth savings.

Content-Addressed Storage

Also known as: content-addressable, hash-based storage
data
Store files by their content hash, not arbitrary names—same bytes get the same address, deduplication for free.

Content-addressed storage uses the hash of file content (SHA-256) as the storage key. Upload "cat.jpg" → hash it → store at blobs/a3f5d9.... Upload the identical "cat.jpg" again → same hash → same key → no duplicate storage.

This enables automatic deduplication: 500 users upload the same 50MB video, you store it once. It also makes objects immutable—any change produces a different hash, so you can cache aggressively.

The catch: you can't query by filename (use a separate metadata DB to map names → hashes). Hash collisions are theoretically possible but astronomically unlikely with SHA-256. And you need reference counting to know when to delete (don't delete a blob if any file still references it).
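
A minimal in-memory sketch of the idea; a real system would back `blobs` with object storage and `names` with a metadata database:

```python
import hashlib

blobs: dict[str, bytes] = {}        # content hash -> bytes (stand-in for a blob store)
names: dict[str, str] = {}          # filename -> content hash (the metadata "DB")

def put(filename: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    blobs.setdefault(digest, data)  # identical content is stored exactly once
    names[filename] = digest
    return digest

put("cat.jpg", b"...bytes...")
put("copy-of-cat.jpg", b"...bytes...")   # same bytes -> same hash -> no new blob
assert len(blobs) == 1
```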

[→] WHEN TO USE
For deduplicating storage (Dropbox, Git), immutable assets (versioned builds), or when content integrity matters (verify hash on download).
[!] TRADEOFFS
Requires metadata DB for name→hash mapping. Reference counting adds complexity. Can't edit objects (immutable by design). But: automatic deduplication and integrity verification.

Cursor-Based Pagination

Also known as: seek pagination, keyset pagination
data
Skip the row-counting marathon—use a WHERE clause to jump directly to page N, no matter how deep.

Cursor pagination uses a WHERE clause to seek to the next page instead of OFFSET. Store the last row's sort key (e.g., created_at timestamp) as a cursor. The next query: WHERE created_at < cursor ORDER BY created_at DESC LIMIT 50.

With an index on the sort column, the database jumps directly to the cursor position in O(log n) time—no scanning. Whether you're on page 1 or page 1000, query time stays constant (15-25ms).

Compare to OFFSET 10000 LIMIT 50: the database must scan and discard 10,000 rows before returning 50. At deep pages, this becomes catastrophically slow. Cursor pagination sidesteps the problem entirely.
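
A sketch against a DB-API/sqlite3-style connection; the `posts` schema is illustrative:

```python
def fetch_page(conn, cursor=None, limit: int = 50):
    """Keyset pagination: seek past the cursor instead of counting rows.
    Assumes an index on posts.created_at."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, created_at FROM posts ORDER BY created_at DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, created_at FROM posts "
            "WHERE created_at < ? ORDER BY created_at DESC LIMIT ?",
            (cursor, limit),
        ).fetchall()
    next_cursor = rows[-1][1] if rows else None   # last row's sort key
    return rows, next_cursor
```

In practice you also fold a unique tiebreaker (like the row id) into the cursor so rows sharing a timestamp aren't skipped or repeated.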

[→] WHEN TO USE
For infinite scroll, deep pagination, or any time users navigate beyond page 5-10. Especially critical for large datasets (millions of rows).
[!] TRADEOFFS
Can't jump to arbitrary pages ("show me page 50"). Breaks if underlying data changes mid-scroll. Requires index on sort column. But: constant-time performance at any depth.

Database Indexing

Also known as: b-tree, secondary index
data
A sorted shortcut that turns "read every row" into "jump straight there"—like a book's index, but for WHERE clauses.

Without an index, the database must scan every row in a table to find matches—fine for 100 rows, catastrophic for 10 million. An index is a separate data structure (typically a B-tree) that keeps a sorted copy of one or more columns, with pointers back to the full rows.

When you query WHERE short_code = 'abc123', the database uses the index to jump directly to matching rows in O(log n) time instead of scanning everything in O(n) time. It's the difference between 5ms and 5 seconds.

The catch: indexes consume disk space and slow down writes (every INSERT/UPDATE must also update the index). You can't index everything—pick columns you frequently filter or sort by.

[→] WHEN TO USE
Index columns used in WHERE, JOIN, and ORDER BY clauses, especially frequently queried high-selectivity or unique columns (low-cardinality columns such as boolean flags rarely benefit).
[!] TRADEOFFS
Indexes speed reads but slow writes and consume disk space. Over-indexing wastes resources; under-indexing kills performance. Monitor query plans and index usage.

Delta Sync

Also known as: rsync, block-level sync, incremental sync
data
Upload only what changed, not the whole file—edit 3 lines in 10MB, sync 3KB.

Delta sync detects which parts of a file changed and transfers only those parts. Split the file into fixed-size blocks (4KB), hash each block (SHA-256), compare hashes with the server's copy. Upload only blocks whose hashes differ.

This turns "3-line edit in 10MB file = 10MB upload" into "3-line edit = 8KB upload" (only the changed block). It's how Dropbox, rsync, and Git efficiently sync large files over slow networks.

The algorithm: client hashes local blocks, requests server's block hashes, compares, uploads mismatched blocks. Server patches its copy by replacing changed blocks. The catch: CPU cost of hashing and complexity of handling insertions/deletions (rolling hashes help).
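
A simplified block-diff sketch of the fixed-size variant (no rolling hashes, so inserted bytes that shift block boundaries aren't handled):

```python
import hashlib

BLOCK_SIZE = 4 * 1024   # 4KB fixed-size blocks, as in the description above

def block_hashes(data: bytes) -> list[str]:
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]

def changed_blocks(local: bytes, remote_hashes: list[str]) -> list[int]:
    """Indexes of blocks whose content differs from the server's copy."""
    local_hashes = block_hashes(local)
    return [
        i for i, h in enumerate(local_hashes)
        if i >= len(remote_hashes) or h != remote_hashes[i]
    ]

# Upload only the returned block indexes; the server patches its copy in place.
```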

[→] WHEN TO USE
For syncing large files with incremental edits: code repos, documents, disk images, databases.
[!] TRADEOFFS
CPU cost for hashing. Complex implementation (especially rolling hashes for shifted content). But: 90%+ bandwidth savings on incremental edits.

Ephemeral State

Also known as: transient data, temporary state
realtime
Data that lives for seconds, not years—typing indicators, cursors, presence—gone when you refresh.

Ephemeral state is data with a short lifespan (seconds to minutes) that doesn't need durability. Examples: "Alice is typing," "Bob is online," "cursor at line 42." It's relevant now but worthless in 10 seconds.

Store it in memory (Redis, in-process cache) with short TTLs, not databases. Writing every keystroke to disk is wasteful—you're persisting data that will never be queried again. Ephemeral state is disposable: if Redis restarts and loses it, that's fine.

This is the opposite of durable state (user profiles, messages, posts) which must survive restarts and be queryable forever.

[→] WHEN TO USE
For real-time features with second-level relevance: typing indicators, presence, live cursors, temporary locks.
[!] TRADEOFFS
Data is lost on restart (by design). Requires in-memory storage. Can't query historically. But: blazing fast, zero disk I/O, and perfectly fits the use case.

Eventual Consistency

Also known as: async-replication, convergent-consistency
distributed-systems
Writes propagate lazily—reads might see stale data for a moment, but everyone converges eventually.

Eventual consistency means: if no new updates are made, all replicas will eventually converge to the same value. Unlike strong consistency (all replicas agree immediately), eventual consistency allows temporary divergence.

Example: You upload a file to Dropbox. It immediately appears on your laptop. Your phone sync happens 10 seconds later due to network delay. For 10 seconds, your phone has stale state (the file doesn't exist yet). Eventually, they converge.

This is a fundamental CAP theorem tradeoff: in a network partition, you can have availability (accept writes) OR consistency (reject writes until partition heals). Eventual consistency chooses availability.

[→] WHEN TO USE
When availability matters more than immediate consistency—social feeds, file sync, DNS, shopping carts. When low latency is critical and stale reads are acceptable.
[!] TRADEOFFS
Application must handle stale reads and out-of-order updates. Conflict resolution becomes critical. But: high availability, low latency, partition tolerance.

Fanout-on-Read

Also known as: pull model, gather scatter
scaling
Compute feeds on-demand when users request them—slow writes, flexible ranking, but reads do the JOIN dance.

When Bob loads his feed, fanout-on-read queries: "fetch recent posts from everyone Bob follows, rank them, return top 30." The work happens at read-time, not write-time.

This eliminates write amplification—Alice's post to 20 million followers is just one INSERT, not 20 million. It also allows algorithmic ranking: you can compute personalized relevance scores on-the-fly using real-time signals.

The catch: reads are slower and more expensive. Every feed load requires JOINing posts and follows, sorting by ranking score, and limiting results. At scale, you mitigate this with caching (store per-user candidate sets) and precomputed ranking scores.

[→] WHEN TO USE
When you need algorithmic ranking, follower counts are huge, or freshness requirements allow minute-level delays.
[!] TRADEOFFS
Reads are slower and more complex. Requires robust caching and score precomputation at scale. But: no write amplification, and ranking is flexible.

Fanout-on-Write

Also known as: inbox-pattern, push model
scaling
Pay once at write-time to precompute feeds for all followers—fast reads, but publishing becomes a broadcast ceremony.

When Alice posts a photo, fanout-on-write immediately writes that post to the inbox/feed of every one of her followers. Think of it as "proactive delivery"—you do the heavy lifting at publish-time so reads become trivial lookups.

This is amazing for typical users who follow 200-500 accounts: their feed is precomputed and loading it is just a simple index query. Reads are fast, predictable, and don't require expensive JOINs.

The catch: write amplification. When a celebrity with 20 million followers posts, you're writing 20 million rows. At scale, this can saturate your database and delay posts for hours. That's why real systems often use a hybrid: fanout-on-write for regular users, fanout-on-read for accounts with huge followings.

[→] WHEN TO USE
When reads vastly outnumber writes, follower counts are modest (<10K), and you want guaranteed fast feed loads.
[!] TRADEOFFS
Write cost scales with follower count. Pathological for mega-influencers. Changing feed algorithms requires backfills. But: reads are blazing fast and predictable.

Geo-Distribution

Also known as: multi-region deployment, geographic distribution
scaling
Deploy your app stack in multiple continents—Tokyo users hit Tokyo servers, escaping the speed-of-light tax.

Geo-distribution means running your application in multiple geographic regions (US East, Europe, Asia) with regional databases/caches/servers. Users are routed (via DNS or global load balancer) to the nearest region.

This slashes latency by eliminating intercontinental round-trips. A Tokyo user hits Tokyo servers (~15ms) instead of Virginia (~250ms). It also provides disaster recovery—if the US region fails, redirect traffic to Europe.

The complexity: data consistency. Writes to Tokyo must replicate to other regions. Do you accept eventual consistency (region lag)? Do you have a single global primary for writes (slower)? Do you partition by geography (users can't easily move)? These are hard trade-offs.

[→] WHEN TO USE
When you have global users and latency matters. Typically beyond 100K users or when regional latency exceeds 200ms.
[!] TRADEOFFS
Operationally complex: multi-region databases, replication lag, cross-region data consistency. But: massive latency wins and built-in disaster recovery.

Geohash

Also known as: geohash-encoding, grid-based-indexing
geospatial
Turn latitude/longitude into a short string that groups nearby locations—same prefix = same neighborhood.

Geohash is a geocoding system that encodes latitude and longitude into a short alphanumeric string. The key insight: nearby locations share a common prefix. For example, "dr5ru" and "dr5rv" are neighbors, while "dr5ru" and "dqcjq" are far apart.

How it works: Geohash recursively divides the world into a grid. Each character in the hash represents a finer subdivision. "d" is a large region, "dr" is smaller, "dr5" smaller still. A 6-character geohash represents a cell about 1.2km × 600m.

This makes spatial queries simple: to find nearby drivers, compute the rider's geohash, then query for drivers whose geohash starts with the same prefix. In SQL: WHERE driver_geohash LIKE 'dr5r%'. The database index can efficiently find all matching rows.
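
The query side, sketched against a table that already stores each driver's geohash (sqlite3-style placeholders; encoding lat/lng into a geohash is left to a library):

```python
def nearby_driver_ids(conn, rider_geohash: str, precision: int = 5):
    """Prefix query on a stored geohash column (illustrative schema).
    precision=5 is roughly a 5km x 5km cell; tune it to your search radius."""
    prefix = rider_geohash[:precision]
    return conn.execute(
        "SELECT driver_id FROM drivers WHERE geohash LIKE ?",
        (prefix + "%",),
    ).fetchall()
```

To soften the boundary problem described below, run the same query for the rider's cell and its eight neighboring cells, then merge the results.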

[→] WHEN TO USE
For basic spatial indexing when you need to find "nearby" items (drivers, restaurants, events) and your database doesn't have native geospatial support.
[!] TRADEOFFS
Edge cases at cell boundaries—two locations 10m apart can have different prefixes if they cross a boundary. Querying neighboring cells helps but isn't perfect. For production, consider S2 or H3 for boundary-free hierarchical indexing.

Horizontal Scaling

Also known as: scale-out, adding nodes
scaling
Add more machines instead of bigger machines—10 small servers, not 1 giant one.

Horizontal scaling means adding more servers to distribute load, rather than making one server bigger (vertical scaling). Traffic doubles? Add another server. It doubles again? Add two more.

This provides near-infinite scalability—you can keep adding servers—and resilience (one server fails, others keep running). It's how hyperscalers (Google, Facebook) handle billions of users.

The catch: you need stateless servers or shared state (Redis, database). Servers must be interchangeable—any server can handle any request. This rules out in-process caches and requires careful session management (sticky sessions or external session store).

[→] WHEN TO USE
When vertical scaling hits limits (biggest server is maxed out) or you need resilience (multiple servers, not one).
[!] TRADEOFFS
Requires stateless design or shared state. Load balancing complexity. More servers = more operational overhead. But: nearly infinite scalability and resilience.

Hot Key Problem

Also known as: hotspot, celebrity problem
reliability
One key gets 99% of traffic while others nap—that shard melts, scaling does nothing.

A hot key is a single cache/database key that receives disproportionate traffic. A celebrity posts their profile link → 500K requests/sec for key "profile:celebrity123" → the one shard that owns it maxes out while others idle.

Normal sharding doesn't help: adding 10 shards distributes load evenly across *keys*, but one hot key still lands on one shard. That shard becomes the bottleneck.

Fixes: replicate the hot key to multiple shards (requests spread across replicas), use local caching (app-level cache before hitting shared cache), or apply request coalescing (first request fetches, others wait and reuse result).
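
A sketch of hot-key replication with redis-py; the replica count and key scheme are arbitrary choices:

```python
import random
import redis

r = redis.Redis()
REPLICAS = 8   # number of copies of the hot key (illustrative)

def write_hot_key(key: str, value: str, ttl: int = 60):
    # Write every replica so reads can land on any of them.
    for i in range(REPLICAS):
        r.set(f"{key}:rep{i}", value, ex=ttl)

def read_hot_key(key: str):
    # Each read picks one replica at random, spreading load across shards.
    return r.get(f"{key}:rep{random.randrange(REPLICAS)}")
```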

[→] WHEN TO USE
Monitor for hot keys in any sharded system. Especially important for social apps (celebrity profiles), viral content, and shared resources.
[!] TRADEOFFS
Hot key replication adds complexity (managing replica list, keeping replicas in sync). But: spreads load, prevents single-shard meltdown.

Idempotency

Also known as: idempotent operation, safe retry
reliability
Do it once, do it twice, do it ten times—the outcome is the same, so retries are safe.

An idempotent operation produces the same result no matter how many times you execute it. DELETE key "abc" is idempotent: delete it once, delete it again, the key is still gone. SET key="abc" value="123" is idempotent: set it ten times, value is still "123."

Idempotency makes retries safe. If a request times out, did it succeed or fail? If the operation is idempotent, just retry—worst case, you do redundant work but don't corrupt state.

Non-idempotent: INCREMENT counter. Retry it, and you double-count. The fix: use unique request IDs (idempotency keys) so the server can detect and ignore duplicates.
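
A minimal server-side dedup sketch; the in-memory dict stands in for a durable idempotency-key store:

```python
processed: dict[str, dict] = {}   # idempotency_key -> stored response

def charge(idempotency_key: str, user_id: str, amount_cents: int) -> dict:
    """Retry-safe charge: the same key always returns the original result."""
    if idempotency_key in processed:
        return processed[idempotency_key]      # duplicate -> replay stored response
    result = {"user_id": user_id, "charged": amount_cents, "status": "ok"}
    # ...perform the real charge exactly once here...
    processed[idempotency_key] = result
    return result

first = charge("txn-7f3a", "u-42", 500)
again = charge("txn-7f3a", "u-42", 500)        # network retry: no double charge
assert first == again
```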

[→] WHEN TO USE
Design all APIs and operations to be idempotent when possible. Critical for distributed systems, message queues, and retryable operations.
[!] TRADEOFFS
Requires careful API design. Sometimes needs deduplication tracking (idempotency keys, unique IDs). But: makes systems resilient to retries and failures.

LRU (Least Recently Used)

Also known as: least-recently-used, recency-based eviction
caching
Evict the item you haven't touched in the longest time—like clearing out the back of your fridge.

LRU is a cache eviction policy: when the cache is full and you need space, evict the item that was accessed longest ago. The intuition: recently-used items are more likely to be used again soon (temporal locality).

Implementation: maintain a doubly-linked list ordered by access time. On access, move the item to the front. On eviction, remove from the back. With a hash map for O(1) lookup, both operations are O(1).

LRU is simple and effective for many workloads. The problem: it's purely recency-based. A one-time scan of a million cold items can evict your entire working set of hot items. That's where admission control (TinyLFU) or segmented LRU helps.
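
A minimal LRU built on OrderedDict, mirroring the list-plus-map idea (Python's OrderedDict provides both in one structure):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU: most recently used items live at the end of the OrderedDict."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)            # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)     # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                 # touch "a" so "b" becomes LRU
cache.put("c", 3)                              # evicts "b"
assert cache.get("b") is None
```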

[→] WHEN TO USE
Default eviction policy for most caches. Works well when recent access predicts future access.
[!] TRADEOFFS
Vulnerable to scans polluting the cache (scan poisoning). No frequency awareness—one-off items evict regulars. But: simple, widely understood, and works well for typical workloads.

Message Queue

Also known as: task queue, job queue, work queue
infrastructure
A to-do list for your servers—enqueue work, workers pull tasks, everything gets done eventually.

A message queue decouples producers (who create work) from consumers (who do work). Producers enqueue messages/tasks; workers pull and process them. If workers are busy, tasks wait in the queue. Add more workers to process faster.

This enables asynchronous processing: user uploads photo → enqueue "resize image" task → return success immediately → worker processes task in background. Users don't wait for slow work.

Queues also provide buffering and backpressure: traffic spikes fill the queue instead of overwhelming workers. Workers process at their own pace. Common systems: RabbitMQ, AWS SQS, Redis lists.

[→] WHEN TO USE
For background jobs, async processing, batch work, or any time you want to decouple request acceptance from work completion.
[!] TRADEOFFS
Adds complexity: need workers, retry logic, dead-letter queues. Work happens asynchronously (not instant). But: decouples systems, handles bursts, scales independently.

Object Storage

Also known as: blob-store, S3-style storage
infrastructure
Cheap, durable buckets for big binary blobs—images, videos, backups—where a URL is all you need.

Object storage (AWS S3, Google Cloud Storage, Azure Blob) stores unstructured data as objects: binary blobs identified by keys (like file paths). It's cheap (~$0.02/GB/month), durable (99.999999999% durability via replication), and scales to petabytes.

Unlike databases, object storage doesn't do queries, indexes, or transactions. It's simple: PUT an object, GET it by key, DELETE it. Objects are immutable—editing means uploading a new version. It's perfect for photos, videos, backups, logs—anything large and static.

The catch: higher latency than local disk (~50-100ms), no metadata queries (you can't "find all JPEGs uploaded this week" without external indexing), and consistency and listing semantics that vary by provider (S3 now offers strong read-after-write consistency, but don't assume it everywhere).

[→] WHEN TO USE
For large binary files (images, videos, backups) that don't need querying. Anything immutable and > 1MB.
[!] TRADEOFFS
Higher latency than local disk. No metadata queries. Consistency semantics vary by provider. But: incredibly cheap, durable, and scales to petabytes.

Offset-Based Pagination

Also known as: page-number pagination
data
Count rows from the start, skip N, return M—simple for page 1, painful for page 1000.

Offset pagination uses OFFSET and LIMIT: OFFSET 100 LIMIT 20 means "skip the first 100 rows, return the next 20." It's intuitive and allows jumping to arbitrary pages (page 5 = OFFSET 80 LIMIT 20).

The problem: the database must scan all skipped rows even though it discards them. OFFSET 10000 reads 10,000 rows from disk, throws them away, and returns the next 50. Query time increases linearly with offset—page 1 is fast, page 100 is slow, page 1000 times out.

It's fine for small datasets or shallow pagination (first 5-10 pages). Beyond that, cursor-based pagination is far superior.

[→] WHEN TO USE
For small datasets (<10K rows), shallow pagination only, or when you need "jump to page N" UX.
[!] TRADEOFFS
Performance degrades linearly with offset depth. Wasteful for deep pagination. But: simple, familiar UX (page numbers), and easy to implement.

OLTP vs OLAP

Also known as: transactional vs analytical, operational vs warehouse
data
OLTP: fast needle-finds for users. OLAP: slow scans for analysts. Don't make them share a database.

OLTP (Online Transaction Processing) handles user-facing queries: "get user by ID," "insert message," "update profile." Queries are fast (<10ms), touch few rows, and run constantly. Databases are row-oriented and optimized for indexes and point lookups.

OLAP (Online Analytical Processing) handles reporting and analytics: "total revenue by region this month," "trending hashtags." Queries are slow (seconds to minutes), scan millions of rows, and aggregate data. Databases are column-oriented (data warehouses like BigQuery, Redshift) and optimized for scans.

Running OLAP queries on OLTP databases starves transactional traffic. Solution: replicate OLTP data to a separate OLAP warehouse (via ETL pipelines). Analysts query the warehouse; users query the OLTP database.

[→] WHEN TO USE
Separate OLTP (production DB) from OLAP (warehouse) once analytics queries start degrading user-facing performance.
[!] TRADEOFFS
Adds ETL complexity and another system (warehouse). Data in warehouse lags behind production (minutes to hours). But: protects production, enables complex analytics.

Optimistic Locking

Also known as: compare-and-swap, version-based concurrency
data
Assume nobody else is editing—check the version number when you save; if it changed, someone beat you to it.

Optimistic locking is the "forgiveness over permission" approach to concurrency control. Instead of acquiring exclusive locks that block other writers, you proceed hopefully and check for conflicts only when committing.

Here's how it works: each record has a version number. When Alice downloads file version 5, edits it offline, and uploads later, she says "I'm updating version 5 to version 6." The server checks: "Is the current version still 5?" If yes, accept the update and increment to version 6. If no—meaning Bob already updated it to version 6 while Alice was offline—reject with HTTP 409 Conflict.
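
The version check can live entirely in one guarded UPDATE; a sketch with a sqlite3-style connection and a toy `files` table:

```python
def save_file(conn, file_id: int, new_content: str, expected_version: int) -> bool:
    """Compare-and-swap on a version column: succeed only if nobody else updated first."""
    cur = conn.execute(
        "UPDATE files SET content = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_content, file_id, expected_version),
    )
    conn.commit()
    # rowcount == 0 means another writer got there first -> report a conflict (HTTP 409).
    return cur.rowcount == 1
```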

This is perfect for offline-first apps and collaborative editing where conflicts are rare but data loss is unacceptable. Users work independently most of the time; when conflicts happen, the app can show both versions and let the user decide how to merge or rename.

[→] WHEN TO USE
When concurrent edits are uncommon but catastrophic data loss must be prevented. Offline-first apps, collaborative docs, eventual-consistency scenarios.
[!] TRADEOFFS
Conflicts require user intervention (UX complexity). High-contention workloads experience frequent retries. But: no locks means no blocking, great for distributed/offline systems.

Power of Two Choices

Also known as: load-balancing, random-two-choice, least-loaded-of-two
scaling
Pick two servers at random, send the request to whichever looks less busy—shockingly effective with zero coordination.

The Power of Two Choices is a simple but powerful load-balancing algorithm: when you need to assign a request to a server, randomly pick two candidate servers, check which one has fewer active requests (or lower load), and send the request there.

Remarkably, this is almost as good as always picking the globally least-loaded server, but without any global coordination. Picking one server at random (pure random) is terrible—load imbalance can be large. Picking the best of two random servers dramatically evens out the load with minimal overhead.

The math: with random assignment, max load is O(log n / log log n). With power-of-two, max load drops to O(log log n)—an exponential improvement. In practice, two random choices are nearly as good as scanning all servers.
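
The whole algorithm fits in a few lines; the load metric here (active request counts) is just one reasonable choice:

```python
import random

def pick_server(servers: list[str], active_requests: dict[str, int]) -> str:
    """Choose the less-loaded of two random candidates."""
    a, b = random.sample(servers, 2)
    return a if active_requests[a] <= active_requests[b] else b

load = {"s1": 12, "s2": 3, "s3": 9, "s4": 5}
print(pick_server(list(load), load))   # almost always avoids the busiest server
```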

[→] WHEN TO USE
For distributed load balancing when you have many servers and want to avoid centralized coordination. Great for hot-key replication, consistent hashing rebalancing, or any "pick a server" decision.
[!] TRADEOFFS
Requires each server to expose load metrics (active requests, CPU, queue depth). Two lookups per request (vs one for pure random). But: excellent load distribution with no global state.

Pub/Sub (Publish-Subscribe)

Also known as: message broker, topic-based messaging
scaling
A message bulletin board—publishers post to topics, subscribers listen, and nobody needs to know who's on the other end.

Pub/sub decouples message senders (publishers) from receivers (subscribers). Publishers send messages to named topics/channels. Subscribers register interest in topics and receive all messages published there.

In a chat app: when Alice sends a message to group #42, her server publishes to Redis channel "group:42". All servers subscribed to that channel receive the message and deliver to their connected clients. No server needs to know where Bob's WebSocket lives.

This enables horizontal scaling (add servers, subscribe them to channels) and fanout (one publish → many subscribers). The catch: pub/sub is typically fire-and-forget—if a subscriber is offline during publish, it misses the message (unless you add persistent queues).
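
A sketch of that chat fanout with redis-py; `deliver_to_local_websockets` is a hypothetical hook for whatever structure holds your open sockets:

```python
import redis

r = redis.Redis()

# Publisher: whichever server handles Alice's send publishes to the group's channel.
def send_group_message(group_id: int, payload: str):
    r.publish(f"group:{group_id}", payload)

# Subscriber: every chat server subscribes to the channels its connected users care about.
def listen(group_id: int, deliver_to_local_websockets):
    pubsub = r.pubsub()
    pubsub.subscribe(f"group:{group_id}")
    for message in pubsub.listen():
        if message["type"] == "message":
            deliver_to_local_websockets(message["data"])
```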

[→] WHEN TO USE
For real-time broadcasting across multiple servers: chat groups, live notifications, typing indicators, presence updates.
[!] TRADEOFFS
No built-in persistence (missed messages are lost). Requires message broker (Redis, Kafka). Ordering guarantees vary. But: simple, scalable, decouples publishers from subscribers.

Read Replicas

Also known as: asynchronous-replication, follower nodes
scaling
Clone your database for reads-only duty—writes go to the primary, reads spread across replicas, and nobody waits in line.

Read replicas are copies of your primary database that asynchronously replicate its writes (shortly after they happen on the primary). Clients send all writes to the primary, but reads can go to any replica.

This multiplies your read capacity horizontally: 3 replicas = 3× read throughput. It also provides basic disaster recovery—if the primary fails, promote a replica to become the new primary.

The trade-off is replication lag: replicas might be a few milliseconds (or seconds, if network hiccups) behind the primary. Reads from replicas can return slightly stale data. For most apps (news feeds, product catalogs), this is fine. For others (bank balances), you might need to read from the primary or use synchronous replication.

[→] WHEN TO USE
When reads vastly outnumber writes (90%+ read traffic) and your workload tolerates eventual consistency (second-level staleness).
[!] TRADEOFFS
Replication lag means slightly stale reads. Failover requires promotion logic. Replicas consume resources. But: easy horizontal read scaling and built-in backup.

S2 Geometry

Also known as: s2-cells, hierarchical-spatial-index, google-s2
geospatial
A spherical geometry library that divides Earth into hierarchical cells—no boundary discontinuities, just smooth spatial indexing.

S2 is a geospatial indexing library from Google that represents Earth as a sphere divided into cells at 30 hierarchical levels. Unlike a single geohash prefix query (which can miss neighbors across a cell boundary), S2 expresses query regions as unions of cells (coverings), so boundaries are handled by construction and cell areas stay nearly uniform across the globe.

Key features: (1) Hierarchical: level 0 = 6 large cells covering Earth, level 30 = 1cm² cells. (2) Covering algorithm: express any region (circle, polygon) as a union of S2 cells at mixed levels. (3) Fast containment checks: "Is point X in region Y?" reduces to "Is cell ID in covering set?"

For ride-hailing: compute a covering of a 2km radius around the rider using cells at levels 13-15. Query drivers: WHERE s2_cell = ANY(covering_cells). This finds nearby drivers without boundary misses, using standard database indexes (B-tree or GIN).

[→] WHEN TO USE
For production geospatial queries where boundary edge cases matter. When you need multi-level precision (coarse cells for sparse areas, fine cells for dense cities).
[!] TRADEOFFS
Steeper learning curve than geohash. Requires S2 library or extension. Cell covering algorithm has tuning parameters (min/max level, max cells). But: eliminates boundary issues and scales beautifully.

Segmented LRU (SLRU)

Also known as: two-tier lru, probationary-protected
caching
Split your cache into probationary (for newcomers) and protected (for proven regulars)—tourists stay in the cheap seats.

Segmented LRU (SLRU) divides the cache into two segments: a small probationary queue (e.g., 20% of space) for new entries, and a large protected queue (80%) for items that have been accessed multiple times.

How it works: new items enter probationary. If accessed again while still in probationary, promote to protected. Evict from probationary first. Protected items only get evicted if both queues are full and the protected item is LRU.

This prevents one-time scans from polluting the cache. A scan of a million cold items fills probationary, gets evicted before touching protected. Meanwhile, your working set stays safe in protected. It's like having a bouncer who makes tourists wait outside while regulars get VIP access.
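
A toy two-segment cache along these lines; the segment sizes and the demotion policy are illustrative choices:

```python
from collections import OrderedDict

class SegmentedLRU:
    """Toy SLRU: an item must be touched twice to reach the protected segment."""
    def __init__(self, probation_size: int = 20, protected_size: int = 80):
        self.probation: OrderedDict = OrderedDict()
        self.protected: OrderedDict = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Second access: graduate from probationary to protected.
            value = self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_size:
                # Demote the protected LRU back to probation instead of dropping it.
                old_key, old_value = self.protected.popitem(last=False)
                self._admit(old_key, old_value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            self.probation[key] = value          # will graduate on its next get()
        else:
            self._admit(key, value)              # newcomers start in probation

    def _admit(self, key, value):
        self.probation[key] = value
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)   # one-off scan items die here first
```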

[→] WHEN TO USE
Pair with TinyLFU for robust scan resistance. Use when your cache serves mixed workloads (hot keys + occasional scans).
[!] TRADEOFFS
Requires tuning the split ratio (20/80? 10/90?). Adds implementation complexity. New popular items take two accesses to reach protected (slight latency). But: dramatically improves hit rate under scans.

Sharding

Also known as: horizontal partitioning, data partitioning
scaling
Split your data across multiple databases so no single box holds everything—users A-M on server 1, N-Z on server 2.

Sharding divides your dataset across multiple independent databases (shards). Each shard holds a subset of rows, typically partitioned by a shard key (user ID, geographic region, hash of primary key).

Unlike read replicas (which copy all data), sharding splits data so each shard stores only part of the whole. This scales both reads and writes—writes to user 'alice' hit shard 1, writes to 'zara' hit shard 2.

The complexity: queries that span shards (analytics, JOINs across users) become expensive or impossible. Rebalancing when adding shards requires migrating data. Choose your shard key carefully—bad keys create hotspots where one shard gets 90% of traffic.

[→] WHEN TO USE
When a single database can't handle your write volume or data size, even with vertical scaling. Typically beyond 1-5TB or 10K+ writes/sec.
[!] TRADEOFFS
Operationally complex: no cross-shard transactions, JOINs become app-level logic, rebalancing is painful. But: unlimited horizontal scale for writes and storage.

State Machine

Also known as: finite-state-machine, state-transition
architecture
Enforce valid transitions—a trip can't go from REQUESTED to COMPLETED without MATCHED in between.

A state machine defines a set of states and valid transitions between them. For a ride: REQUESTED → MATCHED → IN_PROGRESS → COMPLETED. Invalid transitions (REQUESTED → COMPLETED) are rejected.

Why this matters: prevents bugs from race conditions. If a rider cancels while a driver accepts, the state machine enforces rules: "Can't transition to CANCELLED from MATCHED unless time < 5min" or "Can't match if already CANCELLED." These rules are encoded in the UPDATE query's WHERE clause.

Combined with optimistic locking (version numbers), state machines make concurrent updates safe: UPDATE trips SET status='MATCHED', version=version+1 WHERE id=? AND status='REQUESTED' AND version=?. If status isn't REQUESTED, the update fails—someone else already changed it.
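
A sketch combining a transition table with the guarded UPDATE described above (sqlite3-style placeholders, toy schema, version column omitted for brevity):

```python
# Allowed transitions for a trip; anything else is rejected.
TRANSITIONS = {
    "REQUESTED": {"MATCHED", "CANCELLED"},
    "MATCHED": {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS": {"COMPLETED"},
    "COMPLETED": set(),
    "CANCELLED": set(),
}

def transition(conn, trip_id: int, from_state: str, to_state: str) -> bool:
    """Guarded update: the WHERE clause enforces the expected current state."""
    if to_state not in TRANSITIONS[from_state]:
        return False                                 # invalid transition, reject up front
    cur = conn.execute(
        "UPDATE trips SET status = ? WHERE id = ? AND status = ?",
        (to_state, trip_id, from_state),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 rows -> another writer changed the state first
```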

[→] WHEN TO USE
Whenever your domain has clear states and rules about transitions—orders, trips, bookings, workflows. Essential for preventing invalid state from race conditions.
[!] TRADEOFFS
Requires thinking through all valid transitions upfront. Complex state machines can be hard to visualize. But: prevents entire classes of bugs and makes your system's behavior predictable.

Stream Processing

Also known as: real-time-aggregation, stream-analytics, flink, kafka-streams
infrastructure
Process events as they arrive—compute rolling counts, windows, and joins over infinite streams without waiting for batch jobs.

Stream processing frameworks (Flink, Kafka Streams, Spark Streaming) let you run continuous queries over event streams. Instead of batch jobs that run every hour, you compute results in real-time as events arrive.

Core concepts: (1) Windowing: group events by time (5-minute tumbling windows, 1-hour sliding windows). (2) Aggregation: count, sum, average over windows. (3) Stateful processing: maintain counters, joins, or other state across events. (4) Exactly-once semantics: even if processing restarts, results are correct (no duplicates or missing events).

For surge pricing: events stream to Kafka → Flink groups by geohash → computes requests/drivers ratio per 5-min window → writes surge multiplier to Redis. Total latency: <60 seconds, no database load.

[→] WHEN TO USE
When you need real-time metrics, dashboards, or decisions over high-volume event streams. When batch jobs are too slow (hours) and you need answers in seconds.
[!] TRADEOFFS
Stateful stream processing is hard to reason about—windowing, late arrivals, exactly-once semantics. Operational complexity (managing stream processors, checkpoints, backpressure). But: unlocks real-time insights at scale.

TinyLFU

Also known as: frequency-based admission, bloom-filter admission
caching
A tiny memory that counts how popular each key is—bouncers use it to decide if newcomers deserve to get in.

TinyLFU (Tiny Least-Frequently-Used) is an admission policy for caches. It maintains a compact sketch that approximates how often each key has been accessed recently. When the cache is full and you want to insert a new item, TinyLFU asks: "Is this new key more popular than the victim we'd evict?" If yes, admit. If no, reject the insertion—the new key doesn't deserve to evict a regular.

The "tiny" part is key: instead of tracking exact counts (expensive in memory), TinyLFU uses a Count-Min Sketch—a probabilistic data structure that gives approximate frequency in just a few kilobytes. It also uses a time-decay window so old popularity fades (yesterday's viral meme shouldn't monopolize cache forever).

TinyLFU is often paired with Segmented LRU (SLRU): new items enter a small "probationary" queue. If they prove popular (accessed again), they graduate to a larger "protected" queue. One-off scan items never graduate, so they can't evict the working set.
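
A deliberately simplified admission filter: a plain Counter stands in for the Count-Min Sketch, and periodic halving stands in for the time-decay window:

```python
from collections import Counter

class TinyLFUAdmission:
    """Simplified frequency-based admission; a sketch of the idea, not the real algorithm."""
    def __init__(self, decay_every: int = 10_000):
        self.freq = Counter()
        self.ops = 0
        self.decay_every = decay_every

    def record_access(self, key):
        self.freq[key] += 1
        self.ops += 1
        if self.ops % self.decay_every == 0:
            # Aging: halve everything so yesterday's hot keys gradually fade.
            self.freq = Counter({k: v // 2 for k, v in self.freq.items() if v > 1})

    def admit(self, candidate, victim) -> bool:
        # Only let the newcomer in if it is at least as popular as the would-be victim.
        return self.freq[candidate] >= self.freq[victim]
```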

[→] WHEN TO USE
When your cache suffers from scan poisoning—large one-off queries evicting your hot working set. Especially important for shared caches serving mixed workloads.
[!] TRADEOFFS
Adds complexity (Count-Min Sketch, decay logic). Approximate counts mean rare false rejections. Tuning the admission threshold requires experimentation. But: protects cache hit rate from scans.

TTL (Time-To-Live)

Also known as: expiration, cache lifetime
caching
An expiration timer—cache this for 1 hour, then throw it away and fetch fresh.

TTL is how long a cached value remains valid before expiring. Set a key with TTL=3600 (1 hour): after 3600 seconds, the cache automatically deletes it. The next request will be a cache miss and refill from the source.

Short TTLs (30s-5min) keep data fresh but increase miss rate and database load. Long TTLs (hours-days) maximize cache hits but risk serving stale data. The right TTL depends on how often data changes and how much staleness you tolerate.

Tip: use different TTLs for different data types. User profiles (rarely change): 1 hour. Trending posts (change frequently): 1 minute. Immutable content (never changes): 1 year or infinite.

[→] WHEN TO USE
On every cached item. Match TTL to data mutability: short for volatile data, long for stable data.
[!] TRADEOFFS
Short TTLs increase load and miss rate. Long TTLs serve stale data. Finding the sweet spot requires monitoring miss rate, staleness complaints, and database load.

Version Vector

Also known as: vector-clock, logical-clock
distributed-systems
Track who saw what when—detect if Alice's edit happened after, before, or concurrently with Bob's.

A version vector is a map of {node_id → counter} that tracks causal history in distributed systems. Each node maintains a counter. When a node makes a change, it increments its counter. When comparing two version vectors, you can detect:

- **A happened before B**: all of A's counters ≤ B's counters
- **B happened before A**: all of B's counters ≤ A's counters
- **Concurrent (conflict)**: some counters higher in A, some higher in B

Example: File has version {Alice:2, Bob:1}. Alice edits offline → {Alice:3, Bob:1}. Bob edits offline → {Alice:2, Bob:2}. When they sync: neither vector dominates → conflict detected → manual merge required.

Dropbox, Dynamo, and Riak use version vectors for conflict detection.
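
The comparison logic is small enough to show in full; the node names and counters mirror the example above:

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # A happened before B
    if b_le_a:
        return "after"        # B happened before A
    return "concurrent"       # neither dominates -> conflict, manual merge needed

# Both edited offline from {Alice: 2, Bob: 1}:
print(compare({"Alice": 3, "Bob": 1}, {"Alice": 2, "Bob": 2}))   # "concurrent"
```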

[→] WHEN TO USE
For distributed systems where multiple nodes can update the same data independently (multi-master, offline-first apps, collaborative editing).
[!] TRADEOFFS
Version vectors grow with the number of nodes (storage overhead). Garbage collection is tricky (when can you remove old node IDs?). But: precise conflict detection without global coordination.

WebSockets

Also known as: persistent-connection, bidirectional-communication
realtime
An open phone line between client and server—no dialing for every message, just talk whenever you want.

WebSockets establish a persistent, bidirectional connection between client and server. Unlike HTTP (request → response → connection closed), WebSocket connections stay open. Either side can send messages anytime without creating new connections.

This is perfect for real-time apps: chat messages, live notifications, multiplayer games. The server can push updates the instant they happen (no polling). Clients can send without HTTP overhead.

The trade-off: stateful connections. Servers must maintain connection state (which user is on which socket), complicating load balancing and horizontal scaling. You typically need sticky sessions or shared state (Redis) to route messages correctly across multiple servers.

[→] WHEN TO USE
For real-time, bidirectional communication where sub-second latency matters: chat, live dashboards, collaborative editing, gaming.
[!] TRADEOFFS
Stateful servers complicate scaling and load balancing. Connection limits per server. But: eliminates polling waste and delivers true real-time experience.

Write-Through Caching

Also known as: synchronous write, write-to-cache
caching
Every write updates both cache and database synchronously—slow but always consistent.

Write-through caching updates the cache and database together in the same transaction: write to cache, write to database, return success. Both are always in sync.

This guarantees consistency: reads always see the latest write (no stale cache). It's simple conceptually—no invalidation needed, no cache-aside logic.

The catch: writes are slower (two systems on the critical path) and more fragile (cache failure fails the whole write). It also couples your cache to every write path. For these reasons, cache-aside (write to DB, invalidate cache) is more popular.

[→] WHEN TO USE
When you need guaranteed cache consistency and write latency is acceptable. Less common than cache-aside.
[!] TRADEOFFS
Slower writes. Cache becomes critical dependency (downtime fails writes). Tight coupling. But: guaranteed consistency, no invalidation logic.