Systems Glossary
A guide to the concepts, patterns, and trade-offs that power scalable systems.
Apache Kafka
Kafka is a distributed event streaming platform. Instead of sending data directly to consumers, you write events to Kafka topics (append-only logs). Consumers read from these logs at their own pace. Events are durably stored (days, weeks, forever), so consumers can rewind and replay.
Key concepts: (1) Topics: named streams (e.g., "ride-requests", "driver-status-changes"). (2) Partitions: topics split into parallel logs for scale. (3) Consumer groups: multiple consumers share partitions for load balancing. (4) Retention: events persist for configured time, enabling replay.
Why it's powerful: decouples producers from consumers. Your app writes "ride requested" to Kafka. Multiple consumers can process it—one for matching, one for analytics, one for fraud detection—without coordinating. If analytics crashes, it resumes from its last checkpoint.
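A minimal sketch of that producer/consumer flow using the kafka-python client; the broker address, topic name, group ID, and event fields are illustrative assumptions, and other clients (confluent-kafka, Java clients) follow the same shape:

```python
# Sketch only: assumes a broker at localhost:9092 and the kafka-python package.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: append a "ride requested" event to the topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("ride-requests", {"ride_id": 42, "rider_id": 7})
producer.flush()

# Consumer: one member of the "analytics" consumer group.
# Offsets are tracked per group, so a crashed consumer resumes from its last checkpoint.
consumer = KafkaConsumer(
    "ride-requests",
    bootstrap_servers="localhost:9092",
    group_id="analytics",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```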
At-Least-Once Delivery
At-least-once delivery guarantees every message is delivered to the receiver at least once, but possibly more than once. If the sender isn't sure the message arrived (timeout, network blip), it retries—potentially sending duplicates.
This is simpler and more reliable than exactly-once (which is effectively impossible in distributed systems). The trade-off: receivers must be idempotent—processing a message twice shouldn't corrupt state.
Example: "charge user $5." If this runs twice, you double-charge. Fix: use idempotency keys (transaction IDs) to detect and ignore duplicates.
Backpressure
Backpressure is the ability of a system to push back on producers when consumers are overwhelmed. Think of it as saying "slow down" instead of accepting unbounded work and crashing.
Example: API accepts 10K requests/sec, but database can only handle 1K writes/sec. Without backpressure, requests queue up, memory fills, system crashes. With backpressure: return 429 Too Many Requests or 503 Service Unavailable when queue depth exceeds limits. Clients back off and retry.
This keeps the system stable under overload. Better to reject some requests cleanly than accept everything and fail catastrophically.
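A minimal sketch of the idea with a bounded in-process queue and a hypothetical request handler; the limits, status codes, and function names are illustrative:

```python
import queue

# Bounded buffer between the fast API layer and the slow database writers.
MAX_PENDING = 1000
pending_writes: queue.Queue = queue.Queue(maxsize=MAX_PENDING)

def handle_write_request(payload: dict) -> tuple[int, str]:
    """Hypothetical request handler: returns (status_code, body)."""
    try:
        # Never block the request thread; refuse immediately if the buffer is full.
        pending_writes.put_nowait(payload)
    except queue.Full:
        # Push back on the producer instead of accepting unbounded work.
        return 429, "Too Many Requests - retry with backoff"
    return 202, "Accepted"

def db_writer_loop() -> None:
    """Worker that drains the queue at the database's own pace (~1K writes/sec)."""
    while True:
        item = pending_writes.get()
        # write_to_database(item)  # placeholder for the rate-limited database write
        pending_writes.task_done()
```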
Cache Invalidation
Cache invalidation is the art of removing stale cached data when the underlying data changes. The classic approach: on write, DELETE the cache key. Next read will be a cache miss and refill from the database with fresh data.
This is simple and explicit but requires discipline: every write path must invalidate relevant keys. Miss one, and users see stale data indefinitely (until TTL expires).
Alternative: use short TTLs and tolerate bounded staleness (cache expires after 30s, data is at most 30s stale). Or: write-through (update cache and database together). Each has trade-offs.
Cache Stampede
Picture this: a celebrity's profile is cached with a 1-hour TTL. At exactly 3:00:00 PM, the cache entry expires. At 3:00:01, 50,000 fans refresh the page. All 50,000 requests miss the cache and sprint to the database at once.
The database, which was calmly serving cached reads, suddenly faces 50,000 identical queries in one second. It chokes. Latency spikes to 5+ seconds. More requests time out and retry, making it worse.
The fix: coordination. Use cache locking (only the first requester fetches; others wait and reuse the result) or probabilistic early refresh (start refreshing the cache *before* it expires, so the herd never gathers). Both approaches ensure only one or a few requests hit the database, no matter how popular the key.
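A sketch of request coalescing with a per-key lock. This is in-process only; a real multi-server deployment would use a distributed lock or a cache primitive such as Redis SETNX, and the key names and TTL here are illustrative:

```python
import threading
import time

cache: dict[str, tuple[object, float]] = {}   # key -> (value, expires_at)
locks: dict[str, threading.Lock] = {}
locks_guard = threading.Lock()
TTL = 3600

def get_profile(key: str, load_from_db) -> object:
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                        # fresh hit: no database work

    with locks_guard:                          # lazily create one lock object per key
        lock = locks.setdefault(key, threading.Lock())

    with lock:                                 # only one thread refills a given key
        entry = cache.get(key)                 # re-check: another thread may have refilled
        if entry and entry[1] > time.time():
            return entry[0]
        value = load_from_db(key)              # the single database query for the whole herd
        cache[key] = (value, time.time() + TTL)
        return value
```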
Cache-Aside Pattern
Cache-aside (also called "lazy-loading") is the simplest caching strategy: your application checks the cache before querying the database. On a hit, serve the cached value. On a miss, fetch from the database, write it to the cache, and return it.
The database remains the source of truth. The cache is just a convenient memory. When data changes, you explicitly delete (invalidate) the cached key so the next read refills with fresh data. This keeps the cache honest without tightly coupling it to every write.
It's coordination-free and works beautifully for read-heavy workloads where misses are rare. The main downside: the first reader after invalidation pays the "refill tax"—a slightly slower request while the cache warms up.
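A minimal cache-aside read and write path, sketched with the redis-py client and a placeholder `fetch_user_from_db` function; the key format and one-hour TTL are assumptions for illustration:

```python
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def fetch_user_from_db(user_id: int) -> dict:
    # Placeholder for the real database query (the source of truth).
    return {"id": user_id, "name": "Alice"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                  # hit: serve straight from the cache
        return json.loads(cached)
    user = fetch_user_from_db(user_id)      # miss: go to the source of truth
    r.set(key, json.dumps(user), ex=3600)   # refill with a 1-hour TTL
    return user

def update_user(user_id: int, fields: dict) -> None:
    # write_to_db(user_id, fields)          # 1. write to the database
    r.delete(f"user:{user_id}")             # 2. invalidate; the next read refills
```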
Chunking / Multipart Upload
Chunking divides large files into fixed-size blocks (chunks) that can be uploaded independently. Each chunk gets an ID and position. The server tracks which chunks arrived. If the connection drops, the client resumes from the last successful chunk, not the beginning.
Implementation: client splits a 5GB file into 10MB chunks (500 chunks). POST /upload/session → get session_id. Then POST /upload/session/123/chunk/0, chunk/1, etc. Server stores chunks in temporary storage. When all chunks arrive, concatenate them into the final file.
This is how Dropbox, Google Drive, and S3's multipart uploads work. Without chunking, a 5GB upload that fails at 99% must restart from scratch—brutal on mobile networks.
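A client-side sketch of that flow using the requests library; the endpoint paths and response fields (`session_id`, `received`) are hypothetical, standing in for whatever your upload service exposes:

```python
import os
import requests

CHUNK_SIZE = 10 * 1024 * 1024          # 10MB chunks
BASE = "https://api.example.com"       # hypothetical upload service

def upload_file(path: str) -> None:
    size = os.path.getsize(path)
    total_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE

    # 1. Open an upload session; the server tracks which chunks have arrived.
    session_id = requests.post(f"{BASE}/upload/session",
                               json={"size": size, "chunks": total_chunks}).json()["session_id"]

    # 2. Ask the server which chunks it already has (this is what makes resume possible).
    have = set(requests.get(f"{BASE}/upload/session/{session_id}").json()["received"])

    with open(path, "rb") as f:
        for index in range(total_chunks):
            if index in have:          # already uploaded before a disconnect; skip it
                continue
            f.seek(index * CHUNK_SIZE)
            data = f.read(CHUNK_SIZE)
            requests.post(f"{BASE}/upload/session/{session_id}/chunk/{index}", data=data)

    # 3. Tell the server to concatenate the chunks into the final file.
    requests.post(f"{BASE}/upload/session/{session_id}/complete")
```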
Conflict Resolution
Conflict resolution strategies handle concurrent edits to the same data:
1. **Last-Write-Wins (LWW)**: Simplest. Most recent timestamp wins. Loses data silently—not great for user files.
2. **Detect and prompt**: Dropbox approach. Detect conflicts (using version vectors), save both versions as "file.txt" and "file (conflicted copy).txt", let user merge manually.
3. **Operational Transforms (OT)**: Google Docs approach. Transform concurrent operations so they converge to the same result. Complex but seamless UX.
4. **CRDTs**: Conflict-free replicated data types. Data structures designed to merge automatically without conflicts.
Most file sync systems use strategy #2: detect conflicts, preserve both versions, notify user.
Consistent Hashing
Traditional hashing (key % N) falls apart when N (the number of servers) changes: adding one server rehashes every key, causing a full data migration. Consistent hashing fixes this by arranging servers on a virtual ring (0 to 2^32).
Each key hashes to a position on the ring and is assigned to the next clockwise server. When you add a server, only keys between the new server and the previous one move—typically 1/N of your data. Remove a server? Only its keys move to the next server.
Virtual nodes (each physical server owns 100-200 ring positions) smooth out imbalances and hotspots. It's the foundation of systems like DynamoDB, Cassandra, and distributed caches.
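A compact hash ring with virtual nodes, sketched with the standard library; MD5 here is just a stable hash (not for security), and the server names and vnode count are illustrative:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []    # sorted (position, server) pairs
        for server in servers:
            self.add(server, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def add(self, server: str, vnodes: int = 100) -> None:
        for i in range(vnodes):                  # each server owns many ring positions
            self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    def remove(self, server: str) -> None:
        self.ring = [(pos, s) for pos, s in self.ring if s != server]

    def lookup(self, key: str) -> str:
        pos = self._hash(key)
        idx = bisect.bisect_right(self.ring, (pos, ""))  # next position clockwise
        return self.ring[idx % len(self.ring)][1]        # wrap around past the top

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("user:42"))   # the same key always maps to the same server
```

Adding a fourth server with `ring.add("cache-d")` only reassigns the keys that now fall on cache-d's vnode positions; everything else stays put.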
Content Delivery Network (CDN)
A CDN is a geographically distributed network of servers (edge nodes) that cache your content close to users. When a user in Tokyo requests an image, the CDN serves it from a Tokyo edge node (50ms away) instead of your origin in Virginia (250ms away).
The first request from a region is a cache miss—the edge fetches from your origin, caches it, and serves it. Subsequent requests hit the cache. Popular files (viral images, shared videos) are served locally without touching your origin.
CDNs shine for immutable, publicly-shared content (images, videos, CSS, JS). They're less useful for dynamic, personalized, or frequently-changing data. Set long TTLs (cache lifetimes) and use versioned URLs for invalidation.
Content-Addressed Storage
Content-addressed storage uses the hash of file content (SHA-256) as the storage key. Upload "cat.jpg" → hash it → store at blobs/a3f5d9.... Upload the identical "cat.jpg" again → same hash → same key → no duplicate storage.
This enables automatic deduplication: 500 users upload the same 50MB video, you store it once. It also makes objects immutable—any change produces a different hash, so you can cache aggressively.
The catch: you can't query by filename (use a separate metadata DB to map names → hashes). Hash collisions are theoretically possible but astronomically unlikely with SHA-256. And you need reference counting to know when to delete (don't delete a blob if any file still references it).
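A local-filesystem sketch of the idea: the blob's SHA-256 becomes its storage key, a directory stands in for the object store, and an in-memory dict stands in for the separate metadata database mapping names to hashes.

```python
import hashlib
import os

BLOB_DIR = "blobs"                 # stand-in for an object store bucket
name_to_hash: dict[str, str] = {}  # stand-in for the metadata database
ref_counts: dict[str, int] = {}    # how many filenames point at each blob

def put(filename: str, content: bytes) -> str:
    digest = hashlib.sha256(content).hexdigest()
    path = os.path.join(BLOB_DIR, digest)
    if not os.path.exists(path):               # identical content is stored only once
        os.makedirs(BLOB_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)
    name_to_hash[filename] = digest
    ref_counts[digest] = ref_counts.get(digest, 0) + 1   # needed to know when deletion is safe
    return digest

def get(filename: str) -> bytes:
    with open(os.path.join(BLOB_DIR, name_to_hash[filename]), "rb") as f:
        return f.read()

put("cat.jpg", b"\xff\xd8meow")
put("copy-of-cat.jpg", b"\xff\xd8meow")   # same bytes -> same hash -> one blob on disk
```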
Cursor-Based Pagination
Cursor pagination uses a WHERE clause to seek to the next page instead of OFFSET. Store the last row's sort key (e.g., created_at timestamp) as a cursor. The next query: WHERE created_at < cursor ORDER BY created_at DESC LIMIT 50.
With an index on the sort column, the database jumps directly to the cursor position in O(log n) time—no scanning. Whether you're on page 1 or page 1000, query time stays constant (15-25ms).
Compare to OFFSET 10000 LIMIT 50: the database must scan and discard 10,000 rows before returning 50. At deep pages, this becomes catastrophically slow. Cursor pagination sidesteps the problem entirely.
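A runnable sketch of both queries against a made-up `posts` table, using sqlite3; the table, column names, and page size are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at INTEGER, body TEXT)")
conn.execute("CREATE INDEX idx_posts_created ON posts(created_at)")
conn.executemany("INSERT INTO posts (created_at, body) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(100_000)])

def first_page(limit: int = 50):
    return conn.execute(
        "SELECT id, created_at FROM posts ORDER BY created_at DESC LIMIT ?",
        (limit,)).fetchall()

def next_page(cursor_created_at: int, limit: int = 50):
    # Seek straight to the cursor position via the index: no rows are scanned and discarded.
    return conn.execute(
        "SELECT id, created_at FROM posts WHERE created_at < ? "
        "ORDER BY created_at DESC LIMIT ?",
        (cursor_created_at, limit)).fetchall()

page = first_page()
cursor = page[-1][1]          # the last row's sort key becomes the cursor
page2 = next_page(cursor)     # cost stays flat no matter how deep you go
```

In practice the cursor is usually a composite like (created_at, id) so that rows sharing the same timestamp aren't skipped or repeated.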
Database Indexing
Without an index, the database must scan every row in a table to find matches—fine for 100 rows, catastrophic for 10 million. An index is a separate data structure (typically a B-tree) that keeps a sorted copy of one or more columns, with pointers back to the full rows.
When you query WHERE short_code = 'abc123', the database uses the index to jump directly to matching rows in O(log n) time instead of scanning everything in O(n) time. It's the difference between 5ms and 5 seconds.
The catch: indexes consume disk space and slow down writes (every INSERT/UPDATE must also update the index). You can't index everything—pick columns you frequently filter or sort by.
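A quick way to see the difference, sketched with sqlite3 and table/column names mirroring the example above; `EXPLAIN QUERY PLAN` shows whether the engine scans or seeks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, short_code TEXT, target TEXT)")
conn.executemany("INSERT INTO urls (short_code, target) VALUES (?, ?)",
                 [(f"code{i}", f"https://example.com/{i}") for i in range(100_000)])

# Without an index: the plan is a full table scan.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM urls WHERE short_code = 'code99999'").fetchall())

# Add a B-tree index on the column we filter by.
conn.execute("CREATE INDEX idx_urls_short_code ON urls(short_code)")

# With the index: the plan becomes an index search instead of a scan.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM urls WHERE short_code = 'code99999'").fetchall())
```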
Delta Sync
Delta sync detects which parts of a file changed and transfers only those parts. Split the file into fixed-size blocks (4KB), hash each block (SHA-256), compare hashes with the server's copy. Upload only blocks whose hashes differ.
This turns "3-line edit in 10MB file = 10MB upload" into "3-line edit = 8KB upload" (only the changed block). It's how Dropbox, rsync, and Git efficiently sync large files over slow networks.
The algorithm: client hashes local blocks, requests server's block hashes, compares, uploads mismatched blocks. Server patches its copy by replacing changed blocks. The catch: CPU cost of hashing and complexity of handling insertions/deletions (rolling hashes help).
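A fixed-size block diff, sketched without the rolling-hash refinement (so an insertion that shifts block boundaries would still re-upload everything after the shift); the block size and sample data are illustrative:

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks

def block_hashes(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(local: bytes, remote_hashes: list[str]) -> list[tuple[int, bytes]]:
    """Return (block_index, block_bytes) for every block the server doesn't already have."""
    out = []
    for i, h in enumerate(block_hashes(local)):
        if i >= len(remote_hashes) or remote_hashes[i] != h:
            out.append((i, local[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]))
    return out

server_copy = b"A" * 20_000
local_copy = bytearray(server_copy)
local_copy[5000:5003] = b"xyz"          # a small edit inside one block

delta = changed_blocks(bytes(local_copy), block_hashes(server_copy))
print(len(delta), "of", len(block_hashes(bytes(local_copy))), "blocks need uploading")  # 1 of 5
```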
Ephemeral State
Ephemeral state is data with a short lifespan (seconds to minutes) that doesn't need durability. Examples: "Alice is typing," "Bob is online," "cursor at line 42." It's relevant now but worthless in 10 seconds.
Store it in memory (Redis, in-process cache) with short TTLs, not databases. Writing every keystroke to disk is wasteful—you're persisting data that will never be queried again. Ephemeral state is disposable: if Redis restarts and loses it, that's fine.
This is the opposite of durable state (user profiles, messages, posts) which must survive restarts and be queryable forever.
Eventual Consistency
Eventual consistency means: if no new updates are made, all replicas will eventually converge to the same value. Unlike strong consistency (all replicas agree immediately), eventual consistency allows temporary divergence.
Example: You upload a file to Dropbox. It immediately appears on your laptop. Your phone sync happens 10 seconds later due to network delay. For 10 seconds, your phone has stale state (the file doesn't exist yet). Eventually, they converge.
This is a fundamental CAP theorem tradeoff: in a network partition, you can have availability (accept writes) OR consistency (reject writes until partition heals). Eventual consistency chooses availability.
Fanout-on-Read
When Bob loads his feed, fanout-on-read queries: "fetch recent posts from everyone Bob follows, rank them, return top 30." The work happens at read-time, not write-time.
This eliminates write amplification—Alice's post to 20 million followers is just one INSERT, not 20 million. It also allows algorithmic ranking: you can compute personalized relevance scores on-the-fly using real-time signals.
The catch: reads are slower and more expensive. Every feed load requires JOINing posts and follows, sorting by ranking score, and limiting results. At scale, you mitigate this with caching (store per-user candidate sets) and precomputed ranking scores.
Fanout-on-Write
When Alice posts a photo, fanout-on-write immediately writes that post to the inbox/feed of every one of her followers. Think of it as "proactive delivery"—you do the heavy lifting at publish-time so reads become trivial lookups.
This is amazing for typical users who follow 200-500 accounts: their feed is precomputed and loading it is just a simple index query. Reads are fast, predictable, and don't require expensive JOINs.
The catch: write amplification. When a celebrity with 20 million followers posts, you're writing 20 million rows. At scale, this can saturate your database and delay posts for hours. That's why real systems often use a hybrid: fanout-on-write for regular users, fanout-on-read for accounts with huge followings.
Geo-Distribution
Geo-distribution means running your application in multiple geographic regions (US East, Europe, Asia) with regional databases/caches/servers. Users are routed (via DNS or global load balancer) to the nearest region.
This slashes latency by eliminating intercontinental round-trips. A Tokyo user hits Tokyo servers (~15ms) instead of Virginia (~250ms). It also provides disaster recovery—if the US region fails, redirect traffic to Europe.
The complexity: data consistency. Writes to Tokyo must replicate to other regions. Do you accept eventual consistency (region lag)? Do you have a single global primary for writes (slower)? Do you partition by geography (users can't easily move)? These are hard trade-offs.
Geohash
Geohash is a geocoding system that encodes latitude and longitude into a short alphanumeric string. The key insight: nearby locations share a common prefix. For example, "dr5ru" and "dr5rv" are neighbors, while "dr5ru" and "dqcjq" are far apart.
How it works: Geohash recursively divides the world into a grid. Each character in the hash represents a finer subdivision. "d" is a large region, "dr" is smaller, "dr5" smaller still. A 6-character geohash represents a cell about 1.2km × 600m.
This makes spatial queries simple: to find nearby drivers, compute the rider's geohash, then query for drivers whose geohash starts with the same prefix. In SQL: WHERE driver_geohash LIKE 'dr5r%'. The database index can efficiently find all matching rows.
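A minimal geohash encoder written from the bit-interleaving description above; real systems typically use a library, and nearby-driver queries also check the 8 neighboring cells so a rider sitting on a cell boundary isn't missed:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"     # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, ch = [], 0, 0
    use_lon = True                              # bits alternate: lon, lat, lon, lat, ...
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:                        # each bit halves the remaining interval
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch = ch << 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                           # 5 bits per base32 character
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)

rider = geohash_encode(40.7484, -73.9857)       # a midtown Manhattan coordinate
prefix = rider[:5]                              # nearby drivers share these leading characters
# SQL sketch: SELECT * FROM drivers WHERE geohash LIKE prefix || '%'
```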
Horizontal Scaling
Horizontal scaling means adding more servers to distribute load, rather than making one server bigger (vertical scaling). Traffic doubles? Add another server. It doubles again? Add two more.
This provides near-infinite scalability—you can keep adding servers—and resilience (one server fails, others keep running). It's how hyperscalers (Google, Facebook) handle billions of users.
The catch: you need stateless servers or shared state (Redis, database). Servers must be interchangeable—any server can handle any request. This rules out in-process caches and requires careful session management (sticky sessions or external session store).
Hot Key Problem
A hot key is a single cache/database key that receives disproportionate traffic. A celebrity posts their profile link → 500K requests/sec for key "profile:celebrity123" → the one shard that owns it maxes out while others idle.
Normal sharding doesn't help: adding 10 shards distributes load evenly across *keys*, but one hot key still lands on one shard. That shard becomes the bottleneck.
Fixes: replicate the hot key to multiple shards (requests spread across replicas), use local caching (app-level cache before hitting shared cache), or apply request coalescing (first request fetches, others wait and reuse result).
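A sketch of the first fix, hot-key replication: write N copies of the value under suffixed keys, and have each reader pick a suffix at random so the load spreads across shards. The replica count, key format, and toy sharded cache are illustrative:

```python
import random

REPLICAS = 8   # number of copies of the hot key; tune to the traffic level

class ShardedCache:
    """Toy stand-in for a sharded cache: key -> shard by hash."""
    def __init__(self, shards: int = 10):
        self.shards = [dict() for _ in range(shards)]
    def _shard(self, key: str) -> dict:
        return self.shards[hash(key) % len(self.shards)]
    def set(self, key: str, value) -> None:
        self._shard(key)[key] = value
    def get(self, key: str):
        return self._shard(key).get(key)

cache = ShardedCache()

def set_hot(key: str, value) -> None:
    for i in range(REPLICAS):                  # write every replica on update
        cache.set(f"{key}#{i}", value)

def get_hot(key: str):
    i = random.randrange(REPLICAS)             # each reader hits a random replica,
    return cache.get(f"{key}#{i}")             # so no single shard absorbs all the traffic

set_hot("profile:celebrity123", {"name": "Famous Person"})
print(get_hot("profile:celebrity123"))
```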
Idempotency
An idempotent operation produces the same result no matter how many times you execute it. DELETE key "abc" is idempotent: delete it once, delete it again, the key is still gone. SET key="abc" value="123" is idempotent: set it ten times, value is still "123."
Idempotency makes retries safe. If a request times out, did it succeed or fail? If the operation is idempotent, just retry—worst case, you do redundant work but don't corrupt state.
Non-idempotent: INCREMENT counter. Retry it, and you double-count. The fix: use unique request IDs (idempotency keys) so the server can detect and ignore duplicates.
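A server-side sketch of idempotency keys: remember each processed request ID and return the stored result on a retry instead of re-executing. The in-memory dict stands in for a durable store, and the charge logic is a placeholder:

```python
processed: dict[str, dict] = {}   # idempotency_key -> stored response (durable store in practice)
balances = {"alice": 100}

def charge(idempotency_key: str, user: str, amount: int) -> dict:
    if idempotency_key in processed:          # duplicate retry: return the original outcome,
        return processed[idempotency_key]     # don't execute the side effect again
    balances[user] -= amount                  # the non-idempotent side effect, run exactly once
    result = {"status": "charged", "user": user, "amount": amount}
    processed[idempotency_key] = result
    return result

charge("txn-123", "alice", 5)
charge("txn-123", "alice", 5)                 # client retried after a timeout
print(balances["alice"])                      # 95, not 90
```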
LRU (Least Recently Used)
LRU is a cache eviction policy: when the cache is full and you need space, evict the item that was accessed longest ago. The intuition: recently-used items are more likely to be used again soon (temporal locality).
Implementation: maintain a doubly-linked list ordered by access time. On access, move the item to the front. On eviction, remove from the back. With a hash map for O(1) lookup, both operations are O(1).
LRU is simple and effective for many workloads. The problem: it's purely recency-based. A one-time scan of a million cold items can evict your entire working set of hot items. That's where admission control (TinyLFU) or segmented LRU helps.
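The hash map plus recency-ordered list combination is exactly what Python's OrderedDict provides, so a faithful sketch fits in a few lines:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)               # O(1): mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)         # O(1): evict the least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                     # touch "a" so it becomes most recent
cache.put("c", 3)                                  # evicts "b", the least recently used
print(list(cache.items))                           # ['a', 'c']
```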
Message Queue
A message queue decouples producers (who create work) from consumers (who do work). Producers enqueue messages/tasks; workers pull and process them. If workers are busy, tasks wait in the queue. Add more workers to process faster.
This enables asynchronous processing: user uploads photo → enqueue "resize image" task → return success immediately → worker processes task in background. Users don't wait for slow work.
Queues also provide buffering and backpressure: traffic spikes fill the queue instead of overwhelming workers. Workers process at their own pace. Common systems: RabbitMQ, AWS SQS, Redis lists.
Object Storage
Object storage (AWS S3, Google Cloud Storage, Azure Blob) stores unstructured data as objects: binary blobs identified by keys (like file paths). It's cheap (~$0.02/GB/month), durable (99.999999999% durability via replication), and scales to petabytes.
Unlike databases, object storage doesn't do queries, indexes, or transactions. It's simple: PUT an object, GET it by key, DELETE it. Objects are immutable—editing means uploading a new version. It's perfect for photos, videos, backups, logs—anything large and static.
The catch: higher latency than local disk (~50-100ms), no metadata queries (you can't "find all JPEGs uploaded this week" without external indexing), and, in some stores, eventual consistency for listings or overwrites (a new write may not be visible everywhere immediately; S3 is now strongly consistent, but not every store is).
Offset-Based Pagination
Offset pagination uses OFFSET and LIMIT: OFFSET 100 LIMIT 20 means "skip the first 100 rows, return the next 20." It's intuitive and allows jumping to arbitrary pages (page 5 = OFFSET 80 LIMIT 20).
The problem: the database must scan all skipped rows even though it discards them. OFFSET 10000 reads 10,000 rows from disk, throws them away, and returns the next 50. Query time increases linearly with offset—page 1 is fast, page 100 is slow, page 1000 times out.
It's fine for small datasets or shallow pagination (first 5-10 pages). Beyond that, cursor-based pagination is far superior.
OLTP vs OLAP
OLTP (Online Transaction Processing) handles user-facing queries: "get user by ID," "insert message," "update profile." Queries are fast (<10ms), touch few rows, and run constantly. Databases are row-oriented and optimized for indexes and point lookups.
OLAP (Online Analytical Processing) handles reporting and analytics: "total revenue by region this month," "trending hashtags." Queries are slow (seconds to minutes), scan millions of rows, and aggregate data. Databases are column-oriented (data warehouses like BigQuery, Redshift) and optimized for scans.
Running OLAP queries on OLTP databases starves transactional traffic. Solution: replicate OLTP data to a separate OLAP warehouse (via ETL pipelines). Analysts query the warehouse; users query the OLTP database.
Optimistic Locking
Optimistic locking is the "forgiveness over permission" approach to concurrency control. Instead of acquiring exclusive locks that block other writers, you proceed hopefully and check for conflicts only when committing.
Here's how it works: each record has a version number. When Alice downloads file version 5, edits it offline, and uploads later, she says "I'm updating version 5 to version 6." The server checks: "Is the current version still 5?" If yes, accept the update and increment to version 6. If no—meaning Bob already updated it to version 6 while Alice was offline—reject with HTTP 409 Conflict.
This is perfect for offline-first apps and collaborative editing where conflicts are rare but data loss is unacceptable. Users work independently most of the time; when conflicts happen, the app can show both versions and let the user decide how to merge or rename.
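A sketch of the version check as a compare-and-swap UPDATE, using sqlite3; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, content TEXT, version INTEGER)")
conn.execute("INSERT INTO files VALUES (1, 'original', 5)")

def save(file_id: int, new_content: str, expected_version: int) -> bool:
    """Return True if the update won; False means someone else got there first (HTTP 409)."""
    cur = conn.execute(
        "UPDATE files SET content = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",               # the conflict check lives in the WHERE
        (new_content, file_id, expected_version))
    return cur.rowcount == 1                           # 0 rows touched -> stale version

print(save(1, "Bob's edit", 5))    # True: version 5 -> 6
print(save(1, "Alice's edit", 5))  # False: Alice was editing against a stale version 5
```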
Power of Two Choices
The Power of Two Choices is a simple but powerful load-balancing algorithm: when you need to assign a request to a server, randomly pick two candidate servers, check which one has fewer active requests (or lower load), and send the request there.
Remarkably, this is almost as good as always picking the globally least-loaded server, but without any global coordination. Picking one server at random (pure random) is terrible—load imbalance can be large. Picking the best of two random servers dramatically evens out the load with minimal overhead.
The math: with random assignment, max load is O(log n / log log n). With power-of-two, max load drops to O(log log n)—an exponential improvement. In practice, two random choices are nearly as good as scanning all servers.
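The whole algorithm is a few lines. This sketch tracks active requests per server in a dict; the server names are placeholders, and a real balancer would decrement the count when each request finishes:

```python
import random

active_requests = {"server-a": 0, "server-b": 0, "server-c": 0, "server-d": 0}

def pick_server() -> str:
    a, b = random.sample(list(active_requests), 2)                 # two random candidates
    return a if active_requests[a] <= active_requests[b] else b    # keep the less-loaded one

def dispatch(n: int) -> None:
    for _ in range(n):
        server = pick_server()
        active_requests[server] += 1   # decrement on completion in a real balancer

dispatch(10_000)
print(active_requests)   # load stays remarkably even with no global scan
```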
Pub/Sub (Publish-Subscribe)
Pub/sub decouples message senders (publishers) from receivers (subscribers). Publishers send messages to named topics/channels. Subscribers register interest in topics and receive all messages published there.
In a chat app: when Alice sends a message to group #42, her server publishes to Redis channel "group:42". All servers subscribed to that channel receive the message and deliver to their connected clients. No server needs to know where Bob's WebSocket lives.
This enables horizontal scaling (add servers, subscribe them to channels) and fanout (one publish → many subscribers). The catch: pub/sub is typically fire-and-forget—if a subscriber is offline during publish, it misses the message (unless you add persistent queues).
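An in-process sketch of the pattern just to show the shape; a real chat backend would use Redis Pub/Sub or Kafka, and the channel name follows the example above:

```python
from collections import defaultdict
from typing import Callable

# channel name -> list of subscriber callbacks
subscribers: defaultdict[str, list[Callable[[str], None]]] = defaultdict(list)

def subscribe(channel: str, handler: Callable[[str], None]) -> None:
    subscribers[channel].append(handler)

def publish(channel: str, message: str) -> None:
    # Fire-and-forget fanout: every current subscriber gets a copy,
    # but nothing is stored for subscribers that aren't listening right now.
    for handler in subscribers[channel]:
        handler(message)

# Two chat servers, each responsible only for its own connected WebSocket clients.
subscribe("group:42", lambda msg: print("server-1 delivers:", msg))
subscribe("group:42", lambda msg: print("server-2 delivers:", msg))

publish("group:42", "Alice: hello everyone")   # both servers receive it; neither knows about the other
```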
Read Replicas
Read replicas are copies of your primary database that receive writes asynchronously (shortly after they happen on the primary). Clients send all writes to the primary, but reads can go to any replica.
This multiplies your read capacity horizontally: 3 replicas = 3× read throughput. It also provides basic disaster recovery—if the primary fails, promote a replica to become the new primary.
The trade-off is replication lag: replicas might be a few milliseconds (or seconds, if network hiccups) behind the primary. Reads from replicas can return slightly stale data. For most apps (news feeds, product catalogs), this is fine. For others (bank balances), you might need to read from the primary or use synchronous replication.
S2 Geometry
S2 is a geospatial indexing library from Google that represents Earth as a sphere divided into cells at 30 hierarchical levels. Unlike geohash (a flat latitude/longitude grid whose cells get badly distorted near the poles), S2 projects the sphere onto a cube and orders cells along a space-filling curve, so cell sizes stay far more uniform and nearby points usually get nearby cell IDs.
Key features: (1) Hierarchical: level 0 = 6 large cells covering Earth, level 30 = 1cm² cells. (2) Covering algorithm: express any region (circle, polygon) as a union of S2 cells at mixed levels. (3) Fast containment checks: "Is point X in region Y?" reduces to "Is cell ID in covering set?"
For ride-hailing: compute a covering of a 2km radius around the rider using cells at levels 13-15. Query drivers: WHERE s2_cell = ANY(covering_cells). This finds nearby drivers without boundary misses, using standard database indexes (B-tree or GIN).
Segmented LRU (SLRU)
Segmented LRU (SLRU) divides the cache into two segments: a small probationary queue (e.g., 20% of space) for new entries, and a large protected queue (80%) for items that have been accessed multiple times.
How it works: new items enter probationary. If accessed again while still in probationary, promote to protected. Eviction happens from the probationary tail; when protected overflows, its least-recently-used item is demoted back into probationary (giving it one last chance) rather than being evicted outright.
This prevents one-time scans from polluting the cache. A scan of a million cold items fills probationary, gets evicted before touching protected. Meanwhile, your working set stays safe in protected. It's like having a bouncer who makes tourists wait outside while regulars get VIP access.
State Machine
A state machine defines a set of states and valid transitions between them. For a ride: REQUESTED → MATCHED → IN_PROGRESS → COMPLETED. Invalid transitions (REQUESTED → COMPLETED) are rejected.
Why this matters: prevents bugs from race conditions. If a rider cancels while a driver accepts, the state machine enforces rules: "Can't transition to CANCELLED from MATCHED unless time < 5min" or "Can't match if already CANCELLED." These rules are encoded in the UPDATE query's WHERE clause.
Combined with optimistic locking (version numbers), state machines make concurrent updates safe: UPDATE trips SET status='MATCHED', version=version+1 WHERE id=? AND status='REQUESTED' AND version=?. If status isn't REQUESTED, the update fails—someone else already changed it.
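The transition rules can also live in application code as a simple lookup table, alongside the guarded UPDATE shown above; the states mirror the ride example, and extra conditions (like the 5-minute cancellation window) would hang off the same check:

```python
# Valid transitions for a ride; anything not listed here is rejected.
TRANSITIONS = {
    "REQUESTED":   {"MATCHED", "CANCELLED"},
    "MATCHED":     {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS": {"COMPLETED"},
    "COMPLETED":   set(),
    "CANCELLED":   set(),
}

class InvalidTransition(Exception):
    pass

def transition(trip: dict, new_status: str) -> dict:
    current = trip["status"]
    if new_status not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current} -> {new_status} is not allowed")
    # In a real system this is where the guarded UPDATE runs:
    #   UPDATE trips SET status = ?, version = version + 1
    #   WHERE id = ? AND status = ? AND version = ?
    return {**trip, "status": new_status, "version": trip["version"] + 1}

trip = {"id": 1, "status": "REQUESTED", "version": 3}
trip = transition(trip, "MATCHED")        # fine
transition(trip, "REQUESTED")             # raises InvalidTransition: can't go backwards
```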
Stream Processing
Stream processing frameworks (Flink, Kafka Streams, Spark Streaming) let you run continuous queries over event streams. Instead of batch jobs that run every hour, you compute results in real-time as events arrive.
Core concepts: (1) Windowing: group events by time (5-minute tumbling windows, 1-hour sliding windows). (2) Aggregation: count, sum, average over windows. (3) Stateful processing: maintain counters, joins, or other state across events. (4) Exactly-once semantics: even if processing restarts, results are correct (no duplicates or missing events).
For surge pricing: events stream to Kafka → Flink groups by geohash → computes requests/drivers ratio per 5-min window → writes surge multiplier to Redis. Total latency: <60 seconds, no database load.
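A toy tumbling-window aggregation in plain Python, just to make the windowing idea concrete; a real deployment would use Flink or Kafka Streams, which also handle distributed state, late events, and exactly-once for you, and the event fields here are assumptions:

```python
from collections import defaultdict

WINDOW_SECONDS = 300   # 5-minute tumbling windows

def window_start(ts: int) -> int:
    return ts - (ts % WINDOW_SECONDS)

# (window_start, geohash) -> {"requests": n, "drivers": n}
counts: defaultdict[tuple[int, str], dict] = defaultdict(lambda: {"requests": 0, "drivers": 0})

def on_event(event: dict) -> None:
    """Process one event as it arrives from the stream."""
    key = (window_start(event["ts"]), event["geohash"])
    counts[key][event["type"]] += 1

events = [
    {"ts": 1000, "geohash": "dr5ru", "type": "requests"},
    {"ts": 1010, "geohash": "dr5ru", "type": "requests"},
    {"ts": 1020, "geohash": "dr5ru", "type": "drivers"},
]
for e in events:
    on_event(e)

for (start, cell), c in counts.items():
    surge = max(1.0, c["requests"] / max(c["drivers"], 1))    # crude requests/drivers ratio
    print(f"window {start} cell {cell}: surge x{surge:.1f}")  # would be written to Redis
```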
TinyLFU
TinyLFU (Tiny Least-Frequently-Used) is an admission policy for caches. It maintains a compact sketch that approximates how often each key has been accessed recently. When the cache is full and you want to insert a new item, TinyLFU asks: "Is this new key more popular than the victim we'd evict?" If yes, admit. If no, reject the insertion—the new key doesn't deserve to evict a regular.
The "tiny" part is key: instead of tracking exact counts (expensive in memory), TinyLFU uses a Count-Min Sketch—a probabilistic data structure that gives approximate frequency in just a few kilobytes. It also uses a time-decay window so old popularity fades (yesterday's viral meme shouldn't monopolize cache forever).
TinyLFU is often paired with Segmented LRU (SLRU): new items enter a small "probationary" queue. If they prove popular (accessed again), they graduate to a larger "protected" queue. One-off scan items never graduate, so they can't evict the working set.
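A sketch of the admission decision backed by a tiny Count-Min Sketch; the width, depth, and lack of time decay here are simplifications for illustration, not any real library's tuning:

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter: a few rows of counters, each indexed by a different hash."""
    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        h = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, key: str) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += 1

    def estimate(self, key: str) -> int:
        # Collisions only inflate counts, so the minimum across rows is the best estimate.
        return min(self.rows[row][self._index(key, row)] for row in range(self.depth))

sketch = CountMinSketch()

def should_admit(new_key: str, victim_key: str) -> bool:
    """TinyLFU admission: only evict the victim if the newcomer has been seen more often."""
    return sketch.estimate(new_key) > sketch.estimate(victim_key)

for _ in range(5):
    sketch.add("profile:regular")     # a key the cache sees repeatedly
sketch.add("scan:item-991")           # a one-off scan item

print(should_admit("scan:item-991", "profile:regular"))   # False: the regular stays
```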
TTL (Time-To-Live)
TTL is how long a cached value remains valid before expiring. Set a key with TTL=3600 (1 hour): after 3600 seconds, the cache automatically deletes it. The next request will be a cache miss and refill from the source.
Short TTLs (30s-5min) keep data fresh but increase miss rate and database load. Long TTLs (hours-days) maximize cache hits but risk serving stale data. The right TTL depends on how often data changes and how much staleness you tolerate.
Tip: use different TTLs for different data types. User profiles (rarely change): 1 hour. Trending posts (change frequently): 1 minute. Immutable content (never changes): 1 year or infinite.
Version Vector
A version vector is a map of {node_id → counter} that tracks causal history in distributed systems. Each node maintains a counter. When a node makes a change, it increments its counter. When comparing two version vectors, you can detect:
- **A happened before B**: every counter in A ≤ the corresponding counter in B, and at least one is strictly smaller.
- **B happened before A**: every counter in B ≤ the corresponding counter in A.
- **Concurrent (conflict)**: some counters are higher in A, some are higher in B.
Example: File has version {Alice:2, Bob:1}. Alice edits offline → {Alice:3, Bob:1}. Bob edits offline → {Alice:2, Bob:2}. When they sync: neither vector dominates → conflict detected → manual merge required.
Dropbox, Dynamo, and Riak use version vectors for conflict detection.
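A comparison function that implements the three outcomes above; the node IDs and example vectors follow the Alice/Bob scenario:

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    nodes = set(a) | set(b)                       # a missing entry counts as 0
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a happened before b"
    if b_le_a:
        return "b happened before a"
    return "concurrent (conflict)"                # each side saw changes the other didn't

base  = {"Alice": 2, "Bob": 1}
alice = {"Alice": 3, "Bob": 1}                    # Alice edited offline
bob   = {"Alice": 2, "Bob": 2}                    # Bob edited offline

print(compare(base, alice))    # a happened before b
print(compare(alice, bob))     # concurrent (conflict) -> keep both versions
```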
WebSockets
WebSockets establish a persistent, bidirectional connection between client and server. Unlike HTTP (request → response → connection closed), WebSocket connections stay open. Either side can send messages anytime without creating new connections.
This is perfect for real-time apps: chat messages, live notifications, multiplayer games. The server can push updates the instant they happen (no polling). Clients can send without HTTP overhead.
The trade-off: stateful connections. Servers must maintain connection state (which user is on which socket), complicating load balancing and horizontal scaling. You typically need sticky sessions or shared state (Redis) to route messages correctly across multiple servers.
Write-Through Caching
Write-through caching updates the cache and the database together on every write: write to the cache, write to the database, and only then return success. The two stay in sync.
This guarantees consistency: reads always see the latest write (no stale cache). It's simple conceptually—no invalidation needed, no cache-aside logic.
The catch: writes are slower (two systems on the critical path) and more fragile (cache failure fails the whole write). It also couples your cache to every write path. For these reasons, cache-aside (write to DB, invalidate cache) is more popular.
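A write-through sketch using the same redis-py client and a placeholder database call as in the cache-aside example; the key format and function names are again illustrative:

```python
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def save_to_db(user_id: int, profile: dict) -> None:
    # Placeholder for the real database write (the source of truth).
    pass

def write_user(user_id: int, profile: dict) -> None:
    r.set(f"user:{user_id}", json.dumps(profile))     # 1. update the cache
    save_to_db(user_id, profile)                      # 2. write the database, synchronously
    # If either step fails, the whole write fails (or must be retried); that coupling is
    # the trade-off described above, in exchange for reads never seeing a stale cache.

def read_user(user_id: int) -> dict | None:
    cached = r.get(f"user:{user_id}")                 # reads are pure cache hits
    return json.loads(cached) if cached is not None else None
```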