Here's the list of system design technical terms grouped by topic for better clarity:
1. Reliability & Availability
Availability – Percentage of time the system is up and serving requests.
Reliability – Ability of the system to perform without failure over time.
Redundancy – Duplication of components to improve fault tolerance.
Failover – Switching to backup systems upon failure of the primary.
Disaster Recovery (DR) – Strategies to recover system operations after major failures.
Circuit Breaker – Prevents a system from repeatedly calling a failing service.
Backpressure – Signals to slow down input when downstream systems are overwhelmed.
Throttling – Limiting resource usage to prevent overload.
Rate Limiting – Controls how often a user can hit an endpoint.
2. Performance & Scalability
Scalability – System’s ability to handle growth in load.
Latency – Time taken for a request to be processed and responded to.
Throughput – Number of requests the system can handle per unit time.
Horizontal Scaling – Adding more machines to distribute load.
Vertical Scaling – Adding more resources (CPU, RAM) to a single machine.
Load Balancer – Distributes traffic across multiple instances.
Caching – Stores data temporarily for faster access.
Content Delivery Network (CDN) – Delivers content from servers near the user.
Session Affinity (Sticky Sessions) – Binds a user session to the same server.
3. Data Management
Replication – Copying data across multiple servers.
Partitioning (Sharding) – Splitting a database into smaller parts.
Database Indexing – Improves data lookup speed.
Data Warehouse – Optimized for analytical queries and reporting.
Data Lake – Central repository for raw structured/unstructured data.
Idempotency – Operation that has the same effect no matter how many times it's repeated.
4. Architecture Patterns
Monolith – Single-tiered application with all components tightly coupled.
Microservices – Independently deployable small services that work together.
Service-Oriented Architecture (SOA) – Modular services communicating over the network.
Stateless Service – Doesn’t retain client information between requests.
Stateful Service – Maintains session data or client state.
Containerization – Encapsulates an app and its dependencies into a container.
Orchestration – Automated management of containers and services (e.g., Kubernetes).
Service Discovery – Mechanism to find service locations dynamically.
5. Communication & Messaging
Message Queue – Asynchronous communication via queued messages.
Pub/Sub (Publish/Subscribe) – Decouples message producers and consumers.
DNS Load Balancing – Distributes traffic by resolving the same domain to different server IPs.
6. Consistency & Distributed Systems
Consistency (in CAP) – Ensures data uniformity across nodes.
Availability (in CAP) – Every request gets a response, even during failures.
Partition Tolerance – System continues working despite network splits.
Strong Consistency – All reads return the latest write.
Eventual Consistency – Data becomes consistent over time.
Data Consistency Models – Defines how and when updates are visible across systems.
Distributed Systems – Multiple components on different machines working as one.
7. Observability & Monitoring
Observability – Ability to understand system behavior through outputs.
Telemetry – Automated data collection for monitoring.
Sure! Here's a more detailed explanation of backpressure in system design:
What is Backpressure?
Backpressure is a flow-control mechanism used in systems (especially distributed or streaming
systems) to regulate data flow between components — typically between a producer (which
generates data) and a consumer (which processes it).
It ensures that when a consumer is overwhelmed or slower than the producer, the system can
signal the producer to slow down or stop temporarily. This prevents memory overflows,
system crashes, and degraded performance.
Where It’s Used
Message Queues (e.g., Kafka, RabbitMQ)
Streaming Systems (e.g., Apache Flink, Akka Streams, RxJava)
APIs and Microservices
Network protocols (e.g., TCP has built-in backpressure)
How It Works
1. Normal Flow: Producer sends data, consumer processes it in real-time.
2. Congestion: Consumer can't keep up due to high load or slow processing.
3. Backpressure Signal: Consumer or intermediate buffer notifies producer (explicitly or
via buffer limits).
4. Producer Reacts:
o Slows down production
o Buffers data
o Drops data (in lossy systems)
o Applies retry logic
Techniques for Handling Backpressure
Bounded Buffers: Limit how much data can be queued between producer and consumer.
Windowing/Batching: Process data in fixed-size chunks.
Request-based Flow Control: Consumer pulls data only when ready (pull-based).
Reactive Streams Protocol: Formalizes backpressure in reactive programming (used in
RxJava, Project Reactor, Akka Streams).
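To make the bounded-buffer technique concrete, here is a minimal Python sketch (the queue size and timings are illustrative, not from any specific system). Python's standard queue.Queue blocks the producer whenever the buffer is full, which is exactly the backpressure signal:

# Bounded-buffer backpressure: the producer blocks whenever the
# consumer falls behind and the queue is full.
import queue
import threading
import time

buf = queue.Queue(maxsize=10)  # bounded buffer: at most 10 items in flight

def producer():
    for i in range(50):
        buf.put(i)  # blocks here when the buffer is full -> backpressure

def consumer():
    while True:
        item = buf.get()
        time.sleep(0.01)  # simulate a slow consumer
        buf.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
buf.join()  # wait until the consumer has drained everything

This is the pull/bounded-buffer flavor; Reactive Streams formalizes the same idea with an explicit request(n) signal from consumer to producer.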
Benefits
Prevents memory overflows
Avoids latency spikes
Enables graceful degradation under load
Helps maintain system stability under pressure
Example
In a video streaming platform:
Producer: Video encoding service
Consumer: Video uploader or processing service
If the uploader is slow, the encoder might produce too many files too quickly.
Backpressure ensures the encoder is paused or slowed until the uploader catches up.
Absolutely! Here's a clear and concise explanation of Throttling in system design:
What is Throttling?
Throttling is a technique used to limit the number of requests or operations a user, service, or
system can perform in a given time period. It's a defensive strategy that protects systems from
being overwhelmed by:
High traffic spikes
Malicious attacks (e.g., brute force, DDoS)
Unintended client retries
Resource-intensive operations
How Throttling Works
You define a rate limit, e.g., 100 requests per minute per user.
If the client exceeds this limit, the system:
o Rejects requests (e.g., HTTP 429: Too Many Requests)
o Queues or delays requests
o Sends back a Retry-After header or similar info
Common Throttling Strategies
Fixed Window – Resets the counter at the start of each time window (e.g., every minute).
Sliding Window – Tracks the number of requests over the last N seconds continuously.
Token Bucket – Tokens are added at a fixed rate. Each request consumes a token. If no tokens remain, the request is throttled.
Leaky Bucket – Requests are processed at a steady rate. Excess requests are queued or dropped.
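As a concrete illustration of one of these strategies, here is a minimal token-bucket sketch in Python (single process, in-memory; the rate and capacity values are illustrative):

# Token bucket: tokens refill at a fixed rate; each request costs one token.
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # no tokens left: throttle this request

bucket = TokenBucket(rate=5, capacity=10)  # sustain 5 req/s, burst up to 10

In a real deployment the bucket state would live in a shared store (e.g., Redis) so that all instances enforce the same limit.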
Where It's Used
APIs: To prevent abuse or ensure fair usage across clients.
Background jobs: To avoid overloading downstream systems.
Database queries: To slow down heavy reads/writes.
Cloud services: To enforce pricing tiers or quotas.
Quick Example
# Fixed-window throttling: at most 100 requests per user per minute
import time

user_requests = {}  # per-(user, window) counters; use Redis in production

def current_minute():
    return int(time.time() // 60)

def is_allowed(user_id):
    key = f"{user_id}:{current_minute()}"
    if user_requests.get(key, 0) >= 100:
        return False  # limit hit: respond with HTTP 429
    user_requests[key] = user_requests.get(key, 0) + 1
    return True
Benefits
Stabilizes systems under unpredictable load
Improves fairness among users
Helps with cost control in pay-per-use environments
Enhances security against abuse
Alright — let’s go through how a database works in a distributed environment, step-by-step
but simply:
First, what is a Distributed Database?
A distributed database is a database where data is stored across multiple physical machines,
but it behaves as if it were a single logical database to the user or application.
These machines could be:
In the same data center (low latency)
Or across different regions (for resilience and global access)
How it Works (Key Concepts)
Partitioning (Sharding) – Data is split into smaller parts (shards), and each shard is stored on a different machine.
Replication – Copies of the same data are stored on multiple machines for fault tolerance.
Consistency – Ensuring that when data is written or updated, it is the same across replicas.
Coordination – Nodes communicate to keep track of who owns what, who has the latest data, etc.
Consensus Algorithms – Protocols like Raft, Paxos, or ZAB to agree on database changes across nodes.
Fault Tolerance – The system keeps running even if some machines crash.
Load Balancing – Distributes read/write load across nodes efficiently.
Typical Architecture Overview
1. Client sends a request (read/write) to a coordinator node.
2. Coordinator decides:
o Which shard holds the data
o Which replica is best to serve the request
3. Request is forwarded to the right data node.
4. The node processes the request and responds.
5. For writes:
o The change is replicated to other nodes (synchronously or asynchronously).
o System ensures consistency depending on settings (strong or eventual).
6. Coordinator returns result to client.
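To show roughly how step 2 might look in code, here is a minimal hash-based routing sketch (node names and the replica layout are hypothetical; real systems typically use consistent hashing so that adding nodes moves less data):

# Hash-based shard routing: the coordinator maps a key to a shard,
# then picks the primary for writes or any replica for reads.
import hashlib

SHARDS = {
    0: ["node-a", "node-a-replica"],
    1: ["node-b", "node-b-replica"],
    2: ["node-c", "node-c-replica"],
}

def shard_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(SHARDS)

def route_read(key):
    return SHARDS[shard_for(key)]      # any replica may serve the read

def route_write(key):
    return SHARDS[shard_for(key)][0]   # write to the primary, then replicate

print(route_read("user:42"), route_write("user:42"))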
Important Challenges
CAP Theorem (Consistency, Availability, Partition tolerance: during a network partition you must sacrifice either consistency or availability, so you can't guarantee all three at once)
Network Latency between nodes
Clock synchronization (very hard!)
Handling node failures and recoveries
Popular Distributed Databases
Cassandra – Wide-column, highly available, eventual consistency.
MongoDB – Document store with replica sets and sharding.
CockroachDB – SQL database with strong consistency and high fault tolerance.
Amazon DynamoDB – Key-value store designed for massive scale.
Google Spanner – SQL with global consistency (uses atomic clocks!).
Quick Visual
[Client] --> [Coordinator Node] --> [Data Node (Shard 1)]
--> [Data Node (Shard 2)]
--> [Data Node (Replica of Shard 1)]
Real-World Example
When you post a photo on Instagram:
o The image itself is stored in a sharded distributed storage.
o Your user metadata (profile, comments) is stored across multiple replicated
databases.
o Even if a server in Europe fails, your data is still safe and reachable from replicas
elsewhere!
Good question — this dives into how region-specific request routing works in a distributed
system. Let’s break it down clearly:
How Region-Specific Requests Are Handled
When a system operates across multiple geographic regions (like US-East, Europe-West, Asia-South), it needs to route users' requests to the best/nearest region for:
Faster response times (low latency)
Better reliability
Compliance (e.g., GDPR)
Load distribution
Here's how it happens technically:
Key Mechanisms
GeoDNS – Domain Name System (DNS) that resolves your domain to an IP address based on the user's physical location.
Anycast IP Routing – The same IP is advertised from multiple regions; the Internet routes user requests to the nearest one.
Load Balancers with Geo-routing – Application-layer (Layer 7) load balancers check the user's IP or metadata and forward to the closest region.
Application Logic – The app itself can have logic to detect the region from the user profile, cookies, etc., and route/redirect.
Global API Gateway / CDN – Services like Cloudflare, AWS CloudFront, or Azure Front Door route requests geographically.
How a Request Typically Flows
1. User sends a request (say, to www.example.com).
2. GeoDNS resolves the domain:
o User in India gets directed to India-Region Load Balancer.
o User in Germany gets directed to Europe-Region Load Balancer.
3. Load Balancer routes the request to a nearby application server and database replica.
4. If needed (e.g., no local copy), the system may:
o Proxy the request to the correct region
o Cache data in the nearest region after serving
5. Response is sent back, minimizing hops and delays.
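Here is a minimal sketch of the application-logic variant of this routing (the region codes and endpoints are hypothetical); GeoDNS and anycast make the same decision at lower layers:

# Pick the user's home region if healthy; otherwise fail over in a fixed order.
REGION_ENDPOINTS = {
    "IN": "https://in.api.example.com",
    "EU": "https://eu.api.example.com",
    "US": "https://us.api.example.com",
}
FAILOVER_ORDER = ["US", "EU", "IN"]

def endpoint_for(user_region, healthy_regions):
    if user_region in REGION_ENDPOINTS and user_region in healthy_regions:
        return REGION_ENDPOINTS[user_region]
    for region in FAILOVER_ORDER:      # home region down: try others in order
        if region in healthy_regions:
            return REGION_ENDPOINTS[region]
    raise RuntimeError("no healthy region available")

print(endpoint_for("IN", {"EU", "US"}))  # India region down -> routed to US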
Example Scenario
Suppose Instagram has servers in:
Mumbai (Asia-South)
Frankfurt (Europe)
Virginia (US-East)
If you open Instagram from India:
DNS lookup points you to Mumbai servers.
Load balancer routes your API calls to Mumbai database clusters.
Your photos, feeds, and notifications are fetched and updated from Mumbai servers.
If Mumbai region faces failure:
System can failover to Singapore or Europe automatically.
Challenges in Region-Based Handling
Data synchronization: Keeping data consistent between regions.
Regulatory compliance: E.g., EU users' data must stay in Europe.
Latency trade-offs: Writes might need to travel across regions.
Global session management: Keeping a user logged in globally.
Simple Visual
[User India] --(DNS)-> [India Load Balancer] --(App Server)--> [India DB
Replica]
[User Germany] --(DNS)-> [Europe Load Balancer] --(App Server)--> [Europe DB
Replica]
Excellent question — user login is super important (and tricky!) in distributed environments
because it touches authentication, session management, consistency, and availability.
Let’s break it down simply and systematically:
How User Login is Handled in a Distributed Environment
Basic Flow
1. User submits login request (username + password) from their device.
2. Load balancer routes the request to a nearby authentication server.
3. Authentication server checks:
o Username/password validity (often via database or identity provider)
o Multi-factor authentication (if enabled)
4. Once verified:
o System generates a session/token (JWT, OAuth token, cookie, etc.)
o Session/token is sent to user’s browser/app.
Now the user is "logged in" and every request they make includes this token.
Challenges in Distributed Systems
Which server has session data? – Use stateless tokens (like JWT), or replicate session state to all servers via a distributed cache (Redis, etc.).
Consistency of authentication data – Sync user databases across regions or use a central auth service.
Speed across regions – Place authentication servers close to users and use local replicas for validation.
Global logout/session invalidation – Maintain a centralized session blacklist or a session store across regions.
Failure handling – Retry login at a different region if one fails.
Common Designs
1. Stateless Authentication (Preferred Modern Approach)
After login, server issues a JWT token (JSON Web Token).
JWT contains user info, expiry, permissions.
No server memory needed: every server can validate JWT independently using public
keys.
Pros: Super scalable, no need for sticky sessions.
Cons: Revocation is tricky — needs token expiry or a revocation list.
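A minimal sketch of issuing and validating such a token using the PyJWT library (the secret and claims are illustrative; production systems typically use asymmetric keys such as RS256 so servers only need the public key to validate):

# Stateless auth with JWT: any server holding the key can validate,
# no shared session store required.
import time
import jwt  # pip install pyjwt

SECRET = "change-me"  # illustrative; use a real key-management system

def issue_token(user_id):
    claims = {"sub": user_id, "exp": int(time.time()) + 3600}  # 1-hour expiry
    return jwt.encode(claims, SECRET, algorithm="HS256")

def validate(token):
    try:
        return jwt.decode(token, SECRET, algorithms=["HS256"])  # also checks exp
    except jwt.InvalidTokenError:
        return None  # expired, tampered, or malformed

token = issue_token("user-42")
print(validate(token))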
2. Sticky Sessions (Legacy/Some Scenarios)
Load balancer pins user to the same backend server where their session lives.
Session stored in server memory.
Pros: Simple.
Cons: Fails badly if that server goes down.
3. Centralized Session Storage
Sessions stored in distributed cache (e.g., Redis, DynamoDB).
Any server can fetch session data.
Pros: Safe and scalable.
Cons: Slightly higher latency to session store.
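A minimal sketch of this design using redis-py (the host, key naming, and TTL are illustrative); because the TTL lives in Redis, sessions expire automatically for every app server at once:

# Centralized sessions: any app server can look the session up in Redis.
import secrets
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def create_session(user_id, ttl_seconds=3600):
    session_id = secrets.token_urlsafe(32)
    r.setex(f"session:{session_id}", ttl_seconds, user_id)  # auto-expires
    return session_id

def get_user(session_id):
    user_id = r.get(f"session:{session_id}")
    return user_id.decode() if user_id else None  # None = expired or invalid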
Example Login Flow
User -> Load Balancer -> Nearest Auth Server -> Check DB
-> Generate JWT Token -> Return to User
User -> Next Requests -> Attach Token -> Any Server Validates
Quick Real-World Example
When you log into Netflix:
o Your credentials are validated against a global user service.
o You get a JWT access token.
o Every API call (play movie, fetch profile) uses that token — Netflix servers
globally can verify it without contacting a central database each time.
o If Netflix rotates tokens, you'll be asked to re-login seamlessly.
Important Things to Handle
Secure token storage (HTTP-only cookies, secure storage in app)
Session expiration and refresh (refresh tokens)
Multi-region authentication synchronization
Single Sign-On (SSO) if needed
Resilient to partial failures (retry different region)
Simple Diagram
[User] --> [Load Balancer] --> [Auth Server] --> [User DB]
--> [Token Service]
<-- [JWT Token]
(User attaches token in future requests)
How does Netflix manage its database in a distributed system, and how does it handle low latency? While explaining, give a proper example of a person uploading a video in India while someone else watches it in the USA.
Alright — let's dive deep into this with real clarity:
How Netflix manages databases in a distributed system + how it handles low latency
And we’ll use your example (uploading in India, watching in USA) while explaining.
How Netflix Manages Databases in a Distributed System
Netflix is huge, with millions of users worldwide, so their backend system is fully distributed
across multiple regions.
They do NOT rely on a single database.
Instead, they use:
Cassandra (NoSQL DB) — highly available, multi-region replication.
DynamoDB (AWS's distributed database) — for fast key-value access.
MySQL + Vitess (for relational data that needs scaling).
ElasticSearch (for search operations).
S3 (for storing large files like videos and images).
Edge Caches/CDN (for quick delivery).
⚡ How Netflix Handles Low Latency
Netflix minimizes delay by:
Storing and caching data close to the user using CDNs (Content Delivery Networks); Netflix runs its own CDN, called Open Connect.
Replicating metadata across multiple regions, so user profile, watchlists,
recommendations load locally.
Smart request routing — if an edge server doesn't have the data, it fetches from the
closest available region instead of going to a distant server.
Eventual consistency — writes propagate across regions, but reads prioritize availability
and speed.
Microservices architecture — each service can scale independently (e.g., Video service,
User service).
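The edge-caching point above can be sketched in a few lines (the fetch callback is hypothetical; real CDNs also handle eviction, TTLs, and byte-range requests):

# Toy edge cache: the first viewer pays the origin round-trip,
# everyone after is served from the edge.
edge_cache = {}

def get_chunk(chunk_id, fetch_from_origin):
    if chunk_id not in edge_cache:               # cache miss
        edge_cache[chunk_id] = fetch_from_origin(chunk_id)
    return edge_cache[chunk_id]                  # cache hit: fast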
📦 Your Scenario: Uploading a Video in India, Watching in
the USA
Let's imagine a new system similar to Netflix (because on Netflix users don’t upload — but
think of it like YouTube):
1. User Uploads a Video in India
Upload goes to nearest AWS region (say, Mumbai region).
Video binary is stored in S3 immediately.
Metadata (title, description, uploader ID, etc.) is written into Cassandra (partitioned and
replicated to Asia clusters first).
Background replication jobs start copying:
o Video file across global S3 buckets (using AWS replication rules).
o Metadata replication to Cassandra nodes in the US, Europe, etc.
At this point, the video is uploaded and stored in India first, and replicates globally in the background.
2. User in USA Tries to Watch the Video
User’s Netflix app requests the video metadata.
Request is routed to nearest Cassandra replica (likely US-East region).
If metadata is already replicated: Instant access!
If not replicated yet: App can fetch it cross-region (with slight delay) or show
"Available soon".
Video file:
o If already replicated to USA's S3/CDN nodes — fast streaming.
o Otherwise, the system can stream from the India S3 bucket directly (slower),
while caching it locally after first play.
Meanwhile, Open Connect CDN servers close to the user cache the video for future plays.
Simple Visual
[Uploader India] --> [India Region DB + Storage] --> [Replicates]
--> [US Region DB + Storage]
[Viewer USA] --> [US Region DB for metadata]
--> [US CDN/S3 for video file]
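The "read from the nearest replica, fall back cross-region" behavior above can be sketched like this (the region names and fetch callback are hypothetical, not Netflix's actual API):

# Try replicas from nearest to farthest; None everywhere means the
# metadata has not replicated yet ("Available soon").
REGION_PREFERENCE = {
    "US": ["us-east", "eu-west", "ap-south"],
    "IN": ["ap-south", "eu-west", "us-east"],
}

def get_metadata(video_id, user_region, fetch_from_region):
    for region in REGION_PREFERENCE[user_region]:
        record = fetch_from_region(region, video_id)
        if record is not None:
            return record              # found: nearest copy wins
    return None                        # not replicated anywhere yet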
Important Points Here
Metadata replication – Must be fast; users want instant searchability.
Video file replication – Can be slightly slower (since the files are huge).
Caching – First view may be slower; subsequent views are super fast via CDN.
Consistency – Metadata needs only eventual consistency, but fast enough that viewers feel it's "instant".
🧠 Summary:
Writes (uploads) go to local region first.
Reads (viewing/search) go to nearest available copy.
Replication happens in background to ensure global consistency.
CDNs and Edge caching are key to reducing latency for large files like videos.
🌟 Example Explained Simply:
Imagine you upload a 1GB video in India.
Netflix stores your upload immediately in Mumbai region.
It starts copying to the USA.
A few seconds or minutes later, someone in the USA can watch it:
o Metadata already synced? Fast!
o Video file? If not yet in USA, might be a little slow first time; next time very fast
(cached).
Awesome, you're asking very sharp questions — let’s dive into this carefully:
You want the architecture of Hotstar (now Disney+ Hotstar) especially during live events
like IPL cricket matches.
This is a real-world, large-scale distributed system example!
Hotstar’s engineering team (they shared some public details) has built a massive, distributed,
and highly scalable system to handle millions of concurrent viewers (they crossed 25M+
concurrent viewers during IPL finals!).
High-Level Architecture of Hotstar During Live Streaming
(like IPL)
Core Components
Content Ingestion Layer – Ingests live video feeds from camera crews and broadcasters.
Encoding and Transcoding – Converts the raw video into multiple qualities (240p, 720p, 1080p) using encoding farms.
Origin Servers – Store the original versions of the video streams.
Content Distribution (CDN) – Distributes content across thousands of edge servers globally (Akamai, AWS CloudFront, or a custom CDN).
Load Balancers – Distribute user requests to different app servers efficiently.
Application Servers – Handle user sessions, subscriptions, personalization, ads, etc.
Authentication and User Management – Handles login, subscription validation, access control.
Analytics and Monitoring – Tracks player behavior, drop-offs, real-time metrics.
Ads Insertion Systems – Dynamically insert ads for users (server-side ad insertion).
Real-time Chat/Score Updates – Push notifications, scores, stats using WebSockets or push APIs.
⚡ Detailed Step-by-Step Flow
1. Live Feed Ingestion
Camera teams at the stadium send a live HD feed to Hotstar’s ingestion servers.
Ingestion centers are deployed in multiple regions for redundancy.
2. Encoding/Transcoding
The feed is encoded into different bitrates (adaptive streaming = ABR).
Example:
o 1080p — for fast connections
o 720p, 480p, 360p — for slower connections
3. Storage at Origin Servers
Encoded chunks (~2–6 seconds) are uploaded to the origin servers.
4. CDN Distribution
Chunks are pushed to global CDNs (Content Delivery Networks).
CDNs bring video closer to viewers to reduce latency.
5. User Connects
When you open Hotstar app:
o Request goes to nearest CDN node via GeoDNS / Load Balancer.
o You fetch video segments from edge servers, NOT directly from the Hotstar backend.
Playback client switches between qualities based on network speed (adaptive bitrate; see the sketch after this list).
6. Real-Time Updates
Separate channels (WebSocket/Server-Sent Events) push live scores, match statistics,
ads, polls, and chats.
7. Analytics and Monitoring
Hotstar tracks:
o Number of concurrent users
o Session drops
o Video buffering events
o Advertisement impressions
Data is pushed into a real-time big data pipeline (Kafka, Spark Streaming, etc.)
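The adaptive-bitrate switching mentioned in step 5 boils down to picking the highest rendition the measured bandwidth can sustain with some headroom. A minimal sketch (the renditions and headroom factor are illustrative, not Hotstar's actual values):

# Pick the best quality the network can sustain, with slack to avoid rebuffering.
RENDITIONS_KBPS = [("1080p", 5000), ("720p", 2800), ("480p", 1400), ("360p", 800)]

def pick_quality(measured_kbps, headroom=0.8):
    budget = measured_kbps * headroom
    for name, bitrate in RENDITIONS_KBPS:    # highest quality first
        if bitrate <= budget:
            return name
    return RENDITIONS_KBPS[-1][0]            # worst case: lowest rendition

print(pick_quality(3500))  # 3500 * 0.8 = 2800 -> "720p"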
🧠 Architecture Diagram (Simple)
[Cameras @ Stadium]
↓ (Live Feed)
[Ingestion Servers]
↓ (Multiple bitrates)
[Encoding/Transcoding System]
↓
[Origin Servers]
↓
[CDN Edge Servers]
↓
[Users: India, US, UAE, etc.]
Meanwhile:
- [Authentication Service] → Verifies user subscription
- [Ad Insertion Service] → Dynamically injects ads
- [Real-Time Score Server] → Pushes updates
- [Analytics Pipeline] → Monitors health
🚀 Challenges Hotstar Solves
Massive Traffic Spikes – Auto-scale using Kubernetes/AWS/GCP.
Low Latency for Live Streaming – Push content to CDNs fast; use HLS/DASH streaming protocols.
Adaptive Quality Streaming – HLS-based multiple qualities delivered dynamically.
Ad Insertion – Server-side ad insertion (so ads feel native and smooth).
Concurrent User Handling – Distributed architecture + auto-scaling + efficient caching.
Real-time Stats and Chat – Separate WebSocket/Pub-Sub system from video streaming.
Data Consistency and User Authentication – Replicated user databases, caching layers (Redis/Memcached).
🔥 Simple Example:
When you watch IPL on Hotstar from Delhi:
1. Your app fetches metadata & authentication via Hotstar backend (Delhi-based edge).
2. Video player requests the live stream:
o GeoDNS routes you to the nearest CDN node (maybe Akamai Delhi).
o You start getting 2-6 second video chunks.
3. Live scores are updated every few seconds via a WebSocket.
4. Ad breaks trigger ads which are dynamically inserted by servers.
5. All your interactions (pause, seek, resume) are tracked for analytics.
📢 Final Note:
Live streaming like IPL is not just video; it's a combination of:
Ultra-fast video delivery
Real-time data streaming (scores, stats, polls)
Advertisement systems
Massive horizontal scalability
Disney+ Hotstar is one of the biggest examples of running a live-event distributed
architecture in production today.