System Design Terms

The document provides a comprehensive overview of system design technical terms, categorized into topics such as Reliability & Availability, Performance & Scalability, Data Management, Architecture Patterns, Communication & Messaging, Consistency & Distributed Systems, and Observability & Monitoring. It also includes detailed explanations of concepts like Backpressure and Throttling, their mechanisms, benefits, and common use cases. Additionally, it discusses the handling of user login in distributed environments, addressing challenges and common designs for authentication and session management.


Here's the list of system design technical terms grouped by topic for better clarity:

1. Reliability & Availability

 Availability – Percentage of time the system is up and serving requests.


 Reliability – Ability of the system to perform without failure over time.
 Redundancy – Duplication of components to improve fault tolerance.
 Failover – Switching to backup systems upon failure of the primary.
 Disaster Recovery (DR) – Strategies to recover system operations after major failures.
 Circuit Breaker – Prevents a system from repeatedly calling a failing service.
 Backpressure – Signals to slow down input when downstream systems are
overwhelmed.
 Throttling – Limiting resource usage to prevent overload.
 Rate Limiting – Controls how often a user can hit an endpoint.

2. Performance & Scalability

 Scalability – System’s ability to handle growth in load.


 Latency – Time taken for a request to be processed and responded to.
 Throughput – Number of requests the system can handle per unit time.
 Horizontal Scaling – Adding more machines to distribute load.
 Vertical Scaling – Adding more resources (CPU, RAM) to a single machine.
 Load Balancer – Distributes traffic across multiple instances.
 Caching – Stores data temporarily for faster access.
 Content Delivery Network (CDN) – Delivers content from servers near the user.
 Session Affinity (Sticky Sessions) – Binds a user session to the same server.

3. Data Management

 Replication – Copying data across multiple servers.


 Partitioning (Sharding) – Splitting a database into smaller parts.
 Database Indexing – Improves data lookup speed.
 Data Warehouse – Optimized for analytical queries and reporting.
 Data Lake – Central repository for raw structured/unstructured data.
 Idempotency – Operation that has the same effect no matter how many times it's
repeated.
4. Architecture Patterns

 Monolith – Single-tiered application with all components tightly coupled.


 Microservices – Independently deployable small services that work together.
 Service-Oriented Architecture (SOA) – Modular services communicating over the
network.
 Stateless Service – Doesn’t retain client information between requests.
 Stateful Service – Maintains session data or client state.
 Containerization – Encapsulates an app and its dependencies into a container.
 Orchestration – Automated management of containers and services (e.g., Kubernetes).
 Service Discovery – Mechanism to find service locations dynamically.

5. Communication & Messaging

 Message Queue – Asynchronous communication via queued messages.


 Pub/Sub (Publish/Subscribe) – Decouples message producers and consumers.
 DNS Load Balancing – Balancing load using DNS to route traffic.

6. Consistency & Distributed Systems

 Consistency (in CAP) – Ensures data uniformity across nodes.


 Availability (in CAP) – Every request gets a response, even when some nodes fail.
 Partition Tolerance – System continues working despite network splits.
 Strong Consistency – All reads return the latest write.
 Eventual Consistency – Data becomes consistent over time.
 Data Consistency Models – Defines how and when updates are visible across systems.
 Distributed Systems – Multiple components on different machines working as one.

7. Observability & Monitoring

 Observability – Ability to understand system behavior through outputs.


 Telemetry – Automated data collection for monitoring.

Would you like this turned into a table, diagram, or PDF for easier reference?

Sure! Here's a more detailed explanation of backpressure in system design:


What is Backpressure?

Backpressure is a flow-control mechanism used in systems (especially distributed or streaming systems) to regulate data flow between components — typically between a producer (which generates data) and a consumer (which processes it).

It ensures that when a consumer is overwhelmed or slower than the producer, the system can
signal the producer to slow down or stop temporarily. This prevents memory overflows,
system crashes, and degraded performance.

Where It’s Used

 Message Queues (e.g., Kafka, RabbitMQ)


 Streaming Systems (e.g., Apache Flink, Akka Streams, RxJava)
 APIs and Microservices
 Network protocols (e.g., TCP has built-in backpressure)

How It Works

1. Normal Flow: Producer sends data, consumer processes it in real-time.


2. Congestion: Consumer can't keep up due to high load or slow processing.
3. Backpressure Signal: Consumer or intermediate buffer notifies producer (explicitly or
via buffer limits).
4. Producer Reacts:
o Slows down production
o Buffers data
o Drops data (in lossy systems)
o Applies retry logic

Techniques for Handling Backpressure

 Bounded Buffers: Limit how much data can be queued between producer and consumer.
 Windowing/Batching: Process data in fixed-size chunks.
 Request-based Flow Control: Consumer pulls data only when ready (pull-based).
 Reactive Streams Protocol: Formalizes backpressure in reactive programming (used in
RxJava, Project Reactor, Akka Streams).
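
To make the pull/buffer idea above concrete, here is a minimal Python sketch of bounded-buffer backpressure, not tied to any particular framework: the producer blocks as soon as the queue in front of a slower consumer fills up, which is exactly the "slow down" signal described earlier. The buffer size, item count, and sleep time are arbitrary illustrative values.

# Bounded-buffer backpressure: the producer blocks whenever the queue is full.
import queue
import threading
import time

buffer = queue.Queue(maxsize=10)      # bounded buffer between producer and consumer

def producer():
    for i in range(100):
        buffer.put(i)                 # blocks here while the consumer is behind
        print(f"produced {i}")

def consumer():
    while True:
        item = buffer.get()
        time.sleep(0.1)               # simulate a slow consumer
        print(f"consumed {item}")
        buffer.task_done()

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer, daemon=True)
p.start(); c.start()
p.join()                              # producer finishes only as fast as the consumer allows
buffer.join()                         # wait until every produced item has been processed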
Benefits

 Prevents memory overflows


 Avoids latency spikes
 Enables graceful degradation under load
 Helps maintain system stability under pressure

Example

In a video streaming platform:

 Producer: Video encoding service


 Consumer: Video uploader or processing service
If the uploader is slow, the encoder might produce too many files too quickly.
Backpressure ensures the encoder is paused or slowed until the uploader catches up.

Let me know if you want a diagram or code example to illustrate this concept!

Absolutely! Here's a clear and concise explanation of Throttling in system design:

What is Throttling?

Throttling is a technique used to limit the number of requests or operations a user, service, or
system can perform in a given time period. It's a defensive strategy that protects systems from
being overwhelmed due to:
 High traffic spikes
 Malicious attacks (e.g., brute force, DDoS)
 Unintended client retries
 Resource-intensive operations

How Throttling Works

 You define a rate limit, e.g., 100 requests per minute per user.
 If the client exceeds this limit, the system:
o Rejects requests (e.g., HTTP 429: Too Many Requests)
o Queues or delays requests
o Sends back a Retry-After header or similar info

Common Throttling Strategies

Strategy – Description
Fixed Window – Resets the counter at the start of each time window (e.g., every minute).
Sliding Window – Tracks the number of requests over the last N seconds continuously.
Token Bucket – Tokens are added at a fixed rate. Each request consumes a token. If no tokens remain, the request is throttled.
Leaky Bucket – Requests are processed at a steady rate. Excess requests are queued or dropped.
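
To make one of these concrete, here is a minimal Python sketch of the token bucket strategy from the list above; the capacity of 100 tokens and the refill rate of 10 tokens per second are arbitrary illustrative values.

# Token bucket: tokens refill at a fixed rate; each request consumes one token.
import time

class TokenBucket:
    def __init__(self, capacity=100, refill_rate=10):   # illustrative values
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate                   # tokens added per second
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last check, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                                     # request is throttled

bucket = TokenBucket()
print(bucket.allow())   # True while tokens remain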

Where It's Used

 APIs: To prevent abuse or ensure fair usage across clients.


 Background jobs: To avoid overloading downstream systems.
 Database queries: To slow down heavy reads/writes.
 Cloud services: To enforce pricing tiers or quotas.

Quick Example
# Fixed-window throttling: allow at most 100 requests per user per minute
import time

user_requests = {}  # maps "user_id:minute" -> request count for that window

def current_minute():
    return int(time.time() // 60)

def is_allowed(user_id):
    key = f"{user_id}:{current_minute()}"
    if user_requests.get(key, 0) >= 100:
        return False                      # limit reached for this window
    user_requests[key] = user_requests.get(key, 0) + 1
    return True

Benefits

 Stabilizes systems under unpredictable load


 Improves fairness among users
 Helps with cost control in pay-per-use environments
 Enhances security against abuse

Would you like me to add this explanation to the PowerPoint too?

Alright — let’s go through how a database works in a distributed environment, step-by-step but simply:

First, what is a Distributed Database?

A distributed database is a database where data is stored across multiple physical machines,
but it behaves as if it were a single logical database to the user or application.

These machines could be:

 In the same data center (low latency)


 Or across different regions (for resilience and global access)

How it Works (Key Concepts)


Concept – What It Means
Partitioning (Sharding) – Data is split into smaller parts (shards), and each shard is stored on different machines.
Replication – Copies of the same data are stored on multiple machines for fault tolerance.
Consistency – Ensuring that when data is written or updated, it is the same across replicas.
Coordination – Nodes communicate to keep track of who owns what, who has the latest data, etc.
Consensus Algorithms – Protocols like Raft, Paxos, or ZAB to agree on database changes across nodes.
Fault Tolerance – System keeps running even if some machines crash.
Load Balancing – Distributes read/write load across nodes efficiently.

Typical Architecture Overview

1. Client sends a request (read/write) to a coordinator node.


2. Coordinator decides:
o Which shard holds the data
o Which replica is best to serve the request
3. Request is forwarded to the right data node.
4. The node processes the request and responds.
5. For writes:
o The change is replicated to other nodes (synchronously or asynchronously).
o System ensures consistency depending on settings (strong or eventual).
6. Coordinator returns result to client.
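
As a rough illustration of step 2 (the coordinator deciding which shard holds the data), the sketch below uses simple hash-mod placement over a hypothetical three-node cluster with one replica per shard. Real systems usually use consistent hashing or range partitioning, so treat this only as a toy example.

# Hash-based shard routing: the coordinator maps each key to a primary data node.
import hashlib

DATA_NODES = ["db-node-1", "db-node-2", "db-node-3"]      # hypothetical shard owners
REPLICAS = {"db-node-1": ["db-node-2"],                   # each shard replicated to one peer
            "db-node-2": ["db-node-3"],
            "db-node-3": ["db-node-1"]}

def shard_for(key):
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return DATA_NODES[digest % len(DATA_NODES)]

def route_write(key, value):
    primary = shard_for(key)
    # The write goes to the primary shard, then replicates (sync or async) to its replicas
    return {"primary": primary, "replicas": REPLICAS[primary], "value": value}

print(route_write("user:42", {"name": "Asha"}))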

Important Challenges

 CAP Theorem (Consistency, Availability, Partition tolerance: during a network partition you must trade consistency against availability, so you can’t have all 3 at once)
 Network Latency between nodes
 Clock synchronization (very hard!)
 Handling node failures and recoveries

Popular Distributed Databases


Database – Type
Cassandra – Wide-column, highly available, eventual consistency
MongoDB – Document store with replica sets and sharding
CockroachDB – SQL database with strong consistency and high fault tolerance
Amazon DynamoDB – Key-value store designed for massive scale
Google Spanner – SQL with global consistency (uses atomic clocks!)

Quick Visual
[Client] --> [Coordinator Node] --> [Data Node (Shard 1)]
--> [Data Node (Shard 2)]
--> [Data Node (Replica of Shard 1)]

Real-World Example

 When you post a photo on Instagram:


o The image itself is stored in a sharded distributed storage.
o Your user metadata (profile, comments) is stored across multiple replicated
databases.
o Even if a server in Europe fails, your data is still safe and reachable from replicas
elsewhere!

Would you like me to also create a simple diagram for this and add it to the PPT?
(Visuals really help in distributed system topics!)

Good question — this dives into how region-specific request routing works in a distributed
system. Let’s break it down clearly:

How Region-Specific Requests Are Handled

When a system operates across multiple geographic regions (like US-East, Europe-West, Asia-
South), it needs to route users' requests to the best/nearest region for:
 Faster response times (low latency)
 Better reliability
 Compliance (e.g., GDPR)
 Load distribution

Here's how it happens technically:

Key Mechanisms
Mechanism – Description
GeoDNS – Domain Name System (DNS) that resolves your domain to an IP address based on the user's physical location.
Anycast IP Routing – Same IP is advertised from multiple regions; the Internet routes user requests to the nearest one.
Load Balancers with Geo-routing – Application layer (Layer 7) load balancers check the user's IP or metadata to forward to the closest region.
Application Logic – The app itself can have logic to detect region from user profile, cookies, etc., and route/redirect.
Global API Gateway / CDN – Services like Cloudflare, AWS CloudFront, or Azure Front Door route requests geographically.

How a Request Typically Flows

1. User sends a request (say, to www.example.com).


2. GeoDNS resolves the domain:
o User in India gets directed to India-Region Load Balancer.
o User in Germany gets directed to Europe-Region Load Balancer.
3. Load Balancer routes the request to a nearby application server and database replica.
4. If needed (e.g., no local copy), the system may:
o Proxy the request to the correct region
o Cache data in the nearest region after serving
5. Response is sent back, minimizing hops and delays.
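
A toy version of that routing decision might look like the sketch below: pick the closest healthy region, and fall back to the next closest if it is down. The region names, latencies, and health flags are made up for illustration.

# Pick the nearest healthy region; fall back to the next closest on failure.
REGION_LATENCY_MS = {"asia-south": 30, "europe-west": 110, "us-east": 210}    # illustrative
REGION_HEALTHY = {"asia-south": False, "europe-west": True, "us-east": True}  # pretend Mumbai is down

def choose_region():
    for region in sorted(REGION_LATENCY_MS, key=REGION_LATENCY_MS.get):
        if REGION_HEALTHY[region]:
            return region
    raise RuntimeError("no healthy region available")

print(choose_region())   # -> "europe-west", because the nearest region is unhealthy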

Example Scenario

Suppose Instagram has servers in:


 Mumbai (Asia-South)
 Frankfurt (Europe)
 Virginia (US-East)

If you open Instagram from India:

 DNS lookup points you to Mumbai servers.


 Load balancer routes your API calls to Mumbai database clusters.
 Your photos, feeds, and notifications are fetched and updated from Mumbai servers.

If Mumbai region faces failure:

 System can failover to Singapore or Europe automatically.

Challenges in Region-Based Handling

 Data synchronization: Keeping data consistent between regions.


 Regulatory compliance: E.g., EU users' data must stay in Europe.
 Latency trade-offs: Writes might need to travel across regions.
 Global session management: Keeping a user logged in globally.

Simple Visual
[User India] --(DNS)-> [India Load Balancer] --(App Server)--> [India DB
Replica]
[User Germany] --(DNS)-> [Europe Load Balancer] --(App Server)--> [Europe DB
Replica]

Would you also want me to add this flow into your PPT with a region-based request
diagram? It’ll be easy to understand in interviews too!
(And if yes, do you want a simple one or a slightly detailed diagram?)

Excellent question — user login is super important (and tricky!) in distributed environments
because it touches authentication, session management, consistency, and availability.

Let’s break it down simply and systematically:

How User Login is Handled in a Distributed Environment


Basic Flow

1. User submits login request (username + password) from their device.


2. Load balancer routes the request to a nearby authentication server.
3. Authentication server checks:
o Username/password validity (often via database or identity provider)
o Multi-factor authentication (if enabled)
4. Once verified:
o System generates a session/token (JWT, OAuth token, cookie, etc.)
o Session/token is sent to user’s browser/app.

Now the user is "logged in" and every request they make includes this token.

Challenges in Distributed Systems


Problem – How it's solved
Which server has session data? – Use stateless tokens (like JWT) OR replicate session state to all servers via a distributed cache (Redis, etc.)
Consistency of authentication data – Sync user databases across regions or use a central auth service.
Speed across regions – Place authentication servers close to users and use local replicas for validation.
Global logout/session invalidation – Maintain a centralized session blacklist or a session store across regions.
Failure handling – Retry login at a different region if one fails.

Common Designs
1. Stateless Authentication (Preferred Modern Approach)

 After login, server issues a JWT token (JSON Web Token).


 JWT contains user info, expiry, permissions.
 No server memory needed: every server can validate JWT independently using public
keys.
Pros: Super scalable, no need for sticky sessions.

Cons: Revocation is tricky — needs token expiry or a revocation list.
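
A minimal sketch of issuing and validating such a token, assuming the PyJWT library and a shared symmetric key for brevity (production systems typically use an asymmetric key pair so that servers only need the public key to validate):

# Stateless login with a JWT (sketch; assumes the PyJWT package: pip install pyjwt)
import jwt                                  # PyJWT
from datetime import datetime, timedelta, timezone

SECRET = "demo-secret"                      # placeholder; use a real key or an RS256 key pair

def issue_token(user_id):
    payload = {"sub": user_id,
               "exp": datetime.now(timezone.utc) + timedelta(hours=1)}
    return jwt.encode(payload, SECRET, algorithm="HS256")

def validate(token):
    try:
        # Any server holding the key can validate; no shared session store is needed
        return jwt.decode(token, SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:           # expired, tampered, or malformed
        return None

token = issue_token("user-123")
print(validate(token))                      # -> {'sub': 'user-123', 'exp': ...}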

2. Sticky Sessions (Legacy/Some Scenarios)

 Load balancer pins user to the same backend server where their session lives.
 Session stored in server memory.

Pros: Simple.

Cons: Fails badly if that server goes down.

3. Centralized Session Storage

 Sessions stored in distributed cache (e.g., Redis, DynamoDB).


 Any server can fetch session data.

Pros: Safe and scalable.

Cons: Slightly higher latency to session store.
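
A minimal sketch of this pattern, assuming the redis-py client and a reachable Redis instance (the hostname and TTL below are placeholders):

# Centralized session store: any app server can read or invalidate a session.
import json
import uuid
import redis                                                  # assumes: pip install redis

r = redis.Redis(host="session-cache.internal", port=6379)    # placeholder endpoint

def create_session(user_id, ttl_seconds=3600):
    session_id = str(uuid.uuid4())
    r.setex(f"session:{session_id}", ttl_seconds, json.dumps({"user_id": user_id}))
    return session_id

def get_session(session_id):
    data = r.get(f"session:{session_id}")
    return json.loads(data) if data else None                # None means expired or logged out

def logout(session_id):
    r.delete(f"session:{session_id}")                        # global logout: every server sees it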

Example Login Flow


User -> Load Balancer -> Nearest Auth Server -> Check DB
-> Generate JWT Token -> Return to User
User -> Next Requests -> Attach Token -> Any Server Validates

Quick Real-World Example

 When you log into Netflix:


o Your credentials are validated against a global user service.
o You get a JWT access token.
o Every API call (play movie, fetch profile) uses that token — Netflix servers
globally can verify it without contacting a central database each time.
o If Netflix rotates tokens, you'll be asked to re-login seamlessly.
Important Things to Handle

 Secure token storage (HTTP-only cookies, secure storage in app)


 Session expiration and refresh (refresh tokens)
 Multi-region authentication synchronization
 Single Sign-On (SSO) if needed
 Resilient to partial failures (retry different region)

Simple Diagram
[User] --> [Load Balancer] --> [Auth Server] --> [User DB]
--> [Token Service]
<-- [JWT Token]
(User attaches token in future requests)

Would you also want me to create a small diagram of this login flow and add to your PPT?
(Also can show both "Stateless Token" vs "Sticky Session" if you want side-by-side!)
Would you like that?

How does Netflix manage the database in a distributed system, and how does it handle low latency? While explaining, give me a proper example of a person who is uploading a video in India and someone watching it in the USA.

Alright — let's dive deep into this with real clarity:


How Netflix manages databases in a distributed system + how it handles low latency
And we’ll use your example (uploading in India, watching in USA) while explaining.

How Netflix Manages Databases in a Distributed System


Netflix is huge, with millions of users worldwide, so their backend system is fully distributed
across multiple regions.

They do NOT rely on a single database.

Instead, they use:

 Cassandra (NoSQL DB) — highly available, multi-region replication.


 DynamoDB (AWS's distributed database) — for fast key-value access.
 MySQL + Vitess (for relational data that needs scaling).
 ElasticSearch (for search operations).
 S3 (for storing large files like videos and images).
 Edge Caches/CDN (for quick delivery).

⚡ How Netflix Handles Low Latency


Netflix minimizes delay by:

 Storing and caching data close to the user using CDNs (Content Delivery Networks), including Netflix's own CDN, Open Connect.
 Replicating metadata across multiple regions, so user profile, watchlists,
recommendations load locally.
 Smart request routing — if an edge server doesn't have the data, it fetches from the
closest available region instead of going to a distant server.
 Eventual consistency — writes propagate across regions, but reads prioritize availability
and speed.
 Microservices architecture — each service can scale independently (e.g., Video service,
User service).
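
As a rough, hypothetical illustration of "read the nearest copy first, fall back to a remote region, then cache locally" (this is not Netflix's actual code or API, just the idea):

# Read metadata from the local region first; fall back cross-region, then cache locally.
LOCAL_CACHE = {}

def read_metadata(video_id, local_store, remote_stores):
    if video_id in LOCAL_CACHE:                  # fastest: edge/local cache
        return LOCAL_CACHE[video_id]
    record = local_store.get(video_id)           # nearest regional replica
    if record is None:
        for store in remote_stores:              # cross-region fetch (slower, rare)
            record = store.get(video_id)
            if record is not None:
                break
    if record is not None:
        LOCAL_CACHE[video_id] = record           # cache for the next viewer in this region
    return record

# Example: metadata not yet replicated to the US, so it is fetched from the Asia replica.
asia_replica = {"v1": {"title": "My upload"}}
us_replica = {}
print(read_metadata("v1", local_store=us_replica, remote_stores=[asia_replica]))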

📦 Your Scenario: Uploading a Video in India, Watching in the USA
Let's imagine a new system similar to Netflix (because on Netflix users don’t upload — but
think of it like YouTube):

1. User Uploads a Video in India

 Upload goes to nearest AWS region (say, Mumbai region).


 Video binary is stored in S3 immediately.
 Metadata (title, description, uploader ID, etc.) is written into Cassandra (partitioned and
replicated to Asia clusters first).
 Background replication jobs start copying:
o Video file across global S3 buckets (using AWS replication rules).
o Metadata replication to Cassandra nodes in the US, Europe, etc.

At this point, the video is uploaded and stored in India first, and global replication continues in the background.
2. User in USA Tries to Watch the Video

 User’s Netflix app requests the video metadata.


 Request is routed to nearest Cassandra replica (likely US-East region).
 If metadata is already replicated: Instant access!
 If not replicated yet: App can fetch it cross-region (with slight delay) or show
"Available soon".
 Video file:
o If already replicated to USA's S3/CDN nodes — fast streaming.
o Otherwise, the system can stream from the India S3 bucket directly (slower),
while caching it locally after first play.

Meanwhile, Open Connect CDN servers close to the user cache the video for future plays.

Simple Visual
[Uploader India] --> [India Region DB + Storage] --> [Replicates]
--> [US Region DB + Storage]

[Viewer USA] --> [US Region DB for metadata]
             --> [US CDN/S3 for video file]

Important Points Here


Step – Explanation
Metadata replication – Must be fast — users want instant searchability.
Video file replication – Can be slightly slower (since it's huge).
Caching – First view may be slower; subsequent views are super fast via CDN.
Consistency – Metadata is eventually consistent, but replication is fast enough that viewers feel it is "instant".

🧠 Summary:
 Writes (uploads) go to local region first.
 Reads (viewing/search) go to nearest available copy.
 Replication happens in background to ensure global consistency.
 CDNs and Edge caching are key to reducing latency for large files like videos.

🌟 Example Explained Simply:


Imagine you upload a 1GB video in India.

 Netflix stores your upload immediately in Mumbai region.


 It starts copying to the USA.
 A few seconds or minutes later, someone in the USA can watch it:
o Metadata already synced? Fast!
o Video file? If not yet in USA, might be a little slow first time; next time very fast
(cached).

Would you also want a diagram (PPT style) showing this full upload → replicate → watch
flow? 🚀
I can draw it for you if you'd like — it’ll be very useful for interviews too!
Would you like that?

Awesome, you're asking very sharp questions — let’s dive into this carefully:

You want the architecture of Hotstar (now Disney+ Hotstar) especially during live events
like IPL cricket matches.
This is a real-world, large-scale distributed system example!

Hotstar’s engineering team (they shared some public details) has built a massive, distributed,
and highly scalable system to handle millions of concurrent viewers (they crossed 25M+
concurrent viewers during IPL finals!).

High-Level Architecture of Hotstar During Live Streaming (like IPL)

Core Components
Layer – Technologies/Responsibilities
Content Ingestion – Ingest live video feeds from camera crews, broadcasters.
Encoding and Transcoding – Convert the raw video into multiple qualities (240p, 720p, 1080p) using encoding farms.
Origin Servers – Store original versions of the video streams.
Content Distribution (CDN) – Distribute content across thousands of edge servers globally (Akamai, AWS CloudFront, or custom CDN).
Load Balancers – Distribute user requests to different app servers efficiently.
Application Servers – Handle user sessions, subscriptions, personalization, ads, etc.
Authentication and User Management – Handle login, subscription validation, access control.
Analytics and Monitoring – Track player behavior, drop-offs, real-time metrics.
Ads Insertion Systems – Dynamically insert ads for users (server-side ad insertion).
Real-time Chat/Score Updates – Push notifications, scores, stats using WebSockets or Push APIs.

⚡ Detailed Step-by-Step Flow

1. Live Feed Ingestion

 Camera teams at stadium send live HD feed to Hotstar’s ingestion servers.


 Ingestion centers are deployed in multiple regions for redundancy.

2. Encoding/Transcoding

 The feed is encoded into different bitrates (adaptive bitrate streaming, or ABR).


 Example:
o 1080p — for fast connections
o 720p, 480p, 360p — for slower connections
3. Storage at Origin Servers

 Encoded chunks (~2–6 seconds) are uploaded to the origin servers.

4. CDN Distribution

 Chunks are pushed to global CDNs (Content Delivery Networks).


 CDNs bring video closer to viewers to reduce latency.

5. User Connects

 When you open Hotstar app:


o Request goes to nearest CDN node via GeoDNS / Load Balancer.
o You fetch video segments from edge servers, NOT directly from Hotstar
backend.
 Playback client switches between qualities based on network speed (adaptive bitrate).
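
That quality switch can be pictured as a simple rule: pick the highest rendition the measured bandwidth can sustain, with some headroom so playback does not stall. A toy sketch with illustrative bitrate numbers:

# Adaptive bitrate (ABR) selection: choose the best rendition the current bandwidth can sustain.
RENDITIONS = [(1080, 5000), (720, 2800), (480, 1400), (360, 800)]  # (quality, kbps), illustrative

def pick_quality(measured_kbps, headroom=0.8):
    budget = measured_kbps * headroom           # leave margin so playback doesn't stall
    for quality, bitrate in RENDITIONS:         # highest quality first
        if bitrate <= budget:
            return quality
    return RENDITIONS[-1][0]                    # fall back to the lowest rendition

print(pick_quality(3500))   # -> 720
print(pick_quality(900))    # -> 360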

6. Real-Time Updates

 Separate channels (WebSocket/Server-Sent Events) push live scores, match statistics, ads, polls, and chats.
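
Conceptually the score/stats channel is a publish/subscribe fan-out kept separate from video delivery; here is a minimal in-memory sketch (a real system would push over WebSockets or a message broker, and the event fields are made up):

# Pub/sub fan-out for live updates, kept separate from the video delivery path.
subscribers = []   # in production: WebSocket connections or a message broker topic

def subscribe(callback):
    subscribers.append(callback)

def publish(event):
    for callback in subscribers:
        callback(event)          # push the update to every connected client

subscribe(lambda e: print("client A got:", e))
subscribe(lambda e: print("client B got:", e))
publish({"type": "score", "runs": 187, "wickets": 4, "overs": 18.3})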

7. Analytics and Monitoring

 Hotstar tracks:
o Number of concurrent users
o Session drops
o Video buffering events
o Advertisement impressions
 Data is pushed into a real-time big data pipeline (Kafka, Spark Streaming, etc.)

🧠 Architecture Diagram (Simple)


[Cameras @ Stadium]
        ↓ (Live Feed)
[Ingestion Servers]
        ↓ (Multiple bitrates)
[Encoding/Transcoding System]
        ↓
[Origin Servers]
        ↓
[CDN Edge Servers]
        ↓
[Users: India, US, UAE, etc.]

Meanwhile:
- [Authentication Service] → Verifies user subscription
- [Ad Insertion Service] → Dynamically injects ads
- [Real-Time Score Server] → Pushes updates
- [Analytics Pipeline] → Monitors health

🚀 Challenges Hotstar Solves


Challenge – How Hotstar Solves It
Massive Traffic Spikes – Auto-scale using Kubernetes/AWS/GCP.
Low Latency for Live Streaming – Push content to CDNs fast; use HLS/DASH streaming protocols.
Adaptive Quality Streaming – HLS-based multiple qualities delivered dynamically.
Ad Insertion – Server-side ad insertion (so ads feel native, smooth).
Concurrent User Handling – Distributed architecture + auto-scaling + efficient caching.
Real-time Stats and Chat – Separate WebSocket/Pub-Sub system from video streaming.
Data Consistency and User Authentication – Replicated user databases, caching layers (Redis/Memcached).

🔥 Simple Example:
When you watch IPL on Hotstar from Delhi:

1. Your app fetches metadata & authentication via Hotstar backend (Delhi-based edge).
2. Video player requests the live stream:
o GeoDNS routes you to the nearest CDN node (maybe Akamai Delhi).
o You start getting 2-6 second video chunks.
3. Live scores are updated every few seconds via a WebSocket.
4. Ad breaks trigger ads which are dynamically inserted by servers.
5. All your interactions (pause, seek, resume) are tracked for analytics.
📢 Final Note:
Live streaming like IPL is not just video; it's a combination of:

 Ultra-fast video delivery


 Real-time data streaming (scores, stats, polls)
 Advertisement systems
 Massive horizontal scalability

Disney+ Hotstar is one of the biggest examples of running a live-event distributed architecture in production today.
