Meta E6 System Design Cheat Sheet
Performance & Scalability
1. Latency – Time for a request to complete.
2. Throughput (TPS/QPS) – Requests processed per second.
3. Tail Latency (p95/p99) – Response time at high percentiles, i.e., what the slowest 5%/1% of requests experience.
4. Availability (HA) – Uptime percentage (e.g., 99.99%).
5. Consistency – Uniformity of data across replicas (Strong/Eventual).
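6. Durability – Ensuring acknowledged (committed) data survives crashes and hardware failures.
7. CAP Theorem – During a network partition, a system must choose Consistency or Availability.
8. PACELC Theorem – Extends CAP: if Partitioned, trade Availability vs Consistency; Else, trade Latency vs Consistency.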
9. Scalability – Handling growing load (Vertical vs Horizontal).
10. Elasticity – Auto-adjusting resources to match demand.
11. Capacity Planning – Estimating infra for steady & peak loads.
12. Load Shedding – Dropping low-priority requests during overload.
13. Backpressure – Slowing producers when consumers are saturated.
14. Idempotency – Repeating an operation has the same effect as running it once, so retries don’t cause duplicates.
15. Head-of-Line (HOL) Blocking – A slow item at the front of a queue or connection stalls everything behind it.
16. Slow Start/Warm-up – Gradually increasing load on new nodes.
17. N+1 Problem – One query for a list plus one query per item, instead of a single batched query.
18. Fan-out Limiting – Capping how many downstream calls a single request can trigger.
19. Rate Adaptation – Dynamic throttling based on system health.
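To make the Tail Latency entry concrete, here is a minimal nearest-rank percentile sketch over a list of request latencies. The function name and the sample numbers are illustrative; production systems usually use streaming estimators (e.g., HDR histograms) instead of sorting every sample.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Hypothetical sample: 98 fast requests and 2 slow ones
latencies_ms = [10.0] * 98 + [500.0] * 2
print("mean:", sum(latencies_ms) / len(latencies_ms))  # 19.8
print("p50:", percentile(latencies_ms, 50))            # 10.0
print("p99:", percentile(latencies_ms, 99))            # 500.0
```

The mean and median both look healthy here, while p99 exposes the slow requests that real users hit.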
Data Management & Storage
20. SQL (Structured Query Language) vs NoSQL – Relational vs key-value/document/column/graph.
21. Indexes – Speeding queries (B-tree, LSM-tree).
22. Sharding – Splitting data across servers (Range/Hash/Geo).
23. Consistent Hashing – Minimizing rebalancing in distributed storage.
24. Hot spotting – Overloaded partitions due to skewed access.
25. Replication – Copying data across nodes (Leader-Follower/Multi-Leader).
26. Replication Lag – Delay between primary and replica updates.
27. Quorums – Overlapping read/write replica sets (R + W > N) so reads see the latest write.
28. Eventual Consistency (EC) – Data eventually converges across replicas.
29. CRDTs (Conflict-free Replicated Data Types) – Data types whose concurrent updates merge automatically and deterministically.
30. Object Storage – Large unstructured blobs (e.g., S3).
31. Columnar Storage – Optimized for analytical queries.
32. Time-Series DB (TSDB) – Optimized for metrics & time-ordered data.
33. Graph Databases – For relationship-heavy workloads.
34. Materialized Views (MVs) – Precomputed query results for fast reads.
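35. Write-Ahead Logging (WAL) – Persisting changes to a log before applying them, so committed data survives crashes.
36. Bloom Filters – Probabilistic membership checks (false positives possible, no false negatives).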
37. LSM (Log-Structured Merge) Trees – Write-optimized data structures.
38. Cold vs Hot Storage – Archival vs frequently accessed data.
39. Schema Evolution – Safely changing DB schemas.
40. Federated Databases – Querying across multiple sources.
41. Data Lake vs Data Warehouse (DL/DW) – Unstructured vs structured analytics storage.
42. Polyglot Persistence – Different DBs for different needs.
43. CDC (Change Data Capture) – Streaming DB changes for syncing.
44. TTL (Time-to-Live) – Auto-expiring data.
45. HTAP (Hybrid Transactional/Analytical Processing) – Real-time analytics on live transactional data in one system.
46. Cold/Hot/Warm Path – Layered processing pipelines.
47. Erasure Coding – Storage-efficient redundancy.
48. Merkle Trees – Hash-based integrity verification (Git, blockchain).
49. Hot/Cold Splits – Splitting tables based on access frequency.
50. Data Retention Policies – Automated cleanup for compliance.
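Consistent hashing (listed above) can be sketched as a sorted ring of virtual-node hashes. This is a minimal sketch, assuming MD5 as the ring hash and 100 virtual nodes per server (both arbitrary choices); the node names are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process)
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to even out load
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-d")
moved = sum(before[k] != ring.get_node(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")  # roughly a quarter, not nearly all
```

Every key that moves goes to the new node; with naive hash(key) % N, most keys would move when N changes.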
Data Consistency & Transactions
52. 2PC (Two-Phase Commit) / 3PC – Coordinated distributed commits.
53. Idempotent Event Processing – Avoiding duplication on retries.
54. Snapshot Isolation (SI) – Consistent reads without locking.
55. Conflict Resolution – Last-write-wins, app-level merges, CRDTs; vector clocks to detect concurrent writes.
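The 2PC entry above can be illustrated with an in-memory coordinator. This is a minimal sketch: the Participant class and its can_commit flag are hypothetical stand-ins for real resource managers, and nothing here is persisted or networked.

```python
from dataclasses import dataclass, field

@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical: whether the local prepare succeeds
    state: str = "init"      # init -> prepared -> committed, or aborted
    log: list[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: stage the change locally and vote yes/no
        if self.can_commit:
            self.state = "prepared"
            self.log.append("PREPARED")
            return True
        self.state = "aborted"
        self.log.append("VOTE_NO")
        return False

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("COMMIT")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("ABORT")

def two_phase_commit(participants: list[Participant]) -> bool:
    votes = [p.prepare() for p in participants]  # Phase 1: collect every vote
    decision = all(votes)                        # commit only if all voted yes
    for p in participants:                       # Phase 2: broadcast the decision
        if decision:
            p.commit()
        else:
            p.abort()
    return decision

ok = two_phase_commit([Participant("orders"), Participant("payments", can_commit=False)])
print(ok)  # False
```

In a real deployment, participants that voted yes must wait for the coordinator's decision; if the coordinator crashes at that point they block, which is the weakness 3PC and consensus-based commits try to address.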
Caching & Acceleration
56. In-Memory Caching – Redis/Memcached for fast reads.
57. CDN (Content Delivery Network) – Edge caching for performance.
58. Cache Hierarchies – Multi-level caches (L1/L2/L3).
59. Cache Invalidation & Write Policies – TTLs/explicit purges; write-through, write-around, write-back.
60. Prefetching – Proactively loading likely-needed data.
61. Write Coalescing – Grouping small writes.
62. Cache Stampede Prevention – Avoiding a thundering herd of recomputations when a hot key expires (locks, request coalescing, jittered TTLs).
63. Near-Cache – Client-side caching.
64. Content-Aware Caching – Prioritizing cache by popularity.
65. Content Hashing – Cache-busting & integrity validation.
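One common stampede mitigation from the list above is request coalescing with a per-key lock, so only one caller recomputes a missing or expired key. This sketch is in-process only (threading); a multi-server deployment would need a shared lock, e.g., in Redis, which is not shown. The class and loader names are hypothetical.

```python
import threading
import time
from typing import Callable

class StampedeSafeCache:
    """Cache-aside with per-key locks: one loader call per missing/expired key."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)
        self._locks: dict[str, threading.Lock] = {}       # grows unbounded in this sketch
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key: str) -> tuple[bool, object]:
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            return True, entry[1]
        return False, None

    def get(self, key: str, loader: Callable[[str], object]) -> object:
        hit, value = self._fresh(key)
        if hit:
            return value
        with self._lock_for(key):
            # Re-check: another thread may have loaded the key while we waited
            hit, value = self._fresh(key)
            if hit:
                return value
            value = loader(key)
            self._data[key] = (time.monotonic() + self.ttl, value)
            return value

calls = [0]

def slow_db_load(key: str) -> str:
    calls[0] += 1
    time.sleep(0.05)  # simulate an expensive query
    return f"value-for-{key}"

cache = StampedeSafeCache(ttl_seconds=60)
threads = [threading.Thread(target=cache.get, args=("hot-key", slow_db_load)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("loader calls:", calls[0])  # 1
```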
Distributed Systems & Coordination
66. Leader Election – Selecting a coordinator (Raft/Paxos).
67. Consensus Protocols – Distributed agreement (Raft, Paxos).
68. Vector Clocks – Tracking causality (happened-before vs concurrent) across replicas.
69. Gossip Protocols – Peer-to-peer state sharing.
70. Service Discovery – Dynamic service registry.
71. Multi-Region Replication – Active-active/passive setups.
72. Disaster Recovery (DR) – RPO/RTO planning.
73. Self-Healing Systems – Automatic recovery mechanisms.
74. Quorum-based Commit – Majority confirmation for consistency.
75. Split-Brain Handling – Preventing two nodes from acting as leader after a partition (quorums, fencing tokens).
76. Leaderless Replication – Dynamo-style quorum reads/writes without a single leader.
77. Service Meshes – Istio/Linkerd for routing & security.
78. Geo-Partitioning – Placing data close to users for latency/compliance.
79. Automated Failback – Safely switching back to primary after failover.
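Vector clocks, listed above, reduce to three operations: increment on a local event, element-wise max on receive, and a comparison that distinguishes happened-before from concurrent. A minimal sketch using plain dicts keyed by hypothetical node IDs:

```python
VectorClock = dict[str, int]

def increment(clock: VectorClock, node: str) -> VectorClock:
    """Local event on `node`: bump its own counter (returns a new clock)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """On receive: element-wise max of both clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def compare(a: VectorClock, b: VectorClock) -> str:
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened-before b
    if b_le_a:
        return "after"       # b happened-before a
    return "concurrent"      # neither dominates: a conflict to resolve

v_a = increment({}, "A")               # A writes: {'A': 1}
v_b = increment(merge({}, v_a), "B")   # B sees A's write, then writes: {'A': 1, 'B': 1}
v_c = increment(v_a, "A")              # A writes again without seeing B: {'A': 2}
print(compare(v_a, v_b))  # before
print(compare(v_b, v_c))  # concurrent
```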
Load Management & Networking
80. Load Balancing (LB) – L4/L7 routing (round-robin, least connections).
81. Sticky Sessions – Binding users to specific servers.
82. Reverse Proxies – Request routing & security (Nginx/Envoy).
83. API Gateway – Unified API entry point.
84. Protocol Conversion – REST↔gRPC↔WebSockets.
85. Rate Limiting – Token bucket/leaky bucket algorithms.
86. Throttling vs Quotas – Short-term request-rate limits vs longer-term usage allowances (e.g., per day/month).
87. GTM (Global Traffic Management) – Anycast & geo-aware routing.
88. Edge Computing – Processing closer to data sources.
89. Circuit Breaker Pattern – Prevent cascading failures.
90. Retry with Exponential Backoff – Retries with growing, jittered delays to avoid overwhelming a recovering service.
91. QUIC Protocol – Low-latency transport (HTTP/3).
92. Multi-CDN Strategies – Using multiple CDNs for redundancy.
93. API Aggregators – Combining backend calls into optimized APIs.
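The token-bucket algorithm named under Rate Limiting can be sketched in a few lines. This is a single-process, non-thread-safe sketch; the injectable clock parameter is an assumption added so the behavior can be shown deterministically.

```python
import time
from typing import Callable

class TokenBucket:
    """Refills `rate` tokens/sec up to `capacity`; capacity is the allowed burst size."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full to allow an initial burst
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(7)])  # [True, True, True, True, True, False, False]
fake_now[0] = 1.0  # one second later: 2 tokens refilled
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

A leaky bucket instead drains requests at a fixed rate, smoothing bursts rather than permitting them.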
Messaging & Eventing
94. Message Queues (MQ) – Async communication (Kafka, RabbitMQ).
95. Pub/Sub – Decoupled producer-consumer design.
96. Fan-out / Fan-in – Broadcasting or aggregating events.
97. Backpressure Handling – Preventing overload in consumers.
98. Dead Letter Queues (DLQ) – Isolating failed messages.
99. Delivery & Ordering Guarantees – At-most/at-least/exactly-once delivery; per-partition vs global ordering.
100. Eventual Delivery – Retrying until a message is delivered (or moved to a DLQ).
101. Asynchronous Communication (Async) – Loose coupling between services.
102. Outbox Pattern – Writing events to an outbox table in the same DB transaction, then relaying them to the event bus.
103. Event Replay – Reprocessing historical events.
Processing Architectures
104. Batch Processing – Large-scale ETL pipelines.
105. Stream Processing – Real-time event handling.
106. Micro-batching – Near real-time processing.
107. Lambda Architecture – Hybrid batch + streaming.
108. Kappa Architecture – Stream-only pipelines.
109. Serverless Computing (FaaS) – Event-driven functions.
110. Workflow Orchestration – Multi-step task management (Airflow/Temporal).
111. Real-time Aggregation – Live metrics precomputation.
112. Choreography vs Orchestration – Event-driven vs central workflows.
113. Compensating Transactions – Undo steps in workflows.
114. Backfill Processing – Re-ingesting delayed/missing data.
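Stream Processing and Real-time Aggregation above often come down to windowed counts. This sketch computes tumbling (fixed, non-overlapping) event-time windows over a finite list; real stream processors also handle late events via watermarks, which this sketch ignores. Event names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

def tumbling_window_counts(events: Iterable[tuple[float, str]],
                           window_s: float) -> dict[float, dict[str, int]]:
    """Count events per key in fixed windows, keyed by window start (event time)."""
    windows: dict[float, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        start = (ts // window_s) * window_s  # the window this event falls into
        windows[start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(0.5, "like"), (3.2, "like"), (9.9, "share"), (10.1, "like"), (19.0, "like")]
result = tumbling_window_counts(events, 10)
print(result)  # {0.0: {'like': 2, 'share': 1}, 10.0: {'like': 2}}
```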
System Design Patterns
115. CQRS (Command Query Responsibility Segregation) – Separate read/write models.
116. Event Sourcing – Storing state as events.
117. Saga Pattern – Distributed transactions as a sequence of local steps with compensations.
118. Bulkhead Pattern – Isolating components to limit failure impact.
119. Sidecar Pattern – Adding observability/security side-processes.
120. Strangler Fig Pattern – Gradually replacing legacy systems.
121. Backends for Frontends (BFF) – Separate APIs per client type.
122. Multi-Tenancy – Supporting multiple customers with isolation.
123. Anti-Corruption Layer (ACL) – Protecting clean domains from legacy.
124. Brownout Pattern – Gradual degradation under stress.
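The Saga Pattern above can be sketched as an orchestrator that runs local steps in order and, on failure, runs the compensations of completed steps in reverse. This is an in-memory sketch with hypothetical step names; a real orchestrator (e.g., Temporal) persists saga state and retries compensations that themselves fail.

```python
from typing import Callable

Step = tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)

def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            action()
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True

log: list[str] = []

def fail_shipping() -> None:
    raise RuntimeError("no courier available")

steps: list[Step] = [
    ("reserve_inventory", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge_card", lambda: log.append("charge"), lambda: log.append("refund")),
    ("book_shipping", fail_shipping, lambda: log.append("cancel_shipping")),
]
ok = run_saga(steps)
print(ok, log)  # False ['reserve', 'charge', 'refund', 'release']
```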
Reliability & Deployments
125. SPOF (Single Point of Failure) – A component whose failure takes down the system; removed via redundancy.
126. Failover – Switching to a hot/warm/cold standby when the primary fails.
127. Canary Releases – Gradual rollout for safety.
128. Feature Flags – Toggling features dynamically.
129. Shadow Traffic – Mirroring real production traffic to a new version without affecting user responses.
130. Chaos Engineering – Testing resilience via controlled failures.
131. Graceful Degradation – Reducing features under load.
132. Fail-Fast Design – Quick detection and termination of failing paths.
133. Self-Stabilizing Systems – Automatic recovery to steady state.
134. Multi-Homing – Redundancy across multiple DCs.
135. Blameless Postmortems – Post-incident learning process.
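Canary Releases and Feature Flags above often rely on deterministic percentage bucketing: hashing the user ID means the same user always gets the same answer, and raising the percentage only adds users. A minimal sketch; SHA-256, the 0.01% granularity, and the function and flag names are all assumptions.

```python
import hashlib

def rollout_bucket(user_id: str, feature: str) -> float:
    """Deterministically map (feature, user) to a bucket in [0, 100) with 0.01 steps."""
    # Salting with the feature name gives each flag an independent user population
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100

def is_enabled(user_id: str, feature: str, rollout_percent: float) -> bool:
    return rollout_bucket(user_id, feature) < rollout_percent

users = [f"user{i}" for i in range(10_000)]
at_5 = {u for u in users if is_enabled(u, "new_feed", 5)}
at_20 = {u for u in users if is_enabled(u, "new_feed", 20)}
print(len(at_5), len(at_20))  # roughly 500 and 2000
print(at_5 <= at_20)          # True: widening the canary keeps earlier users
```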
Observability & Ops
138. SLA/SLO/SLI (Service-Level Agreement/Objective/Indicator) – External guarantee / internal target / measured metric.
139. Golden Signals – Latency, traffic, errors, saturation.
140. Health Checks & Heartbeats – Detecting service failures.
141. Distributed Tracing – End-to-end request observability.
142. Centralized Logging – ELK/Splunk for unified logs.
143. Real-time Dashboards – Prometheus/Grafana for live metrics.
144. Anomaly Detection – Detecting unexpected behaviors.
145. Feature-level Metrics – Tracking impact of new features.
146. Canary Analysis Automation – Automatic canary rollout validation.
147. Error Budgets – Balancing reliability with release velocity.
148. Operational Runbooks – Step-by-step incident response guides.
149. Audit Logging – Compliance and forensic traceability.
150. RUM (Real User Monitoring) – Client-side performance monitoring.
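Error Budgets above follow directly from the SLO: the budget is the fraction of the window the SLO allows to be bad. This sketch assumes a time-based availability SLI over a 30-day window; request-based SLIs count bad requests instead of minutes. Function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Allowed 'bad' minutes in the window for an availability SLO such as 0.999."""
    return (1 - slo) * window_days * 24 * 60

def budget_remaining(slo: float, bad_minutes: float, window_days: float = 30) -> float:
    """Fraction of the budget still unspent (negative means the SLO is breached)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget

for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} over 30 days -> {error_budget_minutes(slo):.1f} min")
# 99.00% over 30 days -> 432.0 min
# 99.90% over 30 days -> 43.2 min
# 99.99% over 30 days -> 4.3 min
```

A team that has already spent most of its budget would typically slow releases until reliability recovers.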
API & Application Layer
151. REST vs gRPC vs GraphQL – API styles and trade-offs.
152. Federated GraphQL – Unified schema across microservices.
153. Pagination – Offset vs cursor-based fetching.
154. Idempotent APIs – Ensuring safe retries.
155. API Versioning – Managing backward compatibility.
156. Long-Polling vs WebSockets vs SSE – Realtime communication techniques.
157. WebRTC (Web Real-Time Communication) – Peer-to-peer streaming.
158. TURN/STUN Servers – NAT traversal for WebRTC.
159. gRPC Streaming – Client, server, and bidirectional streaming RPCs.
160. API Contract Testing – Ensuring compatibility across versions.
161. API Aggregators – Single API that consolidates multiple backend calls.
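Cursor-based pagination from the list above can be sketched as "give me items after this ID" with an opaque cursor. This sketch filters an in-memory list sorted by a unique id; in a database the same idea is WHERE id > :after ORDER BY id LIMIT :limit, which an index makes cheap. The cursor encoding is an assumption.

```python
import base64
import json
from typing import Optional

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]

def list_items(items: list[dict], limit: int,
               cursor: Optional[str] = None) -> tuple[list[dict], Optional[str]]:
    """Return one page of items (sorted by unique 'id') plus the next page's cursor."""
    after = decode_cursor(cursor) if cursor else None
    remaining = [it for it in items if after is None or it["id"] > after]
    page = remaining[:limit]
    # Only hand out a cursor when more items exist, so clients never get an empty page
    next_cursor = encode_cursor(page[-1]["id"]) if len(remaining) > limit else None
    return page, next_cursor

items = [{"id": i, "title": f"post {i}"} for i in range(1, 8)]
page, cursor = list_items(items, limit=3)
pages = [page]
while cursor:
    page, cursor = list_items(items, limit=3, cursor=cursor)
    pages.append(page)
print([[it["id"] for it in p] for p in pages])  # [[1, 2, 3], [4, 5, 6], [7]]
```

Unlike offset pagination, inserts before the cursor do not shift later pages.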
Security & Privacy
162. AuthN vs AuthZ – Authentication vs Authorization.
163. OAuth 2.0, OpenID Connect, SAML – Delegated authorization, identity layer on OAuth 2.0, enterprise SSO.
164. JWT (JSON Web Tokens) – Stateless signed authentication tokens.
165. HMAC / Digital Signatures – Ensuring integrity and authenticity.
166. Zero Trust Architecture – No implicit trust, continuous verification.
167. Data Encryption – Encrypting at rest & in transit.
168. PII Handling & GDPR – Privacy-compliant data processing.
169. WAF (Web Application Firewall) – Protecting against web exploits.
170. mTLS (Mutual TLS) – Service-to-service authentication.
171. Secrets Management – Securing API keys/passwords (Vault/KMS).
172. Differential Privacy (DP) – Protecting individual data in analytics.
173. Secure Multiparty Computation (SMPC) – Joint computation without exposing inputs.
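HMAC, listed above, signs a message with a shared secret so the receiver can check integrity and authenticity; unlike digital signatures it gives no non-repudiation, because both sides hold the key. A minimal sketch with Python's standard hmac module; the secret and payload are hypothetical.

```python
import hashlib
import hmac

def sign(secret: bytes, message: bytes) -> str:
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret: bytes, message: bytes, signature: str) -> bool:
    # compare_digest runs in constant time, so timing doesn't leak how much matched
    return hmac.compare_digest(sign(secret, message), signature)

secret = b"shared-webhook-secret"  # hypothetical; load from a secrets manager in practice
body = b'{"event": "payment.succeeded", "amount": 4200}'
sig = sign(secret, body)
print(verify(secret, body, sig))                             # True
print(verify(secret, body.replace(b"4200", b"9999"), sig))   # False
```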
Cost & Efficiency
174. Cost-Aware Design – Balancing performance vs infra costs.
175. Resource Quotas – Enforcing per-service limits.
176. Autoscaling Policies – Reactive vs predictive scaling.
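A reactive autoscaling policy can be sketched as target tracking: scale replicas in proportion to how far a metric is from its target. This mirrors the shape of the formula Kubernetes' Horizontal Pod Autoscaler documents (desired = ceil(current × metric / target), with a tolerance band); the defaults here are assumptions, and a predictive policy would feed a forecast metric instead.

```python
import math

def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Target-tracking scale decision, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:  # close enough to target: don't flap
        return current
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(4, 90, 60))   # 6: 90% CPU vs 60% target
print(desired_replicas(4, 63, 60))   # 4: within tolerance
print(desired_replicas(10, 20, 60))  # 4: scale in
```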
AI/ML Integration
177. Online vs Offline Model Serving – Real-time vs batch inference.
178. Shadow Model Deployment – Testing new ML models in production invisibly.
179. Feedback Loops – Handling model drift & retraining.
180. Ranking Infrastructure – Personalized content delivery systems.
181. Precomputation Pipelines – Pre-ranked feed & recommendation pipelines.
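Shadow Model Deployment above means the user always gets the primary model's answer while the candidate model's output is only recorded for comparison. This is a synchronous sketch for clarity (real systems usually run the shadow call asynchronously so it adds no latency); the models, threshold, and names are hypothetical.

```python
import logging
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]

def serve_with_shadow(features: Sequence[float], primary: Model, shadow: Model,
                      disagreements: list, threshold: float = 0.1) -> float:
    """Return the primary score; record cases where the shadow model differs a lot."""
    result = primary(features)
    try:
        candidate = shadow(features)
        if abs(candidate - result) > threshold:
            disagreements.append((tuple(features), result, candidate))
    except Exception:
        # A shadow failure must never affect the user-facing response
        logging.getLogger(__name__).warning("shadow model failed", exc_info=True)
    return result

def old_model(f: Sequence[float]) -> float:
    return 0.5 * f[0]

def new_model(f: Sequence[float]) -> float:
    return 0.5 * f[0] + 0.2 * f[1]

diffs: list = []
scores = [serve_with_shadow(x, old_model, new_model, diffs) for x in [(1.0, 0.0), (1.0, 1.0)]]
print(scores)      # [0.5, 0.5]
print(len(diffs))  # 1
```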
Real-Time & Media
183. Real-Time Media Optimization (ABR) – Adaptive bitrate streaming.
184. State Synchronization – Keeping shared state consistent across users.
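Adaptive bitrate streaming picks the next video rendition from recent network measurements. This sketch shows one common throughput-based heuristic (harmonic mean of recent samples times a safety margin); the bitrate ladder and the 0.8 margin are assumptions, and buffer-based ABR algorithms are a different approach not shown here.

```python
BITRATE_LADDER_KBPS = [235, 750, 1750, 4300, 7500]  # hypothetical rendition ladder

def choose_bitrate(throughput_samples_kbps: list[float], safety: float = 0.8) -> int:
    """Highest rendition at or below (harmonic-mean throughput x safety margin)."""
    if not throughput_samples_kbps:
        return BITRATE_LADDER_KBPS[0]  # no data yet: start at the lowest rendition
    n = len(throughput_samples_kbps)
    # The harmonic mean is dominated by slow samples, so one fast burst won't upswitch
    estimate = n / sum(1 / s for s in throughput_samples_kbps)
    eligible = [b for b in BITRATE_LADDER_KBPS if b <= estimate * safety]
    return eligible[-1] if eligible else BITRATE_LADDER_KBPS[0]

print(choose_bitrate([6000, 6000, 6000]))    # 4300
print(choose_bitrate([10000, 10000, 1000]))  # 1750 (an arithmetic mean would pick 4300)
print(choose_bitrate([200]))                 # 235
```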
Meta-Specific
185. Planet-Scale Systems – Billions of users, multi-region infra.
186. Real-Time Features – Chats, notifications, live feeds.
187. Offline Sync & Conflict Resolution – Reliable offline-first systems.
188. Content Moderation Pipelines – Human + ML workflows for safety.
189. Experimentation Frameworks – A/B testing at scale.
190. Feed Ranking & Personalization – ML-driven recommendation infra.
191. Feature Gating – Controlled rollout by user, geography, or cohort.
192. Privacy by Design – Integrating privacy at system-architecture level.
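For Offline Sync & Conflict Resolution, a simple starting point is a per-field last-writer-wins merge when a device reconnects. This sketch assumes each field carries a (value, timestamp, device_id) triple with hypothetical logical timestamps; LWW silently discards one of two concurrent edits and is sensitive to clock skew, which is why CRDTs or hybrid logical clocks are often used instead.

```python
from typing import Any

# Each field stores (value, timestamp, device_id); device_id breaks timestamp ties
Record = dict[str, tuple[Any, int, str]]

def merge_records(local: Record, remote: Record) -> Record:
    """Per-field last-writer-wins merge of edits made offline on different devices."""
    merged: Record = {}
    for name in local.keys() | remote.keys():
        candidates = [r[name] for r in (local, remote) if name in r]
        # Newest timestamp wins; on a tie, the higher device_id wins (arbitrary but consistent)
        merged[name] = max(candidates, key=lambda entry: (entry[1], entry[2]))
    return merged

phone: Record = {"title": ("Trip plan", 100, "phone"), "notes": ("Book hotel", 105, "phone")}
laptop: Record = {"title": ("Trip plan v2", 103, "laptop"), "notes": ("Book flights", 101, "laptop")}
merged = merge_records(phone, laptop)
print({name: v[0] for name, v in sorted(merged.items())})
# {'notes': 'Book hotel', 'title': 'Trip plan v2'}
```

Because the merge is commutative, both devices converge to the same record regardless of sync order.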