System Design CheatSheet
This quick reference sheet I made covers the most important system design concepts, making it easy to review key points right before your interview.
Each entry below lists: Use Cases/Problems, Questions, the System Design Component, What it solves, Caveats/Issues, Mitigations, and Examples of Tools.

API Gateway
Use Cases/Problems:
- Unified API access: Centralizes client requests.
- Security: Manages authentication and authorization.
Questions:
- Design an API gateway for microservices.
- Implement secure and scalable API access.
What it solves: Single entry point; manages authentication and routing (see the sketch below).
Caveats/Issues: Can become a bottleneck; adds latency.
Mitigations:
- Use multiple gateways with load balancing.
- Implement rate limiting and caching.
- Use circuit breakers and retries.
Examples of Tools: Kong, Apigee, AWS API Gateway

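A minimal sketch of the gateway's routing-plus-authentication role, assuming a hypothetical route table, token set, and backend names; real gateways such as Kong or AWS API Gateway express this through configuration rather than code.

```python
# Minimal API gateway sketch: authenticate, then route by path prefix.
# The route table, token, and backend names are illustrative only.
ROUTES = {
    "/users": "http://user-service:8080",     # hypothetical backend
    "/orders": "http://order-service:8080",   # hypothetical backend
}
VALID_TOKENS = {"secret-demo-token"}          # stand-in for a real auth provider

def handle(path: str, token: str) -> str:
    """Return the upstream to forward to, or an error marker."""
    if token not in VALID_TOKENS:
        return "401 Unauthorized"
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return f"forward to {backend}{path}"
    return "404 Not Found"

if __name__ == "__main__":
    print(handle("/users/42", "secret-demo-token"))   # forward to user-service
    print(handle("/orders/7", "wrong-token"))         # 401 Unauthorized
```
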
Load Balancer
Use Cases/Problems:
- High traffic websites: Ensures uptime and balances load.
- Scalable APIs: Distributes incoming requests.
Questions:
- Design a scalable web application.
- Build a highly available online service.
What it solves: Distributes traffic across multiple redundant workers; improves reliability and availability.
Caveats/Issues: Single point of failure; adds complexity.
Mitigations:
- Use multiple load balancers in different regions.
- Implement health checks (see the sketch below).
- Use DNS-based load balancing.
Examples of Tools: Nginx, HAProxy, AWS ELB

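A minimal sketch of health-checked round-robin selection, assuming a static worker list and a hard-coded health map; in practice the health state would come from periodic probes, as Nginx or HAProxy do.

```python
import itertools

# Round-robin load balancer sketch with a trivial health check.
# Worker addresses and the health map are illustrative only.
WORKERS = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]
HEALTHY = {"10.0.0.1:80": True, "10.0.0.2:80": False, "10.0.0.3:80": True}

_rr = itertools.cycle(WORKERS)

def pick_worker() -> str:
    """Return the next healthy worker, skipping ones that fail the health check."""
    for _ in range(len(WORKERS)):
        worker = next(_rr)
        if HEALTHY.get(worker, False):
            return worker
    raise RuntimeError("no healthy workers available")

if __name__ == "__main__":
    for _ in range(4):
        print(pick_worker())   # rotates over the two healthy workers
```
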
SQL Database
Use Cases/Problems:
- Financial transactions: Requires ACID compliance.
- Complex queries: Needs structured and relational data.
Questions:
- Design a financial transaction system.
- Create a scalable relational database.
What it solves: Strong ACID properties, structured data, complex queries (see the transaction sketch below).
Caveats/Issues: Limited scalability; schema management.
Mitigations:
- Implement sharding.
- Use read replicas.
- Employ clustering and partitioning.
Examples of Tools: MySQL, PostgreSQL, MS SQL Server

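A small sketch of why ACID matters for a transfer: either both account updates commit or neither does. It uses SQLite from the standard library; table and amounts are made up.

```python
import sqlite3

# ACID sketch: a funds transfer either fully commits or fully rolls back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src: str, dst: str, amount: int) -> None:
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # rolls back both updates
    except ValueError:
        pass  # balances are unchanged after the rollback

transfer("alice", "bob", 30)    # commits
transfer("alice", "bob", 999)   # rolls back
print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'alice': 70, 'bob': 80}
```
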
NoSQL Database
Use Cases/Problems:
- Large-scale data: Supports horizontal scaling.
- Unstructured data: Flexible schema adapts to changes.
Questions:
- Design a large-scale user profile store.
- Create a scalable data storage solution.
What it solves: Flexible schema, horizontal scalability, high performance.
Caveats/Issues: Eventual consistency; limited transaction support.
Mitigations:
- Use consistency settings (e.g., quorum reads/writes; see the sketch below).
- Design for idempotent operations.
- Implement conflict resolution strategies.
Examples of Tools: MongoDB, Cassandra, DynamoDB

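A toy illustration of quorum reads and writes over three in-memory "replicas", assuming N=3, W=2, R=2; it only shows the overlap argument (W + R > N), not real replication as Cassandra or DynamoDB implement it.

```python
# Quorum read/write sketch over three in-memory "replicas" (N=3, W=2, R=2).
# Because W + R > N, a read quorum always overlaps the latest write quorum.
REPLICAS = [dict(), dict(), dict()]
N, W, R = 3, 2, 2

def write(key, value, version):
    acks = 0
    for replica in REPLICAS:
        replica[key] = (version, value)   # a real system may miss some replicas
        acks += 1
        if acks >= W:
            break                          # succeed once a write quorum acknowledges
    return acks >= W

def read(key):
    votes = [replica[key] for replica in REPLICAS[:R] if key in replica]
    if not votes:
        return None
    return max(votes, key=lambda v: v[0])[1]   # the highest version wins

write("user:1", {"name": "Ada"}, version=1)
write("user:1", {"name": "Ada Lovelace"}, version=2)
print(read("user:1"))   # {'name': 'Ada Lovelace'}
```
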
Data Replication
Use Cases/Problems:
- High availability: Ensures data is replicated and available.
Questions:
- Design a data replication strategy.
- Implement a highly available database system.
What it solves: Ensures data durability and system availability.
Caveats/Issues: Increases costs; consistency issues.
Mitigations:
- Use asynchronous replication.
- Implement conflict resolution.
- Use multi-master replication.
Examples of Tools: AWS RDS standby (synchronous), AWS RDS Read Replicas (asynchronous), MongoDB Replica Set (asynchronous)

Cache
Use Cases/Problems:
- High read load: Reduces latency for frequent reads.
- Session storage: Speeds up access to session data.
Questions:
- Design a high-performance caching layer.
- Optimize a read-heavy workload.
What it solves: Reduces latency; decreases load on databases.
Caveats/Issues: Cache consistency issues; potential for stale data.
Mitigations:
- Implement cache invalidation strategies.
- Use Time-to-Live (TTL) settings (see the sketch below).
- Employ write-through or write-back caching.
Examples of Tools: Redis, Memcached

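A minimal cache-aside sketch with a per-entry TTL, assuming a simulated database call; with Redis or Memcached the dictionary would simply be replaced by the cache server.

```python
import time

# Cache-aside sketch with a per-entry TTL; the "database" call is simulated.
CACHE = {}          # key -> (expires_at, value)
TTL_SECONDS = 30

def fetch_from_database(key):
    return f"row-for-{key}"        # stand-in for a slow database query

def get(key):
    entry = CACHE.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                                   # cache hit, still fresh
    value = fetch_from_database(key)                      # miss or stale: go to the database
    CACHE[key] = (time.monotonic() + TTL_SECONDS, value)  # repopulate with a fresh TTL
    return value

print(get("user:42"))   # miss, loads from the database
print(get("user:42"))   # hit, served from the cache until the TTL expires
```
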
In-Memory Database
Use Cases/Problems:
- Real-time analytics: Requires fast data access.
- Leaderboards: High-speed data retrieval is crucial (see the sketch below).
Questions:
- Design a real-time analytics system.
- Create a fast leaderboard service.
What it solves: Extremely fast data retrieval; reduces latency.
Caveats/Issues: Volatile storage; high memory cost.
Mitigations:
- Enable persistence options.
- Use hybrid storage models (in-memory + disk).
Examples of Tools: Redis, Memcached

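A toy leaderboard sketch kept entirely in memory, with made-up players and scores; in a real design a Redis sorted set would typically play this role.

```python
import heapq

# Leaderboard sketch: scores live in a plain in-memory dict; the top-N query
# is answered with a heap. Player names and scores are made up.
scores = {"ana": 120, "bo": 340, "cy": 95, "di": 500, "eve": 410}

def top(n: int) -> list[tuple[str, int]]:
    """Return the n highest-scoring players, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])

scores["cy"] = 999            # updates are just dict writes, so they are instant
print(top(3))                 # [('cy', 999), ('di', 500), ('eve', 410)]
```
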
Message Broker
Use Cases/Problems:
- Event streaming: Manages high-throughput data streams.
- Real-time processing: Facilitates real-time data flows.
Questions:
- Design a real-time event streaming platform.
- Implement a reliable messaging system.
What it solves: Facilitates message exchange; supports multiple messaging patterns.
Caveats/Issues: Potential bottleneck; delivery guarantees require careful configuration.
Mitigations:
- Use scalable brokers with partitions (see the sketch below).
- Implement backpressure handling.
- Monitor message broker performance.
Examples of Tools: Apache Kafka, RabbitMQ, ActiveMQ

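A small sketch of key-based partitioning, the idea behind scaling a broker with partitions: messages with the same key land in the same partition, so per-key ordering survives while partitions scale out. The partition count and keys are illustrative, not Kafka's actual implementation.

```python
import hashlib
from collections import defaultdict

# Partitioned-topic sketch: hash the key to pick a partition.
NUM_PARTITIONS = 4
partitions = defaultdict(list)   # partition number -> list of messages

def publish(key: str, payload: str) -> int:
    p = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS
    partitions[p].append((key, payload))
    return p

for event in ["created", "paid", "shipped"]:
    print("order-17 ->", publish("order-17", event))   # always the same partition
print("order-99 ->", publish("order-99", "created"))   # may land in another partition
```
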
Distributed Queue
Use Cases/Problems:
- Event-driven systems: Manages asynchronous events.
- Microservices: Decouples service communication.
Questions:
- Design an event-driven architecture.
- Create a reliable task processing system.
What it solves: Manages asynchronous communication; decouples components.
Caveats/Issues: Message ordering and delivery guarantees are hard to ensure.
Mitigations:
- Use message brokers with strong ordering guarantees.
- Implement idempotent message processing (see the sketch below).
- Use message deduplication techniques.
Examples of Tools: Apache Kafka, RabbitMQ, AWS SQS

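A minimal sketch of an idempotent consumer, assuming messages carry an id and may be redelivered (at-least-once delivery, as with SQS or RabbitMQ); the in-memory set stands in for durable dedup storage.

```python
# Idempotent consumer sketch: each message id is applied at most once, so
# broker redeliveries do not double-apply effects.
processed_ids = set()      # in production this would live in durable storage
balance = 0

def handle(message: dict) -> None:
    global balance
    if message["id"] in processed_ids:
        return                       # duplicate delivery: safely ignored
    balance += message["amount"]
    processed_ids.add(message["id"])

deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m1", "amount": 10},      # redelivered duplicate
    {"id": "m2", "amount": 5},
]
for msg in deliveries:
    handle(msg)
print(balance)   # 15, not 25
```
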
Microservices
Use Cases/Problems:
- Large applications: Enhances modularity and scalability.
- Continuous delivery: Facilitates independent deployment.
Questions:
- Design a scalable microservices architecture.
- Build a modular, independently deployable system.
What it solves: Improves modularity; independent deployment.
Caveats/Issues: Increased communication complexity.
Mitigations:
- Use service meshes.
- Implement standardized APIs.
- Use centralized logging and monitoring.
Examples of Tools: Docker, Kubernetes, Istio

Service Registry
Use Cases/Problems:
- Microservices: Enables service discovery.
- Dynamic environments: Tracks changing service instances.
Questions:
- Design a service discovery mechanism.
- Implement dynamic service registration.
What it solves: Tracks services and their instances.
Caveats/Issues: High availability required; consistency issues.
Mitigations:
- Use distributed service registries.
- Implement regular health checks (see the sketch below).
Examples of Tools: Consul, Eureka, Zookeeper

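A minimal registry sketch with heartbeat-based expiry, assuming made-up service names and addresses; Consul or Eureka provide the same register/heartbeat/lookup cycle as a service.

```python
import time

# Service registry sketch: instances register and heartbeat; lookups only
# return instances whose heartbeat is recent enough.
HEARTBEAT_TTL = 10          # seconds an instance stays listed without a heartbeat
registry = {}               # (service, address) -> last heartbeat timestamp

def heartbeat(service: str, address: str) -> None:
    registry[(service, address)] = time.monotonic()   # register or refresh

def lookup(service: str) -> list[str]:
    now = time.monotonic()
    return [addr for (svc, addr), seen in registry.items()
            if svc == service and now - seen < HEARTBEAT_TTL]

heartbeat("payments", "10.0.0.5:9000")
heartbeat("payments", "10.0.0.6:9000")
print(lookup("payments"))   # both instances while their heartbeats are fresh
```
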
File Storage
Use Cases/Problems:
- Media storage: Handles large files like images and videos.
- Backup solutions: Stores and retrieves backups.
Questions:
- Design a scalable file storage system.
- Implement a reliable backup solution.
What it solves: Scales with data growth; handles unstructured data.
Caveats/Issues: Backup and redundancy required; retrieval latency.
Mitigations:
- Use distributed file systems.
- Implement multi-region replication.
- Use lifecycle policies for data management.
Examples of Tools: AWS S3, Google Cloud Storage, HDFS

ETL Pipeline
Use Cases/Problems:
- Data warehousing: Prepares data for analysis.
- Data migration: Transforms data from multiple sources.
Questions:
- Design an ETL pipeline for a data warehouse.
- Build a reliable data integration system.
What it solves: Facilitates data integration and analysis.
Caveats/Issues: Complex to build and maintain.
Mitigations:
- Use managed ETL services.
- Implement monitoring and error handling (see the sketch below).
Examples of Tools: Apache Nifi, AWS Glue, Talend

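A tiny extract-transform-load sketch with per-row error handling, assuming an inline CSV source and a SQLite target; managed services like AWS Glue wrap the same three stages.

```python
import csv, io, sqlite3

# ETL sketch: extract rows from CSV, transform (clean and type them), and load
# into SQLite, counting rows that fail validation instead of failing the run.
raw_csv = "name,amount\nalice,10\nbob,notanumber\ncarol,7\n"   # stand-in for a source file

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (name TEXT, amount INTEGER)")

errors = 0
for row in csv.DictReader(io.StringIO(raw_csv)):                      # extract
    try:
        record = (row["name"].strip().lower(), int(row["amount"]))    # transform
    except ValueError:
        errors += 1                                                    # route bad rows aside
        continue
    db.execute("INSERT INTO payments VALUES (?, ?)", record)          # load
db.commit()

print(db.execute("SELECT COUNT(*) FROM payments").fetchone()[0], "rows loaded,", errors, "rejected")
```
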
Authentication Service
Use Cases/Problems:
- Secure applications: Manages user identity and access.
- Single sign-on: Centralizes authentication across services.
Questions:
- Design a secure authentication system.
- Implement a single sign-on solution.
What it solves: Enhances security; manages user authentication (see the token sketch below).
Caveats/Issues: Single point of failure; security measures needed.
Mitigations:
- Use multi-factor authentication.
- Implement redundancy and failover.
Examples of Tools: OAuth, Okta, Auth0

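A minimal sketch of how a central service can issue and verify signed, expiring tokens, using only the standard library and an illustrative secret; providers like Okta or Auth0 do this with standardized formats such as JWT rather than this hand-rolled layout.

```python
import base64, hashlib, hmac, json, time

# Signed-token sketch (HMAC): issue a token with an expiry, later verify the
# signature and freshness. The secret key is illustrative only.
SECRET = b"demo-secret-change-me"

def issue(user: str, ttl: int = 3600) -> str:
    payload = json.dumps({"sub": user, "exp": time.time() + ttl}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload).decode() + "." + base64.urlsafe_b64encode(sig).decode()

def verify(token: str):
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        return None                                # tampered or signed with another key
    claims = json.loads(payload)
    return claims if claims["exp"] > time.time() else None   # reject expired tokens

print(verify(issue("alice")))          # {'sub': 'alice', 'exp': ...}
print(verify(issue("bob", ttl=-1)))    # None (already expired)
```
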
Use Cases/Problems:
- Monitoring: Provides instant insights from data streams.
Questions:
- Implement a live data aggregation platform.
Mitigations:
- Implement windowing and aggregation techniques (see the sketch below).
- Monitor and scale processing infrastructure.

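A small sketch of tumbling-window aggregation, the basic shape behind live data aggregation; the one-minute window size and the event list are made up.

```python
from collections import defaultdict

# Tumbling-window sketch: events are bucketed into fixed 60-second windows and
# counted per window.
WINDOW_SECONDS = 60

def window_start(ts: float) -> int:
    return int(ts // WINDOW_SECONDS) * WINDOW_SECONDS

counts = defaultdict(int)
events = [(0, "click"), (12, "click"), (61, "click"), (130, "click"), (131, "click")]
for ts, _event in events:
    counts[window_start(ts)] += 1

print(dict(counts))   # {0: 2, 60: 1, 120: 2} clicks per one-minute window
```
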
Distributed Tracing
Use Cases/Problems:
- Microservices: Tracks requests across services.
- Performance tuning: Identifies bottlenecks and delays.
Questions:
- Design a distributed tracing system.
- Implement performance monitoring for microservices.
What it solves: Aids in debugging and performance monitoring.
Caveats/Issues: High overhead; integration effort required.
Mitigations:
- Use sampling to reduce overhead.
- Implement efficient trace storage.
- Use correlation IDs for request tracking (see the sketch below).
Examples of Tools: Jaeger, Zipkin, OpenTracing

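A minimal correlation-id sketch using standard-library logging and context variables, assuming the id is generated locally; tools like Jaeger or Zipkin extend this idea into full spans and trace propagation.

```python
import logging
import uuid
from contextvars import ContextVar

# Correlation-id sketch: every log line emitted while handling one request
# carries the same short id, so the request can be followed by that id.
request_id: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get()   # stamp the current id onto each record
        return True

logging.basicConfig(format="%(request_id)s %(name)s: %(message)s", level=logging.INFO)
logging.getLogger().handlers[0].addFilter(RequestIdFilter())
log = logging.getLogger("checkout")

def handle_request():
    request_id.set(uuid.uuid4().hex[:8])       # typically assigned once at the edge
    log.info("reserving inventory")
    log.info("charging card")                  # both lines share this request's id

handle_request()
handle_request()                               # a different id for the second request
```
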
Circuit Breaker
Use Cases/Problems:
- Fault tolerance: Prevents system overloads.
- Resilient services: Isolates failures in microservices.
Questions:
- Design a fault-tolerant microservices system.
- Implement circuit breakers for service reliability.
What it solves: Protects services from cascading failures (see the sketch below).
Caveats/Issues: Adds complexity; tuning needed.
Mitigations:
- Use monitoring tools to detect failures.
- Implement fallback strategies.
- Use retries and exponential backoff.
Examples of Tools: Hystrix, Resilience4j, Istio

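A minimal circuit-breaker sketch with a fallback, assuming illustrative threshold and cooldown values; libraries like Resilience4j add half-open probing, metrics, and thread isolation on top of this basic state machine.

```python
import time

# Circuit-breaker sketch: after a few consecutive failures the circuit opens and
# calls fail fast with a fallback; after a cooldown one trial call is let through.
FAILURE_THRESHOLD = 3
COOLDOWN_SECONDS = 30

class CircuitBreaker:
    def __init__(self):
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < COOLDOWN_SECONDS:
                return fallback()        # open: skip the failing dependency entirely
            self.opened_at = None        # cooldown over: allow one trial call (half-open)
        try:
            result = fn()
            self.failures = 0            # a success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            if self.failures >= FAILURE_THRESHOLD:
                self.opened_at = time.monotonic()
            return fallback()

def flaky_dependency():
    raise TimeoutError("downstream timeout")

breaker = CircuitBreaker()
for _ in range(5):
    print(breaker.call(flaky_dependency, fallback=lambda: "cached response"))
```
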
Rate Limiter
Use Cases/Problems:
- API management: Protects against request floods.
- Fair resource allocation: Ensures fair usage policies.
Questions:
- Design an API rate limiting system.
- Implement a fair resource allocation mechanism.
What it solves: Controls request rate; prevents abuse.
Caveats/Issues: Can impact user experience.
Mitigations:
- Use dynamic rate limiting.
- Implement user-based quotas (see the token-bucket sketch below).
- Use monitoring to adjust limits.
Examples of Tools: Kong, Envoy, Nginx

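A minimal per-client token-bucket sketch; the rate and burst values are illustrative, and gateways like Kong or Envoy provide equivalent behavior through configuration.

```python
import time

# Token-bucket sketch: each client's bucket refills at a steady rate; a request
# is allowed only if a token is available, which caps both burst and sustained rate.
RATE_PER_SECOND = 5       # refill rate
BURST = 10                # bucket capacity

buckets = {}              # client id -> (tokens, last refill time)

def allow(client: str) -> bool:
    tokens, last = buckets.get(client, (BURST, time.monotonic()))
    now = time.monotonic()
    tokens = min(BURST, tokens + (now - last) * RATE_PER_SECOND)   # refill since last check
    if tokens >= 1:
        buckets[client] = (tokens - 1, now)
        return True
    buckets[client] = (tokens, now)
    return False

print(sum(allow("client-a") for _ in range(20)))   # roughly 10 allowed in a tight burst
```
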
Scheduler
Use Cases/Problems:
- Periodic tasks: Automates recurring jobs.
- Batch processing: Manages large data processing tasks.
Questions:
- Design a job scheduling system.
- Implement a reliable task processing system.
What it solves: Manages background jobs and tasks (see the sketch below).
Caveats/Issues: Requires monitoring; can become a bottleneck.
Mitigations:
- Use distributed schedulers.
- Implement job prioritization.
- Use monitoring and retry mechanisms.
Examples of Tools: Apache Airflow, Celery, Kubernetes CronJobs

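A single-process scheduling sketch with made-up job names and intervals: due jobs are kept in a min-heap keyed by next run time. Distributed schedulers like Airflow add persistence, retries, and worker fan-out around the same core loop.

```python
import heapq, time

# Scheduler sketch: jobs live in a min-heap ordered by next run time; due jobs
# are popped, "run", and pushed back with their next occurrence.
def make_job(name, interval):
    return (time.monotonic(), interval, name)          # (next_run, interval, name)

queue = [make_job("send-report", 60), make_job("cleanup-temp", 300)]
heapq.heapify(queue)

def run_pending():
    while queue and queue[0][0] <= time.monotonic():
        next_run, interval, name = heapq.heappop(queue)
        print("running", name)                          # a real scheduler would retry on failure
        heapq.heappush(queue, (next_run + interval, interval, name))

run_pending()    # both jobs are due immediately on the first pass
```
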
Service Mesh
Use Cases/Problems:
- Microservices: Handles inter-service communication.
- Observability: Provides insights into service interactions.
Questions:
- Design a service mesh for microservices.
- Implement observability for service interactions.
What it solves: Manages microservices communication.
Caveats/Issues: Adds operational complexity.
Mitigations:
- Use managed service meshes.
- Implement automation tools.
- Use monitoring and observability tools.
Examples of Tools: Istio, Linkerd, Consul Connect

Data Backup and Recovery
Use Cases/Problems:
- Disaster recovery: Ensures data is safe and recoverable.
- Data integrity: Maintains backups for compliance.
Questions:
- Design a backup and recovery system.
- Implement a reliable disaster recovery solution.
What it solves: Ensures data durability; protects against data loss.
Caveats/Issues: Resource-intensive; regular testing needed; increases costs. If backups are not up to date there will be data loss, and backups may not be accessible when needed.
Mitigations:
- Use automated backup solutions.
- Implement multi-region storage.
- Regularly test backup and recovery processes.
Examples of Tools: Native backup capabilities of the data store, or centralized backup products like AWS Backup, Google Cloud Backup, Veeam

Graph Database
Use Cases/Problems:
- Social networks: Models complex relationships.
- Recommendation engines: Analyzes connected data (see the sketch below).
Questions:
- Design a social network graph database.
- Implement a recommendation engine.
What it solves: Efficiently handles graph-based data and relationships.
Caveats/Issues: Steep learning curve; inefficient for non-graph queries.
Mitigations:
- Use graph-specific optimizations.
- Implement hybrid models for different data types.
- Use indexing and caching for performance.
Examples of Tools: Neo4j, Amazon Neptune, OrientDB

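A toy friends-of-friends recommendation over an adjacency list with made-up names; it shows the kind of relationship traversal a graph database such as Neo4j answers in a single query.

```python
from collections import defaultdict

# Graph sketch: rank people two hops away by how many mutual friends they share.
friends = defaultdict(set)
for a, b in [("ana", "bo"), ("bo", "cy"), ("ana", "di"), ("di", "cy"), ("cy", "eve")]:
    friends[a].add(b)
    friends[b].add(a)

def recommend(user: str) -> list[str]:
    """People two hops from user, ordered by number of mutual friends."""
    scores = defaultdict(int)
    for friend in friends[user]:
        for fof in friends[friend]:
            if fof != user and fof not in friends[user]:
                scores[fof] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ana"))   # ['cy']: cy shares two mutual friends with ana
```
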
Data Lake
Use Cases/Problems:
- Big data analytics: Stores and processes vast data.
- Data warehousing: Prepares raw data for analytics.
Questions:
- Design a big data analytics platform.
- Implement a data lake for diverse data types.
What it solves: Supports diverse data types and analytics.
Caveats/Issues: Governance required; risk of becoming a data swamp.
Mitigations:
- Use metadata management.
- Implement data cataloging.
- Use data lifecycle policies.
Examples of Tools: AWS Lake Formation, Azure Data Lake, Hadoop

Data Streaming Platform
Use Cases/Problems:
- Event-driven architectures: Processes data streams in real-time.
Questions:
- Design a real-time data streaming system.
What it solves: Facilitates real-time data processing.
Caveats/Issues: High operational complexity.
Mitigations:
- Use managed streaming services.