
Systems Design Interview Study Guide - rev 1.

Zhu, Ray

Designing Distributed Systems, Step by Step


1. Ask clarifying questions
a. Examples for designing a Twitter-like service:
i. Will users be able to post tweets and follow other people?
ii. Should we also design to create and display the user’s timeline?
iii. Will tweets contain photos and videos?
iv. Do we need to display hot trending topics?
v. Will there be push notifications?
2. Back of the envelope estimation
a. What scale is expected from the system? (e.g., number of new tweets, number of tweet views,
number of timeline generations/second, etc.)
b. How much storage will be needed? We will have different storage requirements if users can
have photos and videos in their tweets
c. What network bandwidth usage are we expecting? This is crucial in deciding how we will
manage traffic and balance load between servers
3. System interface/API definition
a. Some examples for the Twitter service:
i. postTweet(user_id, tweet_data, tweet_location, user_location,
timestamp, …)
ii. generateTimeline(user_id, current_time, user_location, …)
iii. markTweetFavorite(user_id, tweet_id, timestamp, …)
4. Defining the data model
a. Examples for Twitter Service:
i. User: UserID, Name, Email, DoB, CreationDate, LastLogin, etc.
ii. Tweet: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc.
iii. UserFollows: UserID1, UserID2
iv. FavoriteTweets: UserID, TweetID, TimeStamp
b. What database solution should we use?
5. High level design, usually a block diagram detailing the data flow between sections of the application
6. Detailed design
a. Dig deeper into two or three major components
b. Should be able to present different approaches and their pros and cons, and why we prefer
one approach vs the other
c. Examples:
i. Since we are storing a lot of data, how should we partition it to distribute it to multiple
DBs?
ii. How will we handle hot users who tweet a lot or follow lots of people?
iii. Since users’ timelines will contain the most recent and relevant tweets, should we try to
store our data in a way that is optimized for scanning the latest tweets?
iv. How much and at which layer should we introduce cache to speed things up?
v. What components need better load balancing?
7. Identifying and resolving bottlenecks
a. Try to discuss as many as possible and different approaches to mitigating them
i. Is there a single point of failure on the system? What are we doing to mitigate it?
ii. Do we have enough replicas of the data so that if we lose a few servers, we can still
serve our users?
iii. Do we have enough copies of different services such that a few failures will not cause
complete system failure?
iv. How are we monitoring the performance of the services? Do we get alerts when critical
components fail or their performance degrades?

Systems Design Examples and Questions

1. Design a URL Shortening service like TinyURL


a. Requirements and goals of the system
i. What are the functional requirements?
1. Given a URL, our service should generate a shorter and unique alias for it.
2. When users access the short link, it should redirect to the original URL
3. Users should optionally be able to pick a custom link for their URL
4. Links will expire after a standard default timespan. Users should be able to
specify the expiration time
ii. What are the non-functional requirements?
1. The system needs to be highly available; if our service goes down, all URL
redirections will fail
2. URL redirections should happen in real time with minimal latency
3. Shortened links should not be guessable (not predictable)
iii. Extended requirements
1. Analytics
2. The service should be accessible through a REST API for other services
b. Capacity estimation and constraints
i. This is a read-heavy system: there will be many more redirection requests than new URL
shortenings. We can assume a 100:1 read/write ratio for this example
ii. Traffic estimates
1. Assuming 500M new shortenings per month, with 100:1 r/w ratio, we can expect
100 * 500M => 50B redirections/month
2. What is the QPS for our system? New URL shortenings per second:
a. 500M / (30 days * 24 hours * 3600 sec) => ~200 URLs/s
3. Considering 100:1 r/w ratio, URL redirections/second
a. 100 * 200 URLs/s = 20K/s
iii. Storage estimates
1. Assuming we store URL shortening requests and their links for 5 years
2. Total number of objects we expect to store:
a. 500 million * 5 years * 12 months = 30 billion
3. Total storage needed, assuming an object is 500 bytes:
a. 30 billion * 500 bytes = 15 TB
iv. Bandwidth estimates
v. TODO: finish(?)
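A minimal sketch of the arithmetic above (illustrative Python; the 500M/month write volume, 100:1 read/write ratio, 5-year retention, and 500-byte object size are the assumptions already stated in the estimates):

SECONDS_PER_MONTH = 30 * 24 * 3600

new_urls_per_month = 500_000_000      # assumed write volume
read_write_ratio = 100                # assumed redirects per new shortening
retention_years = 5
object_size_bytes = 500               # assumed average stored object size

writes_per_second = new_urls_per_month / SECONDS_PER_MONTH        # ~193 URLs/s
reads_per_second = writes_per_second * read_write_ratio           # ~19.3K redirects/s
total_objects = new_urls_per_month * 12 * retention_years         # 30 billion
total_storage_tb = total_objects * object_size_bytes / 1e12       # ~15 TB

print(f"{writes_per_second:.0f} writes/s, {reads_per_second:.0f} reads/s, "
      f"{total_objects / 1e9:.0f}B objects, {total_storage_tb:.0f} TB")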

Operating Systems Knowledge


1. Concurrency vs parallelism: what’s the difference?
a. Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It
doesn't necessarily mean they'll ever both be running at the same instant. Eg. multitasking on a single-
core machine.
b. Parallelism is when tasks literally run at the same time, eg. on a multicore processor.
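A minimal sketch of the distinction (illustrative Python; the task and the numbers are arbitrary). The two threads interleave on a single core (concurrency), while the process pool can run work on separate cores at the same instant (parallelism):

import threading
from multiprocessing import Pool

def count(label, n=3):
    for i in range(n):
        print(label, i)            # output from the two threads interleaves

if __name__ == "__main__":
    # Concurrency: the threads take turns; neither needs a second core,
    # yet both make progress in overlapping time periods.
    t1 = threading.Thread(target=count, args=("A",))
    t2 = threading.Thread(target=count, args=("B",))
    t1.start(); t2.start(); t1.join(); t2.join()

    # Parallelism: worker processes can execute on different cores at the same instant.
    with Pool(processes=2) as pool:
        print(pool.starmap(pow, [(2, 10), (3, 10)]))   # [1024, 59049]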
2. What is a thread and what is a process?
a. Process:
● An executing instance of a program is called a process.
● Some operating systems use the term ‘task‘ to refer to a program that is being executed.
● A process is always stored in main memory, also termed primary memory or random
access memory.
● Therefore, a process is termed an active entity. It disappears if the machine is rebooted.
● Several processes may be associated with the same program.
● On a multiprocessor system, multiple processes can be executed in parallel.
● On a uniprocessor system, though true parallelism is not achieved, a process scheduling algorithm
is applied and the processor is scheduled to execute each process one at a time yielding an illusion
of concurrency.
● Example: executing multiple instances of the ‘Calculator’ program. Each instance is a
separate process.

Thread:

● A thread is a subset of the process.


● It is termed a ‘lightweight process’, since it is similar to a real process but executes within the
context of a process and shares the same resources allotted to the process by the kernel.
● Usually, a process has only one thread of control – one set of machine instructions executing at a
time.
● A process may also be made up of multiple threads of execution that execute instructions
concurrently.
● Multiple threads of control can exploit the true parallelism possible on multiprocessor systems.
● On a uniprocessor system, a thread scheduling algorithm is applied and the processor is scheduled
to run each thread one at a time.
● All the threads running within a process share the same address space, file descriptors, and
other process-related attributes, although each thread has its own stack.
● Since the threads of a process share the same memory, synchronizing access to the shared
data within the process becomes critically important.
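A minimal sketch of why that synchronization matters (illustrative Python; the counter, thread count, and iteration count are arbitrary):

import threading

counter = 0                          # shared data: visible to every thread in the process
lock = threading.Lock()

def increment(n=100_000):
    global counter
    for _ in range(n):
        with lock:                   # without the lock, the read-modify-write of the
            counter += 1             # shared counter can interleave and lose updates

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                       # 400000 every time; often lower if the lock is removed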

3. What and where are the stack and heap?


a. The stack is the memory set aside as scratch space for a thread of execution. When a function is called,
a block is reserved on the top of the stack for local variables and some bookkeeping data. When that
function returns, the block becomes unused and can be used the next time a function is called. The
stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the
next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the
stack is nothing more than adjusting one pointer.
b. The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the
allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at
any time. This makes it much more complex to keep track of which parts of the heap are allocated or
free at any given time; there are many custom heap allocators available to tune heap performance for
different usage patterns.

Scaling - Basics and Knowledge


1. Basics of scaling (from an OpenShift blog post)
a. Have DNS direct requests to servers in a manner that doesn’t put too much work on one
b. Eliminate single points of failure, redundancy is key
c. Figure out what compromises are acceptable for your business needs, decide between horizontal or
vertical scaling or both
d. Use a load balancer to distribute requests between workers and app servers
e. Choose a DB schema and technology suited for your application and its workload
f. Use a load balancer for distributing DB I/O, Master/Slave DB Configuration
g. Use a distributed cache that sits between the app server and the DB (e.g., Redis), and put it in a separate
layer from the application tier so that it can scale independently
h. Spread your content between servers, possibly even in different geographical locations (use a CDN)
i. Never store state in the application layer. Use a separate DB so that each node in the cluster can access it
j. Use asynchronous calls between your services. X will scale better if it isn’t tightly coupled to Y
2. Horizontal versus vertical scaling
a. Vertical scaling means increasing the resources of one node. It can be increasing CPU power or RAM
capacity or drive throughput. This form of scaling is limited by technology. As you reach the bleeding
edge of server technology, you will reach a point of severely diminishing returns for the amount of money
you’re spending
b. Horizontal scaling means opening more nodes to do work. The nodes don’t necessarily have to be
bleeding edge or super powerful, but more nodes and more clusters doing work will decrease the work
done on any one server and increase total throughput. This is what most companies do nowadays.
i. Apache Zookeeper - used for distributed management and configuration, synchronization
between hosts in a service. It helps coordinate distributed nodes.
3. CAP Theorem
a. Consistency, Availability, and Partition Tolerance: pick two.
i. Note that the above isn’t strictly true in practice. For real-life applications, it’s nearly
impossible to build a system that is impervious to network partitions, so the system
designer is forced to choose between consistency and availability. In the case of
consistency, the system will shut down or refuse to service requests until the error condition is
resolved, causing momentary unavailability. In the case of availability, the system will continue to
service requests but at the risk of sending outdated data, potentially causing system
misbehavior. Typically, modern services will choose availability over consistency, because
downtime is unacceptable from a business standpoint.
4. Clones
a. Clones contain the same codebase and serve the same services and data to users. The load balancer
distributes workload between clones
b. Golden rule of scalability: every server contains exactly the same codebase and doesn’t store any user-
related data like sessions or profiles on local disk or in memory
c. Sessions should be stored on a centralized data store which is accessible to all application servers in
your system.
i. Can be external DB or external persistent caches like Redis
1. The latter has better performance
d. Deployment issues can be solved with tools like Capistrano; you can also create image files (on AWS this is an
AMI - Amazon Machine Image)
i. Use this as a super clone that new instances are based on
5. Database
a. Scalable apps need to account for huge I/O, and that can be an enormous burden to traditional DBs like
MySQL
b. You have two options:
i. Keep trying to make the relational DB performant, which will eventually become too time
consuming and expensive
ii. Denormalize your DB and continue to use a relational DB in a NoSQL manner or switch to a
NoSQL DB like MongoDB. You will need to introduce a cache.
6. Caching
a. Runs on memory and so has limited capacity.
b. Should never be the source of truth. Cached items can be outdated
c. Only use a cache if you know your system needs it. Can introduce more problems than solutions
d. Caching objects > caching DB queries (see the cache-aside sketch at the end of this item)
i. Store the data like an object or class, makes it easier to delete it later if something changes in
your DB
ii. Makes the overall operation of your code more logical and faster
iii. Makes asynchronous processing possible.
e. Redis, Memcached
f. Caches - Redis vs Memcached (which is better?) StackOverflow Link
i. Read/write speed: Both are extremely fast. Benchmarks vary by workload, versions, and many
other factors but generally show redis to be as fast or almost as fast as memcached.
ii. Memory usage: Redis is better.
1. Redis can reclaim memory and will never use more than its allotted RAM. Memcached
quickly reaches its RAM limit and never reclaims memory
2. “I stored 100,000 ~2KB strings (~200MB) of random sentences into both. Memcached
RAM usage grew to ~225MB. Redis RAM usage grew to ~228MB. After flushing both,
redis dropped to ~29MB and memcached stayed at ~225MB. They are similarly efficient
in how they store data, but only one is capable of reclaiming it.”
iii. Disk I/O dumping: A clear win for redis since it does this by default and has very configurable
persistence. Memcached has no mechanisms for dumping to disk without 3rd party tools.
iv. Scaling: Both give you tons of headroom before you need more than a single instance as a
cache. Redis includes tools to help you go beyond that while memcached does not.
v. Other Points: Redis has: better documentation, bigger community, better persistence and
availability, isn’t limited to strings like Memcached is, atomic transactions, HyperLogLog and
Bitmap, Pipelining commands for greater throughput, support for clustering and high availability
with redis-sentinel
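A minimal cache-aside sketch of "caching objects" (item d above). Illustrative only: it assumes a local Redis instance reachable through the redis-py client, and get_user_from_db is a hypothetical stand-in for a real database query:

import json
import redis                         # third-party redis-py client; assumes Redis on localhost

r = redis.Redis(host="localhost", port=6379)

def get_user_from_db(user_id):
    return {"id": user_id, "name": "example"}      # hypothetical DB accessor

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)                            # 1. try the cache first
    if cached is not None:
        return json.loads(cached)
    user = get_user_from_db(user_id)               # 2. cache miss: go to the database
    r.setex(key, 3600, json.dumps(user))           # 3. cache the whole object with a 1-hour TTL
    return user

def invalidate_user(user_id):
    r.delete(f"user:{user_id}")                    # evict the object when the DB row changes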
7. Load Balancer
a. Runs between clients and nodes to direct requests to nodes in a manner that balances load across
nodes (see: clones). Can run in a round robin manner or through other means.
b. Prevents requests from going to unhealthy servers, prevents overloading resources, helps eliminate
single points of failure
i. Additional Benefits
1. SSL Termination
a. Decrypt incoming requests and encrypt server responses so that servers don’t
need to do these potentially expensive operations. Removes need to install
X.509 Certificates on each server
2. Session Persistence
a. Issue cookies and route a specific client’s requests to same instance if the web
app doesn’t keep track of sessions
c. IRL examples: NGINX, HAProxy, hardware (very expensive)
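A minimal round-robin sketch of item a (illustrative only; real load balancers such as NGINX or HAProxy also perform health checks and remove unhealthy servers from the rotation):

import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]     # assumed backend pool
ring = itertools.cycle(servers)                     # round robin: rotate through the pool

def pick_backend():
    return next(ring)

for _ in range(5):
    print(pick_backend())            # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, 10.0.0.2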
8. Content Delivery Networks (CDN) and Edge
a. CDNs are copies of a service’s assets placed in strategic geographical locations around the world that
service clients based on their location. Can be done via DNS server returning the IP of the closest edge
server to a client
b. Typically serve static content such as JS/HTML/CSS, photos, and videos, but some CDNs such as
Amazon CloudFront can serve dynamic content
c. Two types of CDNs
i. Push CDNs
1. Receive new content whenever changes occur on the server
2. Service takes full responsibility for providing content, uploading directly to CDN and
rewriting URLs that point to CDN.
3. Good for sites with small amounts of traffic or content that isn’t updated regularly
ii. Pull CDNs
1. Grab new content from the origin server when the first user requests it. Content
stays cached on the CDN for the duration of its TTL.
2. Pull CDNs minimize storage space on CDN but create redundant traffic if files expire
before they’ve actually changed
3. Works well with sites that have heavy traffic, as it’s spread out more evenly with only
recently requested content remaining on CDN
9. Reverse Proxy
a. A web server that centralizes internal services and provides unified interfaces to the public.
b. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the
server’s response to the client
c. Benefits
i. Increased security (obfuscates information about service topology, blacklists IPs, limits the number of
connections per client)
ii. Increased scalability and flexibility
1. Clients only see the reverse proxy IP, allowing you to scale servers and change
configuration
iii. SSL Termination
iv. Compression of server responses
v. Caching
vi. Serve static content directly
10. Load Balancer vs. Reverse Proxy
a. Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to
a set of servers serving the same function.
b. Reverse proxies can be useful even with just one web server or application server, opening up the
benefits described in the previous section.
c. Solutions such as NGINX and HAProxy can support both layer 7 (application level) reverse proxying and
load balancing.
11. Asynchronism
a. Two types of async
i. Type 1: Preprocess data. For example, you could take a bunch of dynamic content generated by
a massive framework and store it as static HTML and serve that to users. Only reprocess and
rebuild when something changes
ii. Type 2: Have a work queue and have the front end wait for the job to finish and then serve it up
when it’s done. Allows nearly infinitely scalable backends and snappy frontends. Some
interfaces can do other processing while waiting for a job to finish. For example, a Twitter user
posting a status could see the status posted in her feed immediately upon clicking “post”, but the
actual post could take a few minutes to propagate to others.
b. Message Queue
i. A message queue allows asynchronous processing by taking requests and handling them in a
specific order, returning the work when it is either done or failed. This allows the client to do
other work while waiting for its request to be processed.
ii. Examples of message queues include Kafka and RabbitMQ
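A minimal in-process sketch of the producer/worker idea (illustrative Python; a real deployment would use a broker such as Kafka or RabbitMQ instead of queue.Queue):

import queue
import threading

jobs = queue.Queue()                 # stands in for the message broker

def worker():
    while True:
        job = jobs.get()             # blocks until a job arrives
        print("processed", job)      # the client never waited for this step
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# The "front end" enqueues work and returns to the client immediately.
for tweet_id in range(3):
    jobs.put(tweet_id)

jobs.join()                          # wait here only so the demo prints before exiting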
12. DNS Servers
a. Translates a domain name such as www.google.com to an IP address
b. Hierarchical system with authoritative name servers at the top level. ISP or router provides information
about which DNS server to contact when doing a lookup. Lookups are typically cached in the OS or in
the browser, and have a TIME_TO_LIVE (TTL) field.
c. Types of DNS records
i. NS record (name server) - Specifies the DNS servers for your domain/subdomain.
ii. MX record (mail exchange) - Specifies the mail servers for accepting messages.
iii. A record (address) - Points a name to an IP address.
iv. CNAME (canonical) - Points a name to another name or CNAME (example.com to
www.example.com) or to an A record.
d. Disadvantages of DNS
i. Can introduce latency (mitigated somewhat by caching)
ii. DDoS attack point
iii. Server management of DNS is complex and generally managed by ISPs, Governments, and
large companies
13. Typical Request Flow
a. Client Request -> DNS -> CDN -> Load Balancer -> Cache -> Service Worker -> Cache/Other -> DB
Master -> DB Slaves

Database Knowledge (RDBMS and NoSQL)


1. ACID Principle
a. Atomicity means that all operations in a transaction succeed or every operation is rolled back.
b. Consistency means that you guarantee that your data will be consistent; none of the constraints you
have on related data will ever be violated. On the completion of a transaction, the database is structurally
sound.
c. Isolation means that transactions do not contend with one another. Contentious access to data is
moderated by the database so that transactions appear to run sequentially.
d. Durability means that the results of applying a transaction are permanent and recorded, even in the
presence of failures.
2. BASE Principle
a. Basic Availability means that the database appears to work most of the time
b. Soft-State means that stores don’t have to be write-consistent, nor do different replicas need to be
mutually consistent all the time
c. Eventual Consistency means that the stores will exhibit consistency at some later time, typically lazily at
reads
3. BASE vs ACID
a. BASE values availability over guaranteed consistency. Important for scaling
b. BASE is less consistent than ACID and has less overhead, because ACID is a heavyweight set of constraints to
guarantee
c. BASE is used primarily by aggregate stores, including column-family, key-value, and document stores
4. Strong vs Eventual Consistency
a. Strong consistency ensures that the data you get is always the latest. Eventual consistency means that
the data will be consistent only after certain time periods or is not guaranteed. A typical data reliability
versus availability trade off
5. Normalization and Denormalization
a. Denormalization is a time-space trade-off. Normalized data takes less space, but may require
joins to construct the desired result set, hence more time. If it’s denormalized, data is replicated
in several tables. It then takes more space, but the desired view of the data is readily available.
6. Replication Schemes
a. Master-Slave Replication
i. The master serves reads and writes, replicating writes to one or more slaves, which serve reads only.
ii. Slaves can replicate to other slaves in a tree-like manner
iii. If the master goes offline, the system can continue to operate in read-only mode until a new master is
promoted
iv. Disadvantages
1. Additional logic needed for promotion to master
b. Master-Master Replication
i. Both masters serve reads and writes and coordinate with each other on writes. If either master
goes down, the system can continue to operate with both reads and writes
ii. Disadvantages
1. Needs load balancer or changes to app logic to determine where to write
2. Most master-master configs are either loosely consistent and violate ACID or have
increased latency due to synchronization between the servers
3. Conflict resolution needed as more write nodes are added and latency increases
c. Disadvantages of both types
i. Potential loss of data if master fails before newly written data can be replicated to other nodes
ii. Writes are replayed to the read replicas
iii. The more read slaves, the more writes you have to do, leading to replication lag
iv. Adds complexity
7. Federation (Functional Partitioning)
a. Splits up DBs by function.
b. Example: Separate DBs for Forums, Users, and Products
c. Disadvantages
i. Not effective if schema requires huge functions or tables
ii. Need to update app logic to determine which DB to read and write
iii. Joining from two DBs more expensive and complex with server link
iv. Adds complexity to both software and hardware
8. Sharding
a. Distributes data across different databases such that each DB only has a subset of the data.
b. Traditional hash tables map keys to an array index using the following:
i. hash = hashFunc(key)
ii. index = hash % arraySize
c. Common ways to shard a table of users include user’s last name initial, or the user’s geographical
location
d. Disadvantages
i. Need to update app logic to work with shards, resulting in complex SQL queries
ii. Data distribution can end up lopsided in shards. (See Consistent Hashing below)
iii. When the arraySize changes, all keys need to be remapped because the index is calculated with
a modulo operation (see the sketch at the end of this item)
iv. Joining data on shards is complex
v. Adds complexity to both software and hardware
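A minimal sketch of the remapping problem noted in d.iii (illustrative; the keys and shard counts are arbitrary):

def shard_for(key, num_shards):
    return hash(key) % num_shards              # index = hash % arraySize

keys = ["alice", "bob", "carol", "dave", "erin"]

before = {k: shard_for(k, 4) for k in keys}    # 4 shards
after = {k: shard_for(k, 5) for k in keys}     # one shard added

moved = [k for k in keys if before[k] != after[k]]
print(moved)     # typically most keys land on a different shard after resizing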
9. Consistent Hashing
a. Use a function f(x) => hash and join the output range at the tail ends into a circle
b. Using f(x), map each node to a point on the ring
c. The interval between two nodes forms a partition. If you use the same function f(x) over the key you will
end up with a projection of that key in that ring
d. We define the server responsible for the above key as the first node in a clockwise direction after the key
projection
e. Adding a new node to the ring does not mean all keys need to be remapped. Only a fraction of the keys
will move to a different node. This also applies when a node leaves the ring
f. Some examples of systems that use consistent hashing are Amazon’s Dynamo, Riak, Akka, and Chord
g. In order to avoid overloading a single node when another one leaves the ring, and to distribute the keys
more evenly, the system can create a different mapping between physical nodes and nodes in the ring.
h. Instead of having a one-to-one mapping, the system can create virtual nodes, creating a M-to-N mapping
between physical nodes and virtual nodes in the ring
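A minimal consistent-hash ring sketch (illustrative; MD5 for ring positions and 100 virtual nodes per physical node are arbitrary choices):

import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Map each physical node to several virtual points on the ring.
        self.points = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def node_for(self, key):
        # First node in a clockwise direction after the key's projection.
        idx = bisect.bisect(self.hashes, ring_hash(key)) % len(self.points)
        return self.points[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))      # adding or removing a node moves only a fraction of keys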


10. NoSQL
a. A NoSQL database environment is, simply put, a non-relational and largely distributed database system
that enables rapid, ad-hoc organization and analysis of extremely high-volume, disparate data types.
NoSQL databases are sometimes referred to as cloud databases, non-relational databases, Big Data
databases and a myriad of other terms and were developed in response to the sheer volume of data
being generated, stored and analyzed by modern users (user-generated data) and their applications
(machine-generated data). NoSQL DBs do not support joins and are designed to scale better
b. Types of noSQL DBs
i. Key-Value Store - it has a big hash table of keys & values. {Example: Riak, Amazon S3 (Dynamo)}
ii. Document-based Store - it stores documents made up of tagged elements. {Example: CouchDB}
iii. Column-based Store - each storage block contains data from only one column. {Example: HBase, Cassandra}
iv. Graph-based - a network database that uses edges and nodes to represent and store data. {Example: Neo4J}
11. SQL vs NoSQL
a. Reasons for SQL:
i. Structured data
ii. Strict schema
iii. Relational data
iv. Need for complex joins
v. Transactions
vi. Clear patterns for scaling
vii. More established: developers, community, code, tools, etc
viii. Lookups by index are very fast
b. Reasons for NoSQL:
i. Semi-structured data
ii. Dynamic or flexible schema
iii. Non-relational data
iv. No need for complex joins
v. Store many TB (or PB) of data
vi. Very data intensive workload
vii. Very high throughput for IOPS
c. Sample data well-suited for NoSQL:
i. Rapid ingest of clickstream and log data
ii. Leaderboard or scoring data
iii. Temporary data, such as a shopping cart
iv. Frequently accessed ('hot') tables
v. Metadata/lookup tables

12. Types of NoSQL DBs:


1. Document-oriented
a. Examples: MongoDB, CouchDB
b. Strengths: heterogeneous data, object-oriented work, agile development
c. Their advantage is that they do not require a consistent data structure. They are useful when
your requirements, and thus your database layout, change constantly, or when you are dealing
with datasets which belong together but still look very different. When you have a lot of tables
with two columns called "key" and "value", then these might be worth looking into.
2. Graph databases
a. Examples: Neo4j, GiraffeDB.
b. Strengths: Data Mining
c. While most NoSQL databases abandon the concept of managing data relations, these
databases embrace it even more than those so-called relational databases.
d. Their focus is at defining data by its relation to other data. When you have a lot of tables with
primary keys which are the primary keys of two other tables (and maybe some data describing
the relation between them), then these might be something for you.
3. Key-Value Stores
a. Examples: Redis, Cassandra, MemcacheDB
b. Strengths: Fast lookup of values by known keys
c. They are very simplistic, but that makes them fast and easy to use. When you have no need for
stored procedures, constraints, triggers and all those advanced database features and you just
want fast storage and retrieval of your data, then those are for you.
d. Unfortunately they assume that you know exactly what you are looking for. When you don't have
a definite and unique key for a specific result, you can't get it out of your K-V store that easily.

Anatomy of an HTTP Request, REST API


1. What is a REST API?
a. REST = “REpresentational State Transfer”: an architectural style in which clients manipulate resources
identified by URLs using standard HTTP methods, with each request carrying all the state the server
needs to handle it.
2. How does an HTTP Request work?
a. The gist of it: (1) establish a TCP connection from the client to the server, (2) the client initiates an HTTP
GET request to the server, (3) the HTTP server responds to the GET request, (4) the HTML page is loaded
in the client’s browser
i. Step 1: 3-way handshake (SYN, SYN-ACK, ACK)
ii. Step 2: client request typically contains at least the following:

GET /index.html HTTP/1.1
Host: www.example.com

1. GET - refers to the type of request


2. /index.html - specifies the file that is being requested. It could alternatively just be a
request to '/' in which case the server would decide what files to send back.
3. HTTP/1.1 - specifies the version of HTTP
4. Host: www.example.com - specifies the host. Note: the headers always end with a
double newline (blank line); the Host header is what lets the server distinguish
between multiple domains (virtual hosts) sharing the same IP address.
iii. Step 3: server response
HTTP/1.1 200 OK
Date: Mon, 23 March 2015 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2015 23:11:55 GMT
ETag: "3f80f-1b6-3e1cb03b"
Content-Type: text/html; charset=UTF-8
Content-Length: 138
Accept-Ranges: bytes
Connection: close

<html>
...
</html>

iv. Step 4: The client renders the HTML from the response body; the Connection: close header indicates the server will close the TCP connection after this response
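A minimal sketch of steps 1-3 with a raw socket (illustrative only; the host and path mirror the example request above, and the response is simply printed):

import socket

# Step 1: open the TCP connection (the 3-way handshake happens inside connect()).
with socket.create_connection(("www.example.com", 80)) as sock:
    # Step 2: send the request; the headers end with a blank line (\r\n\r\n).
    request = (
        "GET /index.html HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode())

    # Step 3: read the response (status line, headers, then the HTML body).
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.decode(errors="replace"))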
b. Other types of HTTP requests:
i. GET - previously explained
ii. HEAD - same as GET, but only transfers the status line and headers
iii. POST - send data to the server
iv. PUT - usually used to update data
v. DELETE - removes something from the target resource
vi. CONNECT - establishes a tunnel to the server with the given URI
c. Difference between POST and PUT
i. PUT implies putting a resource - completely replacing whatever is available at the given URL
with a different thing. By definition, a PUT is idempotent. Do it as many times as you like, and
the result is the same. x=5 is idempotent. You can PUT a resource whether it previously exists,
or not (eg, to Create, or to Update)!
ii. POST updates a resource, adds a subsidiary resource, or causes a change. A POST is not
idempotent, in the way that x++ is not idempotent.
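A minimal sketch of that difference (illustrative; assumes the third-party requests library and a hypothetical https://api.example.com endpoint):

import requests     # third-party HTTP client

BASE = "https://api.example.com"      # hypothetical API

# PUT: completely replace the resource at a known URL. Repeating it leaves
# the same end state, like x = 5.
requests.put(f"{BASE}/users/42", json={"name": "Ada", "email": "ada@example.com"})

# POST: ask the collection to create a subsidiary resource. Repeating it
# creates another user each time, like x++.
requests.post(f"{BASE}/users", json={"name": "Ada", "email": "ada@example.com"})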
3. What happens when a user enters a URL into their browser? (Assuming the simplest
possible HTTP request, no proxies, IPv4 and no problems in any step)
1. browser checks cache; if requested object is in cache and is fresh, skip to #9
2. browser asks OS for server's IP address
3. OS makes a DNS lookup and replies the IP address to the browser
4. browser opens a TCP connection to server (this step is much more complex with HTTPS)
5. browser sends the HTTP request through TCP connection
6. browser receives HTTP response and may close the TCP connection, or reuse it for another
request
7. browser checks if the response is a redirect or a conditional response (3xx result status codes),
authorization request (401), error (4xx and 5xx), etc.; these are handled differently from normal
responses (2xx)
8. if cacheable, response is stored in cache
9. browser decodes response (e.g. if it's gzipped)
10. browser determines what to do with response (e.g. is it a HTML page, is it an image, is it a sound
clip?)
11. browser renders response, or offers a download dialog for unrecognized types

Miscellaneous/Other
1. Latency vs Throughput
A. Latency is the time to perform some action or to produce some result.
B. Throughput is the number of such actions or results per unit of time.
C. Generally, you should aim for maximal throughput with acceptable latency.
2. Consistency Patterns
A. Weak Consistency
1. After a write, reads may or may not see it. A best effort approach is taken
2. Seen in VoIP, voice chat, and multiplayer online games
B. Eventual Consistency
1. After a write, reads will eventually see it (typically within milliseconds). Data is replicated
asynchronously
2. Often seen in DNS and email. Works well in highly available systems
C. Strong Consistency
1. After a write, reads will see it. Data is replicated synchronously.
2. This approach is seen in file systems and RDBMSes. Works well in systems that need
transactions
3. Availability Patterns (Failover)
A. Active-Passive (Master-Slave)
1. The active and passive servers exchange heartbeats. When the heartbeat signal is interrupted,
the passive server takes over the active server’s IP address and work. Availability is determined by the
amount of time it takes for the passive server to take over (cold or hot start?)
B. Active-Active (Master-Master)
1. Both servers are active and manage traffic, spreading the load between them.
2. If public facing, DNS needs to know the public IPs of each server
3. If internal facing, app logic needs to know about both servers
C. Disadvantages of failover
1. Add more hardware and complexity
2. Potential for loss of data if active system fails before any newly written data can be replicated to
the passive server
4. What is MapReduce?
A. MapReduce is a framework developed by Google to batch process large amounts of data in parallel. It
has a Map function, which takes elements and applies an operation to each one, and a reduce function,
which takes all the grouped results of the Map function and reduces them to a single value.
B. Abstracts away how data is stored and how it is iterated over. Allows massive parallelization of
data processing over hundreds or thousands of nodes
C. https://stackoverflow.com/questions/12375761/good-mapreduce-examples
D. https://www.joelonsoftware.com/2006/08/01/can-your-programming-language-do-this/
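A minimal word-count sketch of the map/reduce idea (single-process Python, purely illustrative; a real MapReduce framework distributes the map and reduce phases across many nodes):

from collections import Counter
from functools import reduce

documents = ["to be or not to be", "to design is to decide"]

# Map: each document independently emits per-word counts.
mapped = [Counter(doc.split()) for doc in documents]

# Reduce: merge the per-document counts into a single result.
word_counts = reduce(lambda a, b: a + b, mapped, Counter())
print(word_counts.most_common(3))    # [('to', 4), ('be', 2), ...]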
5. Good API Design
A. II. General Principles
1. API Should Do One Thing and Do it Well
2. API Should Be As Small As Possible But No Smaller
3. Implementation Should Not Impact API
4. Minimize Accessibility of Everything
5. Names Matter–API is a Little Language
6. Documentation Matters, Do it religiously
7. Consider Performance Consequences of API Design Decisions
8. Effects of API Design Decisions on Performance are Real and Permanent
9. API Must Coexist Peacefully with Platform
B. III. Class Design
1. Minimize Mutability
2. Subclass Only Where it Makes Sense
3. Design and Document for Inheritance or Else Prohibit it
C. IV. Method Design
1. Don't Make the Client Do Anything the Module Could Do
2. Don't Violate the Principle of Least Astonishment
3. Fail Fast - Report Errors as Soon as Possible After They Occur
4. Provide Programmatic Access to All Data Available in String Form
5. Overload With Care
6. Use Appropriate Parameter and Return Types
7. Use Consistent Parameter Ordering Across Methods
8. Avoid Long Parameter Lists
9. Avoid Return Values that Demand Exceptional Processing

Systems Design and Scalability Questions - How To Approach


1. Handling the question
a. Communicate - ask questions, be open about the issues of your system
b. Go broad first - don’t dive straight into the algorithm part or get too focused on one aspect of the problem
c. Use the whiteboard - draw pictures of what you’re proposing. Anything to help you visualize the problem
d. Acknowledge interviewer concerns - validate them and try to make changes and point them out
e. Be careful about your assumptions - they can completely change the scope and type of problem.
f. State your assumptions explicitly - if you do make assumptions, make sure you state them
g. Estimate when necessary
h. Drive - drive the problem, making sure you’re talking to your interviewer the entire time
i. You should be talking 80% of the time while your interviewer talks 20% of the time
i. Other tips
i. Don’t use buzzwords. Know the specifics of the technologies you’re using
2. Design - Step by step
a. Scope the problem
b. Make reasonable assumptions
c. Draw the major components with diagrams, charts, etc.
d. Identify the key issues - what are possible bottlenecks or major challenges for the system?
e. Redesign for the key issues - be open about limitations in your design
3. Problem solving
a. Features
b. Define the API
c. Talk about availability and persistence
d. latency/performance
e. Scalability
f. Durability
g. Class diagram
h. Security and privacy
i. Cost effectiveness

Useful Links/Sources
1. Systems Design Primer: LINK
2. Curated list of engineering blogs by company: LINK
3. How Threads Work: LINK
4. Systems Design prep and study list: LINK
5. CAP Theorem confusion: LINK
6. Neo4J Blog: ACID vs BASE: LINK
7. Neo4J Blog: Why we need NoSQL Databases: LINK
8. StackOverflow - what is caching?: LINK
9. Top 10 Systems Design questions: LINK
Diagrams and Pictures

The Big Picture


Master-Slave Replication

Master-Master Replication
OSI - Open Systems Interconnection model (7 layers)
