Web service scalability

Where is the ceiling?

13 February 2019

Marsel Mavletkulov

Scalability limits

Adding more servers improves throughput up until it doesn't
(hardware scalability)


Why does this happen?

Contention-limited scalability due to serialization or queueing — concurrent processes access a shared resource (waiting on a database lock)

Coherency-limited scalability due to inconsistent copies of data — concurrent processes must agree whose data to use (consensus in cluster, 2-phase commit)


Contention and coherency examples

Priority sorting of the message queue, garbage collection


Contention and coherency on hardware/OS/DB level


How to reduce contention and crosstalk in a web service

Common setup: load balancer, multiple app servers, and one database

How can we improve how the system scales?


Universal scalability law

You can estimate your system's scalability
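The law models throughput X as a function of concurrency N with three parameters: λ (throughput of one node), σ (contention) and κ (coherency/crosstalk). A minimal sketch in Python, with illustrative parameter values assumed for the example (not measurements):

```python
def usl_throughput(n, lam, sigma, kappa):
    """Universal Scalability Law:
    X(N) = lam * N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
    lam   -- throughput at N=1 (no contention, no crosstalk)
    sigma -- contention coefficient (serialization, queueing)
    kappa -- coherency coefficient (crosstalk)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Assumed demo parameters: lam=100 req/s, 5% contention, 0.1% crosstalk.
# Throughput peaks near N* = sqrt((1 - sigma) / kappa) ~ 31 and then declines.
for n in (1, 8, 31, 100):
    print(n, round(usl_throughput(n, 100, 0.05, 0.001), 1))
```

Note that with κ > 0 throughput does not merely flatten out, it eventually goes down as you add nodes, which is the "ceiling" in the title.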


Scalability facets

Hardware scalability: how an application system behaves as it runs on larger hardware configurations

How many servers N can we leverage?

Software scalability: how the system behaves when demand increases (more users, more requests to handle)

How many users N can we serve?

Hardware scalability

Obtain measurements of throughput at various levels of cluster size, for example

Each node should receive the same amount and rate of work no matter the cluster size

(N=1, X=100req/s), (N=2*1, X=2*100req/s), (N=4*1, X=4*100req/s), ...

Measure λ, the throughput of the system with one node, or use regression to determine λ


Software scalability

Load testing with Yandex Tank & Pandora or locust.io

N is the number of users (greenlets/goroutines) that can possibly send requests to the web server (some asleep, some active)

Z (average think time in seconds) is the delay between requests, emulating a user deciding what to do next

Q is the number of requests resident in the web server: the number of requests waiting for service plus the number of requests in service

wait_req_count = arrival_rate * wait_time_seconds
serv_req_count = arrival_rate * serv_time_seconds
resid_req_count = wait_req_count + serv_req_count

resp_time_seconds = wait_time_seconds + serv_time_seconds
resid_req_count = arrival_rate * resp_time_seconds
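Both formulations give the same resident count, which is Little's law (L = λW) applied once per stage and once to the server as a whole. A worked example with assumed figures (150 req/s arrival rate, 20 ms queueing, 50 ms service):

```python
# Assumed example figures, not measurements from the talk.
arrival_rate = 150.0       # requests/second
wait_time_seconds = 0.020  # time a request spends queued
serv_time_seconds = 0.050  # time a request spends in service

# Little's law applied to the queue and the service stage separately
wait_req_count = arrival_rate * wait_time_seconds   # 3.0 waiting
serv_req_count = arrival_rate * serv_time_seconds   # 7.5 in service
resid_req_count = wait_req_count + serv_req_count   # 10.5 resident

# Same result from response time: Little's law applied to the whole server
resp_time_seconds = wait_time_seconds + serv_time_seconds
assert abs(resid_req_count - arrival_rate * resp_time_seconds) < 1e-9
print(round(resid_req_count, 1))  # 10.5
```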

How to emulate web traffic

Measure requests arrival rate on production load balancer during peak traffic (steady state), e.g., arrival_rate=150 requests/second

Choose the ratio N/Z to be equal to arrival_rate, e.g., N=1500 users, Z=10 seconds

To emulate web-user traffic, hold the ratio N/Z fixed while increasing N and Z in the same proportion

(N=1500, Z=10s), (N=10*1500, Z=10*10s), (N=20*1500, Z=20*10s), (N=30*1500, Z=30*10s), ...
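Holding N/Z fixed while scaling both can be sketched as generating the load-test matrix, using the base figures from the example above:

```python
# Base figures from the example: 1500 users, 10 s think time -> 150 req/s.
base_users, base_think = 1500, 10.0

# Scale N and Z in the same proportion; the offered rate N/Z stays constant.
matrix = [(scale * base_users, scale * base_think) for scale in (1, 10, 20, 30)]

for n_users, think in matrix:
    assert n_users / think == base_users / base_think  # 150 req/s throughout
    print(f"N={n_users:>6} users, Z={think:>5.0f} s, N/Z={n_users / think:.0f} req/s")
```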

To achieve statistically independent web requests, the ratio Q/Z should be small; Q can be non-zero as long as Z is relatively large

If you are limited by N (can't spin up many users), then decrease Z while holding N fixed at its maximum (200 users as in the paper)


Estimate contention and crosstalk

Use nonlinear least squares regression to estimate level of contention and crosstalk based on measurements of node count and corresponding throughput

You need >=6 measurements
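One way to do the fit without an external library: normalize throughput to capacity C(N) = X(N)/λ; then N/C(N) − 1 = σ(N−1) + κN(N−1) is linear in σ and κ, so ordinary least squares on the two basis functions suffices. A sketch on synthetic data (real measurements would replace the generated points):

```python
def fit_usl(ns, xs, lam):
    """Estimate USL coefficients (sigma, kappa) from throughput measurements.

    ns  -- node counts (at least 6 points for a trustworthy fit)
    xs  -- measured throughput at each node count
    lam -- throughput of a single node (X at N=1)

    Uses the linearization N/C(N) - 1 = sigma*(N-1) + kappa*N*(N-1),
    where C(N) = X(N)/lam, and solves the 2x2 normal equations directly.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n, x in zip(ns, xs):
        u = n - 1.0                # basis function for sigma
        v = n * (n - 1.0)          # basis function for kappa
        y = n / (x / lam) - 1.0    # linearized response
        s11 += u * u; s12 += u * v; s22 += v * v
        t1 += u * y;  t2 += v * y
    det = s11 * s22 - s12 * s12
    sigma = (t1 * s22 - t2 * s12) / det
    kappa = (s11 * t2 - s12 * t1) / det
    return sigma, kappa

# Synthetic measurements generated from known parameters (assumed, for demo).
lam, true_sigma, true_kappa = 100.0, 0.05, 0.001
ns = [2, 4, 8, 16, 32, 64]
xs = [lam * n / (1 + true_sigma * (n - 1) + true_kappa * n * (n - 1)) for n in ns]

sigma, kappa = fit_usl(ns, xs, lam)
print(f"sigma={sigma:.4f} kappa={kappa:.5f}")  # recovers ~0.05 and ~0.001
```

With the fitted coefficients, the throughput peak falls near N* = sqrt((1 − σ)/κ), about 31 nodes for these parameters; on noisy real data a nonlinear least-squares fitter gives tighter estimates.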

Tools for using Universal Scalability Law


Conclusion

Scalability is constrained by contention and crosstalk

The time when your system is unable to keep up with load might come unexpectedly 🚒

The earlier you know your system's limits, the better prepared you are to handle increasing load (optimize, redesign)

This presentation provided an overview of how to measure and estimate a web service's scalability


References


Thank you

Marsel Mavletkulov
