DATA 228
Big Data Technologies and Applications (Fall 2024)
Sangjin Lee
Hadoop: YARN, Hadoop’s
distributed compute
Chapter 4, “Hadoop: The Definitive Guide”, 4th Edition, Tom White
What is distributed compute/computing?
A distributed compute system is a system which solves a computing problem utilizing a set of
multiple devices over a network.
Distributed compute
• Sometimes referred to as a Scheduler or Orchestrator
Elements of distributed compute
Fundamentals
• Run tasks in parallel across multiple machines
• Schedule tasks in an efficient and fair manner
• Monitor and account for resource usage across multiple machines
• Scale horizontally by adding more machines to the cluster
• Recover from all manners of failures: node failures, task failures, network failures, etc.
Elements of distributed compute
More advanced elements
• Support the notion of an “application”
• Support both stateless and stateful types of applications
• (Big-data-specific) Schedule tasks to be as local to data as possible
• (Stateless-app-specific) Provide support for traffic ingress for applications
Elements of distributed compute
Stateless and stateful
              Stateless                            Stateful
Coordination  No coordination required             Coordination required among workloads
Duration      Tends to run long (long-running)     Tends to complete its job and shut down
State         Tasks don't need to maintain state   State is critical to tasks
Examples      Web services, microservices          Big data jobs, batch jobs, Cassandra, AI/ML jobs requiring GPUs
Examples of distributed compute systems
• Not quite, but pointing the way: servlet containers (Tomcat, Jetty), application servers
• Mesos
• Kubernetes
• Orchestrates container workloads
• The leader in stateless distributed compute systems
• Supports more complex application types (stateful, GPU, etc.)
• Spawned an ecosystem of supporting technologies: Helm, containerd, Istio, Envoy, etc. (in CNCF)
• Old competition: Mesos
Examples of distributed compute systems
• YARN
• Orchestrates mostly big-data or data-related workloads
• Can support other types of applications
• Data-aware scheduling
Distributed compute in Hadoop
Characteristics
• Accomplish data processing over large data (TBs or PBs) within a short amount of time
• Use resources (memory, CPUs, network, and I/O) efficiently to accomplish this
• Handle data-aware scheduling
• Handle task scheduling in bursts
YARN
Basics
• Hadoop’s general-purpose distributed compute system
• It is NOT a data computing framework itself
• Data computing frameworks (MapReduce, Spark, Tez, etc.) are YARN applications
• Data practitioners don’t interact with YARN directly for the most part
• Supports applications and containers (tasks)
YARN
Architecture
• ResourceManager and NodeManagers
YARN
ResourceManager
• “One” for a single cluster
• Processes resource requests
• Resource requests
• Amount of compute resources being requested: virtual cores (vCPUs), memory (MBs), GPUs, etc.
• Locality constraints: specific node, specific rack, or off-rack (= anywhere)
• ResourceManager schedules containers based on resource requests
• ResourceManager tries to schedule based on the locality constraints
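Roughly what such a resource request looks like through YARN's Java client API; a minimal sketch, where the container size, node name, and rack name are made-up placeholders:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ResourceRequestSketch {
  public static ContainerRequest buildRequest() {
    // Ask for one container with 1024 MB of memory and 1 vCore.
    Resource capability = Resource.newInstance(1024, 1);

    // Locality preference: a specific node first, then its rack; relaxLocality
    // lets the scheduler fall back to off-rack (anywhere) if neither is free.
    return new ContainerRequest(
        capability,
        new String[] {"worker-node-01"},  // preferred node (hypothetical hostname)
        new String[] {"/rack-1"},         // preferred rack (hypothetical rack name)
        Priority.newInstance(1),
        true);                            // relaxLocality
  }
}
```

An Application Master hands requests like this to the ResourceManager, which allocates containers that satisfy them as closely as capacity allows.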
YARN
ResourceManager
• Maintains the state of available and allocated resources on nodes
• Maintains the state of all running applications and containers
• ResourceManager requires a large amount of memory
• ResourceManager can be a scalability bottleneck
YARN
ResourceManager high availability (HA)
• Active and standby ResourceManagers
• ResourceManager state is persisted in an RMStateStore (filesystems, ZooKeeper, etc.)
• Manual and automatic failover
• HA can recover all running applications during failover
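These settings normally live in yarn-site.xml; a minimal sketch of the equivalent configuration keys, assuming hypothetical hosts rm1.example.com / rm2.example.com and a ZooKeeper-backed state store:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigSketch {
  public static Configuration build() {
    Configuration conf = new YarnConfiguration();

    // Enable HA with two ResourceManagers, rm1 and rm2 (hostnames are placeholders).
    conf.setBoolean(YarnConfiguration.RM_HA_ENABLED, true);   // yarn.resourcemanager.ha.enabled
    conf.set(YarnConfiguration.RM_HA_IDS, "rm1,rm2");         // yarn.resourcemanager.ha.rm-ids
    conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");
    conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");

    // Persist RM state in ZooKeeper so the standby can recover running apps on failover.
    conf.set(YarnConfiguration.RM_STORE,                      // yarn.resourcemanager.store.class
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("hadoop.zk.address", "zk1.example.com:2181");    // ZooKeeper ensemble (placeholder)
    return conf;
  }
}
```

With this in place, failover can be triggered manually or handled automatically by the ZooKeeper-based elector.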
YARN
NodeManager
• Responsible for launching and managing containers
• Runs health checks on the node it runs on and communicates the state to the RM
• Reports resource usage status to the RM
• NodeManager can be restarted without affecting running containers
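The health and usage that NodeManagers report to the RM can be read back from any client; a small sketch using the YarnClient API (the cluster configuration is assumed to be on the classpath):

```java
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeReportSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // One report per RUNNING node: capacity, current usage, and health.
    List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
    for (NodeReport node : nodes) {
      System.out.printf("%s: used %s of %s, health: %s%n",
          node.getNodeId(), node.getUsed(), node.getCapability(),
          node.getHealthReport());
    }
    yarnClient.stop();
  }
}
```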
YARN
Application taxonomy
• Application
• A single entity that represents the distributed compute job as a whole
• Containers
• Individual compute tasks as part of the application
• Not the same as the (Docker/Kubernetes) container
• Application master (AM)
• A special container that manages the other containers for the application
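A sketch of how this taxonomy looks through the YarnClient API: applications contain attempts, and each attempt owns an AM container plus its worker containers (the printed labels are illustrative only):

```java
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppTaxonomySketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Each ApplicationReport is one application; its attempts own the containers.
    for (ApplicationReport app : yarnClient.getApplications()) {
      System.out.println(app.getApplicationId() + " " + app.getName());
      for (ApplicationAttemptReport attempt :
           yarnClient.getApplicationAttempts(app.getApplicationId())) {
        // The AM is itself a container managed as part of the attempt.
        System.out.println("  AM container: " + attempt.getAMContainerId());
        for (ContainerReport container :
             yarnClient.getContainers(attempt.getApplicationAttemptId())) {
          System.out.println("  container: " + container.getContainerId());
        }
      }
    }
    yarnClient.stop();
  }
}
```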
YARN
YARN application
• Client requests an Application Master (AM) from the ResourceManager (RM)
• The RM selects a node to launch the AM container
• The AM may request more containers from the RM
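A condensed sketch of this flow from the client side using YARN's Java API; the application name, AM command, and sizes below are placeholders:

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitAppSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // 1. Client asks the RM for a new application.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("demo-app");              // placeholder name

    // 2. Describe the AM container; the RM picks a node and launches it there.
    appContext.setResource(Resource.newInstance(2048, 1));  // 2 GB, 1 vCore for the AM
    appContext.setAMContainerSpec(ContainerLaunchContext.newInstance(
        null, null,
        Collections.singletonList("java -jar my-am.jar"),   // placeholder AM command
        null, null, null));

    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted " + appId);

    // 3. Once running, the AM itself (via AMRMClient) asks the RM for more containers.
    yarnClient.stop();
  }
}
```

Step 3 is where the AM issues resource requests like the earlier ContainerRequest sketch.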
YARN
Schedulers
• Scheduler: a ResourceManager component/algorithm that allocates containers based on a
certain policy
• A scheduler tries to optimize resource usage (utilization) and the timeliness of application
completions (throughput)
• The central consideration is multi-tenancy
• YARN supports 3 schedulers: the FIFO (first-in-first-out) scheduler, the Capacity scheduler, and the Fair
scheduler
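The scheduler is selected with a single ResourceManager property, normally set in yarn-site.xml; a sketch of the equivalent setting and the three implementation classes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerChoiceSketch {
  public static Configuration build() {
    Configuration conf = new YarnConfiguration();
    // yarn.resourcemanager.scheduler.class selects the scheduler implementation:
    //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
    //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (default)
    //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
    conf.set(YarnConfiguration.RM_SCHEDULER,
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
    return conf;
  }
}
```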
YARN
Schedulers
• FIFO scheduler
• Simplest scheduler
• It’s a FIFO (first-in-first-out) queue
• Applications run in the order of submission: the next application gets scheduled after the
previous applications have been completed
• Not suitable for a multi-tenant cluster
YARN
Schedulers
• Capacity scheduler (default)
• Capacity is partitioned with multiple dedicated queues (e.g. teams, groups of apps, etc.)
• Queues get resource guarantees even if other queues are contended
• Improves overall throughput compared to FIFO
• Provides teams with predictable capacity
• May strand resources if utilization across queues is uneven
• Queue sizing becomes very important
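Which queue an application lands in is chosen at submission time; a minimal sketch, assuming a hypothetical queue named "teamA" has been defined in capacity-scheduler.xml (e.g. via yarn.scheduler.capacity.root.queues and yarn.scheduler.capacity.root.teamA.capacity):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmissionSketch {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    // Route this MapReduce job to the hypothetical "teamA" queue; it will be
    // scheduled against that queue's guaranteed capacity.
    conf.set("mapreduce.job.queuename", "teamA");
    return Job.getInstance(conf, "capacity-demo");
  }
}
```

The same property can also be passed on the command line (e.g. -Dmapreduce.job.queuename=teamA) when running the demo sleep job below.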
YARN
Schedulers
• Fair scheduler
• Dynamically balances resources between all running apps so that all apps get an equal share
of resources over time: “fair share”
• Tries to balance between having all apps make good progress and letting large apps finish in a
timely manner
• Queues are, on paper, unnecessary, but they can be used to increase predictability
• It can be the best of both worlds in large multi-tenanted clusters
• Some loss of predictability
YARN
Demo
Exploring schedulers using a sleep app (job)
YARN
Demo: sleep job
• The jar for the sleep job (hadoop-mapreduce-client-jobclient-3.4.0-tests.jar) is available on Canvas
• Sleep job parameters
• -m: number of mappers
• -r: number of reducers
• -mt: mapper sleep duration (ms)
• -rt: reducer sleep duration (ms)
• bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar
sleep -m 2 -r 1 -mt 90000 -rt 90000
YARN
Demo: sleep job
• AM size: 2 GB memory and 1 vCore
• MR container size: 1 GB memory and 1 vCore each
• (My) YARN cluster size: 8 GB memory and 8 vCores
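Rough arithmetic for these numbers: each sleep job needs 2 GB (and 1 vCore) for its AM plus 1 GB (and 1 vCore) per concurrently running map or reduce task, so one job with 2 mappers occupies about 4 GB and 3 vCores of the 8 GB / 8 vCore cluster. Submitting a second job therefore forces the scheduler to decide how the remaining capacity is shared, which is what the following slides walk through.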
YARN
Demo: sleep job: first app
YARN
Demo: sleep job: first app (mappers running)
YARN
Demo: sleep job: second app submitted
YARN
Demo: sleep job: second app running mappers
YARN
Demo: sleep job: second app completes running