Distributed Computing Seminar

Lecture 1: Introduction to Distributed Computing & Systems Background

Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet
Summer 2007
Except where otherwise noted, the contents of this presentation are Copyright 2007 University of Washington and are licensed under the Creative Commons Attribution 2.5 License.

Course Overview

5 lectures:
1 Introduction
2 Technical Side: MapReduce & GFS
2 Theoretical: Algorithms for distributed computing

Readings + Questions nightly

Readings: http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html
Questions: http://code.google.com/edu/content/submissions/mapreduce-minilecture/MapReduceMiniSeriesReadingQuestions.doc

Outline

Introduction to Distributed Computing
Parallel vs. Distributed Computing
History of Distributed Computing
Parallelization and Synchronization
Networking Basics

Computer Speedup

Moore's Law (1965), as commonly quoted: the density of transistors on a chip doubles every 18 months, for the same cost
Image: Tom's Hardware, not subject to the Creative Commons license applicable to the rest of this work.

Scope of problems
What can you do with 1 computer?
What can you do with 100 computers?
What can you do with an entire data center?

Distributed problems

Rendering multiple frames of high-quality animation

Image: DreamWorks Animation and not subject to the Creative Commons license applicable to the rest of this work.

Distributed problems

Simulating several hundred or thousand characters

Images: Happy Feet, Kingdom Feature Productions; Lord of the Rings, New Line Cinema. Neither image is subject to the Creative Commons license applicable to the rest of the work.

Distributed problems

Indexing the web (Google)
Simulating an Internet-sized network for networking experiments (PlanetLab)
Speeding up content delivery (Akamai)

What is the key attribute that all these examples have in common?

Parallel vs. Distributed

Parallel computing can mean:
Vector processing of data
Multiple CPUs in a single computer

Distributed computing is multiple CPUs across many computers over the network

A Brief History 1975-85

Parallel computing was favored in the early years
Primarily vector-based at first
Gradually more thread-based parallelism was introduced

Image: Computer Pictures Database and Cray Research Corp and is not subject to the Creative Commons license applicable to the rest of this work.

A Brief History 1985-95

Massively parallel architectures start rising in prominence
Message Passing Interface (MPI) and other libraries developed
Bandwidth was a big problem

A Brief History 1995-Today

Cluster/grid architecture increasingly dominant
Special node machines eschewed in favor of COTS (commercial off-the-shelf) technologies
Web-wide cluster software
Companies like Google take this to the extreme

Parallelization & Synchronization

Parallelization Idea

Parallelization is easy if processing can be cleanly split into n units:

[Diagram: the work is partitioned into units w1, w2, w3]

Parallelization Idea (2)

Spawn worker threads:
[Diagram: three worker threads, one per work unit]

In a parallel computation, we would like to have as many threads as we have processors. e.g., a four-processor computer would be able to run four threads at the same time.

Parallelization Idea (3)

Workers process data:
[Diagram: each thread processes its own unit: w1, w2, w3]

Parallelization Idea (4)

Report results:
[Diagram: the threads holding w1, w2, w3 report into a single results block]
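The four steps above (partition, spawn, process, report) can be sketched in Java roughly as follows. This is a minimal illustration rather than anything from the lecture itself: the class name `PartitionedSum`, the choice of summing an array, and the per-worker result slots are all assumptions made for the example.

```java
// Hypothetical example: sums an array by splitting it into one slice per worker.
class PartitionedSum {
    static long sum(int[] data, int workers) {
        long[] partial = new long[workers];        // one private result slot per worker
        Thread[] threads = new Thread[workers];
        int chunk = (data.length + workers - 1) / workers;  // ceiling division

        // Steps 1-3: partition the work, spawn the threads, let each process its slice.
        for (int w = 0; w < workers; w++) {
            final int id = w;
            final int from = Math.min(data.length, w * chunk);
            final int to = Math.min(data.length, from + chunk);
            threads[w] = new Thread(() -> {
                long s = 0;
                for (int i = from; i < to; i++) {
                    s += data[i];
                }
                partial[id] = s;                   // each worker writes only its own slot
            });
            threads[w].start();
        }

        // Step 4: wait for every worker, then aggregate the reported results.
        long total = 0;
        for (int w = 0; w < workers; w++) {
            try {
                threads[w].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += partial[w];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        // One thread per available processor, as the slides suggest.
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(data, workers));
    }
}
```

This only works so cleanly because the slices never overlap and each worker writes to its own slot; `join()` is the one point where the threads have to coordinate.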

Parallelization Pitfalls
But this model is too simple!

How do we assign work units to worker threads?
What if we have more work units than threads?
How do we aggregate the results at the end?
How do we know all the workers have finished?
What if the work cannot be divided into completely separate tasks?

What is the common theme of all of these problems?

Parallelization Pitfalls (2)

Each of these problems represents a point at which multiple threads must communicate with one another, or access a shared resource.

Golden rule: Any memory that can be used by multiple threads must have an associated synchronization system!

What is Wrong With This?

Thread 1:
    void foo() {
        x++;
        y = x;
    }

Thread 2:
    void bar() {
        y++;
        x += 3;
    }

If the initial state is y = 0, x = 6, what happens after these threads finish running?

Multithreaded = Unpredictability
Many things that look like one-step operations actually take several steps under the hood (each thread has its own copy of the registers):

Thread 1:
    void foo() {
        eax = mem[x];
        inc eax;
        mem[x] = eax;
        ebx = mem[x];
        mem[y] = ebx;
    }

Thread 2:
    void bar() {
        eax = mem[y];
        inc eax;
        mem[y] = eax;
        eax = mem[x];
        add eax, 3;
        mem[x] = eax;
    }

When we run a multithreaded program, we don't know what order threads run in, nor do we know when they will interrupt one another.
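To see how many outcomes are possible, the machine-level steps of foo() and bar() can be enumerated in every possible order. The sketch below is an illustration, not part of the lecture: the class `InterleavingDemo` and its state layout are hypothetical, and it assumes each listed step is atomic and that each thread has private registers.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical example: tries every interleaving of the two threads' steps.
class InterleavingDemo {
    // State layout: s[0]=x, s[1]=y, s[2]=thread 1 eax, s[3]=thread 1 ebx, s[4]=thread 2 eax
    static void stepFoo(int pc, int[] s) {
        switch (pc) {
            case 0: s[2] = s[0]; break;   // eax = mem[x]
            case 1: s[2]++;      break;   // inc eax
            case 2: s[0] = s[2]; break;   // mem[x] = eax
            case 3: s[3] = s[0]; break;   // ebx = mem[x]
            case 4: s[1] = s[3]; break;   // mem[y] = ebx
        }
    }

    static void stepBar(int pc, int[] s) {
        switch (pc) {
            case 0: s[4] = s[1]; break;   // eax = mem[y]
            case 1: s[4]++;      break;   // inc eax
            case 2: s[1] = s[4]; break;   // mem[y] = eax
            case 3: s[4] = s[0]; break;   // eax = mem[x]
            case 4: s[4] += 3;   break;   // add eax, 3
            case 5: s[0] = s[4]; break;   // mem[x] = eax
        }
    }

    // At each point either thread may run its next step; follow both choices.
    static void explore(int pcFoo, int pcBar, int[] s, Set<String> out) {
        if (pcFoo == 5 && pcBar == 6) {
            out.add("x=" + s[0] + ", y=" + s[1]);
            return;
        }
        if (pcFoo < 5) {
            int[] next = s.clone();
            stepFoo(pcFoo, next);
            explore(pcFoo + 1, pcBar, next, out);
        }
        if (pcBar < 6) {
            int[] next = s.clone();
            stepBar(pcBar, next);
            explore(pcFoo, pcBar + 1, next, out);
        }
    }

    static Set<String> finalStates(int x, int y) {
        Set<String> out = new TreeSet<>();
        explore(0, 0, new int[] { x, y, 0, 0, 0 }, out);
        return out;
    }

    public static void main(String[] args) {
        // Initial state from the slide: x = 6, y = 0.
        for (String state : finalStates(6, 0)) {
            System.out.println(state);
        }
    }
}
```

Running foo() entirely before bar() gives x=10, y=8, and the reverse order gives x=10, y=10. Interleaved schedules add further results, for example x=7, y=7 when bar() runs in full between foo()'s read of x and its write back.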

Multithreaded = Unpredictability
This applies to more than just integers:

Pulling work units from a queue
Reporting work back to master unit
Telling another thread that it can begin the next phase of processing

All require synchronization!
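As a rough sketch of the first two items, workers can pull units from a thread-safe queue and report results through another. The class `WorkQueueDemo` and the squaring task are made up for illustration; `LinkedBlockingQueue` is a standard Java class that handles the synchronization internally. The example assumes all work is enqueued before any worker starts, so an empty queue means the work is finished.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example: workers share one work queue and one result queue.
class WorkQueueDemo {
    static long sumOfSquares(int units, int workers) {
        BlockingQueue<Integer> work = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        for (int i = 1; i <= units; i++) {
            work.add(i);                          // all work is queued up front
        }

        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            pool[w] = new Thread(() -> {
                Integer unit;
                // poll() returns null once the queue is empty, which ends the worker.
                while ((unit = work.poll()) != null) {
                    results.add((long) unit * unit);   // report back to the master
                }
            });
            pool[w].start();
        }

        // The master waits for all workers before starting the aggregation phase.
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        long total = 0;
        for (long r : results) {
            total += r;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, 3));
    }
}
```

Because workers take whatever unit is next, the same code works when there are more work units than threads.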

Synchronization Primitives

A synchronization primitive is a special shared variable that guarantees that it can only be accessed atomically.
Hardware support guarantees that operations on synchronization primitives only ever take one step.
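One way to try this in Java is the standard `java.util.concurrent.atomic.AtomicInteger`, whose `incrementAndGet()` is an indivisible read-modify-write (on mainstream JVMs it is typically backed by a hardware atomic instruction). The surrounding class `AtomicCounterDemo` is a hypothetical wrapper for the example and is not from the lecture.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: several threads increment one shared atomic counter.
class AtomicCounterDemo {
    static int countWithAtomic(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet();    // atomic: no update can be lost
                }
            });
            pool[t].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(4, 10000));
    }
}
```

With a plain `int` and `counter++` in place of the atomic, the total could come out lower, because increments from different threads can overwrite each other.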

Semaphores

A semaphore is a flag that can be raised or lowered in one step
Semaphores were flags that railroad engineers would use when entering a shared track

[Diagram: signal positions for Set and Reset]

Only one side of the semaphore can ever be red! (Can both be green?)

Semaphores
set() and reset() can be thought of as lock() and unlock()
Calls to lock() when the semaphore is already locked cause the thread to block.

Pitfalls: Must bind semaphores to particular objects; must remember to unlock correctly

The corrected example

Thread 1:
    void foo() {
        sem.lock();
        x++;
        y = x;
        sem.unlock();
    }

Thread 2:
    void bar() {
        sem.lock();
        y++;
        x += 3;
        sem.unlock();
    }

Global var (guards access to x & y):
    Semaphore sem = new Semaphore();
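The slide's lock()/unlock() calls are pseudocode. A runnable approximation with the standard `java.util.concurrent.Semaphore` might look like the sketch below, where one permit plays the role of the lock. The class `GuardedUpdate` and the `run()` helper are hypothetical names for this example. The try/finally addresses the "must remember to unlock" pitfall.

```java
import java.util.concurrent.Semaphore;

// Hypothetical example: a one-permit semaphore guards every access to x and y.
class GuardedUpdate {
    private final Semaphore sem = new Semaphore(1);   // binary semaphore
    private int x = 6;
    private int y = 0;

    void foo() {
        sem.acquireUninterruptibly();   // "lock": blocks while the permit is taken
        try {
            x++;
            y = x;
        } finally {
            sem.release();              // "unlock": always runs, even on exceptions
        }
    }

    void bar() {
        sem.acquireUninterruptibly();
        try {
            y++;
            x += 3;
        } finally {
            sem.release();
        }
    }

    // Runs foo() and bar() on two threads and returns the final {x, y}.
    static int[] run() {
        GuardedUpdate state = new GuardedUpdate();
        Thread t1 = new Thread(state::foo);
        Thread t2 = new Thread(state::bar);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { state.x, state.y };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("x=" + result[0] + ", y=" + result[1]);
    }
}
```

The semaphore makes each function's body indivisible but does not fix the order of the two functions, so the result is still either x=10, y=8 (foo first) or x=10, y=10 (bar first).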

Condition Variables

A condition variable notifies threads that a particular condition has been met
Inform another thread that a queue now contains elements to pull from (or that it's empty: request more elements!)

Pitfall: What if nobody's listening?

The final example

Thread 1:
    void foo() {
        sem.lock();
        x++;
        y = x;
        fooDone = true;
        sem.unlock();
        fooFinishedCV.notify();
    }

Thread 2:
    void bar() {
        sem.lock();
        // Re-check the flag after every wakeup: a wait can return spuriously
        while (!fooDone)
            fooFinishedCV.wait(sem);
        y++;
        x += 3;
        sem.unlock();
    }

Global vars:
    Semaphore sem = new Semaphore();
    ConditionVar fooFinishedCV = new ConditionVar();
    boolean fooDone = false;
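In real Java the same idea is usually written with a `ReentrantLock` and a `Condition` obtained from it, since a condition variable has to be tied to a lock. The sketch below is one possible rendering, not the lecture's code: the class `OrderedUpdate` and the `run()` helper are hypothetical. The `fooDone` flag answers the "what if nobody's listening?" pitfall, because bar() checks the flag before waiting and so does not depend on catching the notification.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: bar() is forced to run its update after foo().
class OrderedUpdate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition fooFinished = lock.newCondition();
    private boolean fooDone = false;
    private int x = 6;
    private int y = 0;

    void foo() {
        lock.lock();
        try {
            x++;
            y = x;
            fooDone = true;
            fooFinished.signalAll();    // Java requires holding the lock to signal
        } finally {
            lock.unlock();
        }
    }

    void bar() {
        lock.lock();
        try {
            // Loop guards against spurious wakeups; the wait releases the lock.
            while (!fooDone) {
                fooFinished.awaitUninterruptibly();
            }
            y++;
            x += 3;
        } finally {
            lock.unlock();
        }
    }

    // Starts bar() first on purpose; the condition still enforces foo-then-bar.
    static int[] run() {
        OrderedUpdate state = new OrderedUpdate();
        Thread t2 = new Thread(state::bar);
        Thread t1 = new Thread(state::foo);
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { state.x, state.y };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("x=" + result[0] + ", y=" + result[1]);
    }
}
```

With the ordering enforced, the only possible outcome from x = 6, y = 0 is x=10, y=8.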

Too Much Synchronization? Deadlock

Synchronization becomes even more complicated when multiple locks can be used
Can cause entire system to get stuck

Thread A:
    semaphore1.lock();
    semaphore2.lock();
    /* use data guarded by semaphores */
    semaphore1.unlock();
    semaphore2.unlock();

Thread B:
    semaphore2.lock();
    semaphore1.lock();
    /* use data guarded by semaphores */
    semaphore1.unlock();
    semaphore2.unlock();

(Image: RPI CSCI.4210 Operating Systems notes)
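A common remedy, shown here as a sketch rather than as the lecture's own recommendation, is to have every thread acquire the locks in the same global order. Thread A above holds lock 1 while waiting for lock 2, and Thread B holds lock 2 while waiting for lock 1. If both take lock 1 first, that cycle cannot form. The class `LockOrderingDemo` and its counter are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: both threads take lock1 before lock2, never the reverse.
class LockOrderingDemo {
    private static final ReentrantLock lock1 = new ReentrantLock();
    private static final ReentrantLock lock2 = new ReentrantLock();
    private static int shared = 0;     // data guarded by both locks

    static void update() {
        lock1.lock();
        try {
            lock2.lock();
            try {
                shared++;              // use data guarded by the locks
            } finally {
                lock2.unlock();        // release in reverse order of acquisition
            }
        } finally {
            lock1.unlock();
        }
    }

    // Two threads each call update() the given number of times.
    static int run(int iterations) {
        shared = 0;
        Runnable task = () -> {
            for (int i = 0; i < iterations; i++) {
                update();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(run(1000));
    }
}
```

Swapping the acquisition order in just one of the two threads would reintroduce the possibility of deadlock.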

The Moral: Be Careful!

Synchronization is hard
Need to consider all possible shared state
Must keep locks organized and use them consistently and correctly

Knowing there are bugs may be tricky; fixing them can be even worse!
Keeping shared state to a minimum reduces total system complexity

Fundamentals of Networking

Sockets: The Internet = tubes?

A socket is the basic network interface
Provides a two-way pipe abstraction between two applications
Client creates a socket, and connects to the server, who receives a socket representing the other side

Ports

Within an IP address, a port is a sub-address identifying a listening program
Allows multiple clients to connect to a server at once
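A small loopback example may make the socket and port ideas concrete. This is an illustrative sketch using the standard `java.net.ServerSocket` and `java.net.Socket` classes. The class `EchoDemo`, the "echo: " reply format, and the choice of an OS-assigned port (port 0) are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical example: a server and a client exchange one line over loopback.
class EchoDemo {
    static String roundTrip(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port to listen on.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            int port = server.getLocalPort();

            Thread serverThread = new Thread(() -> {
                // accept() hands the server its own socket for this one client.
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             peer.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            // The client names the server by address plus port.
            try (Socket client = new Socket(loopback, port);
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                out.println(message);          // one direction of the two-way pipe
                String reply = in.readLine();  // and the other
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The listening `ServerSocket` stays bound to its port while each accepted connection gets a separate `Socket`, which is what lets one server port handle several clients.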

What makes this work?

Underneath the socket layer are several more protocols
Most important are TCP and IP (which are used hand-in-hand so often, they're often spoken of as one protocol: TCP/IP)

[Diagram: packet layout: IP header | TCP header | Your data]

Even more low-level protocols handle how data is sent over Ethernet wires, or how bits are sent through the air using 802.11 wireless

Why is This Necessary?

Not actually tube-like underneath the hood
Unlike the phone system (circuit switched), the packet-switched Internet uses many routes at once

[Diagram: many network routes between you and www.google.com]

Networking Issues
If a party to a socket disconnects, how much data did they receive?
Did they crash? Or did a machine in the middle?
Can someone in the middle intercept/modify our data?
Traffic congestion makes switch/router topology important for efficient throughput

Conclusions
Processing more data means using more machines at the same time
Cooperation between processes requires synchronization
Designing real distributed systems requires consideration of networking topology

Next time: How MapReduce works
