Barlas Exercises Ch1

Uploaded by

Ghofrane Rh

CHAPTER 1 Introduction

EXERCISES
1. Study one of the top 10 most powerful supercomputers in the world. Discover:
• What kind of operating system does it run?
• How many CPUs/GPUs is it made of?
• What is its total memory capacity?
• What kind of software tools can be used to program it?
2. How many cores are inside the top GPU offerings from Nvidia and AMD? What
is the GFlop rating of these chips?
3. The performance of the most powerful supercomputers in the world is usually
reported as two numbers, Rpeak and Rmax, both in TFlops (tera floating-point
operations per second). Why is this done? What factors reduce performance
from Rpeak to Rmax? Would it ever be possible to achieve Rpeak?
4. An application in which 20% of the work must be executed sequentially is
required to be accelerated three-fold. How many CPUs are required for this task?
If the required speedup were 5, how many CPUs would be required?
5. A parallel application running on 5 identical machines has a 10% sequential
part. What is the speedup relative to a sequential execution on one of the
machines? If we would like to double that speedup, how many CPUs would be
required?
6. An application with a 5% non-parallelizable part is to be modified for parallel
execution. Two parallel machines are currently available on the market:
machine X with 4 CPUs, each CPU capable of executing the application in
1 hour on its own, and machine Y with 16 CPUs, each CPU capable of
executing the application in 2 hours on its own. Which machine should you
buy if the minimum execution time is required?
7. Create a simple sorting application that uses the mergesort algorithm to sort a
large collection (e.g., 10^7) of 32-bit integers. The input data and output results
should be stored in files, and the I/O operations should be considered a
sequential part of the application. Mergesort is an algorithm that is considered
appropriate for parallel execution, although it cannot be equally divided between
an arbitrary number of processors, as Amdahl’s and Gustafson-Barsis’ laws
require.
Assuming that this equal division is possible, estimate α, i.e., the part of the
program that can be parallelized, by using a profiler like gprof or valgrind to
measure the duration of mergesort’s execution relative to the overall execution
time. Use this number to estimate the predicted speedup for your program.
Does α depend on the size of the input? If it does, how should you modify
your predictions and their graphical illustration?
8. A parallel application running on 10 CPUs spends 15% of its total time in
sequential execution. What kind of CPU (how much faster) would we need to
run this application completely sequentially while keeping the same total time?
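
Several of the exercises above come down to evaluating Amdahl's law for a given parallelizable fraction α and CPU count N. The following is a minimal sketch of such a calculation, assuming the standard form speedup = 1 / ((1 - α) + α/N) and that the work divides equally among the CPUs. The function names amdahl_speedup and cpus_for_speedup are hypothetical helpers introduced here for illustration; they are not taken from the text.

```python
import math


def amdahl_speedup(alpha: float, n: int) -> float:
    """Speedup predicted by Amdahl's law.

    alpha: fraction of the program that can be parallelized (0 <= alpha <= 1)
    n:     number of identical CPUs
    """
    return 1.0 / ((1.0 - alpha) + alpha / n)


def cpus_for_speedup(alpha: float, target: float):
    """Smallest CPU count whose predicted speedup reaches `target`.

    Returns None when the target cannot be reached with any finite
    number of CPUs (the speedup only approaches 1 / (1 - alpha)).
    """
    # Solve 1 / ((1 - alpha) + alpha / n) >= target for n.
    denom = 1.0 / target - (1.0 - alpha)
    if denom <= 0.0:
        return None
    # Small tolerance so that an exact integer answer is not rounded up
    # because of floating-point error.
    return max(1, math.ceil(alpha / denom - 1e-9))
```

For instance, amdahl_speedup(0.5, 2) evaluates the law for a half-parallelizable program on two CPUs, and cpus_for_speedup(0.5, 2.0) returns None because a speedup of 2 is only the limiting value for α = 0.5. The illustrative values here are deliberately different from those in the exercises.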
