0% found this document useful (0 votes)
20 views51 pages

Lec1 - AnalysisOfAlgorithms

Uploaded by

m7mdkassab1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views51 pages

Lec1 - AnalysisOfAlgorithms

Uploaded by

m7mdkassab1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Algorithms R OBERT S EDGEWICK | K EVIN W AYNE

A NALYSIS OF A LGORITHMS

Modified by: Dr. Fahed Jubair and Dr. Ramzi Saifan


Computer Engineering Department
University of Jordan

http://algs4.cs.princeton.edu
A NALYSIS OF A LGORITHMS
‣ introduction
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
http://algs4.cs.princeton.edu
Reasons to analyze algorithms

Predict performance.

Compare algorithms.

Provide guarantees.

Primary practical reason: avoid performance bugs.

client gets poor performance because programmer


did not understand performance characteristics

4
The challenge

Q. Will my program be able to solve a large practical input?

Why is my program so slow ? Why does it run out of memory ?

5
Scientific method applied to analysis of algorithms

A framework for predicting performance and comparing algorithms.

Scientific method.
・Observe some feature of the natural world.
・Hypothesize a model that is consistent with the observations.
・Predict events using the hypothesis.
・Verify the predictions by making further observations.
・Validate by repeating until the hypothesis and observations agree.

Feature of the natural world. Computer itself.

6
Example: 3-SUM

3-SUM. Given N distinct integers, how many triples sum to exactly zero?

% more 8ints.txt
8 a[i] a[j] a[k] sum
30 -40 -20 -10 40 0 10 5 1
30 -40 10 0

% java ThreeSum 8ints.txt 2 30 -20 -10 0

4 -40 40 0 0
3

-10 0 10 0
4

Context. Deeply related to problems in computational geometry.

7
3-SUM: brute-force algorithm

public class ThreeSum


{
public static int count(int[] a)
{
int N = a.length;
int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
for (int k = j+1; k < N; k++)
if (a[i] + a[j] + a[k] == 0) check each triple
count++; for simplicity, ignore
return count; integer overflow

public static void main(String[] args)


{
In in = new In(args[0]);
int[] a = in.readAllInts();
StdOut.println(count(a));
}
}
8
Measuring the running time

public class Stopwatch (part of stdlib.jar )

Stopwatch() create a new stopwatch

double elapsedTime() time since creation (in seconds)

public static void main(String[] args)


{
In in = new In(args[0]);
int[] a = in.readAllInts();
Stopwatch stopwatch client
= new Stopwatch();
code
StdOut.println(ThreeSum.count(a));
double time = stopwatch.elapsedTime();
StdOut.println("elapsed time " + time);
}

9
Empirical analysis

Run the program for various input sizes and measure running time.

N time (seconds) †

250 0

500 0

1,000 0.1

2,000 0.8

4,000 6.4

8,000 51.1

16,000 ?

10
Data analysis

Standard plot. Plot running time T (N) vs. input size N.

11
Data analysis

Log-log plot. Plot running time T (N) vs. input size N using log-log scale.

lg(T (N)) = b lg N + c
b = 2.999
c = -33.2103

T (N) = a N b, where a = 2 c
3 orders
of magnitude

power law

Regression. Fit straight line through data points: a N b. slope

Hypothesis. The running time is about 1.006  10 –10  N 2.999 seconds.


12
Prediction and validation

Hypothesis. The running time is about 1.006  10 –10  N 2.999 seconds.

Predictions.
・51.0 seconds for N = 8,000.
・408.1 seconds for N = 16,000.

Observations. N time (seconds) †

8,000 51.1

8,000 51

8,000 51.1

16,000 410.8

validates hypothesis!

13
Experimental algorithmics

System independent effects.


・Algorithm. determines exponent
・Input data. in power law

System dependent effects. determines constant

・Hardware: CPU, memory, cache, … in power law

・Software: compiler, interpreter, garbage collector, …


・System: operating system, network, other apps, …

14
A NALYSIS OF A LGORITHMS
‣ Introduction
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
http://algs4.cs.princeton.edu
Mathematical models for running time

Total running time: sum of cost  frequency for all operations.


・Need to analyze program to determine set of operations.
・Cost depends on machine, compiler.
・Frequency depends on algorithm, input data.

Donald Knuth
1974 Turing Award

In principle, accurate mathematical models are available.


16
Cost of basic operations

Challenge. How to estimate constants.

operation example nanoseconds †

integer add a + b 2.1

integer multiply a * b 2.4

integer divide a / b 5.4

floating-point add a + b 4.6

floating-point multiply a * b 4.2

floating-point divide a / b 13.5

sine Math.sin(theta) 91.3

arctangent Math.atan2(y, x) 129

... ... ...

† Running OS X on Macbook Pro 2.2GHz with 2GB RAM

17
Cost of basic operations

Observation. Most primitive operations take constant time.

operation example nanoseconds †

variable declaration int a c1

assignment statement a = b c2

integer compare a < b c3

array element access a[i] c4

array length a.length c5

1D array allocation new int[N] c6 N

2D array allocation new int[N][N] c7 N 2

Caveat. Non-primitive operations often take more than constant time.

18
Example: 1-SUM

Q. How many instructions as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
if (a[i] == 0)
count++;

N array accesses

operation frequency

variable declaration 2

assignment statement 2

less than compare N+1

equal to compare N

array access N

increment N to 2 N

19
Example: 2-SUM

Q. How many instructions as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
if (a[i] + a[j] == 0)
count++;

operation frequency

variable declaration N+2

assignment statement N+2

less than compare ½ (N + 1) (N + 2)

equal to compare ½ N (N − 1)
tedious to count exactly
array access N (N − 1)

increment ½ N (N − 1) to N (N − 1)

20
Simplification 1: cost model

Cost model. Use some basic operation as a proxy for running time.

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
if (a[i] + a[j] == 0)
count++;

operation frequency

variable declaration N+2

assignment statement N+2

less than compare ½ (N + 1) (N + 2)

equal to compare ½ N (N − 1)

array access N (N − 1) cost model = array accesses

increment ½ N (N − 1) to N (N − 1)

22
Simplification 2: tilde notation

・Estimate running time (or memory) as a function of input size N.


・Ignore lower order terms.
– when N is large, terms are negligible
– when N is small, we don't care

Ex 1. ⅙ N 3 + 20 N + 16 ~ ⅙N3
Ex 2. ⅙ N 3 + 100 N 4/3 + 56 ~ ⅙N3
Ex 3. ⅙N3 - ½N 2 + ⅓ N ~ ⅙N3

discard lower-order terms


(e.g., N = 1000: 166.67 million vs. 166.17 million)

23
Simplification 2: tilde notation

・Estimate running time (or memory) as a function of input size N.


・Ignore lower order terms.
– when N is large, terms are negligible
– when N is small, we don't care

operation frequency tilde notation

variable declaration N+2 ~N

assignment statement N+2 ~N

less than compare ½ (N + 1) (N + 2) ~½N2

equal to compare ½ N (N − 1) ~½N2

array access N (N − 1) ~N2

increment ½ N (N − 1) to N (N − 1) ~ ½ N 2 to ~ N 2

24
Example: 2-SUM

Q. Approximately how many array accesses as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
"inner loop"
if (a[i] + a[j] == 0)
count++;

A. ~ N 2 array accesses.

Bottom line. Use cost model and tilde notation to simplify counts.
25
Example: 3-SUM

Q. Approximately how many array accesses as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
for (int k = j+1; k < N; k++) "inner loop"
if (a[i] + a[j] + a[k] == 0)
count++;

A. ~ ½ N 3 array accesses.

Bottom line. Use cost model and tilde notation to simplify counts.
26
A NALYSIS OF A LGORITHMS
‣ introduction
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
http://algs4.cs.princeton.edu
Common order-of-growth classifications

Definition. If f (N) ~ c g(N) for some constant c > 0, then the order of growth
of f (N) is g(N).
・Ignores leading coefficient.
・Ignores lower-order terms.
Ex. The order of growth of the running time of this code is N 3.

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
for (int k = j+1; k < N; k++)
if (a[i] + a[j] + a[k] == 0)
count++;

Typical usage. With running times.

where leading coefficient


depends on machine, compiler, JVM, ... 29
Common order-of-growth classifications

Good news. The set of functions


1, log N, N, N log N, N 2, N 3, and 2N
suffices to describe the order of growth of most common algorithms.

30
Common order-of-growth classifications

order of
name typical code framework description example T(2N) / T(N)
growth

add two
1 constant a = b + c; statement 1
numbers

while (N > 1)
log N logarithmic divide in half binary search ~1
{ N = N / 2; ... }

for (int i = 0; i < N; i++) find the


N linear loop 2
{ ... } maximum

divide
N log N linearithmic [see mergesort lecture] mergesort ~2
and conquer

for (int i = 0; i < N; i++)


check all
N2 quadratic for (int j = 0; j < N; j++) double loop 4
pairs
{ ... }

for (int i = 0; i < N; i++)


for (int j = 0; j < N; j++) check all
N3 cubic triple loop 8
for (int k = 0; k < N; k++) triples
{ ... }

exhaustive check all


2N exponential [see combinatorial search lecture] T(N)
search subsets

31
Practical implications of order-of-growth

problem size solvable in minutes


growth
rate
1970s 1980s 1990s 2000s

1 any any any any

log N any any any any

tens of hundreds of
N millions billions
millions millions

hundreds of hundreds of
N log N millions millions
thousands millions

tens of
N2 hundreds thousand thousands
thousands

N3 hundred hundreds thousand thousands

2N 20 20s 20s 30

Bottom line. Need linear or linearithmic alg to keep pace with Moore's law.
32
Binary search demo

Goal. Given a sorted array and a key, find index of the key in the array?

Binary search. Compare key against middle entry.


・Too small, go left.
・Too big, go right.
・Equal, found.

successful search for 33

6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

lo hi

33
Binary search demo

Goal. Given a sorted array and a key, find index of the key in the array?

Binary search. Compare key against middle entry.


・Too small, go left.
・Too big, go right.
・Equal, found.

successful search for 33

6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

lo mid hi
Binary search demo

Goal. Given a sorted array and a key, find index of the key in the array?

Binary search. Compare key against middle entry.


・Too small, go left.
・Too big, go right.
・Equal, found.

successful search for 33

6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

lo mid hi
Binary search demo

Goal. Given a sorted array and a key, find index of the key in the array?

Binary search. Compare key against middle entry.


・Too small, go left.
・Too big, go right.
・Equal, found.

successful search for 33

6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

lo mid hi
Binary search demo

Goal. Given a sorted array and a key, find index of the key in the array?

Binary search. Compare key against middle entry.


・Too small, go left.
・Too big, go right.
・Equal, found.

lo = hi
successful search for 33

6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

mid
return 4
Binary search: Java implementation

Trivial to implement?
・First binary search published in 1946.
・First bug-free one in 1962.
・Bug in Java's Arrays.binarySearch() discovered in 2006.
public static int binarySearch(int[] a, int key)
{
int lo = 0, hi = a.length-1;
while (lo <= hi)
{
int mid = lo + (hi - lo) / 2;
if (key < a[mid]) hi = mid - 1;
one "3-way compare"
else if (key > a[mid]) lo = mid + 1;
else return mid;
}
return -1;
}

Invariant. If key appears in the array a[], then a[lo]  key  a[hi].
38
Binary search: mathematical analysis

Proposition. Binary search uses at most 1 + log N key compares to search in


a sorted array of size N.

Def. T (N) = # key compares to binary search a sorted subarray of size ≤ N.

Binary search recurrence. T (N) ≤ T (N / 2) + 1 for N > 1, with T (1) = 1.

left or right half possible to implement with one


(floored division) 2-way compare (instead of 3-way)
Pf sketch. [assume N is a power of 2]

T (N) ≤ T (N / 2) + 1 [ given ]

≤ T (N / 4) + 1 + 1 [ apply recurrence to first term ]

≤ T (N / 8) + 1 + 1 + 1 [ apply recurrence to first term ]

≤ T (N / N) + 1 + 1 + … + 1 [ stop applying, T(1) = 1 ]

= 1 + log N 39
An N2 log N algorithm for 3-SUM

Algorithm.
input
・Step 1: Sort the N (distinct) numbers. 30 -40 -20 -10 40 0 10 5
・Step 2: For each pair of numbers a[i]
sort
and a[j], binary search for -(a[i] + a[j]). -40 -20 -10 0 5 10 30 40

binary search

(-40, -20) 60
Analysis. Order of growth is N2 log N.
(-40, -10) 50
・Step 1: N2 with insertion sort.
(-40, 0) 40
・Step 2: N 2 log N with binary search.
(-40, 5) 35

(-40, 10) 30

⋮ ⋮

(-20, -10) 30

⋮ ⋮
only count if
(-10, 0) 10
a[i] < a[j] < a[k]
⋮ ⋮ to avoid
double counting
( 10, 30) -40

( 10, 40) -50 40


Comparing programs

Hypothesis. The sorting-based N 2 log N algorithm for 3-SUM is significantly


faster in practice than the brute-force N 3 algorithm.

N time (seconds) N time (seconds)

1,000 0.1 1,000 0.14

2,000 0.8 2,000 0.18

4,000 6.4 4,000 0.34

8,000 51.1 8,000 0.96

ThreeSum.java 16,000 3.67

32,000 14.88

64,000 59.16

ThreeSumDeluxe.java

Guiding principle. Typically, better order of growth  faster in practice.


41
A NALYSIS OF A LGORITHMS
‣ Introduction
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
http://algs4.cs.princeton.edu
Types of analyses

Best case. Lower bound on cost.


・Determined by “easiest” input.
・Provides a goal for all inputs.
Worst case. Upper bound on cost.
・Determined by “most difficult” input.
・Provides a guarantee for all inputs.
this course
Average case. Expected cost for random input.
・Need a model for “random” input.
・Provides a way to predict performance.

Ex 1. Array accesses for brute-force 3-SUM. Ex 2. Compares for binary search.


Best: ~ ½ N3 Best: ~ 1
Average: ~ ½ N3 Average: ~ lg N
Worst: ~ ½ N3 Worst: ~ lg N
43
Commonly-used notations in the theory of algorithms

notation provides example used to

asymptotic order classify


Big Theta Θ(N2)
of growth algorithms

develop
Big Oh Θ(N2) and smaller O(N2)
upper bounds

develop
Big Omega Θ(N2) and larger Ω(N2)
lower bounds
Theory of algorithms: example 1

Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 1-SUM = “Is there a 0 in the array? ”
Upper bound.
・Ex. Brute-force algorithm for 1-SUM: Look at every array entry.
・Running time of the optimal algorithm for 1-SUM is O(N).
Lower bound.
・Ex. Have to examine all N entries.
・Running time of the algorithm for 1-SUM is Ω(N).
Optimal algorithm.
・Lower bound equals upper bound (to within a constant factor).
・Ex. Brute-force algorithm for 1-SUM is Θ(N).
46
Theory of algorithms: example 2

Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 3-SUM.
Upper bound. A specific algorithm.
・Ex. Brute-force algorithm for 3-SUM.
・Running time of the optimal algorithm for 3-SUM is O(N 3).

47
Theory of algorithms: example 2

Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 3-SUM.
Upper bound. A specific algorithm.
・Ex. Improved algorithm for 3-SUM.
・Running time of the improved algorithm for 3-SUM is O(N 2 log N ).

Lower bound.
・Running time of the improved algorithm for 3-SUM is Ω(N ).2

・Running time of a magical optimal algorithm for solving 3-SUM is Ω(N ).


(why?)

Open problems.
・Optimal algorithm for 3-SUM?
・Subquadratic algorithm for 3-SUM?
48
Algorithm design approach

Start.
・Develop an algorithm.
・Prove a lower bound.
Gap?
・Lower the upper bound (discover a new algorithm).
・Raise the lower bound (more difficult).
Golden Age of Algorithm Design.
・1970s-.
・Steadily decreasing upper bounds for many important problems.
・Many known optimal algorithms.

49
A NALYSIS OF A LGORITHMS
‣ introduction
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
http://algs4.cs.princeton.edu
Basics

Bit. 0 or 1. NIST most computer scientists


Byte. 8 bits.
Megabyte (MB). 1 million or 220 bytes.
Gigabyte (GB). 1 billion or 230 bytes.

64-bit machine. We assume a 64-bit machine with 8-byte pointers.


・Can address more memory.
・Pointers use more space. some JVMs "compress" ordinary object
pointers to 4 bytes to avoid this cost

51
Typical memory usage for primitive types and arrays

type bytes type bytes

boolean 1 char[] 2 N + 24

byte 1 int[] 4 N + 24

char 2 double[] 8 N + 24

int 4
one-dimensional arrays

float 4

long 8
type bytes
double 8
char[][] ~2MN
primitive types
int[][] ~4MN

double[][] ~8MN

two-dimensional arrays

52
Typical memory usage for objects in Java

Object overhead. 16 bytes.


Reference. 8 bytes.
Padding. Each object uses a multiple of 8 bytes.

Ex 1. A Date object uses 32 bytes of memory.

16 bytes (object overhead)

4 bytes (int)
4 bytes (int)
4 bytes (int)
4 bytes (padding)

32 bytes

53
Typical memory usage for objects in Java

Object overhead. 16 bytes.


Reference. 8 bytes.
Padding. Each object uses a multiple of 8 bytes.

Ex 2. A virgin String of length N uses ~ 2N bytes of memory.

16 bytes (object overhead)

8 bytes (reference to array)


2N + 24 bytes (char[] array)
4 bytes (int)
4 bytes (int)
4 bytes (int)
4 bytes (padding)

2N + 64 bytes

54
Typical memory usage summary

Total memory usage for a data type value:


・Primitive type: 4 bytes for int, 8 bytes for double, …
・Object reference: 8 bytes.
・Array: 24 bytes + memory for each array entry.
・Object: 16 bytes + memory for each instance variable.
・Padding: round up to multiple of 8 bytes.
+ 8 extra bytes per inner class object
(for reference to enclosing class)

Shallow memory usage: Don't count referenced objects.

Deep memory usage: If array entry or instance variable is a reference,


count memory (recursively) for referenced object.

55

You might also like