Analyzing Algorithms
●
Scientific Method
●
Observe – Observe some feature in the natural
world
●
Hypothesize – Create a model consistent with
the observations
●
Predict – Predict events using the hypothesis
●
Verify – Verify the predictions by making further
observations
●
Validate – Repeat until the hypothesis and
observations agree
Experiments
●
What must the experiments be like?
●
Experiments must be repeatable
●
We cannot build a one-off experiment; the
experiments must be repeatable many times
●
Experiments must be understandable
●
Experiments must be like a recipe book. A
person must be able to take the analysis and
build their own experiments
Observation
●
In computer science the observation is the easy
part
●
Run the program and take a quantitative
measurement of how long the program takes
●
The size of the input is generally the largest
factor in how long the program takes to run
●
Input of 100 elements vs 1000000 elements
●
However, increasing the input size will not
necessarily make the running time grow linearly
Timing the program
●
We need to be able to time the program while it
is running
●
Java has methods (such as System.nanoTime()) for reading a timer
●
In the sample code, we use that timer to record a
timestamp each time we start a new iteration
●
We can re-use this to time many different
algorithms
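The course's sample timer code is not shown here, so the following is a minimal sketch of the idea, assuming a stopwatch built on System.nanoTime(); the class and variable names are illustrative, not the course's actual code.

```java
public class TimerSketch {
    public static void main(String[] args) {
        long start = System.nanoTime();          // stamp the time before the work begins
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {    // the algorithm being timed
            sum += i;
        }
        // elapsed wall-clock time in milliseconds
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("elapsed ms: " + elapsedMs + ", sum: " + sum);
    }
}
```

The same pattern (stamp, run, subtract) can be wrapped around any algorithm, which is what makes it reusable.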
Some Baselines
●
If something runs in constant time, its running
time does not depend on N; we represent it by a
constant (a fixed number)
●
Everything in computers is base 2
●
So if we have something that runs in logarithmic
time, we usually write it as lg rather than log
because of base 2. Linearithmic is the exception (below)
●
You will find in many algorithms it becomes a
goal to run in logarithmic time
Mathematical Models
●
D. E. Knuth, in the early days of computer science,
postulated that understanding running times
comes down to two main factors
●
The cost of running each statement
●
The frequency of execution of each statement
Cost of running each statement
●
The cost of running each statement is a piece
that we often do not have control over
●
The compiler
●
The operating system
●
The programming language itself
The frequency of each statement
●
Frequency of each statement is based on the
code that we write
●
Some statements are easy to analyze
●
We can look at them and see that something
runs only once, like an assignment
statement
●
However, when we have ifs, loops and other
items it gets more difficult
●
We can figure some out just by counting, as in
the ThreeSum.count() example: its if statement
executes N(N-1)(N-2)/6 times
●
Others depend on the input data (for loop)
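The counting argument above can be made concrete. This is a sketch of the well-known ThreeSum count, assuming the course uses the standard triple-loop version; the class name here is illustrative.

```java
public class ThreeSumSketch {
    // Counts triples that sum to 0. The if test runs exactly
    // N(N-1)(N-2)/6 times: once per combination with i < j < k.
    public static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)   // frequency: N(N-1)(N-2)/6
                        count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = {30, -40, -20, -10, 40, 0, 10, 5};
        System.out.println(count(a));   // prints 4 for this input
    }
}
```

The frequency of the if test is fixed by the loop bounds alone, while the frequency of count++ depends on the input data, which is exactly the distinction the bullets above draw.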
The frequency of each statement
●
Others might require other types of analysis
●
For example when we generate numbers
randomly, we can do a probabilistic analysis to
determine the expected value
●
The frequency question can lead to large
mathematical expressions
●
We often throw away lower order terms because
they don’t contribute much to the running time
●
This is referred to as tilde (~) notation
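A minimal sketch of why lower-order terms can be dropped: the pair-counting loop below runs exactly N(N-1)/2 times, and the tilde approximation keeps only the leading term N²/2. The value N = 1000 is illustrative.

```java
public class TildeDemo {
    public static void main(String[] args) {
        int n = 1000;
        long exact = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                exact++;                          // exact count: N(N-1)/2 = 499500
        double tilde = (double) n * n / 2.0;      // ~ N^2/2 = 500000, leading term only
        System.out.println("exact = " + exact + ", tilde = " + tilde);
        // the discarded N/2 term changes the answer by about 0.1% at N = 1000
    }
}
```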
Analysis patterns
●
Often, a very large share of the running time is
spent in very few statements in the program
●
Loops are the main example
●
A loop is often an area where the most time is
spent in the program
●
If we have just one loop, then the running time
grows linearly, which is acceptable for a while
Analysis patterns
●
As we move into nested loops that is where
things get interesting
●
If we have a loop that has a nested loop within it
and we need to loop through 5 elements in the
first loop and the same five elements in the
second loop…
●
We wind up going through the inner loop 5 x 5
times (5 squared). That is the key and longest
piece
●
So we can deduce that one level of nesting gives
quadratic (N squared) growth, and each additional
nested loop raises the power by one; this is
polynomial, not exponential, growth
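The 5 x 5 claim is easy to check directly; this small sketch counts how many times the inner body runs for two nested loops over the same n elements.

```java
public class NestedLoopCount {
    public static void main(String[] args) {
        int n = 5;
        int innerRuns = 0;
        for (int i = 0; i < n; i++)        // outer loop: n passes
            for (int j = 0; j < n; j++)    // inner loop: n passes per outer pass
                innerRuns++;
        System.out.println(innerRuns);     // n * n = 25: quadratic growth in n
    }
}
```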
Cost Model
●
From these findings, we develop a cost model
●
The cost model defines the basic operations
used by a given algorithm
●
When defining the cost, we focus on the area of
the program that will take the longest
●
Then we model that piece out
●
As said before, there are generally very few places
where a program spends almost all of its time
●
We carry out the mathematical analysis, and the
result is our tilde approximation of the running time
Visuals of some of the costing
●
If we have an algorithm with linear growth, its
running time plots as a straight line (a 45 degree
angle when the constant factor is 1)
Visuals of some of the costing
●
If we have polynomial growth (sometimes loosely
called “exponential”), the curve starts out small
for very small inputs but rises VERY quickly. We
have different order-of-growth names for the
different levels
Visuals of some of the costing
●
If we have logarithmic growth, the curve rises
quickly only for very small inputs and then levels
off almost immediately
Thinking about the visuals
●
We generally want linear or logarithmic growth
●
While linear growth seems like the best (It is
predictable)…
●
Logarithmic growth on very large data sets can
outperform linear growth
●
Logarithmic growth has a “ceiling” of sorts,
because the number of operations stays nearly
flat as N grows
●
Think about binary search
Common Algorithms
●
Binary Search
●
Worst case is ~ lg N + 1 compares
●
If we have to sort first: merge sort and quicksort
run in N lg N time, so we pay that cost once up
front, but searching the sorted data is MOST
often faster than scanning unsorted lists
●
We will come back to Merge sort and quick sort
when we get into sorting
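The lg N behavior comes from halving the search interval on every compare. This is the standard iterative binary search, not necessarily the course's exact code; the input array is illustrative.

```java
import java.util.Arrays;

public class BinarySearchSketch {
    // Each iteration halves [lo, hi], so there are at most ~ lg N + 1 compares.
    public static int indexOf(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;      // midpoint without overflow
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;                             // not found
    }

    public static void main(String[] args) {
        int[] a = {84, 48, 68, 10, 18, 98, 12, 23, 54, 57};
        Arrays.sort(a);                        // pay N lg N once, then each search is lg N
        System.out.println(indexOf(a, 23));
    }
}
```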
Common Algorithms
●
Selection sort
●
In selection sort there is a nested for loop
●
Referring back to for loops: with one nested for
loop, that will be N squared
●
Selection sort is N squared
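The nested loop is visible in the code. This is a sketch of standard selection sort, assuming the usual formulation; it may differ in detail from the course's version.

```java
import java.util.Arrays;

public class SelectionSortSketch {
    // For each position i, find the minimum of a[i..n-1] and swap it into place.
    // The nested loop makes ~ N^2/2 compares regardless of the input order.
    public static void sort(int[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++)    // inner loop: the N squared part
                if (a[j] < a[min]) min = j;
            int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7};
        sort(a);
        System.out.println(Arrays.toString(a));   // prints [1, 2, 5, 7, 9]
    }
}
```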
Common Algorithms
●
Insertion sort
●
In insertion sort there is a nested for loop
●
Referring back to for loops: with one nested for
loop, that will be N squared
●
Insertion sort is N squared
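Insertion sort has the same nested-loop shape, but its inner loop stops early on nearly sorted input, which is an example of a frequency that depends on the data. A sketch of the usual formulation:

```java
import java.util.Arrays;

public class InsertionSortSketch {
    // Shift each element left until it is in order. Worst case (reverse order)
    // is ~ N^2/2 compares; an already-sorted input needs only ~ N compares,
    // so the running time depends on the input data.
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;
            }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7};
        sort(a);
        System.out.println(Arrays.toString(a));   // prints [1, 2, 5, 7, 9]
    }
}
```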
Order of growth classifications
●
Constant
●
This is a program that runs in constant time; its
running time does not depend on N
●
Most programming operations are constant and
that is why we focus on small portions of a
program
Order of growth classifications
●
Logarithmic
●
A program that is logarithmic grows quickly only
at the VERY beginning, while N is still VERY
small
●
Therefore logarithmic growth is barely slower
than constant time
●
If we can get a program to run in logarithmic time
then we are doing well
Order of growth classifications
●
Linear
●
A program that spends the same amount of time
on each input item is linear
●
Linear programs are very predictable
●
With large data sets, linear programs will still
take a long time
Order of growth classifications
●
Linearithmic
●
This describes a program that has an order of
growth of N log N. The base of the logarithm does
not matter, because changing the base only
changes a constant factor
●
Since the logarithmic factor grows so slowly,
N log N stays close to linear, which is key
Order of growth classifications
●
Quadratic
●
This is an N squared growth
●
If it has one nested for loop then it is quadratic
Order of growth classifications
●
Cubic
●
A program that has N cubed growth
●
This program will have a triply nested for loop
(two loops nested inside the outer one)
●
ThreeSum Example
Order of growth classifications
●
Exponential
●
These are programs whose order of growth is 2 to the N
●
These programs slow down VERY quickly
●
We will never run them for large numbers of
elements
●
However, they play a critical role as there are
many use cases
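The textbook 2 to the N example is enumerating every subset of N items: each bitmask from 0 to 2^N - 1 names one subset. This sketch only counts the subsets, to make the 2^N pass count visible; the class name is illustrative.

```java
public class SubsetCount {
    // One loop pass per subset of an n-element set: 2^n passes total.
    public static long countSubsets(int n) {
        long count = 0;
        for (long mask = 0; mask < (1L << n); mask++)
            count++;                       // each mask encodes one subset
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSubsets(5));   // 2^5 = 32
        System.out.println(countSubsets(20));  // already over a million
    }
}
```

Doubling N squares the work, which is why such programs are only run on small inputs.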