Advance Optimization
Advance Optimization
1
Advanced Optimization Method
1. Operation Research
The term Operations Research (OR) was first coined by MC Closky and Trefthen in 1940 in a
small town, Bowdsey of UK. The main origin of OR was during the second world war – The
military commands of UK and USA engaged several inter-disciplinary teams of scientists to
undertake scientific research into strategic and tactical military operations. Their mission was to
formulate specific proposals and to arrive at the decision on optimal utilization of scarce military
resources and also to implement the decisions effectively. In simple words, it was to uncover the
methods that can yield greatest results with little efforts. Thus it had gained popularity and was
called “An art of winning the war without actually fighting it”
The name Operations Research (OR) was invented because the team was dealing with research
on military operations. The encouraging results obtained by British OR teams motivated US
military management to start with similar activities. The work of OR team was given various
names in US: Operational Analysis, Operations Evaluation, Operations Research, System
Analysis, System Research, Systems Evaluation and so on. The first method in this direction was
simplex method of linear programming developed in 1947 by G.B Dantzig, USA. Since then,
new techniques and applications have been developed to yield high profit from least costs. Now
OR activities has become universally applicable to any area such as transportation, hospital
management, agriculture, libraries, city planning, financial institutions, construction management
and so forth. In India many of the industries like Delhi cloth mills, Indian Airlines, Indian
Railway, etc are making use of OR techniques.
2
Advanced Optimization Method
HISTORY OF OR
The term OR coined by Mc.Clostcy and Tref in the year 1940 in U.K. OR was first
used in military operations for optimum utilization of resources.
YEAR EVENTS
1940 Term OR was coined by Mc.Closky and Trefthen in U.K
• OR unit was set up in India in Hyderabad. (The Regional
Research Lab)
1949
• OR unit was set up at defence science lab.
OR gained its significance first in the defence during the World War II (1939-1945) in order to
make the best use of limited military resources and win the war. The effectiveness of OR in
defence spread interest in Government departments and industry.
3
Advanced Optimization Method
DEFINITION
“OR is defined as the application of Scientific methods, tools and techniques to problems
involving the operations of a system so as to provide to those in control of the system,
with optimum solutions to the problem”.
“OR is defined as the application of Scientific methods by interdisciplinary team to
problems involving control of organized system, so as to provide solutions which serve
best to the organization as a whole.
OR is otherwise called as the “Science of use”.
OR is the combination of management principles and mathematical concepts
(Quantitative techniques) for managerial decision-making purpose.
CHARACTERISTICS OF OR
Aims to find solutions for problems of organized systems.
Aims to provide optimum solution. Optimization means the best minimum or
maximum for the criteria under consideration.
It is the application of scientific methods, tools and techniques.
Interdisciplinary team approach is used to solve the problems.
The solutions that serve best to the organization as a whole is taken into
consideration.
2. Optimization
Linear Programming Problem: A linear programming problem is one in which we have to
find optimal value (maximum or minimum) of a linear function of several variables (called
objective function) subject to certain conditions that the variables are non-negative and
satisfying by a set of linear inequalities with variables, are sometimes called division
variables.
Objective Function: A linear function z = px + qy (p and q are constants) which has to be
maximised or minimised, is called an objective function.
Constraints: The linear inequalities or equations or restrictions on the variables of the linear
programming problem are called constraints. The conditions x ≥ 0, y ≥ 0 are called non-
negative restrictions.
Optimal Value: The maximum or minimum value of an objective function is known as its
optimal value.
Optimisation Problem: A problem, which seeks to maximise or minimise a linear function
subject to certain constraints as determined by a set of linear inequalities, is called an
optimisation problem.
4
Advanced Optimization Method
Feasible Region: The common region determined by all the constraints including non-
negative constraints x, y >0 of a linear programming problem is called the feasible region
for the problem. The region other than the feasible region is called an infeasible region. The
feasible region is always a convex polygon.
Feasible Solutions: Points within and on the boundary of the feasible region represent
feasible solutions of the constraints. Any point outside the feasible region is called an
infeasible solution.
Optimal Feasible Solution: Any point in the feasible region that gives the optimal value of
the objective function is called the optimal feasible solution.
Bounded and Unbounded Region: A feasible region of a system of linear inequalities is
said to be bounded, if it can be enclosed within a circle. Otherwise, it is called unbounded.
Optimization is the way of life. We all have finite resources and time and we want to make
the most of them. From using your time productively to solving supply chain problems for
your company – everything uses optimization. It’s an especially interesting and relevant
topic in data science.
3. Linear programming (LP)
Linear programming (LP) is one of the simplest ways to perform optimization. It helps you
solve some very complex optimization problems by making a few simplifying assumptions.
As an analyst, you are bound to come across applications and problems to be solved by
Linear Programming.
For some reason, LP doesn’t get as much attention as it deserves while learning data
science. So, I thought let me do justice to this awesome technique. I decided to write an
article that explains Linear programming in simple English. I have kept the content as
simple as possible. The idea is to get you started and excited about Linear Programming.
Note- If you want to learn this in a course format, we have curated this free course for you-
Linear Programming for Data Science Professionals
5
Advanced Optimization Method
Example of a linear programming problem
Let’s say a FedEx delivery man has 6 packages to deliver in a day as shown in figure 1. The
warehouse is located at point A. The 6 delivery destinations are given by U, V, W, X, Y,
and Z. The numbers on the lines indicate the distance between the cities. To save on fuel
and time the delivery person wants to take the shortest route.
6
Advanced Optimization Method
Formulating a problem –
Let’s manufacture some chocolates
Example: Consider a chocolate manufacturing company that produces only two types of
chocolate – A and B. Both the chocolates require Milk and Choco only. To manufacture
each unit of A and B, the following quantities are required:
Each unit of A requires 1 unit of Milk and 3 units of Choco
Each unit of B requires 1 unit of Milk and 2 units of Choco
The company kitchen has a total of 5 units of Milk and 12 units of Choco. On each sale, the
company makes a profit of
Rs 6 per unit A sold
Rs 5 per unit B sold.
Now, the company wishes to maximize its profit. How many units of A and B should it
produce respectively?
Solution: The first thing I’m gonna do is represent the problem in a tabular form for better
understanding.
Milk Choco Profit per unit
A 1 3 Rs 6
B 1 2 Rs 5
Total 5 12
7
Advanced Optimization Method
For the company to make maximum profit, the above inequalities have to be satisfied.
This is called formulating a real-world problem into a mathematical model.
Let us define some terminologies used in Linear Programming using the above example.
Decision Variables: The decision variables are the variables that will decide my output.
They represent my ultimate solution. To solve any problem, we first need to identify the
decision variables. For the above example, the total number of units for A and B denoted by
X & Y respectively are my decision variables.
Non-negativity restriction: For all linear programs, the decision variables should always
take non-negative values. This means the values for decision variables should be greater
than or equal to 0.
The process to formulate a Linear Programming problem
Let us look at the steps of defining a Linear Programming problem generically:
Identify the decision variables
Write the objective function
Mention the constraints
Explicitly state the non-negativity restriction
For a problem to be a linear programming problem, the decision variables, objective
function and constraints all have to be linear functions.
If all the three conditions are satisfied, it is called a Linear Programming Problem.
8
Advanced Optimization Method
A graphical method involves formulating a set of linear inequalities subject to the
constraints. Then the inequalities are plotted on an X-Y plane. Once we have plotted all the
inequalities on a graph the intersecting region gives us a feasible region. The feasible region
explains what all values our model can take. And it also gives us the optimal solution.
Let’s understand this with the help of an example.
Example: A farmer has recently acquired a 110 hectares piece of land. He has decided to
grow Wheat and barley on that land. Due to the quality of the sun and the region’s excellent
climate, the entire production of Wheat and Barley can be sold. He wants to know how to
plant each variety in the 110 hectares, given the costs, net profits and labor requirements
according to the data shown below figure 2:
Wheat 100 50 10
The farmer has a budget of US$10,000 and availability of 1,200 man-days during the
planning horizon. Find the optimal solution and the optimal value.
Solution: To solve this problem, first we gonna formulate our linear program.
Formulation of Linear Problem
Step 1: Identify the decision variables
The total area for growing Wheat = X (in hectares)
The total area for growing Barley = Y (in hectares)
X and Y are my decision variables.
2. The next constraint is the upper cap on the availability of the total number of man-days
for the planning horizon. The total number of man-days available is 1200. As per the table,
we are given the man-days per hectare for Wheat and Barley.
10X + 30Y ≤ 1200
3. The third constraint is the total area present for plantation. The total available area is 110
hectares. So the equation becomes,
X + Y ≤ 110
10
Advanced Optimization Method
Figure 2. Graphical Method
Therefore from the output, we see that the organization should produce 88 units of toy A
and 20 units of toy B and the maximum profit for the organization will be Rs.2600.
5. Simplex Method
Simplex Method is one of the most powerful & popular methods for linear programming.
The simplex method is an iterative procedure for getting the most feasible solution. In this
method, we keep transforming the value of basic variables to get maximum value for the
objective function.
A linear programming function is in its standard form if it seeks to maximize the objective
function. subject to
constraints,
11
Advanced Optimization Method
where, and . After adding slack variables, the corresponding system
of constraint equation is,
. . . .
where,
The variables, ………………. are called slack variables. They are non-
negative numbers that are added to remove the inequalities from an equation.
The above explanation gives the theoretical explanation of the simplex method. Now, I am
gonna explain how to use the simplex method in real life using Excel.
Example: The advertising alternatives for a company include television, newspaper and
radio advertisements. The cost for each medium with its audience coverage is given below.
The local newspaper limits the number of advertisements from a single company to
ten. Moreover, in order to balance the advertising among the three types of media, no more
than half of the total number of advertisements should occur on the radio. And at least 10%
should occur on television. The weekly advertising budget is $18,200. How many
advertisements should be run in each of the three types of media to maximize the total
audience?
Solution: First I am going to formulate my problem for a clear understanding.
12
Advanced Optimization Method
Step 3: Write down the constraints
Now, I will mention each constraint one by one.
It is clearly given that we have a budget constraint. The total budget which can be allocated
is $18,200. And the individual costs per television, newspaper and radio advertisement is
$2000, $600 and $300 respectively. This can be represented by the equation,
The last constraint is the number of advertisements on the radio cannot be more than half of
the total number of advertisements. It can be represented as
Now, I have formulated my linear programming problem. We are using the simplex method
to solve this. I will take you through the simplex method one by one.
To reiterate all the constraints are as follows. I have simplified the last two equations to
bring them in standard form.
variables, , and .
So our equations are as follows:
On solving the objective function you will get the maximum weekly audience as 1,052,000.
You can follow the tutorial here to solve the equation. To solve a linear program in excel,
follow this tutorial.
13
Advanced Optimization Method
6. Applications of Linear Programming
Linear programming and Optimization are used in various industries. The manufacturing
and service industry uses linear programming on a regular basis. In this section, we are
going to look at the various applications of linear programming.
Manufacturing industries use linear programming for analyzing their supply chain
operations. Their motive is to maximize efficiency with minimum operation cost. As per the
recommendations from the linear programming model, the manufacturer can reconfigure
their storage layout, adjust their workforce and reduce the bottlenecks. Here is a small
Warehouse case study of Cequent a US-based company, watch this video for a more clear
understanding.
Linear programming is also used in organized retail for shelf space optimization. Since the
number of products in the market has increased in leaps and bounds, it is important to
understand what does the customer want. Optimization is aggressively used in stores like
Walmart, Hypercity, Reliance, Big Bazaar, etc. The products in the store are placed
strategically keeping in mind the customer shopping pattern. The objective is to make it
easy for a customer to locate & select the right products. This is subject to constraints like
limited shelf space, a variety of products, etc.
Optimization is also used for optimizing Delivery Routes. This is an extension of the
popular traveling salesman problem. The service industry uses optimization for finding the
best route for multiple salesmen traveling to multiple cities. With the help of clustering and
greedy algorithm, the delivery routes are decided by companies like FedEx, Amazon, etc.
The objective is to minimize the operation cost and time.
Optimizations are also used in Machine Learning. Supervised Learning works on the
fundamental of linear programming. A system is trained to fit on a mathematical model of a
function from the labeled input data that can predict values from an unknown test data.
Well, the applications of Linear programming don’t end here. There are many more
applications of linear programming in real-world like applied by Shareholders, Sports, Stock
Markets, etc. Go on and explore further.
Step 1. All the constraints should be converted to equations except for the non-
negativity restrictions which remain as inequalities (≥0).
Step 2. The right side element of each constraint should be made non-negative.
Step 3. All variables must have non-negative values.
Step 4. The objective function should be of maximization form.
4.1.2 Slack variables
If a constraint has less than or equal sign, then in order to make it on equality we have to
add something positive to the left hand side. The non-negative variable which is added to
the left hand side of the constraint to convert it into equation is called the slack variable. For
example, consider the constraints.
3X1 + 5X2 ≤ 2, 7X1 + 4X2 ≤ 5, X1, X2 ≥ 0
We add the slack variables S1 ≥ 0, S2 ≥ 0 on the left hand sides of above inequalities
respectively to obtain
3X1+5X2+S1 = 2
7X1+4X2+S2 = 5
X1, X2, S1, S2 ≥ 0
15
Advanced Optimization Method
If a constraint has greater than or equal to sign, then in order to make it an equality we have
to subtract something non-negative from its left hand side. The positive variable which is
subtracted from the left hand side of the constraint to convert it into equation is called the
surplus variable.
For example, consider the constraints.
3X1 + 5X2 ≥ 2, 2X1 + 4X2 ≥ 5, X1, X2 ≥0
We subtract the surplus variables S3 ≥0, S4 ≥ 0 on the left hand sides of above inequalities
respectively to obtain
3X1+5X2 -S3 = 2
2X1 + 4X2 -S4 = 5
X1, X2, X3, X4 ≥0
7.1.4 Solution to LPP
Any set X = {X1, X2, X3, Xn+m} of variables is called a solution to LP problem, if it satisfies
the set of constraints only.
7.1.5 Feasible solution (FS)
Any set X = {X1, X2, X3 ,Xn+m} of variables is called a feasible solution of L.P. problem, if
it satisfies the set of constraints as well as non-negativity restrictions.
7.1.6 Basic solution (BS)
For a system of m simultaneous linear equations in n variables (n>m), a solution obtained by
setting (n-m) variables equal to zero and solving for the remaining variables is called a basic
solution. Such m variables (of course, some of them may be zero) are called basic variables
and remaining (n-m) zero-valued variables are called non-basic variables.
4.4 Computational Aspect of Simplex Method for Maximization Problem
Step 1: Formulate the linear programming model. If we have n-decision variables X1,
X2, Xn and m constraints in the problem, then mathematical formulation of L P problem is
Maximize Z=C1X1+ C2X2++CnXn
Subject to the constraints:
a11 X1+ a12 X2+ +a1n Xn ≤ b1
a21 X1+ a22 X2 + a2n Xn ≤ b2
am1 X1+ am2 X2 + + amn Xn ≤ bm
X1, X2, Xn ≥ 0
Step 2: Express the mathematical model of LP problem in the standard form by adding
slack variables in the left hand side of the constraints and assign zero coefficient to these
variables in the objective function. Thus we can restate the problem in terms of equations as
follows:
Maximize
16
Advanced Optimization Method
Subject to the constraints:
a11 X1+ a12 X2+ + a1n Xn +Xn+1 = b1
a21 X1+ a22 X2++ a2n Xn +Xn+2 = b2
.
am1 X1+ am2X2++amn Xn +Xn+m = bm
Step 3: Design the initial feasible solution .An initial basic feasible solution is obtained by
setting X1=X2= =Xn =0. Thus, we get Xn+1=b1, Xn+2 =b2 ,, Xn+m =bm.
Step 4: Construct the starting (initial) simplex tableau. For computational efficiency and
simplicity, the initial basic feasible solution, the constraints of the standard LP problem as
well as the objective function can be displayed in a tabular form, called the Simplex
Tableau as shown below.
The interpretation of the data in the above Tableau is given as under. Other simplex tableau
will have similar interpretations.
(i) The first row, called the objective row of the simplex table indicates the values
of Cj (j subscript refer to the column number) which are the coefficients of the (m +
n) variables in the objective function. These coefficients are obtained directly from
the objective function and the value Cj would remain the same in the succeeding
tables. The second row of the table provides the major column headings for the table
17
Advanced Optimization Method
and these column headings remain unchanged in the succeeding tables of the
Simplex Method.
(ii) The first column labelled CB, also known as objective column, lists the coefficient of
the current basic variables in the objective function. The second
column labelled Basic variables points out the basic variables in the basis, and in the
initial simplex tableau these basic variables are the slack variables. The third
column labelled Solution values(= xB), indicates the resources or the solution values
of the basic variables.
(iii) The body matrix (under non-basic variables) in the initial simplex tableau consists of
the coefficients of the decision variables in the constraint set.
(iv)The identity matrix in the initial simplex tableau represents the coefficient of the
slack variables that have been added to the original inequalities to make them
equation. The matrix under non-basic variables in the simplex tableau is
called coefficient matrix. Each simplex tableau contains an identity matrix under the
basic variables.
(v) To find an entry in the Zj row under a column, we multiply the entries of that
column by the corresponding entries of CBcolumn and add the products, i.e., Z
= .
The Zj entry under the Solution columngives the current value of the objective function.
(vi) The final row labeled ∆j = Zj - Cj called the index (or net evaluation) row, is used to
determine whether or not the current solution is optimal. The calculation of Zj -
Cj row simply involves subtracting each Zj value from the corresponding Cj value
for that column, which is written at the top of that column. Each entry in the ∆j
row represents the net contribution (or net marginal improvement) to the objective
function that results by introducing one unit of each of the respective column
variables.
Step 5: Test the Solution for Optimality. Examine the index row of the above simplex
tableau. If all the elements in the index row are positive then the current solution is optimal.
If there exist some negative values, the current solution can further be improved by
removing one basic variable from the basis and replacing it by some non-basic one.
Step 6: Revision of the Current Simplex Tableau. At each iteration, the Simplex Method
moves from the current basic feasible solution to a better basic feasible solution. This
involves replacing one current basic variable (called the departing variable) by a new non-
basic variable (called the entering variable).
(a) Determine which variable to enter into the solution-mix net. One way of doing
this is by identifying the column (and hence the variable) with the most negative
number in the ∆j row of the previous table.
18
Advanced Optimization Method
(or solution values) column by the corresponding number in the pivot column
selected in (a), i.e., we compute the respective ratios b1/a1j, b2/a2j, bm/amj (only for
those aijs; i=1,2, m which are strictly positive). These quotients are written in the
last column labelled Minimum Ratio of the simplex tableau. The row
corresponding to smallest of these non-negative ratios is called the pivot (or key)
row and the corresponding basic variable will leave the basis. Let the minimum
of { b1/a1j, b2/a2j, bm/amj ; aij > 0} be bk/akj , then corresponding variables in the
pivot row sk will be termed as outgoing (or departing) variable in the next tableau
to be constructed just after we put an arrow → of type to right of kth row of the
simplex tableau 1.
(c) Identify the pivot number. The non-zero positive number that lies at the
intersection of the pivot column and pivot row of the given table is called the
pivot (or key) number. We place a circle around the number.
Step 7: Evaluate (update) the new solution. After identifying the entering and departing
variable, find the new basic feasible solution by constructing a new simplex tableau from
the current one by using the following steps:
(a) Compute new values for the pivot row by simply dividing every element of the
pivot row by the pivot number.
(b) New entries in the CB column and XB column are entered in the new table of the
current solution
(c) Compute new values for each of the remaining rows by using the following
formula
New row numbers=Number in old rows-{(corresponding number above or below
pivot number) x (corresponding number in the row replaced in (a))}
(d) Test for optimality. Compute the Zj and index rows as previously demonstrated
in the initial simplex tableau. If all numbers in the index row are either zero or
positive, an optimal solution has been made attained. i.e., there is no variable
which can be introduced in the solution to cause the objective function value to
increase.
4. Revise the solution. If any of the numbers in the index (∆j = Zj - Cj) row are negative,
repeat the entire steps 5 & 6 again until an optimal solution has been obtained.
Example 1
A firm produces three products A, B, and C each of which passes through three different
departments fabrication, finishing, packaging. Each unit of product A requires 3, 4 and 2
hours respectively, B requires 5, 4 and 4 hours respectively and C requires 2, 4 and 5 hours
respectively in 3 departments respectively. Every day 60 hours are available in fabrication
department, 72 hours in finishing and 100 hours in packaging department. If unit
19
Advanced Optimization Method
contribution of unit A is Rs. 5, Rs. 10 for B and Rs. 3 for C. Then determine number of units
of each product so that total contribution to cost is maximized and also determine if any
capacity would remain unutilized.
Solution:
Step 1: Formulate this as LPP. Let X1, X2 and X3 be the number of units produced of the
products A, B and C respectively.
Step 3: Find the initial feasible solution .An initial basic feasible solution is obtained by
setting X1 =0, X2 = 0 and X3 = 0. Thus, we get S1 = 60, S2 = 72 and S3 =100.
Step 4: Construct the starting (initial) simplex tableau.
Cj 5 10 8 0 0 0
→
B.V. CB XB X1 X2 X3 S1 S2 S3 Minimum
Ratio
R1 S1 0 60 3 (5) 2 1 0 0 60/5=12
R2 S2 0 72 4 4 4 0 1 0 72/4=18
R3 S3 0 100 2 4 5 0 0 1 100/4=25
Zj 0 0 0 0 0 0
−5 −10 −8 0 0 0
Step 5: The most negative value of ∆j is −10 hence X2 is the incoming variable (↑) and the
least positive minimum ratio is 12 hence S1 is the outgoing variable (⟶). The element under
column X2 and row R1 is the key element i.e. 5 so divide each element of row R1 by 5
(i.e. ). Subtract appropriate multiples of this new row from the remaining rows, so
20
Advanced Optimization Method
as to obtain zeros in the remaining positions. Performing the row
operations
we get the second Simplex tableau as
Cj → 5 10 8 0 0 0
B.V. CB XB X1 X2 X3 S1 S2 S3 M.R.
X2 10 12 3/5 1 2/5 1/5 0 0 12/2/5=30
S2 0 24 8/5 0 12/5 −4/5 1 0 24/12/5=10
S3 0 52 -2/5 0 17/5 −4/5 0 1 52/17/5=15.294
Zj 6 10 4 2 0 0
1 0 −4 2 0 0
Step 6: The most negative value of ∆j is -4 hence X3 is the incoming variable (↑) and the
least positive minimum ratio is 10 hence S2 is the outgoing variable (). The element under
column X2 and row
R1 is the key element i.e. 5 so divide each element of row R2b by 12/5 (i.e. Rc → Rb *
5/12). Subtract appropriate multiples of this new row from the remaining rows, so as to
obtain zeros in the remaining positions. Performing the
row operations .
We get the third Simplex tableau as
Cj → 5 10 8 0 0 0
B.V. CB XB X1 X2 X3 S1 S2 S3
X2 10 8 1/3 1 0 1/3 −1/6 0
X3 8 10 2/3 0 1 −1/3 5/12 0
S3 0 18 −8/3 0 0 1/3 −17/12 1
Zj 26/3 10 8 2/3 5/3 0
j 11/3 0 0 2/3 5/3 0
It is apparent from this table that all ∆j =Zj − Cj are positive and therefore an optimum
solutions is reached . So X1 = 0, X2 = 8, X3 = 10
21
Advanced Optimization Method
UNIT – II – Advanced Optimization Methods – SPRA5301
1
Advanced Optimization Method
1. Direct Search
A direct search method is a method which relies only on evaluating f (x) on a sequence x1, x2, · · ·
and comparing values in order to calculate a minimizer of f(x) . Direct methods are usually applied
in the following circumstance
The first two methods deal with a function of a single variable, the rest four deal with a function of
several variables.
2. Golden section
The first algorithm that I learned for root-finding in my undergraduate numerical analysis class
(MACM 316 at Simon Fraser University) was the bisection method. It’s very intuitive and easy to
implement in any programming language (I was using MATLAB at the time). The bisection
method can be easily adapted for optimizing 1-dimensional functions with a slight but intuitive
modification. As there are numerous books and web sites on the bisection method, I will not dwell
on it in my blog post.
Instead, I will explain a clever and elegant way to modify the bisection method with the golden
ratio that results in faster computation; I learned this method while reading “A First Course in
Statistical Programming with R” by John Braun and Duncan Murdoch. Using a script in R to
implement this special algorithm, I will illustrate how to minimize a non-differentiable function with
the golden section search method. In a later post (for the sake of brevity), I will use the same
method to show that the minimizer of the sum of the absolute deviations from a univariate data set is
the median. The R functions and script for doing everything are in another post as shown in figure
1.
2
Advanced Optimization Method
Figure 1. Search Space
The Fibonacci spiral approximates the golden spiral, a logarithmic spiral whose growth factor is
the golden ratio. Source: Dicklyon , Minimization with the Bisection Method
Assume that a single-variable continuous function has a unique minimum (and, thus, a unique
minimizer) in a closed interval [a, b]. If this function is differentiable in [a, b], then calculus can be
used easily to find the minimizer. However, if the function has a cusp or a kink, then it’s not
differentiable at that point, and numerical methods are needed instead. For example, consider the
following function as shown in figure 2.
3
Advanced Optimization Method
Figure 2. Golden Section Search Space
Within the interval [1, 3], this function has a unique minimizer near x = 1.5, but it also has a cusp at
x = 2. Thus, it is not differentiable at x = 2. A simple, stable, but slow numerical method to find the
minimizer is the bisection method.
To use the bisection method to find the minimizer, we can
1) Pick 2 points, and , with and both points in [a, b].
2) Find and .
3) a) If , then the minimizer must be to the right of , because this unique minimizer
must be less than , and the function is decreasing from to .
b) Otherwise (i.e. ), the minimizer must be to the left of with logically analogous
reasoning.
4) i) If 3a is true, then the minimizer must be in the interval ; repeat the algorithm from Step 1
until the interval length is smaller than some pre-set tolerance.
ii) If 3b is true, then the minimizer must be in the interval ; repeat the algorithm from Step
1 until the interval length is smaller than some pre-set tolerance.
To summarize this algorithm, every iteration beings with
4
Advanced Optimization Method
– the boundaries of the interval ,
– 2 test points for the argument, and .
The function is evaluated at and , and, based on whether or ,a
new set of boundaries and test points are set for the next iteration.
This number has many special properties that you can easily find on the web. The one that will save
computation time is
Let the lower and upper bounds of the interval of interest be and , respectively. Now, let’s set the
test points as
The advantage of setting the test points as these above values comes when one of the new test points
is updated. Suppose that
.
Then the minimizer must be to the right of . Thus, becomes the new lower bound, which I will
denote as . The beauty of the golden ratio comes in calculating the new lower test point, which I
will denote as .
Recall that
Substituting the right-hand side of this above equation for in the calculation of , some very
simple algebraic manipulation yields
Now, taking advantage of that special property of the golden ratio, , some more
simple algebraic manipulation yields
5
Advanced Optimization Method
Thus, the new lower test point is just the old upper test point. This saves computation time, because
we don’t need to compute a new lower test point; we can just recycle the value that we knew from
the old upper test point!
Similar logic for the case will show that the new upper test point is simply the old
lower test point. Notationally, if
,
then
.
Using the Script to Numerically Minimize a Function
In another post, you can find the function for implementing this special golden search method and
the script for minimizing the above function with the cusp. This script is basically the same as the
one written by Braun and Murdoch on Pages 134-135 in their book. My script used slightly more
self-evident variable names and included debugging statements to help me to identify why my
function was not working properly when I first wrote it. Debugging statements are statements that
show the values of key variables being computed in the flow of a script (and often
within loops or branches) to identify the point in the flow at which the script stopped working
properly. My mathematical and statistical programming skills became much better when I began
using debugging statements. They are good for debugging, but should generally be commented out
when the script is corrected and working properly. Since the problems that I am solving are quite
simple, I kept them in the script and will show part of the output to illustrate why they are useful.
Recall the function with a cusp that I plotted above.
Using my function, the initial boundaries 1 and 3, and an absolute tolerance of , here is what
my function, golden.section.search(), returned in the first iteration. > golden.section.search(f, 1, 3,
1e-5)
Iteration # 1
f1 = 0.8196601
f2 = 1.763932
f2 > f1
New Upper Bound = 2.236068
New Lower Bound = 1
New Upper Test Point = 1.763932
New Lower Test Point = 1.472136
This function used 26 iterations to get the final answer.
6
Advanced Optimization Method
Iteration # 26
f1 = 0.75
f2 = 0.75
f2 > f1
New Upper Bound = 1.500003
New Lower Bound = 1.499995
New Upper Test Point = 1.5
New Lower Test Point = 1.499998
The Fibonacci sequence has the property that a number is the sum of its two predecessors. Therefore
the sequence can be computed by repeated addition. The ratio of two consecutive numbers
approaches the Golden ratio, 1.618... Binary search works by dividing the seek area in equal parts
(1:1). Fibonacci search can divide it into parts approaching 1:1.618 while using the simpler
operations.
If the elements being searched have non-uniform access memory storage (i. e., the time needed to
access a storage location varies depending on the location accessed), the Fibonacci search may have
the advantage over binary search in slightly reducing the average time needed to access a storage
location. If the machine executing the search has a direct mapped CPU cache, binary search may
lead to more cache misses because the elements that are accessed often tend to gather in only a few
7
Advanced Optimization Method
cache lines; this is mitigated by splitting the array in parts that do not tend to be powers of two. If
the data is stored on a magnetic tape where seek time depends on the current head position, a
tradeoff between longer seek time and more comparisons may lead to a search algorithm that is
skewed similarly to Fibonacci search.
Fibonacci search is derived from Golden section search, an algorithm by Jack Kiefer (1953) to
search for the maximum or minimum of a uni modal function in an interval.
4. Algorithm
Let k be defined as an element in F, the array of Fibonacci numbers. n = Fm is the array size. If n is
not a Fibonacci number, let Fm be the smallest number in F that is greater than n.
The array of Fibonacci numbers is defined where Fk+2 = Fk+1 + Fk, when k ≥ 0, F1 = 1,
and F0 = 0.
To test whether an item is in the list of ordered numbers, follow these steps:
Set k = m.
If k = 0, stop. There is no match; the item is not in the array.
Compare the item against element in Fk−1.
Step 1. [Initialize]
i ← Fk, p ← Fk-1, q ← Fk-2 (throughout the algorithm, p and q will be consecutive Fibonacci
numbers)
Step 2. [Compare]
If K < Ki, go to Step 3; if K > Ki go to Step 4; and if K = Ki, the algorithm terminates successfully.
Step 3. [Decrease i]
8
Advanced Optimization Method
If q=0, the algorithm terminates unsuccessfully. Otherwise set (i, p, q) ← (i - q, q, p - q) (which
moves p and q one position back in the Fibonacci sequence); then return to Step 2
Step 4. [Increase i]
If p=1, the algorithm terminates unsuccessfully. Otherwise set (i,p,q) ← (i + q, p - q, 2q - p) (which
moves p and q two positions back in the Fibonacci sequence); and return to Step 2
The two variants of the algorithm presented above always divide the current interval into a larger
and a smaller subinterval. The original algorithm, however, would divide the new interval into a
smaller and a larger subinterval in Step 4. This has the advantage that the new i is closer to the
old i and is more suitable for accelerating search
5. Bi Section Method
The method is also called the interval halving method, the binary search method or the dichotomy
method. This method is used to find root of an equation in a given interval that is value of ‘x’ for
which f(x) = 0.
The method is based on The Intermediate Value Theorem which states that if f(x) is a continuous
function and there are two real numbers a and b such that f(a)*f(b) 0 and f(b) < 0), then it is
guaranteed that it has at least one root between them.
Steps:
Find middle point c= (a + b)/2 .
If f(c) == 0, then c is the root of the solution.
Else f(c) != 0
If value f(a)*f(c) < 0 then root lies between a and c. So we recur for a and c
Else If f(b)*f(c) < 0 then root lies between b and c. So we recur b and c.
Else given function doesn’t follow one of assumptions.
Since root may be a floating point number, we repeat above steps while difference between a and b
is less than a value (A very small value) as shown in figure 3.
9
Advanced Optimization Method
Figure 3. Bi Section
In mathematics, the bisection method is a root-finding method that applies to any continuous
functions for which one knows two values with opposite signs. The method consists of
repeatedly bisecting the interval defined by these values and then selecting the subinterval in which
the function changes sign, and therefore must contain a root. It is a very simple and robust method, but
it is also relatively slow. Because of this, it is often used to obtain a rough approximation to a solution
which is then used as a starting point for more rapidly converging methods. The method is also called
the interval halving method, the binary search method, or the dichotomy method. For polynomials,
more elaborated methods exist for testing the existence of a root in an interval (Descartes' rule of
signs, Sturm's theorem, Budan's theorem. They allow extending bisection method into efficient
algorithms for finding all real roots of a polynomial; see Real-root isolation. The method is applicable
for numerically solving the equation f(x) = 0 for the real variable x, where f is a continuous
function defined on an interval [a, b] and where f(a) and f(b) have opposite signs. In this
case a and b are said to bracket a root since, by the intermediate value theorem, the continuous
function f must have at least one root in the interval (a, b). At each step the method divides the interval
in two by computing the midpoint c = (a+b) / 2 of the interval and the value of the function f(c) at that
point. Unless c is itself a root (which is very unlikely, but possible) there are now only two
possibilities: either f(a) and f(c) have opposite signs and bracket a root, or f(c) and f(b) have opposite
signs and bracket a root. The method selects the subinterval that is guaranteed to be a bracket as the
new interval to be used in the next step. In this way an interval that contains a zero of is reduced in
width by 50% at each step. The process is continued until the interval is sufficiently small. Explicitly,
if f(a) and f(c) have opposite signs, then the method sets c as the new value for b, and if f(b) and f(c)
10
Advanced Optimization Method
have opposite signs then the method sets c as the new a. (If f(c)=0 then c may be taken as the solution
and the process stops.) In both cases, the new f(a) and f(b) have opposite signs, so the method is
applicable to this smaller interval.
6. Exhaustive Method
For discrete problems in which no efficient solution method is known, it might be necessary to test
each possibility sequentially in order to determine if it is the solution. Such exhaustive examination of
all possibilities is known as exhaustive search, direct search, or the "brute force" method. Unless it
turns out that NP-problems are equivalent to P-problems, which seems unlikely but has not yet been
proved, NP-problems can only be solved by exhaustive search in the worst case.
11
Advanced Optimization Method
UNIT – III – Advanced Optimization Methods – SPRA5301
1. Decision Tree Analysis
A Decision Tree Analysis is a graphic representation of various alternative solutions that are
available to solve a problem. The manner of illustrating often proves to be decisive when
making a choice. A Decision Tree Analysis is created by answering a number of questions
that are continued after each affirmative or negative answer until a final choice can be made.
A Decision Tree Analysis is a scientific model and is often used in the decision making
process of organizations. When making a decision, the management already envisages
alternative ideas and solutions. By using a decision tree, the alternative solutions and
possible choices are illustrated graphically as a result of which it becomes easier to make a
well-informed choice. This graphic representation is characterized by a tree-like structure in
which the problems in decision making can be seen in the form of a flowchart, each with
branches for alternative choices.
The Decision Tree Analysis makes good use of the ‘what if’ thought. There are several
alternatives that consider both the possible risks and benefits that are brought about by
certain choices. The possible alternatives are also made clearly visible and therefore the
decision tree provides clarity with respect to the consequences of any decisions that will be
made.
Representation
There are several ways in which a decision tree can be represented. This Analysis is
commonly represented by lines, squares and circles. The squares represent decisions, the
lines represent consequences and the circles represent uncertain outcomes. By keeping the
lines as far apart as possible, there will be plenty of space to add new considerations and
ideas.
The representation of the decision tree can be created in four steps:
Describe the decision that needs to be made in the square.
Draw various lines from the square and write possible solutions on each of the lines.
Put the outcome of the solution at the end of the line. Uncertain or unclear decisions are put
in a circle. When a solution leads to a new decision, the latter can be put in a new square.
Each of the squares and circles are reviewed critically so that a final choice can be made as
shown in figure 1 and 2.
2
Advanced Optimization Method
Figure 1. Decision Tree Model 1
3
Advanced Optimization Method
The above example illustrates that, in all likelihood, the company will opt for 1-2-2, because
the forecast of this decision is that profits will increase by 12%. This Analysis is particularly
useful in situations in which it is considered desirable to develop various alternatives of
decisions in a structured manner as this will present a clear substantiation. This method is
increasingly used by medical practitioners and technicians as it enables them to make a
diagnosis or determine car problems.
3. Utility Theory
Utility theory bases its beliefs upon individuals’ preferences. It is a theory postulated in
economics to explain behavior of individuals based on the premise people can consistently
rank order their choices depending upon their preferences. Each individual will show
different preferences, which appear to be hard-wired within each individual. We can thus
state that individuals’ preferences are intrinsic. Any theory, which proposes to capture
preferences, is, by necessity, abstraction based on certain assumptions. Utility theory is
a positive theory that seeks to explain the individuals’ observed behavior and choices. The
distinction between normative and positive aspects of a theory is very important in the
discipline of economics. Some people argue that economic theories should be normative,
which means they should be prescriptive and tell people what to do. Others argue, often
successfully, that economic theories are designed to be explanations of observed behavior of
agents in the market, hence positive in that sense. This contrasts with a normative theory,
one that dictates that people should behave in the manner prescribed by it. Instead, it is only
since the theory itself is positive, after observing the choices that individuals make, we can
draw inferences about their preferences. When we place certain restrictions on those
preferences, we can represent them analytically using a utility function—a mathematical
formulation that ranks the preferences of the individual in terms of satisfaction different
consumption bundles provide. Thus, under the assumptions of utility theory, we can assume
that people behaved as if they had a utility function and acted according to it. Therefore, the
fact that a person does not know his/her utility function, or even denies its existence, does
not contradict the theory. Economists have used experiments to decipher individuals’ utility
functions and the behavior that underlies individuals’ utility.
To begin, assume that an individual faces a set of consumption “bundles.” We assume that
individuals have clear preferences that enable them to “rank order” all bundles based on
desirability, that is, the level of satisfaction each bundle shall provide to each individual.
This rank ordering based on preferences tells us the theory itself has ordinal utility—it is
designed to study relative satisfaction levels. As we noted earlier, absolute satisfaction
depends upon conditions; thus, the theory by default cannot have cardinal utility, or utility
that can represent the absolute level of satisfaction. To make this theory concrete, imagine
that consumption bundles comprise food and clothing for a week in all different
4
Advanced Optimization Method
combinations, that is, food for half a week, clothing for half a week, and all other possible
combinations.
Rationality: This is the most important and controversial assumption that underlies all of
utility theory. Under the assumption of rationality, individuals’ preferences avoid any kind
of circularity; that is, if bundle A is preferred to B, and bundle B is preferred to C, then A is
5
Advanced Optimization Method
also preferred to C. Under no circumstances will the individual prefer C to A. You can
likely see why this assumption is controversial. It assumes that the innate preferences (rank
orderings of bundles of goods) are fixed, regardless of the context and time.
Whenever these four assumptions are satisfied, then the preferences of the individual can be
represented by a well-behaved utility function. The assumption of convexity of preferences
is not required for a utility function representation of an individual’s preferences to exist.
But it is necessary if we want that function to be well behaved. Note that the assumptions
lead to “a” function, not “the” function. Therefore, the way that individuals represent
preferences under a particular utility function may not be unique. Well-behaved utility
functions explain why any comparison of individual people’s utility functions may be a
futile exercise (and the notion of cardinal utility misleading). Nonetheless, utility functions
are valuable tools for representing the preferences of an individual, provided the four
assumptions stated above are satisfied. For the remainder of the chapter we will assume that
preferences of any individual can always be represented by a well-behaved utility function.
As we mentioned earlier, well-behaved utility depends upon the amount of wealth the
person owns. Utility theory rests upon the idea that people behave as if they make decisions
by assigning imaginary utility values to the original monetary values. The decision maker
sees different levels of monetary values, translates these values into different, hypothetical
terms (“utils”), processes the decision in utility terms (not in wealth terms), and translates
the result back to monetary terms. So while we observe inputs to and results of the decision
in monetary terms, the decision itself is made in utility terms. And given that utility denotes
levels of satisfaction, individuals behave as if they maximize the utility, not the level of
observed dollar amounts.
The analytic hierarchy process (AHP) is a structured technique for organizing and
analyzing complex decisions, based on mathematics and psychology. It was developed
by Thomas L. Saaty in the 1970s who partnered with Ernest Forman to develop Expert
Choice in 1983, and has been extensively studied and refined since then. It represents an
accurate approach for quantifying the weights of decision criteria. Individual experts’
experiences are utilized to estimate the relative magnitudes of factors through pair-wise
comparisons. Each of the respondents has to compare the relative importance between the
6
Advanced Optimization Method
two items under special designed questionnaire (note that while most of the surveys adopted
the five point likert scale, AHP's questionnaire is 9 to 1 to 9, see Li et al. (2019)
Now let me explain what paired comparison is. It is always easier to explain by an example.
Suppose we have two fruits Apple and Banana. I would like to ask you, which fruit you like
better than the other and how much you like it in comparison with the other. Let us make a
relative scale to measure how much you like the fruit on the left (Apple) compared to the
fruit on the right (Banana) as shown figure 3.
If you like the apple better than banana, you thick a mark between number 1 and 9 on left
side, while if you favor banana more than apple, then you mark on the right side.
For instance I strongly favor banana to apple then I give mark like this
7
Advanced Optimization Method
Now suppose you have three choices of fruits. Then the pair wise comparison goes as the
following
You may observe that the number of comparisons is a combination of the number of things
to be compared. Since we have 3 objects (Apple, Banana and Cheery), we have 3
comparisons. Table below shows the number of comparisons.
Number of comparisons
Number of things 1 2 3 4 5 6 7
number of comparisons 0 1 3 6 10 15 21
The scaling is not necessary 1 to 9 but for qualitative data such as preference, ranking and
subjective opinions, it is suggested to use scale 1 to 9.
In the next section you will learn how to analyze this paired comparisons
Making Comparison Matrix (How to make reciprocal matrix?)
By now you know how to make paired comparisons. In this section you will learn how to
make a reciprocal matrix from pair wise comparisons.
For example John has 3 kinds of fruits to be compared and he made subjective judgment on
which fruit he likes best, like the following
If the judgment value is on the left side of 1, we put the actual judgment value.
If the judgment value is on the right side of 1, we put the reciprocal value.
Comparing apple and banana, John slightly favor banana, thus we put in the row 1 column
2 of the matrix. Comparing Apple and Cherry, John strongly likes apple, thus we put actual
judgment 5 on the first row, last column of the matrix. Comparing banana and cherry,
banana is dominant. Thus we put his actual judgment on the second row, last column of the
matrix. Then based on his preference values above, we have a reciprocal matrix like this
To fill the lower triangular matrix, we use the reciprocal values of the upper diagonal. If
is the element of row column of the matrix, then the lower diagonal is filled using this
formula
Notice that all the element in the comparison matrix are positive, or .
Next section will discuss about how you will use this matrix.
6. Priority Vectors
Having a comparison matrix, now we would like to compute priority vector, which is the
normalized Eigen vector of the matrix. If you would like to know what the meaning of
Eigen vector and Eigen value is and how to compute them manually, go to my other tutorial
and then return back here. The method that I am going to explain in this section is only an
approximation of Eigen vector (and Eigen value) of a reciprocal matrix. This approximation
is actually worked well for small matrix and there is no guarantee that the rank will not
9
Advanced Optimization Method
reverse because of the approximation error. Nevertheless it is easy to compute because all
we need to do is just to normalize each column of the matrix. At the end I will show the
error of this approximation.
Suppose we have 3 by 3 reciprocal matrix from paired comparison
Then we divide each element of the matrix with the sum of its column, we have normalized
relative weight. The sum of each column is 1.
The normalized principal Eigen vector can be obtained by averaging across the rows
The normalized principal Eigen vector is also called priority vector . Since it is normalized,
the sum of all elements in priority vector is 1. The priority vector shows relative weights
among the things that we compare. In our example above, Apple is 28.28%, Banana is
64.34% and Cherry is 7.38%. John most preferable fruit is Banana, followed by Apple and
Cheery. In this case, we know more than their ranking. In fact, the relative weight is a ratio
scale that we can divide among them. For example, we can say that John likes banana 2.27
(=64.34/28.28) times more than apple and he also like banana so much 8.72 (=64.34/7.38)
times more than cheery.
Aside from the relative weight, we can also check the consistency of John's answer. To do
that, we need what is called Principal Eigen value. Principal Eigen value is obtained from
the summation of products between each element of Eigen vector and the sum of columns of
the reciprocal matrix.
10
Advanced Optimization Method
Computation and the meaning of consistency are explained in the next section.
As a note, I put the comparison matrix into Matlab to see how different is the result of
numerical computation of Eigen value and Eigen vector compared to the approximation
above.
The largest Eigen value is called the Principal Eigen value, that is which is
very close to our approximation (about 1% error). The principal Eigen vector
is the Eigen vector that corresponds to the highest Eigen value.
11
Advanced Optimization Method
Consistency Index and Consistency Ratio
What is the meaning that our opinion is consistent? How do we measure the consistency of
subjective judgment? At the end of this section will be able to answer those questions.
Let us look again on John's judgment that we discussed in the previous section. Is John
judgment consistent or not?
First he prefers Banana to Apple. Thus we say that for John, Banana has greater value than
Apple. We write it as .
Next, he prefers Apple to Cherry. For him, Apple has greater value than Cherry. We write it
as .
Since and , logically, we hope that or Banana must be preferable than
Cherry. This logic of preference is called transitive property. If John answers in the last
comparison is transitive (that he like Banana more than Cherry), then his judgment is
consistent. On the contrary, if John prefers Cherry to Banana then his answer is inconsistent.
Thus consistency is closely related to the transitive property.
A comparison matrix is said to be consistent if for all , and . However,
we shall not force the consistency. For example, has value and has value
, we shall not insist that must have value . This too much consistency is
undesirable because we are dealing with human judgment. To be called consistent , the rank
can be transitive but the values of judgment are not necessarily forced to multiplication
formula .
Prof. Saaty proved that for consistent reciprocal matrix, the largest Eigen value is equal to
the number of comparisons, or . Then he gave a measure of consistency, called
Consistency Index as deviation or degree of consistency using the following formula
12
Advanced Optimization Method
Thus in our previous example, we have and three comparisons, or , thus
the consistency index is
Knowing the Consistency Index, the next question is how do we use this index? Again, Prof.
Saaty proposed that we use this index by comparing it with the appropriate one. The
appropriate Consistency index is called Random Consistency Index ( ).
He randomly generated reciprocal matrix using scale , , …, , …, 8, 9 (similar to the
idea of Bootstrap) and get the random consistency index to see if it is about 10% or less.
The average random consistency index of sample size 500 matrices is shown in the table
below
Random Consistency Index ( )
n 1 2 3 4 5 6 7 8 9 10
RI 0 0 0.58 0.9 1.12 1.24 1.32 1.41 1.45 1.49
If the value of Consistency Ratio is smaller or equal to 10%, the inconsistency is acceptable.
If the Consistency Ratio is greater than 10%, we need to revise the subjective judgment.
For our previous example, we have and for is 0.58, then we have
7. Illustrative example
In this section, figure 4 shows an example of two levels AHP. The structure of hierarchy in
this example can be drawn as the following
13
Advanced Optimization Method
Figure 4 AHP Levels
Level 0 is the goal of the analysis. Level 1 is multi criteria that consist of several factors.
You can also add several other levels of sub criteria and sub-sub criteria but I did not use
that here. The last level (level 2 in figure above) is the alternative choices. You can see
again Table 1 for several examples of Goals, factors and alternative choices. The lines
between levels indicate relationship between factors, choices and goal. In level 1 you will
have one comparison matrix corresponds to pair-wise comparisons between 4 factors with
respect to the goal. Thus, the comparison matrix of level 1 has size of 4 by 4. Because each
choice is connected to each factor, and you have 3 choices and 4 factors, then in general you
will have 4 comparison matrices at level 2. Each of these matrices has size 3 by 3. However,
in this particular example, you will see that some weight of level 2 matrices are too small to
contribute to overall decision, thus we can ignore them.
Based on questionnaire survey or your own paired comparison, we make several
comparison matrices. Click here if you do not remember how to make a comparison matrix
from paired comparisons. Suppose we have comparison matrix at level 1 as table below.
The yellow color cells in upper triangular matrix indicate the parts that you can change in
the spreadsheet. The diagonal is always 1 and the lower triangular matrix is filled using
formula .
14
Advanced Optimization Method
The priority vector is obtained from normalized Eigen vector of the matrix. Click here if you
do not remember how to compute priority vector and largest Eigen value from a
comparison matrix. CI and CR are consistency Index and Consistency ratio respectively, as
I have explained in previous section. For your clarity, I include again here some part of the
computation:
We can do the same for paired comparison with respect to Factor C and D. However, the
weight of factor C and D are very small (look at Table 9 again, they are only about 9% and
5% respectively), therefore we can assume the effect of leaving them out from further
consideration is negligible. We ignore these two weights as set them as zero. So we do not
use the paired comparison matrix level 2 with respect to Factor C and D. In that case, the
weight of factor A and B in Table 9 must be adjusted so that the sum still 100%
15
Advanced Optimization Method
Adjusted weight for factor A =
Then we compute the overall composite weight of each alternative choice based on the
weight of level 1 and level 2. The overall weight is just normalization of linear combination
of multiplication between weight and priority vector.
(Acceptable)
Final Remark
By now you have learned several introductory methods on multi criteria decision making
(MCDM) from simple cross tabulation, using rank, and weighted score until AHP. Using
Analytic Hierarchy Process (AHP), you can convert ordinal scale to ratio scale and even
check its consistency.
16
Advanced Optimization Method
UNIT – IV – Advanced Optimization Methods – SPRA5301
We seek the numerical values , , , whose stock-goodness function does the best job
of evaluating stocks. Similar issues arise in tuning evaluation functions for game playing
programs such as chess.
Unconstrained optimization problems also arise in scientific computation. Physical systems
from protein structures to particles naturally seek to minimize their ``energy functions.''
Thus programs that attempt to simulate nature often define energy potential functions for the
possible configurations of objects and then take as the ultimate configuration the one that
minimizes this potential.
Is the function I am trying to optimize described by a formula or data? - If the function that
you seek to optimize is presented as an algebraic formula (such as the minimum
of ), the solution is to analytically take its derivative f'(n) and see for
which points p' we have f'(p') = 0. These points are either local maxima or minima, which
can be distinguished by taking a second derivative or just plugging back into f and seeing
what happens. Symbolic computation systems such as Mathematica and Maple are
fairly effective at computing such derivatives, although using computer algebra systems
effectively is somewhat of a black art. They are definitely worth a try, however, and you can
always use them to plot a picture of your function to get a better idea of what you are
dealing with.
How expensive is it to compute the function at a given point? - If the function f is not
presented as a formula, what to do depends upon what is given. Typically, we have a
program or subroutine that evaluates f at a given point, and so can request the value of any
given point on demand. By calling this function, we can poke around and try to guess the
maxima. Our freedom to search in such a situation depends upon how efficiently we can
evaluate f. If f is just a complicated formula, evaluation will be very fast. But suppose
that f represents the effect of the coefficients on the performance of the board
evaluation function in a computer chess program, such that is how much a pawn is
worth, is how much a bishop is worth, and so forth. To evaluate a set of coefficients as a
board evaluator, we must play a bunch of games with it or test it on a library of known
positions. Clearly, this is time-consuming, so we must be frugal in the number of
evaluations of f we use.
How many dimensions do we have? How many do we need? - The difficulty in finding a
global maximum increases rapidly with the number of dimensions (or parameters). For this
reason, it often pays to reduce the dimension by ignoring some of the parameters. This runs
counter to intuition, for the naive programmer is likely to incorporate as many variables as
possible into their evaluation function. It is just too hard to tweak such a complicated
function. Much better is to start with the 3 to 5 seemingly most important variables and do a
Advanced Optimization Methods 3
good job optimizing the coefficients for these. Figure 2 represents step by step procedure for
problem solving
How smooth is my function? The main difficulty of global optimization is getting trapped in
local optima. Consider the problem of finding the highest point in a mountain range. If
there is only one mountain and it is nicely shaped, we can find the top by just walking in
whatever direction is up. However, if there are many false summits or other mountains in
the area, it is difficult to convince ourselves whether we are really at the highest
point. Smoothness is the property that enables us to quickly find the local optimum from a
given point. We assume smoothness in seeking the peak of the mountain by walking up. If
the height at any given point was a completely random function, there would be no way we
could find the optimum height short of sampling every single point.
Efficient algorithms for unconstrained global optimization use derivatives and partial
derivatives to find local optima, to point out the
Today I will just solve few problems which I explained in the last class regarding the
maxima, minima and the saddle point identification of non-linear programming problem.
Now again we are concentrating on one dimensional unconstrained non-linear programming
problem. Let me take few examples on that.
(Credit to NPTEL)
This is one of example. The function has been given as . We need to find
out the minimum maximum and if there is any saddle point or not.
Now, if the function is giving to you, now the process is that now, one thing you
have to just see the pattern of the function. First of all, if you can find out the pattern of the
function, that would be nice. Now for finding out the pattern we have to judge whether the
function has in which part of the interval function has the convexity and which part of the
interval function has the concavity. Now once the function has been given us a convex
function in certain interval, here there is no restriction on the decision variable that is why
this is unrestricted in sign. x can take any value from minus infinity to plus infinity.
Now if you cons, if you just check in which interval function has the convexity. Then we
can declare that within that interval there must be certain minimum point. I explained you in
the last class regarding this. And if the function has the concave part in the some other
region then function must be having the maximum value within that. And if the function is
changing from convex to concavity; that means the pattern of the function is just changing
in certain point then that must be the saddle point.
Now I am not going into that geometry of the function of these let us try to solve this
function first. For doing that thing the necessary conditions suggest such we will find out the
first order derivative of the function . Then we will equate to 0 then we will get
the stationary points of these. If we considered the first order a
Now, if , then we must be getting there are three stationary points for this one is 0
another one is -3 and another one is 1. Now for checking the maximum and minimum of the
function we will go for the second order derivative of . Then it
would be . Let us see what is happening in individual 3
stationary points. At , we get that as well as the that means
must be the minimum point all right.
Now, what we see here at -3, we are getting the maximum value and at 1, it is changing the
pattern and 0 there is a minimum value for the function.
(Credit to NPTEL)
Figure 2 d. Problem Solving Steps
Again at one it is changing is pattern from minus from the concavity to convexity. That is
the thing, now we are applying the classical optimization technique for maxima and
minimum. Now if we just look at the function we could see the minimize coming at 0, and
maximize coming at -1. And after that we are having the changing pattern at one. That is
why the first order derivative. We could see now oh no at -1 also just we could see that is it
is changing it is pattern, the stationary points are . Now is these
Now this is another problem just see. Now a farmer has statistical records showing that if 25
orange trees are planted each tree would yield 500 oranges, but there is information that, if
we plant only one tree extra, then 10 oranges per tree will reduce. Now the question is given
to you how many tree should be planted for maximum total yield. For normal case we are
getting 500 oranges from a tree. Now, but it the next information if we plant one tree extra
from each tree 10 oranges will be reduce; that means, we will get from each plant. Now the
question is that how many trees should be planted. So, that maximum yield will come.
Let me considered the variable x as number of trees the farmer is planting in excess of 25.
Because the initially the information has been given that in the normal case is planting 25
plants. That is why in x is we are considering x plants are being planted. If x is equal to 1
then how many how much yield he will get he will get because there
are 25 plants. Now, if there are x number of plants, then what should be the number of for
plants all together it would total number of plants. Now how much yield I will
get? If it is normal we will get I will get total yield .
Now I have to maximize this. This is my question, because I have to maximize the total
number of yield all right.
Now, if this is a problem for us then just look at the function this function is coming as
all right we have to maximize this function. Now let us apply the
classical optimization technique, what we are getting That is why
the stationary points are if I just equate to 0, gives me that let us find out
for this. That means always it is negative. That is why it is suggested
that you plant 12.5 numbers of trees for getting maximum yield.
Now, we are getting. That is why this process is not correct process to judge that we are
getting the result nice result for it. That is why that is the disadvantage of using the classical
optimization technique. Now that is why there is another series of techniques available that
is called numerical optimization technique. There is range of variations of the
methodologies. Now today I will just discuss one simple methodology for solving through
the numerical optimization. And the advantage is that even the space is discrete, we cannot
use those methodologies and nicely we can get the solution of it the very well-known
method you must have been learned it that is the interval having process.
(Credit to NPTEL)
Figure 3 b. Problem Solving Steps
(Credit to NPTEL)
Figure 3 c. Problem Solving Steps
Now, this is not accepted. That is why we are going to the next level of technique
optimization technique through which we will solve the problem this is the very well-
known methodology. All of you must have been done in numerical analysis interval halving
process. There is another name to it that is called the by section method. We will solve the
same problem and we will see how nicely we will get the solution for the same problem
where the solution will be we can consider as integer all right.
Now let me tell you the steps of interval halving method first. Now the first step it saying
that whenever interval halving method is tells you that, as we could see from the function
(Credit to NPTEL)
Figure 3 d. Problem Solving Steps
If I just draw the intervals, that interval will be . Now we could see here that
we are getting the point that is the middle point of middle point of and the
middle point of . Just look at the methodology we have written here
and we are looking for the minimum value of the function certainly we are function is uni
modal. That is why there is only one more there only one
Let us consider the other case that ; that means, in the left side function is increasing
and then in the right side function is decreasing, but if I just minimize the function certainly, in
the left side function minimum cannot lie all right. That is why will discard that part that is why
you see we have written the methodology delete and you considered new interval of
uncertainty as .
Now, the other case, if we con if we see that ; that means, we are getting
the around we are having the minimum and one side this having the higher value one side
higher value; that means, the minimum cannot lie bellow to and the beyond to . That is why
will get the new interval of uncertainty as . After what we will do once we will get the
new interval of uncertainty, that one will consider at the initial interval of uncertainty we repeat
the process. We will again find out 4 parts of the interval, we will find out new . we will repeat
the process again and again and how long we will do in repeat the process as long as the
interval of uncertainty is very small.
Or we have certain target that we want to get accuracy of 10%, accuracy of 5%. Then we can
have the better pross this process can give us that kind of accuracy. In that way will we will get
the solution. Now this is one of the method numerical optimization. What even numerical
optimization method we will get we will see everywhere the challenge lies how to select .
Because these are only the guiding points for selecting the minimum value of the function. That
is why from the next class I will give you few more nice methodologies. And after learning all
the methodologies together we will just compare all the methodologies, that is all for today.
14
Advanced Optimization Method
UNIT – V – Advanced Optimization Methods – SPRA5301
1
Advanced Optimization Method
1. Swarm Intelligence
Swarm Intelligence systems employ large numbers on using fuzzy systems, Tabu Search and
Scatter Search, Ant colony algorithm, Multi Response optimization - Gray Relational Analysis.
2. Heuristic Algorithm
A heuristic algorithm is one that is designed to solve a problem in a faster and more efficient
fashion than traditional methods by sacrificing optimality, accuracy, precision, or completeness
for speed. Heuristic algorithms often times used to solve NP-complete problems, a class of
decision problems. In these problems, there is no known efficient way to find a solution quickly
and accurately although solutions can be verified when given. Heuristics can produce a solution
individually or be used to provide a good baseline and are supplemented with optimization
algorithms. Heuristic algorithms are most often employed of agents interacting locally with one
another and the environment. Swarm intelligence refers to the collective behavior of
decentralized systems and can be used to describe both natural and artificial systems. Specific
algorithms for this class of system include the particle swarm optimization algorithm, the ant
colony optimization algorithm, and artificial bee colony algorithm. Each of the previous
algorithms was inspired by the natural, self-organized behavior of animals.
3. Tabu Search
This heuristic technique uses dynamically generated tabus to guide the solution search to
optimum solutions. It examines potential solutions to a problem and checks immediate local
neighbors to find an improved solution. The search creates a set of rules dynamically and
prevents the system from searching around the same area redundantly by marking rule violating
solutions as “tabu” or forbidden. This method solves the problem of local search methods when
the search is stuck in suboptimal regions or in areas when there are multiple equally fit solutions.
4. Simulated Annealing
Borrowing the metallurgical term, this technique converges to a solution in the same way metals
are brought to minimum energy configurations by increasing grain size. Simulated annealing is
used in global optimization and can give a reasonable approximation of a global optimum for a
function with a large search space. At each iteration, it probabilistically decides between staying
at its current state or moving to another while ultimately leading the system to the lowest energy
state.2
5. Genetic Algorithms
Genetic algorithms are a subset of a larger class of evolutionary algorithms that describe a set of
techniques inspired by natural selection such as inheritance, mutation, and crossover. Genetic
algorithms require both a genetic representation of the solution domain and a fitness function to
evaluate the solution domain. The technique generates a population of candidate solutions and
2
Advanced Optimization Method
uses the fitness function to select the optimal solution by iterating with each generation. The
algorithm terminates when the satisfactory fitness level has been reached for the population or
the maximum generations have been reached.
3
Advanced Optimization Method
7. Function Approximation
When input data originates from a function with real-valued outputs over a continuous range, the
neural network is said to perform a traditional function approximation. An example of an
approximation problem could be one where the temperature of an object is to be determined
from secondary measurements, such as emission of radiation. Another more trivial example
could be to estimate shoe size based on a person's height. These two examples involve models
with one input and one output. A more advanced model of the second example might use gender
as a second input in order to derive a more accurate estimate of the shoe size.
Pure functions may be approximated with the following two network types:
4
Advanced Optimization Method
where the weights { , } are symbolized with the arrows feeding into the neuron.
The network output is formed by another weighted summation of the outputs of the neurons in
the hidden layer. This summation on the output is called the output layer. In Figure there is only
one output in the output layer since it is a single-output problem. Generally, the number of output
neurons equals the number of outputs of the approximation problem.
The neurons in the hidden layer of the network in Figure are similar in structure to those of the
perceptron, with the exception that their activation functions can be any differential function.
The output of this network is given by
where n is the number of inputs and nh is the number of neurons in the hidden layer. The
variables { , , , } are the parameters of the network model that are represented
collectively by the parameter vector . In general, the neural network model will be represented
by the compact notation g( ,x) whenever the exact structure of the neural network is not
necessary in the context of a discussion.
Some small function approximation examples using an FF network can be found in Section 5.2.
Note that the size of the input and output layers are defined by the number of inputs and outputs
of the network and, therefore, only the number of hidden neurons has to be specified when the
network is defined. The network in Figure 2.5 is sometimes referred to as a three-layer network,
counting input, hidden, and output layers. However, since no processing takes place in the input
layer, it is also sometimes called a two-layer network. To avoid confusion this network is called
a one-hidden-layer FF network throughout this documentation.
In training the network, its parameters are adjusted incrementally until the training data satisfy
the desired mapping as well as possible; that is, until ( ) matches the desired output y as closely
as possible up to a maximum number of iterations. The nonlinear activation function in the
neuron is usually chosen to be a smooth step function. The default is the standard sigmoid as
shown in figure 2
In[1]:=
5
Advanced Optimization Method
Figure 2. Performance Evaluation
The FF network in Figure is just one possible architecture of an FF network. You can modify the
architecture in various ways by changing the options.
9. Multilayer Networks
The package supports FF neural networks with any number of hidden layers and any number of
neurons (hidden neurons) in each layer. In Figure a multi-output FF network with two hidden
layers is shown in figure.
6
Advanced Optimization Method
10.Radial Basis Function Networks
After the FF networks, the radial basis function (RBF) network comprises one of the most used
network models. Figure illustrates an RBF network with inputs ,..., and output . The arrows
in the figure symbolize parameters in the network. The RBF network consists of one hidden
layer of basis functions, or neurons. At the input of each neuron, the distance between the neuron
center and the input vector is calculated. The output of the neuron is then formed by applying the
basis function to this distance. The RBF network output is formed by a weighted sum of the
neuron outputs and the unity bias shown in figure 4.
The RBF network in Figure is often complemented with a linear part. This corresponds to
additional direct connections from the inputs to the output neuron. Mathematically, the RBF
network, including a linear part, produces an output given by
where nb is the number of neurons, each containing a basis function. The parameters of the RBF
network consist of the positions of the basis functions , the inverse of the width of the basis
functions , the weights in output sum , and the parameters of the linear part ,..., . In most
cases of function approximation, it is advantageous to have the additional linear part but it can be
excluded by using the options.
The parameters are often lumped together in a common variable to make the notation compact.
Then you can use the generic description g( ,x) of the neural network model, where g is the
network function and x is the input to the network.
In training, the network the parameters are tuned so that the training data fits the network model
Eq. (2.10) as well as possible. In Eq. (2.10) the basis function is chosen to be the Gaussian bell
7
Advanced Optimization Method
function. Although this function is the most commonly used basis function, other basis functions
may be chosen. Also, RBF networks may be multi-output as illustrated in Figure 2.8.
FF networks and RBF networks can be used to solve a common set of problems. The built-in
commands provided by the package and the associated options are very similar. Problems where
these networks are useful include:
Function approximation
Classification
Modeling of dynamic systems and time series
Figure 6. System
A system with input signal u, disturbance signal e, and output signal y
Loosely speaking, a system is an object in which different kinds of signals interact and produce
an observable output signal. A system may be a real physical entity, such as an engine, or
entirely abstract, such as the stock market. There are three types of signals that characterize a
system, as indicated. The output signal y(t) of the system is an observable/measurable signal,
which you want to understand and describe. The input signal u(t) is an external measurable
8
Advanced Optimization Method
signal, which influences the system. The disturbance signal e(t) also influences the system but, in
contrast to the input signal, it is not measurable. All these signals are time dependent.
In a single-input, single-output (SISO) system, these signals are time-dependent scalars. In the
multi-input, multi-output (MIMO) systems, they are represented by time-dependent vectors.
When the input signal is absent, the system corresponds to a time-series prediction problem. This
system is then said to be noise driven, since the output signal is only influenced by the
disturbance e(t). The Neural Networks package supports identification of systems with any
number of input and output signals. A system may be modeled by a dynamic neural network that
consists of a combination of neural networks of FF or RBF types, and a specification of the input
vector to the network. Both of these two parts have to be specified by the user. The input vector,
or regressor vector, which it is often called in connection with dynamic systems, contains lagged
input and output values of the system specified by three indices: , and . For a SISO model
the input vector looks like this:
Index represents the number of lagged output values; it is often referred to as the order of the
model. Index is the input delay relative to the output. Index represents the number of lagged
input values. In a MIMO case each individual lagged signal value is a vector of appropriate
length. For example, a problem with three outputs and two inputs ={1,2,1}, ={2,1}, and
={1,0} gives the following regressor:
Models with a regressor like Eq. (2.0) are called ARX models, which stands for AutoRegressive
with eXtra input signal. When there is no input signal u(t), its lagged valued may be eliminated
from Eq. (2.0), reducing it to an AR model. Since the mapping g( , ) is based on neural
networks, the dynamic models are called neural ARX and neural AR models, or neural AR(X) as
9
Advanced Optimization Method
short form for both them. Figure 2.10 shows a neural ARX model, based on a one-hidden-layer
FF network.
A good model that explains the data well should yield small prediction errors. Therefore, a
measure of (t) may be used as a model-quality index.
10
Advanced Optimization Method
network, then it is associated with another pattern. If the network works properly, this associated
pattern is one of the class patterns. In some cases (when the different class patterns are
correlated), spurious minima can also appear. This means that some patterns are associated with
patterns that are not among the pattern vectors.
Hopfield networks are sometimes called associative networks since they associate a class pattern
to each input pattern.
The Neural Networks package supports two types of Hopfield networks, a continuous-time
version and a discrete-time version. Both network types have a matrix of weights W defined as
where D is the number of class patterns { , ..., }, vectors consisting of +/-1 elements, to be
stored in the network, and n is the number of components, the dimension, of the class pattern
vectors.
Discrete-time Hopfield networks have the following dynamics:
Eq. (2.26) is applied to one state, x(t), at a time. At each iteration the state to be updated is
chosen randomly. This asynchronous update process is necessary for the network to converge,
which means that x(t)=Sign[W x(t)].
A distorted pattern, x(0), is used as initial state for the Eq. (2.0), and the associated pattern is the
state toward which the difference equation converges. That is, starting with x(0) and then
iterating Eq. (2.0) gives the associated pattern when the equation converged.
For a discrete-time Hopfield network, the energy of a certain vector x is given by
It can be shown that, given an initial state vector x(0), x(t) in Eq. (2.26) will converge to a value
having minimum energy. Therefore, the minima of Eq. (2.27) constitute possible convergence
points of the Hopfield network and, ideally, these minima are identical to the class patterns
{ , ..., }. Hence, one can guarantee that the Hopfield network will converge to some
pattern, but one cannot guarantee that it will converge to the right pattern.
Note that the energy function can take negative values; this is, however, just a matter of scaling.
Adding a sufficiently large constant to the energy expression it can be made positive.
The continuous Hopfield network is described by the following differential equation
where x(t) is the state vector of the network, W represents the parametric weights, and is a
nonlinearity acting on the states x(t). The weights W are defined in Eq. (2.25). The differential
equation, Eq. (2.28), is solved using an Euler simulation.
11
Advanced Optimization Method
To define a continuous-time Hopfield network, you have to choose the nonlinear function .
There are two choices supported by the package, Saturated Linear and the default nonlinearity of
Tanh.
For a continuous-time Hopfield network, defined by the parameters given in Eq. (2.25), one can
define the energy of a particular state vector x as
As for the discrete-time network, it can be shown that given an initial state vector x(0) the state
vector x(t) in Eq. (2.28) converges to a local energy minimum. Hence, the minima of Eq. (2.29)
constitute the possible convergence points of the Hopfield network and ideally these minima are
identical to the class patterns { , ..., }. However, there is no guarantee that the minima will
coincide with this set of class patterns.
12
Advanced Optimization Method