Profile Guided Compiler Optimization
Introduction
A profile-guided optimizer uses a program's run-time profile in two ways:
1. To identify optimization opportunities that are frequently observed at run time but are not
detected by static analysis.
2. To carry out sophisticated cost-benefit analysis when applying transformations that improve the
performance of the frequently executed parts of the program.
Classical optimization methods based purely on static analysis have serious drawbacks. First, they
fail to exploit many optimization opportunities (which can be uncovered by combining static
analysis with run-time profiles). Second, they may have an adverse impact on performance (which can
be avoided by using a detailed performance model during optimization).
Different types of optimizations need different kinds of profile data, such as control flow, value, and address
profiles. Given sufficient profile data, the following categories of optimization opportunities can
be exploited:
1. Optimization opportunities that static analysis can uncover but whose exploitation involves
performance trade-offs. Here the focus is on the transformation itself; whether to apply it
depends on a cost-benefit analysis driven by the profile.
2. Optimization opportunities that cannot be uncovered by static analysis but are frequently
observed to exist during program execution. Static analysis can relate the values of variables but
cannot identify the specific values involved. For example, for i = i + 1, static analysis tells us that
the statement increments the value of i by one, but the exact value of i depends on the program
input and on how many times the statement has been executed.
Consider the example below:
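A rough C sketch of the kind of code involved; the control structure, the variable names, and the predicate p below are illustrative assumptions rather than a reproduction of the original figure:

/* (a) Original: x*y is evaluated in two places; when the branch is taken,
   the second evaluation recomputes a value that is already known. */
int original(int p, int x, int y) {
    int z = 0;
    if (p)
        z = x * y;              /* first evaluation */
    int w = x * y;              /* second, possibly redundant evaluation */
    return z + w;
}

/* (b) Redundancy removed by introducing a temporary t. */
int with_temp(int p, int x, int y) {
    int z = 0;
    int t = x * y;              /* single evaluation */
    if (p)
        z = t;
    int w = t;
    return z + w;
}

/* (c) Value specialization: the profile says y == 1 almost always, so the
   multiply is replaced by a copy, guarded by an extra run-time test. */
int specialized(int p, int x, int y) {
    int z = 0;
    if (p)
        z = (y == 1) ? x : x * y;
    int w = (y == 1) ? x : x * y;
    return z + w;
}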
In the original code (a), x*y is computed twice, which is redundant; this can be optimized by
introducing a temporary variable t, as in (b). However, if the first branch almost always evaluates to
false, the optimization is of little use: the transformation is worthwhile only if line 4 is executed less
frequently than line 7. Now suppose the profile shows that y = 1 very frequently at line 4, so we have
identified a value-specialization opportunity and optimize the code to (c). In that case, the extra
instruction introduced to check the value of y must be taken into account: the transformation pays off
only if the overall benefit of eliminating the multiply exceeds the overall cost of executing the extra
instructions (the IA-64 architecture enables an efficient implementation of the optimization in (c)).
Thus, even a simple cost-benefit analysis should be carried out to determine whether or not a
transformation should be applied.
The profile-guided compilation process proceeds in two steps. First, an instrumented version of the
program is run on representative inputs to collect profile data. This profile data is then used by the
compiler to recompile the program.
Types of Profile Information
Control Flow Profiles
This profile captures a trace of the execution path taken by the program, referred to as a control
flow trace (CFT). From the CFT we can compute the execution frequency of any given
program subpath. However, CFTs are very large, so compressed or approximate forms are used in
practice. Several approximations, differing in their level of precision, are used:
1. Node profiles provide the execution frequencies of the basic blocks in the CFG. For decisions
such as code placement, or cases like the redundancy-removal example discussed earlier, such a profile is sufficient.
2. Edge profiles provide the execution frequencies of each edge in the CFG. Edge profiles are
superior to node profiles because a node profile can always be computed from an edge profile,
but not vice versa.
3. Two-edge profiles provide the execution frequencies of each pair of consecutive edges in
the CFG. They are superior to edge profiles because an edge profile can be computed from a two-
edge profile, but not vice versa; moreover, a two-edge profile captures the correlation between the
outcomes of consecutive conditional branches (see the sketch after this list).
4. Path profiles provide the execution frequencies of acyclic, intraprocedural subpaths in the CFG:
a recorded subpath is terminated when a loop back edge would be followed or when an exit node
of the procedure is reached.
The precision with which the execution frequency of an arbitrary program subpath can be estimated
varies among these profiles.
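As an illustration of the difference, consider the minimal C sketch below (not taken from the original text; the function and variable names are illustrative):

/* Edge profiles record how often each branch direction is taken, but lose the
   correlation between the two tests: they cannot tell whether the T/T and F/F
   paths dominate or the T/F and F/T paths do. A two-edge or path profile
   keeps separate counts for the four acyclic paths through this function. */
int correlated_branches(int a, int b) {
    int x = 0;
    if (a > 0)
        x += 1;
    if (b > 0)
        x += 2;
    return x;
}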
Value Profiles
This profile identifies the specific values encountered by the operands of an instruction, as well as the
frequencies with which those values are encountered. Example:
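A hypothetical example (the specific numbers below are invented purely for illustration):

/* Value profile for the operand y of the multiply below, as it might be
   reported by an instrumented run:
       y == 1 : 9,000 occurrences
       y == 8 :   700 occurrences
       other  :   300 occurrences
   Knowing this, the compiler can specialize the frequent cases, e.g. replace
   x*y by x when y == 1, or by x << 3 when y == 8, behind a run-time test. */
int scale(int x, int y) {
    return x * y;               /* profiled operand: y */
}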
With this information, the compiler can recognize operands that are almost always constant and
utilize this information to carry out value specialization optimizations like constant folding, strength
reduction, and motion of nearly invariant code out of loops.
Address Profiles
This profile is collected in the form of a stream of memory addresses referenced by the
program. Address profiles are usually used to apply data layout and placement transformations that
improve the performance of the memory hierarchy. Addresses can be collected at different
granularities.
The whole program stream (WPS) representation is analyzed to identify hot address streams.
Based on the WPS, the temporal relationship graph (TRG) has been proposed: the nodes of the graph
are the data items of interest, and weighted links are established between pairs of nodes that are
referenced close together in time. The link weights accumulated by the end of the program run can
then be used to place data so as to achieve good cache behavior.
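A minimal sketch of how such link weights might be accumulated; the window size, the item encoding, and the weighting scheme are assumptions of this sketch, not details from the text:

#include <stdio.h>

#define WINDOW 4                /* temporal window over the address stream */
#define NITEMS 8                /* data items identified by small integer ids */

static int weight[NITEMS][NITEMS];

/* Increment the link weight between every pair of distinct items that are
   referenced within WINDOW positions of each other in the stream. */
static void build_trg(const int *stream, int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n && j < i + WINDOW; j++)
            if (stream[i] != stream[j]) {
                weight[stream[i]][stream[j]]++;
                weight[stream[j]][stream[i]]++;
            }
}

int main(void) {
    int stream[] = {0, 3, 0, 3, 1, 5, 0, 3};    /* items 0 and 3 co-occur often */
    build_trg(stream, (int)(sizeof stream / sizeof stream[0]));
    /* A heavy weight[0][3] suggests placing items 0 and 3 together in memory. */
    printf("weight(0,3) = %d\n", weight[0][3]);
    return 0;
}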
Profile Guided Classical Optimizations
In some optimizations, specialization leads to the elimination of a copy of a computation (e.g.,
redundancy elimination and dead code elimination), while in other optimizations specialization leads
to simpler and more efficient code (e.g., strength reduction and constant folding).
Transformations
Two primary classes of transformation are used to carry out code replication and enable
specialization of conditionally optimizable code: code motion of different types and control flow
restructuring with varying scope.
1. Code motion of statements. The basic form is safe code motion: for every execution of a
statement in the optimized code, there exists a corresponding execution of that statement in the
unoptimized code. Speculative code motion allows the compiler to introduce executions of a
statement in the optimized code that are not present in the unoptimized code; to be beneficial, the
added cost of these extra executions must be offset by the savings resulting from the motion, so a
profile-based cost-benefit analysis is needed. In predicated code motion, a statement can be moved
out of a control structure and executed at a different program point under its original conditions by
predicating its execution with an appropriately constructed predicate expression. (These three
variants are sketched at the end of this subsection.)
2. Control flow restructuring. If the conditions under which a statement can be optimized hold
along some execution paths, then control flow restructuring can be employed to enable the
optimization of the statement as follows.
a) Restructure the program so that when the statement is reached, the incoming edge
along which it is reached determines whether or not the statement can be
optimized.
b) Create unoptimized and optimized copies of the statement and place them along the
corresponding incoming edges.
The scope of the control flow restructuring determines the degree to which the code can be optimized.
Function inlining is one way to achieve interprocedural control flow restructuring. However,
it is accompanied by significant code growth. To limit code growth while performing
interprocedural optimizations, two alternative techniques have been proposed: partial
inlining of frequently executed paths through a procedure, and creating procedures with
multiple entries and multiple exits.
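A minimal C sketch (not from the original text) contrasting safe, speculative, and predicated code motion on a loop-invariant multiply:

/* Safe code motion cannot hoist x*y out of this loop, because the loop body
   may execute zero times and the hoisted evaluation would have no
   counterpart in the unoptimized code. */
void original_loop(int n, int x, int y, int *out) {
    for (int i = 0; i < n; i++)
        out[i] = x * y;
}

/* Speculative code motion hoists it anyway; when n <= 0 this is an extra
   execution, so a profile-based cost-benefit check decides whether the
   savings inside the loop outweigh that cost. */
void speculative_loop(int n, int x, int y, int *out) {
    int t = x * y;
    for (int i = 0; i < n; i++)
        out[i] = t;
}

/* Predicated code motion guards the moved statement with a predicate so it
   still executes only under its original condition. */
void predicated_loop(int n, int x, int y, int *out) {
    int t = 0;
    if (n > 0)
        t = x * y;
    for (int i = 0; i < n; i++)
        out[i] = t;
}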
Optimizations
1. Partial Redundancy Elimination (PRE). Classical PRE is based on safe code motion, and many
opportunities for redundancy removal cannot be exploited using this restricted form of code motion.
In case (a), the expression x+y at node 7 is partially redundant because if node 2 is visited, the
value has already been evaluated. If speculative code motion is used to hoist x+y above node 6,
as shown in case (b), and the execution frequency of node 3 is less than that of node 7, the result
is a net reduction in redundancy. By applying control flow restructuring, as in case (c), the partial
redundancy is fully eliminated, but at the cost of code growth. Thus the profile is needed to choose
among these methods. Partial redundancy elimination can also be applied in other contexts, such as
redundant load removal and the elimination of array bounds checks.
2. Partial dead code elimination (PDE). The PDE of an assignment is achieved by delaying the
execution of the statement to a point where its execution is definitely required.
In case (d), the assignment in node 2 is partially dead: if control reaches node 6 or node 7, x is
reassigned there. Simply delaying the assignment to node 8 is incorrect, because that would prevent
the correct value of x from reaching its use in node 10. Straightforward (safe) code motion blocks the
sinking of the assignment past node 4, and therefore any further. By predicating the statement, we can
enable its sinking past node 4 and down to node 8, achieving PDE as in case (e). This approach is
beneficial if the execution frequency of node 2 is larger than that of node 8. Control flow restructuring
can also be applied, as in case (f). Partially dead stores can also be eliminated through these methods.
3. Conditional branch elimination. A conditional branch is considered partially redundant if its
outcome can be determined at compile time along some of the paths reaching it. Eliminating it requires
flow-graph restructuring: through code duplication, those paths are separated from the paths along
which the outcome is unknown.
Along the first path, x > 0 evaluates to false, and hence x == 1 is definitely false; along the second
path, x == 1 is true, and if the call F() preserves the value of x, then x == 0 is false. The first
correlation can be exploited through intraprocedural control flow restructuring (sketched below),
while the second requires interprocedural control flow restructuring.
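A minimal C sketch of the intraprocedural case; the tests, the helper functions a() through d(), and the overall shape are assumptions, not the original figure:

#include <stdio.h>

static void a(void) { puts("a"); }
static void b(void) { puts("b"); }
static void c(void) { puts("c"); }
static void d(void) { puts("d"); }

/* Before restructuring: along the path where x > 0 is false (so x <= 0),
   the test x == 1 cannot succeed, making the second branch partially
   redundant. */
void before(int x) {
    if (x > 0) a(); else b();
    if (x == 1) c(); else d();
}

/* After restructuring: duplicating the code that follows the first branch
   separates the correlated path, and the second test disappears along it. */
void after(int x) {
    if (x > 0) {
        a();
        if (x == 1) c(); else d();
    } else {
        b();
        d();                    /* x <= 0 here, so x == 1 is known false */
    }
}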
Partial Redundancy Elimination via Speculative Code Motion
1. Cost model
For a given expression exp and a conditional node n, we formulate a condition under which
it is potentially profitable to enable speculative hoisting of exp above n. Essentially we
identify the paths through n which can benefit from speculation of exp and paths that incur
an additional execution-time cost due to speculation of exp. If the probability of following
paths that benefit is greater than the probability of following paths that incur a cost, then
speculation of exp is enabled at n. The computation of these probabilities is based upon an
execution profile.
It is useful to categorize the program subpaths that either originate or terminate at n as
follows:
a. Available subpaths which are subpaths from the start node to n along which an
evaluation of exp is encountered and is not killed by a redefinition of any of its
operands prior to reaching n.
b. Unavailable subpaths which are subpaths from the start node to n along which exp
is not available at n. These subpaths include those along which either an evaluation
of exp is not encountered or if exp is encountered, it is killed by a redefinition of
one of its operands prior to reaching n.
c. Anticipable subpaths which are subpaths from n to the end node along which exp is
evaluated prior to any redefinition of the operands of exp.
d. Unanticipable subpaths which are subpaths from n to the end node along which exp
is not anticipable at n. These subpaths include those along which either exp is not
evaluated or if exp is evaluated, it is done after a redefinition of one of its operands.
The paths that can potentially benefit from hoisting exp above a conditional node n are
the paths obtained by linking available subpaths with anticipable subpaths at n, for two
reasons:
a. Along these paths there is redundancy: an evaluation of exp prior to reaching
n causes an evaluation of exp performed after visiting n to become redundant.
b. This redundancy cannot be removed unless exp is hoisted above node n.
We can therefore state that, given a path p that passes through the conditional node
n, hoisting an expression exp above n benefits path p if and only if exp is both available and
anticipable at n's entry along p.
Denote the set of paths through n that benefit from hoisting of exp above n as
BenefitPaths_exp(n). The expected overall benefit of speculating exp can be computed by
summing up the execution frequencies of paths in this set:
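In symbols, with freq(p) denoting the profiled execution frequency of path p (the notation is assumed):

$$\mathit{Benefit}_{exp}(n) \;=\; \sum_{p \,\in\, \mathit{BenefitPaths}_{exp}(n)} \mathit{freq}(p)$$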
Similarly, the paths that incur a cost due to hoisting of exp above a conditional node n are
the paths that are obtained by linking unavailable subpaths with unanticipable subpaths
at n. The expected overall cost of speculating exp above n can be computed as below:
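Writing CostPaths_exp(n) for this set of paths (a name chosen here by analogy with BenefitPaths_exp(n)):

$$\mathit{Cost}_{exp}(n) \;=\; \sum_{p \,\in\, \mathit{CostPaths}_{exp}(n)} \mathit{freq}(p)$$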
Speculation should be enabled if based upon the profile data we conclude that the expected
benefit exceeds the expected cost.
A boolean variable EnableSpec_exp(n) is associated with an expression and conditional node
pair (exp, n). As shown below, speculation is enabled if the probability of following a
path through n that benefits is greater than the probability of following a path that incurs
an additional cost.
Note that the sum of the two probabilities used below need not be 1, because there may be
other paths which are unaffected by speculation.
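With Freq(n) denoting the total execution frequency of paths through n (notation assumed), the condition can be written as:

$$\mathit{EnableSpec}_{exp}(n) \;\equiv\; \frac{\mathit{Benefit}_{exp}(n)}{\mathit{Freq}(n)} \;>\; \frac{\mathit{Cost}_{exp}(n)}{\mathit{Freq}(n)}$$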
2. Cost-benefit analysis based upon probabilistic data flow analysis
This analysis uses probabilistic data flow analysis (PDFA) guided by two-edge profiles. The
algorithm neither requires the entire control flow trace nor computes data flow information
on a per-path basis.
While traditional data flow analysis calculates whether a data flow fact holds or does not
hold at some program point, PDFAs calculate the probability with which a data flow fact
will hold at some program point. PDFA provides us with the following probabilities:
AvailProb_exp(n) is the probability that expression exp is available at node n, and
AntiProb_exp(n) is the probability that exp is anticipable at node n. We therefore
reformulate the speculation-enabling condition for expression exp at a conditional node n
in terms of these probabilities as follows (see the reference for the detailed analysis system):
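One plausible form of this reformulation, under the simplifying assumption that availability before n and anticipability after n can be treated as independent events (both the independence assumption and the exact form below are this sketch's, not necessarily the original system's):

$$\mathit{EnableSpec}_{exp}(n) \;\equiv\; \mathit{AvailProb}_{exp}(n)\cdot\mathit{AntiProb}_{exp}(n) \;>\; \bigl(1-\mathit{AvailProb}_{exp}(n)\bigr)\cdot\bigl(1-\mathit{AntiProb}_{exp}(n)\bigr)$$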
Availability and anticipability are calculated based on the traditional formulations of the
data-flow problems as shown below:
Availability analysis is guided by the local Boolean predicates LocAvail and LocBlkAvail.
LocAvail_exp(n) equals T if there is an occurrence of exp in n which is not blocked by a
subsequent statement of n, that is, a statement which modifies variables used in exp.
LocBlkAvail_exp(n) equals T if exp is blocked by some statement of n.
Similarly, anticipability analysis is guided by the local predicates LocAnti and LocBlkAnti.
LocAnti_exp(n) equals T if there is an occurrence of exp in n which is not blocked by a
preceding statement of n, that is, a statement which modifies variables used in exp.
LocBlkAnti_exp(n) equals T if exp is blocked by some statement of n.
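The equation systems have the standard shape sketched below, where the N- and X- prefixes denote values at a node's entry and exit (matching the N-/X- notation used further down):

$$\textit{N-Avail}_{exp}(n) = \bigwedge_{m \in \mathit{pred}(n)} \textit{X-Avail}_{exp}(m), \qquad \textit{N-Avail}_{exp}(\mathit{start}) = \mathrm{F}$$
$$\textit{X-Avail}_{exp}(n) = \mathit{LocAvail}_{exp}(n) \;\lor\; \bigl(\textit{N-Avail}_{exp}(n) \land \lnot\,\mathit{LocBlkAvail}_{exp}(n)\bigr)$$
$$\textit{X-Anti}_{exp}(n) = \bigwedge_{m \in \mathit{succ}(n)} \textit{N-Anti}_{exp}(m), \qquad \textit{X-Anti}_{exp}(\mathit{end}) = \mathrm{F}$$
$$\textit{N-Anti}_{exp}(n) = \mathit{LocAnti}_{exp}(n) \;\lor\; \bigl(\textit{X-Anti}_{exp}(n) \land \lnot\,\mathit{LocBlkAnti}_{exp}(n)\bigr)$$

Here the meet over predecessors and successors is conjunction, reflecting that both are all-path (must) problems; this is why the complementation described next is needed before the union-based PDFA framework can be applied.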
Since probabilistic data-flow systems are based on any-path problems, in which the meet
operator is union, the analysis problems have to be transformed as described in [39]. If
a "must-be-X" problem is an intersection problem, then the corresponding "may-not-be-X" problem is a
union problem, and the solution of the "must-be-X" problem is the complement of the
solution of the "may-not-be-X" problem.
Note that the initializations of the start and end node have to be changed appropriately as
well. Since now the confluence operator for both problems is union, the PDFA framework
can be applied in a straightforward way. The solutions N-NoAvail*, X-NoAvail*, N-
NoAnti*, and X-NoAnti* of the equation systems denote the probabilities of the
complementary problems. Thus, the probabilities for availability and anticipability are
given by N-Avail* = 1 - N-NoAvail*, X-Avail* = 1 - X-NoAvail*, N-Anti* = 1 - N-NoAnti*,
and X-Anti* = 1 - X-NoAnti*.
[39] T. Reps, S. Horwitz, and M. Sagiv, "Precise interprocedural dataflow analysis via
graph reachability," in Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages (POPL '95), San Francisco, California, United
States, 1995, pp. 49–61. doi: 10.1145/199448.199462.