5907
Algorithms for Compile-Time Memory Optimization
Jordan Gergov”
Max-Planck Institute for Computer Science
Abstract AZlocation problem (CMA) is to compute a mapping
Given a program P in a structured programming language, from objects to memory regions such that: (1’) the
we propose the first polynomial-time algorithm for compile- size of the memory region that is allocated to an object
time memory allocation for the source objects of P (eg ar- equals the memory requirement of that object; (2’) if
rays, C structures) with a performance guarantee. Further, two memory regions overlap, then the associated object
we present a new and simple O(n log n) time bapproxima- pair is in the set of object pairs that are allowed by
tion algorithm for (off-line) Dynamic Storage Allocation and, (2) to share memory; and (3’) the memory usage is
thus, improve the best previous approximation ratio of 5. minimized. The trend towards more complex embedded
systems increases the interest in memory optimization
1 Problem Motivation and Definition tools, and CMA algorithms are also potentially useful in
optimization of software with a high memory demand.
Consider the following simple fragment of a C function:
The performance of all previous heuristics for this NP-
foo(int i) { char A[40]; intB[lO];if(i){...}else{...
hard task can deviate significantly from optimality, cf
} }. The dots in the i !=O (resp. i==O) branch of the if
[l]. In Section 3, we discuss a novel approach to
statement denote a C block without read or write access
fast CMA algorithms with a performance guarantee.
to the array B (resp. array A), ie either only A or only
Fabri [l] gives a more detailed treatment of the CMA
B is needed at run time. Assume that an int (char) is
definition as well as an account of previous research.
represented by 4 (1) bytes. Now, let us slightly simplify
One special case of CMA with an independent
the compile-time memory allocation task and ask: how
research history as well as with numerous applications is
much storage does the compiler need to allocate for the
the off-line Dynamic Storage Allocation (DSA). Assume
arrays A and B ? First, the compiler can use two non-
that we are given a straight-line program, ie its source
overlapping storage areas for A and B, ie 40 + 10.4 = 80
code does not contain branch or loop statements and, for
bytes. Alternatively, the compiler could find out that A
ease of presentation, no function calls. It turns out that
and B will never “exist” at the same time when the
in this case CMA has a nice geometric interpretation.
program is running. Hence, a storage area of 40 bytes
A DSA input I consists of n triples of numbers, ie
is sufficient for both arrays, and the first solution can
I= {(Sl,~l,C1)7-.-, (sn,m,c,.,)}. Each triple (s~,Q,c~)
be improved by a factor of 2.
can be interpreted as an axis-parallel rectangle with
In general, given a source program .the compiler
a projection (ri,ci) on the z-axis and a projection of
can use control-flow analysis and similar techniques to
length si on the y-axis. We are only allowed to slide
determine pairs of (source-code) objects such that the
the rectangles along the y-axis while the s-projections
compiler is allowed to map the objects in each pair to
of all rectangles stay fixed as in the input I. The
overlapping memory regions. Now, assume that we have
objective is to pack all rectangles in a horizontal strip
derived the following information from a program in a
of minimum height. Intuitively speaking, (Si, Ti) Ci)
structured programming language: (1) the set of all
is associated with a source-code object that is used
(source-code) objects and their memory requirements;
only between the rith and the c&h statement and
(2) a set of pairs of objects such that the objects in a
requires Si contiguous memory cells. The previous
pair cannot “interfere” with each other at run time and,
research on fast DSA algorithms with a performance
hence, can share memory; (3) the associated control-
guarantee was based on on-line coloring, and results
flow graph, and, for any object, the vertices of the
were obtained independently by Woodall, Kierstead,
control-flow graph (ie source-code statements) at which
Chrobak, and Slusarek, cf [2] for references. The best
the object is accessed as well as the associated access
approximation ratio of 5 for this NP-hard problem is
type information. Then, the Compile-time Memory
based on a different approach and is due to Gergov [2].
In the remaining sections, we give a rather condensed
‘Max-Planck Institut ftir Informatik, Im Stadtwald, 66 123
Saarbriicken, Germany; E-Mail:
[email protected] presentation of our results.
5908
2 Dynamic Storage Allocation in [l], each (source-code) object is associated with
Given a DSA instance I={(sl,rl~cl):. . :(sn,r,:cn)}, some of the vertices of F. Intuitively speaking, if an
a solution to I is a mapping Q from I to the nonneg- object is not associated with a specific vertex: then it
ative reals o:I--tRz, such that if (ri; ci)n(rj, cj)#O, does not need to be “maintained” at this vertex. We
then either i=j or o((s~,T~:c~))+s~~Q((s~:T~:c~)) or modify the instance 1 so that the following property
a((Sj; Tj, Cj))+Sj_<cY((S,: Ti, Ci)). The cost of Q is defined holds: (con) the vertices each object is associated with
as maxk(o((sk:rk,cI;))+sI;:r. Geometrically speaking, o induce a connected component in the undirected graph
maps each rectangle to the y-coordinate of its lower side. corresponding to F(1). The property (con) is crucial
The algorithm described below is based on the concept to Theorems 3.1 and 3.2. The conflict graph G(1)
of 2-allocation, cf [2]. It first constructs an infeasible of the ChlA instance .I is the graph whose vertices
mapping cy’ that is then transformed into a feasible so are the (source-code) objects, and there is an edge
lution cr to I. between two objects iff there is a vertex v of the control-
Incremental 2-Allocation Construction (MC): flow graph F(I) such that both objects are associated
Initially, we set: J to the input I; H (an auxiliary set of with v. We can view the control-flow graph F(I) of
triples) to { (0, mini ri7 mmj cj)}; and CX’ (a mapping from a program in a structured programming language as
a set of triples to RIO) to the empty-domain mapping. a series parallel (S.-P.) graph by introducing dummy
while(J # 0) { vertices and doing some edge reversals. Note that the
pick (‘w,zr,zr) E H such that w is minimal CMA definition assumes a program in a structured
among all triples in H and delete it from H; programming language. The nesting depth d(S) of a
if ( there exists (s: r, c) E J s.t. (r, c) n (zl, q) # 0 single vertex S.-P. graph S is I by definition. If an S.-P.
and, for all (w’,zi,z;) E H. (T,c) n (z;,z:) = 0) { graph S is the series (parallel) composition of two s-p.
pick and delete from J the (s,r,c) in the if test; graphs Sr and & , then its nesting depth d(S) is defined
set a’((s,r,c)) to w; as m=4d(Sd,d(Sd) (m=(Wd, d(W) + 1).
insert (w + s, max(zl,r), min(c, 2,)) into H; THEOREM 3.1. Let I be an instance of CMA. Given
if (2r < r) { insert (w,z~,r) into H; > any weighting of the vertices of G(I), a maximum
if (c < 2,) { insert (w,c,zr) into H; 3 3 //if weighted clique of G(I) can be built in polynomial time.
3 //while We obtain the next result using decomposition tech-
At this point, we have constructed a mapping a’ from I niques and a procedure closely related to IAC.
to RX. Now, define G(V,E) by V=I and E={(vi,vj)l THEOREM 3.2. Let F be the control-flow graph of a
i # 7, (Ti,Ci) fl (Tj,Cj) $ 0 # (Ol’(Vi)yQ’(Vi)+ Si) n CMA instance I. A solution to I of cost at most 3-d(F)
(o’(Vj),aC’(Vj) + Sj)} where vk denotes (sk,rk,ck)EI- times the optimum can be computed in polynomial time.
Order (s,r, C)EV in the same order as they have been
4 Conclusion
deleted from J, and construct a coloring f:V+{O, 1,. . .)
of G by means of First Fit. Finally, output cu:l-+$e We can view Theorems 2.1 and 3.2 as positive results
where a(ui)=o’(vi)+f(vi) mtij(o’(uj) + sj). // end about the approximability of the NP-hard interval col-
oring problem of two classes of intersection graphs, ie
Our C++ implementation of IAC has O(n logn)
of interval graphs and of intersection graphs related to
running time and handles a few thousand triples in less
CMA. Our approximation results are based on the con-
than one second on a SUN SPARC 4.
struction of suitable infeasible solutions, cf 2-allocations
THEOREM 2.1. Given a DSA instance I, the IAC algo- in [2], and on coloring of specific intersection graphs. As
rithm builds a solution Q to I of cost not exceeding 3 the next result shows, our approach can be used to de-
times the optimum cost in O(nlogn) time. signapproximation algorithms for different intersection
graph classesas well.
3 Compile-Time Memory Allocation
THEOREM 4.1. Given a weighted unit-disk graph, an
In this and the next sections, the definitions of the interval coloring of cost at most 7 times the optimum
notions in Sans Serif are standard and can be found can be constructed in linear time.
in the technical literature, cf [l] and [2]. Let us also References
point out that the control-flow analysis is not part
of CMA: poor analysis wiIl produce a CMA instance 1113. Fabri, Automatic storage optimization, ACM SIG-
with a short list of objects allowed to share memory, PLAN Notices, 14 (8), 1979,pp. 83-91.
PI J. Gergov, Approximation algorithms for dynamic stor-
and, hence, there will be lessopportunities for memory
age allocation, European Symposium on Algorithms
optimization. Now, assumethat F(I) is the control- (ESA’96), Springer, LNCS 1136, 1996,pp. 52-61.
flow graph of a CM.4 instance I. As it is shown