0% found this document useful (0 votes)
7 views11 pages

Query Optimization in OODB

The document discusses query optimization in object-oriented databases (OODB), focusing on how relationships between objects can be leveraged to enhance query processing efficiency. It presents a prototype query processor that utilizes semantic transformations to optimize queries, ensuring that the inherent semantics of object relationships are considered. The paper outlines the design of the query processor, the generation of query evaluation plans, and preliminary results from simulations demonstrating the feasibility of the approach.

Uploaded by

yeabalex18
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views11 pages

Query Optimization in OODB

The document discusses query optimization in object-oriented databases (OODB), focusing on how relationships between objects can be leveraged to enhance query processing efficiency. It presents a prototype query processor that utilizes semantic transformations to optimize queries, ensuring that the inherent semantics of object relationships are considered. The paper outlines the design of the query processor, the generation of query evaluation plans, and preliminary results from simulations demonstrating the feasibility of the approach.

Uploaded by

yeabalex18
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Singapore Management University

Institutional Knowledge at Singapore Management University

Research Collection School Of Computing and School of Computing and Information Systems
Information Systems

1-1992

Query Optimization in OODB


Hwee Hwa PANG
Singapore Management University, hhpang@[Link]

Hongjun LU
National University of Singapore

Beng Chin OOI


Institute of Systems Science, Singapore

Follow this and additional works at: [Link]

Part of the Databases and Information Systems Commons

Citation
Hwee Hwa PANG; LU, Hongjun; and OOI, Beng Chin. Query Optimization in OODB. (1992). Database
Systems for Advanced Applications '91: Proceedings of the Second International Symposium on
Database Systems for Advanced Applications, April 2-4, 1991, Tokyo, Japan. 1-10.
Available at: [Link]

This Conference Proceeding Article is brought to you for free and open access by the School of Computing and
Information Systems at Institutional Knowledge at Singapore Management University. It has been accepted for
inclusion in Research Collection School Of Computing and Information Systems by an authorized administrator of
Institutional Knowledge at Singapore Management University. For more information, please email
cherylds@[Link].
Published in Database Systems for Advanced Applications '91: Proceedings of the Second International Symposium on Database
Systems for Advanced Applications, April 2-4, 1991, Tokyo, Japan. pp. 1-10.

Query Processing in OODB


HweeHwa Pang’, Hongjun Lu2, BengChin Ooi’

lInstitute of Systems Science


department of Information Systems and Computer Science
National University of Singapore
Kent Ridge, Singapore 05 11

scope of a class must be realised. Since more semantics are


Abstract In obiect-oriented databases, relationships are captured in OODB, it is desirable that the available semanuc
generally maintained explicitly. The partial result of a retrieved knowledge is used to optimize the query. In contrast to
object can be used to efficiently retrieve related objects. Instead relational database systems, joins of two unrelated objects are
of optimizing joins as in relational database systems, pointer not that common in OODBMS. Objects in a query are usually
chasing is optimized in object-oriented database systems. related, and their relationships are explicitly maintained. As
Further, semantics inherent in the object-oriented database,like
superclass-subclass relationships and composite-component such, objects can be retrieved using the information stored in
relationships between object classes, must be realised. In this the already retrieved objects. In this paper we present our effort
in query optimization in an OODBMS.
paper, we describe our initial result in query optimization in an
object-oriented database system. Semantic qu;;; In [Kor88], Korth attempted to recast a subset of object queries
transformation is used to preprocess the query. as relational queries so as to enable those queries that are
semantically optimized query is then translated into a query expressible in a relational language to be optimized using
evaluation plan which comprises method invocations that can standard relational techniques. Osbom [Osb88] investigated
be evaluated directly by the system. In the process of query query optimization on an object algebra, taking into account the
evaluation plan generation, initial results tend to show that a difference between identity and quality of the results of two
one source query plan is almost optimal. A prototype based on queries. The EXODUS optimizer generator [GrD87] creates an
this design has been completed and some results from a optimizer from a rule-based description of the query algebra.
simulation study on this prototype are also reported in this The algebra can be extended by augmenting an algebra
paper. description file. Freytag [Fre87] used transformation rules to
generate query evaluation plans (QEPs) from initial query
specifications. The transformation rules can easily be modified
1. Introduction to accommodate changes or extension to the set of possible
QEPs that the optimizer is capable of generating.
One of the essential properties of the object-oriented database
systems is encapsulation, whereby information in any class is This paper reports the design of our query processor. When
hidden from users and the only way to accessthe information is the query processor receives a high level query, semantics
through methods provided in that class. Methods are inherent in the relationships that involve object classes in the
procedures written in a programming language. The query are used to modify it to ensure the correctness of the
computational completeness of the programming language query results. Such relationships include the superclass-
makes methods significantly more expressive than the relational subclass and composite-component relationships. The next step
algebra. However, it is still desirable to provide an ad-hoc is semantic optimization, a technique which uses available
query interface to object-oriented databasemanagement systems semantic knowledge to transform the query into a new query
(OODBMS) that allows users to pose high level queries in a that produces the same answer as the original query, but
declarative language, much like SQL in the relational context. requires less resources to evaluate. As semantic optimization
Such a language has limited expressive power but, on the other increases the processing overhead considerably, efficiency of
hand, allows users to interact with the OODBMS using the the overall optimizer must be taken into consideration. In the
schema and is ideal for non-expert users. A query processor final step, an efficient query evaluation plan is generated, in
must be provided to translate such high level queries to which the sequence of the retrieval of objects and their access
invocations of methods in the [Link] to the nature of the methods are determined. Once the QEP is determined, the
object-oriented concepts, some features that are absent in elementary operations are then mapped into method invocations
relational database systems must now be considered. Before a which can be evaluated directly by the OODBMS. There may
query can be processed, all its inherent semantics such as the be more than one method to implement an elementary
operation, and the method that is most efficient in the query
context will be selected. The final sequence of method
invocations makes up the executable QEP. Since OODBMS
supports extensible data types and hence new methods, it is
important that the set of methods available to the optimizer be
extensible. The translation process is guided by a search
strategy that is of polynomial complexity. Some preliminary
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS ‘91 studies were carried out on a prototype we developed, and the
Ed. A. Makinouchi results show that our design is indeed feasible and reasonably
@World Scientific Publishing Co. efficient.

1
Database Schema:
supplier(name, address, supplies)
cxgo@de. &SC, quantity, supplies, collecrs)
vehicle(vehicle#,de.%,class,engcotttp, collects, drives)
engine(engine#, capacity, engComp)
employ@[Link],
[Link])
manager(name,
clearance,
rank,belongslb)
driver(name, clearance, rank, belongsTo, license#, licenseClass,
licenseDate, drives)
supervisor(name. clearance,rank,belongsTo,license#, licenseClass,
licenseDate, drives)
&parlment(name,
securityClass,
belongsTo)
[Link] italic arcpointersusedto implementrelationshipsbetweenobjectclasses.
Figure 2.1: An Example Database
The paper is organized as follows. In Section 2, we give an (e.g. belongsTo). Some fields might also be attached to the set
overview of the query processor. Section 3 describes the of pointers to represent attributes of the relationship.
semantic transformations that are carried out on a query before
it is translated into a QEP. Section 4 presents the selection of A query in our system involves object classes that are linked by
the cheapest access path and the method for retrieving each relationships. Retrieval begins with a chosen object class. All
object class. The generation of a QEP, based on the accesspath other object classes can only be retrieved through pointers in
and methods chosen, is described in Section 5. Implementation those classes that have already been accessed. Given a query,
and empirical results are presented in Section 6. We canclude there are many ways to execute the query. Each way gives rise
in Section 7 with discussion on future work. to an access path. Figure 2.2 shows an access path for a query
involving some object classes (represented by circles) and
2. Preliminary relationships (represented by solid lines between the object
classes) in the schema in Figure 2.1. The arrows denote the
The concept of object-oriented engenders different requirements sequence in which object classes are accessed, and parallel
in query processing strategies. Firstly, instances of the branches starting from a vertical bar denotes parallel retrieval of
subclasses of an object class are also instances of it. When these branches. In each access path diagram, there is at least
instances of the class are requested, implicitly instances of the one object class with no incoming arc. Such object classes are
subclassesare also required (unless explicitly stated otherwise). known as sources and the remaining object classes are
Assuming that instances of the subclasses are not duplicated in intermediates. Object class engine is a source and the rest are
the object class itself [Ba*87], the query processor has to intermediates. Object classes with no outgoing arcs are further
explicitly add to queries subclasses of those object classes known as sinks. Object classes driver and supervisor are sinks.
requested. Furthermore, if two or more branches merge at an object class,
that class is known as a sync.
The second issue arises from the fact that there might be
composite-component relationships between object classes All the inputs to the optimizer are given in the following
involved in a query. To be semantically correct, such conceptual format.
relationships must be taken into account during query
evaluation. (SELECT (projectList) (joinPredicateList}
1selectPredicateList)
engine vehicle (relationshipList) (classList))

w The different parts of the query describe the attributes required,


the join predicates and selection predicates on object classes,
the relationships between the object classes involved, and the
Query: Get the cngik# of each vehicle driven by class 3 drivers. object classes to be accessed, respectively. A join predicate is
one that involves attributes from more than one object class,
Fig&e 2.2: A Query Access Path while a selection predicate is one that involves attributes from
only one class. To avoid ambiguity in the query path, the
Consider the database schema shown in Figure 2.1, which is relationships joining the object classes involved in a query have
used in all the examples in this paper. In this example, we to be explicitly identified. Though there are some redundancies,
assumethat a subclass (e.g. manager and driver) inherits all the this representation is nevertheless chosen to improve the clarity
attributes, including the relationships, of its superclass (e.g. of our illustrations. In what follows, restriction predicates are
employee). Unlike relational databases in which each used to refer to join predicates and selection predicates
relationship is usually represented by a separate relation, collectively.
relationships in object-oriented databases are typically
represented by pointers. Hence each object instance in a class A declarative query is mapped to a series of elementary
participating in a relationship contains an associated set of operations and commands. The following is a list of
pointers. This set of pointers stores the object ids of instances elementary operations:
in other participating classes associated to it by this relationship

2
(SCAN class [attributeList) (joinPredicateList] In the definition, scan1 and scann are methods that implement
(selectPredicateList) resultClass) the elementary retrieving operation scan using different
Denotes an object class scan, with the attributes required searching techniques. To estimate the cost of retrieving any
(in the attributelist) from it and the join and selection object class, all the cost estimation functions in that class are
predicates that have to be satisfied. resultclass is a virtual evaluated, and the lowest cost is returned. The method chosen
class (view) that contains the instances that form the to retrieve the class is the one associated to the lowest estimated
results. Before concatenating the output to the partial cost. Note that
result in resultclass, the join predicates in
joinPredicateList are evaluated and instances that do not .
satisfy all the join predicates are eliminated. The access Each cost function costi associated with method scar+
path and retrieval method are not specified. requires a compulsory argument sf, which represents the
selectivity factor. The estimation of selectivity factors
(FORK (list1 ) .. {list,)) will be discussed in Section 4.
This is a keyword which activates n parallel threads to .
retrieve the n lists of object classes. Object classes in an QQDB are related by superclass-
subclass relationships, with the system class being the
superclass of ail other classes. Hence the object classes
(LINK (source) (sink) (joinPredicateList] form a directed acyclic graph (DAG). If a particular
[ selectPmdicateList) retrieval method scani is shared by all the object classesin
{relationshipList) (classlist) rest&Class)
Denotes retrieval of an arbitrary number of object classes a subgraph in the DAG, that method can be factored out
(given in classlist) without specifying their order, except and placed in the root of that subgraph. This way all the
for the source and possibly the sink. The join predicates classes in the subgraph will inherit the method. For
(in joinpredicatelist) and selection predicates (in example, an indexed retrieval method and a sequential
se1ectPredicateLi.u) that the results must satisfy, and the retrieval method, together with their associated cost
functions, might be provided in the root class (the root of
relationships between object classes (in relationshiplist) the object class DAG), and all the other classes can inherit
are also given. The output is to be placed in resultClass.
these methods by default.
There must be at least one implementation (method) for each
elementary operation. The most efficient method in the query The optimizer, outlined in Figure 2.3, consists of four
context will be chosen, and the chosen method will replace the modules. Semantic preprocessing transforms queries to reflect
elementary operation. The final sequenceof methods makes up the semantics inherent in the object-oriented database, eg. the
the QEP and can be executed by the DBMS directly. As the superclass-subclassrelationships and the composite-component
relationships between object classes. This module is essential
choice of the most efficient methods is system-dependent, we for the correctness of the query results. Semantic optimization
will only briefly explain how this is done, using the elementary uses semantic constraints to modify the queries so that they can
retrieval operation, scan, as illustration. The main emphasis of
this paper, however, is on the transformation from declarative be evaluated more efficiently, without changing the semantics
queries to elementary operations. of the queries. Access path selection decides on the order to
retrieve object classes and the method to use to access each
Each class definition includes the various methods that can be object class. QEP generation produces series of method
used to retrieve the object instances in it. In addition, a function invocations to execute the queries.
must be provided for each method that gives its estimated
execution cost. For example, the definition of class A might be 3. Semantic Transformations
class A { 3.1 Semantic Preprocessing
attributes:
a 1: ..; The semantic preprocessor uses four preprocessing rules to
make semantics inherent in the object-oriented context explicit.
am: ..; This step is necessary for the cotrectness of the query results.
class methods: . Adding Subclasses
scan(A, ...). scan&A, ..); Since instances of an object class include those that
belong to its subclasses, subclasses (and in turn their
scan(A, ...>. scanu(A, ..); subclasses) of object classes involved in the query must
be added to it. Note that the subclasses inherit the
costt(A, sf, ..): ..; selection predicates and join predicates on the object
class. The subclasses of each object class can be
cost,&, sf, ..>: ..; identified from the schema and is available from the data
scanCost(A, sf, ..): t& costi(A, sf, ..); dictionary during runtime. For example,
(SELECT ([Link], [Link]#) ( ) ( )
(drives) (driver, vehicle))

Semantic Access Path QEP


query Preprocessin Selection Generation

Figure 2.3: Overview of the Optimizer

3
becomes (SELECT ([Link], [Link]#,
[Link]) ( ) ( )
(SELECT ( (drivername, [Link]), (drives, engcomp) (driver, vehicle,
[Link]#) ( ) { ) engine))
(drives) (driver, supervisor, vehicle))
3.2 Semantic Optimization
which means [Link] and [Link] will be
“unioned” together to form one attribute in the results. Semantic query optimization was first proposed by King
[Kin8 1] and by Hammer and Zdonik [HaZSO]. Three semantic
. Identifying Existing Subclasses optimization rules - restriction elimination, restriction
Among the classes listed in the query, some might be introduction and class elimination (the equivalence of relation
subclasses of others. For example, elimination), have been incorporated into our optimizer. In
[PLO!20], we presented an efficient algorithm to implement
(SELECT ([Link], [Link], these optimization rules and we briefly discuss it here. In our
[Link]#) ( ) ( ) approach, all possible transformations are tentatively applied to
(drives) (driver, supervisor, vehicle)) the query. Instead of physically modifying the query, the
transformation process classifies the predicates in the query into
As it is, the results would be wrong since [Link] imperative, optional or redundant. An irnperative predicate is
and [Link] get “multiplied” together. A one whose removal will affect the final results. An optional
transformation is required to group together the classes predicate is one that, though its inclusion or exclusion does not
related by superclass-subclassrelationships: affect the final results, might affect the execution efficiency of
the query by enabling us to make use of indices, or by cutting
(SELECT ( ([Link], [Link]), down the number of instances returned from some object
[Link]#) ( ) ( ) classes and thereby reducing the size of intermediate results. A
(drives) (driver, supervisor, vehicle)) redundant predicate affects neither the final results nor the
execution efficiency. At the end of the transformation process,
. all the imperative predicates are retained while the redundant
Deep Retrieval predicates are eliminated. Whether an optional predicate should
When a composite class is to be retrieved, it is possible to be retained depends on the estimated cost savings it can help
automatically retrieve some of its component classes. bring about and the cost of evaluating it. This effectively means
Whether this rule is active depends on the value of a the task of choosing the beneficial transformations is delayed
system variable. The depth of retrieval, i.e. number of until all the possible transformations have been considered.
layers of components thus retrieved, can either be Using this approach, previous transformations do not preclude
computed dynamically based on available display space, other transformations and the order of transformations is
or be some pre-determined values that might vary from immaterial. Hence the algorithm is of polynomial complexity.
class to class, or be given as part of the query. For Another advantage of this approach is there is no need to check
example, the profitabilities of all the transformations, eg. those
transformations that re-classify predicates to redundant should
(SELECT ([Link], vehicle.*) ( ) ( ) always be carried out. Another issue addressed in the paper is
(drives) (driver, vehicle)) the grouping of semantic constraints to reduce the overhead of
retrieving constraints and checking whether each constraint is
is transformed to relevant to the current query. Constraints are grouped according
to the object classes they reference. When optimizing a query,
(SELECT ([Link], vehicle.*, engine.*) ( ) only selected groups of constraints are considered. At the same
( ] (drives, engComp) (driver, vehicle, engine )) time, constraints are also classified as intra- or inter-class,
depending on the number of object classes each references.
There are different ways to present the results. If This classification is exploited during optimization to reduce the
relations are used, a component class is shown as a transformation overhead.
“nested” relation in the relation that represents its
composite class. Note that we represent a nested object
such as [Link] as two objects with a relationship. 4. Access Path Selection
In this section, we use the following [Link]. NCARD(Oi) is
. Identifying Existing Components the number of object instances in class Oi, and is available in
Among the classes to be retrieved, some might be the data dictionary. Di refers to the number of instances in Oi
components of others but these composite-component
relationships are not stated in the query. For example, that have to be retrieved from the [Link] device. After the
retrieval, the instances might be tested against some selection or
(SELEa ([Link], [Link]#, join predicates and only those instances which satisfy the
[Link]) ( ) ( ) predicates are returned. Pi measures the number of such
(drives) (driver, vehicle, engine)) returned instances. Hence Pi I Di.
The results of this query would not be (logically) correct
since engine and vehicle simply get “multiplied” together, 4.1 Estimation of Selectivity Factors
and would not show the engine capacity of each vehicle. Before the optimizer can calculate the costs of different access
This preprocessing rule makes the composite-component paths, it first has to estimate the selectivity factor for each
relationships explicit:
attribute and object class. The selectivity factor is defined as
the number of the qualified instances over the total instances. It
will be used to estimate the number of instances that need to be

4
mievexj and the number of instances that will be returned from Any object class in the query can be used as the source in the
each object class. When accessing attributes, there are two access path. We define a selectivity matrix Y which contains
possible categories of selections. selectivity factor on the object id in each object class as follows.
For a query involving m object classes,
. Selection before retrieval: This is the selection on an
attribute via an index such that only instances that satisfy y = ( Wij ), 1 <ijIm
the restriction predicates on the indexed attribute are
retrieved We denote the selectivity factor before retrieval
of object Oi based on attribute Aj as SFBib When the I1 ifi=j
attribute is not important to the discussion, only SFBi is [Link] = 1 SFAi if Q and Oj am connected by an arc
used. l0 otherwise

. Selection after retrieval: This is the selection on attributes When evaluating a query, the selection on object classes
which are not part of the index. Whether an instance accessed previously might reduce the number of instances in
satisfies the restriction predicates on the attribute can only the current object class that need to be retrieved. Assuming an
be determined after the instance has been retrieved. We access path of O,, .., Om. The selectivity factors on O,, .., 0,
denote the selectivity factor after retrieval of object Oi 1 need to be propagated to Oi. Hence we define Y2”, n 11 as
based on attribute Aj as SFAC.

Estimation of selectivity factors before retrieval for sources and 9” = ( vijzn )


selectivity factors after retrieval has been studied by various
researchers. The System R optimizer [SAC791 used simple where
statistics, such as the minimum and maximum values in a given
attribute, to estimate selectivity factors of attributes. This
scheme produces good selectivity estimates only if the attribute
values are uniformly distributed. Equi-depth histograms have
been proposed as an alternative for estimating selectivity of The selectivity matrix Y” can be interpreted as follows. If we
attributes whose values are n t uniformly distributed [ShC84, start from Oi and access intermediates via object ids, the
MuD88]. In our current prot2 type, we use a simple scheme that
is similar to System R to minimize overhead. With this scheme, selectivity factor before retrieval of an object class Oj linked by
the selectivity factor before retrieval, SFQ for an intermediate at most n relationships to Oi is yijn. Hence for a query
Oi accessed directly via pointers from ObJectclasses 0,, O,, ..,
and 0, is defined as: involving m object classes, we need to compute Ytt, n 2 m-l to
know the selectivity factor before retrieval of any object class
SFuij = fi SFA,, given an arbitrary source. The number of matrix muhiplications
k =I is thus p = rlog2m-11, and the total number of primary
Selectivity factors are used to estimate the number of object operations needed is m3 x rr0g2d1.
instances to be retrieved, and the number of instances that will
be returned. Assuming restriction predicates on different
attributes are independent (indeed this condition is satisfied Having computed the selectivity matrix Yp, we compute the
since redundant predicates are removed), the selectivity factor cost matrix Q, as follows.
before/after retrieval of an object class Oi, SFBJSFAi, is the
product of the selectivity factors before/after retrieval of the @ = ( 4tij )
attributes in that class. From now on we shall refer only to
selectivity factors of object [Link] where
Di = SFBi X NCARD(Oi) l$ij = SCaJlCOSt(i, WijP, . ..>
Pi = SFAi X NCARD(Oi)
Every bij E @ gives USthe minimum cost of retrieving object
4.2 Determination of Access Path class Oj If we use Oi as the source. Hence we choose as source
We now present the algorithm for access path determination. Oi such that the total cost of retrieving all the object classes is
This has been demonstrated to be a hard combinatorial problem the lowest.
[SwG88] and algorithms like simulated annealing and
techniques based on local search have been developed [Pas821
to solve it. In our current implementation, only those access COSl+~ = mitt g $ij 4Qj E @
IS&m
paths with one source am considered. This reduces the access j=l

path determination problem to the problem of finding the


reachability between any two nodes. Hence the algorithm is of We use as accesspath the one from which we derive the lowest
polynomial complexity. Our justification for using this retrieval cost using the chosen source. Figure 2.2 shows a
simplified approach is, since our selectivity factors are only possible accesspttth with engine as the source.
estimates of the actual values, it is futile to attempt very detailed
computations as the errors from estimating the selectivity The most expensive part of our accesspath selection algorithm
factors would get magnified. is the computation of the selectivity matrix. Hence the time

5
is transformed to
complexity of the algorithm is o(m310gZm), where m is the
number of object classes in the query. An access path chosen (REQUIRE ( ([Link], [Link]) )
by our algorithm is essentially a tree with the source as root. (LINK (to, I I
We prefer such paths because they do not contain sync& so ([Link] = vehicleclass,
each branch of the tree can be retrieved in parallel without [Link] := [Link])
having to wait for another branch at a sync. ([Link] > 1000)
(drives, engcomp)
5. QEP Generation (driver, supervisor, vehicle, engine)
resultClass))
After the access path has been determined, the QEP can be
generated from the query using QEP generation rules. We The generation of QEPs can be divided into three steps. In the
define predicate A(j) such that A(i) evaluates to true if all the first step, the optimizer transforms the query to reflect the
accesspath chosen. The notations for the QEP generation rules
attributes involved in join predicate j have already been used in this step are listed in Table 5.1.
retrieved, otherwise A(j) is false. In addition, we define
function F as follows: . When the sink in the access path has been reached, Rule
5.1 terminates the step by replacing the LINK operation
. F(p) returns the set of object classes referenced by p, with a null list.
where p is a restriction predicate, i.e. join predicate or
selection predicate. Rule 5.1: (LINK F L J S R T resultClass) +-Cl

. where Cl = (F = L).
F(a) returns the set of object classes that contain attribute
a. . The following two rules generate a SCAN operation for
the first object class in the access path, and the selection
The query is first converted to algebraic form using the predicates on this object class are moved into the SCAN
following rule to ease the generation process. operation. The object class that is linked to this first
object class will in turn be the fit among the remaining
(SELECT el e2 e3 e4 e5) + classes. The difference between the two rules is the
(REQUIRE el (LINK (to) ( ) e2 e3 e4 e5 treatment of join predicates that involve the first object
resultClass)) class. If a join predicate also involves another object class
which has yet to be retrieved, Rule -5.2keeps it with the
Note that a dummy source, rV has been introduced into the LINK operation; otherwise Rule 5.3 is used to move it
into the SCAN operation.
term LINK. This is to facilitate generation of QEP and will be
removed at the end of the process. resulfCZuss will contain the
results of the query and is initialized to au empty class. Rule 5.2: (LINK (t,) L Ju(jf)Su(sf)Ru(r)
C2A Itot@(
Example 5.L The algebraic form for the query “get the w(~ ) resultclass +
1
names of those drivers whose license classifications equal the ((SCAN tf ( ) ( ) ( sf) resultclass)
classifications of the vehicles they are driving, and whose
vehicles have engine capacity greater than 1OOOcc”. (LINK (t,) L Ju(jf r E tt-r) S R T
resultClass))
(SELECT ( ([Link], [Link]) }
([Link] = [Link],
[Link] = [Link]) Rule 5.3: (LINK (t,) L Ju(jf) Su(sf) Ru(r)
(enginecapacity > 1fKKl) C2 A A(js
(drives, engcomp) w(~ ) resultClass +
1
(driver, supervisor, vehicle, engine)) ((SCAN tf ( ) (jt) ( sf) resultclass)
Table 5.1: Notations for QEP Generation Rules

F asetofsources,F=()or(td
L a set of sinks, L = ( ) or (tt)
J a list of join predicates
S a list of selection predicates
R a list of relationships
T a list of object classes
i-2 a set of values retrieved from attribute r in the previous SCAN
Sf the selection predicates on object class tf
jf the join predicates that refer to object class tf

6
(LINK ( tl ) L Ju( r E $r) S R T resultclass)) (LINK &,I (t,,+ll Jnu& r,, 6 f&,) SnRn
Tn resultClass))
where c2 = (r links t,and tl)
(LINKh+, 1L J,+,%) %+,Rn+lTn+l
resultClass))
. The next two rules generate a SCAN operation for the
first object class in the access path and divides the Rule 5.7: (LINK (tf) L Ju(jf) Su(sf) Ru(rl,..,rn)
remaining object classes (together with their selection
predicates and relationships) into sub-blocks that are C4 A AI&)
evaluated in parallel. Each of these sub-blocks contains Tu ( t , ,..,t”,t,+, ) resultC1ass +
an object class that is linked to the fast object class, and ((SCAN tf ( ) (j,) ( sf) resultClass)
which will in turn be the first in the sub-block. The (FORK
difference between Rule 5.4 and Rule 55 is the treatment
of join predicates that involve the first object class, and is (LINK ($1 (fn+l) Jluir, E $-r,) S, R, T,
similar to the difference between Rule 5.2 and Rule 5.3. resultClass)

Rule 5.4: (LINK (t,) L Ju(jf) Su(sfJ Ru(rt,..,r,) (LINK ($1 It,,,,) J,,Mr,,E $.‘,,I SnR,,Tn
c3/? Ilot@( resultClass))
Tu( t _ , ) resultclass +
7. 7 CLINK
4t+l)L Jn*l%+I%I+1
T*+l
((SC& t,Y{ I I ) {sf) resultclass) resultcl:lass))
(FORK n+l
(LINK (t, ) 1,J,u{jf r1 E tfrl) St R, T, where C4 = (( iyt j E Ji, t E TiU( ti) for SOITS t E J?(j))
resultClass) II+1
.. A (iyl s E Si, T(S) = t where t E TiU( 6))
(LINK It,) L JnuCif rn E $-.r,) SnR,, T, n+l n+l n+l n+l
resultClass))) A(U Ji=J)r\(U Si=S)A(U Ri=R)A(U Ti=T)
\ i=l
i+l i=l i=l
Rule 5.5: (LINK ($1 L Ju(jf] Su(sf) Ru(rl,..,rn) A (itI ri links tpand b))
C3 A A(if)
Tu( t __ t ) resultclass +
1’ ‘n
((SCAN ti’( ) (j,) ( sf) resultclass) In the second step of QEP generation, each attribute specified in
(FORK REQUIRE is moved into the SCAN operations for the object
classesthat contain the attribute, and finally the term REQUIRE
(LINK (t,) L Jlu(rt E tfrl} St R, T, is removed.
rest&Class)
(REQUIRE ( ) (.. ..)) + (.. ..)
(LINK (t,) L J,u{r, E tfrn} Sn Rn T,
resultClass))) (REQUIRE (..a..) (..(SCAN t (.. ..) J S
resultclass)..)) +c5
where C3 = (tii, j E Ji, t E TiU(t;) for some t E r(i)) (REQUIRE (.. ..) (..(SCAN t (..a..) J S
resultclass)..))
A tiil s E Si, T(S) = t where t E TiU(ti))
where C5 = (t E T(a)).
^(I: Ji=J)I\(L Si=S)h(~Ri=R)*(~Ti=T)
i+l i=l i=l i=l
Example 5.2: For the query in Example 5.1, if the access
A (iil ri links tr and ti)) path in Figure 3.2 is chosen, the corresponding QEP would be
((SCAN engine [ ) ( ) (capacity > 1000] resultClass)
. The last two rules are similar to Rule 5.4 and Rule 5.5,
(SCAN vehicle ( ) (engcomp E
except there is a sync at the end of the parallel sub-
blocks. [Link]) ( ) resultclass)
(FORK
(SCAN driver [name)
Rule 5.6: (LINK (tl) L Ju(jf) Su[sf] Ru{rt,..,m)
{licenseClass = [Link], drives E
resultclass -+c4A not@(ir)) [Link])
W+,t,,,t,,+J ( ) resultClass)
((SCAN I,. { ] ( ) ( sf) resultclass) (SCAN supervisor (name)
(FORK (licenseClass = [Link], drives E
(LINK It, I ttn+l) J,u& r1 E ffrll S, RI [Link])
T, resultChss) ( ) resultClass)))

7
Table 6.1 Database Sizes

Strictly speaking, attributes (including those that represent Table 6.2 shows the ratio of the cost of optimized query
relationships) in an object class that are involved in selection or (including query transformation time) and the cost of original
join predicates should appear in the attribute list of the query for each pair of queries and each database instance. For
corresponding SCAN operation, so that they will be in DB 1, the smallest database we used, performance worsened for
resultClass for evaluation of the predicates. Furthermore, if 40% of the queries, and the extra overheads were limited to
such attributes am not among those required in the final results, about 10%. Only 34% of the queries executed faster after
they should be dropped from resultCluss immediately after the optimization, out of which 20% were significant
relevant predicates have been evaluated. However, we leave out improvements. This result is expected because when the
such attributes to simplify the presentation of our QEP database is small, execution times of the original queries are
generation rules and examples. low (in our experiment most of them took 1 to 2 seconds).
Unless the outputs can be obtained without going to the
While determining the access path, the optimizer had to decide database, the overheads of optimization usually more than
on the cheapest method scani to retrieve each object class i. In offset the little savings derived. As the database grew in size,
the last step, this is reflected in the QEP by changing each semantic optimization became much more effective. For DB4,
SCAN operation to scani . the largest database we used, 67% of the queries executed
faster after optimization. Furthermore, some queries (27%)
which originally either took hours to exec:ute or could not be
6. Implementation and Empirical Analysis executed at all (because the system ran out of resources) were
able to be executed significantly faster after optimization.
Our prototype query optimizer was implemented on a SUN-
3/160 workstation. As different parts of our object-oriented We conclude, as one may expect, that it is probably not worth
database system (OODB) are currently being developed, the doing semantic optimization when the database is small or
effectiveness of our optimizer was evaluated as follows. A when the query execution cost is expected to be low, i.e. the
sample database whose schema is shown in Figure 2.1 was semantic optimizer should be disabled. (in fact it has a
built. All possible paths in this schema were identified, where a monitoring mechanism which would cause it to disable itself
path consists of a series of interconnecting object classes and automatically). However when the database is large or when
relationships, and no object class or relationship appears more the query execution cost is expected to be high, the usefulness
than once, A “sensible” query was formulated for each such of the semantic optimizer is apparent.
path and thus a set of queries was generated. From this set of
queries, 40 test queries were randomly chosen and sent to the Since the optimizer does not exhaustively check all the possible
optimizer. After optimization, each pair of original and accesspaths, the QEPs generated might not always be optimal.
optimized query was sent to a DBMS to be executed. A However, our search strategy gives us access paths that are
relational DBMS, Oracle 5.0 running on an IBM RT, was used optimal or near-optimal in most cases. We evaluated 100
to simulate the cost ratios of the optimized and original queries. different queries with average selectivity factors equal to 0.25,
Although relational DBMS and OODB might use different 0.5, and 0.75 in turn and calculated the ratio of the cost of the
access mechanisms, we believe an improvement in execution optimal access path to the cost of the best one-source access
efficiency in the relational context would most likely indicate a path. The results are shown in Table 6.3. We observe that, for
similar improvement in OODB context. 70% of the queries, we were only able to reduce the execution
costs by at most another 20% if we had continued to search for
In our experiments, each object class had an average of 3 the optimal paths instead of stopping after getting the best one-
semantic constraints attached to it. The database was populated source paths. These potential savings did not justify the extra
with randomly generated data, the amount of which was varied
to test the performance of the optimizer under different database computation and spaceoverheads involved, especially when the
sizes. Here we present the performance under four database query execution costs were relatively low. Hence in our
instances, each of increasing size. Some statistics of the current implementation we feel it suffices to consider only one-
database instances are given in Table 6.1. source paths. However, if a stop criterion for the searching of
the optimal path can be determined, then search space is worth
enlarged by considering multiple source accesspaths.
Table 6.2: Ratio of Optimized Cost and Original Cost

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 110%
DBl _ 20 _ _ _ _ _ _ 7 7 26 40
DB2 _ 20 27 7 13 33
DB3 13 13 7 r I r 7 7 13 _ 40 _
DB4 27 13 _ _ 7 _ _ 7 13 - 33

8
Table 6.3: Optimality of One-Source Paths

Cost of optimal path


Cost of best l-source path 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Percentage of queries 6% 6% 9% 9% - 24% 3% 43%

7. Conclusion [Fre87] J.C. Freytag, A Rule-Based View of Query


Optimization, Proc. of the ACM SIGMOD 1987
This paper reports the design and implementation of a query Annual Conference.
processor for our OODBMS. In this work, we clearly describe
the process of translating a declarative query that possibly [GrD87] G. Graefe, D. Dewitt, The EXODUS Optimizer
involves many classes and relationships into a sequence of Generator, Proc. of the ACM SIGMOD 1987
method invocations that can be evaluated directly by the Annual Conference.
OODBMS. Each method invocation activates a method that
retrieves objects in the class which the method belongs to. IH~803 M.M. Hammer, S.B. Zdonik, Knowledge Based
Several issues peculiar to the query optimization in OODBMS Query Processing, Proc. of the 6th Int. Conf. on
were discussed, in particular, the optimization of using Very Large Data Bases, Sep 1980.
pointers maintained in the objects to answer the query. In order
to support new data types and new methods, the program [Hue801 G. Huet, Confluent Reductions: Abstract
transformation approach is adopted to generate the QEP. Some Properties and Applications of Term Rewriting
simulation studies have been carried out on this prototype query Systems, Journal of the ACM 27,4 (Ott 1980) pp.
processor. The initial result shows that the optimizer is 797-821.
effective.
[Kin811 J.J. King, QUIST: A System for Semantic Query
Our current work in this area is to enhance the access path Optimization in Relational Databases, Proc. of the
determination algorithm to incorporate joins, instead of doing yt8;nt. Conf. on Very Large Data Bases, Sep
only pointer-chasing. Recall that in this work, if class B
precedes class A in the chosen access path, objects in B can be
retrieved only via pointers from A. Under certain [Kor88] H.F. Korth, Optimization of Object-Retrieval
circumstances, it might be cheaper to retrieve all the objects in Queries, Proc. of the 2nd Int. Workshop on
A and then perform the join operation between these objects Object-Oriented Database Systems, Sep 1988.
and the retrieved objects from B. Of course, incorporating joins &NO891 H. Lu et al On the Design of a Multimedia Object-
complicates the algorithm considerably but we feel it is a Oriented Database System, Working Paper,
possibility worth exploring. Institute of Systems Science, National University
of Singapore, Dee 1989.
Acknowledgements J. hbo, J. Minker, A Metaprogramming Approach
to Semantically Optimize Queries in Deductive
We would like to thank Desai Narasimhalu and TokWang Ling Databases, Proc. of the 2nd Int. Conf. on Expert
for helping to improve an earlier version of this paper, and Database Systems, Apr 1988.
anonymous referees for useful comments.
References [~861 C.V. Malley, A Knowledge-Based Approach to
Query Optimization, Proc. of the 1st Int. Conf. on
Expert Database Systems, Apr 1986.
[Ba*87] J. Banerjee, et al., Data Model Issues in Object
Oriented Applications, ACM Transactions on [MUD881 M. Muralikrishna, D. Dewitt, Equi-Depth
Office Information System, Vol. 5, No. 1, Mar Histograms For Estimating Selectivity Factors For
1987. Multi-Dimensional Queries, Proc. of the ACM
[BGWIl] P.A. Bernstein, N. Goodman, E. Wong, et al, SIGMOD 1988 Annual Conference.
Query Processing in a System for Distributed [Osb88] S.L. Osborn, Identity, Equality and Query
Database, ACM Trans. on Database Systems, Optimization, Proc. of the 2nd Int. Workshop on
6(4):602-625, Dee 1981. Object-Oriented Database Systems, Sep 1988.
[CFh484] U.S. Chakravarthy, D.H. Fishman, J. Minker, [Pas821 C.H. Papadimitriou, K. Steiglitz, Combinatorial
Semantic Query Optimization in Expert Systems Optimization: Algorithms and Complexity,
and Database Systems, Proc. of 1st Int. Workshop Prentice-Hall, 1982.
on Expert Database Systems, Ott 1984.
[PLO901 H. Pang, H. Lu, B. Ooi, An Efficient Semantic
[CMG86] U.S. Chakravarthy, J. Minker, J. Grant, Semantic Query Optimization Algorithm, JEEE Int. Data
Query Optimization: Additional Constraints and Engineering Conf., Japan, 1991 (to appear).
Control Strategies, Proc. of the 1st Int. Conf. on
Expert Database Systems, Apr 1986. [SAC791 P.G. Selinger, M.M. Asaahan, D.D. Chamberlin,
[Fre86] J.C. Freytag, Rule-Based Translation of Relational R.A. Lorie, T.G. Price, Access Path Selection in a
Queries into Iterative Programs, Proc. of the ACM Relational Database Management System, Proc. of
SIGMOD 1986 Annual Conference. the ACM SIGMOD 1979 Annual Conference.
[ShC84] G.P. Shapiro, C. Connell, Accurate Estimation of (REQUIRE ( ([Link], [Link]))
the Number of Tuples Satisfying a Condition, ((SCAN to ( ) ( ) (1 resultClass)
Proc. of the ACM SIGMOD 1984 Annual (SCAN engine () ( ) ([Link]> 1000)
Conference. tesultClass)
(SCAN vehicle ( ) (engcomp E
[Sh087] S.T. Shenoy, Z.M. Ozsoyoglu, A System for [Link])( )
Semantic Query Optimization, Proc. of the ACM rcsultClass)
SIGMOD 1987 Annual Conference. (FORK
(LINK (driver) ()
[Sie881 M.D. Siegel, Automatic Rule Derivation for ([Link]-=[Link],
Semantic Query Optimization, Proc. of the 2nd Int. drives E [Link]) { )
Conf. on Expert Database Systems, Apr 1988. ( ) ( ) resultClass)
(LINK (supervisor) ( )
[SwG88] A. Swami, A. Gupta, Optimization of Large Join { [Link]= [Link],
Queries, Proc. of the ACM SIGMOD 1988 Annual drives E [Link]) (1
Conference. ( I t 1 =~t(J=m))
1 by Rule 5.3
[Zdo891 S.B. Zclonik, Query Optimization in. Object- (REQUlRE ( {[Link], [Link]))
Oriented Databases, IEEE Proc. of the 22nd ((SCAN to ( ) ( ) () resultClass)
Annual Hawaii Int. Conf. on System Sciences,
1989 (Vol II). (SCAN engine ( ) ( ) (enghxzcapacity > 1000)
resullclass)
Appendix (SCAN vehicle ( ) (engcomp E
[Link])()
In this Appendix, we show how we derive the QEP in Example tcsultClass)
(FORK
5.2 from the query in Example 5.1, using the access path in ((SCAN driver I )
Figure 3.2. ..
([Link]&rseClass :=[Link].
(REQU&EI (($~jname, [Link])) drives E [Link]) ()
rcsultclass)
0 (LINK () (] () () () () resultClass))
([Link]= [Link], ((SCAN supervisor ( )
[Link]= [Link]) ([Link]= [Link],
([Link]> 1000)
(drlvcs, cngComp) drives E [Link]]( )
(driver, supervisor, vehicle, engine) resultClass)
resultclass)) (LINK () () (} () () () resultClass)))))
1 by Rule 5.2 3. by Rule 5.1
(REQUIRE ( ([Link], [Link]]) (REQUIRE ( ([Link], [Link]))
((SCAN to ( ) ( ) ( ) resultClass) ((SCAN t ( ) ( ) ( ) resultClass)
0-M bszine~ iI (SCAN e&inc ( ) () (engkcapacity > 1000)
([Link]= [Link], resultClass)
[Link]= [Link]) (SCAN vehicle ( ) (engcomp E
(enginecapacity 5 1000) [Link])( )
(drives, engComp) rcsultclass)
(driver, supervisor,vehicle) (FORK
resullClass)))
(SCAN driver ( )
1 by Rule 5.2 ([Link]= [Link],
(REQUIRE ( (drivername, [Link]))
((SCAN to ( ) ( ) () resultClass) drives E [Link]) ( )
resultclass)
(SCAN engine ( ) () ([Link]> 1000) (SCAN supervisor ( )
resultClass) ([Link]= [Link],
@JNK (vchiclc) ()
drives E [Link]) ()
[ [Link]= VehiCkCk3SS, resultclass))))
[Link]= [Link], 1 project attributes,removedummy scan
engCompE [Link]) ((SCAN engine () { ) (capacity > 1ooO)resuhCla&
tiives) (SCAN vehicle ( ) (engCompl
[driver, supervisor) [Link]){ ) resultClass)
resultclass))) @OF=
(SCAN driver (name)
-1 by Rule 5.5 (licenseClass= [Link]:,drives E
[Link]) ( }
rcsultClass)
(SCAN supervisor (name)
(1icenscClass= [Link],drives E
[Link]) ( )
lwlltclass)))

10

You might also like