0% found this document useful (0 votes)
351 views4 pages

Db2 LUW Recursive SQL - Part 1

This document summarizes an approach to using a recursive common table expression (RCTE) in SQL to count the total number of direct and indirect employees for each organization unit in a hierarchy. It first discusses representing the organization hierarchy as an adjacency list and transitively closing it to explicitly include indirect relationships. It then shows the SQL code that defines a RCTE to close the relationships and select the results with the employee counts for each unit.

Uploaded by

robert.mala
Copyright
© Public Domain
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
351 views4 pages

Db2 LUW Recursive SQL - Part 1

This document summarizes an approach to using a recursive common table expression (RCTE) in SQL to count the total number of direct and indirect employees for each organization unit in a hierarchy. It first discusses representing the organization hierarchy as an adjacency list and transitively closing it to explicitly include indirect relationships. It then shows the SQL code that defines a RCTE to close the relationships and select the results with the employee counts for each unit.

Uploaded by

robert.mala
Copyright
© Public Domain
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

technical support | article

Recursive SQL—Part One


By Rob Mala

INTRODUCTION
AAA (1) AAA (9)
| |
I was recently asked if I could write an SQL +----------+----------+ +----------+----------+
statement that would count the number of | | | | | |
BBB (2) FFF (1) GGG (0) BBB (6) FFF (1) GGG (1)
direct and indirect employees for each unit
| | | |
within an organization hierarchy. “Sure,” I +-----+-----+ | +-----+-----+ |
said, “give me 10 minutes and I’ll have some- | | | | | |
CCC (1) DDD (2) HHH (1) CCC (1) DDD (3) HHH (1)
thing ready for you.” After several false starts
| | | |
I managed to write the SQL statement, though | | | |
it took a little longer than 10 minutes. | | | |
EEE (1) III (0) EEE (1) III (0)
I had read about Recursive Common Table
Expressions (RCTE) and I knew that this was Figure 1: The number of employees that work directly Figure 2: The number of employees that work, directly
just the thing needed to tackle this sort of prob- for each organization unit or indirectly, for each organization unit
lem. Though, not having ever developed an
SQL statement that used a RCTE before, it AAA
took me a while to understand the syntax and |
+-------+-------+-------+---+---+-------+-------+-------+
find the right approach to solving this problem. | | | | | | | |
BBB CCC DDD EEE FFF GGG HHH III
PROBLEM | |
+-------+-------+ +---+---+
| | | | |
The problem was to report the total number CCC DDD EEE HHH III
of employees that work, directly and indirectly, | |
| |
for each unit within an organization hierarchy. | |
Figure 1 shows an organization hierarchy and EEE III
the number of employees that work directly for Figure 3: Transitively closed organization hierarchy
each organization unit (the number in brackets).
Figure 2 shows the same organization hier-
-- table definition -- table definition
archy, this time with the number of employees create table org_rel create table emp
that work, directly and indirectly, for each unit. ( fr_unit char(3) not null unique ( emp_id smallint not null unique
, to_unit char(3) ); , unit char(3) );
This is a strict hierarchy in that each unit can
have many children and at most one parent. -- table data -- table data
Organization unit ‘DDD’ has a total of three insert into org_rel insert into emp
values ( 'AAA', null ) values ( 1, 'AAA' )
employees that work directly and indirectly to it.
, ( 'BBB', 'AAA' ) , ( 2, 'BBB' )
Two employees work directly and one employee , ( 'CCC', 'BBB' ) , ( 3, 'BBB' )
works indirectly to unit ‘EEE’. Organization , ( 'DDD', 'BBB' ) , ( 4, 'CCC' )
, ( 'EEE', 'DDD' ) , ( 5, 'DDD' )
unit ‘BBB’ has a total of six employees working
, ( 'FFF', 'AAA' ) , ( 6, 'DDD' )
directly and indirectly to it. Two employees , ( 'GGG', 'AAA' ) , ( 7, 'EEE' )
work directly and four employees work indi- , ( 'HHH', 'GGG' ) , ( 8, 'FFF' )
, ( 'III', 'HHH' ); , ( 9, 'HHH' );
rectly to units ‘CCC’, ‘DDD’ and ‘EEE’.
Figure 4: The org_rel table definition and data Figure 5: The emp table definition and data
APPROACH

Before delving into the SQL, a discussion


on the approach used to solve the problem is

©2006 Technical Enterprises, Inc. Reproduction of this document without permission is prohibited. Technical Support | March 2006
required. In order to be able to easily aggre-
gate the number of direct and indirect employ- WITH name AS ( fullselect ) fullselect
ees, I needed to re-express the organization ,

hierarchy so that all the indirect relationships ( column-name )


between the organization units were explicitly Figure 6: Common Table Expression syntax
detailed. This “transitively closes” the organi-
zation hierarchy.
If you are very familiar with SQL you may WITH name ( column-name ) AS (initial-subquery UNION ALL recursive-subquery )
have heard of an optimization technique called Figure 7: Recursive Common Table Expression syntax
“predicate transitive closure.” This technique
allows the SQL optimizer to include addi- with tran_close_rel (fr_unit, to_unit, iteration) as
tional predicates to an SQL statement. If col- (
--* Initial subquery
umn a equals column b, and column b equals select fr_unit
, to_unit
column c, then the SQL optimizer can include , 0
a predicate where column a equals column c. from org_rel
union all
Similarly, if the organization unit ‘AAA’ has a --* Recursive subquery
relationship to ‘BBB’, and ‘BBB’ has a rela- select a.fr_unit
, b.to_unit
tionship to ‘CCC’, then ‘AAA’ having a rela- , case when a.iteration + 1 > 10 then
tionship to ‘CCC’ can be explicitly stated. raise_error('70001','Oops! tran_close_rel iteration error')
else
Figure 3 shows the organization hierarchy a.iteration + 1
after it has been transitively closed. The end
from tran_close_rel a
underlined organization units indicate the , org_rel b
where a.to_unit = b.fr_unit
relationships added to the transitively close )
the hierarchy. select row_number() over() row_num
, fr_unit
, coalesce(to_unit,'?') to_unit
ADJACENCY LIST HIERARCHY , iteration
from tran_close_rel

Typically, storing a hierarchy in relational Figure 8: SQL statement used to transitively close the org_rel adjacency list hierarchy
table is done as an “adjacency list.” This is
where the parent identifier is held with the tran_close_rel org_rel
+---------+---------+ +---------+---------+
Initial subquery | fr_unit | to_unit | | fr_unit | to_unit |
child identifier in the same row. An example (Iteration 0) +---------+---------+ +---------+---------+
| AAA | ? | | AAA | ? |
of an adjacency list is an employee table that | BBB | AAA | | BBB | AAA |
| FFF | AAA | | CCC | BBB |
includes the manager identifier the employee | GGG | AAA | | DDD | BBB |
| CCC | BBB | | EEE | DDD |
identifier reports to. | DDD | BBB | | FFF | AAA |
| EEE | DDD | | GGG | AAA |
The organization hierarchy is held as an | HHH | GGG | | HHH | GGG |
| III | HHH | | III | HHH |
adjacency list in the org_rel table. Figure 4 +---------+---------+ +---------+---------+
shows the table definition for the org_rel table tran_close_rel org_rel
+---------+---------+ +---------+---------+
and insert statement to populate the table. The Recursive subquery | fr_unit | to_unit | | fr_unit | to_unit |
(Iteration 1) +---------+---------+ +---------+---------+
fr_unit column is the unique child identifier | BBB | ? | | AAA | ? |
| FFF | ? | | BBB | AAA |
and the to_unit column is the parent identifier. | GGG | ? | | CCC | BBB |
| CCC | AAA | | DDD | BBB |
The employees and the organization units | DDD | AAA | | EEE | DDD |
| EEE | BBB | | FFF | AAA |
they work for are held in the emp table. Figure | HHH | AAA | | GGG | AAA |
| III | GGG | | HHH | GGG |
5 shows the table definition for the emp table +---------+---------+ | III | HHH |
+---------+---------+
and the insert statement to populate the table. tran_close_rel org_rel
The emp_id column is the unique employee +---------+---------+
Recursive subquery | fr_unit | to_unit |
+---------+---------+
| fr_unit | to_unit |
identifier and the unit column is the organiza- (Iteration 2) +---------+---------+
| CCC | ? |
+---------+---------+
| AAA | ? |
tion unit the employee works in. |
|
DDD
EEE
|
|
?
AAA
|
|
|
|
BBB
CCC
|
|
AAA
BBB
|
|
| HHH | ? | | DDD | BBB |
| III | AAA | | EEE | DDD |
+---------+---------+ | FFF | AAA |
COMMON TABLE EXPRESSIONS | GGG | AAA |
| HHH | GGG |
| III | HHH |
+---------+---------+
An understanding of Common Table
tran_close_rel
Expressions (CTEs) is needed at this point in +---------+---------+
Recursive subquery | fr_unit | to_unit |
order to understand the SQL that will be pre- (Iteration 3) +---------+---------+
| EEE | ? |
sented later. This is because Recursive CTEs | III | ? |
+---------+---------+
are a special type of CTE.
A CTE is a named result table that exists
during the duration of the SQL statement.
This is similar to a Nested Table Expression, Figure 9: Results for each subquery of the Recursive Common Table Expression

Technical Support | March 2006 ©2006 Technical Enterprises, Inc. Reproduction of this document without permission is prohibited.
+---------+---------+---------+-----------+ with tran_close_rel (fr_unit, to_unit, iteration) as
| row_num | fr_unit | to_unit | iteration | (
+---------+---------+---------+-----------+ --* Initial subquery
| 1 | AAA | ? | 0 | select fr_unit
| 2 | BBB | AAA | 0 | , to_unit
| 3 | CCC | BBB | 0 | , 0
| 4 | DDD | BBB | 0 | from org_rel
| 5 | EEE | DDD | 0 | union all
| 6 | FFF | AAA | 0 | --* Recursive subquery
| 7 | GGG | AAA | 0 | select a.fr_unit
| 8 | HHH | GGG | 0 | , b.to_unit
| 9 | III | HHH | 0 | , case when a.iteration + 1 > 10 then
| 10 | BBB | ? | 1 | raise_error('70001','Oops! tran_close_rel iteration error')
| 11 | CCC | AAA | 1 | else
| 12 | DDD | AAA | 1 | a.iteration + 1
| 13 | EEE | BBB | 1 | end
| 14 | FFF | ? | 1 | from tran_close_rel a
| 15 | GGG | ? | 1 | , org_rel b
| 16 | HHH | AAA | 1 | where a.to_unit = b.fr_unit
| 17 | III | GGG | 1 | )
| 18 | CCC | ? | 2 | select t.unit
| 19 | DDD | ? | 2 | , sum(t.occurs) count
| 20 | EEE | AAA | 2 | from (
| 21 | HHH | ? | 2 | select coalesce(a.to_unit, a.fr_unit) unit
| 22 | III | AAA | 2 | , case when b.unit is not null then 1 else 0 end occurs
| 23 | EEE | ? | 3 | from tran_close_rel a
| 24 | III | ? | 3 | left outer join emp b
+---------+---------+---------+-----------+ on b.unit = a.fr_unit
) t
Figure 10: Result of the SQL statement group by t.unit
order by t.unit

except that a CTE can be referenced any- Figure 11: Completed SQL statement
where within the SQL statement. Another
way to think of a CTE is as a temporary view, An RCTE differs from a CTE in that a col- +---------+-------+
allowing access to a result set multiple times umn name list is mandatory, the fullselect is | fr_unit | count |
+---------+-------+
in the one SQL statement. Indeed, a CTE can comprised of the UNION ALL set operator,
| AAA | 9 |
replace the need for creating a view for some and the recursive subquery references the CTE | BBB | 6 |
queries. Though, unlike a view, a CTE can in which it is defined. | CCC | 1 |
| DDD | 3 |
reference a host variable within its definition. An RCTE works as follows. The initial
| EEE | 1 |
(This feature of CTEs will be demonstrated in subquery is evaluated once. The recursive | FFF | 1 |
part 2 of this article.) subquery is then evaluated for each row | GGG | 1 |
| HHH | 1 |
Figure 6 shows the syntax for defining returned by the initial subquery. The recur-
| III | 0 |
CTEs. A CTE is defined using the WITH sive subquery is then re-evaluated for each +---------+-------+
clause. A name is required to be given to a row returned by the previous evaluation of
Figure 12: Result of the completed SQL statement
CTE and an optional column name list can the recursive subquery. The RCTE com-
be provided. This provides a name for the pletes when an empty result set is returned
result set of the full-select defined which can for all the rows of the previous recursive more easily answered. To transitively close the
then be referenced elsewhere in the SQL subquery. The combined result sets of all org_rel table adjacency list hierarchy using an
statement. More than one CTE can be queries can then be accessed by the remain- RCTE required careful consideration for the
defined, which may reference previously der of the query. A more detailed explana- join predicate and select columns of the recur-
defined CTEs or itself, within the same tion on how RCTEs work is provided in the sive subquery. Figure 8 shows the RCTE to
WITH clause. next section. transitively close the organization hierarchy.
CTEs are useful in situations where the There are several restrictions on the The initial subquery is evaluated first,
result set for a given query needs to be recursive subquery, such as column func- retrieving all rows from the org_rel table.
accessed multiple times within the same SQL tions, group-by-clauses, or having-clauses The result set for this query is added to the
statement or for combing multiple SQL Data are not allowed. Please refer to the SQL tran_close_rel. The recursive subquery is
Change statements. Reference Manual for a complete discussion then evaluated, referencing the org_rel table
on the on restrictions. and the newly added rows to tran_close_rel.
RECURSIVE COMMON TABLE The result set of this query is added to
EXPRESSION USING AN RCTE TO TRANSITIVE tran_close_rel. The recursive subquery is
CLOSE AN ADJACENCY LIST then re-evaluated while there are newly
An CTE that references itself is a Recursive HIERARCHY added rows to be processed from
CTE (RCTE). Figure 7 shows the syntax for tran_close_rel. It is important to notice here
RCTEs. When the organization hierarchy is transi- that the recursive subquery only “sees” the
tively closed, it allows certain questions to be rows added to the tran_close_rel by the pre-

©2006 Technical Enterprises, Inc. Reproduction of this document without permission is prohibited. Technical Support | March 2006
vious iteration of the recursive subquery. straight forward to develop. Consideration of
When no more new rows are available to be the approach used to solve the problem and
processed the RCTE is complete. close attention to the recursive join predicate
As with any iterative process, it is possible and returned columns of the recursive subquery
to enter into an infinite loop using an RCTE. is needed. It is very easy to code a Recursive
This can be either due to incorrectly coding Common Table Expressions that enters into an
the RCTE or cyclical data within the table. It infinite loop. It is up to you to add code to guard
is always good practice to include code to ter- against run-away SQL statements.
minate “run-away” recursive statements.
The RCTE in Figure 8 uses a counter to ter-
minate the statement, via the raise_error func- NaSPA member Rob Mala has worked as an Application
tion, if more than 10 iterations of the recursive Development and Technical Specialist, and has been a DB2 DBA
subquery are evaluated. While the value 10 is for z/OS and LUW since 1995, working for various companies in
arbitrary and has been hard coded into the Australia, UK and Europe. Robert can be contacted on
statement, it could be a host variable. [email protected].
The solid arrows show the first matching row
for the recursive subquery predicate
tran_close_rel.to_unit = org_rel.fr_unit.
The broken arrows show the recursive subquery
returned values, tran_close_rel.fr_unit and
org_rel.to_unit. These values are used by the
next iteration of the recursive subquery predicate
tran_close_rel.to_unit = org_rel.fr_unit.
The result sets of the initial subquery and
recursive subqueries are combined to form a
final result set for the CTE (Figure 10).
Now that the organization hierarchy is tran-
sitively closed, all that is needed is to left outer
join to the emp table and sum the result by
organization unit (Figure 11).
A left outer join is required as all organiza-
tion units are required even if they have no
employees assigned. Coalescing the to_unit
and fr_unit removes null values from the
result set. The CASE statement sets the
occurs to 1 if the employee organization unit
is not null otherwise it is set to 0. The unit is
then used to sum the occurrences. Figure 12
shows the result of running the completed
SQL statement.

CONCLUSION

This concludes part 1 of this article. As you


can see, Common Table Expressions (CTEs)
are very powerful, enabling new classes of
problems to be solved using SQL without
additional application logic.
Recursive Common Table Expressions
(RCTEs) are a special type of CTE. CTEs,
when combined with other SQL language ele-
ments such as views, insert with sub-select,
user defined functions, stored procedures or
combining multiple SQL Data Change state-
ments enable the SQL programmer or DBA
considerable expressive power.
With this power comes syntax complexity
and scope for getting into trouble. RCTEs aren’t

Technical Support | March 2006 ©2006 Technical Enterprises, Inc. Reproduction of this document without permission is prohibited.

You might also like