Algorithm of Data Integration
Example of Data Integration
Given Relations: A(a, b, u), B(a, b, c, v), C(a, c, d, w), D(c, d, e, x), E(c, d, y), F(c, e, z), G(a, b, c, d,
e, m), H(x, y, z);
Target attribute list {a, b, c, d, e}.
Notations:
Horizontal Integration on a set of relations HI {. , . , .}
Vertical Integration on set of relations and attributes VI {. , . , . }attributes
o Example HI { { A } , VI { B , C }( a ,b ) } represents horizontal integration (set union) of relation A with the
relation formed by vertical integration (join) of relation B with C on their common attributes set (a, b)
Relation Attribute Matrix Order 1 ( RAM 1 ) is a matrix of relations vis-à-vis target attributes where each row
represents one relation and each column represents one attribute from the set of target attributes. Each cell of the
matrix is filled with a 1 if the attribute is present in the relation, 0 if absent.
Relation Attribute Matrix Order k ( RAM k ), k ≥2, is similar to RAM 1 except that here each relation consists of k
R1 of order (k-1) with another relation R2 of order 1. Each cell is
individual relations formed by joining a relation
formed by ORing the corresponding cell values in RAM k −1 and RAM 1 . This indicates whether an attribute is
present in the joined relation. In addition, it has a column which shows the set of common attributes (CA), if exists, for
joining R1 with R2. CA is counted if the corresponding cell value is 1 in both RAM k −1 and RAM 1. It has another
column which indicates whether the attribute set in the joined relation of R1 with R2 has at least one more attribute
than both R1 and R2 or not (Super Set –Yes or No) and finally a Status column which indicates whether the joined
relation of R1 R 2 is Full (Super Set = Yes, has all the attributes of the target list with at least one common attribute),
Part (Super Set = Yes, has some but not all the attributes of the target list with at least one common attribute), None
(neither Part nor Full)
Candidate Set CS k is the set of possible candidate relations of order k formed by joining a relation R1 of order (k-1)
with another relation R2 of order 1.
1. Create C S1 = { { A } , { B } , { C } , { D } , { E } , { F } , {G } , {H }}. Create Relation Attribute Matrix Order 1 ( RAM 1).
Attributes
Relation
A b c d e
A 1 1 0 0 0
B 1 1 1 0 0
C 1 0 1 1 0
D 0 0 1 1 1
E 0 0 1 1 0
F 0 0 1 0 1
G 1 1 1 1 1
H 0 0 0 0 0
Set HI =∅ , VI =∅ , C S1 =∅ . Examine RAM1 row by row. If all the columns of a row are 1 then add the relation
to HI. If some but not all columns are 1 then add the relation to C S1 . Ignore the relation (row) if all columns contain
0. Thus, in this example, HI ={{G }} and C S1 ={ { A } , { B } , { C } , { D } , { E } , { F } }
2. Create CS 2 by joining each element of CS 1 with a different element of CS 1 so that there is no repetition. Thus, in this
example,
CS2= { { A , B } , { A ,C } , { A , D } , { A , E } , { A , F } , { B , C } , { B , D } , { B , E } , { B , F } , {C , D } , { C , E } , { C , F } , { D , E } , { D , F
. Create RAM2 using CS 2 and RAM 1
Relation a B C d e CA SS Status
{A, B} 1 1 1 0 0 a, b No None
{A, C} 1 1 1 1 0 a Yes Part
{A, D} 1 1 1 1 1 - Yes None
{A, E} 1 1 1 1 0 - Yes None
{A, F} 1 1 1 0 1 - Yes None
{B, C} 1 1 1 1 0 a, c Yes Part
{B, D} 1 1 1 1 1 c Yes Full
{B, E} 1 1 1 1 0 c Yes None
{B, F} 1 1 1 0 1 c Yes None
{C, D} 1 0 1 1 1 c, d Yes Part
{C, E} 1 0 1 1 0 c, d No None
{C, F} 1 0 1 1 1 c Yes Part
{D, E} 0 0 1 1 1 c, d No None
{D, F} 0 0 1 1 1 c, e No None
{E, F} 0 0 1 1 1 c Yes Part
Examine RAM2 row by row. If any row has a status “Full” then add it to HI and remove from CS 2 . If any row has a
“Part” status, add it to VI with the attribute(s) for joining. If any row has a status “None” remove it from CS 2 .
Thus, in this example, HI ={{ G } , VI { B , D }c }. VI ={ { A , C }a , { B , C }(a ,c) , { C , D }(c ,d ) , { C , F }c , { E , F }c }
.
CS2= { { A ,C } , { B , C } , { C , D } , { C , F } , { E , F } }
3. Create CS3 by joining each element of CS2 with an element of CS1.
{
{ { A , C } , B } , { { A ,C } , D } , { { A , C } , E } , { { A , C } , F } , { { B , C } , A } , { { B ,C } , D } , { { B , C } , E } , { { B ,C
C S3 = { { C , D } , A } , { {C , D } , B } , { { C , D } , E } , { {C , D } , F } , { {C , F } , A } , { { C , F } , B } , { {C , F } , D } , { {C , F } , E } , { { E , F
{ { E , F } , C } , {{ E , F } , D }
Create RAM3 using CS3 and RAM 1 , RAM 2. While creating RAM3 we allow repetition if the earlier combination results in
None status. Otherwise we do not try any repetition. For example, since { { A , C } , B } resulted in None, we tried
{ { B ,C } , A }. But as { { A , C } , D } resulted in Full status, there is no reason to try { {C , D } , A }
Relation a b c d e CA SS Status
{ { A , C }, B} 1 1 1 1 0 a, b, c No None
{ { A , C } , D }1 1 1 1 1 c, d Yes Full
{ { A , C }, E } 1 1 1 1 0 c, d No None
{ { A , C }, F } 1 1 1 1 1 c Yes Full
{ { B ,C } , A } 1 1 1 1 0 a, b No None
{ { B ,C } , D } 1 1 1 1 1 c, d Yes Full
{ { B ,C } , E } 1 1 1 1 0 c, d No None
{ { B ,C } , F } 1 1 1 1 1 c Yes Full
{ {C , D } , A } 1 1 1 1 1 a Yes Full (Same as ACD)
{ {C , D } , B } 1 1 1 1 1 a, c Yes Full (Same as BCD)
{ {C , D } , E } 1 0 1 1 1 c, d No None
{ {C , D } , F } 1 0 1 1 1
c, e
No None
{ {C , F } , A } 1 1 1 1 1 a Yes Full (Same as ACF)
{ {C , F } , B } 1 1 1 1 1 a, c Yes Full (Same as BCF)
{ {C , F } , D } 1 0 1 1 1
c, d, e
No None
{ {C , F } , E } 1 0 1 1 1 c, d No None
{ {E , F }, A }1 1 1 1 1 - Yes None
{ {E , F }, B} 1 1 1 1 1 c Yes Full
{ {E , F }, C } 1 0 1 1 1 c, d Yes Part (CFE is None)
{ {E , F }, D }0 0 1 1 1 c, d, e No None
Examine RAM3 row by row. If any row has a status “Full” then add it to HI and remove from CS3. Initialize VI to phi. If any
row has a “Part” status, add it to VI after removing the corresponding relation from VI. If any row has a status “None”
remove it from CS 3 . Thus HI becomes,
HI { {G } , VI { B , D }c ,VI {VI { A , C }a , D } (c ,d ) ,VI { VI { A , C }a , F }c , VI {VI { B , C }(a ,c ) , D }(c ,d ) , VI {VI { B , C }(a , c ) , F }c ,VI { { E
VI ={ { { E , F }c ,C }( c, d ) }
CS3={{ E , F } , C }
C S 4= {{ { E , F } , C } , A } , { { { E , F } ,C } , B } , {{ { E , F } ,C } , D }, Create RAM4 from CS4, RAM3 and RAM1
Relation a b c d e CA SS Status
{ { { E , F } ,C } , A1} 1 1 1 1 a Yes Full
{ { { E , F } ,C } , B1} 1 1 1 1 a, c Yes Full
{ { { E , F } ,C } , D1} 0 1 1 1 c, d, e No None
Now first two rules are added to HI and CS4 becomes a null set. Hence the algorithm stops here.
{
Thus final rules of all possible integrations are
{G }
VI { B , D }c
VI {VI { A ,C }a , D }(c , d)
VI { VI { A ,C }a , F }c
HI VI {VI { B , C }( a ,c ) , D }( c, d )
VI {VI { B , C }( a ,c ) , F }c
VI { { E , F }c , B }c }
{
VI {VI { VI { E , F }c , C }(c , d )} , A }
a
{
VI {VI { VI { E , F }c , C }(c ,d ) } , B (a , c) }
Algorithm to find Integration Rules for Query Processing
Integration_Rule
Input: 1) R: Set of relations with their attributes list, 2) Target_List: List of target attributes, 3) p:
Order of integration desired (optional)
Output: Integration Rules consisting of Horizontal and Vertical Integration
HI =∅ . Create candidate set of order 1 C S1 =∅ .
Step 1: Initialize the Horizontal Integration set
Create Relation Attribute Matrix of Order 1 (RAM 1 ) of size m x n where m is the number of relations
and n is the number of attributes in the target list. Fill a cell (i, j) with 1 if i th relation contains
attribute j, 0 otherwise.
for each row in RAM1
if all the columns are 1 (the relation has all the attributes of the target list) then add the
relation to HI
else
if some columns are 1 (the relation has some but not all the attributes of the target list) then
add the relation to C S1
next
Step2: Iterate to find Horizontal and Vertical Integration
Set k = 2;
Loop
Create C S k
For k = 2 to n – 1
Create C S k −List by taking cross product of each element of x ∈C S k−1−List with y ∈C S1−List where y
does not belong to x. If the xy so formed already belongs to C S k−1 −List then ignore it. If not add xy to
C S k−1 −List .
Create all possible pairs of relations from Relation Matrix by ORing the corresponding numbers. If both numbers are 1 then
count it is as a common attribute (CA). The attribute list so created for a pair is now checked whether it adds any extra
attribute by integrating the relations. For example in the pair table AC extra attributes compared to both A and C, thus it is
said to cover both relations. If newly formed relation has at least one CA and covers both the relations it is accepted for the
next step, if not it is no more considered.
Create
In the example C S2 −List={{ AB},{ AC ,{ AD
We create the following vertical integration list VI-List = {{A,C}, {B, C}, {C, D}, {C, F}, {E, F}}
EFC
A 1 1 1 1 1 1 Both Yes
EFC
B 1 1 1 1 1 1 Both Yes
HI ¿
VI (VI (B , C), D),VI ( B , C) , F ¿ ,
VI (VI (VI (E , F),C ), A),VI (VI (VI ( E , F ), C), B)¿