0% found this document useful (0 votes)
9 views5 pages

Algorithm of Data Integration

The document outlines an algorithm for data integration using horizontal and vertical integration techniques across multiple relations. It describes the creation of Relation Attribute Matrices (RAM) of different orders to evaluate the presence of target attributes in various combinations of relations, ultimately identifying candidate sets for integration. The algorithm iteratively refines these sets to derive integration rules for query processing, culminating in a final set of horizontal and vertical integrations.

Uploaded by

Subrata Bose
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views5 pages

Algorithm of Data Integration

The document outlines an algorithm for data integration using horizontal and vertical integration techniques across multiple relations. It describes the creation of Relation Attribute Matrices (RAM) of different orders to evaluate the presence of target attributes in various combinations of relations, ultimately identifying candidate sets for integration. The algorithm iteratively refines these sets to derive integration rules for query processing, culminating in a final set of horizontal and vertical integrations.

Uploaded by

Subrata Bose
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

Algorithm of Data Integration

Example of Data Integration

Given Relations: A(a, b, u), B(a, b, c, v), C(a, c, d, w), D(c, d, e, x), E(c, d, y), F(c, e, z), G(a, b, c, d,
e, m), H(x, y, z);
Target attribute list {a, b, c, d, e}.

Notations:
 Horizontal Integration on a set of relations HI {. , . , .}

 Vertical Integration on set of relations and attributes VI {. , . , . }attributes


o Example HI { { A } , VI { B , C }( a ,b ) } represents horizontal integration (set union) of relation A with the
relation formed by vertical integration (join) of relation B with C on their common attributes set (a, b)

 Relation Attribute Matrix Order 1 ( RAM 1 ) is a matrix of relations vis-à-vis target attributes where each row
represents one relation and each column represents one attribute from the set of target attributes. Each cell of the
matrix is filled with a 1 if the attribute is present in the relation, 0 if absent.

 Relation Attribute Matrix Order k ( RAM k ), k ≥2, is similar to RAM 1 except that here each relation consists of k
individual relations formed by joining a relation R1 of order (k-1) with another relation R2 of order 1. Each cell is
formed by ORing the corresponding cell values in RAM k −1 and RAM 1 . This indicates whether an attribute is
present in the joined relation. In addition, it has a column which shows the set of common attributes (CA), if exists, for
joining R1 with R2. CA is counted if the corresponding cell value is 1 in both RAM k −1 and RAM 1. It has another
column which indicates whether the attribute set in the joined relation of R1 with R2 has at least one more attribute
than both R1 and R2 or not (Super Set –Yes or No) and finally a Status column which indicates whether the joined
relation of R1 R 2 is Full (Super Set = Yes, has all the attributes of the target list with at least one common attribute),
Part (Super Set = Yes, has some but not all the attributes of the target list with at least one common attribute), None
(neither Part nor Full)

 Candidate Set CS k is the set of possible candidate relations of order k formed by joining a relation R1 of order (k-1)
with another relation R2 of order 1.

1. Create C S1 = { { A } , { B } , { C } , { D } , { E } , { F } , {G } , {H }}. Create Relation Attribute Matrix Order 1 ( RAM 1).


Attributes
Relation
A b c d e
A 1 1 0 0 0
B 1 1 1 0 0
C 1 0 1 1 0
D 0 0 1 1 1
E 0 0 1 1 0
F 0 0 1 0 1
G 1 1 1 1 1
H 0 0 0 0 0

Set HI =∅ , VI =∅ , C S1 =∅ . Examine RAM1 row by row. If all the columns of a row are 1 then add the relation
to HI. If some but not all columns are 1 then add the relation to C S1 . Ignore the relation (row) if all columns contain
0. Thus, in this example, HI ={{G }} and C S1 ={ { A } , { B } , { C } , { D } , { E } , { F } }
2. Create CS 2 by joining each element of CS 1 with a different element of CS 1 so that there is no repetition. Thus, in this
example,
CS2= { { A , B } , { A ,C } , { A , D } , { A , E } , { A , F } , { B , C } , { B , D } , { B , E } , { B , F } , {C , D } , { C , E } , { C , F } , { D , E } , { D , F
. Create RAM2 using CS 2 and RAM 1

Relation a B C d e CA SS Status
{A, B} 1 1 1 0 0 a, b No None
{A, C} 1 1 1 1 0 a Yes Part
{A, D} 1 1 1 1 1 - Yes None
{A, E} 1 1 1 1 0 - Yes None
{A, F} 1 1 1 0 1 - Yes None
{B, C} 1 1 1 1 0 a, c Yes Part
{B, D} 1 1 1 1 1 c Yes Full
{B, E} 1 1 1 1 0 c Yes None
{B, F} 1 1 1 0 1 c Yes None
{C, D} 1 0 1 1 1 c, d Yes Part
{C, E} 1 0 1 1 0 c, d No None
{C, F} 1 0 1 1 1 c Yes Part
{D, E} 0 0 1 1 1 c, d No None
{D, F} 0 0 1 1 1 c, e No None
{E, F} 0 0 1 1 1 c Yes Part

Examine RAM2 row by row. If any row has a status “Full” then add it to HI and remove from CS 2 . If any row has a
“Part” status, add it to VI with the attribute(s) for joining. If any row has a status “None” remove it from CS 2 .

Thus, in this example, HI ={{ G } , VI { B , D }c }. VI ={ { A , C }a , { B , C }(a ,c) , { C , D }(c ,d ) , { C , F }c , { E , F }c }


.
CS2= { { A ,C } , { B , C } , { C , D } , { C , F } , { E , F } }
3. Create CS3 by joining each element of CS2 with an element of CS1.

{
{ { A , C } , B } , { { A ,C } , D } , { { A , C } , E } , { { A , C } , F } , { { B , C } , A } , { { B ,C } , D } , { { B , C } , E } , { { B ,C
C S3 = { { C , D } , A } , { {C , D } , B } , { { C , D } , E } , { {C , D } , F } , { {C , F } , A } , { { C , F } , B } , { {C , F } , D } , { {C , F } , E } , { { E , F
{ { E , F } , C } , {{ E , F } , D }
Create RAM3 using CS3 and RAM 1 , RAM 2. While creating RAM3 we allow repetition if the earlier combination results in
None status. Otherwise we do not try any repetition. For example, since { { A , C } , B } resulted in None, we tried
{ { B ,C } , A }. But as { { A , C } , D } resulted in Full status, there is no reason to try { {C , D } , A }
Relation a b c d e CA SS Status
{ { A , C }, B} 1 1 1 1 0 a, b, c No None
{ { A , C } , D }1 1 1 1 1 c, d Yes Full
{ { A , C }, E } 1 1 1 1 0 c, d No None
{ { A , C }, F } 1 1 1 1 1 c Yes Full
{ { B ,C } , A } 1 1 1 1 0 a, b No None
{ { B ,C } , D } 1 1 1 1 1 c, d Yes Full
{ { B ,C } , E } 1 1 1 1 0 c, d No None
{ { B ,C } , F } 1 1 1 1 1 c Yes Full
{ {C , D } , A } 1 1 1 1 1 a Yes Full (Same as ACD)
{ {C , D } , B } 1 1 1 1 1 a, c Yes Full (Same as BCD)
{ {C , D } , E } 1 0 1 1 1 c, d No None
{ {C , D } , F } 1 0 1 1 1
c, e
No None
{ {C , F } , A } 1 1 1 1 1 a Yes Full (Same as ACF)
{ {C , F } , B } 1 1 1 1 1 a, c Yes Full (Same as BCF)
{ {C , F } , D } 1 0 1 1 1
c, d, e
No None
{ {C , F } , E } 1 0 1 1 1 c, d No None
{ {E , F }, A }1 1 1 1 1 - Yes None
{ {E , F }, B} 1 1 1 1 1 c Yes Full
{ {E , F }, C } 1 0 1 1 1 c, d Yes Part (CFE is None)
{ {E , F }, D }0 0 1 1 1 c, d, e No None

Examine RAM3 row by row. If any row has a status “Full” then add it to HI and remove from CS3. Initialize VI to phi. If any
row has a “Part” status, add it to VI after removing the corresponding relation from VI. If any row has a status “None”
remove it from CS 3 . Thus HI becomes,
HI { {G } , VI { B , D }c ,VI {VI { A , C }a , D } (c ,d ) ,VI { VI { A , C }a , F }c , VI {VI { B , C }(a ,c ) , D }(c ,d ) , VI {VI { B , C }(a , c ) , F }c ,VI { { E
VI ={ { { E , F }c ,C }( c, d ) }
CS3={{ E , F } , C }

C S 4= {{ { E , F } , C } , A } , { { { E , F } ,C } , B } , {{ { E , F } ,C } , D }, Create RAM4 from CS4, RAM3 and RAM1

Relation a b c d e CA SS Status
{ { { E , F } ,C } , A1} 1 1 1 1 a Yes Full
{ { { E , F } ,C } , B1} 1 1 1 1 a, c Yes Full
{ { { E , F } ,C } , D1} 0 1 1 1 c, d, e No None

Now first two rules are added to HI and CS4 becomes a null set. Hence the algorithm stops here.

Thus final rules of all possible integrations are


{
{G }
VI { B , D }c
VI {VI { A ,C }a , D }(c , d)
VI { VI { A ,C }a , F }c
HI VI {VI { B , C }( a ,c ) , D }( c, d )
VI {VI { B , C }( a ,c ) , F }c
VI { { E , F }c , B }c }
VI { {VI { VI { E , F }c , C }(c , d )} , A }a

{ }
VI {VI { VI { E , F }c , C }(c ,d ) } , B (a , c)

Algorithm to find Integration Rules for Query Processing

Input: 1) Number of relations (m); Relations Names [rel_name(1), rel_name(2), …, rel_name(m)]


2) Number of attributes in the target list (n); Attribute Names [attr(1), attr(2), …, attr(n)]
3) Maximum Order of integration desired (p) // optional
4) Relation Attribute Matrix of Order 1 (RAM1) of size m xn

Output: Integration Rules consisting of possible Horizontal and Vertical Integrations

Step 1: Initialization
1. Initialize the set of Horizontal Integration HI =∅ .
2. Create candidate set of order 1 C S1 =∅ . // Set of sets of size 1 i.e. singleton element
3. For each row r in RAM1
c1:= count of number of columns having 1
If c1 = n then
add the relation {rel_name(r)} to HI // the relation has all the attributes of the
target list – case of HI
ElseIf (c1 > 0) then
add the relation {rel_name(r)} to C S1 // the relation has some but not all the
attributes of the target list
EndIf
Next
4. If CS1 is empty or has a single relation then output HI and exit // no new relation to add

Step2: Iterate to find Horizontal and Vertical Integrations


1. Set k = 2;
2. Loop
a. Initialize the set of Vertical Integration
V I =∅ .
b. Create C S k by taking cross product of C S k−1 with C S1 such that no individual relation is
repeated in C S k.
// C S k is the Candidate set of order k means each element set in CSk has k number of
relations
c. Create RAMk of size ¿ C S k ∨¿ x (n+1)
d. For each row r in RAMk
If the relations in C S k (r ) has already appeared in any of the relations from
C S k (1)… C Sk (r−1) then
// Check the status field of that earlier computation, if it is Full or Part ignore this row, if
None then proceed
If RAMk (r, n+1) <> None then Exit For
attribute_list:=””; c1=0, c2 = 0, c3 =0
For each column c in (1 … n)
RAMk (r, c) := RAMk-1 (r, c) OR RAM1 (r, c) // Logical OR
If RAMk-1 (r, c) = 1 and RAM1 (r, c) = 1 then
attribute_list:= attribute_list + attr(i) // common attribute
If RAM1 (r, c) = 1 then
c1:= c1 + 1 // c1:= count of number of columns having 1 in RAM1
If RAMk-1 (r, c) = 1 then
c2:= c2 + 1 // c2:= count of number of columns having 1 in RAMk-1
If RAMk (r, c) = 1 then
c3:= c3 + 1 // c3:= count of number of columns having 1 in RAMk
Next
If attribute_list ≠ “” and (c3 > c1 and c3 > c2) then
// New relation has common attribute(s) i.e. can be joined and it generates a
super set
If (c3 = n) then
RAMk (r, n+1):= Full //Update status of the relation as Full
add the relation C S k (r ) to HI in the form VI{CSk-1(r), CS1(r)}attribute_list
// Case of Full set of attributes obtained by vertical integration of CS k-1(r) and CS1(r) on the
common attributes
Else
RAMk (r, n+1):= Part //Update status of the relation as Part
add the relation C S k (r ) to V I in the form VI{CSk-1(r), CS1(r)}attribute_list
Else
RAMk (r, n+1):= None //Update status of the relation as None

Next
e. Delete all the rows from CSk whose corresponding entry in RAMk is either Full or None
f. If k = p then report HI and exit else continue
End Loop

You might also like