Data Localization:
Localization means assigning each part of a decomposed query to the node where the
relevant data already exists, so:
Local operations are preferred.
Data transfer is minimized.
Query execution is faster and more efficient.
Parallel processing across nodes becomes possible.
Example:
Student Table
StudentID Name CourseID
1 Ali 101
2 Sara 102
3 Ahmed 103
Course Table
CourseID CourseName TeacherID
101 Math 201
102 Science 202
103 History 203
Teacher Table
TeacherID TeacherName
201 Mr. A
202 Ms. B
203 Mr. C
SQL Query
SELECT S.Name, C.CourseName, T.TeacherName
FROM Student S
JOIN Course C ON S.CourseID = C.CourseID
JOIN Teacher T ON C.TeacherID = T.TeacherID
WHERE C.CourseName = 'Math';
(RA)Translated Query
Student ⨝ (Student.CourseID = Course.CourseID) Course
⨝ (Course.TeacherID = Teacher.TeacherID) Teacher
σ (CourseName = 'Math')
π (Name, CourseName, TeacherName)
Decomposed Query
1. σ (CourseName = 'Math') on Course
2. Student ⨝ Student.CourseID = Course.CourseID (Filtered Course)
3. Result ⨝ Course.TeacherID = Teacher.TeacherID Teacher
4. π (Name, CourseName, TeacherName)
Assume that the data is distributed across three different nodes:
Node Tables Stored
Node A Student
Node B Course
Node C Teacher
Step 1:
σ (CourseName = 'Math') on Course
This selection operation involves only the Course table.
Since the Course table is stored on Node B,
This filter (selection) is applied locally on Node B.
This is an example of local processing (localization) — the operation is
performed on the node that stores the relevant data.
Step 2:
Student ⨝ Student.CourseID = Course.CourseID (Filtered Course)
Now we need to join the Student table with the filtered Course records.
Student data is stored on Node A, while the filtered Course data is on Node B.
There are two options:
Option 1: Send the filtered (smaller) Course data from Node B to Node A
and perform the join at Node A.
Option 2: Send Student data from Node A to Node B and perform the join at
Node B.
Optimal localization would be Option 1, because filtered Course data is likely
smaller, reducing data transfer.
Step 3:
Result ⨝ Course.TeacherID = Teacher.TeacherID Teacher
Now the result from Step 2 is on Node A.
Teacher data is stored on Node C.
Again there are two options:
Send result data to Node C and perform join there.
Or, send the (likely smaller) Teacher table from Node C to Node A.
To minimize transfer, we would send the Teacher table to Node A and perform
the join there.
This again follows the localization principle — perform operations where data is
already present or bring only the minimal required data.
Step 4:
π (Name, CourseName, TeacherName)
Now the final joined data is at Node A,
So we perform the projection (selecting required columns) locally at Node A.