Unit 2: Distributed Database Query Processing
Lecture Hours: 15
---
### **Database Integration (Bottom-up Design Approach)**
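- Combines multiple existing databases into one logical database described by a global schema.
- Bottom-up: the local databases already exist, and the global schema is derived by integrating their schemas.
- Challenges: data heterogeneity, schema mismatches, naming conflicts.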
Schema Matching:
- Aligning attributes from different databases with the same meaning.
- Example: "Emp_ID" in one database corresponds to "Employee_No" in another.
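
A minimal sketch of name-based schema matching, assuming attribute names can be compared after expanding abbreviations. The synonym table, the 0.8 threshold, and the function names are illustrative assumptions, not part of any standard tool.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real matcher would use a curated thesaurus.
SYNONYMS = {"emp": "employee", "no": "id", "dept": "department"}

def normalize(name):
    """Lowercase, split on underscores, and expand known abbreviations."""
    return " ".join(SYNONYMS.get(tok, tok) for tok in name.lower().split("_"))

def match_schemas(left, right, threshold=0.8):
    """Pair each attribute in `left` with its most similar attribute in `right`."""
    matches = {}
    for a in left:
        best, best_score = None, 0.0
        for b in right:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best_score >= threshold:  # keep only confident matches
            matches[a] = best
    return matches

matches = match_schemas(["Emp_ID", "Dept_Name"],
                        ["Employee_No", "Department_Name", "Salary"])
print(matches)
```

Name similarity alone is weak evidence; practical matchers also compare data types and sample values.
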
Schema & Access Control Integration:
- Merges the local schemas and reconciles the access rules defined on each source.
- Ensures that security is enforced uniformly and consistently across the integrated schema.
Schema Mapping:
- Defines how elements of each source (local) schema correspond to elements of the global schema.
- Helps translate queries from global schema to local schemas.
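
The translation step can be sketched as a lookup table from global attribute names to local ones. The site names, table names, and attribute names below are hypothetical.

```python
# Hypothetical mappings: global attribute name -> local attribute name, per site.
MAPPINGS = {
    "site_a": {"table": "EMP",
               "attrs": {"employee_id": "Emp_ID", "name": "EName"}},
    "site_b": {"table": "STAFF",
               "attrs": {"employee_id": "Employee_No", "name": "FullName"}},
}

def translate(global_attrs, site):
    """Build the local SELECT that answers a global projection at `site`."""
    mapping = MAPPINGS[site]
    cols = ", ".join(f'{mapping["attrs"][g]} AS {g}' for g in global_attrs)
    return f'SELECT {cols} FROM {mapping["table"]}'

local_queries = {site: translate(["employee_id", "name"], site)
                 for site in MAPPINGS}
for site, sql in local_queries.items():
    print(site, "->", sql)
```

Each local query renames its columns back to the global names, so the results from all sites can be combined directly.
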
Data Cleaning:
- Detects and corrects data errors/inconsistencies.
- Techniques: Deduplication, normalization (standardizing formats and units), validation.
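
The three techniques can be sketched on a small list of records. The record layout `(name, email)`, the choice of email as the duplicate key, and the simple email pattern are assumptions made for illustration.

```python
import re

def clean(records):
    """Normalize, validate, and deduplicate a list of (name, email) records."""
    seen, cleaned, rejected = set(), [], []
    for name, email in records:
        # Normalization: collapse whitespace and unify letter case.
        name = " ".join(name.split()).title()
        email = email.strip().lower()
        # Validation: a deliberately simple pattern, not a full email check.
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
            rejected.append((name, email))
            continue
        # Deduplication: the normalized email acts as the record key.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append((name, email))
    return cleaned, rejected

cleaned, rejected = clean([
    ("  alice  SMITH ", "Alice@Example.com"),
    ("Alice Smith", "alice@example.com "),
    ("Bob Jones", "bob-at-example.com"),
])
print(cleaned, rejected)
```

Normalizing before deduplicating matters: the first two records only become recognizable duplicates once case and whitespace are unified.
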
View Management:
- Views provide virtual integration: the global relation is defined over the sources, and the data stays at the local sites.
- Simplifies access and hides complexity from users.
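
A small sketch of virtual integration using Python's built-in `sqlite3` module. Two tables in one in-memory database stand in for relations at two sites (an assumption; a real system would reach remote databases), and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_site_a (Emp_ID INTEGER, EName TEXT);
    CREATE TABLE staff_site_b (Employee_No INTEGER, FullName TEXT);
    INSERT INTO emp_site_a VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO staff_site_b VALUES (3, 'Carol');

    -- The view is the integrated relation; no data is copied into it.
    CREATE VIEW employee AS
        SELECT Emp_ID AS employee_id, EName AS name FROM emp_site_a
        UNION ALL
        SELECT Employee_No, FullName FROM staff_site_b;
""")

# Users query the view and never see the differing local schemas.
rows = conn.execute(
    "SELECT employee_id, name FROM employee ORDER BY employee_id"
).fetchall()
print(rows)
```
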
Data Security:
- Protects against unauthorized access or alteration.
- Techniques: Encryption, authentication, authorization.
Semantic Integrity Control:
- Ensures that stored data stays consistent with the rules that define its meaning.
- Examples: referential integrity, domain constraints.
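
Both kinds of constraint can be sketched with `sqlite3`: a `CHECK` clause acts as a domain constraint and a `REFERENCES` clause enforces referential integrity. The tables and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        salary  INTEGER CHECK (salary >= 0),        -- domain constraint
        dept_id INTEGER REFERENCES dept(dept_id)    -- referential integrity
    );
    INSERT INTO dept VALUES (10);
""")

def try_insert(row):
    """Insert one employee row and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert((1, 500, 10)),   # valid row
    try_insert((2, -5, 10)),    # negative salary violates the CHECK
    try_insert((3, 100, 99)),   # department 99 does not exist
]
print(results)
```
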
---
### **Query Processing Problem**
- Problem: turn a high-level query on a distributed database into an efficient execution strategy that runs across multiple sites.
Objectives of Query Processing:
1. Minimize data transfer cost.
2. Minimize response time.
3. Maximize resource utilization.
4. Ensure correct and consistent results.
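
The first two objectives can pull in different directions. A rough sketch, assuming communication dominates, each transfer costs a fixed message overhead plus a per-unit charge, and all transfers run in parallel. The parameter names `t_msg` and `t_tr` and all figures are made up for illustration.

```python
def total_cost(transfers, t_msg, t_tr):
    """Sum of all communication work, regardless of parallelism."""
    return sum(t_msg + t_tr * size for size in transfers)

def response_time(transfers, t_msg, t_tr):
    """Elapsed time when transfers run in parallel: the slowest one dominates."""
    return max(t_msg + t_tr * size for size in transfers)

# Hypothetical case: sites 1 and 2 each ship a relation to site 3.
transfers = [1000, 4000]
cost = total_cost(transfers, t_msg=10, t_tr=1)
elapsed = response_time(transfers, t_msg=10, t_tr=1)
print(cost, elapsed)
```

Adding parallel work can lower response time while raising total cost, so an optimizer has to state which of the two it minimizes.
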
Relational Algebra Operations:
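- Fundamental operations: selection (σ), projection (π), Cartesian product (×), union (∪), set difference (−).
- Derived operations: join (⋈), intersection (∩), semijoin (⋉), division (÷).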
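
A minimal sketch of these operators, assuming a relation is represented as a list of dictionaries (one per tuple). The function names and sample relations are illustrative. Intersection is written in terms of difference to show that it can be derived.

```python
def select(rel, pred):
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:          # relations are sets: drop duplicates
            out.append(row)
    return out

def union(r, s):
    return r + [t for t in s if t not in r]

def difference(r, s):
    return [t for t in r if t not in s]

def intersect(r, s):
    return difference(r, difference(r, s))

def natural_join(r, s):
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

emp = [{"eno": 1, "name": "Alice", "dno": 10},
       {"eno": 2, "name": "Bob", "dno": 20}]
dept = [{"dno": 10, "dname": "R&D"}, {"dno": 20, "dname": "Sales"}]

# Names of employees in the Sales department.
result = project(
    select(natural_join(emp, dept), lambda t: t["dname"] == "Sales"),
    ["name"],
)
print(result)
```
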
Characterization of Query Processors:
- **Centralized:** Single-site processing.
- **Distributed:** Multi-site, requires coordination.
Layers of Query Processing:
1. **Query Parsing:** Syntax & semantic analysis.
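2. **Query Decomposition:** Normalizes and simplifies the query, then breaks it into algebraic sub-queries on global relations.
3. **Data Localization:** Replaces global relations with their fragments and maps the sub-queries to the sites that store them.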
4. **Optimization:** Chooses best execution plan.
5. **Execution:** Executes optimized query plan.
---
### **Query Decomposition**
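- Breaks the global query into smaller algebraic sub-queries expressed on global relations.
- Steps: normalization, semantic analysis, elimination of redundancy, and rewriting into relational algebra.
- Simplifies later localization and optimization.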
Localization of Distributed Data:
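- Uses the fragmentation schema to identify which fragments, at which sites, hold the required data.
- Translates the query on global relations into local queries on fragments, dropping fragments that cannot contribute to the result.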
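
Localization with horizontal fragments can be sketched as a range-overlap test: a fragment is kept only if its defining predicate can be true together with the query predicate. The fragment names, sites, and ranges below are hypothetical.

```python
# Hypothetical horizontal fragmentation of EMP on `eno`,
# each fragment covering the half-open range [low, high).
FRAGMENTS = {
    "EMP1@site1": (0, 100),
    "EMP2@site2": (100, 200),
    "EMP3@site3": (200, 300),
}

def localize(low, high):
    """Return the fragments whose range overlaps the query range [low, high)."""
    return [name for name, (f_low, f_high) in FRAGMENTS.items()
            if f_low < high and low < f_high]

point_query = localize(150, 151)   # WHERE eno = 150
range_query = localize(90, 120)    # WHERE eno >= 90 AND eno < 120
print(point_query, range_query)
```

The point query touches one site instead of three, which is where the saving in communication comes from.
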
Query and Data Localization Optimization:
- Minimizes communication cost.
- Chooses efficient data access paths.
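
One common way to cut communication cost is a semijoin: instead of shipping a whole relation, send only the join keys across, filter at the remote site, and ship back the matching rows. The sketch below compares the two strategies by bytes transferred. The row sizes, key sizes, and data are made-up assumptions.

```python
def ship_whole(r, row_size):
    """Strategy 1: ship all of R to the site of S."""
    return len(r) * row_size

def ship_semijoin(r, s, key, key_size, row_size):
    """Strategy 2: send S's join keys to R's site, ship back only matching R rows."""
    s_keys = {t[key] for t in s}
    reduced_r = [t for t in r if t[key] in s_keys]
    return len(s_keys) * key_size + len(reduced_r) * row_size, reduced_r

# Hypothetical data: 1000 employees spread over 50 departments,
# joined with a relation that mentions only 5 departments.
r = [{"eno": i, "dno": i % 50} for i in range(1000)]
s = [{"dno": d} for d in range(5)]

whole = ship_whole(r, row_size=100)
semi, reduced = ship_semijoin(r, s, key="dno", key_size=4, row_size=100)
print(whole, semi, len(reduced))
```

The semijoin pays off only when the join is selective; if most rows of R match, the extra key transfer is wasted.
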
Join Ordering in Distributed Queries:
- Determines best sequence of joins.
- Strongly affects cost: the order determines how large the intermediate results are and how much data moves between sites.
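
A sketch of exhaustive join ordering under a simplified model: every relation sits at a different site, each step ships the current intermediate result to the next relation's site, and cost is the number of tuples shipped. The cardinalities and selectivity factors are invented for illustration.

```python
from fractions import Fraction
from itertools import permutations

CARD = {"EMP": 400, "ASG": 1000, "PROJ": 20}
# Join selectivity factors; a pair without an entry has no join predicate,
# so joining it is a Cartesian product (selectivity 1).
SEL = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 20),
}

def transfer_cost(order):
    """Tuples shipped when joining left to right, moving the running result."""
    joined, size, cost = [order[0]], Fraction(CARD[order[0]]), Fraction(0)
    for rel in order[1:]:
        cost += size                          # ship current result to rel's site
        sel = Fraction(1)
        for prev in joined:
            sel *= SEL.get(frozenset((prev, rel)), Fraction(1))
        size = size * CARD[rel] * sel         # estimated size after the join
        joined.append(rel)
    return cost

best = min(permutations(CARD), key=transfer_cost)
print(best, transfer_cost(best))
```

Enumerating all permutations is feasible only for a handful of relations; real optimizers prune the search with dynamic programming or heuristics.
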
Distributed Query Optimization:
- Objective: Find least-cost plan.
- Techniques: Heuristics, cost-based optimization.
Issues in Multi-Database Query Processing:
- Heterogeneous schemas and query languages.
- Data replication and consistency.
- Security and autonomy of local databases.
Query Rewriting Using Views:
- Rewrites a query using materialized or virtual views.
- Improves performance and reusability.
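
A sketch of answering a query from a precomputed view instead of the base table. SQLite has no materialized views, so a plain table stands in for one here (an assumption), and the table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('EU', 100), ('EU', 250), ('US', 400);

    -- Stand-in for a materialized view: totals precomputed per region.
    CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# The original query scans the base table; the rewritten one reads the view.
original = "SELECT SUM(amount) FROM sales WHERE region = ?"
rewritten = "SELECT total FROM sales_by_region WHERE region = ?"

from_base = conn.execute(original, ("EU",)).fetchone()[0]
from_view = conn.execute(rewritten, ("EU",)).fetchone()[0]
print(from_base, from_view)
```

The rewriting is valid only while the stored view is in sync with the base table, so view maintenance is part of the cost.
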
Query Optimization and Execution:
- Optimization: Selecting best strategy.
- Execution: Carrying out optimized plan efficiently.
---
Summary:
- Distributed query processing aims for efficient and consistent access across multiple sites.
- Involves decomposition, localization, optimization, and execution.
- Key challenges include data integration, security, and minimizing communication cost.