Distributed Database Query Processing Notes

Unit 2 covers distributed database query processing, focusing on integrating multiple databases and addressing challenges like data heterogeneity and security. It outlines the objectives of query processing, including minimizing costs and maximizing resource utilization, along with the steps involved such as query decomposition and optimization. Key issues include managing heterogeneous schemas and ensuring consistent query execution across distributed systems.

Uploaded by: yashupadhyay8868
Copyright: © All Rights Reserved

Unit 2: Distributed Database Query Processing

Lecture Hours: 15

---
### **Database Integration (Bottom-up Design Approach)**
- Combines multiple existing databases into a unified system.
- Bottom-up: starts from the existing local databases and derives the global schema from them, rather than designing the global schema first.
- Challenges: data heterogeneity, schema mismatch, naming conflicts.

Schema Matching:
- Aligning attributes from different databases with the same meaning.
- Example: "Emp_ID" in one database corresponds to "Employee_No" in another.

Schema & Access Control Integration:
- Combines schema structures and user access rules.
- Ensures uniform security and consistency.

Schema Mapping:
- Defines how the elements of each source (local) schema correspond to the global schema.
- Lets the system translate queries on the global schema into queries on the local schemas.
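
A schema mapping can be sketched as a lookup table from global attribute names to local ones. This is only an illustration: the site names, table names, the `name` attribute, and the helper `translate_query` are hypothetical; only the `Emp_ID` / `Employee_No` pair comes from the notes.

```python
# Hypothetical mapping from global-schema attributes to each local schema.
# "Emp_ID" / "Employee_No" come from the schema-matching example above;
# everything else is an assumption made for this sketch.
SCHEMA_MAPPING = {
    "site_a": {"table": "hr_staff",
               "attrs": {"emp_id": "Emp_ID", "name": "Full_Name"}},
    "site_b": {"table": "payroll_emp",
               "attrs": {"emp_id": "Employee_No", "name": "EmpName"}},
}


def translate_query(global_attrs, site):
    """Rewrite a projection over the global schema as SQL text for one site."""
    mapping = SCHEMA_MAPPING[site]
    # Alias each local column back to its global name so results line up.
    columns = ", ".join(f'{mapping["attrs"][a]} AS {a}' for a in global_attrs)
    return f'SELECT {columns} FROM {mapping["table"]}'


for site in SCHEMA_MAPPING:
    print(site, "->", translate_query(["emp_id", "name"], site))
```

Aliasing the local columns back to the global names means the results from every site share one shape and can be combined directly.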

Data Cleaning:
- Detects and corrects data errors/inconsistencies.
- Techniques: Deduplication, normalization, validation.
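
The three techniques can be sketched together in a few lines of Python. The record layout (`emp_id`, `name`, `email`), the sample data, and the rule "first occurrence wins" are assumptions for illustration, not part of the notes.

```python
import re


def normalize(record):
    """Normalization: trim and collapse whitespace, standardize letter case."""
    return {
        "emp_id": record["emp_id"].strip(),
        "name": re.sub(r"\s+", " ", record["name"]).strip().title(),
        "email": record["email"].strip().lower(),
    }


def is_valid(record):
    """Validation: require a non-empty id and a plausibly shaped email."""
    email_ok = re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"])
    return bool(record["emp_id"]) and email_ok is not None


def clean(records):
    """Normalize, validate, then deduplicate on emp_id (first occurrence wins)."""
    seen, cleaned = set(), []
    for record in map(normalize, records):
        if is_valid(record) and record["emp_id"] not in seen:
            seen.add(record["emp_id"])
            cleaned.append(record)
    return cleaned


raw = [
    {"emp_id": " 101", "name": "asha   rao", "email": "Asha.Rao@Example.com "},
    {"emp_id": "101", "name": "Asha Rao", "email": "asha.rao@example.com"},  # duplicate
    {"emp_id": "102", "name": "Ravi Kumar", "email": "not-an-email"},        # invalid
]
print(clean(raw))
```

Normalizing before deduplicating matters here: the first two records only look identical once whitespace and letter case have been standardized.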

View Management:
- Views provide virtual integration.
- Simplifies access and hides complexity from users.
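
As a rough illustration of virtual integration, the sketch below defines one view over two tables with different column names. A single in-memory SQLite database stands in for two separate local databases, and all table and column names other than `Emp_ID` and `Employee_No` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two stand-ins for local databases with different column names.
    CREATE TABLE hr_staff (Emp_ID INTEGER, Full_Name TEXT);
    CREATE TABLE payroll_emp (Employee_No INTEGER, EmpName TEXT);
    INSERT INTO hr_staff VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO payroll_emp VALUES (3, 'Meera');

    -- The view presents one global relation; no data is copied.
    CREATE VIEW employee AS
        SELECT Emp_ID AS emp_id, Full_Name AS name FROM hr_staff
        UNION ALL
        SELECT Employee_No AS emp_id, EmpName AS name FROM payroll_emp;
""")

# Users query the view and never see the underlying tables.
rows = conn.execute("SELECT emp_id, name FROM employee ORDER BY emp_id").fetchall()
print(rows)
```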

Data Security:
- Protects against unauthorized access or alteration.
- Techniques: Encryption, authentication, authorization.
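
Authentication and authorization can be sketched with the Python standard library alone. The role names, the permission table, and the iteration count are assumptions chosen for the example; encryption is left out because it would need a third-party library.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Derive a salted hash so the plain password is never stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def authenticate(password, salt, expected_digest):
    """Authentication: does the supplied password match the stored hash?"""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)  # constant-time compare


# Hypothetical role-to-permission table used for authorization.
PERMISSIONS = {"analyst": {"read"}, "admin": {"read", "write"}}


def authorize(role, action):
    """Authorization: is this role allowed to perform this action?"""
    return action in PERMISSIONS.get(role, set())


salt, stored = hash_password("s3cret")
print(authenticate("s3cret", salt, stored), authorize("analyst", "write"))
```

In an integrated system, a single permission table like this is one way to apply the same access rules regardless of which local database holds the data.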

Semantic Integrity Control:
- Ensures that data keeps a consistent meaning by enforcing integrity constraints.
- Examples: referential integrity, domain constraints.
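
Both kinds of constraint can be demonstrated with SQLite from Python. The `dept` and `emp` tables and the helper `try_insert` are hypothetical, and a single local database is used here; enforcing such constraints across sites is harder and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        salary  REAL CHECK (salary >= 0),                   -- domain constraint
        dept_id INTEGER NOT NULL REFERENCES dept(dept_id)   -- referential integrity
    )
""")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 50000, 10)")  # satisfies both constraints


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((2, -5, 10)))     # negative salary violates the CHECK
print(try_insert((3, 40000, 99)))  # department 99 does not exist
```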

---
### **Query Processing Problem**
- The problem: processing a query efficiently when the data it needs is spread across multiple sites.

Objectives of Query Processing:
1. Minimize data transfer cost.
2. Minimize response time.
3. Maximize resource utilization.
4. Ensure correct and consistent results.

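Relational Algebra Operations:
- Fundamental operations: SELECT (σ), PROJECT (π), UNION (∪), DIFFERENCE (−), CARTESIAN PRODUCT (×).
- Derived operations: JOIN (⋈) and INTERSECT (∩), which can be expressed using the fundamental ones.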
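
To make the operations concrete, here is a minimal sketch of SELECT, PROJECT and a natural JOIN over relations stored as lists of dictionaries. The function names and the `emp` / `dept` sample data are assumptions; a real query processor would not store relations this way.

```python
def select(relation, predicate):
    """SELECT: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """PROJECT: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def natural_join(r, s):
    """JOIN: combine tuples that agree on every attribute the relations share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


emp = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20},
    {"emp_id": 3, "name": "Meera", "dept_id": 10},
]
dept = [{"dept_id": 10, "dname": "Sales"}, {"dept_id": 20, "dname": "Ops"}]

# Names of employees in the Sales department.
sales_names = project(
    select(natural_join(emp, dept), lambda t: t["dname"] == "Sales"),
    ["name"],
)
print(sales_names)
```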

Characterization of Query Processors:
- **Centralized:** The whole query is processed at a single site.
- **Distributed:** The query is processed at multiple sites, which requires coordinating data transfer and execution among them.

Layers of Query Processing:
1. **Query Parsing:** Checks the query's syntax and semantics.
2. **Query Decomposition:** Breaks the query into sub-queries.
3. **Data Localization:** Maps sub-queries to the sites that hold the data.
4. **Optimization:** Chooses the best (lowest-cost) execution plan.
5. **Execution:** Runs the optimized plan at the sites involved.

---
### **Query Decomposition**
- Breaks global query into smaller sub-queries.
- Simplifies execution and optimization.

Localization of Distributed Data:
- Identifies where data is located.
- Translates global queries into local ones.
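
Localization can be sketched for a horizontally fragmented relation. The fragment names, sites, and id ranges below are hypothetical; the point is that fragments which cannot contain matching tuples are dropped, so their sites are never contacted.

```python
# Hypothetical horizontal fragmentation of a global EMP relation by emp_id range.
FRAGMENTS = [
    {"name": "EMP1", "site": "site_a", "lo": 1, "hi": 100},
    {"name": "EMP2", "site": "site_b", "lo": 101, "hi": 200},
    {"name": "EMP3", "site": "site_c", "lo": 201, "hi": 300},
]


def localize(query_lo, query_hi):
    """Return the fragments whose id range overlaps the queried range."""
    return [frag for frag in FRAGMENTS
            if frag["lo"] <= query_hi and query_lo <= frag["hi"]]


def local_queries(query_lo, query_hi):
    """Translate one global range query into one local query per relevant site."""
    queries = {}
    for frag in localize(query_lo, query_hi):
        lo = max(query_lo, frag["lo"])
        hi = min(query_hi, frag["hi"])
        queries[frag["site"]] = (
            f"SELECT * FROM {frag['name']} WHERE emp_id BETWEEN {lo} AND {hi}"
        )
    return queries


# Global query: SELECT * FROM EMP WHERE emp_id BETWEEN 90 AND 120
for site, sql in local_queries(90, 120).items():
    print(site, "->", sql)
```

The global result is the union of the local results; `site_c` is skipped entirely because its fragment cannot hold ids in the requested range.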

Query and Data Localization Optimization:
- Minimizes communication cost.
- Chooses efficient data access paths.

Join Ordering in Distributed Queries:
- Determines the best sequence in which to perform the joins.
- Strongly affects performance, because the order determines the size of the intermediate results shipped between sites.
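
The effect of join order can be illustrated by enumerating the possible orders and estimating the intermediate result sizes. The relation names, cardinalities, and selectivity factors are invented for the example, and the cost measure (sum of intermediate sizes, a stand-in for data shipped between sites) is a simplifying assumption.

```python
from itertools import permutations

# Hypothetical cardinalities (number of tuples) of three relations.
CARD = {"EMP": 1000, "ASG": 5000, "PROJ": 100}

# Hypothetical join selectivity factors for pairs that share a join attribute.
# Pairs not listed have no join predicate, so joining them is a Cartesian product.
SEL = {
    frozenset(("EMP", "ASG")): 0.001,
    frozenset(("ASG", "PROJ")): 0.01,
}


def plan_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = CARD[order[0]]
    cost = 0.0
    for rel in order[1:]:
        selectivity = 1.0
        for prev in joined:
            selectivity *= SEL.get(frozenset((prev, rel)), 1.0)
        size = size * CARD[rel] * selectivity  # estimated size after this join
        cost += size
        joined.append(rel)
    return cost


for order in permutations(CARD):
    print(" JOIN ".join(order), "->", round(plan_cost(order)))

best = min(permutations(CARD), key=plan_cost)
```

Orders that join EMP with PROJ first are far more expensive under these assumptions, because those two relations share no join attribute and produce a Cartesian product.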

Distributed Query Optimization:
- Objective: Find the least-cost execution plan.
- Techniques: Heuristic rules, cost-based optimization.
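
One common textbook way to make "least-cost" concrete is a weighted sum of local processing and communication work. The coefficient names below are notation assumed for this sketch and do not appear in these notes: $C_{CPU}$ is the cost of one instruction, $C_{I/O}$ of one disk access, $C_{MSG}$ of initiating one message, and $C_{TR}$ of transferring one byte.

```latex
\text{Total cost} = C_{\mathrm{CPU}} \cdot \#\mathrm{insts}
                  + C_{\mathrm{I/O}} \cdot \#\mathrm{I/Os}
                  + C_{\mathrm{MSG}} \cdot \#\mathrm{msgs}
                  + C_{\mathrm{TR}} \cdot \#\mathrm{bytes}
```

When the network is slow relative to local processing, the two communication terms tend to dominate, which is consistent with the notes listing data transfer cost as the first objective.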

Issues in Multi-Database Query Processing:
- Heterogeneous schemas and query languages.
- Data replication and consistency.
- Security and autonomy of local databases.

Query Rewriting Using Views:
- Rewrites a query so that it is answered from materialized or virtual views instead of the base relations.
- Improves performance and reusability.
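
The idea can be shown with SQLite from Python: a query over a base table and its rewriting over a view return the same answer. The `sales` table and `region_totals` view are hypothetical, and because SQLite has no materialized views, a virtual view is used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 20), ('south', 5);

    -- View holding the per-region totals.
    CREATE VIEW region_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# Original query, written against the base table.
original = """
    SELECT region, SUM(amount) FROM sales
    GROUP BY region HAVING SUM(amount) > 8 ORDER BY region
"""
# Equivalent rewriting that uses the view instead of the base table.
rewritten = "SELECT region, total FROM region_totals WHERE total > 8 ORDER BY region"

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows, rewritten_rows)
```

With a materialized view the totals would already be stored, so the rewritten query could avoid recomputing the aggregation.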

Query Optimization and Execution:
- Optimization: Selecting the best execution strategy.
- Execution: Carrying out the optimized plan efficiently.

---

Summary:
- Distributed query processing aims for efficient and consistent access across multiple sites.
- Involves decomposition, localization, optimization, and execution.
- Key challenges include data integration, security, and minimizing communication cost.
