Dd Mid Answers
Dd Mid Answers
1.Define distributed data processing. How does it differ from centralized data
processing?
Distributed data processing refers to the approach of handling and analyzing data
across multiple interconnected devices or nodes. In contrast to centralized data
processing, where all data operations occur on a single, powerful system,
distributed processing decentralizes these tasks across a network of computers.
This method leverages the collective computing power of interconnected devices,
enabling parallel processing and faster data analysis
7. Security
Data Encryption and Security: Ensuring that data is encrypted both at rest
and in transit across multiple locations adds complexity.
Access Control: Ensuring that only authorized users can access and modify
data across distributed locations is more difficult to enforce.
8. Scalability
Dynamic Scaling: While distributed databases are designed to be scalable,
dynamically adding or removing nodes, redistributing data, and managing
the resulting load balancing can be a complex task.
Load Balancing: Distributing traffic and queries evenly across all nodes is a
significant challenge, especially in large-scale systems.
9. Cost
Higher Infrastructure Costs: Operating and maintaining distributed
databases often require more resources, including hardware, storage,
network bandwidth, and engineering expertise.
Overhead from Redundancy: Data redundancy (e.g., replication) across
multiple nodes increases storage and operational costs.
3. What is the difference between a homogeneous and heterogeneous
distributed database system?
3.Mixed Fragmentation
The combination of vertical fragmentation of a table followed by further
horizontal fragmentation of some fragments is called mixed or hybrid
fragmentation. For defining this type of fragmentation, we use the SELECT and
the PROJECT operations of relational algebra. In some situations, the horizontal
and the vertical fragmentation isn’t enough to distribute data for some
applications and in those conditions, we need a fragmentation called a mixed
fragmentation.
2. Data Replication
Making copies of data (replicas) and storing them on multiple nodes for
availability and fault tolerance.
Types of Replications:
Full Replication: Entire database is replicated on all nodes.
Partial Replication: Only some fragments are replicated on select nodes.
No Replication: Each fragment is stored on only one node.
Challenges:
Deciding which data to replicate and where.
Managing consistency between replicas (e.g., eventual consistency vs.
strong consistency).
Balancing the trade-offs between replication overhead and availability.
3. Data Allocation
Determining which fragments or replicas are stored on which nodes in the system.
Strategies:
Centralized: All data is stored on a single node (rare in distributed systems).
Decentralized: Data is distributed across multiple nodes based on
predefined rules.
Dynamic Allocation: Data placement changes dynamically based on
workload or access patterns.
Challenges:
Minimizing data movement between nodes during queries.
Balancing load across nodes to avoid bottlenecks.
Considering geographic location for latency optimization.
5. Transaction Management
Ensuring ACID (Atomicity, Consistency, Isolation, Durability) properties for
transactions in a distributed environment.
Challenges:
Managing distributed transactions across multiple nodes.
Handling failures during transactions to maintain atomicity and consistency.
Deciding on a concurrency control mechanism (e.g., locking, timestamp
ordering).
Using distributed commit protocols like Two-Phase Commit (2PC) or Three-
Phase Commit (3PC).
10. Briefly describe at least two algorithms used for distributed query
optimization. Highlight their advantages and limitations.
Algorithms aim to minimize costs (e.g., communication, computation, and storage
costs) while ensuring timely and accurate results. Below are the main algorithms
and strategies used in distributed query optimization:
1. Dynamic Programming-Based Algorithms
Description:
These algorithms systematically evaluate all possible query execution plans and
select the one with the lowest cost.
Typically used for optimizing joins.
Explores join orders (e.g., left-deep, right-deep, bushy trees).
Algorithm Steps:
Generate all possible query plans.
Estimate the cost of each plan (e.g., communication, I/O, CPU).
Choose the plan with the minimum cost.
Advantages:
Produces an optimal plan for small queries.
Disadvantages:
Computationally expensive for large queries (exponential complexity).
Example: Optimize SELECT * FROM A, B, C WHERE A.id = B.id AND B.id = C.id by
evaluating all possible join orders
2. Greedy Algorithms
Description:
These algorithms iteratively build a query execution plan by choosing the best
option at each step (locally optimal choices).
Unlike dynamic programming, greedy algorithms don't backtrack to reconsider
earlier decisions.
Algorithm Steps:
Start with the smallest subquery or data fragment.
Choose the next operation or join that has the lowest incremental cost.
Repeat until the full query is processed.
Advantages:
Simple and efficient for queries with many fragments.
Disadvantages:
May result in suboptimal plans due to lack of global optimization.
Example: Join the two smallest tables first and proceed iteratively.
3. Cost-Based Algorithms
Description:
These algorithms evaluate the cost of different execution plans based on a cost
model that considers:
Communication cost (data transfer between nodes).
Computation cost (CPU time for joins, filters, etc.).
Disk I/O cost (reading/writing data to/from disk).
Algorithm Steps:
Estimate the cost of executing each operation on each node.
Generate alternative plans for query execution.
Select the plan with the lowest estimated cost.
Advantages:
Produces high-quality plans for small to medium queries.
Disadvantages:
Requires accurate cost models, which can be challenging in dynamic
environments.
Example: Use a cost model to decide whether to ship data from Node A to Node B
for a join or to process it locally.
11. Write syntax and give necessary examples for DDL Commands.
DDL Commands:
1. Create
2. Alter
3. truncate
4. drop
5. Rename
1.CREATE:
This command is used to create a new table in SQL.
The user has to give information like table name, column names, and their
datatypes.
Syntax –
CREATE TABLE table_name
(
column_1 datatype,
column_2 datatype,
column_3 datatype);
Example –
We need to create a table for storing Student information of a particular
College. Create syntax would be as below.
CREATE TABLE Student_info
(
College_Id number(2),
College_name varchar(30),
Branch varchar(10)
);
2.ALTER:
This command is used to add, delete or change columns in the existing table.
The user needs to know the existing table name and can do add, delete or
modify tasks easily.
Add column in table
Syntax–
Syntax to add a column to an existing table.
ALTER TABLE table_name
ADD column_name datatype;
Example–
In our Student_info table, we want to add a new column for CGPA. The
syntax would be as below as follows.
ALTER TABLE Student_info ADD CGPA number;
Add multiple columns in table
ALTER TABLE customers
ADD (customer_name varchar2(45),
city varchar2(40) DEFAULT 'Seattle');
Modify column in table
Syntax
To MODIFY A COLUMN in an existing table, the Oracle ALTER TABLE
syntax is:
ALTER TABLE table_name
MODIFY column_name column_type;
For example:
ALTER TABLE customers
MODIFY customer_name varchar2(100) NOT NULL;
Modify Multiple columns in table
Syntax
To MODIFY MULTIPLE COLUMNS in an existing table, the Oracle
ALTER TABLE syntax is:
ALTER TABLE table_name
MODIFY (column_1 column_type,
column_2 column_type,
...
column_n column_type);
Example
Let's look at an example that shows how to modify multiple columns in an
Oracle table using the ALTER TABLE statement.
For example:
ALTER TABLE customers
MODIFY (customer_name varchar2(100) NOT NULL,
city varchar2(75) DEFAULT 'Seattle' NOT NULL);
Drop column in table
Syntax
To DROP A COLUMN in an existing table, the Oracle ALTER TABLE
syntax
ALTER TABLE table_name
DROP COLUMN column_name;
Example
Let's look at an example that shows how to drop a column in an Oracle table
using the ALTER TABLE statement.
For example:
ALTER TABLE customers
DROP COLUMN customer_name;
Rename column in table
Syntax
To RENAME A COLUMN in an existing table, the Oracle ALTER TABLE
syntax is:
ALTER TABLE table_name
RENAME COLUMN old_name TO new_name;
Example
Let's look at an example that shows how to rename a column in an Oracle
table using the ALTER TABLE statement.
For example:
ALTER TABLE customers
RENAME COLUMN customer_name TO cname;
Command-3 :
TRUNCATE :
This command is used to remove all rows from the table, but the structure of
the table still exists.
Syntax –
Syntax to remove an existing table.
TRUNCATE TABLE table_name;
Example –
The College Authority wants to remove the details of all students for new
batches but wants to keep the table structure. The command they can use is as
follows.
TRUNCATE TABLE Student_info;
Command-4 :
DROP :
This command is used to remove an existing table along with its structure
from the Database.
Syntax –
Syntax to drop an existing table.
DROP TABLE table_name;
Example –
If the College Authority wants to change their Database by deleting the
Student_info Table.
DROP TABLE Student_info;
Command -5
RENAME:
It is possible to change name of table with or without data in it using simple
RENAME command.
We can rename any table object at any point of time.
Syntax –
RENAME TABLE <Table Name> To <New_Table_Name>;
Example:
If you want to change the name of the table from Employee to Emp we can
use rename command as
RENAME TABLE Employee To EMP;
12. Write syntax and give necessary examples for DML Commands.
Data Manipulation Language (DML)
Data Manipulation Language (DML) commands in Oracle are used to manage
and interact with data in a database. Some examples of DML commands in
Oracle include:
INSERT: Adds rows to an existing table
UPDATE: Updates data in an existing table
DELETE: Deletes rows from a table
SELECT: Accesses data in a database
INSERT
The INSERT statement adds rows to an existing table. For example, you can
use the INSERT statement to add a row to an employee table.
Inserting a single record using the Values keyword
1. INSERT INTO table (column1, column2, ... column_n ) VALUES
(expression1, expression2, ... expression_n );
2. INSERT INTO table VALUES (expression1, expression2, ...
expression_n );
Using INSERT ALL (Preferred for inserting multiple different rows)
INSERT ALL
INTO employees (id, name, salary) VALUES (101, 'Alice', 50000)
INTO employees (id, name, salary) VALUES (102, 'Bob', 55000)
INTO employees (id, name, salary) VALUES (103, 'Charlie', 60000)
SELECT * FROM dual;
UPDATE
The UPDATE statement updates data in an existing table. You can use the
UPDATE statement to update one or more columns.
Updating a Student's Grade
UPDATE students
SET grade = 10
WHERE student_id = 101;
Updating Multiple Fields at Once
UPDATE students
SET name = 'Alicia', email = 'alicia@email.com'
WHERE student_id = 101;
DELETE
The DELETE statement deletes rows from a table. If you don't use the WHERE
clause, the DELETE statement deletes all rows in the table.
Deleting a Record
DELETE FROM grades
WHERE student_id = 101 AND subject = 'Math';
Deleting All Records from a Table
DELETE FROM students;
SELECT
The SELECT statement accesses data in a database. You can use the SELECT
statement to manipulate the accessed data before returning the results.