Advanced Database Chapter 6 and 7

Distributed Database Systems (DDBMS) integrate data across multiple sites, allowing for decentralized storage while appearing centralized to users. They utilize various data allocation strategies, such as centralized, partitioned, and replicated, and involve components like local and distributed DBMS, global system catalog, and data communication. DDBMS face challenges in query processing, transaction management, and security, but offer advantages like data sharing, reliability, and scalability.

Uploaded by

alemunuruhak9
Distributed Database Systems

Advanced Database Systems


 Database development facilitates the integration of data available in an
organization from a number of applications and enforces security on
data access on a single local site.
 But it is not always the case that organizational data reside in one
central site.
 This demands that databases at different sites be integrated and
synchronized with all the facilities of the database approach.
 This is made possible by computer networks and data communication,
supported by the internet, mobile and wireless computing, and
intelligent devices.
 This leads to Distributed Database Systems.
 A distributed database is not a centralized database.
 A distributed DB stores logically related shared data and metadata at
several physically independent sites connected via a network.
 A distributed DBMS is the software system that permits the
management of a distributed DB.
 Data allocation is the process of deciding where to store a
particular data item.
Data Allocation
There are 3 data allocation strategies:
1. Centralized: the entire DB is located at a single site and other computers access it through the network.
This is known as distributed processing.
2. Partitioned: the DB is split into several disjoint parts (called partitions, segments or fragments)
that are stored at several sites.
3. Replicated: copies of one or more partitions are stored at several sites.
 In a distributed database system, the database is logically a single database
but is physically fragmented across several computers.
 The computers in a distributed system communicate with each other through
various communication media, such as high-speed buses or telephone lines.
 A distributed database system has the following components.
1. Local DBMS
2. Distributed DBMS
3. Global System Catalog (GSC)
4. Data communication (DC)
 A distributed database system consists of a collection of sites, each of
which maintains a local database system (local DBMS). Each local
DBMS also participates in at least one global transaction, in which the
different databases are integrated.
 Local transaction: a transaction that accesses data only at a single site.
 Global transaction: a transaction that accesses data at several sites.
 Three architectures for parallel DBMS:
 Shared memory: fast data access for a limited number of processors.
 Shared disk: for applications that are inherently centralized.
 Shared nothing: massively parallel.
 What makes DDBMS different is that
 The various sites are aware of each other
 Each site provides a facility for executing both local and global transactions.
 The different sites can be connected physically in different topologies.
 Tree Network,
 Star Network and
 Ring Network
 The distribution of the database sites could be:
 Large Geographical Area: Long-Haul Network
 relatively slow
 less reliable
 uses telephone line, satellite
 Small Geographical Area: Local Area Network
 higher speed
 lower rate of error
 coaxial, fiber optics
 Even though integration of data implies centralized storage and
control, in distributed database systems the intention is different.
 Data is stored in different database systems in a decentralized manner,
but the systems act as if they were centralized because computer
networks connect them.
 A distributed database system consists of loosely coupled sites that
share no physical component and database systems that run on each
site are independent of each other.
 Those which share physical components are known as Parallel DBMS.
 Transactions may access data at one or more sites.
Functions of DDBMS
 A DDBMS provides the following functionality:
 Extended communication services: provide access to remote sites.
 Distributed query processing: optimization of queries that access remote data.
 Extended security: access control for distributed data.
 Extended concurrency control: maintain consistency of replicated data.
 Extended recovery services: handle failures of individual sites and of the
communication lines.
Issues in DDBMS
 How is data stored in a DDBMS?
There are several ways of storing a single relation in a distributed database
system.
Replication
 The system maintains multiple identical copies of the data.
 The copies are stored at different sites for faster retrieval and fault tolerance.
 Duplicate copies of the tables can be kept on each system (replicated). With this option, updates to
the tables become more involved because every copy must be changed (unless the copies are read-only).
 Advantages: availability, increased parallelism (if only reading)
 Disadvantage: increased overhead on update
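The trade-off between cheap reads and expensive updates can be sketched in Python. This is a minimal illustration that assumes a "read one, write all" policy; the class name, site names and values are hypothetical and not part of the slides.

```python
class ReplicatedItem:
    """One data item with an identical copy at every site (hypothetical model)."""

    def __init__(self, sites, value):
        # Every site starts with the same copy of the value.
        self.copies = {site: value for site in sites}

    def read(self, site):
        # A read touches a single copy, so any site can answer it.
        return self.copies[site]

    def write(self, value):
        # An update must reach every copy; return how many were touched.
        for site in self.copies:
            self.copies[site] = value
        return len(self.copies)


item = ReplicatedItem(["addis", "awassa", "mekelle"], 10)
local_value = item.read("awassa")   # one copy accessed
copies_updated = item.write(20)     # all copies accessed
```

The read cost stays constant as sites are added, while the update cost grows with the number of copies.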
Fragmentation
 A relation is partitioned into several fragments stored at distinct sites.
 The partitioning can be vertical, horizontal or both.
Horizontal Fragmentation
 Systems share the responsibility of storing information from a
single table, with individual systems storing groups of rows.
 Performed by the selection operation.
 The whole content of the relation is reconstructed using the UNION
operation.
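Horizontal fragmentation and its reconstruction can be sketched in Python as follows. The EMP relation, its attribute names and the split on city are hypothetical, chosen only to illustrate selection and union.

```python
# Hypothetical EMP relation; the rows and attributes are illustrative only.
emp = [
    {"empno": 1, "ename": "Abebe", "city": "Addis", "salary": 5000},
    {"empno": 2, "ename": "Sara", "city": "Awassa", "salary": 4200},
    {"empno": 3, "ename": "Kebede", "city": "Addis", "salary": 6100},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """Union: merge fragments and drop duplicate rows."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Each fragment holds a group of whole rows and would live at its own site.
emp_addis = select(emp, lambda r: r["city"] == "Addis")
emp_other = select(emp, lambda r: r["city"] != "Addis")

# The union of the fragments gives back the original relation.
rebuilt = union(emp_addis, emp_other)
```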
Vertical Fragmentation
 Each fragment needs an attribute that identifies the tuple (the primary key value must be repeated in every fragment).
 Performed by the projection operation.
 The whole content of the relation is reconstructed using the natural JOIN operation
on the tuple-identifying attribute (the primary key values).
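A minimal Python sketch of vertical fragmentation, assuming a hypothetical EMP relation whose primary key is empno. Both fragments repeat the key so that a natural join can rebuild the relation.

```python
# Hypothetical EMP relation with primary key "empno".
emp = [
    {"empno": 1, "ename": "Abebe", "salary": 5000},
    {"empno": 2, "ename": "Sara", "salary": 4200},
    {"empno": 3, "ename": "Kebede", "salary": 6100},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result


# Each fragment stores some columns plus the repeated primary key.
emp_names = project(emp, ["empno", "ename"])
emp_pay = project(emp, ["empno", "salary"])

# Joining on the shared key reconstructs the original rows.
rebuilt = natural_join(emp_names, emp_pay)
```

Without the repeated key the two fragments would share no attribute, and the join would degenerate into a Cartesian product.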
Both (hybrid fragmentation)
 A system takes responsibility for storing particular attributes of a
subset of the records in a given relation.
 Performed by applying the relational algebra operators projection then
selection, or selection then projection.
 Reconstruction combines the union and natural join
operators.
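Hybrid fragmentation can be sketched in Python by applying selection first and projection second. The relation, the split on city and the column groups are hypothetical choices made for illustration.

```python
# Hypothetical EMP relation with primary key "empno".
emp = [
    {"empno": 1, "ename": "Abebe", "city": "Addis", "salary": 5000},
    {"empno": 2, "ename": "Sara", "city": "Awassa", "salary": 4200},
    {"empno": 3, "ename": "Kebede", "city": "Addis", "salary": 6100},
]


def select(relation, predicate):
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Natural join of two fragments that share only the key attribute."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Selection first (row groups), then projection (column groups).
addis = select(emp, lambda r: r["city"] == "Addis")
other = select(emp, lambda r: r["city"] != "Addis")
addis_info = project(addis, ["empno", "ename", "city"])
addis_pay = project(addis, ["empno", "salary"])
other_info = project(other, ["empno", "ename", "city"])
other_pay = project(other, ["empno", "salary"])

# Natural join rebuilds each row group; union merges the groups.
# The groups are disjoint here, so list concatenation acts as the union.
rebuilt = (join_on_key(addis_info, addis_pay, "empno")
           + join_on_key(other_info, other_pay, "empno"))
```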
Issues in DDBMS
Fragmentation is correct if it fulfils the following rules:
 Completeness: every data item of a relation R must appear in at least one
of its fragments R1, R2, …, Rn.
 Reconstruction: it must be possible to reconstruct the relation from the
fragments.
 Disjointness: a data item should be found in only a single fragment, except
in vertical fragmentation (where the primary key is repeated for reconstruction).
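For horizontal fragments, the three rules can be checked mechanically. The Python sketch below is one possible formulation; the function name and the sample rows are hypothetical, and it treats each whole row as a data item.

```python
def as_set(relation):
    """Turn a list of row dicts into a set of hashable rows."""
    return {tuple(sorted(row.items())) for row in relation}


def check_horizontal(relation, fragments):
    """Check completeness, reconstruction and disjointness of row fragments."""
    original = as_set(relation)
    frag_sets = [as_set(f) for f in fragments]
    covered = set().union(*frag_sets)
    return {
        # Every row of the relation appears in at least one fragment.
        "complete": original <= covered,
        # The union of the fragments is exactly the relation.
        "reconstructible": covered == original,
        # No row is stored in more than one fragment.
        "disjoint": sum(len(s) for s in frag_sets) == len(covered),
    }


rows = [{"empno": 1, "city": "Addis"}, {"empno": 2, "city": "Awassa"}]
good = check_horizontal(rows, [[rows[0]], [rows[1]]])
overlapping = check_horizontal(rows, [[rows[0], rows[1]], [rows[1]]])
lossy = check_horizontal(rows, [[rows[0]]])
```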
Data Transparency
Data transparency is the degree to which a system user may remain unaware of the details of how
and where the data items are stored in a distributed system.
 Distribution transparency: Even though there are many systems, they appear as one and are
seen as a single, logical entity.
 Replication transparency: Multiple copies of the data stored at different sites appear as just
one copy to developers and users.
 Fragmentation transparency: A table that is actually stored in parts across
sites may seem like just a single table in a single
 Location transparency: The user doesn't need to know where a data item is physically
located.
How does it work?
Distributed computing can be difficult to implement, particularly for
replicated data that can be updated from many systems.
In order to operate, a distributed database system has to take care of:
 Distributed query processing
 Distributed transaction management
 Replicated data management: if copies of data are kept on many
machines, how soon is each copy updated after the data is changed on another system, and who is
in charge of propagating the update?
 Distributed database recovery: if one machine goes down, how does that affect the
others?
 Security: just like any computer network, a distributed system needs a common
way to validate users entering from any computer in the network of servers.
Homogeneous and Heterogeneous DDBMS
In a homogeneous distributed database:
 All sites have identical software (DBMS).
 Sites are aware of each other and agree to cooperate in processing user requests.
 Each site surrenders part of its autonomy in terms of its right to change schemas or software.
 The system appears to the user as a single system.
In a heterogeneous distributed database:
 Different sites may use different schemas and software (DBMS).
 Difference in schema is a major problem for query processing.
 Difference in software is a major problem for transaction processing.
 Sites may not be aware of each other and may provide only limited facilities for
cooperation in transaction processing.
 Sites may need gateways to interface with one another.
Why DDBMS? / Advantages of DDBMS
Many existing systems:
 An organization may already have many existing systems, possibly of different kinds
(Oracle, Informix, …), that need to be used together.
Data sharing and distributed control:
 A user at one site may be able to access data that is available at another site.
 There are local as well as global database administrators.
Reliability and availability of data: If one site fails, the rest can continue operating.
Speedup of query processing: A query can be sent to the least heavily loaded sites.
Expansion (Scalability)
Disadvantages of DDBMS
 Software development cost: a DDBMS is difficult to install and is therefore costly.
 Greater potential for bugs: parallel processing may endanger the correctness
of algorithms.
 Increased processing overhead: exchanging messages between sites adds
communication latency.
 Increased complexity and data inconsistency problems: clients
can concurrently read and modify closely related data stored in different database
instances.
 Security problems: both the network and the replicated data must be secured.
Query Processing in DDBMS
We have to consider the following in distributed query processing:
 Cost of data transmission over the network
 Gain from parallel processing of a single query
 With replicated data allocation, even though parallel processing is used to
increase performance, updates have a great impact since all the sites containing the
data item must be updated.
 With fragmentation, updates work more like in a centralized database, but
reconstruction of the whole relation requires accessing data from all sites containing
part of the relation.
 There are different ways of executing a query, so one can select the strategy
that reduces the data transfer cost for the specific query.
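Choosing a strategy by data transfer cost can be illustrated with a small Python calculation. All figures are hypothetical: EMP and DEPT are assumed to sit at two different sites, the join result is needed at a third site, and cost is measured only in bytes shipped.

```python
# Hypothetical sizes: row count times bytes per row.
EMP_BYTES = 10_000 * 100      # EMP relation, stored at site 1
DEPT_BYTES = 100 * 35         # DEPT relation, stored at site 2
RESULT_BYTES = 10_000 * 40    # join result, needed at site 3

# Bytes that must cross the network under each execution strategy.
strategies = {
    "ship EMP and DEPT to the result site": EMP_BYTES + DEPT_BYTES,
    "ship EMP to the DEPT site, then ship the result": EMP_BYTES + RESULT_BYTES,
    "ship DEPT to the EMP site, then ship the result": DEPT_BYTES + RESULT_BYTES,
}

# Pick the strategy that transfers the fewest bytes.
best = min(strategies, key=strategies.get)
```

With these assumed sizes, moving the small relation to the site of the large one is cheapest; different sizes would give a different winner, which is why the choice is made per query.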
Transaction Management in DDBMS
 A Distributed Transaction is a transaction that includes one or more statements that,
individually or as a group, update data on two or more distinct nodes of a distributed
database.
 There are two types of transaction in a DDBMS that access data at other sites:
 Remote transaction: contains only statements that access a single remote node. Likewise, a remote query is a
query that selects information from one or more remote tables, all of which reside at the same remote node or site.
 For example, the following query accesses data from the dept table in the Addis schema (the site) of the remote
sales database:
SELECT * FROM Addis.dept@sales.midroc.telecom.et;
 Distributed Transaction: contains statements that access more than one node.
 For example, the following query accesses data from the local database as well as the remote sales database:
SELECT ename, dname FROM Awassa.emp AW, Addis.dept@sales.midroc.telecom.et AD WHERE AW.deptno = AD.deptno;

 If all statements of a transaction reference only a single remote node, the transaction is
remote, not distributed.
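The distinction between local, remote and distributed transactions depends only on which nodes the statements touch. A minimal Python sketch, with a hypothetical function name and node names:

```python
def classify_transaction(local_node, accessed_nodes):
    """Classify a transaction by the set of nodes its statements access."""
    nodes = set(accessed_nodes)
    if nodes <= {local_node}:
        return "local"        # only the initiating site is touched
    if len(nodes) == 1:
        return "remote"       # exactly one node, and it is not the local one
    return "distributed"      # more than one node is involved


# A transaction started at the hypothetical node "awassa".
only_local = classify_transaction("awassa", ["awassa"])
one_remote = classify_transaction("awassa", ["sales", "sales"])
two_nodes = classify_transaction("awassa", ["awassa", "sales"])
```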
Database Security and Authorization
 Privacy – Ethical and legal rights that individuals have with regard to control over the
dissemination and use of their personal information.
 Database security – Protection of information contained in the database against
unauthorized access, modification or destruction.
 Database integrity – Mechanism that is applied to ensure that the data in the database is
correct and consistent.
 A good database security management system has the following characteristics:
 Privacy signifies that an unauthorized user cannot disclose data
 Integrity ensures that an unauthorized user cannot modify data
 Availability ensures that data is made available to the authorized user unfailingly
 Copyright ensures the native rights of individuals as creators of information.
 Validity ensures that activities are accountable under the law.
 Database security: the mechanisms that protect the database against intentional or
accidental threats. Database security encompasses hardware, software, people and data.
 Database security and integrity are about protecting the database from becoming inconsistent
and from being disrupted. Such damage is also called database misuse.
 Database misuse can be intentional or accidental; accidental misuse is easier to
cope with than intentional misuse.
 Accidental inconsistency could occur due to:
 System crash during transaction processing
 Anomalies due to concurrent access
 Anomalies due to redundancy
 Logical errors
 Intentional misuse could be:
 Unauthorized reading of data
 Unauthorized modification of data or
 Unauthorized destruction of data
Levels of Security Measures
Security measures can be implemented at several levels and for different components of the
system. These levels are:
 Physical level: concerned with securing the site containing the computer system. The
site or sites containing the computer systems must be physically secured against armed or
surreptitious entry by intruders.
 Human level: concerned with authorizing database users to access content at
different levels and with different privileges.
 Operating system: concerned with the strengths and weaknesses of the operating system's
security for data files.
 Database system: concerned with the data access limits enforced by the database system.
 Network: software-level security within the network software is as important as physical security,
both on the Internet and in networks private to an enterprise.
Authentication
 Users of the database have different access levels and permissions for different
data objects. Authentication is the process of checking whether a user is the one
who holds the privileges for an access level.
 It is the process of checking that users are who they say they are.
 Each user is given a unique identifier, which is used by the operating system to determine
who they are.
 Thus the system checks whether the user presenting a specific username and password is
the one trying to use the resource.
 Associated with each identifier is a password, chosen by the user and known to the
operating system, which must be supplied to enable the operating system to authenticate
who the user claims to be.
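The identifier-plus-password check can be sketched in Python. Storing a salted hash instead of the password itself is an assumption added here and is not stated in the slides; the class name, user identifier and iteration count are likewise hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted hash so the plain password is never stored (assumption)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


class UserStore:
    """Maps each unique user identifier to its password record."""

    def __init__(self):
        self._users = {}

    def register(self, user_id, password):
        if user_id in self._users:
            raise ValueError("identifier already in use")
        self._users[user_id] = hash_password(password)

    def authenticate(self, user_id, password):
        record = self._users.get(user_id)
        if record is None:
            return False                      # unknown identifier
        salt, expected = record
        _, actual = hash_password(password, salt)
        # Constant-time comparison of the supplied and stored hashes.
        return hmac.compare_digest(actual, expected)


store = UserStore()
store.register("abebe", "s3cret")
accepted = store.authenticate("abebe", "s3cret")
rejected = store.authenticate("abebe", "wrong")
```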
Authorization/Privilege
 Authorization refers to the process that determines the mode in which a particular
(previously authenticated) client is allowed to access a specific resource controlled by a
server.
Forms of user authorization on the data
 Read Authorization: the user with this privilege is allowed only to read the content of
the data object.
 Insert Authorization: the user with this privilege is allowed only to insert new records
or items to the data object.
 Update Authorization: users with this privilege are allowed to modify content of
attributes but are not authorized to delete the records.
 Delete Authorization: users with this privilege are only allowed to delete a record and
not anything else.
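The four forms of authorization on data can be modelled as independent flags that are checked only after authentication. This Python sketch uses a hypothetical grants table, user names and object names.

```python
from enum import Flag, auto


class Privilege(Flag):
    READ = auto()
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()


# Hypothetical grants: (user, data object) -> privileges held.
grants = {
    ("abebe", "emp"): Privilege.READ | Privilege.UPDATE,
    ("sara", "emp"): Privilege.READ,
}


def is_authorized(user, data_object, needed):
    """Return True if an authenticated user holds the needed privilege."""
    held = grants.get((user, data_object), Privilege(0))
    return needed in held


can_update = is_authorized("abebe", "emp", Privilege.UPDATE)
can_delete = is_authorized("abebe", "emp", Privilege.DELETE)
```

Each privilege is granted separately, so holding update authorization does not imply delete authorization.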
Forms of user authorization on the database schema
 Index Authorization: deals with permission to create and delete an index for a
relation.
 Resource Authorization: deals with permission to add/create a new relation in the
database.
 Alteration Authorization: deals with permission to add and delete attributes of a relation.
 Drop Authorization: deals with permission to delete an existing relation.
Reading Assignments
 Discretionary Access Control Based on Granting/Revoking of Privileges
 Mandatory Access Control for Multilevel Security
 Statistical DB Security
