Classification of DBMS
A DBMS can be classified according to the number of users
and the location of the database sites. By number of users,
a DBMS is either:
•Single-user DBMS
•Multi-user DBMS
By site location, a DBMS is one of the following:
•Centralized DBMS
•Parallel DBMS
•Distributed DBMS and
•Client/server DBMS.
Centralized Database System
A centralized database system consists of a single processor
together with its associated data storage devices and other
peripherals. It is physically confined to a single location. Data
can be accessed from multiple sites over a computer network,
while the database itself is maintained at the central site.
Disadvantages of Centralized Database System
•When the central site computer or database system goes
down, then every user is blocked from using the system until
the system comes back.
•Communication costs from the terminals to the central site
can be expensive.
Parallel Database
A parallel database system consists of multiple Central
Processing Units (CPUs) and data storage disks operating in
parallel, which improves processing and Input/Output (I/O)
speeds. In parallel processing, many operations are performed
simultaneously, as opposed to serial processing, in which the
computational steps are performed sequentially. Parallel
database systems are used in applications that have to query
extremely large databases or process an extremely large
number of transactions per second. Centralized and
client/server database systems are not powerful enough to
handle such applications.
Advantages of a Parallel Database System
•Handles extremely large databases
•Handles a large number of transactions
•Higher throughput (the number of tasks that can be completed
in a given time interval)
•Lower response time (that is, the amount of time it takes to
complete a single task from the time it is submitted)
•Enables a single system to serve thousands of users
•Failure at one node does not bring the entire system down
•Higher speed up and scale up can be attained.
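Speed up and scale up are commonly expressed as ratios of
elapsed times. The sketch below uses those common definitions;
the function names and the timings are hypothetical and chosen
only for illustration.

```python
def speedup(time_on_small_system: float, time_on_large_system: float) -> float:
    """Same task on both systems. Linear speed up means the ratio
    equals the factor by which the system was enlarged."""
    return time_on_small_system / time_on_large_system


def scaleup(time_small_task_small_system: float,
            time_large_task_large_system: float) -> float:
    """Task and system are enlarged by the same factor. Linear scale up
    means the ratio stays at 1.0."""
    return time_small_task_small_system / time_large_task_large_system


# Hypothetical measurements in seconds (assumed values, not real results).
s = speedup(120.0, 40.0)    # same query, larger system
u = scaleup(120.0, 150.0)   # larger query on a proportionally larger system
print(f"speed up = {s:.2f}, scale up = {u:.2f}")
```

A scale up below 1.0, as in the assumed numbers here, would mean
the larger system did not fully keep pace with the larger task.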
Disadvantages of a Parallel Database System
•There is a startup cost associated with initiating each
parallel process.
•Processes executing in a parallel system often access
shared resources, so they can interfere with one another
and slow each other down.
Distributed Database System
A logically interrelated collection of shared data that is
physically distributed over a computer network is called a
distributed database. The software system that manages the
distributed database and makes the distribution transparent
to users is called a Distributed DBMS (DDBMS).
Advantages of Distributed Database System
•Distributed database architecture provides greater efficiency
and better performance.
•A single database (on a server) can be shared across several
distinct client (application) systems.
•As data volumes and transaction rates increase, users can
grow the system incrementally.
•Adding new locations has less impact on ongoing operations.
•A distributed database system provides local autonomy.
Disadvantages of Distributed Database System
Recovery from failure is more complex in distributed
database systems than in centralized systems.
Client-Server DBMS
The client/server architecture of a database system has two
logical components: the client and the server. Clients are
generally personal computers or workstations, whereas the
server is a large workstation, a mini-range computer system
or a mainframe computer system.
Advantages of Client/Server Database System
•Client/server systems run on less expensive platforms.
•Clients offer icon-based, menu-driven interfaces.
•The client/server environment facilitates better use of
existing data.
•A client/server database system is more flexible than a
centralized system.
•Response time and throughput are improved.
•The server machine can be custom-built for the DBMS function
and can thus provide better DBMS performance.
•The client machine might be a personal workstation,
tailored to the needs of the end users and thus able to provide
better interfaces, high availability, faster responses and overall
improved ease of use.
Disadvantages of Client/Server Database System
•Programming cost is high in client/server environments,
particularly in initial phases.
•Management tools are lacking for diagnosis, performance
monitoring, tuning and security control across the DBMS,
client, operating system and networking environments.
Parallel Processing
Parallel processing divides a large task into many smaller tasks
and executes the smaller tasks concurrently on several nodes.
As a result, the larger task completes more quickly. Some
tasks can be effectively divided and are good candidates for
parallel processing.
Example: tellers in a bank. With several tellers, customers can
be served concurrently instead of waiting in a single queue.
Characteristics of Parallel Processing
A parallel processing system has the following characteristics:
•Each processor in a system can perform tasks concurrently
•Tasks may need to be synchronized
•Nodes usually share resources, such as data, disks, and other
devices
Problems of Parallel Processing
Effective implementation of parallel processing involves two
challenges:
•Structuring tasks so some tasks execute at the same time "in
parallel"
•Preserving task sequencing for tasks that must execute
serially
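The two challenges above can be sketched in a few lines: a large
summation is split into chunks that run concurrently, while the
final combination step stays serial because it needs every
partial result. This is only an illustration; the helper names
chunk and parallel_sum are hypothetical. Threads are used so the
sketch runs anywhere, although a CPU-bound workload in Python
would normally use processes.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, n_chunks):
    """Split items into n_chunks contiguous slices of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, n_workers=4):
    parts = chunk(items, n_workers)
    # Concurrent step: each smaller task sums its own chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    # Serial step: combining must wait until all partial sums exist.
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
print(total)
```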
Parallel Database Architectures
Parallel database systems are capable of handling large
databases by distributing the data among multiple processors,
as evenly as possible, so that queries can be performed in
parallel. Systems that share resources in this way to handle
massive data and increase overall performance are called
Parallel Database Systems.
•Shared Memory Architecture
In the Shared Memory architecture, a single memory is shared
among many processors. Several processors are connected to
the main memory and disk setup through an interconnection
network. The interconnection network is usually a high-speed
network (for example a bus, mesh, or hypercube), which
makes data sharing (transport) easy among the various
components (processor, memory, and disk).
Advantages:
•Simple implementation
•Processors communicate effectively through a single memory
address space.
•As a result, communication overhead is low.
Disadvantages:
•A higher degree of parallelism cannot be achieved, because
all the processors share the same interconnection network to
reach memory.
•If a processor reads data used or modified by other
processors, the system must ensure that it sees the latest
version.
•The degree of parallelism is limited. A larger number of
parallel processes might degrade performance.
•Shared Disk Architecture
In the Shared Disk architecture, a single disk or disk setup is
shared among all the available processors, and each processor
has its own private memory.
Advantages:
•Failure of any one processor does not stop the entire system
(fault tolerance).
•The interconnection to memory is not a bottleneck. (It was a
bottleneck in the Shared Memory architecture.)
•Supports a larger number of processors (compared to the
Shared Memory architecture).
Disadvantages:
•The interconnection to the disk is a bottleneck, as all
processors share a common disk setup.
•Inter-processor communication is slow, because each
processor has its own memory. Communication between
processors requires reading data from another processor’s
memory, which needs additional software support.
•Shared Nothing Architecture
In the Shared Nothing architecture, every processor has its own
memory and disk setup. This setup may be considered a set of
individual computers connected through a high-speed
interconnection network, using regular network protocols and
switches, for example, to share data between computers. (This
architecture is also used in distributed database systems.) In a
Shared Nothing parallel database implementation, the nodes
are required to be similar, that is, homogeneous systems.
Advantages:
•The number of processors is scalable. That is, the design
makes it easy to add more computers.
•Unlike in the other two architectures, only the data requests
that cannot be answered by the local processor need to be
forwarded through the interconnection network.
Disadvantages:
•Non-local disk accesses are costly. If a server receives a
request for data that it does not hold, the request must be
routed to the server where the data is available, which adds
complexity.
•Transporting data among computers incurs a communication
cost.
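The routing behaviour described above can be sketched as
follows. This is a minimal model, not a real system: the Node and
Cluster classes, the byte-sum partitioning rule and the forwarded
counter are all assumptions introduced for illustration.

```python
class Node:
    """One shared-nothing node with private storage (no shared memory or disk)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # stands in for the node's local disk


class Cluster:
    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]
        self.forwarded = 0  # requests sent over the interconnection network

    def owner(self, key):
        # Assumed partitioning rule: byte sum modulo node count. It is
        # deterministic across runs, unlike Python's built-in str hash.
        return self.nodes[sum(key.encode()) % len(self.nodes)]

    def put(self, key, row):
        self.owner(key).rows[key] = row

    def get(self, key, at_node):
        """A request arrives at at_node and is forwarded only if not local."""
        owner = self.owner(key)
        if owner is not at_node:
            self.forwarded += 1
        return owner.rows.get(key)


cluster = Cluster(["n0", "n1", "n2"])
for emp_id, name in [("E101", "Aryan"), ("E102", "Satvik"), ("E103", "Harpartap")]:
    cluster.put(emp_id, name)

entry = cluster.nodes[0]
local = cluster.get("E102", entry)    # stored on n0, answered locally
remote = cluster.get("E101", entry)   # stored on n2, must be forwarded
print(local, remote, cluster.forwarded)
```

Only the second lookup crosses the network, which is the cost
the disadvantages list refers to.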
Distributed Database Management System
A Distributed Database Management System (DDBMS)
consists of a single logical database that is split into a number
of fragments. Each fragment is stored on one or more
computers under the control of a separate DBMS, with the
computers connected by a communications network. Each site
is capable of independently processing user requests that
require access to local data (that is, each site has some degree
of local autonomy) and is also capable of processing data
stored on other computers in the network.
Objectives of Distributed Databases:
1. Local autonomy
Each local site in a distributed system should be autonomous. All
operations at a given local site are controlled by that site.
2. No reliance on a central site
No site in the network relies on a central "master" site for some
central service.
3. Continuous operation
The system is not affected by node failures.
4. Location Independence (Location Transparency)
Users should not have to know where data is physically stored.
5. Fragmentation Independence (Fragmentation
Transparency)
Data fragmentation is transparent to the user. The user does not
need to know the names of the database fragments in order to
retrieve them.
6. Replication Independence (Replication Transparency)
7. Distributed Query Processing
Query optimization is performed transparently by the DDBMS.
8. Distributed Transaction Management
A transaction may update data at several different sites. The transaction
is transparently executed at several different data processor (DP) sites.
9. Hardware Independence
10. Operating System Independence
11. DBMS Independence
Classification of DDBMS
A DDBMS may be classified as homogeneous or
heterogeneous.
•Homogeneous Distributed Database Systems
– All sites use the same DBMS product.
– Much easier to design and manage.
– The approach provides incremental growth and
allows increased performance.
•Heterogeneous Distributed Database Systems
– Sites may run different DBMS products, with
possibly different underlying data models.
– Occurs when sites have implemented their own
databases and integration is considered later.
– Translations are required to allow for different
hardware, different DBMS products, or both.
– The typical solution is to use gateways.
Distributed Database Design
• Data fragmentation:
– How to partition the database into fragments
• Data replication:
– Which fragments to replicate
• Data allocation:
– Where to locate those fragments and replicas
Distributed Databases 30
Data Fragmentation
• Breaks single object into two or more segments or
fragments
• Each fragment can be stored at any site over a
computer network
• Information about data fragmentation is stored in
the distributed data catalog (DDC), from which it is
accessed by the transaction processor (TP) to process
user requests
Data Fragmentation Strategies
• Horizontal fragmentation:
– Division of a relation into subsets (fragments) of tuples
(rows)
• Vertical fragmentation:
– Division of a relation into attribute (column) subsets
• Mixed fragmentation:
– Combination of horizontal and vertical strategies
[Figure: Horizontal and Vertical Fragmentation]
[Figure: Mixed Fragmentation]
A Sample EMPLOYEE Table
EMP_ID  EMP_NAME   DEPT_ID  EMP_SALARY
E101    Aryan      4        55000
E102    Satvik     5        65000
E103    Harpartap  5        50000
E104    Gurvandan  5        40000
E105    Arpit      4        60000
E106    Siddhant   2        50000
This relation can be fragmented into three fragments as follows:
FRAGMENTS EMPLOYEE AS
MUMBAI_EMP AT SITE ‘MUMBAI’ WHERE DEPT_ID = 2
DELHI_EMP AT SITE ‘DELHI’ WHERE DEPT_ID = 4
MUDRAI_EMP AT SITE ‘MUDRAI’ WHERE DEPT_ID = 5
Horizontal Fragmentation:
A horizontal fragment of a relation is a subset of its tuples,
each with all the attributes of that relation. Horizontal
fragmentation splits the relation ‘horizontally’ by assigning
each tuple, or group of tuples, to one or more fragments, where
each tuple or subset has a certain logical meaning. A
horizontal fragment is defined using the SELECT operation of
the relational algebra. Fragmenting EMPLOYEE by DEPT_ID is a
horizontal fragmentation and can be written in terms of
relational algebra as:
MUMBAI_EMP = σ_{DEPT_ID = 2}(EMPLOYEE)
DELHI_EMP = σ_{DEPT_ID = 4}(EMPLOYEE)
MUDRAI_EMP = σ_{DEPT_ID = 5}(EMPLOYEE)
FRAGMENTS : MUMBAI_EMP
EMP_ID  EMP_NAME   DEPT_ID  EMP_SALARY
E106    Siddhant   2        50000
FRAGMENTS : DELHI_EMP
EMP_ID  EMP_NAME   DEPT_ID  EMP_SALARY
E101    Aryan      4        55000
E105    Arpit      4        60000
FRAGMENTS : MUDRAI_EMP
EMP_ID  EMP_NAME   DEPT_ID  EMP_SALARY
E102    Satvik     5        65000
E103    Harpartap  5        50000
E104    Gurvandan  5        40000
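The horizontal fragmentation of EMPLOYEE by DEPT_ID can be
sketched in a few lines. The data and site names come from the
example; the mapping SITE_BY_DEPT and the function
horizontal_fragments are hypothetical names used only here.

```python
EMPLOYEE = [
    ("E101", "Aryan", 4, 55000),
    ("E102", "Satvik", 5, 65000),
    ("E103", "Harpartap", 5, 50000),
    ("E104", "Gurvandan", 5, 40000),
    ("E105", "Arpit", 4, 60000),
    ("E106", "Siddhant", 2, 50000),
]

# Selection predicate per site: DEPT_ID -> site name (from the example).
SITE_BY_DEPT = {2: "MUMBAI", 4: "DELHI", 5: "MUDRAI"}


def horizontal_fragments(relation):
    """Assign every tuple, with all of its attributes, to one site."""
    fragments = {site: [] for site in SITE_BY_DEPT.values()}
    for row in relation:
        dept_id = row[2]
        fragments[SITE_BY_DEPT[dept_id]].append(row)
    return fragments


fragments = horizontal_fragments(EMPLOYEE)

# Reconstruction of a horizontally fragmented relation is a union.
reconstructed = [row for rows in fragments.values() for row in rows]
print(sorted(reconstructed) == sorted(EMPLOYEE))
```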
Vertical Fragmentation:
A vertical fragmentation splits the relation ‘vertically’, by
columns. A vertical fragment of a relation keeps only certain attributes
of the relation at a particular site, because each site may not need all
the attributes. The primary key attributes must be included in every
vertical fragment so that the full relation can be reconstructed from
the fragments.
FRAGMENTS EMPLOYEE AS
MUMBAI_EMP (EMP_ID, EMP_NAME) AT SITE ‘MUMBAI’
DELHI_EMP (EMP_ID, EMP_NAME, DEPT_ID) AT SITE ‘DELHI’
MUDRAI_EMP (EMP_ID, EMP_NAME, DEPT_ID, EMP_SALARY) AT SITE ‘MUDRAI’
Vertical fragmentation can be written in terms of relational algebra as:
MUMBAI_EMP = π_{EMP_ID, EMP_NAME}(EMPLOYEE)
DELHI_EMP = π_{EMP_ID, EMP_NAME, DEPT_ID}(EMPLOYEE)
MUDRAI_EMP = π_{EMP_ID, EMP_NAME, DEPT_ID, EMP_SALARY}(EMPLOYEE)
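FRAGMENTS : MUMBAI_EMP
EMP_ID  EMP_NAME
E101    Aryan
E102    Satvik
E103    Harpartap
E104    Gurvandan
E105    Arpit
E106    Siddhant
FRAGMENTS : DELHI_EMP
EMP_ID  EMP_NAME   DEPT_ID
E101    Aryan      4
E102    Satvik     5
E103    Harpartap  5
E104    Gurvandan  5
E105    Arpit      4
E106    Siddhant   2
FRAGMENTS : MUDRAI_EMP
EMP_ID  EMP_NAME   DEPT_ID  EMP_SALARY
E101    Aryan      4        55000
E102    Satvik     5        65000
E103    Harpartap  5        50000
E104    Gurvandan  5        40000
E105    Arpit      4        60000
E106    Siddhant   2        50000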
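The role of the primary key in vertical fragmentation can be
shown with a small sketch. For brevity it uses an assumed
two-way split (names in one fragment, department and salary in
the other) rather than the three-site split of the example; the
helpers project and join_on_key are hypothetical.

```python
COLUMNS = ("EMP_ID", "EMP_NAME", "DEPT_ID", "EMP_SALARY")
EMPLOYEE = [
    ("E101", "Aryan", 4, 55000),
    ("E102", "Satvik", 5, 65000),
    ("E103", "Harpartap", 5, 50000),
    ("E104", "Gurvandan", 5, 40000),
    ("E105", "Arpit", 4, 60000),
    ("E106", "Siddhant", 2, 50000),
]


def project(relation, columns, wanted):
    """Keep only the wanted attributes of every tuple (PROJECT)."""
    idx = [columns.index(c) for c in wanted]
    return [tuple(row[i] for i in idx) for row in relation]


def join_on_key(left, right):
    """Join two fragments on their first column, the shared primary key."""
    right_by_key = {row[0]: row[1:] for row in right}
    return [row + right_by_key[row[0]] for row in left]


# Every fragment repeats EMP_ID so the fragments can be joined again.
names = project(EMPLOYEE, COLUMNS, ("EMP_ID", "EMP_NAME"))
pay = project(EMPLOYEE, COLUMNS, ("EMP_ID", "DEPT_ID", "EMP_SALARY"))

rebuilt = join_on_key(names, pay)
print(rebuilt == EMPLOYEE)
```

Without EMP_ID in both fragments there would be no attribute to
join on, and the original tuples could not be recovered.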
Mixed Fragmentation
A mixed fragmentation is defined using the selection (SELECT) and
projection (PROJECT) operations of the relational algebra. The
original relation is obtained by a combination of JOIN and UNION
operations.
A mixed fragmentation is given as σ_p(π_{a1, a2, ..., an}(R)).
For example, σ_{DEPT_ID = 5}(π_{EMP_NAME, DEPT_ID}(EMPLOYEE)) gives:
EMP_NAME   DEPT_ID
Satvik     5
Harpartap  5
Gurvandan  5
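A mixed fragment such as the one shown above can be produced by
composing a projection with a selection. The sketch below assumes
the fragment keeps EMP_NAME and DEPT_ID for department 5; the
helpers project and select are hypothetical names.

```python
EMPLOYEE = [
    {"EMP_ID": "E101", "EMP_NAME": "Aryan", "DEPT_ID": 4, "EMP_SALARY": 55000},
    {"EMP_ID": "E102", "EMP_NAME": "Satvik", "DEPT_ID": 5, "EMP_SALARY": 65000},
    {"EMP_ID": "E103", "EMP_NAME": "Harpartap", "DEPT_ID": 5, "EMP_SALARY": 50000},
    {"EMP_ID": "E104", "EMP_NAME": "Gurvandan", "DEPT_ID": 5, "EMP_SALARY": 40000},
    {"EMP_ID": "E105", "EMP_NAME": "Arpit", "DEPT_ID": 4, "EMP_SALARY": 60000},
    {"EMP_ID": "E106", "EMP_NAME": "Siddhant", "DEPT_ID": 2, "EMP_SALARY": 50000},
]


def project(relation, attributes):
    """PROJECT: keep only the listed attributes of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


def select(relation, predicate):
    """SELECT: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Projection first, then selection on the projected tuples.
mixed = select(
    project(EMPLOYEE, ("EMP_NAME", "DEPT_ID")),
    lambda row: row["DEPT_ID"] == 5,
)
for row in mixed:
    print(row["EMP_NAME"], row["DEPT_ID"])
```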
Correctness of Fragmentation
• Three correctness rules:
– Completeness
– Reconstruction
– Disjointness.
• Completeness
– If relation R is decomposed into fragments R1, R2, ...
Rn, each data item that can be found in R must appear
in at least one fragment.
• Reconstruction
– It must be possible to define a relational operation
that will reconstruct R from the fragments.
– Reconstruction uses the Union operation for horizontal
fragmentation and the Join operation for vertical
fragmentation.
• Disjointness
– If data item di appears in fragment Ri, then it
should not appear in any other fragment.
– Exception: vertical fragmentation, where primary
key attributes must be repeated to allow
reconstruction.
– For horizontal fragmentation, a data item is a tuple.
– For vertical fragmentation, a data item is an
attribute.
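For horizontal fragments, the three rules can be checked
mechanically. The following is a minimal sketch under the
assumption that tuples are hashable Python tuples; the function
names is_complete, is_disjoint and reconstructs are hypothetical.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in at least one fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for frag in fragments:
        for row in frag:
            if row in seen:
                return False
            seen.add(row)
    return True


def reconstructs(relation, fragments):
    """Reconstruction: the union of the fragments gives back exactly R."""
    union = set().union(*(set(frag) for frag in fragments))
    return union == set(relation)


R = [("E101", 4), ("E105", 4), ("E106", 2)]
good = [[("E106", 2)], [("E101", 4), ("E105", 4)]]
overlapping = [[("E106", 2), ("E101", 4)], [("E101", 4), ("E105", 4)]]
lossy = [[("E106", 2)], [("E101", 4)]]

print(is_complete(R, good), is_disjoint(good), reconstructs(R, good))
print(is_disjoint(overlapping), is_complete(R, lossy))
```

The overlapping split still reconstructs R but violates
disjointness, while the lossy split fails completeness.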
Data Replication
• Storage of data copies at multiple sites served by a
computer network
• Fragment copies can be stored at several sites to
serve specific information requirements
– Can enhance data availability and response time
– Can help to reduce communication and total query costs
– Imposes additional processing overhead:
• Which copy should be read when a query is submitted?
• All copies must be updated when a write occurs
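The overhead mentioned above can be made concrete with one
possible policy, often called read-one, write-all: a read is
served by any single copy, while a write is applied to every
copy. This is only a sketch of that policy, not how any
particular DDBMS works; the class ReplicatedFragment and its
counter are hypothetical.

```python
class ReplicatedFragment:
    """A fragment stored as identical copies at several sites."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}
        self.copy_updates = 0  # counts the per-copy work caused by writes

    def read(self, key, site):
        # Read-one: any single copy can answer, so the nearest site is used.
        return self.copies[site].get(key)

    def write(self, key, value):
        # Write-all: every copy must be updated to keep the copies identical.
        for copy in self.copies.values():
            copy[key] = value
            self.copy_updates += 1


salary = ReplicatedFragment(["DELHI", "MUMBAI", "MUDRAI"])
salary.write("E101", 55000)

print(salary.read("E101", "DELHI"), salary.read("E101", "MUDRAI"))
print(salary.copy_updates)
```

One logical write turns into one update per site, which is the
price paid for cheaper and more available reads.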
Replication Scenarios
• Fully replicated database:
– Stores multiple copies of each database fragment at multiple sites
– Can be impractical due to amount of overhead
• Partially replicated database:
– Stores multiple copies of some database fragments at multiple sites
– Most DDBMSs are able to handle the partially replicated database well
• Unreplicated database:
– Stores each database fragment at a single site
– No duplicate database fragments
• Database size, usage frequency and costs (performance, overhead,
management) influence the decision to replicate
Data Allocation
• Deciding where to locate data
• Allocation strategies:
– Centralized data allocation
• Entire database is stored at one site
– Partitioned data allocation
• Database is divided into several disjoint parts (fragments)
and stored at several sites
– Replicated data allocation
• Copies of one or more database fragments are stored at
several sites
• Data distribution over a computer network is
achieved through data partition, data replication, or
a combination of both
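The three allocation strategies can be told apart by looking at
where each fragment is placed. The sketch below assumes a
placement is recorded as a mapping from fragment name to the set
of sites holding a copy; the function classify_allocation and
the sample placements are hypothetical.

```python
def classify_allocation(placement):
    """placement maps each fragment name to the set of sites that store it."""
    all_sites = set().union(*placement.values())
    if len(all_sites) == 1:
        return "centralized"   # entire database at one site
    if any(len(sites) > 1 for sites in placement.values()):
        return "replicated"    # at least one fragment has several copies
    return "partitioned"       # fragments spread out, one copy each


centralized = {"MUMBAI_EMP": {"DELHI"}, "DELHI_EMP": {"DELHI"}}
partitioned = {"MUMBAI_EMP": {"MUMBAI"}, "DELHI_EMP": {"DELHI"}}
replicated = {"MUMBAI_EMP": {"MUMBAI", "DELHI"}, "DELHI_EMP": {"DELHI"}}

for plan in (centralized, partitioned, replicated):
    print(classify_allocation(plan))
```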