0% found this document useful (0 votes)

13 views104 pages

Module 2

Uploaded by

samaymistry105

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views104 pages

Module 2

Uploaded by

samaymistry105

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

MM-MCA-14

Advanced Database system

and Data Mining Techniques
Reference Books XP

Reference Reference Name

No
1 Ponniah, Paulraj, Data warehousing fundamentals: a comprehensive guide for IT
professionals, John Wiley & Sons, 2004.
2 Dunham, Margaret H, Data mining: Introductory and advanced topics, Pearson Education
India, 2006.
3 Gupta, Gopal K, Introduction to data mining with case studies, PHI Learning Pvt. Ltd.,
2014.
4 Han, Jiawei, Jian Pei, and Micheline Kamber, Data mining: concepts and techniques,
Second Edition, Elsevier, Morgan Kaufmann, 2011.
5 Ramakrishnan, Raghu, Johannes Gehrke, and Johannes Gehrke, Database management
systems, Vol. 3, McGraw-Hill, 2003.
6 Elmasri, Ramez, and Shamkant B. Navathe, Fundamentals of Database Systems, Pearson
Education, 2008, (2015).
7 Silberschatz, Abraham, Henry F. Korth, and Shashank Sudarshan,Database system
concepts, Vol. 5,McGraw-Hill, 1997.
XP
Web References:

Reference No Reference Name

1 [Link]

2 [Link]

3 [Link]

4 [Link]

mining-guide/

3
Unit 2: Parallel
Database Distributed
Database
Module 2: Architecture and Data Distribution in Distributed and Parallel Databases XP

[Link]. Topics 05

Architecture for Parallel Databases, Types of 1

1
Distributed Databases,
Distributed DBMS Architecture 1
2
Storing Data in a Distributed DBMS. 1
3

ORDBMS: Structured Data Types, Operations on 1

4
Structured Data,

Objects, OIDs and Reference Types, Object oriented 1

5
versus Object relational database.
Self Learning Topics: Mapping OODBMS to ORDBMS
6
XP
Distributed Databases
• Distributed database system is a system where data is stored
across several sites, and each site is typically managed by a
DBMS that can run independent of the other sites.
• A distributed database appears to a user as a single database
but is, in fact, a set of databases stored on multiple computers.
• Computers in such systems are called sites or nodes.
• Nodes are connected through some communication medium
like high-speed network or telephone lines.
• A distributed database management system (DDBMS) is a
centralized software system that manages a distributed
database in a manner as if it were all stored in a single location.
Distributed Databases XP
Distributed Database – User View XP
Distributed DBMS - Reality XP
Transparent Access XP
Features of Distributed Databases XP

• Sharing data:- Users at one site may be able to access data

residing at other sites.
• Availability:- If one site fails in a distributed system, the
remaining sites may be able to continue operating.
• Autonomy:- Each site is able to retain a degree of control over
data that are stored locally.
• Data is physically stored across multiple sites. Data in each site
can be managed by a DBMS independent of the other sites.
• Databases in the collection are logically interrelated with each
other. Often they represent a single logical database.
XP
Features of Distributed Databases

• Various sites co-operate in transparent manner.

– Transparent implies that each user within the system may access all
of the data within the system as if they were a single database.

• System possesses ‘location independence’ i.e. the user is

unaware of where the data is located, it is possible to move the
data from one physical location to another without affecting
the user.
Factors Encouraging DDBMS XP

• Distributed Nature of Organizational Units

• Need for Sharing of Data
• Support for Both OLTP and OLAP
• Database Recovery - Replication of data across different sites
automatically helps in data recovery if database in any site is
damaged.
Advantages of Distributed Databases XP

• Scalability− Expansion of system (to new locations or new units) simply

requires adding new computers and local data to the new site and
finally connecting them to the distributed system, with no interruption
in current functions.
• Improved Processing Power - Instead of one server handling the full
database, we now have a collection of machines handling the same
database.
• More Reliable − In distributed systems, when a component fails, the
functioning of the system continues may be at a reduced performance.
Removal of reliance on a central site.
• Better Response − If data is distributed in an efficient manner, then user
requests can be met from local data itself, thus providing faster
response.
• Lower Communication Cost − In distributed database systems, if data is
located locally where it is mostly used, then the communication costs
for data manipulation can be minimized.
Disadvantages of Distributed Databases XP

• Need for complex and expensive software − DDBMS demands

complex and often expensive software to provide data
transparency and co-ordination across the several sites.
• Processing overhead − Even simple operations may require a
large number of communications and additional calculations to
provide uniformity in data across the sites.
• Data integrity − The need for updating data in multiple sites pose
problems of data integrity.
• Overheads for improper data distribution − Responsiveness of
queries is largely dependent upon proper data distribution.
XP
Types of DDBMS
• Homogeneous distributed database
• Heterogeneous distributed database
Types of DDBMS XP
Homogeneous Distributed Database XP

• The sites use very similar software.

• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other
sites to process user requests.
• The database is accessed through a single interface as if it is a
single database.
Heterogeneous Distributed Database XP

• Different sites use dissimilar schemas and software.

• The system may be composed of a variety of DBMSs like
relational, traditional file system or different versions of
DBMSs.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited
co-operation in processing user requests.
Architecture of DDBMS XP

• Client-server Systems
• Collaborating server Systems
• Middleware Systems
Client-Server Architecture XP

• A client-server system has one or more client processes and

one or more server processes.
• A client process can send a query to any one server process.
• Clients are responsible for user-interface issues and servers
manage data and execute transaction.
• A client process could run on a personal computer and send
queries to a server running on mainframe.
Advantages of Client Server Architecture XP

• It is relatively simple to implement due to its clean separation

of functionality and because the server is centralized.
• Expensive Server machines are not underutilized by dealing
with mundane user-interactions.
• Users can run a graphical user interface that they are familiar
with, rather than the (possibly unfamiliar and unfriendly) user
interface on the server.
Disadvantages of Client Server Architecture XP

• It does not allow a single query to span multiple servers

because the client process would have to be capable of
breaking such a query into appropriate sub queries to be
executed at different sites and then piecing together the
answers to the sub queries.
• The client process would therefore be quite complex, and its
capabilities would begin to overlap with the server;
distinguishing between clients and servers become harder.
Collaborating Server XP

• Designed to run a single query on multiple servers.

• Servers break single query into multiple small queries and the
result is sent to the client.
• Collaborating server architecture has a collection of database
servers. Each server is capable for executing the current
transactions across the databases.
Middleware XP

• The middleware architecture is designed to allow a single query

to span multiple servers, without requiring all database servers to
be capable of managing such multi-site execution strategies.
• Here we need just one database server capable of managing
queries and transactions that spans over multiple servers; the
remaining servers need to handle only local queries and
transaction.
• This special server is a layer of software that coordinates the
execution of queries and transaction across one or more
independent database servers.
• Such software is known as middleware.
• Especially useful when trying to integrate several legacy systems
Middleware XP

• The middleware layer is capable of executing joins & other

relational operations on data obtained from the other servers
but, does not itself maintain any data.
Storing Data in a Distributed DBMS XP

• Fragmentation
▪ Relation is partitioned into several fragments stored in distinct sites
• Replication
▪ System maintains multiple copies of data, stored in different sites, for faster
retrieval and fault tolerance.
• Replication and fragmentation can be combined
▪ Relation is partitioned into several fragments: system maintains several identical
replicas of each such fragment.
Data Fragmentation XP

• The process of dividing the database into a smaller multiple

parts is called as fragmentation.
• These fragments may be stored at different locations.
• The data fragmentation process should be carried out in such
a way that the reconstruction of original database from the
fragments is possible.
• Types of Fragmentation
–Horizontal Fragmentation
–Vertical Fragmentation
–Mixed Fragmentation
XP

Vertical
Fragmentation

Horizontal
Fragmentation
Horizontal Fragmentation XP

• Horizontal fragmentation divides a relation(table)

horizontally into the group of rows to create subsets
of tables.
• The union of the horizontal fragments must be equal
to the original relation.
Vertical Fragmentation XP

• Divides a relation vertically into groups of columns to create

subsets of relations.
• Column wise : the schema for relation r is split into several
smaller schemas
▪ The collection of vertical fragments should be a lossless-join
decomposition.
▪ To ensure this property all schemas must contain a common
candidate key.
▪ A special attribute, the tuple-id attribute may be added to each
schema to serve as a candidate key.
• Example : relation account with following schema employee_info
= (customer_name, branch_name, account_number, balance )
Advantages of Fragmentation XP

• Allows a relation to be split so that tuples are located

where they are most frequently accessed to minimize
data transfer
• Since data is stored close to the site of usage, efficiency
of the database system is increased.
• Local query optimization techniques are sufficient for
most queries since data is locally available.
• Allows parallel processing on fragments of a relation
Data Replication XP

• A relation or fragment of a relation is said to be

replicated if it is stored redundantly in two or more
sites.
• Full replication of a relation is the case where the
relation is stored at all sites.
• Fully redundant databases are those in which every site
contains a copy of the entire database.
• Replication Schemes
– Full replication
– No replication
– Partial replication
Types of Data Replication XP

• Synchronous replication
– In synchronous replication, the replica will be modified immediately
after some changes are made in the relation table. So there is no
difference between original data and replica.

• Asynchronous replication
– In asynchronous replication, the replica will be modified after
commit is fired on to the database.
Advantages of Replication XP

▪ Availability: failure of site containing relation r does

not result in unavailability of r if replicas exist.
▪ Parallelism: queries on r may be processed by several
nodes in parallel.
▪ Reduced data transfer: relation r is available locally at
each site containing a replica of r. In turn it leads to
faster query evaluation.
Disadvantages of Replication XP

▪ Increased cost of updates: each replica of relation r

must be updated.
▪ Increased complexity of concurrency control:
concurrent updates to distinct replicas may lead to
inconsistent data unless special concurrency control
mechanisms are implemented.
▪ Additional requirement of storage space
Parallel vs. Distributed Databases XP

Parallel Systems Distributed Systems

Memory Tightly coupled shared Distributed memory (shared
memory (shared memory, nothing)
shared disk)
Control Global clock present No global clock. Synchronization
more difficult.
Processor Order of Tbps Order of Gbps
interconnection (Bus, mesh, tree, hypercube) (Ethernet, token ring)
Purpose Performance, Scientific Performance (cost & scalability),
computing reliability, availability,
information/ resource sharing
Geographical Nodes located at same location Nodes located at geographically
Location different location
XP

Parallel
Databases
Centralized Database Systems XP

• All the data is maintained at a single site

• Assumed that the processing of individual transactions is
essentially sequential
• Databases more prevalent in organizations
– Parallel databases
– Distributed databases
Parallel Database System XP

• Parallel databases improve processing and input/output speeds

by using multiple CPUs and disks in parallel.
• Improves performance through parallelization of various
operations, such as loading data, building indexes, and
evaluating queries.
• Can have parallel processing on a single
machine(multiprocessor computer having multiple disks
connected by a fast interconnection network)
• Distributed processing usually imply parallel processing (not
vise versa).
Why Parallel Processing? XP
Parallel Systems XP

• A coarse-grain parallel machine consists of a

small number of powerful processors.
• A massively parallel or fine grain parallel
machine utilizes thousands of smaller
processors.
Performance Measures of Database Systems XP

▪ Throughput - the number of tasks that can be completed in a

given time interval.
▪ Response time - the amount of time it takes to complete a single
task from the time it is submitted.

• Goals of parallel processing - Linear Scaleup & Linear

Speedup.
Speed-Up XP

• Running a given task in less time by increasing the degree

of parallelism is called speedup.
OR
• By increasing number of resources for a constant database
size, performance should increase linearly.
▪ Speedup = small system elapsed time (TS)/ large system elapsed time
(TL)
(Speedup = time_single_machine/time_parallel)
▪ Linear speedup is desired.
▪ Elapsed time is the amount of time that passes from the start of an
event to its finish
Speedup XP

Speedup
Scale-Up XP

• Scaleup is the ability to keep the same performance

levels (response time) when both workload
(transactions) and resources (CPU, memory) increase
proportionally.
• Factor m that expresses how much more work can be
done in the same time period by a system n times
larger.
• scaleup = small system small problem elapsed time/ big
system big problem elapsed time
• Scale up is linear if equation equals 1.
Scaleup XP

Scaleup
Factors Limiting Speedup and Scaleup XP

• Speedup and scaleup are often sublinear due to:

– Startup costs: Cost of starting up multiple processes may
dominate computation time, if the degree of parallelism is
high.
– Interference: Processes accessing shared resources
(e.g.,system bus, disks, or locks) compete with each other,
thus spending time waiting on other processes, rather than
performing useful work.
– Skew: Increasing the degree of parallelism increases the
variance in service times of parallelly executing tasks. Overall
execution time determined by slowest of parallelly executing
tasks.
XP
Physical Architectures for Parallel Database Systems
• Shared memory
• Shared disk
• Shared nothing
Shared Memory XP

• Multiple CPUs are attached to an interconnection network and

can access a common region of main memory
• Processors have large memory caches to avoid memory
references. If data is not in cache, shared memory references are
required.
Shared Memory XP

• Advantages
– Data is easily accessible to any processor.
– One processor can send message to other efficiently via memory writes.
• Disadvantages
– Waiting time of processors is increased due to more number of processors.
–System needs to maintain cache coherency, which
becomes overhead with increasing no of processors.
– Not scalable since the interconnection network or memory contention
becomes a bottleneck (Bandwidth problem. memory interference)
• Example - XPRS
[ Cache coherence refers to the problem of keeping the data in
these caches consistent.]
Shared Disk XP

• Each CPU has its own private memory

but can directly access all disks through
an interconnection network
• As each processor has its own memory
bus is not a bottleneck.
• The system built around this system are
called as clusters.
Shared Disk XP

• Advantages
– Fault tolerance - If a processor (or its memory) fails, the
other processors can take over its tasks, since the database
is resident on disks that are accessible from all processors.
• Disadvantage
– Interconnection channel becomes bottleneck; Limited
scalability though more than shared memory
– Communication across processors is slower, since it has to
go through a communication network
Shared Nothing XP

• Each processor has its own local

memory and local disk(s).
• Processors can communicate with
each other through
intercommunication channel (high
speed network and switches).
• A node functions as the server for the
data on the disk or disks that the node
owns.
Shared Nothing XP

• Advantages
– Only queries that access nonlocal disks, have to pass
through the network, so less I/O has to go through a single
interconnection network
– Can be scaled up to thousands of processors without
interference.
– More scalable as the interconnection networks for shared-
nothing systems are usually designed such that their
transmission capacity increases as more nodes are added.
• Disadvantages
– Cost of communication and accessing non-local disk is much
higher.
Hierarchical System XP

• Combines the characteristics of shared-memory, shared-disk, and

shared-nothing architectures. Also called nonuniform memory
architecture (NUMA).
• At the top level, the system consists of nodes that are connected
by an interconnection network and do not share disks or memory
with one another (shared nothing).
• Hierarchically below, each node of the system could be a system
with a few processors of type shared-memory/ shared-disk.
XP
Comparison - Architectures for Parallel Database Systems

• The basic problem with the shared-memory &

shared-disk architectures is interference.
• As more CPUs are added, existing CPUs are slowed
down because of the increased contention for
memory access and network bandwidth.
• Shared-nothing is the best architecture for large
systems.
Data Partitioning XP

• Distributes data over a number of processing elements

• Each processing element is then executed
simultaneously with other processing elements,
thereby creating parallelism
• Can be physical or logical data partitioning
Types of Partitioning XP

Round-Robin Range Partitioning Hash Partitioning

Partitioning
Comparison of Partitioning Techniques XP

• Evaluate how well partitioning techniques

support the following types of data access:
–Scanning the entire relation.
–Locating a tuple associatively – point queries. E.g., r.A
= 25.
–Locating all tuples such that the value of a given
attribute lies within a specified range – range queries.
E.g., 10  r.A < 25.
Round-Robin Partitioning XP

▪ Each record in turn is allocated to a processing element in a clockwise

manner
▪ If there are n processors, the ith tuple is assigned to processor i mod n in
round-robin partitioning.
▪ “Equal partitioning” or “Random-equal partitioning”
▪ Data evenly distributed, hence supports load balance
▪ But data is not grouped semantically
▪ Best suited for sequential scan of entire relation on each query
Hash Partitioning XP

▪ A hash function is used to partition the data

▪ Data is grouped semantically, that is data on the same group share the
same hash value

▪ Good for point queries on partitioning attribute

Range Partitioning XP

▪ Spreads the records based on a given range of the partitioning attribute

▪ Processing records on a specific range can be directed to certain
processors only
▪ Well suited for point and range queries
Comparison between Partitioning Techniques XP

• Round-robin partitioning is efficient for queries that access the entire

set of data.
• If only a subset of the tuples is required, hash partitioning and range
partitioning are better than round-robin partitioning.
• They enables the user to access only those disks that contain
matching tuples.
• Hash partitioning scheme is best suited for point queries based on
the partitioning attribute.
• Range partitioning is well suited for point and range queries.

• [Point queries. • Consider a selection query with a single equality in

the condition:]
Object Database
Systems
XP
Need for Complex Data Types
▪ Relational DBMSs are not good enough for today’s advanced
database applications
▪ RDBMS support a small, fixed collection of data types (e.g. int,
date, string), which are adequate for traditional application
domains such as administrative data processing.
▪ In many application domains, more complex kind of data must be
handled. Such complex data has to be stored in OS files or
specialized data structures, rather than in a DBMS. Examples of
domains with complex data are CAD/CAM, multimedia
repositories, document management, etc
▪ To support such applications, a DBMS must support complex data
types.
XP
Need for Complex Data Types
▪ E.g. addresses, an entire address can be viewed as an
atomic data item of string type, but this will hide details
such as street, city, zip code which can be of interest to
queries. If they are represented by breaking into parts then
queries will be complicated. Better alternative is to allow
structured data type, allow a type address with subparts
city, street etc.
▪ Multivalued attributes: solution to it is creating a new
relation but it is expensive
XP
Advanced Database Applications
▪ Computer-Aided Design (CAD)
▪ Computer-Aided Manufacturing (CAM)
▪ Computer-Aided Software Engineering (CASE)
▪ Multimedia applications: audio, graphic, video data need to handle.
▪ Real time control systems
▪ Network Management Systems
▪ Geographic Information Systems (GIS)
▪ Office Information Systems (OIS) and Multimedia Systems
▪ Digital Publishing
▪ Interactive and Dynamic Web sites
▪ Other applications with complex and interrelated objects and procedural data.
XP
Weaknesses of RDBMSs
▪ Poor Representation of "Real World" Entities
▪ Limitation in Encapsulating Data (Structure) with Operations
(Behavior)
▪ Limitation in Dealing with Composition, Aggregation,
Generalization-Specialization.
▪ Homogeneous data structure and Limited operations.
▪ Normalization sometimes lead to relations which do not exist, or
correspond, to entities in the real world. This compounds on the
‘join’ feature of query processing.
▪ The many to many relationship is difficult to express.
XP
Object Database Systems
▪ Object-oriented concepts have been evolved: to overcome
the limitations of RDBMS and to handle complex types of
applications which led to the development of object-
database systems.

▪ Object database systems have developed along 2 distinct

paths:
▪ Object-oriented database systems
▪ Object-relational database systems
XP
Evolution of Database Management Systems
▪ 1st Generation
▪ Hierarchical
▪ Network

▪ 2nd Generation
▪ Relational Database

▪ 3rd Generation
▪ Object Oriented
Database
▪ Object-Relational
Database
ORDBMS
RDBMS - ORDBMS XP
XP
ORDBMS
▪ An object-relational database is a database management system
similar to a relational database, but with an object-oriented
database model: objects, classes and inheritance are directly
supported in database schemas as well as within the query
language. Ex. Oracle, PostgreSQL
▪ Features of ORDBMS –
▪ User Data Types (Abstract Data Type)
▪ Inheritance
▪ Encapsulation
XP
User Data Types (Abstract/ Structured Data Type)
• ORDBMSs allow users to define datatypes, functions and operators.
• Object Type

create table rectangle (

length side;
breadth side;
);
XP
Structured Data Types
• use type constructors to generate new types
• Collection types
– set(T): multiset – array(T), T[][] – list(T)
• Row types (composite type)
– row (Col1 T1, ..., Colk Tk) – Named row type
• Reference Types – Ref(T)
• All first-class types!
Object Type XP

• Object types are abstractions of the real-world entities—for example -

customers, purchase orders—that application programs deal with. An
object type is a schema object with three kinds of components:
– A name, which serves to identify the object type uniquely within that
schema
– Attributes, which model the structure and state of the real-world
entity. Attributes are built-in types or other user-defined types.
– Methods, which are functions or procedures written in PL/SQL or Java
and stored in the database, or written in a language such as C and
stored externally. Methods implement operations the application can
perform on the real-world entity.
• An object type is a template(Can’t store data in it) . A structured data unit
that matches the template is called an object.
XP
Abstract Data Type
▪ Combination of an atomic data type & its associated methods is
called an abstract data type ADT.
▪ Traditional SQL has Built-in ADTs like int, float, string, etc.
▪ These types have simple methods associated with them
(arithmetic, comparison, equality, LIKE, etc.)
▪ ORDBMS allows the user to create their own ADTs if a type
cannot be naturally defined in terms of the built-in types.
▪ The label abstract is applied to these types because the DBMS
doesn’t need to know how an ADTs data is stored or how
methods work.
▪ It merely needs to know what methods are available & I/P & O/P
types for the methods.
Abstract Data Type XP

• Underneath the object layer, data is still stored in

columns and tables, but you can work with the data in
terms of the real-world entities. Ex – customer,
purchase order
• A client-side application can request a purchase order
from the server and receive all the relevant data in a
single transmission. The application can then, without
knowing storage locations or implementation details,
navigate among related data items without further
transmissions from the server.
XP
Encapsulation XP

• Hiding ADT internals is called as encapsulation (bind function + data)

• Methods are procedures or functions can be encapsulated within the
defined object so that applications can use to perform operations on
the attributes of the object type.
• Methods are optional. They define the behavior of objects of that type.
Inheritance XP

• In all ORDBMS, inheritance of user defined types is supported to

derive new sub-types or sub-classes which would therefore inherit
all attributes and methods from the parent type.

shape_tp

eshape_tp
Inheritance XP

▪ In object-database systems, inheritance is supported directly and allows

type definitions to be reused and refined very easily.
▪ It can be very helpful when defining similar but slightly different classes.
▪ Inheritance allows us to capture ‘specialization’ explicitly in the database
design.
▪ Inheritance creates explicit relationship in the database between the
sub-type and the super-type
▪ CREATE TYPE sub_type UNDER super_type ;
▪ Sub-type inherits attributes & methods of super-type
▪ Methods defined on super-type apply to objects of type sub-type, but
not vice versa.
Inheritance XP

▪ An object of the subtype is also considered to be an object of the

supertype.
▪ This means that any operations that apply to the supertype also apply
to the subtype. This is expressed in the following principle:
▪ The substitution Principle : Given a supertype A and a subtype B, it is
always possible to substitute an object of type B into a legal
expression written for objects of type A, without producing type
errors.
▪ This enables the code reuse; as queries & methods written for the
supertype can be applied to the subtype without modification.
Binding Methods XP

▪ Registering a new method with the same name as an old

method is called overloading the method name.
▪ Because of overloading, the system must understand which
method is intended in a particular expression.
▪ The process of deciding which method to invoke is called binding
the method to the object.
▪ Early Binding: Binding is done when an expression is parsed
▪ Late Binding: Binding is done at runtime
Object Identifier (OID) XP

• System generated identifier which is assigned, when a new

object is created.
• Every object has unique identity. In an object oriented system,
when object is created OID is assigned to it.
• Properties of OID
– Uniqueness: OID cannot be same to every object in the
system and it is generated automatically by the system.
– Invariant: OID cannot be changed throughout its entire
lifetime.
– Invisible: OID is not visible to user.
Object Identifier (OID) XP

• Oracle assigns to each row object a unique system-generated OID, 16

bytes in length, that is automatically indexed for efficient OID-based
lookups.
• The OID column is the equivalent of having an extra 16-byte primary
key column.
• The object identifier column is a hidden column that Oracle uses to
construct references to the row objects.
▪ OID value should be used only once, that is even the object is
removed from the database its OID should not be assigned to another
object.
▪ OID cannot be modified by the user.
▪ OID s are not dependent on physical location of data.
URLs vs OIDs XP

• OIDs uniquely identify a single object over all time, whereas

web resource pointed at by URL can change over time
• OIDs are simply identifiers & carry no physical information
about the objects they identify-storage location of object can be
changed without changing pointers to object. URLs include
network addresses & file system names, hence resource can’t
be moved.
• OIDs are automatically generated by DBMS for each object.
URLs are user generated.
• Deletion for URLs create problems- ‘404 page not found error’.
OID deletion handled by system (as per referential integrity)
XP

REFs
▪ Built-in data type that stores value of OID.
▪ A REF is a logical “pointer” to a row object which is a system built-
in datatype.
▪ You can use a REF to examine or update the object it refers to.
▪ The REF function in a SQL statement takes as an argument a
correlation name for an object table or view and returns a
reference (a REF) to an object instance from that table or view.
SELECT REF(a)
FROM address_tab a;
REF(A)
000028020928C64A64E9C54BB1BB42DB7BB1431B893E2BEE81A8084FC393AC13EA8856534E02
C01AA20000
0000280209B2B5E415A13F4EF8A3352F686E1C4DCE3E2BEE81A8084FC393AC13EA8856534E02
C01AA20001
REFs XP

▪ In the relational model, foreign keys express many-to-one

relationships.
▪ ORDBMS allows relationships between two entities to be modeled
in the database via the concept of a reference.
▪ Oracle provides REF to encapsulate references to row objects of a
specified object type. From a modelling perspective, REFs provide
the ability to capture an association between two row objects.
Dangling REFs XP

▪ It is possible for the object identified by a REF to become

unavailable through either deletion of the object or a change in
privileges. Such a REF is called dangling.
▪ Oracle SQL provides a predicate (called IS DANGLING) to allow
testing REFs for this condition.
Dereference REFs XP

▪ Accessing the object referred to by a REF is called

dereferencing the REF. Oracle provides the DEREF operator to
do this. Dereferencing a dangling REF results in a null object.
▪ The DEREF function in a SQL statement returns the object
instance corresponding to a REF.

SELECT deref(REF(a))
FROM address_tab a
DEREF(REF(A))(ADDRESS1, ADDRESS2, CITY, PINCODE, STATE)
ADDRESS_TY('Sea Woods Apt', 'Malad', 'Mumbai', 400101, 'Maharashtra')
ADDRESS_TY('river Woods Apt', 'Malad', 'Mumbai', 400101, 'Rajasthan')
XP
Advantages of ORDBMSs

▪ Enriched modelling capabilities

▪ Extensibility
▪ Reuse and Sharing
▪ Support for schema evolution
▪ Applicable for advanced database applications
▪ Improved performance
Disadvantages of ORDBMSs XP

▪ Complexity
▪ Increased costs
▪ Supporters of relational approach believe simplicity
and purity of relational model are lost
▪ Some believe RDBMS is being extended for what will
be a minority of applications
▪ SQL now extremely complex
XP
RDBMS Vs ORDBMS
• In object oriented model, the object has both state and behavior,
while with the relational model only the state is evidenced.
• Relational database is made up of relations, while an object-
oriented database is made up of classes
• In ORDBMS class hierarchy includes inheritance, while RDBMS
includes external keys
• In the relational model there is a primary key that refer the related
data, while object oriented model there is an object identifier
(OID) that refer the inherited data.
• RDBMS is easy to use as it has fewer features
• RDBMS has simple data model which makes it easier to optimize
queries for efficient execution.
• RDBMS is less versatile than an ORDBMS
End of Unit 1
Data is a precious thing and
will last longer
than the systems themselve
- Tim Berners-Lee
Practical (Defining Object Type ) XP

CREATE TYPE<object_type_name> AS CREATE TYPE address_ty AS OBJECT (

OBJECT street VARCHAR2(30),
( city VARCHAR2(20),
<attribute_l><datatype>, state CHAR(2),
. postal_code VARCHAR2(6) );
);
Practical (Creating Table) XP

CREATE TYPE address_ty AS OBJECT ( CREATE TABLE employee (

street VARCHAR2(30), employee_id NUMBER(6),
city VARCHAR2(20), person person_ty;
state CHAR(2),
);
postal_code VARCHAR2(6) );

CREATE TYPE person_ty AS OBJECT(

name VARCHAR2(20)
address address_ty) );

▪ Data cannot be inserted in addess_ty or peson_ty as they

Practical (Inserting Values) XP

• Insert incorporate calls to the constructors for object

types, to create instances of the types.

insert into employee values(101,person_ty('Rita,

address_ty(‘Thakur Marg', 'Mumbai', 'Maharashtra’,
400092));
Practical (Querying) XP

• Select * from employees;

Will display in format
employee person_ty (name, address_ty (street, city , state,
_id postal_code)
101 person_ty('Rita, address_ty(‘Thakur Marg', 'Mumbai',
'Maharashtra’, 400092))
Practical (Querying specific columns) XP

• SET DESCRIBE [DEPTH n | ALL]

• Sets the depth of the level to which you can recursively describe
an object.

• Describe employee Name Null? Type

EMPLOYEE_ID NUMBER
PERSON PERSON_TY

• set describe depth 3; Name Null? Type

EMPLOYEE_ID NUMBER
describe employee PERSON PERSON_TY NAME VARCHAR2(25)
ADDRESS ADDRESS_TY STREET VARCHAR2(30)
CITY VARCHAR2(20)
PINCODE NUMBER(8)
STATE VARCHAR2(15)
Practical XP

• To show the details of column_name, data_type for the table

you created.
SELECT COLUMN_NAME, DATA_TYPE FROM
USER_TAB_COLUMNS WHERE TABLE_NAME= ‘employee’;

• To show the details (attr_name,length,attr_type) of the

attributes for the ADT you have created.
SELECT ATTR_NAME, LENGTH, ATTR_TYPE_NAME FROM
USER_TYPE_ATTRS WHERE TYPE_NAME = ‘person_ty'
Sample questions XP

– Distributed & centralized database

– Parallel & distributed databases
– Synchronous and asynchronous replication
– Homogeneous and heterogeneous DDBMS
– Explain architecture of parallel DBMS. With real time example based on it. ---2016-KT, 2015-Rev
– Define speed up and scale up? (2 important issues related to parallel machines)
– Why shared memory and shared disk approaches suffer from the problem of interference?
Discuss speed up and scale up of shared nothing architecture.
– What are advantages and disadvantages of replication and fragmentation?
– RDBMS Vs ORDBMS
– OODBMS Vs ORDBMS
– What are various complex data types available in ORDBMS? Explain with suitable example
– Types of Data partitioning or Data Partitioned parallelism

Unit 4 Distributed DBMS by ANS
No ratings yet
Unit 4 Distributed DBMS by ANS
12 pages
Distributed DBMS Architecture
No ratings yet
Distributed DBMS Architecture
49 pages
Distributed Database Management Systems
No ratings yet
Distributed Database Management Systems
23 pages
Advanced Data Base Management Systems
No ratings yet
Advanced Data Base Management Systems
35 pages
Overview of Distributed Databases
No ratings yet
Overview of Distributed Databases
16 pages
Topic 7 DDBMS
No ratings yet
Topic 7 DDBMS
28 pages
Distributed Database Management Systems
No ratings yet
Distributed Database Management Systems
73 pages
Lecture3-Distributed Introduction
No ratings yet
Lecture3-Distributed Introduction
38 pages
Ddbms Unit 1 Part1
No ratings yet
Ddbms Unit 1 Part1
23 pages
Parallel & Distributed DBMS Guide
No ratings yet
Parallel & Distributed DBMS Guide
58 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
19 pages
ADT Unit 1 To 5
No ratings yet
ADT Unit 1 To 5
160 pages
Unit-Iii Distributed Database: System
No ratings yet
Unit-Iii Distributed Database: System
55 pages
Chapter 6 Distributed System Management
No ratings yet
Chapter 6 Distributed System Management
12 pages
Distributed Database Systems Guide
No ratings yet
Distributed Database Systems Guide
24 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
39 pages
Distributeddbms Er. Inderjeet Bal
No ratings yet
Distributeddbms Er. Inderjeet Bal
60 pages
Chapter - 6 Distributed Database System
No ratings yet
Chapter - 6 Distributed Database System
50 pages
1 Distributed DB
No ratings yet
1 Distributed DB
67 pages
ADBMS
No ratings yet
ADBMS
84 pages
DDB-distribution Database Important.
No ratings yet
DDB-distribution Database Important.
15 pages
Chapter 7
No ratings yet
Chapter 7
22 pages
Distributed Database Recovery Methods
No ratings yet
Distributed Database Recovery Methods
58 pages
Distribution Database
No ratings yet
Distribution Database
52 pages
Unit-1 Introduction To DDBMS
No ratings yet
Unit-1 Introduction To DDBMS
50 pages
Distributed Database Systems Guide
0% (1)
Distributed Database Systems Guide
54 pages
Understanding Database Administration Roles
No ratings yet
Understanding Database Administration Roles
4 pages
Distributed Database Management Guide
No ratings yet
Distributed Database Management Guide
55 pages
Distributed Database Systems Guide
No ratings yet
Distributed Database Systems Guide
22 pages
Distributed Databases
No ratings yet
Distributed Databases
46 pages
Distributed Database System
No ratings yet
Distributed Database System
9 pages
Database II: Distributed Databases
No ratings yet
Database II: Distributed Databases
15 pages
Distributed
No ratings yet
Distributed
83 pages
Distributed Database Management Systems
No ratings yet
Distributed Database Management Systems
33 pages
Topic 7 - Distributed Database Systems
No ratings yet
Topic 7 - Distributed Database Systems
44 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
9 pages
10-Distributed Databases Lecturer 3 Best
No ratings yet
10-Distributed Databases Lecturer 3 Best
55 pages
Distributed Systems
No ratings yet
Distributed Systems
25 pages
Chapter 4 Distributed Database Systems
No ratings yet
Chapter 4 Distributed Database Systems
69 pages
Distributed Database Design
88% (8)
Distributed Database Design
85 pages
Distributed Database Systems Guide
No ratings yet
Distributed Database Systems Guide
32 pages
Distributed Database: Source
No ratings yet
Distributed Database: Source
19 pages
Adbms Chapter 7 Ddbms
No ratings yet
Adbms Chapter 7 Ddbms
73 pages
Types of Distributed Data Base System - 49724
No ratings yet
Types of Distributed Data Base System - 49724
37 pages
Distributed DBM S
No ratings yet
Distributed DBM S
67 pages
Distributed Database Management Systems For Information Management and Access
No ratings yet
Distributed Database Management Systems For Information Management and Access
6 pages
Midterm Elective Database Notes
No ratings yet
Midterm Elective Database Notes
14 pages
13-Distributed Databases
No ratings yet
13-Distributed Databases
12 pages
Distributed Database Systems Guide
No ratings yet
Distributed Database Systems Guide
58 pages
Distributed Database Design
100% (3)
Distributed Database Design
86 pages
Parallal Databases
No ratings yet
Parallal Databases
4 pages
Distributed Database Management System
No ratings yet
Distributed Database Management System
36 pages
Unit-V Distributed and Client Server Databases: A Lalitha Associate Professor Avinash Degree College
No ratings yet
Unit-V Distributed and Client Server Databases: A Lalitha Associate Professor Avinash Degree College
24 pages
MC4202 - Adavanced Database Technology
No ratings yet
MC4202 - Adavanced Database Technology
159 pages
Topic 5
No ratings yet
Topic 5
14 pages
Distributed Databases
No ratings yet
Distributed Databases
55 pages
Fernanda Carrera Matteo Alem Das Palavras Familia Fontana Livro
100% (1)
Fernanda Carrera Matteo Alem Das Palavras Familia Fontana Livro
2 pages
ASTM D891 Gravedad Específica de Líquidos Industriales
No ratings yet
ASTM D891 Gravedad Específica de Líquidos Industriales
4 pages
Wittur Eco Plus Drive - Manual
No ratings yet
Wittur Eco Plus Drive - Manual
80 pages
MD 04 MT RSSTA PD 025 Rev.05 Bolted Flange Joints Management
No ratings yet
MD 04 MT RSSTA PD 025 Rev.05 Bolted Flange Joints Management
46 pages
KAT-B 3717 MOK Edition2 01-06-2015 EN
No ratings yet
KAT-B 3717 MOK Edition2 01-06-2015 EN
11 pages
Maths Curriculum Year 7 2020 2021
100% (1)
Maths Curriculum Year 7 2020 2021
1 page
Sanskrit Hi
No ratings yet
Sanskrit Hi
11 pages
NOVEL DRUG DELIVERY SYSTEM Unit Wise Questions
100% (1)
NOVEL DRUG DELIVERY SYSTEM Unit Wise Questions
4 pages
KPMG UK - Quantifying Risk Culture
No ratings yet
KPMG UK - Quantifying Risk Culture
5 pages
Turkish Straits Passage Guidelines
No ratings yet
Turkish Straits Passage Guidelines
4 pages
Identifying Audit Universe of Auditable Activities - Executive Summar
No ratings yet
Identifying Audit Universe of Auditable Activities - Executive Summar
11 pages
Extrusion and Oilseed Processing: Expanders To Prepare Solvent Extraction
No ratings yet
Extrusion and Oilseed Processing: Expanders To Prepare Solvent Extraction
6 pages
Establishing a New Cashew Farm Guide
No ratings yet
Establishing a New Cashew Farm Guide
30 pages
hs486 Document OSHA Recording Work-Related Injuries and Illnesses
No ratings yet
hs486 Document OSHA Recording Work-Related Injuries and Illnesses
13 pages
Daily Pre-Start Safety Checklist
100% (1)
Daily Pre-Start Safety Checklist
10 pages
Schneider Electric SpaceLogic-KNX MTN6513-1203
No ratings yet
Schneider Electric SpaceLogic-KNX MTN6513-1203
7 pages
As 3891.2-2008 Air Navigation - Cables and Their Supporting Structures - Marking and Safety Requirements Mark
0% (2)
As 3891.2-2008 Air Navigation - Cables and Their Supporting Structures - Marking and Safety Requirements Mark
7 pages
VRLA Battery Equalization: Tests & Results
No ratings yet
VRLA Battery Equalization: Tests & Results
8 pages
Iit Research Papers+Computer Science
No ratings yet
Iit Research Papers+Computer Science
5 pages
Envicool Cabinet Air Conditioner Service Manual
100% (3)
Envicool Cabinet Air Conditioner Service Manual
22 pages
2059 s14 Ms 1 PDF
No ratings yet
2059 s14 Ms 1 PDF
11 pages
पारंगत प्रश्न पत्र-2020 व 2021
No ratings yet
पारंगत प्रश्न पत्र-2020 व 2021
8 pages
How to Create Client Subscriptions
No ratings yet
How to Create Client Subscriptions
9 pages
Klausula CyberIT
No ratings yet
Klausula CyberIT
1 page
Copperplate Guidesheets (Curious Medium)
No ratings yet
Copperplate Guidesheets (Curious Medium)
28 pages
Civil War Overview: Key Events & Figures
No ratings yet
Civil War Overview: Key Events & Figures
26 pages
Union Engineering CO2 Recovery RBU Plants
No ratings yet
Union Engineering CO2 Recovery RBU Plants
2 pages
Scissor Jack Report
77% (22)
Scissor Jack Report
26 pages
Pages From MAS FHR With Cabinets & Accessories - Beddown Project Rev01
No ratings yet
Pages From MAS FHR With Cabinets & Accessories - Beddown Project Rev01
2 pages
Key Management Functions & Strategies
No ratings yet
Key Management Functions & Strategies
2 pages

Module 2

Uploaded by

Module 2

Uploaded by

MM-MCA-14

Advanced Database system

Reference Reference Name

Reference No Reference Name

Architecture for Parallel Databases, Types of 1

ORDBMS: Structured Data Types, Operations on 1

Objects, OIDs and Reference Types, Object oriented 1

• Sharing data:- Users at one site may be able to access data

• Various sites co-operate in transparent manner.

• System possesses ‘location independence’ i.e. the user is

• Distributed Nature of Organizational Units

• Scalability− Expansion of system (to new locations or new units) simply

• Need for complex and expensive software − DDBMS demands

• The sites use very similar software.

• Different sites use dissimilar schemas and software.

• A client-server system has one or more client processes and

• It is relatively simple to implement due to its clean separation

• It does not allow a single query to span multiple servers

• Designed to run a single query on multiple servers.

• The middleware architecture is designed to allow a single query

• The middleware layer is capable of executing joins & other

• The process of dividing the database into a smaller multiple

• Horizontal fragmentation divides a relation(table)

• Divides a relation vertically into groups of columns to create

• Allows a relation to be split so that tuples are located

• A relation or fragment of a relation is said to be

▪ Availability: failure of site containing relation r does

▪ Increased cost of updates: each replica of relation r

Parallel Systems Distributed Systems

• All the data is maintained at a single site

• Parallel databases improve processing and input/output speeds

• A coarse-grain parallel machine consists of a

▪ Throughput - the number of tasks that can be completed in a

• Goals of parallel processing - Linear Scaleup & Linear

• Running a given task in less time by increasing the degree

• Scaleup is the ability to keep the same performance

• Speedup and scaleup are often sublinear due to:

• Multiple CPUs are attached to an interconnection network and

• Each CPU has its own private memory

• Each processor has its own local

• Combines the characteristics of shared-memory, shared-disk, and

• The basic problem with the shared-memory &

• Distributes data over a number of processing elements

Round-Robin Range Partitioning Hash Partitioning

• Evaluate how well partitioning techniques

▪ Each record in turn is allocated to a processing element in a clockwise

▪ A hash function is used to partition the data

▪ Good for point queries on partitioning attribute

▪ Spreads the records based on a given range of the partitioning attribute

• Round-robin partitioning is efficient for queries that access the entire

• [Point queries. • Consider a selection query with a single equality in

▪ Object database systems have developed along 2 distinct

create table rectangle (

• Object types are abstractions of the real-world entities—for example -

• Underneath the object layer, data is still stored in

• Hiding ADT internals is called as encapsulation (bind function + data)

• In all ORDBMS, inheritance of user defined types is supported to

▪ In object-database systems, inheritance is supported directly and allows

▪ An object of the subtype is also considered to be an object of the

▪ Registering a new method with the same name as an old

• System generated identifier which is assigned, when a new

• Oracle assigns to each row object a unique system-generated OID, 16

• OIDs uniquely identify a single object over all time, whereas

▪ In the relational model, foreign keys express many-to-one

▪ It is possible for the object identified by a REF to become

▪ Accessing the object referred to by a REF is called

▪ Enriched modelling capabilities

CREATE TYPE<object_type_name> AS CREATE TYPE address_ty AS OBJECT (

CREATE TYPE address_ty AS OBJECT ( CREATE TABLE employee (

CREATE TYPE person_ty AS OBJECT(

▪ Data cannot be inserted in addess_ty or peson_ty as they

• Insert incorporate calls to the constructors for object

insert into employee values(101,person_ty('Rita,

• Select * from employees;

select e.employee_id “Employee Id" ,[Link]

• SET DESCRIBE [DEPTH n | ALL]