Unit V

Uploaded by

aryakadam348

Database Management System

DEPARTMENT OF INFORMATION TECHNOLOGY

Database Management Systems 1

Unit V

Advanced techniques, Databases and applications

Syllabus

 Advanced techniques, Databases and applications. Database Architecture (2): Centralized, Client-Server, Parallel, Distributed and Database Connectivity
 Decision Support Systems (1): Data Warehousing, Data Mining and Knowledge Discovery, BI
 Big Data & NoSQL (3): Big Data Analytics: Introduction, Application, Challenges, Hadoop, XML, JSON, Structured vs Unstructured Databases
 NoSQL (1): CAP Theorem and BASE Properties, NoSQL Databases

Database Architecture

 The following are the different database architectures:

• Centralized
• Client-Server
• Parallel
• Distributed
• Cloud Databases

Centralized Database Architecture
 A centralized database is stored at a single location, such as a mainframe computer.
 It is maintained and modified from that location only.
 It is accessed over a network connection such as a LAN or WAN.
 General-purpose computer system: one to a few CPUs and a number of device controllers, connected through a common bus that provides access to shared memory.

Fig: 1 Centralized Database Architecture

Centralized Database Architecture
(Contd.)
 Single-user system (e.g., personal computer or workstation): a desktop unit for a single user, usually with only one CPU and one or two hard disks; the OS may support only one user.
 Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. It serves a large number of users who are connected to the system via terminals. Such systems are often called server systems.
 Centralized databases are used by organizations such as colleges, companies and banks.

Advantages
 Data integrity is maintained, as the whole database is stored at a single physical location.
 Data redundancy is minimal, as all the data is stored together and not scattered across different locations.
 The centralized database is more secure, as all data is available at a single location only.
 Data is easily portable because it is stored in one place.
 The centralized database is cheaper than other types of databases, as it requires less power and maintenance.

Disadvantages
 Searching for data can be time consuming, as all the data is held at one central location.
 There is a lot of data access traffic for the centralized database. This may create a bottleneck.
 If no database recovery measures are in place and a system failure occurs, all the data in the database may be lost.

Client-Server Systems
 The client-server architecture of a database system has two logical components: the client and the server.
 Clients are generally personal computers or workstations, whereas the server is a large workstation, a mid-range computer system, or a mainframe computer system.
 The applications and tools of the DBMS run on one or more client platforms, while the DBMS software resides on the server.

Fig: 2 Client Server Architecture

Client-Server Systems (contd…)

 The server computer is called the back end; it manages access structures, query evaluation and optimization, concurrency control and recovery.
 The client computer is called the front end; it consists of tools such as forms, report writers, and graphical user interface facilities.
 The interface between the front end and the back end is through SQL or through an application program interface.
 The above concept is depicted in the figure.

Fig: 3 Example of Client Server Architecture

Advantages
 Better functionality for the cost
 Flexibility in locating resources and expanding facilities
 Better user interfaces
 Easier maintenance

Disadvantages
 Programming cost is high in client/server environments,
particularly in initial phases.
 There is a lack of management tools for diagnosis,
performance monitoring and tuning and security control, for
the DBMS, client and operating systems and networking
environments.


Parallel Database System
 It has become impractical to manage voluminous data using a centralized system, as even the simplest query can consume a lot of time.
 The solution to this problem is a Parallel Database System.
 In this system, the database is distributed among multiple processors so that queries can be performed in parallel.
 It uses shared resources to handle massive data and so increase the performance of the whole system.
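The idea of distributing data among processors and running a query in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` data, the hash partitioning scheme and the function names are hypothetical, and Python threads stand in for the separate processors of a real parallel DBMS.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, n_partitions, key):
    """Assign each row to a partition by hashing its key (assumed scheme)."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def local_sum(partition, column):
    """Work done by one processor on its own partition only."""
    return sum(row[column] for row in partition)


def parallel_sum(rows, column, n_partitions=4):
    """Evaluate SUM(column) by combining per-partition partial sums."""
    partitions = hash_partition(rows, n_partitions, key="id")
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(lambda p: local_sum(p, column), partitions))
    return sum(partials)


# Hypothetical table of 100 orders.
orders = [{"id": i, "amount": i * 10} for i in range(1, 101)]
total = parallel_sum(orders, "amount")
print(total)
```

Each partition is scanned independently, and only the small partial results are combined at the end.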

Types of Parallel Database System
 The following figures show the various parallel database systems, which can handle data through distribution and parallel query execution.
• Shared Memory Architecture
• Shared Disk Architecture
• Shared Nothing Architecture
• Hierarchical System Architecture

Shared Memory Database System
Architecture
 A single memory is shared among many processors, as shown in the figure.
 Several processors are connected through an interconnection network to the main memory and disk setup.
 The interconnection network is usually a high-speed network (may be a bus, mesh, or hypercube), which makes data sharing (transporting) easy among the various components (processor, memory, and disk).

Fig: 4 Shared Memory Database System Architecture

Shared Disk Database System
Architecture
 In the Shared Disk architecture, a single disk or single disk setup is shared among all the available processors, and each processor has its own private memory, as shown in the figure.

Fig: 5 Shared Disk Database System Architecture

Shared Nothing Database System
Architecture
 Every processor has its own memory and disk setup.
 This setup may be considered a set of individual computers connected through a high-speed interconnection network, using regular network protocols and switches, for example, to share data between computers.

Fig: 6 Shared Nothing Database System Architecture

Hierarchical Database System
Architecture
 It is a combination of all the above Database System architectures.
 So it has all the features of the Shared Memory, Shared Disk and Shared Nothing architectures.
 In the figure, the shared nothing architecture is used at the top level, where nodes are connected by an interconnection network and do not share any kind of memory or disk.
 Some nodes of the system use the Shared Memory approach with a few processors.
 Some other nodes use the Shared Disk architecture.

Fig: 7 Hierarchical Database System Architecture

Advantages
 The Hierarchical Database System is more flexible, as it combines the Shared Disk and Shared Memory architectures.
 Greater performance: because it combines Shared Disk and Shared Memory, data may be accessed directly from memory rather than through disk, which speeds up the system.
 Less costly, as it uses a shared disk system, and the money saved on disks can be used for other system enhancements.
 The system is more persistent than other systems.

Distributed Database System

 It is a collection of multiple, logically interrelated databases distributed over a computer network.
 Logically, it looks like this:

Distributed Database System: Advantages

 Increased Reliability and Availability
• Reliability is defined as the probability that a system is running at a certain point in time, whereas availability is defined as the probability that the system is continuously available during a time interval.
 Easier Expansion
• In a distributed environment, expanding the system by adding more data, increasing database sizes, or adding more processors is much easier.
 Improved Performance
• Inter-query parallelism is achieved by executing multiple queries at different sites, and intra-query parallelism by breaking up a query into a number of sub-queries that execute in parallel; both improve performance.

Disadvantages
 A distributed database is quite complex.
 It is more expensive, as its complexity makes it difficult to maintain.
 As it is a distributed system, the database must be made more secure, and each and every node must be secured as well.
 Data integrity is harder to maintain and data redundancy is harder to control.

Decision Support System
 A DSS is a system that helps managers
• to optimize the process of making decisions;
• by quickly retrieving useful information from multiple blocks of data (raw data, transactions, …).
 A DSS has 3 fundamental components:
• ETL tools (Extract, Transform and Load): extract the needed data from different sources; transform and adjust the data; load the selected data into the system (database, data warehouse …).
• Storage tools: mainly databases and data warehouses.
• OLAP (Online Analytical Processing): offers interactive and complex multidimensional data queries with rapid execution time.

Fig: 8 Components of Decision Support System
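The Extract, Transform and Load steps can be sketched as three small Python functions feeding an in-memory SQLite table. This is a minimal sketch, not a real ETL tool: the `raw_sales` records, the `sales` table and the cleaning rules are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical source records, including one that cannot be cleaned.
raw_sales = [
    {"region": " north ", "amount": "120.50"},
    {"region": "South", "amount": "80"},
    {"region": "north", "amount": "not available"},
]


def extract():
    """Extract: pull the needed data from the source."""
    return list(raw_sales)


def transform(records):
    """Transform: normalize region names and drop unparsable amounts."""
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # skip records that cannot be adjusted
        cleaned.append((rec["region"].strip().lower(), amount))
    return cleaned


def load(rows, conn):
    """Load: store the selected data in the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)

# A simple analytical query over the loaded data.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

The final query stands in for the kind of aggregate question an OLAP tool would ask of the stored data.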
Data Warehousing
 A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.
 Subject-Oriented: A data warehouse is used to analyze specific subjects. For example, “sales” can be a particular subject.
 Integrated: A data warehouse integrates data coming from many data sources. Thus, there will be only one way to identify a specific piece of data.
 Time-Variant: Historical data is maintained in a data warehouse, since the history of data can help decision making.
 Non-volatile: Once data is integrated into the data warehouse, it will never be modified or changed.

Fig: 9 Generic Data Warehouse Architecture

Data Mining

 It is the process of examining large amounts of data (i.e. Big Data) collected in a systematic way.
 It analyzes the data from many different dimensions or angles in order to extract useful information from it.
 It is also called Knowledge Discovery.
 The following are the different phases of the Data Mining process.

Fig: 10 Data Mining Process

Phases of Data Mining
 Problem analysis: In this step, we must select and analyze a complicated problem whose resolution will provide competitive advantages to the company.
 Data analysis: This phase is used to evaluate data quality, detect inadequacies and analyze distributions and combinations.
 Data preparation: Once the available data sources are identified, they need to be cleaned in order to eliminate any noise.
 Modeling: During this step, modeling techniques should be selected to create one or more models. These models should then be tested to check their validity.
 Evaluation: The created models should be interpreted and evaluated to verify that they meet business needs.

Business Intelligence
 “The processes, technologies and tools needed to turn data into information and information into knowledge and knowledge into plans that support decision making. BI encompasses data warehousing, business analytics and knowledge management.”
 The term is used for improving business decision making by using fact-based support systems, concepts and methods.

Fig: 11 Business Intelligence Architecture

Benefits of Business Intelligence
 Improve Management Processes: planning, controlling, measuring and/or changing, resulting in increased revenues and reduced costs
 Improve Operational Processes: fraud detection, order processing, purchasing, etc., resulting in increased revenues and reduced costs
 Predict the Future

Fig: 12 Benefits of Business Intelligence

Difference between Two Tier and Three Tier System

[Figure: Two Tier Architecture and Three Tier Architecture]
Software Engineering and Project Management 30
Difference between Two Tier and Three Tier System (contd.)

Architecture Type
• Two tier: Client-Server Architecture
• Three tier: Web-based application
Working
• Two tier: The client sends its request directly to the server and gets the response directly from the server. Communication takes place directly between client and server, with no intermediary. Because of this tight coupling, a 2-tier application will run faster.
• Three tier: Middleware sits between the client and the server. A client request goes to the middleware, which forwards it to the server, and vice versa.
Layers
• Two tier: 1. Design layer / Client Application (Client Tier); 2. Data layer / Database (Data Tier)
• Three tier: 1. Design layer / presentation tier; 2. Business layer or Logic layer / data access tier; 3. Data layer / data tier

Difference between Two Tier and Three Tier System (contd.)

Security
• Two tier (Client-Server Architecture): Less secure, as the client can talk to the database directly
• Three tier (Web-based application): Highly secure, as the client is not allowed to talk to the database directly
Scalability
• Two tier: Poor
• Three tier: Excellent, as requests can be load balanced between servers
Reusability
• Two tier: Clients are mostly monolithic, so reuse is not possible
• Three tier: More reusable, as logic is implemented as services

Difference between Two Tier and Three Tier System (contd.)

Advantages
• Two tier (Client-Server Architecture): 1. Easy to maintain, and modification is fairly easy. 2. Communication is faster.
• Three tier (Web-based application): 1. Better reusability. 2. Improved data integrity. 3. Improved security: the client has no direct access to the database. 4. Forced separation of user interface logic and business logic. 5. Business logic sits on a small number of centralized machines (may be just one). 6. Easy to maintain, manage and scale; loosely coupled, etc.
Disadvantages
• Two tier: 1. Application performance degrades as the number of users increases. 2. Cost-ineffective.
• Three tier: 1. Increased complexity/effort.
Big Data- Introduction
 Data Sources
• A large magnitude of data is generated and shared by businesses, public administrations, numerous industrial and not-for-profit sectors, and scientific research.
 Data Formats
• These data range from textual content (structured, semi-structured as well as unstructured) to multimedia content (e.g. videos, images, audio) on a multiplicity of platforms (e.g. machine-to-machine communications, social media sites, sensor networks, cyber-physical systems, and the Internet of Things [IoT]).

Big Data- Introduction
 Data Amount
• Every day the world produces around 2.5 quintillion bytes of data (1 exabyte equals 1 quintillion bytes, or 1 billion gigabytes), with 90% of these data being unstructured.
• Gantz and Reinsel (2012) assert that by 2020 the total will exceed 40 zettabytes (or 40 trillion gigabytes).

Big Data Characteristics

• VOLUME: Growing quantity of data, e.g. social media, behavioral, video
• VELOCITY: Quickening speed of data, e.g. smart meters, process monitoring
• VARIETY: Increase in types of data, e.g. app data, unstructured data

Gartner, Feb 2001
Volume
• Petabytes, exabytes of data
• Volumes too great for typical DBMS
Volume - Bytes Defined

• eBay data warehouse (2010) = 10
• eBay will increase this 2.5 times by 2011
• Teradata > 10 PB
• Megabyte: 2^20 bytes or, loosely, one million bytes
• Gigabyte: 2^30 bytes or, loosely, one billion bytes
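The unit arithmetic behind these figures can be checked with a short Python calculation. The constant names are illustrative; the decimal definitions (1 gigabyte = 10^9 bytes, 1 exabyte = 10^18 bytes, 1 zettabyte = 10^21 bytes) are the "loose" ones used when the slides speak of billions and quintillions of bytes.

```python
# Binary definitions, as given on the slide.
MEGABYTE_BINARY = 2 ** 20
GIGABYTE_BINARY = 2 ** 30

# Decimal ("loose") definitions.
GIGABYTE = 10 ** 9
EXABYTE = 10 ** 18
ZETTABYTE = 10 ** 21

# 2.5 quintillion bytes per day, expressed in exabytes.
daily_bytes = 25 * 10 ** 17
daily_exabytes = daily_bytes / EXABYTE

# 1 exabyte and 40 zettabytes, expressed in gigabytes.
gigabytes_per_exabyte = EXABYTE // GIGABYTE
forecast_gigabytes = 40 * ZETTABYTE // GIGABYTE

print(MEGABYTE_BINARY, GIGABYTE_BINARY)
print(daily_exabytes, gigabytes_per_exabyte, forecast_gigabytes)
```

So 2^20 is slightly more than one million, one exabyte is one billion gigabytes, and 40 zettabytes is 40 trillion gigabytes.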
Velocity
• Massive amount of streaming data
Variety
• Massive sets of unstructured/semi-structured data from Web traffic, social media, sensors, and so on
Big Data Challenges
 Some of these challenges are a function of the characteristics of Big Data, some of its existing analysis methods and models, and some of the limitations of current data processing systems
 Decisions about what data are generated and collected
 Issues of privacy
 Ethical considerations relevant to mining such data
 High infrastructure costs

Big Data Challenges
 The broad challenges of Big Data can be grouped into three main categories, based on the data life cycle (Big Data characteristics):

Data challenges
• Velocity
• …
• Variability
• …
• Visualization
• Value
Process challenges
• Data acquisition and warehousing
• Data mining and cleansing
• Data aggregation and integration
• Data analysis and modelling
• Data interpretation
Management challenges
• Privacy
• Security
• Data governance
• Data and information sharing
• Cost/operational expenditures
• Data ownership

Big Data Challenges
 Data challenges relate to the characteristics of the data itself (e.g. data volume, variety, velocity, veracity, volatility, quality, discovery)
 Process challenges relate to a series of "how" techniques: how to capture data, how to integrate data, how to transform data, how to select the right model for analysis and how to provide the results
 Management challenges cover, for example, privacy, security, governance and ethical aspects


Big Data Analytics
 Descriptive analytics scrutinizes data and information to define the current state of a business situation in a way that makes developments, patterns and exceptions evident, in the form of standard reports, ad hoc reports, and alerts.
 Inquisitive analytics is about probing data to certify or reject business propositions, for example, analytical drill-downs into data, statistical analysis, factor analysis.
 Predictive analytics is concerned with forecasting and statistical modelling to determine future possibilities.

Big Data Analytics
 Prescriptive analytics is the stage where the predictions are used to prescribe (or recommend) the next set of things to be done. It is about optimization and randomized testing to assess how businesses can enhance their service levels.
 Pre-emptive analytics is about having the capacity to take precautionary actions on events that may undesirably influence organizational performance, for example, identifying possible perils and recommending mitigating strategies far ahead in time.

Analytics Models

[Figure: value (vertical axis) plotted against difficulty (horizontal axis), with arrow labels Information, Hindsight, Insight, Foresight and Optimization]
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?
Big Data Applications


Hadoop
 The core of Apache Hadoop consists of a storage part, known as
Hadoop Distributed File System (HDFS), and
a processing part which is a MapReduce programming model.
 Hadoop splits files into large blocks and distributes them across
nodes in a cluster.
 It then transfers packaged code into nodes to process the data in
parallel.
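The MapReduce programming model can be illustrated with a word count written in plain Python. This is a single-process sketch, not Hadoop's API: the function names and the two input `blocks` are hypothetical, and in a real cluster each block would be mapped on the node that stores it.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Two hypothetical blocks of one file, as if stored on different nodes.
blocks = ["big data big cluster", "data node data"]

mapped = [pair for block in blocks for pair in map_phase(block)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each block is mapped independently, the map phase is the part that a cluster can run in parallel.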


Hadoop
 Hadoop HDFS: where Hadoop stores data. It is a file system that spans all the nodes in a Hadoop cluster, linking together the file systems on many local nodes to make them into one large file system. [IBM has an alternative file system for Hadoop named GPFS.]
 Hadoop MapReduce v1: an implementation for large-scale data processing.

Hadoop
 The MapReduce engine consists of:
 JobTracker: receives jobs from client applications and sends orders to the TaskTrackers that are as near to the data as possible.
 TaskTracker: runs on the cluster's nodes to receive orders from the JobTracker.
 YARN (the resource management layer introduced in Hadoop 2, which replaces the MapReduce v1 engine): each cluster has a Resource Manager, and each data node runs a Node Manager.
 For each job, one worker node acts as the Application Master, monitoring resources, tasks, etc.

Hadoop
 Hadoop is good for:
 Processing massive amounts of data through parallelism
 Handling a variety of data (structured, unstructured, semi-
structured)
 Using inexpensive commodity hardware
 Hadoop is not good for:
• Processing transactions (random access)
• When work cannot be parallelized
• Fast access to data
• Processing lots of small files
• Intensive calculations with small amounts of data


HDFS
 Hadoop Distributed File System (HDFS) principles
 Distributed, scalable, fault tolerant, high throughput
 Data access through MapReduce
 Files split into blocks (aka splits)
 3 replicas for each piece of data by default
 Can create, delete, and copy, but cannot update
 Designed for streaming reads, not random access
 Data locality is an important concept: processing data on or near
the physical storage to decrease transmission of data
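Splitting a file into blocks and keeping 3 replicas of each block can be sketched as follows. The round-robin placement, the tiny block size and the node names are assumptions chosen to keep the example readable; real HDFS uses much larger blocks and its own placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(n_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (assumed policy)."""
    placement = {}
    for block_id in range(n_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


file_data = b"x" * 10                      # hypothetical 10-byte file
blocks = split_into_blocks(file_data, 4)   # block size of 4 bytes for illustration
nodes = ["dn1", "dn2", "dn3", "dn4"]       # hypothetical DataNodes
placement = place_replicas(len(blocks), nodes)

for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)
```

With every block on three different nodes, the loss of any single node still leaves two copies of each block.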


HDFS: architecture

 Master / Slave architecture

 NameNode
• Manages the file system namespace and metadata
• Regulates access to files by clients
 DataNode
• Many DataNodes per cluster
• Manages storage attached to the nodes
• Periodically reports status to NameNode
• Data is stored across multiple nodes
• Nodes and components will fail, so for reliability data is
replicated across multiple nodes


Types of Data
Data can be broadly classified into four types:
• Structured Data:
 Have a predefined model, which organizes data into a form that is relatively easy to
store, process, retrieve
and manage
 E.g., relational data
• Unstructured Data:
 Opposite of structured data
 E.g., Flat binary files containing text, video or audio
 Note: data is not completely devoid of a structure (e.g., an audio file may still have
an encoding structure and some metadata associated with it)
• Dynamic Data:
 Data that changes relatively frequently
 E.g., office documents and transactional entries in a financial database
• Static Data:
 Opposite of dynamic data, E.g. Medical imaging data from MRI or CT scans
Why Classify Data?
 Segmenting data into one of the following 4 quadrants can help in designing and developing a suitable storage solution:
• Dynamic, unstructured: Media Production, eCAD, mCAD, Office Docs
• Static, unstructured: Media Archive, Broadcast, Medical Imaging
• Dynamic, structured: Transaction Systems, ERP, CRM
• Static, structured: BI, Data Warehousing
 Relational databases are usually used for structured data
 File systems or NoSQL databases can be used for (static) unstructured data (more on these later)
XML
 XML is case sensitive
 All start tags must have end tags
 Elements must be properly nested
 The XML declaration, if present, must be the first statement
 Every document must contain exactly one root element
 Attribute values must be enclosed in quotation marks
 Certain characters (such as < and &) are reserved for parsing

Building Blocks of XML
 Elements (Tags) are the primary components of XML documents.

<AUTHOR id="123">
<FNAME>JAMES</FNAME>
<LNAME>RUSSEL</LNAME>
</AUTHOR>
<!-- I am a comment -->
 Attributes provide additional information about Elements. Values of the Attributes are set inside the start tags of the Elements.
 Comments start with <!-- and end with -->

Example of XML
<Books>
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
</Books>
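The well-formedness rules can be tried out with Python's standard `xml.etree.ElementTree` parser. The sample document below is a shortened, hypothetical variant of a book list, and `is_well_formed` is a helper written for this sketch.

```python
import xml.etree.ElementTree as ET

well_formed = """<?xml version="1.0"?>
<Books>
  <Book id="1">
    <Title>The First and Last Freedom</Title>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
</Books>"""

root = ET.fromstring(well_formed)
book = root.find("Book")
publisher = book.find("Publisher").text  # the parser decodes &amp; back to &
book_id = book.get("id")


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


checks = {
    "unquoted attribute": is_well_formed("<AUTHOR id=123></AUTHOR>"),
    "improper nesting": is_well_formed("<a><b></a></b>"),
    "case mismatch": is_well_formed("<Book></book>"),
    "two root elements": is_well_formed("<Book></Book><Book></Book>"),
}
print(publisher, book_id, checks)
```

Each of the four broken fragments violates one rule from the list and is rejected by the parser.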

JSON
 JSON stands for “JavaScript Object Notation”, a simple data
interchange format.
 It began as a notation for the world wide web.
 Since JavaScript exists in most web browsers, and JSON is based
on JavaScript, it’s very easy to support there.
 However, it has proven useful enough and simple enough that it is
now used in many other contexts that don’t involve web surfing.


JSON data structures

• Object: { "key1": "value1", "key2": "value2" }
• Array: [ "first", "second", "third" ]
• Number: 42, 3.1415926
• String: "This is a string"
• Boolean: true, false
• null: null
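As a small illustration, Python's standard `json` module maps each of these JSON structures to a native Python type. The sample document and its field names are made up for this sketch.

```python
import json

text = """
{
  "name": "Alice",
  "age": 18,
  "pi": 3.1415926,
  "tags": ["first", "second", "third"],
  "address": {"key1": "value1", "key2": "value2"},
  "active": true,
  "nickname": null
}
"""

data = json.loads(text)

# Python type chosen by the parser for each JSON value.
types = {key: type(value).__name__ for key, value in data.items()}
print(types)

# Serializing and parsing again gives back the same data.
round_trip_ok = json.loads(json.dumps(data)) == data
print(round_trip_ok)
```

Objects become dictionaries, arrays become lists, and `true`, `false` and `null` become `True`, `False` and `None`.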

The CAP Theorem
 The limitations of distributed databases can be described by the so-called CAP theorem
 Consistency: every node always sees the same data at any given instant (i.e., strict consistency)
 Availability: the system continues to operate, even if nodes in a cluster crash, or some hardware or software parts are down due to upgrades
 Partition Tolerance: the system continues to operate in the presence of network partitions

CAP theorem: any distributed database with shared data can have at most two of the three desirable properties, C, A or P

BASE
 BASE (Basically Available, Soft state, Eventual consistency): a BASE system gives up on strict consistency of a distributed system.

The BASE Properties
 The CAP theorem states that it is impossible to guarantee strict Consistency and Availability while being able to tolerate network partitions
 This resulted in databases with relaxed ACID guarantees
 In particular, such databases apply the BASE properties:
 Basically Available: the system guarantees Availability
 Soft-State: the state of the system may change over time
 Eventual Consistency: the system will eventually
become consistent
Eventual Consistency
 A database is termed Eventually Consistent if all replicas will gradually become consistent in the absence of updates

[Figure: an "Update Webpage-A" event propagating across the replicas of Webpage-A]

Eventual Consistency: A Main Challenge
 But what if the client accesses the data from different replicas?

[Figure: a client reading Webpage-A from different replicas after an "Update Webpage-A" event]

 Protocols like Read Your Own Writes (RYOW) can be applied!
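The stale-read problem and a read-your-own-writes remedy can be sketched with a toy replicated store. Everything here is a simplifying assumption: the class names, the version counter, and the rule that a session falls back to another replica when the one it asked is behind its own last write. Real systems implement RYOW in various ways.

```python
class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the others later."""

    def __init__(self, n_replicas):
        self.replicas = [{"value": None, "version": 0} for _ in range(n_replicas)]
        self.pending = []   # updates not yet applied: (replica index, value, version)
        self.version = 0

    def write(self, value, replica_index):
        self.version += 1
        self.replicas[replica_index] = {"value": value, "version": self.version}
        for i in range(len(self.replicas)):
            if i != replica_index:
                self.pending.append((i, value, self.version))
        return self.version

    def read(self, replica_index):
        replica = self.replicas[replica_index]
        return replica["value"], replica["version"]

    def propagate(self):
        """Apply all pending updates; afterwards the replicas agree."""
        for i, value, version in self.pending:
            if version > self.replicas[i]["version"]:
                self.replicas[i] = {"value": value, "version": version}
        self.pending.clear()


class Session:
    """Client session that enforces read-your-own-writes (assumed strategy)."""

    def __init__(self, store):
        self.store = store
        self.last_written = 0

    def write(self, value, replica_index):
        self.last_written = self.store.write(value, replica_index)

    def read(self, replica_index):
        value, version = self.store.read(replica_index)
        if version < self.last_written:
            # The replica is stale for this client: look for one that has the write.
            for i in range(len(self.store.replicas)):
                value, version = self.store.read(i)
                if version >= self.last_written:
                    break
        return value


store = EventuallyConsistentStore(3)
store.write("Webpage-A v1", 0)
store.propagate()

session = Session(store)
session.write("Webpage-A v2", 0)
stale_read = store.read(1)[0]    # plain read from another replica
ryow_read = session.read(1)      # session-aware read
store.propagate()
converged = [store.read(i)[0] for i in range(3)]
print(stale_read, ryow_read, converged)
```

The plain read still sees the old page, the session read sees the client's own update, and after propagation all replicas agree.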

No-SQL
 Stands for "No SQL" or "Not Only SQL"
 Class of non-relational data storage systems
• E.g. BigTable, Dynamo, PNUTS/Sherpa, ...
 Usually do not require a fixed table schema, nor do they use the concept of joins
 Distributed data storage systems
 All NoSQL offerings relax one or more of the ACID properties (see the CAP theorem)

NoSQL Data Storage: Classification

 Uninterpreted key/value or ‘the big hash table’
• Amazon S3 (Dynamo)
 Flexible schema
• BigTable, Cassandra, HBase (ordered keys, semi-structured data)
• Sherpa/PNUTS (unordered keys, JSON)
• MongoDB (based on JSON)
• CouchDB (name/value in text)

NoSQL Databases
To this end, a new class of databases emerged, which
mainly follow the BASE properties
 These were dubbed as NoSQL databases
 E.g., Amazon’s Dynamo and Google’s Bigtable

Main characteristics of NoSQL databases include:

 No strict schema requirements
 No strict adherence to ACID properties
 Consistency is traded in favor of Availability
Types of NoSQL Databases
 Here is a limited taxonomy of NoSQL databases:
• Document Stores
• Graph Databases
• Key-Value Stores
• Columnar Databases
Document Stores
 Documents are stored in some standard format or encoding (e.g., XML,
JSON, PDF or Office Documents)
 These are typically referred to as Binary Large Objects
(BLOBs)
 Documents can be indexed
 This allows document stores to outperform traditional file
systems
 E.g., MongoDB and CouchDB (both can be queried using MapReduce)
Graph Databases
 Data are represented as vertices and edges

[Figure: example property graph. Vertices: Id 1 (Name: Alice, Age: 18), Id 2 (Name: Bob, Age: 22), Id 3 (Name: Chess, Type: Group). Edges Id 100 to 105 carry the labels knows, is_member and Members, with Since dates on the knows and is_member edges.]

 Graph databases are powerful for graph-like queries (e.g., find the shortest path between two elements)
 E.g., Neo4j and VertexDB
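A shortest-path query of the kind mentioned above can be sketched with a breadth-first search over an adjacency list. The vertices Alice, Bob and Chess echo the figure, but the exact edges and the extra vertex Carol are assumptions made for this example, and a graph database would answer such a query through its own query language rather than hand-written code.

```python
from collections import deque

# Undirected adjacency list (assumed edges; "Carol" is a hypothetical extra vertex).
graph = {
    "Alice": ["Bob", "Chess"],    # Alice knows Bob, Alice is_member of Chess
    "Bob": ["Alice", "Chess"],    # Bob is_member of Chess
    "Chess": ["Alice", "Bob", "Carol"],
    "Carol": ["Chess"],           # Carol is_member of Chess
}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(graph, "Alice", "Carol")
print(path)
```

Breadth-first search explores all vertices one edge away before those two edges away, so the first path that reaches the goal has the fewest edges.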
Key-Value Stores
 Keys are mapped to (possibly) more complex values (e.g., lists)
 Keys can be stored in a hash table and can be distributed easily
 Such stores typically support regular CRUD (create, read, update, and
delete) operations
 That is, no joins and aggregate functions
 E.g., Amazon DynamoDB and Apache Cassandra
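A key-value store with CRUD operations, and keys spread over several nodes by hashing, can be sketched as below. The class, its method names and the CRC32-based node choice are illustrative assumptions and do not reflect the API of any particular product.

```python
import zlib


class KeyValueStore:
    """Toy key-value store: each key is hashed to one of several node tables."""

    def __init__(self, n_nodes=3):
        self.nodes = [{} for _ in range(n_nodes)]

    def _node_for(self, key):
        # A stable hash of the key decides which node holds it.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def create(self, key, value):
        node = self._node_for(key)
        if key in node:
            raise KeyError(f"{key!r} already exists")
        node[key] = value

    def read(self, key):
        return self._node_for(key).get(key)

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(f"{key!r} not found")
        node[key] = value

    def delete(self, key):
        self._node_for(key).pop(key, None)


store = KeyValueStore()
store.create("user:1", ["alice", "admin"])              # the value is a list
store.update("user:1", ["alice", "admin", "editor"])
roles = store.read("user:1")
store.delete("user:1")
missing = store.read("user:1")
print(roles, missing)
```

Every operation touches exactly one node, which is why such stores distribute easily but offer no joins or aggregates across keys.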
Columnar Databases
 Columnar databases are a hybrid of RDBMSs and Key-Value stores
 Values are stored in groups of zero or more columns, but in Column-Order (as opposed to Row-Order)

Example with three records (Column A, Column B, Column C): (Alice, 3, 25), (Bob, 4, 19), (Carol, 0, 45)
• Row-Order: Alice 3 25 | Bob 4 19 | Carol 0 45
• Columnar (or Column-Order): Alice Bob Carol | 3 4 0 | 25 19 45
• Columnar with Locality Groups: Alice Bob Carol (Column A = Group A) | 3 25 4 19 0 45 (Column Family {B, C})

 Values are queried by matching keys
 E.g., HBase and Vertica
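The three storage layouts can be reproduced with list comprehensions over the same three records. The variable names are illustrative, and flat Python lists stand in for the on-disk order of values.

```python
# Records as (Column A, Column B, Column C).
records = [("Alice", 3, 25), ("Bob", 4, 19), ("Carol", 0, 45)]

# Row-order: all values of record 1, then record 2, and so on.
row_order = [value for record in records for value in record]

# Column-order: all values of column A, then column B, then column C.
column_order = [record[c] for c in range(3) for record in records]

# Locality groups: column A alone, columns B and C stored together per record.
group_a = [record[0] for record in records]
group_bc = [value for record in records for value in record[1:]]

print(row_order)
print(column_order)
print(group_a, group_bc)

# Reading only column B touches one contiguous slice in column-order.
column_b = column_order[3:6]
print(column_b)
```

A query that needs a single column reads one contiguous run of values in column-order, whereas in row-order it has to skip over the other columns of every record.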
NoSQL Databases
 Key-value stores
• The simplest of the NoSQL databases, key-value stores have each item
in the database stored as an attribute name together with its value. Riak,
Voldemort, and Redis are examples.
 Wide-column stores
• Cassandra and HBase are examples of this type of database where data is
stored together in columns.
 Document databases
• MongoDB is the most well-known document database. These types of
databases store data in documents.
 Graph databases
• Neo4J and HyperGraphDB are popular examples of graph database
which are useful for data about networks.


NoSQL Products
 Cassandra
 CouchDB
 Hadoop & HBase
 MongoDB
 StupidDB
 Etc.

Common Advantages of NoSQL
Systems
 Cheap, easy to implement (open source)
 Data are replicated to multiple nodes (therefore identical and fault-
tolerant) and can be partitioned
 When data is written, the latest version is on at least one node and
then replicated to other nodes
 No single point of failure
 Easy to distribute
 Don't require a schema


What does NoSQL Not Provide?
 Joins
 Group by
 But PNUTS provides interesting materialized view approach to
joins/aggregation.
 ACID transactions
 SQL
 Integration with applications that are based on SQL


NoSQL
 NoSQL data storage systems make sense for applications that
need to deal with very large volumes of semi-structured data
• Log Analysis
• Social Networking Feeds
 Most of us work on organizational databases, which are not that
large and have low update/query rates
• Regular relational databases are the appropriate solution for such
applications


Some Data and Metadata Standards
 Metadata
• Relational: Data Definition Language (ISO)
• XML: XML Schema XSD (W3C), Namespaces (W3C)
• JSON: JSON Schema (IETF)
 Constraints
• Relational: Integrity constraints in table definitions (ISO)
• XML: Schematron (ISO), RelaxNG (OASIS)
 Triggers
• Relational: Relational triggers
 Data Exchange Serializations
• Relational: The SQL standard (ISO) defines an XML serialization, but it is
not widely used. There is no agreed JSON serialization. There are RDF
serializations.
• XML: XML is a syntax widely used for data exchange. Listed formats: XML,
Turtle (W3C).
• JSON: JSON is a serialized data format; there are XML representations of
JSON too. Currently, the role of JSON is mainly data exchange between
JavaScript clients and servers.
 Annotations
• Relational: Not part of the relational model
• XML: Many kinds of annotations are defined for XML schemas and for XML data
 Query & CRUD Languages
• Relational: Data Manipulation Language (DML), SQL, SQL/XML (ISO)
• XML: XPath, XQuery (W3C), SPARQL (W3C)
• JSON: JAQL, JSONiq, and others for CRUD
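
As a small illustration of the data exchange row, the sketch below serializes one made-up record as JSON and as XML using only the Python standard library. The record, the element name "record" and the helper names are assumptions for this example; they are not defined by any of the standards listed.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record to exchange between two systems.
record = {"id": "1", "name": "Alice"}


def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialize the record as XML, one child element per field."""
    root = ET.Element("record")
    for field, value in rec.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML form back into a dictionary of field -> text."""
    return {child.tag: child.text for child in ET.fromstring(text)}


print(to_json(record))
print(to_xml(record))
```

Both forms carry the same data; JSON maps directly onto dictionaries and lists, while XML wraps each field in a named element that a schema language such as XSD can describe.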


Some Data and Metadata Standards (Contd.)
 Query & CRUD APIs
• Relational: Many for various programming languages, e.g., JDBC, ODBC
• XML: Various; includes some of the relational APIs and specific APIs,
e.g., XQJ
 Collection
• Relational: Table, View, Database (ISO)
• XML: XML Collection Function (W3C)
 Transformation & Other Languages
• Relational: SQL (Tables to Tables); SQL XMLTABLE (XML to relational)
• XML: XSLT (XML to text, includes XML); XForms
• JSON: JavaScript
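
The relational column can be tried out with Python's built-in sqlite3 module, which here plays the role that JDBC or ODBC play for other languages. The table, view and column names below are made up for illustration, and SQLite is only one possible engine.

```python
import sqlite3


def build_database():
    """Create an in-memory database with a table, a constraint and a view."""
    conn = sqlite3.connect(":memory:")
    # DDL: metadata plus an integrity constraint in the table definition.
    conn.execute(
        "CREATE TABLE student ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "marks INTEGER CHECK (marks >= 0))"
    )
    # DML through the API, using parameter placeholders.
    conn.executemany(
        "INSERT INTO student (id, name, marks) VALUES (?, ?, ?)",
        [(1, "Alice", 25), (2, "Bob", 19)],
    )
    # A view is a table-to-table transformation expressed in SQL.
    conn.execute("CREATE VIEW toppers AS SELECT name FROM student WHERE marks > 20")
    return conn


def insert_is_rejected(conn, row):
    """Return True if the database refuses the row because of a constraint."""
    try:
        conn.execute("INSERT INTO student (id, name, marks) VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_database()
print(conn.execute("SELECT name FROM toppers").fetchall())
print(insert_is_rejected(conn, (3, "Carol", -1)))
```

The negative marks value violates the CHECK constraint, so the database itself rejects the row rather than relying on application code.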


