Database Management System
• Modules:
• Database System Concepts & Architecture
• Data Modeling-Relational and ER modeling
• DB Design-Schema Refinement
• Physical DB Design & Query Processing
• Transaction Processing
• Concurrency control & Recovery Techniques
• NOSQL DB management
• Recent Trends
Database Management System
Module-1
Database system concepts and Architecture
Module-1: Database system
concepts and Architecture
• Need for database systems
• Characteristics of Database Approach
• Advantages of using DBMS approach
• Actors on the Database Management Scene
• Database Administrator
• Classification of database management systems
• Data Models
• Schemas and Instances
• Three-Schema Architecture
• The Database System Environment
• Centralized and Client/Server Architectures for DBMSs
• Overall Architecture of Database Management Systems
Database System Applications
• DBMS contains information about a particular enterprise
• Collection of interrelated data
• Set of programs to access the data
• An environment that is both convenient and efficient to use
• Database Applications:
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax
deductions
• Databases touch all aspects of our lives
What Is a DBMS?
• A very large, integrated collection of data.
• Models real-world enterprise.
• Entities (e.g., students, courses)
• Relationships (e.g., Madonna is taking CS564)
• A Database Management System (DBMS) is a software
package designed to store and manage databases.
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
Files vs. DBMS
• Application must stage large datasets
between main memory and secondary
storage (e.g., buffering, page-oriented
access, 32-bit addressing, etc.)
• Special code for different queries
• Must protect data from inconsistency
due to multiple concurrent users
• Crash recovery
• Security and access control
Purpose of Database Systems
• In the early days, database applications were built
directly on top of file systems
• Drawbacks of using file systems to store data:
• Data redundancy and inconsistency
• Multiple file formats, duplication of information in
different files
• Difficulty in accessing data
• Need to write a new program to carry out each
new task
• Data isolation — multiple files and formats
• Integrity problems
• Integrity constraints (e.g. account balance > 0)
become “buried” in program code rather than
being stated explicitly
• Hard to add new constraints or change existing
ones
Purpose of Database Systems (Cont.)
• Drawbacks of using file systems (cont.)
• Atomicity of updates
• Failures may leave database in an inconsistent state
with partial updates carried out
• Example: Transfer of funds from one account to
another should either complete or not happen at all
• Concurrent access by multiple users
• Concurrent accessed needed for performance
• Uncontrolled concurrent accesses can lead to
inconsistencies
• Example: Two people reading a balance and updating it at the same time
• Security problems
• Hard to provide user access to some, but not all, data
• Database systems offer solutions to all the above problems
Summary
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system crashes,
concurrent access, quick application
development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs
and are well-paid!
• DBMS R&D is one of the broadest,
most exciting areas in CS.
View of Data/Data Abstraction
External Level
Logical Level
DATA Physical Level
Which song,
which album
mmmmm ?
Physical Level
Logical Level
External Level
Can listen only
one song at an
instance…
but not all songs
together
Three-Schema Architecture
• Defines DBMS schemas at three levels:
• Internal schema at the internal level to describe physical
storage structures and access paths (e.g indexes).
• Typically uses a physical data model.
• Conceptual schema at the conceptual level to describe the
structure and constraints for the whole database for a
community of users.
• Uses a conceptual or an implementation data model.
• External schemas at the external level to describe the various
user views.
• Usually uses the same data model as the conceptual schema.
Physical Level:
• Describes HOW data are actually stored
• It deals with the data structures used in physical
• It is otherwise called as Internal level
Logical Level:
• Describes WHAT data is stored in database and their relationships
among data
• This level is usually designed by using ER diagrams
• It is otherwise called as Conceptual Level, Schema level
External Level:
• Only part of the database can be viewed at a time but not the entire
database
• Database has many views
• It is otherwise called as Subschema level, View level
Levels of Abstraction
• Physical level: describes how a record (e.g., customer)
is stored.
• Logical level: describes data stored in database, and the
relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
• View level: application programs hide details of data
types. Views can also hide information (such as an
employee’s salary) for security purposes.
Data Independence
• Changing data or structure in one level without effecting
higher levels
• Applications insulated from how data is structured and
stored.
• Logical data independence: Protection from changes in
logical structure of data.
• Physical data independence: Protection from changes in
physical structure of data.
One of the most important benefits of using a DBMS
Physical Data Independence
External Level
Logical Level
Physical Level
Logical Data Independence
External Level
Logical Level
Physical Level
Data Models
• A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
• Relational model
• Entity-Relationship data model (mainly for
database design)
• Object-based data models (Object-oriented and
Object-relational)
• Semi structured data model (XML)
• Other older models:
• Network model
• Hierarchical model
Reg.No
First Name of Last Name of
Student Student
Middle Name of
Student
• Both network and hierarchical models are
involved with more number of pointers.
• A single misplaced pointer corrupts the entire
database.
• Relational Model is free from pointers
Data Models
The relational model of data is the most
widely used model today.
• Main concept: relation, basically a table with
rows and columns.
• Every relation has a schema, which describes
the columns, or fields.
Table name Attribute names
Relational Model
Product
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
Photograph
SingleTouch $149.99 Canon
y
MultiTouch $203.99 Household Hitachi
Tuples or rows
• A tuple/ row is a combination of different column values of
one real world object.
• Number of columns represents degree of the table
• Number of tuples represents cardinatily of the table
• Tuple is identified uniquely by using primary key
Note: The term “Relation” is abstract and “Table” is concrete
Instances and Schemas
• Similar to types and variables in programming
languages
• Schema – the logical structure of the database
• Example: The database consists of information
about a set of customers and accounts and the
relationship between them)
• Analogous to type information of a variable in a
program
• Physical schema: database design at the
physical level
• Logical schema: database design at the
logical level
• Instance – the actual content of the database at
a particular point in time
Example of a Database Schema
Example of a database state
Example: University Database
• Conceptual schema:
• Students(sid: string, name: string, login:
string,
age: integer, gpa:real)
• Courses(cid: string, cname:string,
credits:integer)
• Enrolled(sid:string, cid:string,
grade:string)
• Physical schema:
• Relations stored as unordered files.
• Index on first column of Students.
• External Schema (View):
• Course_info(cid:string,enrollment:integer )
DATA BASE LANGUAGE
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized by
the appropriate data model
• DML also known as query language
• Two classes of languages
• Procedural – user specifies what data is required and how to
get those data
• Declarative (nonprocedural) – user specifies what data is
required without specifying how to get those data
• SQL is the most widely used query language
Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table account (
account_number char(10),
branch_name char(10),
balance integer)
• DDL compiler generates a set of tables stored in a data
dictionary
• Data dictionary contains metadata (i.e., data about data)
• Database schema
• Data storage and definition language
• Specifies the storage structure and access methods
used
• Integrity constraints
• Domain constraints
• Referential integrity (e.g. branch_name must
correspond to a valid branch in the branch table)
• Authorization
Relational Model
• Example of tabular data in the relational model
Attributes
A Sample Relational Database
Slide No:L3-4
SQL
• SQL: widely used non-procedural language
• Example: Find the name of the customer with
customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = ‘192-83-7465’
• Example: Find the balances of all accounts held by the
customer with customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = ‘192-83-7465’
and
depositor.account_number =
account.account_number
Slide No:L3-5
Data Base System Structure
Slide No:L6-2
Database Users /Actors on the Scene
Users are differentiated by the way they expect to interact
with
the system
• Application programmers – interact with system through
DML calls
• Sophisticated users – form requests in a database query
language
• Specialized users – write specialized database applications
that do not fit into the traditional data processing framework
• Naïve users – invoke one of the permanent application
programs that have been written previously
• Examples, people accessing database over the web, bank
tellers, clerical staff
Slide No:L4-1
Database Administrator
• Coordinates all the activities of the database
system
• has a good understanding of the enterprise’s
information resources and needs.
• Database administrator's duties include:
• Storage structure and access method definition
• Schema and physical organization modification
• Granting users authority to access the database
• Backing up data
• Monitoring performance and responding to
changes
• Database tuning
Slide No:L4-2
Tools Used
Data dictionary / repository:
• Used to store schema descriptions and other
information such as design decisions,
application program descriptions, user
information, usage standards, etc.
• Active data dictionary is accessed by DBMS
software and users/DBA.
• Passive data dictionary is accessed by
users/DBA only.
Slide 2- 42
Data storage and Querying
• Storage management
• Query processing
• Transaction processing
Slide No:L5-1
Storage Management
Storage manager is a program module that provides
the interface between the low-level data stored in the
database and the application programs and queries
submitted to the system.
•The storage manager is responsible to the following
tasks:
• Interaction with the file manager
• Efficient storing, retrieving and updating of data
•Issues:
• Storage access
• File organization
• Indexing and hashing
Slide No:L5-2
Query Processing
1.Parsing and translation
2. Optimization
3. Evaluation
Slide No:L5-3
Query Processing (Cont.)
• Alternative ways of evaluating a given query
• Equivalent expressions
• Different algorithms for each operation
• Cost difference between a good and a bad way of evaluating a
query can be enormous
• Need to estimate the cost of operations
• Depends critically on statistical information about relations
which the database must maintain
• Need to estimate statistics for intermediate results to compute
cost of complex expressions
Slide No:L5-4
Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component
ensures that the database remains in a
consistent (correct) state despite system
failures (e.g., power failures and operating
system crashes) and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions,
to ensure the consistency of the database.
Slide No:L5-5
Database Architecture
The architecture of a database systems is greatly
influenced by
the underlying computer system on which the
database is running:
• Centralized
• Client-server
• Parallel (multiple processors and disks)
• Distributed
Slide No:L6-1
Centralized and
Client-Server DBMS Architectures
Centralized DBMS
• Combines everything into single system
including- DBMS software, hardware,
application programs, and user interface
processing software.
• User can still connect through a remote
terminal – however, all processing is done at
centralized site.
Slide 2- 49
A Physical Centralized Architecture
Slide 2- 50
Basic 2-tier Client-Server
Architectures
• Specialized Servers with Specialized functions
• Print server
• File server
• DBMS server
• Web server
• Email server
• Clients can access the specialized servers as
needed
• Ex. Games and Music Players
• Fast to access
Slide 2- 51
Logical two-tier client server
architecture
Slide 2- 52
Clients
• Provide appropriate interfaces through a client
software module to access and utilize the
various server resources.
• Clients may be diskless machines or PCs or
Workstations with disks with only the client
software installed.
• Connected to the servers via some form of a
network.
• (LAN: local area network, wireless network,
etc.)
Slide 2- 53
DBMS Server
• Provides database query and transaction
services to the clients
• Relational DBMS servers are often called SQL
servers, query servers, or transaction servers
• Applications running on clients utilize an
Application Program Interface (API) to access
server databases via standard interface such as:
• ODBC: Open Database Connectivity standard
• JDBC: for Java programming access
• Client and server must install appropriate client
module and server module software for ODBC or
JDBC
Slide 2- 54
Two Tier Client-Server Architecture
• A client program may connect to several DBMSs,
sometimes called the data sources.
• In general, data sources can be files or other
non-DBMS software that manages data.
• Other variations of clients are possible: e.g., in
some object DBMSs, more functionality is
transferred to clients including data dictionary
functions, optimization and recovery across
multiple servers, etc.
Slide 2- 55
Three Tier Client-Server
Architecture
• Common for Web applications
• Intermediate Layer called Application Server or
Web Server:
• Stores the web connectivity software and the
business logic part of the application used to
access the corresponding data from the database
server
• Acts like a conduit for sending partially processed
data between the database server and the client.
• Three-tier Architecture Can Enhance Security:
• Database server only accessible via middle tier
• Clients cannot directly access database server
Slide 2- 56
Three-tier client-server architecture
Slide 2- 57
Database Application Architectures
(web browser)
Old Modern
Classification of DBMSs
• Based on the data model used
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
• Other classifications
• Single-user (typically used with personal
computers)
vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one
database)
vs. distributed (uses multiple computers,
multiple databases)
Slide 2- 59
Variations of Distributed DBMSs
(DDBMSs)
• Homogeneous DDBMS
• Heterogeneous DDBMS
• Federated or Multidatabase Systems
• Distributed Database Systems have now come
to be known as client-server based database
systems because:
• They do not support a totally distributed
environment, but rather a set of database
servers supporting a set of clients.
Slide 2- 60
Cost considerations for DBMSs
• Cost Range: from free open-source systems to
configurations costing millions of dollars
• Examples of free relational DBMSs: MySQL, PostgreSQL,
others
• Commercial DBMS offer additional specialized modules, e.g.
time-series module, spatial data module, document module,
XML module
• These offer additional specialized functionality when
purchased separately
• Sometimes called cartridges (e.g., in Oracle) or blades
• Different licensing options: site license, maximum number of
concurrent users (seat license), single user, etc.
Slide 2- 61
Classification of Database
Systems
centralized database system: the DBMS and
database are stored at a single site that is used by
several other systems too
distributed database system: the actual database
and the DBMS software are distributed from various
sites that are connected by a computer network
heterogeneous distributed database
system: different sites might use different DBMS
software, but there is additional common software to
support data exchange between these sites
homogeneous distributed database systems: use
the same DBMS software at multiple sites
Classification of Database
Systems
multiuser database system: a database
management system which supports multiple users
concurrently
object-oriented data model: a database
management system in which information is
represented in the form of objects as used in object-
oriented programming
single-user database system: a database
management system which supports one user at a time
traditional models: data models that preceded the
relational model