Database Management
System
Computer Science, XII
St. Xavier’s College
DATA & INFORMATION
Computer Science
Data
▪ Collection of raw facts and
figures.
▪ Isolated, un-interpreted,
unrelated and unorganized.
▪ Building blocks of
information.
▪ Contains alphabets, numbers,
symbols, images.
Information
▪ Processed, organized,
structured form of data.
▪ Data with context.
▪ Must be useful.
Features of a Good Information
▪ Meaningful
▪ Accurate
▪ Relevant
▪ Complete
▪ Accessible
▪ Up-to-Date
Example: At a Restaurant
▪ A single customer’s bill amount is data.
▪ After collecting and interpreting multiple bills
over time:
○ Most popular items on menu.
○ Number of customers on specific days.
○ Profit/Loss.
DATABASE
Computer Science
Database
▪ Systematic collection of
interrelated data.
▪ Must be organized and structured.
▪ Can be easily accessed and
managed.
▪ Example: Customer Records, Phone
Directory, Result Sheet, Banking
Records.
Why Database Systems?
▪ To overcome Flat file system/File
processing system.
▪ Data security issues.
▪ Data sharing difficulties.
▪ Same data stored in multiple files.
▪ Interpretation complexity.
Purpose of a Database
▪ To store data.
▪ To access data.
▪ To maintain data.
▪ To update data.
▪ To secure data.
TERMINOLOGIES IN DATABASE
Computer Science
Some terminologies used in Database:
▪ Table: Arrangements of rows & columns.
▪ Field: Property or attribute of a table.
▪ Record (Tuple): Collection of interrelated
fields.
Some terminologies used in Database:
▪ Objects: Refers to various database components that
performs specific functions.
○ Table – Basic unit of storage; composed
rows and columns
○ View – Logically represents subsets of data
from one or more tables
○ Sequence – Generate unique numeric
values
Some terminologies used in Database:
▪ Objects: Refers to various database components that
performs specific functions.
○ Index – Improves data retrieval speed
○ Stored Procedure – Precompiled SQL tasks
○ Triggers – Automatically execute procedure
on events
○ Constraints – Enforce data integrity rules
Some terminologies used in Database:
▪ Keys: An attribute that helps to identify the records.
○ Candidate Key: All attributes that can provide
unique record.
○ Primary Key: Unique value in the table. Ex: Roll
Number
○ Alternate Key: A candidate key which is not
selected to be the primary key.
Some terminologies used in Database:
▪ Keys: An attribute that helps to identify the records.
○ Foreign Key: Value that provides link between two
tables.
○ Super Key: Combination of multiple fields to make
unique value.
Some terminologies used in Database:
▪ Data Types
○ Numeric: int, float
○ Date & Time: Date, Time, Datetime
○ Character & String: char, varchar, text
○ Boolean (True/False): Boolean
▪ Data Dictionary: Contains metadata i.e. data
about the database.
DATABASE MANAGEMENT SYSTEM
(DBMS)
Computer Science
DBMS
▪ Set of programs that manages
database files. (create, access,
update, delete)
▪ Acts as interface between user
and data in the database.
▪ Examples: MySQL, SQL Server,
Oracle, DB Weaver,
Workbench, MS Access.
Traditional Database System
▪ Developed independently for
different organizations.
▪ Example: DB2, FOXPRO.
▪ Features:
○ Redundant Data
○ Inconsistent Data
○ Program Dependent
○ No integrity
Objective of DBMS
▪ Provide mass storage.
▪ Effective and efficient data storage and retrieval.
▪ Remove data redundancy.
▪ Provide prompt response to user’s request of data.
▪ Allow multiple users to access data concurrently.
▪ Protect data from unauthorized access.
Advantages of DBMS
▪ Data sharing
▪ Data integrity (consistent, complete, accurate,
up-to-date)
▪ Data security
▪ Data independence
▪ Reduce data redundancy
▪ Data availability
▪ Concurrent access of data
▪ Backup and Recovery
Disadvantages of DBMS
▪ Expensive (Hardware, Software, Maintenance)
▪ Changing technology
▪ Needs technical training
▪ Complex to understand and implement
▪ Issues faced if not properly designed database
▪ Chances of database failure
TYPES OF DATABASE MODEL
Computer Science
Database Model
▪ Describes the structure of a database.
▪ Defines how the data are stored and accessed.
▪ Characteristics:
○ Simplicity
○ Accuracy
○ Security
○ Scalability
○ Consistency
○ Flexibility
We will be discussing three types:
▪ Hierarchical Database Model
▪ Network Database Model
▪ Relational Database Model
Hierarchical Database Model
▪ Records logically organized into hierarchy of relationship.
▪ All the records in the hierarchy are called nodes.
▪ Each node is related to others in a parent-child relationship.
▪ Each parent may have one or more child records.
▪ Each child can have only one parent.
Inverted tree-like structure
Advantages of Hierarchical Model
▪ Easiest Model
▪ Searching is fast and easy if parent is known.
▪ Supports one-to-one and one-to-many relationships.
▪ Much secure, modification of child is not possible
without knowing parent.
Disadvantages of Hierarchical Model
▪ Outdated model.
▪ Not flexible for complex designs.
▪ Does not support many-to-many relationships.
▪ Too dependent on parent node.
▪ Increases data redundancy (If same data represents
different nodes)
Network Database Model
▪ Modified version of
Hierarchical Model.
▪ Each node may have several
parents.
▪ Higher flexibility.
Advantages of Network Model
▪ Accepts many-to-many relationship
▪ Easy and fast searching
▪ Reduce data redundancy
Disadvantages of Network Model
▪ One of the complex database models.
▪ Increases processing overhead.
▪ Less security.
Homework
▪ Do a research on differences between Hierarchical
Database Model and Network Database Model.
Relational Database Model
▪ Data organized into two-dimensional tables (relations)
with rows and columns.
▪ Displays data in tabular form.
▪ Implementation known as RDBMS.
▪ Relationships between multiple tables can be
established using Foreign Keys.
▪ Ex: Oracle, MySql.
Relational Database Model
Properties of RDBMS
▪ ACID Properties:
▪ Atomicity: The transaction takes place at once or
doesn’t happen at all.
▪ Consistency: Must be consistent before and after the
transaction.
▪ Isolation: Multiple transactions occurs
independently without interference.
▪ Durability: Changes of a successful transaction even
if system failure occurs.
Advantages of Relational Model
▪ Easy to use.
▪ One table can be linked to other tables easily.
▪ Faster data processing.
▪ Allows normalization of tables.
▪ Less redundancy
▪ Integrity rules easy to implement.
Disadvantages of Relational Model
▪ More complex than other models.
▪ Complex databases become isolated and system
dependent.
▪ Too many rules to make database.
▪ Maintenance issues.
ER Diagram
▪ Used for relational database structure
visualization.
▪ Useful for planning and organizing the database.
▪ Components:
▪ Entity
▪ Attribute
▪ Relationship
Entity
▪ Object or component of data.
▪ Represented as rectangle in ER Diagram.
▪ Ex: Student, College
Attribute
▪ Describes the property of an entity.
▪ Represented as oval in ER Diagram.
Side note: simple, composite, single-valued,
multi-valued, derived attributes.
Relationships
▪ Shows relationships among
entities.
▪ Represented by diamond in ER
diagram.
▪ Four types of relationships:
▪ One-to-One
▪ One-to-Many
▪ Many-to-One
▪ Many-to-Many
Example of ER Diagram
INTEGRITY CONSTRAINTS
Computer Science
Integrity Constraints
▪ Set of rules used to maintain quality of data.
▪ Ensures that data manipulation does not affect data
integrity.
▪ Used to guard against accidental damage to the
database.
Types
▪ Domain Constraints
▪ Entity Constraint
▪ Referential Constraint
▪ Key Constraint
Domain Constraints
▪ Definition of a valid set of values for an attribute.
▪ Value of attribute must be available in corresponding
domain.
Entity Integrity Constraints
▪ Primary key value can’t be null.
▪ A table can contain null value other than primary key field.
Referential Integrity Constraints
▪ If a Foreign Key refers to the
Primary Key of another
table, then every value of
Foreign Key must be
available as Primary Key in
other table.
Key Constraints
▪ Must be able to identify a record
uniquely.
NORMALIZATION
Computer Science
Normalization
▪ Rules introduced by E.F. Codd.
▪ Database design process.
▪ Complex database table is broken down into simple
separate tables.
▪ Used to reduce redundancy from the database table.
Why Normalization?
▪ To eliminate redundant data.
▪ To ensure data dependencies make sense. (only storing
related data in a table)
▪ To make data model more flexible and easier to maintain.
▪ To reduce the complexity of the database.
▪ To remove data modification anomalies. (update and delete)
Normal Forms
▪ First Normal Form (1NF)
▪ Second Normal Form (2NF)
▪ Third Normal Form (3NF)
▪ Fourth Normal Form (4NF)
▪ Fifth Normal Form (5NF)
First Normal Form (1NF)
▪ The table is said to be in First
Normal Form when:
▪ The data field is an atomic
value. (Table does not
contain any multivalued
attribute.)
Second Normal Form (2NF)
▪ The table is said to be in Second Normal Form when:
▪ The table must be in first normal form.
▪ Non key attributes are functionally dependent on
key attribute. (Partial dependency is removed.)
Second Normal Form (2NF)
Third Normal Form (3NF)
▪ The table is said to be in Third Normal Form when:
▪ The table must be in second normal form.
▪ There should be no transitive dependency in
table.
Third Normal Form (3NF)
Benefits of Normalization
▪ Dependency between data fields is identified.
▪ Redundancy in database in minimized.
▪ Data model is made more flexible and easier to maintain.
▪ Improves the performance of the database system.
▪ Simplifies the structures of tables.
▪ Avoids loss of information.
Disadvantages of Normalization
▪ You cannot start building the database before knowing
the requirement.
▪ Difficult to normalize complex relationships.
▪ Careless decomposition may lead to a bad database
design.
Let’s Normalize:
Let’s Normalize:
Let’s Normalize:
Some more practice:
Some more practice:
Some more practice:
CENTRALIZED VS DISTRIBUTED DATABASE
Computer Science
Centralized Database System
▪ Type of database that is stored,
located and maintained at a single
location only.
▪ Centralized location is accessed
via an internet connection (LAN,
WAN).
▪ Mainly used by institutions or
organizations.
▪ Suitable for small organization
and operations.
Advantages
▪ Easier to access and coordinate data.
▪ Data integrity is maximized.
▪ Minimal data redundancy.
▪ Much more secure.
▪ Relatively cheaper as it require less power and
maintenance.
▪ Easy maintenance.
Disadvantages
▪ Failure of database server makes the whole system down.
▪ No feature of load balancing.
▪ High data access traffic rate causing bottleneck.
▪ High chance of data loss during a system failure if no
database recovery measure in place.
Distributed Database System
▪ Complex type of database system.
▪ Consists of multiple logically interrelated databases
connected with each other through internet.
▪ Spread across different geographical locations.
▪ Data stored in various locations can be managed
independently of other locations.
▪ Suitable for large organizations.
Distributed Database System
Advantages
▪ Failure of one database server does not affect the whole
system. Hence reliable.
▪ Feature of load balancing.
▪ Data traffic rate is low.
▪ Easily expandable database.
▪ Can be easily accessed from different networks and
locations.
Disadvantages
▪ Very costly.
▪ Difficult to maintain.
▪ Not easy maintenance.
▪ Large overhead with constant synchronization.
▪ Improper data distribution.
▪ High risk of data theft and hacking.
Comparison between Centralized and Distributed System
Basis Centralized Distributed
Access Time More access time for Less access time for multiple
multiple users. users.
Management of Data Easier Difficult
Data Consistency More data consistency. Less data consistency.
Failure Whole system crashes. Served by database from
another location.
Cost Less costly. High cost.
Maintenance Easy maintenance. Difficult maintenance.
Efficient Less efficient. More efficient.
DATABASE SECURITY
Computer Science
Database Security
▪ Misuse of data may occur by intentional or accidental.
▪ Protection of data in database system.
▪ Prevent unauthorized access, modification, failure,
corruption, losses or physical destruction.
Why data security is
necessary?
▪ Data is a valuable asset.
▪ Data loss results in financial loss and reputation
damage.
▪ To maintain data privacy.
Common Database Threats
and Challenges
▪ Human Error
▪ Denial of Service Attacks
▪ Malwares
▪ Backup failures
Data Security Measures
▪ Authentication and access control.
▪ Strong passwords.
▪ Network security and firewall.
▪ Data flow control.
▪ Data encryption for sensitive data.
▪ Backup and recovery.
Database Administrator (DBA)
▪ Responsible for performing all activities related to
maintaining a successful database environment.
▪ Responsible for making strategic decisions regarding the
database.
▪ Responsible for overall control of the system at the technical
level.
Roles of Database Administrator
▪ Design the database.
▪ Liaising with the users.
▪ Define security and integrity rules.
▪ Execute backup and recovery procedure.
▪ Monitor performance of the database.
▪ Respond to the changing environment.
LAB
Computer Science
STRUCTURED QUERY LANGUAGE
(SQL)
Computer Science
SQL ▪ Standardized query language for relational
database.
▪ Developed by IBM since 1970s.
▪ Used to add, modify or retrieve data from
database tables.
▪ Easy to use and highly effective.
▪ Idea used by: MySQL, Oracle, Sybase, SQL
Server, Postgres, etc.
SQL command types:
▪ Data Definition Language (DDL)
▪ Data Manipulation Language (DML)
▪ Data Query Language (DQL)
▪ Data Control Language (DCL)
▪ Transaction Control Language (TCL)
Data Types:
▪ Numeric: int, float, bigint
▪ Date & Time: date, time, datetime
▪ Character & String: char(0-255),
varchar(0-255), text(0-65535)
▪ Boolean: boolean
Data Definition Language
(DDL)
▪ Commands used to define database and tables.
▪ Commands:
○ CREATE: Used to create database or table.
○ ALTER: Used to add, remove or modify a column in a
table.
○ DROP: Used to delete database, table or column.
○ TRUNCATE: Used to empty the table.
Data Definition Language
(DDL) Syntax
Data Constraints
▪ Not Null: Data cannot be null (empty).
▪ Unique: Data is unique.
▪ Primary Key: Data is primary key.
▪ Foreign Key: Identifies record in another table.
▪ Check: Ensures all values in column satisfies a specific
condition.
▪ Default: Used to provide a default value for a column.
▪ Auto Increment: Data auto increments.
Data Constraints Examples
Data Constraints Examples
Data Manipulation Language
(DML)
▪ Commands used to manipulate data.
▪ Add, change or remove data in existing tables.
▪ Commands:
○ INSERT: Used to insert records in the table.
○ UPDATE: Used to update records in the table.
○ DELETE: Used to delete records in the table.
Data Manipulation Language
(DML) Syntax
Data Query Language (DQL)
▪ Command used to get specific
data from table.
▪ Sometimes grouped with DML
commands.
▪ Command:
○ SELECT: Used to fetch the
data from the table.
Where Clause
▪ Where clause is used to filter records.
▪ Used to extract only those records that fulfill a specified
condition.
AND, OR, NOT Operators
▪ Where clause can be combined with AND, OR and NOT
operators.
▪ AND and OR operators are used to filter records based on
more than one condition:
○ AND: Displays a record if all conditions are TRUE.
○ OR: Displays a records if any conditions is TRUE.
○ NOT: Displays a record if the condition(s) is NOT TRUE.
AND, OR, NOT Operators
Syntax
ORDER BY Keyword
▪ ORDER BY keyword is used to sort the result-set in ascending
or descending order.
LIKE Operator
▪ LIKE operator is used in a WHERE clause to search for a
specified pattern in a column.
▪ Two wildcards often used in conjunction with the LIKE
operator:
○ Percent sign (%): represents zero, one or multiple
characters.
○ Underscore sign (_): represents one, single character.
Aggregate Functions
▪ Some commonly used aggregate functions in SQL:
▪ MIN() - returns the smallest value in selected column
▪ MAX() - returns the largest value in selected column
▪ COUNT() - returns the number of rows in a set
▪ SUM() - returns the total sum of a numerical column
▪ AVG() - returns the average value of a numerical column
Aggregate Functions
Aliases
▪ Used to give a table and column a temporary name.
▪ Often used to make column names more readable.
▪ Only exists for the duration of that query.
▪ Used with AS keyword.
JOIN
▪ Used to combine rows from two or more tables, based on a
related column between them.
JOIN Example
JOIN Example
Data Control Language
(DCL)
▪ Commands used to grant or revoke user access
privileges.
▪ Commands: GRANT, REVOKE.
Transaction Control
Language (TCL)
▪ Commands used to change the state of data.
▪ Commands: COMMIT, ROLLBACK.
Exercise:
▪ Create a database. Drop a database.
▪ Create a test table.
▪ Alter table columns.
▪ Create a student table.
▪ Student: roll number, name, address, phone number, batch.
▪ Insert, Update, Select and Delete values in table.
▪ Create a book table.
▪ Books: book id, student id, book name, author, edition,
published date.
▪ Insert, Update, Select and Delete values in table.
THANK YOU!
Computer Science THE END