RDBMS Notes
1. Database Applications Examples
● Enterprise Information: Sales, Accounting, HR data.
● Manufacturing: Production, inventory, supply chain management.
● Banking & Finance: Customer info, transactions, loans, credit cards,
financial instruments.
● Universities: Registration, grades.
● Airlines: Reservations, schedules.
● Telecommunication: Call, text, data usage records, billing.
● Web Services: Online retail orders, ads, document databases.
● Navigation Systems: Locations, routes, transport systems.
2. Purpose of Database Systems
● Built on file systems initially, leading to issues like:
○ Data redundancy/inconsistency: Duplication across multiple
files.
○ Data access difficulty: New programs required for every task.
○ Data isolation: Different file formats.
○ Integrity problems: Constraints buried in code, making them hard
to manage.
○ Atomicity issues: Incomplete transactions cause inconsistency.
○ Concurrency issues: Multiple users can cause inconsistencies.
○ Security: Hard to control access to specific data.
● Database systems solve these problems by:
○ Ensuring atomic updates.
○ Providing concurrent access.
○ Securing data access.
3. Data Models
● Tools for describing data, relationships, semantics, constraints:
○ Relational Model: Data in tables.
○ Entity-Relationship (ER) Model: For database design.
○ Object-based Models: Object-oriented and object-relational.
○ Semi-structured Model: XML.
○ Older Models: Network and hierarchical models.
4. Relational Model
● Data in Tables: Stored in rows and columns.
● Example: Database records like Turing Award winners.
5. View of Data
● Database architecture involves multiple levels of abstraction (physical,
logical, view levels).
6. Instances and Schemas
● Logical Schema: Overall logical structure, like customer and account
information.
● Physical Schema: Physical structure of data storage.
● Instance: The actual data at a specific point in time.
7. Physical Data Independence
● Ability to modify the physical schema without affecting the logical
schema, ensuring that the system remains flexible.
8. Data Definition Language (DDL)
● Used to define the schema.
● Example: Creating tables with attributes like ID, name, etc.
● Data Dictionary: Stores metadata (data about data), schema,
constraints, and access authorizations.
9. Data Manipulation Language (DML)
● Accesses and updates data.
● Two types:
○ Procedural DML: Specifies what data and how to get it.
○ Declarative DML: Specifies only what data is needed (easier to
use).
○ Example of Declarative DML: SQL.
10. SQL Query Language
● Non-procedural language (you define what you want, not how to get
it).
● Example query: SELECT name FROM instructor WHERE dept_name =
'Comp. Sci.'
● SQL can’t handle complex functions and needs a host language for
more advanced tasks (e.g., C++, Java, Python).
11. Database Design
● Logical Design: Deciding on the schema (what data to store).
○ Business decisions: What attributes to track.
○ Computer Science decisions: How to organize the schema.
● Physical Design: Deciding on how data is physically stored.
12. Database Engine
● A database system is divided into modules responsible for:
○ Storage Manager
○ Query Processor
○ Transaction Management
13. Storage Manager
● Acts as the interface between low-level data and application
programs.
● Responsibilities include:
○ Interacting with the OS file manager.
○ Efficient data storage, retrieval, and updating.
● Components:
○ Authorization and Integrity Manager: Handles access control
and data validity.
○ Transaction Manager: Manages transaction execution.
○ File Manager: Manages files on the disk.
○ Buffer Manager: Controls data in main memory.
● Data Structures:
○ Data files: Store the actual database.
○ Data dictionary: Metadata storage, including the schema.
○ Indices: Fast access pointers to data items.
14. Query Processor
● Interprets and processes user queries.
● Components:
○ DDL Interpreter: Interprets schema definitions.
○ DML Compiler: Translates queries into low-level instructions and
optimizes them for efficient execution.
○ Query Evaluation Engine: Executes the instructions generated by
the DML compiler.
● Query Processing:
. Parsing and translation
. Optimization
. Evaluation
15. Transaction Management
● Ensures database consistency and handles failures (system crashes,
transaction failures).
● Concurrency-control Manager: Manages multiple transactions to
avoid conflicts.
16. Database Architecture
● Centralized Databases: Single machine handles the database.
● Client-server Databases: Client interacts with the server, which
manages the database.
● Parallel Databases: Multiple processors handling tasks in parallel.
● Distributed Databases: Spread across geographically distant
locations, supporting schema and data heterogeneity.
17. Database Application Architectures
● Two-tier Architecture: Client runs applications, which directly invoke
database functions on the server.
● Three-tier Architecture: Client interacts with an application server,
which in turn communicates with the database server.
18. Database Users
● Naive Users: Use prewritten application programs.
● Application Programmers: Develop application programs.
● Sophisticated Users: Use query languages or analytical tools.
● Specialized Users: Write non-standard applications (e.g., CAD,
multimedia databases).
19. Database Administrator (DBA)
● Central control over the database.
● Responsibilities include:
○ Schema definition.
○ Managing storage and access methods.
○ Granting access authorizations.
○ Performing routine maintenance (backups, ensuring disk space,
monitoring performance).
20. History of Database Systems
● 1950s-1960s: Magnetic tapes for storage, punched cards for input.
● 1970s: Hard disks enabled direct access, leading to the development
of relational models by Ted Codd (ACM Turing Award winner).
● 1980s: SQL became an industry standard. Parallel and distributed
databases emerged.
● 1990s: Focus on large-scale data warehouses, decision support, and
Web commerce.
● 2000s: Rise of big data systems (Google BigTable, Yahoo PNuts) and
NoSQL systems.
● 2010s: Integration of SQL with big data processing (e.g., MapReduce),
emergence of massively parallel databases, and multi-core main-
memory databases.
History of Database Management System:
● 1950s and early 1960s:
○ Data processing used magnetic tapes for storage, which allowed
only sequential access.
○ Punched cards were used for input.
● Late 1960s and 1970s:
○ Hard disks enabled direct access to data.
○ Network and hierarchical data models became popular.
○ Ted Codd introduced the relational data model, which won him the
ACM Turing Award.
○ IBM began working on the System R prototype, while UC Berkeley
started the Ingres prototype.
○ High-performance transaction processing was developed.
● 1980s:
○ Research relational prototypes evolved into commercial systems.
○ SQL became the industry standard.
○ Parallel and distributed database systems emerged.
○ Object-oriented database systems were introduced.
● 1990s:
○ Large-scale decision support and data-mining applications were
created.
○ Multi-terabyte data warehouses emerged.
○ Web commerce started to develop.
● Early 2000s:
○ XML and XQuery standards were developed.
○ Automated database administration became a focus.
● Later 2000s:
○ Massive data storage systems emerged, with platforms like Google
BigTable, Yahoo PNuts, and Amazon leading the way.
Basics of Database Management System (DBMS):
● A DBMS holds information about a specific enterprise, consisting of:
○ A collection of interrelated data.
○ A set of programs for accessing the data.
○ An environment that is both convenient and efficient for use.
● Database Applications:
○ Banking (transactions)
○ Airlines (reservations, schedules)
○ Universities (registration, grades)
○ Sales (customers, products, purchases)
○ Online retailers (order tracking, recommendations)
○ Manufacturing (production, inventory, supply chain)
○ Human resources (employee records, salaries, tax deductions)
DBMS vs. File Systems:
● Drawbacks of File Systems:
○ Data redundancy and inconsistency.
○ Difficulty in accessing data due to needing new programs for each
task.
○ Data isolation in multiple files and formats.
○ Integrity problems, where integrity constraints are buried in
program code.
○ Atomicity of updates might leave the system in an inconsistent
state after failures.
○ Concurrent access could lead to inconsistencies, such as multiple
users accessing and updating the same balance.
○ Security is harder to manage, making selective data access
difficult.
Database System Architectures:
● Centralized Systems:
○ Run on a single machine and do not interact with other systems.
○ Single-user systems are personal computers or workstations for
one user.
○ Multi-user systems serve many users connected via terminals
and are often called server systems.
● Client-Server Systems:
○ The back-end manages access structures, query evaluation, and
optimization.
○ The front-end provides tools such as forms, report writers, and
○
graphical user interfaces.
● Parallel Systems:
○ Consist of multiple processors and disks connected by a fast
network.
○ Coarse-grain systems have fewer, more powerful processors,
while fine-grain systems use thousands of smaller processors.
○ Performance is measured by throughput (tasks per time interval)
and response time (time to complete a single task).
● Distributed Systems:
○ Data is spread across multiple machines, connected via a network.
○ Homogeneous distributed databases have the same software
and schema across all sites.
○ Heterogeneous distributed databases use different software
and schema across different sites.
Levels of Abstraction:
. Physical Level:
○ Describes how records are stored.
. Logical Level:
○ Describes the data and relationships in the database (e.g.,
schema).
. View Level:
○ Provides user-friendly interfaces and hides data details for
security.
Instances and Schemas:
● Schema:
○ The logical structure of a database (e.g., relationships between
customers and accounts).
● Instance:
○ The actual content of the database at any given point in time.
Data Independence:
● Physical Data Independence:
○ The ability to modify the physical schema without changing the
logical schema.
● Logical Data Independence:
○ The ability to change the conceptual schema without altering
external APIs or programs. This is more difficult to achieve than
physical data independence.
Data Models:
● Relational Model
● Entity-Relationship (ER) Data Model for database design.
● Object-based Models (Object-oriented and Object-relational).
● Semistructured Data Model (XML).
● Older models include the Network and Hierarchical Models.
Database System Languages:
● Data Definition Language (DDL):
○ Used to define database schemas (e.g., CREATE TABLE
statements).
○ DDL compiler generates table templates stored in a data
dictionary.
● Data Manipulation Language (DML):
○ Language for accessing and manipulating data (e.g., SQL).
Database System Components:
● Database Users:
○ Naive users, application programmers, sophisticated users, and
database administrators.
● Query Processing:
○ Involves parsing, translation, optimization, and evaluation.
● Storage Management:
○ Manages file organization, indexing, and hashing.
● Transaction Management:
○ Ensures the database remains consistent despite system or
transaction failures.
○ Includes a transaction manager and concurrency control manager
Keys
● Super Key: A set of one or more attributes that uniquely identify each
entity in a set.
● Candidate Key: A minimal super key.
○ Example: ID is the candidate key for the instructor, and course_id
is the candidate key for the course.
● Primary Key: A selected candidate key to uniquely identify records.
Relationship Sets
● Relationship: An association between multiple entities.
● Relationship Set: A mathematical relation among entities, represented
by a set of tuples.
○ Example: A relationship set Advisor relates students and
instructors, e.g., (44553, 22222).
E-R Diagrams
● Rectangles: Represent entity sets.
● Diamonds: Represent relationship sets.
● Attributes: Listed inside entity rectangles.
○ Primary key attributes are underlined.
Relationship Sets with Attributes
● Attributes can also belong to relationship sets, e.g., the advisor
relationship between instructor and studentmay include the attribute
date to track when the advising relationship started.
Attributes
● Entity: Represented by a set of attributes.
○ Example: instructor = (ID, name, street, city, salary), course =
(course_id, title, credits).
● Domain: The set of permitted values for each attribute.
● Attribute Types:
○ Simple vs. composite attributes.
○ Single-valued vs. multivalued attributes.
○ Derived attributes: Can be calculated from other attributes, e.g.,
age from date_of_birth.
Degree of Relationship Sets
● Binary Relationship: Involves two entity sets (e.g., between instructor
and student).
● Ternary Relationship: Involves three entities, e.g., proj_guide relates
instructor, student, and project.
Mapping Cardinality Constraints
● Describes the number of entities involved in a relationship.
○ Types: One-to-one, One-to-many, Many-to-one, Many-to-many.
ER Notation for Cardinality Constraints
● One-to-One: One entity is associated with one other entity.
● One-to-Many: One entity is associated with several other entities.
● Many-to-One: Many entities are associated with one other entity.
● Many-to-Many: Many entities are associated with many others.
Weak Entity Sets
● Weak Entity: An entity that does not have a primary key and relies on
a strong entity.
○ It is associated with an identifying relationship (total, one-to-
many).
○ The discriminator is the partial key used to distinguish between
weak entities.
Extended E-R Features: Specialization, Generalization, and
Aggregation
● Specialization: Top-down design, where a higher-level entity is
divided into lower-level entities.
● Generalization: Bottom-up design, where lower-level entities are
combined into higher-level entities.
● Aggregation: Treats a relationship as an entity to eliminate
redundancy.
Reduction to Relational Schemas
● Entity Set: Each entity set reduces to a schema with attributes.
● Relationship Set: Many-to-many relationship sets are represented
with schemas containing the primary keys of the related entities and
any additional descriptive attributes.
○ Example: advisor = (s_id, i_id) for the advisor relationship set.
Redundancy of Schemas
● One-to-one and many-to-one relationships can be represented by
adding an extra attribute to the many side to avoid redundancy.
Composite and Multivalued Attributes
● Composite Attributes: Flattened out by creating a separate attribute
for each component.
● Multivalued Attributes: Represented by a separate schema
containing the primary key of the entity and the multivalued attribute.
1. Introduction to SQL
● SQL (Structured Query Language): A standard language for
managing relational databases, developed by IBM in the 1970s, and
standardized by ANSI in 1986.
● Types of SQL Statements:
. DML (Data Manipulation Language): Used for querying and
modifying tables (e.g., SELECT, INSERT, UPDATE, DELETE).
. DDL (Data Definition Language): Used for defining or altering the
database schema (e.g., CREATE, ALTER, DROP).
. TCL (Transaction Control Language): Controls transactions in a
database (e.g., COMMIT, ROLLBACK).
. DCL (Data Control Language): Deals with rights, permissions,
and other controls of the database system (e.g., GRANT, REVOKE).
● SQL Processing:
○ The SQL Engine handles SQL queries and determines how to
execute them.
○ Components include the Query Dispatcher and Optimization
Engines.
2. SQL Data Types
● CHAR (n): Fixed-length string, right-padded with blanks to the full
length.
● VARCHAR2 (n): Variable-length string, up to 4000 characters.
● NUMBER (o, d): Numeric type. o is the total digits, d is the number of
digits to the right of the decimal.
● DATE: Date and time. Format: DD-MM-YY.
● LONG: Stores up to 2GB of character data. Only one LONG column
allowed per table.
● NULL: Represents unknown data. It is distinct from an empty string or
zero.
Data Abstraction
Data abstraction is the process of simplifying complex data systems by hiding
the details of how data is stored, managed, and maintained, while providing
users with a simplified view. It allows users to interact with data without
needing to understand the underlying complexities of its structure, storage, or
operations. The main goal of data abstraction is to reduce complexity and make
data management more efficient.
Levels of Data Abstraction
There are three levels of data abstraction in a database system, each
providing a different view of the data:
1. Physical Level (Lowest Level)
● Definition: This level defines how the data is physically stored in the
database. It deals with data storage details, such as file structures,
storage devices, indexing, and data access methods.
● What it hides: The physical level hides the complexities of the actual
data storage from users.
● Users: Database administrators (DBAs) typically work at this level to
manage storage and optimize performance.
● Example: The storage of employee data as files on a disk.
2. Logical Level (Conceptual Level)
● Definition: This level provides a higher-level view of the entire
database and focuses on how the data is logically organized and how
relationships are defined between different data entities.
● What it hides: It hides the complexities of the physical data storage
from end-users while showing what data is stored and the
relationships between different pieces of data.
● Users: Database designers use the logical level to define the schema
and data relationships.
● Example: Viewing employee details, departments, and their
relationships (without seeing how the data is stored physically).
3. View Level (Highest Level)
● Definition: This level presents the data to users in a simplified and
customized way. Different users may have different views of the same
data based on their roles, access levels, and requirements.
● What it hides: The view level hides the complexities of both physical
and logical levels from end-users. Users only see the data they need
to interact with.
● Users: Application users and end-users typically interact with the data
at this level.
● Example: A user may only see employee names and departments in a
simplified report without needing access to salary details or how the
data is organized.
Data Independence
Data Independence refers to the ability to modify the schema at one level of a
database system without altering the schema at the next higher level. It ensures
that changes in data structure or storage details do not affect the application or
user interfaces. Data independence is a key feature of database management
systems (DBMS) that promotes flexibility and minimizes the impact of changes
on users and applications.
There are two types of data independence:
1. Logical Data Independence
● Definition: The ability to change the logical schema (the conceptual
level) without affecting the external schema (user views or application
programs).
● Example: Adding a new field (e.g., adding an email attribute to an
employee table) or changing a relationship between tables without
requiring users or applications to update their queries or views.
● Purpose: It allows for the modification of the logical structure of the
database (like tables, attributes, or relationships) without affecting
user applications.
2. Physical Data Independence
● Definition: The ability to change the physical schema (the physical
storage of data) without impacting the logical schema or application
programs.
● Example: Changing data storage techniques (e.g., changing from one
indexing method to another or from hard drives to SSDs) without
modifying the logical structure of tables or user queries.
● Purpose: It ensures that physical storage changes do not require
changes in the logical design of the database.
Summary of Data Independence:
● Logical Data Independence: Protects applications from changes in
the logical structure.
● Physical Data Independence: Protects the logical structure from
changes in the physical storage.
Data independence helps in maintaining and upgrading databases without
causing major disruptions to applications or users.
3. SQL Commands
Data Definition Language (DDL)
. CREATE: Creates tables, views, indexes, sequences, etc.
○ Example:sql
Copy code
○
CREATE TABLE Person (
○ LastName VARCHAR2(30),
○ FirstName VARCHAR2(20),
○ Age NUMBER(3)
○ );
○
. ALTER: Modifies an existing table by adding or deleting columns.
○ Example:sql
Copy code
ALTER TABLE Person ADD City VARCHAR2(10);
○
. DROP: Deletes a table or an index.
○ Example:sql
Copy code
DROP TABLE Person;
○
. TRUNCATE: Removes all rows from a table but retains its structure.
○ Example:sql
Copy code
TRUNCATE TABLE Person;
○
. RENAME: Renames a table.
○ Example:sql
Copy code
RENAME Person TO Person1;
○
○
Data Manipulation Language (DML)
. INSERT: Inserts new data into a table.
○ Example:sql
Copy code
INSERT INTO Person (LastName, FirstName, City) VALUES ('Smith',
'John', 'Pune');
○
. UPDATE: Updates existing data in a table.
○ Example:sql
Copy code
UPDATE Person SET FirstName = 'Jane' WHERE LastName =
'Smith';
○
. DELETE: Deletes data from a table.
○ Example:sql
Copy code
DELETE FROM Person WHERE LastName = 'Smith';
○
Data Control Language (DCL)
. GRANT: Gives a user privileges to execute certain commands.
○ Example:sql
Copy code
GRANT SELECT ON Person TO user123;
○
. REVOKE: Removes previously granted privileges.
○ Example:sql
Copy code
○
REVOKE SELECT ON Person FROM user123;
○
Transaction Control Language (TCL)
. COMMIT: Saves all changes made in the current transaction.
○ Example:sql
Copy code
COMMIT;
○
. ROLLBACK: Reverses changes made in the current transaction.
○ Example:sql
Copy code
ROLLBACK;
○
4. SQL Operators
● Arithmetic Operators: +, -, *, /
● Comparison Operators: =, !=, >, <, >=, <=
● Logical Operators: AND, OR, NOT
● Set Operators: UNION, INTERSECT, MINUS
● Other Operators:
○ BETWEEN/NOT BETWEEN: For ranges.
○ LIKE/NOT LIKE: Pattern matching.
○ IN/NOT IN: For matching a list of values.
○ IS NULL/IS NOT NULL: Checks for NULL values.
5. SQL Functions
Single-Row Functions
● Character Functions:
○ LOWER(), UPPER(), INITCAP(), LTRIM(), RTRIM()
● Numeric Functions:
○ ABS(), CEIL(), FLOOR(), ROUND(), TRUNC()
● Date Functions:
○ ADD_MONTHS(), MONTHS_BETWEEN(), NEXT_DAY(),
LAST_DAY()
● Conversion Functions:
○ TO_CHAR(), TO_DATE(), TO_NUMBER()
Aggregate Functions
● AVG(): Returns the average value.
● COUNT(): Counts the number of rows.
● SUM(): Returns the sum of a column.
● MAX(): Returns the maximum value.
● MIN(): Returns the minimum value.
6. SQL Clauses
SELECT Statement
● Basic Syntax:sql
Copy code
SELECT column_name(s) FROM table_name WHERE condition;
●
● DISTINCT: Returns unique [Link]
Copy code
SELECT DISTINCT column_name FROM table_name;
●
● WHERE Clause: Filters records based on conditions.
● GROUP BY: Groups rows sharing the same property, often used with
aggregate functions.
● HAVING: Filters groups formed by GROUP BY.
● ORDER BY: Sorts the result in ascending or descending order.
JOINs
● INNER JOIN: Returns rows with matching values in both tables.
○ Example:sql
Copy code
SELECT [Link], [Link]
○ FROM Employees
○ INNER JOIN Orders ON Employees.Employee_ID =
Orders.Employee_ID;
○
● LEFT JOIN: Returns all rows from the left table, with NULLs for no
●
matches in the right table.
● RIGHT JOIN: Returns all rows from the right table, with NULLs for no
matches in the left table.
7. Transaction Processing (ACID Properties)
● Atomicity: A transaction is either fully completed or not executed at
all.
● Consistency: Ensures the database moves from one valid state to
another.
● Isolation: Transactions do not interfere with each other.
● Durability: Once committed, changes are permanent.
8. Constraints in SQL
● PRIMARY KEY: Uniquely identifies each row in a table.
● FOREIGN KEY: Links two tables together.
● UNIQUE: Ensures all values in a column are unique.
● NOT NULL: Ensures a column cannot have NULL values.
● CHECK: Ensures a condition is met before data is inserted or updated.
9. SQL Operators for Range and Pattern Matching
● LIKE: Searches for a pattern using % (wildcard for zero or more
characters) or _ (single character).
Example:sql
Copy code
SELECT * FROM Person WHERE FirstName LIKE 'J%';
10. SQL Group By and Having Clauses
● GROUP BY: Used to group data by a specific column.
Example:sql
Copy code
SELECT Company, SUM(Amount) FROM Sales GROUP BY Company;
● HAVING: Used to apply a condition on groups created by the GROUP
BY clause.
Example:sql
Copy code
SELECT Company, SUM(Amount) FROM Sales GROUP BY Company
HAVING SUM(Amount) > 10000;
○
Normalization (CODD's Rule)
● Definition: A database design technique that organizes tables to
reduce redundancy and dependency.
● Purpose: To divide larger tables into smaller ones and link them using
relationships, ensuring logical data storage.
Objective:
To create relations where every dependency is on the key, the whole key, and
not a partial key.
Normal Forms:
. 1NF:
○ Ensures all values are atomic (single-valued).
○ No repeating groups or arrays are allowed.
○ Values must be simple and indivisible.
. 2NF:
○ A relation is in 2NF if it's in 1NF and every non-key attribute is fully
dependent on all parts of the candidate key.
○ No partial dependencies (i.e., non-key attributes depend on the
whole key, not a part of it).
. 3NF:
○ A relation is in 3NF if it’s in 2NF and all attributes depend directly
on the primary key, not transitively through another attribute.
○ No transitive dependencies allowed.
. BCNF (Boyce-Codd Normal Form):
○ A stronger form of 3NF where for every functional dependency A
→ B, A must be a super key.
○ Any relation in BCNF is also in 3NF.
Functional Dependencies
● Definition: Attribute B has a functional dependency on A if the value
of A determines a unique value for B.
● Notation: A → B
● Example:
Employee Number → Email Address (Each employee has one unique
email).
Transitive Dependency
● Definition: If A → B and B → C, then A → C is a transitive dependency.
● Example:
If Employee Number → Department, and Department → Manager, then
Employee Number → Manager is transitively dependent.