Introduction To Database Concepts-1
Introduction To Database Concepts-1
INTRODUCTION
In this course we will study the concept of data, database and other related terms
used in Database Management.
THE OBJECTIVES
1. Database
2. Management System
What is a Database?
To understand the concept of database we must first look at the fundamental principles
of database.
Here we consider data which is the building block of every database and other
terms circled around data.
Data
Data is any raw or unprocessed fact; it is the building block of any database. The word data
covers the collection of facts stored in a database, it is distinct pieces of information that
may not make sense independently, example; 1, Tom, Computer science, 18, which
are usually formatted in a special way.
All software is divided into two general categories: data and programs. Programs
are collections of instructions for manipulating data.
Data is not only ubiquitous and pervasive, but also essential for organizations to survive
and prosper (By ubiquitous and pervasive we are highlighting that data is present
everywhere and impacts nearly all aspects of life, data has become a fundamental part
of our daily life, business operations, personal routines and technology etc.). Imaging
trying to operate a business without knowing who your customers are, what products
you are selling, who is working for you, who owes you money, and to whom you owe
money. All businesses must keep this type of data and much more. Just as important
they must have that data available to decision makers when necessary.
1. Data can exist in a variety of forms -- as numbers (0-9) or text in a piece of paper, as bits (0,
1) and bytes (10001100) stored in electronic memory, or as facts stored in a person’s mind.
Strictly speaking, data is the plural of datum, a single piece of information. In practice,
however, people use data as both the singular and plural form of the word.
1. Manual record Keeping: Early data management involved physical records like
books, ledgers and files. Examples like census records, financial accounts etc.
Manual systems were slow, prone to error and inefficient for large data sets.
2. File-Based systems: As computers emerged, data was stored in flat files using
basic storage methods e.g. magnetic tapes or disks. Though an improvement
from the previous method, this system leads to redundancy, inconsistency and
difficulty in accessing related data.
Key advantages
• Reduced redundancy.
• Easier to access data.
• Easier to retrieve data.
5. Big Data and NoSQL: With the rise of the internet, social media and IOT
(Internet of Things), unstructured and semi-structured data gained prominence.
E.g. videos, images, social media post, XML, JSON etc. NoSQL databases like
MongoDB and Casandra were developed to handle these datatypes.
6. Cloud database: A cloud database is a database that runs on a cloud platform
(e.g. AWS, Azure, Google cloud). Instead of being stored on your own hardware,
the database is managed and accessed over the internet.
Example: Google drive storing your photos is like a cloud database for your
files. Benefits:
• Easily scalable to handle more data.
• Accessible from anywhere
• Managed by cloud providers so you don’t worry about maintenance.
Data Organization
Data organization is the way data is structured, stored and managed to make it easy
to access, analyze and use effectively. In data organization, bits, bytes, fields,
records, files and databases are hierarchical building blocks that define how data is
stored and managed.
Bit: A bit is the smallest unit of data in computing. This term is curled from Binary digit.
Examples include 0 or 1, T or F, High or Low, On or Off. In database, this data type is
known as Boolean.
Bytes: A byte is a unit of data which can be 8 bits, 16bits, 32bits or 64 bits based on
the system. The term is curled from the word (By)eith(te). It can represent characters
such as letters, numbers or symbols.
Field: Field is a group of related characters or bytes. A space allocated for a particular
item of information. It is a single piece of related information in a database. A student
form for example, contains several fields: S/n, Name, Department, Age and so on
In database systems, fields, also known as Attributes or Domain are the smallest units
of information you can access. Most fields have certain properties associated with
them. For example, some fields are numeric whereas others are textual, some are
long, while others are short. In addition, every field has a name, called the field name.
Record
File
A collection of data records or information that has a name, called the filename. Almost
all information stored in a computer must be in a file. There are many different types of
files: data files, text files, program files, directory files, and so on. Different types of files
store different types of information. For example, program files store programs, whereas
text files store text. In database, Tables or Relations which comprise of Fields and
Records created are also regarded as files.
Database
• A phonebook
• Shopping list
• Facebook user base
• Students in a department (Details).
• On piece of paper
• On your mind
• On a computer
•
Typical Applications of Database
➢ Student Records
➢ Hotel Booking
➢ Library
➢ Maintenance Information System
➢ Banking System
➢ Sales Records
storing data in a centralized or distributed system that multiple users can access. By
having a single source of truth, databases minimize inconsistencies across
different data copies.
2. Data Storage and Organization: Databases provide structured storage
solutions, allowing vast amounts of data to be stored in an organized way,
making it easier to access, sort, and manage.
3. Efficient Data Retrieval: With indexing, caching, and querying capabilities,
databases allow for fast and efficient retrieval of data, even when handling
millions of records.
4. Data Integrity and Validation: Databases enforce data integrity constraints, such
as primary keys, foreign keys, and unique constraints, ensuring data accuracy and
validity. This prevents invalid data from being stored in the system.
5. Concurrent Access and Multi-user Support: Databases allow multiple
users to access and modify data simultaneously without conflicts, using
transaction management and concurrency controls like locking mechanisms.
6. Security and Access Control: Databases provide robust security measures,
allowing organizations to control who can access, modify, or view certain data.
This is critical for protecting sensitive information.
7. Backup and Recovery: Databases provide backup and recovery solutions to
prevent data loss in case of system failures, ensuring that data can be restored
to a previous state if needed.
8. Scalability: Databases can be scaled to handle growing data needs, whether
through vertical scaling (adding more power to a single server) or horizontal
scaling (adding more servers). This is especially relevant for applications that
need to handle large volumes of data and high traffic.
9. Data Sharing and Collaboration: Databases enable data sharing between
different systems and users, facilitating collaboration and interoperability across
departments and applications.
10. Data Analytics and Decision-making: Many databases, especially analytical
databases, support complex data analysis and reporting. This helps businesses
make data-driven decisions by providing insights into trends, patterns, and
business metrics.
11. Data Compliance and Auditing: Databases help organizations meet
regulatory requirements by storing logs and audit trails of data access and
modifications, which is important for compliance and auditing purposes.
Overall, databases address the needs for structured data management, secure
access, data integrity, and operational efficiency
Database management system (DBMS) is a special software that helps users create
and maintain a database.
The primary goal of a DBMS is to offer a more convenient and effective way of storing and
retrieving database information. The management system is important because it provides
the necessary rules and regulations without which the maintenance of database cannot be
possible. Issues like selecting the attributes for a particular table; the common attributes to
create relationship between two tables; handling of insertion and deletion of records from a
table must be guided by rule in order to maintain the integrity of the database.
Advantages of DBMS:
Disadvantages of DBMS
Despite its numerous advantages, a Database Management System (DBMS) also has
some disadvantages, especially in specific scenarios. Here are the key drawbacks:
1. High Initial Cost:
• Hardware and Software Costs: Setting up a DBMS requires robust
hardware and software, which can be expensive.
• Licensing Fees: Commercial DBMS software often comes with
significant licensing costs.
2. Complexity:
• DBMSs are complex systems that require skilled personnel to
manage, maintain, and optimize.
• Learning and mastering a DBMS can be time-consuming.
3. Performance Issues:
• In systems with large volumes of transactions or complex queries, the
DBMS may experience performance bottlenecks.
• Indexing and optimization are required to maintain performance, which
can be resource intensive.
6. Overhead:
• A DBMS consumes considerable system resources, such as memory,
CPU, and storage.
• Resource-intensive applications may slow down overall
system performance.
8. Scalability Challenges:
• Scaling a DBMS to handle very large databases or high user concurrency
can be challenging without significant upgrades.
• Distributed DBMSs can address scalability but add complexity.
Database Administrator Manage and maintain DBMS and Database fundamentals, SQL,
databases vendor courses.
Database Analyst Develop databases for decision SQL, query optimization, data
support reporting warehouses.
Database Architect Design and implementation of DBMS fundamentals, data
database environments (conceptual, modelling, SQL, hardware
logical and physical) knowledge, etc.
Database Consultant Help companies leverage database Database fundamentals, data
technologies to improve business modelling, database design, SQL,
process and achieve specific goals DBMS, hardware, vendor-specific
technologies etc.
Database Security Officer Implement security policies for data DBMS fundamentals, database
administration administration, SQL data security
technologies, etc.
Cloud Computing Data Design and implement the Internet technologies, cloud
Architect infrastructure for next-generation storage technologies, data
cloud database systems security, performance tuning,
large databases, etc.
NORMALIZATION
Normalization in database is a systematic approach to organizing data to
improve its efficiency, integrity and scalability.
It involves decomposing large tables into smaller and manageable ones and
defining relationships among them.
The normalization process involves dividing large tables into smaller tables and
defining relationships between them. This is typically done through a series of normal
forms, which indicate the level of normalization achieved by a database. There are
several normal forms including: First Normal Form (1NF), Second Normal Form (2NF),
Third Normal Form (3NF), and higher forms: Boyce Codd Normal Form (BCNF),
Fourth Normal Form (4NF), Fifth Normal Form (5NF).
Here’s a simple example
Imagine you run a small school, and you are keeping records of students and their
courses in one table. It might look like this:
Student ID Name Course 1 Course 2 Course 3
101 Alice Math Science English
102 Bob Math English Null
1. Repetition: The same course names (like “math”) are repeated for different studets.
2. Empty space: If a student takes fewer than 3 courses, you will have empty
(null) fields.
3. Hard to update: If you change “math” to “Further math”, you will have to find
and update it everywhere it appears.
Normalization solves these problems by breaking the data into small, related tables.
Let’s fix the table:
101 Alice
102 Bob
1 Math
2 Science
3 English
101 1
101 2
101 3
102 1
102 3
Benefits
i. Insertion Anomalies: Inability to add data without already existing related data.
a. Example 1: You cannot add a new student if they are not enrolled
on a course.
b. Example 2: You can add a new product to a database without assigning
it to an existing supplier.
Normalization Process/Rules
Normalization is achieved through a series of stages called Normal Forms (NF’s).
Each normal form builds on the previous one.
1) First Normal Form (1NF):
Goals/Requirements:
• To make sure each column contains only one value.
• Ensure each row must be unique and identified by a primary key.
• Ensures each cell in a table contains a single value.
ID Name Hobbies
1 David Painting/Travelling
This table violates the 1NF because, “Hobbies” contains multiple values
Fixed Table:
ID Name Hobbies
1 David Painting
2 David Travelling
To normalize:
Example:
1 101 HR
2 102 IT
1 101
2 102
4) Boyce-Codd Normal Form (BCNF): Handles rare cases where even the NF
is not enough.
5) Fourth Normal Form (4NF) and Fifth Normal Form (5NF): Handles multivalued
and complex dependencies.
Advantages of Normalization
Disadvantages of Normalization
Summary
A database model or data model is a logical construct used to define or represent the
data structure and data relationships found within the database. Data model is not just
a way of structuring data, it also defines or specifies a set of operations that can be
performed on the data.
Classifying Relationships
1. Degree of a relationship:
The degree of relationship in a database refers to the number of entities (or
tables) involved in the relationship. It describes the participation of entities in
the relationship and is a fundamental concept in the Entity-Relationship (ER)
Model used for database design. Types of degree of a relationship:
i. Unary Relationship (Degree 1): Involves only a single entity type
participating in the relationship. This is also called a recursive
relationship because an entity is related to itself.
ii. Binary Relationship (Degree 2): Involves two different entity
types participating in the relationship. This is the most common
type of relationship in database design.
iii. Ternary Relationship (Degree 3): Involves three different entity
types participating in the relationship simultaneously.
iv. N-ary Relationship (Degree N): Involves more than three entities. Such
relationships are rare but are sometimes used when more than three
entities must participate simultaneously in a meaningful way.
4. Type:
An identifying relationship is one in which one of the child entities is also a
dependent entity. A non-identifying relationship is one in which both entities
are independent.
5. Existence:
Existence denotes whether to existence of an entity instance is dependent upon
the existence of another, related, entity instance.
Generalization Hierarchies
A generalization hierarchy is a form of abstraction that specifies that two or more entities that
share common attributes can be generalized into a higher level entity type called a supertype
or generic entity. The lower level of entities become the subtype, or categories, to the
supertype. Subtypes are dependent entities. Generalization occurs when two or more entities
represent categories of the same real-world object. For example, Wages_Employees and
Classified_Employees represent categories of the same entity, Employees. In this example,
Employees would be the supertype; Wages_Employees and Classified_Employees would be
the subtypes. Subtypes can be either mutually exclusive (disjoint) or overlapping (inclusive). A
mutually exclusive category is when an entity instance can be in only one category. The
above example is a mutually exclusive category. An employee can either be wages or
classified but not both. An overlapping category is when an entity instance may be in two or
more subtypes. An example would be a person who works for a university could also be a
student at that same university. The completeness constraint requires that all instances of the
subtype be represented in the supertype. Generalization hierarchies can be nested. That is, a
subtype of one hierarchy
can be a supertype of another. The level of nesting is limited only by the constraint of
simplicity. Subtype entities may be the parent entity in a relationship but not the child.
ER Notation
ER notation is a way of showing how things (entities) are related to each other
in a database. It is like a diagram to plan or explain the structure of data.
ER Model: The ER model is a conceptual data model that views the real world as
consisting of entities and relationships. It is a way of showing how things (entities) are
related to each other in a database. Here’s a simple breakdown:
a. Entities: Entities are the things or objects you want to store information
about in a database (e.g. Student, Company). Represented as rectangles.
Properties or characteristic of an
Attribute entity. E.g. Name of Student, Course
Code etc.
Phone number of a student,
Multivalued Attribute Students offering a course.
The diagram helps to visualize how data will be stored and linked.
Example: A rectangle for student entity might connect to another rectangle for
course entity via a diamond labelled enroll.
This is concerned with how data are represented in the database. It refers to how
the database is physically structured and managed on a computer system, it focuses
on the technical details of storing, retrieving and updating data efficiently.
- Hierarchical Model
- Network Model
- Relational Model
Example.
Parent: Company
A Root-segment/Parent Segment
B
C D
Level 1
D1 D2 Level 2
B1 B2 C1 C2
This structure stores data as records which are connected through links. A record
is a collection of fields with each field containing only one value.
Root-segment
Department
Level 1
Name Course
MatNo
Level 2
SLT Chemistry
FOS/…. FOS/… Alice Bob
.
Data flows in one direction and searching requires moving down the tree.
Advantages
Disadvantages
a. Requires knowledge of the physical level of data storage
b. Rigid structure, changes in relationships require a redesign
c. Difficult to query without predefined paths
Example: An employee can work on multiple projects, and each project can have
multiple employees.
Network Model Examples
- A Sales Rep may have written many invoices. Each invoice is written by a
single Sales Rep.
- A customer may have made purchases in many occasions. Each
occasion corresponds to one invoice.
- An invoice may have many invoice lines. Each invoice line is found in an
invoice ticket.
- A product may appear on several invoice lines. Each invoice line contains
only a single product.
SalesRep Customer
Product Payment
Invoice
Invoice Line
Network Model Example
Network Model is like Hierarchical model only that a member (child) can have more
than one owner (Parent).
Advantages
Disadvantages
a. Complex structure: with many pointers (links) can make it hard to manage.
b. Requires skilled users to navigate and query.
c. Less popular compared to relational models reducing support.
-Relational Model: Data is stored and organized in tables (rows and columns). It is
flexible and easy to query using SQL.
Properties of Relational Model:
- Values are atomic: Values cannot be broken down further into smaller
meaningful components. INDIVISIBLE values.
- Each row is unique.
- Column values are of the same kind.
- Each column has a unique name.
- The sequence of columns is insignificant.
- The sequence of rows is insignificant.
Advantages
Disadvantages
a. Slower for very large databases compared to the hierarchical and network model.
b. Requires more storage due to the table structures.
c. Ease of use can be a liability i.e. there could be possible misuse.
In summary: Hierarchical model is simple and rigid, Network model is flexible but
complex and Relational model is user-friendly but may struggle with massive datasets.
Dependencies in DBMS
In a DBMS, dependencies are relationships between data that determine how data in
one part of the database is connected to another part of the database. Basically,
dependency is the relationship between two or more attributes in a database. They are
crucial for organizing, managing and ensuring the database operates correctly.
Types of Dependencies
- Functional Dependency
- Transitive Dependency
- Multivalued Dependency
- Partial Dependency
- Fully Functional Dependency
MatNumber Name
• If you know the MatNumber (001), then you can find the name (John).
Example:
<Student score>
MatNo Ccode Score
001 Csc203 70
002 Csc206 45
The above tables state that Matno, Ccode, Cunit → Score. However, this is not
fully functional dependent. Whereas the subset {Matno, Ccode} can easily
determine the score of a student on a course. Thus {Matno, Ccode →(score)
gives fully functional dependency.