1.
Why Databases
2. Ubiquity and Pervasiveness of Data
Data is described as abundant, global, and everywhere in today's world.
Data is unescapable, prevalent, and persistent nature and it exists from birth to death.
Individuals continuously generate and consume a lot of data throughout their lives.
It starts with birth certificates and extends to death certificates, highlighting the lifelong
data generation process.
3. Importance of Databases
Databases are the optimal solution for storing and managing data effectively.
Databases make data persistent, shareable, and secure, addressing the challenges posed
by the sheer volume of generated data.
4. Business Necessity for Data
Data is considered essential for the survival and prosperity of organizations.
It is impossible to operate a business without crucial data on customers, products,
employees, and financial transactions.
5. Role of Business Information Systems:
Help businesses use information as an organizational resource.
Help in collection, storage, aggregation, manipulation, dissemination, and management
of data.
6. Challenges in Data Management
Businesses are not able to store and retrieve huge collections of data
Databases are the solution to efficiently process, store, and retrieve vast amounts of
data for timely decision-making.
7. Databases as Specialized Structures
Databases are specialized structures allowing computer-based systems to store,
manage, and retrieve data very quickly.
2. Data verses information
Data- raw facts (facts that have not been processed to reveal their meaning)
Information-is the processed raw data
Importance of processing data- helps in decision making
Database- is shared integrated computer structure that stores a collection of data
End user data- raw facts interested to the end user
Metadata- the data characteristics and the set of relationship that links the data found
within the database.
Database management systems- is the collection of programs that manages the
database structure and controls access to the data stored in the database
Roles of DBMS-serves as an intermediary between the user and the database
Advantages of DBMS
It improves data sharing
It improves data security
It helps in better data integration
It minimizes data inconsistency
It improves Data access
It improves decision making
It increases end user productivity
Types of databases
Single user database- supports only one user at a time
Desktop database- single user database that runs on personal computer
Multiuser database- support multiple users at a time
Workgroup database- when multiuser database supports relatively small number of users
Enterprise database –when the database is used by the entire organization
Centralized database – Database that supports data located on the single site
Distributed database – database that supports data distributed across several different sites
Cloud database- Database that is created and maintained using cloud data services
General purpose database- contains a wide variety of data used in different disciplines
Discipline specific database- contains data focused on specific subject.
Operational database (online transaction processing database, transactional database, or
production database) – It supports day-to day company operation.
Analytical database- stores historical data and business metrics used exclusively for tactical or
strategic decision making.
Database warehouse: stores data in format optimized for decision support.
Online analytical processing: is set of tools that work together to provide an advanced data
analysis environment for retrieving, processing and modelling data from the data warehouse
Unstructured data- data that exists in its raw state
Structured data: formatted raw data to facilitate storage, use and generation of information.
Semi-structured data-data that has already been structured to some extent
Extensible markup Language - is a specialized language used to manipulate data elements in
textual formats
Database design- activities that focus on the design on the design of the database structure that
will be used to store and manage end-user data.
Problems with file data management
Lengthy development time
Difficulty in getting quick answers
Complex system administration
Lack of security and limited data sharing
Extensive programing
Structure and Data independence-
Structural dependence- access to the file is dependent on the structure
Structural independence- exist when you can change the file structure without affecting the
applications’ ability to access data
Logical data format – how human being view data
Physical data format- How computers must work with the data
Data redundancy: occurs when the same data is stored unnecessarily at different places
Effects of data Redundancy
Poor data security
Data inconsistency
Data entry errors
Data integrity problems
Data anomalies- develops when not all of the required changes in the redundant data are made
successfully
Types of data anomalies
Update anomalies
Insertion anomalies
Deletion anomalies
Database system environment- is an organization of components that define and regulate the
collection, storage, management and use of data within database environment.
Components of system environments:
Hardware i.e. routers, Pcs, tablets, supercomputers
Software i.e. Linus, UNIX
People
Procedure
Data
Functions of DBMS
Data dictionary management
Data storage management
Data transformation
Data presentation
Disadvantages of database systems
o Increased cost
o Management complexities
o Maintaining currency
o Vendor dependence
o Needs frequent upgrades
Chapter 2
Data model: is the simple representation, usually graphical of more complex real world data
structure
Importance of Data Models
It facilitates interaction among the designer, the application programmer and end user
Balding blocks of data modelling
Entity- a person, place, thing or event about which data will be collected and stored
Attributes – characteristic of an entity
Relationships – describes association among entities. Designers usually use shorthand notations
to represent one-to-many, many-to-many and one-to-one [1: M or M:N or *..* and 1:1 or 1..1
respectively]
Constraints- itis the restriction placed on data. They ensure data integrity and are expressed
inform of rules.
Business rules: is a brief, precise and unambiguous description of policy, procedure, or principle
within specific organization
The main source of business rules are company managers, policy makers, department managers
and written documentation such as company’s procedures, manuals and standards.
Importance of documenting business rules
It helps to standardize the company’s view of data
It can be a communication tool between users and designers
It allows designers to understand the nature, role and scope of data
It allows designers to understand business processes
It allows designers to develop appropriate relationship participation rule and constraints
and to create accurate data models
Evolution of data models
The quest for better a better data management model led to the development of several models
Hierarchical model – developed in 1960 to manage large amount of complex data
-the model is represented by an upside down tree
-used by Apollo rocket to land to the moon in 1969
-Has level or segments
Segment: it is the equivalent of the files system’s record type
Network model: represents complex data relationship more effectively than hierarchical model
to improve data performance and impose a database standard
Schema- is the conceptual organization of the entire database as viewed by the database
administrator.
Subschema – it defines the portion of the database “seen” seen by the application programs
that actually produce the desired information from the data within the
database.
Schema data definition language- enables the database administrator to define the schema
components.
A data manipulation language- defines environment in which data can be managed and it is
used to work with the data within the database.
Relational model- introduced in 1970 by E.F Codd
-it was the major breakthrough the designers
-has the ability to hide complexities
Relational diagram- is the representation of the relational database’s entities,
the attributes within those entities and the relationship within those entities.
Parts of any SQL relational database application
The end user interface
A collection of tables stored in Database
SQL engine
Entity relationship model (Components)
Entity –anything about which data will be collected and stored
Entity instance or occurrence- Each raw in relational table
Entity set- a collection of like entities
Object oriented Model (components) – also called sematic data model because it indicates
meaning
Inheritance- is the ability of the object within the class hierarchy to inherit the attributes and
methods of the classes above it.
UML class diagram- are used to represent data and its relationship within the upper UML object
oriented systems modeling language
Object/relational and XMl
Emerging data models [Big data and NoSQL]
Big Data: is the movement to find new and better ways to manage large amount of web-and
sensor generated data and drive business insight from it while simultaneously providing high
performance and scalability at a reasonable cost.
Volume, Velocity and variety (3V’’s format)
Volume- amount of data being stored
Velocity- it is the speed with which data grows and the need to process this data quickly in order
to generate information and insight.
Variety- refers to the data being collected comes in multiple different data formats.
Big data technologies –Hadoop- is java based, open source, high speed, fault tolerant
distributed storage and computational framework
-Uses low cost hardware and it originated from google
- Has two components; Hadoop distributed file system and
MapReduce
a. Hadoop distributed files system-highly distributed, fault tolerant,
high speed, uses write once read many model. Has three nodes; name
node, data node and client node
b. Map reduce – is an open source application programing interface
that provides first data analytics
- works with structure and nanostructured data
-Provides two functions- map and reduce
-NoSQL- a large scale distributed database system
- stores both structural and nonstructural data
-refers to the new generation of databases that addresses
specific challenges of the big data era
Characteristics of No SQL Database
Not based on the relational model or SQL hence name NoSQL
Support highly distributed databases
Highly available, scalable, fault tolerant
Supports very large amount of sparse data
Chapter 3
Tables and their characteristics
A table- is a two dimensional structure composed of rows and columns
- Also called relation because the relational model creator E.F Codd used the two terms as
synonyms.
Characteristics of relational tables
- perceived as s two dimensional structure composed of rows and columns
- Each row represent a single entity
- Each column represent an attribute and each column has distinct name
- Each intersection of rows and column represent a single data value
- All values in the column must conform to the same data format
- Each column has specific range of values known as attribute domain
- the order of the rows and column is immaterial to the DBMS
- Each table must have an attribute or combination of attributes that uniquely identify each
row
Primary key (PK) - is an attribute or combination of attributes that uniquely identifies any given row.
Keys: - ensure each row in a table is uniquely identifiable
- also used to establish relationship in tables
Dependencies:
Determination is the in which knowing the value of the attribute makes it possible to determine the
value of another.
Functional dependencies- relationship based on determination
Determinant- is the attribute in a functional dependency whose value determines another.
Dependent- is the attribute whose value is determined by the other attribute.
Full functional dependence –is a functional dependencies in which the entire collection of attributes in
the determinant is necessary for the relationship.
Types of Keys
Composite key – key composed of one attribute
Key attributes – an attribute that is part of the key
Super key – a key that can uniquely identify any row in a table
Candidate key- one specific type of super key
Primary key- a candidate key selected to identify all other attribute value in a a given row, cannot
contain null entry
Foreign key- an attribute or combination of attributes in one table whose value must either much the
primary key in other table
Entity integrity- is the condition in which each row in a table has its own unique identity
To ensure entity integrity, the primary key has two requirements:
- All values in the primary key must be unique
- No key attribute in the primary key can contain null
Relational algebra- defines the theoretical way of manipulating table content using relational operators.