1 To 5 Introduction To DBMS
1 To 5 Introduction To DBMS
Database is a collection of related data. Database management system(DBMS) is the software that
manages and controls access to the database.
A database application is simply a program that interacts with the database at some point in its
execution. Database system is a collection of application programs that interact with the database along
with the DBMS and database itself.
AREAS OF APPLICATION OF DATABASE
Purchases from the supermarket.
Purchases using your credit card.
Booking a holiday at the travel agents.
Using the local library.
Taking out insurance.
Renting a video.
Using the internet.
TRADITIONAL FILE- BASED SYSTEMS
File based system is a collection of application programs that perform services for the end-users such as
the production of reports. Each program defines and manages its own data. File-based systems were an
early attempt to computerize the manual filing system.
DATABASE APPROACH
All the limitations of the file based approach can be attributed to two factors.
i. The definition of the data is embedded in the application programs rather than being stored
separately and independently.
ii. There is no control over the access and manipulation of data beyond that imposed by the
application programs.
Database Is a shared collection of logically related data, and a description of this data, designed to
meet the information needs of an organization.
An entity is a distinct object (a person, place, thing, concept or event) in the organization that is
to be represented in the database.
An attribute is a property that describes some aspects of the object that we wish record.
Relationship is an association between entities.
Page 1 of 40
Prepared by J phillis
DBMS
DBMS is a software that enables users to define creates, maintain, and control access to the database.
The DBMS is the software interacts with the users application programs and the database. DBMS
provides the following facilities:
i. It allows users to define the database, usually through a data definition language. The DDL
allows users to specify the data types, and structures and the constraints on the data to be
stored in the database.
ii. It allows users to insert, update, date, delete and retrieve data from the database, usually
through a data manipulation language. DML provide a general inquiry facility using query
language like structured query language.
iii. It provides controlled access to the database.eg
- A security system which prevents unauthorized users accessing the database.
- An integrity system, which allows shared access of the database.
- A recovery control system which restores the database to a previous consistent state
following hardware of software failure.
- A user-accessible catalogue, which contains descriptions of the data in the database.
Page 2 of 40
Prepared by J phillis
There are various categories of database users
Responsible for managing the database system, authorizing access, coordinating & monitoring uses,
acquiring resources. This resources can be
Database
DBMs
2. Database designers:
Responsible for designing the database, identifying the datato be stored, choosing the structures to
represent and store this data.
3. Application Programmers:
Application programmer is the person who is responsible for implementing the required functionality of
database for the end user. Application programmer works according to the specification provided by the
system analyst.
4. End Users:
End users are those persons who interact with the application directly. They are responsible to insert,
delete and update data in the database. They get information from the system as and when required.
i) Casual end users occasionally access the database, but they may need different information
each time. They use a sophisticated database query language to specify their requests and are
typically middle- or high-level managers or other occasional browsers.
ii) Naive or Parametric end users make up a sizable portion of database end users. Their main job
function revolves around constantly querying and updating the database, using standard types
of queries and updates-called canned transactions-that have been carefully programmed and
tested. The tasks that such users perform are varied
Prepared by J phillis
5. Sophisticated end users
Sophisticated end users include engineers, scientists, business analysts, and others who thoroughly
familiarize themselves with the facilities of the DBMS so as to implement their applications to meet their
complex requirements.
6 . Stand-alone users
Stand-alone users maintain personal databases by using ready-made program packages that provide
easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax package that
stores a variety of personal financial data for tax purposes.
Design and implement tools that facilitate the use of the DBMS software. Tools include design
tools, performance tools, special interfaces, etc.
Work on running and maintaining the hardware and software environment for the database
system.
Advantages of DBMS
i. Control of data redundancy.
ii. Data consistency.
iii. Sharing of data.
iv. Improved data integrity.
v. Improved security.
vi. Increased productivity.
vii. Improve backup and recovery services.
viii. Economy of scale.
Page 4 of 40
Prepared by J phillis
Disadvantages of DBMS
i. Complexity
ii. Size complexity and breadth of functionality makes the DBMS an extremely large piece of
software.
iii. Cost of DBMS- they are costly and they require recurrent annual maintenances cost.
iv. Additional hardware costs.
v. Cost of conversion.
vi. Performance.
vii. Higher impact of a failure.
Page 5 of 40
Prepared by J phillis
COMPARISON OF TRADITIONAL FILE-BASED APPROACH AND DATABASE APPROACH
At the beginning, you should understand the rationale of replacing the traditional file-based system with
the database system.
File-based System
File-based systems were an early attempt to computerize the manual filing system. File-based system is
a collection of application programs that perform services for the end-users. Each program defines and
manages its data.
However, five types of problem are occurred in using the file-based approach:
Duplication is wasteful
Duplication can lead to loss of data integrity
Page 6 of 40
Prepared by J phillis
v. Fixed queries / proliferation of application programs
File-based systems are very dependent upon the application programmer. Any required queries
or reports have to be written by the application programmer. Normally, a fixed format query or
report can only be entertained and no facility for ad-hoc queries if offered.
File-based systems also give tremendous pressure on data processing staff, with users'
complaints on programs that are inadequate or inefficient in meeting their demands.
Documentation may be limited and maintenance of the system is difficult. Provision for
security, integrity and recovery capability is very limited.
Database Approach
In order to overcome the limitations of the file-based approach, the concept of database and the
Database Management System (DBMS) was emerged in 60s.
Advantages
A number of advantages of applying database approach in application system are obtained including:
Page 7 of 40
Prepared by J phillis
iv. Sharing of data
Database belongs to the entire organization and can be shared by all authorized users.
Page 8 of 40
Prepared by J phillis
xiii. Increased concurrency
Database can manage concurrent data access effectively. It ensures no interference between
users that would not result any loss of information nor loss of integrity.
Disadvantages
In split of a large number of advantages can be found in the database approach, it is not without any
challenge. The following disadvantages can be found including:
i. Complexity
Database management system is an extremely complex piece of software. All parties must be
familiar with its functionality and take full advantage of it. Therefore, training for the
administrators, designers and users is required.
ii. Size
The database management system consumes a substantial amount of main memory as well as a
large number amount of disk space in order to make it run efficiently.
v. Performance
As the database approach is to cater for many applications rather than exclusively for a
particular one, some applications may not run as fast as before.
DATABASE ENVIRONMENT
Page 9 of 40
Prepared by J phillis
THE THREE-LEVEL ANSI-SPARC ARCHITECTURE
ANSI-SPARC The American national standard institute (ANSI)standards planning and requirements
committee (SPARC) recognized the need for a three-level approach with a system catalogue.
Although ANSI-SPARC model did not become a standard, it still provides a basis for understanding some
of the functionality of a DBMS.
There are three level of abstraction, i.e. levels at which data items can be described are.
i. The way users perceive the data is called the external level
ii. The way the DBMS and the operating system perceive the data is the internal levelwhere the
data is actually stored using the data structures and file organizations
iii. The conceptual level provides both the mapping and the desired independence between the
external and internal levels..
Conceptual
Conceptual
schema
level
Internal Internal
level schema
Physical
level database
The objective of the three-level architecture is to separate each users view of the database from the
way the database is physically represented. Reasons for separation:
i. Users should be able to access the same data, but have a different customized view of the data.
ii. Users should not have to deal directly with physical database storage details such as indexing.
iii. DBA should be able to change the database storage structures without affecting the users
views.
iv. The internal structure of the database should be unaffected by changes to the physical aspects
of storage, such as the change over to a new storage device.
Page 10 of 40
Prepared by J phillis
EXTERNAL LEVEL
The users view of the database. The level describes that part of the database that is relevant to each
user. The external level consists of a number of different external views of the database. Different views
may have different representations of the same data. For example, one user may view dates in the
form(year, month, day) while another may view dates as (year, month, and day).
CONCEPTUAL LEVEL
The community view of the database. This level describes what data is stored in the database and the
relationship among the data. This level contains the logical structure of the entire database as seen by
the DBA. It is a complete new of the data requirements of the organization that is independent of any
storage considerations.
The conceptual level represents:
i. All entities, their attributes, and their relationship.
ii. The constraints on the data.
iii. Semantic information about the data.
iv. Security and integrity.
Any data available to a user must be contained in conceptual level or derivable from, the conceptual
level. However, this level must not contain any storage dependent detail.
INTERNAL LEVEL
The physical representation of the database on the computer. This level describes how the data is stored
in the database. The internal level covers the physical implementation of the database to achieve
optimal runtime performance and storage space utilization. It covers the data structures and file
organizations used to store data on storage devices. It interfaces with the operating system access
methods to place the data on the storage devices, build the indexes, retrieve the data etc.
i. Storage space allocation for data and indexes.
ii. Record descriptions for storage.
iii. Record placement.
iv. Data compression and encryption techniques.
SCHEMES, MAPPINGS AND INSTANCES
The overall description of the database is called the database scheme.
At the highest level, we have multiple external schemes (also called sub schemes) that correspond to
different views of the data.
At the conceptual level, we have the conceptual scheme, which describes all the entities, attributes, and
relationships together with integrity constraints. At the lowest level of abstraction we have the internal
scheme, which is a complete description of the internal model, containing the definitions of stored
records, the methods of representation, the data fields and the indexes and storage structures used.
There is only one conceptual scheme and one internal scheme per database.
Page 11 of 40
Prepared by J phillis
The DBMS is responsible for mapping between these three types of scheme. It must check the schemes
for consistency i.e. the DBMs must check that external scheme is derivable from the conceptual scheme,
and it must use the information in the conceptual scheme to map between each external scheme and
the internal scheme.
The conceptual scheme is related to the internal scheme through a conceptual /internal mapping. This
enables DBMS to find the actual record. Each external scheme is related to the conceptual scheme by
the external /conceptual mapping. This enables the DBMS to map names in the users view on to the
relevant part of the conceptual scheme.
DIFFERENCES BETWEEN THE THREE LEVELS
External view 1
External view 2
Conceptual
Two different external views of staff details exist. These external views are merged into one conceptual
view. In this merging process the major different is that the age field has been changed in to a date of
birth field, DOB. The DBMS maintains the external/conceptual mapping e.g. it maps the S/NO field of the
first external view to the staff NO field of conceptual record.
The conceptual level is then mapped to the internal level which contains a physical description of the
structure for the conceptual record.
The data in the database at any particular point in time is called database instance. The scheme is
sometime is called database intension of the database, while an instance is called an extension (or state)
of the database.
Page 12 of 40
Prepared by J phillis
DATA INDEPENDENCE
A major objective for the three level architecture is to provide data independence which means that
upper levels are unaffected by changes to lower levels. There are two kinds of data independence;
logical and physical.
i. Logical data independence
Refers to the immunity of the external schemes to changes in the conceptual scheme.E.g. change to
the conceptual scheme, such as the addition or removal of new entities, attributes, or relationships
should be possible without having to change existing external schemes or having to re write
application programs.
Conceptual schema
Page 13 of 40
Prepared by J phillis
HISTORICAL EVOLUTION OF DBMS
Databases have been in use since the earliest days of electronic computing. Unlike modern systems
which can be applied to widely different databases and needs, the vast majority of older systems were
tightly linked to the custom databases in order to gain speed at the expense of flexibility. Originally
DBMSs were found only in large organizations with the computer hardware needed to support large
data sets.
As computers grew in speed and capability, a number of general-purpose database systems emerged; by
the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to
grow, and Charles Bachman, author of one such product, Integrated Data Store (IDS), founded the
"Database Task Group" within CODASYL, the group responsible for the creation and standardization of
COBOL. In 1971 they delivered their standard, which generally became known as the "Codasyl
approach", and soon there were a number of commercial products based on it available.
The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into
a large network. When the database was first opened, the program was handed back a link to the first
record in the database, which also contained pointers to other pieces of data. To find any particular
record the programmer had to step through these pointers one at a time until the required record was
returned. Simple queries like "find all the people in India" required the program to walk the entire data
set and collect the matching results. There was, essentially, no concept of "find" or "search". This might
sound like a serious limitation today, but in an era when the data was most often stored on magnetic
tape such operations were too expensive to contemplate anyway.
IBM also had their own DBMS system in 1968, known as IMS. IMS was a development of software
written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but
used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both
concepts later became known as navigational databases due to the way data was accessed, and
Bachman's 1973 Turing Award award presentation was The Programmer as Navigator. IMS is classified
as a hierarchical database. IMS and IDMS, both CODASYL databases, as well as CINCOMs TOTAL
database are classified as network databases.
Page 14 of 40
Prepared by J phillis
1970s Relational DBMS
Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily
involved in the development of hard disk systems. He was unhappy with the navigational model of the
Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful. In
1970, he wrote a number of papers that outlined a new approach to database construction that
eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.
In this paper, he described a new system for storing and working with large databases. Instead of
records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use
a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse"
databases where some of the data for any one record could be left empty. The relational model solved
this by splitting the data into a series of normalized tables, with optional elements being moved out of
the main table to where they would take up room only if needed.
In the relational model, related records are linked together with a "key".
For instance, a common use of a database system is to track information about users, their name, login
information, various addresses and phone numbers. In the navigational approach all of these data would
be placed in a single record, and unused items would simply not be placed in the database. In the
relational approach, the data would be normalized into a user table, an address table and a phone
number table (for instance). Records would be created in these optional tables only if the address or
phone numbers were actually provided.
Linking the information back together is the key to this system. In the relational model, some bit of
information was used as a "key", uniquely defining a particular record. When information was being
collected about a user, information stored in the optional (or related) tables would be found by
searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers
for that user would be recorded with the login name as its key. This "re-linking" of related data back into
a single collection is something that traditional computer languages are not designed for.
Just as the navigational approach would require programs to loop in order to collect records, the
relational approach would require loops to collect information about any one record. Codd's solution to
the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous
SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could
support all the operations of normal databases (inserting, updating etc.) as well as providing a simple
system for finding and returning sets of data in a single operation.
Codd's paper was picked up by two people at the Berkeley, Eugene Wong and Michael Stonebraker.
They started a project known as INGRES using funding that had already been allocated for a
geographical database project, using student programmers to produce code. Beginning in 1973, INGRES
delivered its first test products which were generally ready for widespread use in 1979. During this time,
a number of people had moved "through" the group perhaps as many as 30 people worked on the
project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a
"language" for data access, known as QUEL QUEL was in fact relational, having been based on Codd's
Page 15 of 40
Prepared by J phillis
own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts
of the relational model as SQL itself.
IBM itself did one test implementation of the relational model, PRTV, and a production one, Business
System 12, both now discontinued. Honeywell did MRDS for Multics, and now there are two new
implementations: Alphora Dataphor and Rel. All other DBMS implementations usually called relational
are actually SQL DBMSs. In 1968, the University of Michigan began development of the Micro DBMS
relational database management system. It was used to manage very large data sets by the US
Department of Labor, the Environmental Protection Agency and researchers from University of Alberta,
the University of Michigan and Wayne State University. It ran on mainframe computers using Michigan
Terminal System. The system remained in production until 1996.
IBM started working on a prototype system loosely based on Codd's concepts as System R in the early
1970s. The first version was ready in 1974/5, and work then started on multi-table systems in which the
data could be split so that all of the data for a record (much of which is often optional) did not have to
be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978
and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were
establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true
production version of System R, known as SQL/DS, and, later, Database 2 (DB2).
Many of the people involved with INGRES became convinced of the future commercial success of such
systems, and formed their own companies to commercialize the work but with an SQL interface. Sybase,
Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots to the original INGRES
product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus,
INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R,
and beat IBM to market when the first version was released in 1978.
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is
now known as PostgreSQL. PostgreSQL is often used for global mission critical applications (the .org
and .info domain name registries use it as their primary data store, as do many large companies and
financial institutions).
In Sweden, Codd's paper was also read and Mimer SQL was developed from the mid-70s at Uppsala
University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s,
Mimer introduced transaction handling for high robustness in applications, an idea that was
subsequently implemented on most other DBMS.
Page 16 of 40
Prepared by J phillis
1980s Object Oriented Databases
The 1980s, along with a rise in object oriented programming, saw a growth in how data in various
databases were handled. Programmers and designers began to treat the data in their databases as
objects. That is to say that if a person's data were in a database, that person's attributes, such as their
address, phone number, and age, were now considered to belong to that person instead of being
erroneous data. This allows for relationships between data to be relation to objects and their attributes
and not to individual fields.
Another big game changer for databases in the 1980s was the focus on increasing reliability and access
speeds. In 1989, two professors from the University of Michigan at Madison, published an article at an
ACM associated conference outlining their methods on increasing databse performance. The idea was to
replicate specific important, and often queried information, and store it in a smaller temporary database
that linked these key features back to the main database. This meant that a query could search the
smaller database much quicker, rather than search the entire dataset.This eventually leads to the
practice of indexing, which is used by almost every operating system from Windows to the system that
operates Apple iPod devices.
As the speeds of consumer internet connectivity increase, and as data availability and computing
become more ubiquitous / universal, databases are seeing migration to web services. Web-based
languages such as Extensible Markup Language (XML) and Hypertext Preprocessor (PHP) are used to
Page 17 of 40
Prepared by J phillis
process databases. They allow databases to live in The cloud e.g. products like Googles, Gmail, Ms
office 2010 and carbonates online backup services.
The move is due to increasing internet reliability, data storage efficiency and lack of a need for dedicated
IT staff to manage the hardware.
BUILDING BLOCKS
COMPONENTS
i. DBMS Engine
Accepts logical requests from various other DBMS subsystems, converts them into physical equivalents,
and actually accesses the database and data dictionary as they exist on a storage device.
Page 18 of 40
Prepared by J phillis
MODELING LANGUAGE
This is a data modeling language to define the schema of a database hosted in the DBMS, according to
the DBMS database model. DBMS are designed to use one of five database structures to provide
simplistic access to information stored in database. The optimal structure depends on the natural
organization of the applications data and on the applications requirements, which include transaction
rate (speed), reliability, maintainability, scalability and costs.
Database Structures
Records relationships form a treelike model. This structure is simple but non-flexible because the
relationships are confined to a one-to-many relationship. There are multiple hierarchies over the same
data e.g. RDM mobile is a newly designed embedded databases for a mobile computer system. Today
this structure is used mainly for storing geographical information and file systems.
Page 19 of 40
Prepared by J phillis
ENTITY INSTANCE
Is a single occurrence of an entity time. In one entity time, we may have many entity instances
ENTITIES TYPES
i. Strong entity:
Exists independently .e.g. when analyzing students marks, students forms strong entity
ii. Weak entity
Its existence depends on other entity (dependent). It has no business meaning in the system
when removed system continues as normal. It depends on another entity referred to as an
identifying entity.
iii. Associative entity
An entity resulting from a relationship among or between other entities also known as gerund
TYPES OF DATABASES
(a) Operational database
This database stores detailed data needed to support the operations of an entire organization
(banking ATM). Also subject area transaction and production database e.g. consumer database,
personal database, inventory database etc.
(b) Analytical database
Are primary static read only databases which store archived historical data used for analysis
Page 20 of 40
Prepared by J phillis
(c) Data warehouse (store) database
Store data from current and previous years which is extracted from various operational databases of
an organization
Its central source of data that has been screened, edited, standardized and integrated so that it can
be used by managers and other end users and professionals throughout an organization. They are
characterized by being slow to insert into but fast to retrieve from.
(d) Distributed database
They are database local groups and department of regional offices, branch offices, manufacturing
plants and other worksite. They include segment of both common operational and common user
databases as well as generated used only at the users own site.
(e) End-user database
Consists of a variety data files developed by end users at their work stations e.g. collection of
document in spreadsheets, word processing, and downloaded files.
(f) External database
They provide access to external privately owned data online. Available for free to the end-user and
organization from commercial sources
(g) Hypermedia-database on the web
This is a set of inter-connected multimedia pages at a website. They consists of homepages and
other hyperlinked pages of multimedia or mixed media e.g. texts, graphic, photographic images,
video clips, audio etc.
(h) Navigational database
Queries/find objects primarily by following reference from other objects (use of queries).
Traditionally, navigational interfaces are procedural through on world characterized some modern
system like partly being navigational.
(i) In memory database
They rely on main memory for computer data storage. They contrast with DBMS main memory.
These databases are faster than disk optimized databases since the internal optimization algorithms
are simpler and execute fewer CPU instructions. Accessing of data in memory provide faster and
more performance than disk.
(j) Document-oriented database
Are computer programs designed for document oriented applications. This system may be
implemented as a large above a relational or object database. They do not store data in table with
uniform sized fields for (a) Record but store a record as a document that has certain characteristics.
(k) Relational database
They use tables to structure information so that it can be readily and easily reached.
(l) Real time database
Page 21 of 40
Prepared by J phillis
Is a processing system designed to handle workload whose state may change constantly. They are
useful for accounting, banking, law, medical record and scientific analysis.
SYSTEMS
Is a coherent set of interdependent component which exists for some purpose, have some stability and
can be viewed as a whole.
Types of systems
(a) Information system:
Is a system which provides information for some organizations or part of an organization. Systems
are generally portrayed in terms of an input, output, and process. Model existing within a given
environment. The environment of a system is anything outside a system which has an effect on the
way the system operates. The outputs from the system are those things which it supplies back to its
environment or users system while inputs are the resources it gains from environment or other
systems.
(b) Human Activity Systems:-
Consists of people convention and artifacts designed to serve human needs. The purpose of such is
to support and enable the effective management of human activity system. Organizations normally
need a number of information systems to work effectively.
(c) Information technology (IT) Systems
Provide means of constructing aspects of modern day information systems. Computers and
communication networks are primarily used to support aspects of an organization information
system.
LAYERS OF AN IT SYSTEM
i. Interface subsystem
Its responsible for managing all interruptions with the end users, hence its frequently referred
as user interface.
ii. Rules subsystem
It manages the application logic in terms of a defined model of business rules.
iii. Transaction subsystem
Acts as the link between the data subsystem and the rules and interface subsystem. Querying
insertion and update actively is triggered at the interface, validated by the rules system and
packaged as unit that will initiate action (response or changes) in the data subsystem.
iv. Data subsystem
Its responsible for managing the underlying data needed by the application.
Page 22 of 40
Prepared by J phillis
Levels of data models
1. Corporate Data Models:- Specifies the data requirement for the whole model.
2. Business Area Data Models:- specifies data requirement for the business area
3. Application Data Models: Specifies data requirements for a particular information systems
application.
Layers of an IS
i. Information Architecture
This consists of activities involved in collection, storage, dissemination and use of information in the
organization.
ii. Information Systems Architecture:-
Consist of information system needed to support organization activity in areas of collection, storage,
dissemination and use.
iii. Information Technology Architecture:-
Consists of hardware, software communication facilities and IT knowledge and skills available to the
organization
Page 23 of 40
Prepared by J phillis
CONTEMPORARY/MODERN DATABASE TYPES
i. Production Database:- Used to collect operational or production data used to support
organization function by providing reliable, timely and valid data.
ii. Decision support database:- used as repositories/warehouse from which to retrieve information
for the support of organization decision making.
iii. Mass-deployment database:- single user tools running under some part-based DBMS i.e. Ms-
access.
Data warehouse
Is a type of contemporary databases system designed to fulfill decision support needs.
Data marts:- Is the access layer of the data warehouse environment that is used to get data out to the
users. Its a subject of the data warehouse which is usually oriented to a specific business line or team.
Application areas discussed in association with data warehouse and data marts are:-
(i) Online analytical processing (OLAP)
Comprises the dynamic synthesis, analysis and consolidation of large volumes of multi-
dimensional. Data mining: Comprises the process of extracting hidden pattern from large
databases and using it to make decision critical to some organization.
Page 24 of 40
Prepared by J phillis
DATA BASE ORGANIZATION APPROACHES
1. DISTRIBUTED APPROACH
This consists of single logical database that is spilt into a number of fragments. Each fragment is stored
on one or more computer under the control of a separate DBMS with computers connected by a
communication networks.
Users access the distributed database via applications which are classified as those that do not require
data from other sites (local application) and those that do require data from other sites (global
application).
Distributed databases:- A logically interrelated collections of shared data physically distributed over a
computer warehouse.
Distributed DBMS:- Its the software system that permit the management of the distributed database
and make the distributed databases transparent to user.
Distributed processing:- Its a centralized databases that can be accessed over a computer network.
Page 25 of 40
Prepared by J phillis
DBMS are designed to continue to function despite such failure.
v. Improved reliability
The failure of a node or a communication link doesnt necessarily make the data inaccessible.
vi. Improved performance
Speed of a databases access may be better than that achievable from a remote centralized
database. Since a site handles only a part of the entire database, there may not be the same
content for CPU and 1/0 services.
vii. Modular growth
In a distributed environment, its much easier to handle expansion. New site can be added to
the network without affecting the operation of the other sites.
viii. Economic cost effective
ix. Integration
Its important for an organization to be able to integrate software component from different
vendors to meet the specific requirement to allow their legally systems to co-exist with their
more modern systems.
Page 26 of 40
Prepared by J phillis
Additional Notes
User view:- Defines what is required of a databases system from the perspective of a particular job role
(e.g manager) or enterprise application area (e.g. marketing).
Database holds not only the organizations operational data, but also a description of this data. For this
reason, databases are also defined as a self-describing collection of integrated records.
The description of data is known as the system catalog (or data dictionary or metadata the data about
data).
Data abstraction:- This is providing of internal definition of an object and a separate external definition
and users of an object see only the external definition and are unaware of how the object is defined how
it functions.
Advantage of abstraction
i. We can change the internal definition of an object without affecting the users of an object,
provided the external definition remains the same.
ii. Data structure are separated from application programs and stored in databases and if new
structures are added or existing ones modified, the application programs are unaffected.
An application program is a computer program that interacts with the databases by issuing an
appropriate request (an SQL statement) to the DBMS.
View mechanism: is a facility that allows each user to have his or her own view of the databases.
View: Some subset of the databases
Benefits of View
i. Provide a security level:- can be set up to exclude data from some users.
ii. Provide a mechanism to customize the appearance of databases e.g. rent instead of monthly
rent.
iii. Can present a consistent, unchanging picture of the structure of the databases.
Page 27 of 40
Prepared by J phillis
2. CLIENT-SERVER ARCHITECTURE
Definition of Terms
Constraints: are consistency rules that the databases is not permitted to violate
Integrity: Validity and consistency of stored data.
Client-Server Architecture
It consists of a network of minicomputers, with a central computer located elsewhere running the
backend part of DBMS that manages and controls accesses to the databases.
It also shows several computers at various locations running the frontend part of the DBMS that
interfaces with the user.
NB: The backend is the server and the frontends are the client (client-server).
Paradigm shift: is a change in databases approach from file based system, where the work was driven by
the application needs of individual departments to where we think of data first and the application 2 nd.
Procedural Data Manipulation Languages (DML): A language that allows the user to tell the system
what data is needed and exactly how to retrieve the data.
Non-procedural DML: A language that allows the user to state what data is needed rather than how its
to be retrieved.
Page 28 of 40
Prepared by J phillis
TYPES OF FOURTH GENERATION LANGUAGES
i. Form generators: An interactive facility for rapidly creating data input and display layouts for
screen forms.
ii. Report Generators: a Facility for creating reports from data stored in databases
iii. Graphics generators:- A facility to retrieve data from databases and display it as a graph showing
trends and relationships in the data.
iv. Application Generators:- A facility for producing a program that interfaces with the databases
Model:- is a representation of real world objects and events, and their associations.
Data model:- An integrated collection of concepts for describing and manipulating data, relationships
between data, and constraints on the data in an organization.
NB: Purpose of data model is to represent data and to make the data understandable.
Page 29 of 40
Prepared by J phillis
Three Related Data Models
(i) External data model/universe of Discourse (UOD):- To represent each users view of the
organization
(ii) Conceptual data model:- to represent logical (or community) view that is DBMS
independent.
(iii) Internal data model:- to represent the conceptual schema in such a way that it can be
understood by the DBMS.
Fact finding: Formal process of using technique (interviews, questionnaires) to collect facts about
systems, requirements and preferences.
Types of questions
i. Open-ended questions: allow interviewee to respond in any way that seems appropriate.
ii. Closed-ended questions Restrict answers either to specific choices, or short, direct responses.
Page 30 of 40
Prepared by J phillis
DATABASE DEVELOPMENT LIFE CYCLE
Page 31 of 40
Prepared by J phillis
1. DATABASE PLANNING
Are the management activities that allow the stages of the database system development lifecycle to be
realized as efficiently and effectively as possible? Focus here is on management activities which may
include: work to be done and resources available
2. SYSTEM DEFINITION
Describes the scope and boundaries of the database application and the major user views
Users view defines what is required of a database system from the perspective of a particular job role
(such as manager or supervisor) or enterprise application area (such as marketing, personnel, or stock
control). A user view defines what is required of a database system in terms of the data to be held and
the transactions to be performed on the data. Thus it targets at: Scope, Parameters, Application areas,
User groups
Page 32 of 40
Prepared by J phillis
4. DATABASE DESIGN
The process of creating a design that will support the enterprises mission statement and mission
objectives for the required database system.
During database design
Determine the information requirements
Analyze the real-world objects that you want to model in the database
Determine primary key attributes
Develop a set of rules that govern how each table is accessed, populated and updated
Identifying entities e.g. Students, Courses, Instructors, , Student Courses, Advisors etc
Identifying attributes
o What information on each entity should we know?
Students (student_id, Fname, lname, phone, advisor_id)
Advisors (Advisor_id, Advisorname, Advisorphone)
Instructors (instructor_id, Instructorname, Instructorphone)
Student Courses(Student_id, Course_id)
Courses (Course_id, Coursedescription instructor_id)
Identify relationship between the entities
Plan database security
Page 33 of 40
Prepared by J phillis
Inside-out- approach is related to the bottom up approach but differs by first identifying a
set of major entities and then spreading out to consider other entities, relationships and
attributes associated with those first identified.
Mixed strategy approach uses both the bottom-upand top-down approachfor various
parts of the model before finally combining all parts together.
It involves the creation of a conceptual data model of the part of the enterprise that we are
interested in modeling. The data model is built using the information document in the users
requirement specification.
Conceptual database design is entirely independent of implementation details such as the target
database, application programs, programming languages, Hardware platform or any other physical
consideration.
The logical data model is based on the target data for the database e.g. relational data model,
whereas conceptual data model is independent of all physical models.
A logical model is derived knowing the inlying data model of the target DBMS. In other words this
DBMS is e.g. relational, network, hierarchical and object oriented.
It describes the base relations, file organization and indexes used to achieve efficient access to the
data and any associated integrity, constraints and security measures.
Although this structure is DBMS independent, its developed in accordance with a particular data
model such as relation network or hierarchical. This design is tailored to a specific DBMS system.
Page 34 of 40
Prepared by J phillis
There is feedback between physical and logical design because decisions are taken during physical
design for improving performance that may affect the data of logical data model.
In general, the aim of physical database design is to describe how we intend to physically implement
the logical database design. E.g. in relations model, this involves
i. Creating a set of relational table and the constraints on these tables from information
presented in the logical data model,
ii. Identifying the specific storage structure and access method for data to achieve an optimum
performance for the ds system
iii. Designing security protection for the system.
Physical design involves indexes selection (access method) clustering and partitioning of data e.g.
after normalization you decide which will be a primary key.
Ideally conceptual and logical database design for larger system should be separated from physical
design because:-
(a) It deals with a different subject matter the what? Not how?
(b) Its performed at a different time the what? must be understood before the how? is
determined.
(c) It requires different skills which are often found in different people.
6. APPLICATION DESIGN
Is the design of the user interface and the application programs that use and process the database.
Aspects of application design include transaction design and user interface design.
Page 35 of 40
Prepared by J phillis
User Interface Design Guidelines
i. Meaning title
ii. Comprehensible transactions
iii. Logical grouping and sequencing of fields
iv. Visually appending layout of the form/report
v. Familiar field tables
vi. Consistent terminology and abbreviations
vii. Consistent use of color
viii. Visible space and boundaries for data entry fields
ix. Error message for unacceptable values
x. Optional fields marked clearly
xi. Explanatory messages for fields
xii. Completion signals
Transaction Design
Transaction is an action, or series of action carried out by a single user or application program, which
accesses or changes the content of the database.
A transaction may be composed of several operations, such as the transfer of money from the A/C
to another. The main types of operations are:-
i. Retrieval transactions:- are used to retrieve data for display on the screen or in the
production of a report.
ii. Update transactions:- are used to insert new records, delete old records or modify existing
records in the dbs.
iii. Mixed transaction:- involve both the retrieval and updating of data. E.g. the operation to
search for and display details of a property and then update the value of the monthly rent is
an example of a mixed transaction.
iv.
Page 36 of 40
Prepared by J phillis
7. PROTOTYPING
Building an early working model of a database system. A prototype is a working model that doesnt
normally have all the required features or provide all the functionality of the final system. They
have a major advantage of being relatively inexpensive and quick to build.
8. IMPLEMENTATION
The physical realization of the database and application designs
The database implementation is achieved using the data definition language (DDL) of the selected
DBMS or a graphical user interface (GUI) which provides the same functionality while hiding the low-
level DDL statements.
The DDL statements are used to create the dbs structures and empty dbs fields. Any specified user
views are also implemented at this stage. Security and integrity controls for the system are also
implemented.
STAGES OF IMPLEMENTATION
Page 37 of 40
Prepared by J phillis
9. DATA CONVERSION AND LOADING
Transferring any existing data into the new database and converting any existing application to run
on the new database.
This is required only when a new database system is replacing an old system.
A DBMS should take a utility that loads existing files into the new dbs. The utility usually requires
the specification of the source file and the target databases and then automatically converts the
data to the required format of the new database files.
10. TESTING
The process of running the database system with the intent of finding errors
Example of criteria that can be used to conduct the evolution include:-
i. Learnability: - How does it take a new user to become productive with the system?
ii. Performance:- How well does the system response match the users work practice?
iii. Robustness:- How tolerant is the system error
iv. Recoverability:- How good is the system at recovering from user errors
v. Adaptability:- How closely is the system tied to a single model of work
Activities Involved
i. Monitoring the performance of the system:- If it falls below an acceptable level tuning or
reorganization of the database may be required.
ii. Maintaining and upgrading the database system.
DATABASE MAINTENANCE
Objectives: Fix bugs (incorrect program specs or code) in software, add enhanced functions,
cycle back through SDLC phases as needed for small-scale projects
End Result: Fully Functional Robust System
Methods: As needed for phases above; audit the system
How to Avoid Risk: Watch changing business requirements, set priorities.
Page 38 of 40
Prepared by J phillis
TERMINOLOGIES
Data administration
b. Management of the data resources which include DB plan, development and maintenance
of standards, policies and procedures, conceptual and logical DBs design.
c. Database administration
d. Management of physical realization of a database system which include:- physical database
design and implementation, setting security an integrity controls, monitoring system
performance and reorganizing the database as necessary.
Prepared by J phillis
Differentiate Between:-
5. Developers conceptual and logical db design 5. Develops physical and logical database design
Page 40 of 40
Prepared by J phillis