Chapter 9 Database Management Systems

Chapter 9 discusses the database approach to data management, highlighting its advantages over the flat-file model, such as reduced data redundancy, single updates, and improved data currency. It outlines the role of the Database Management System (DBMS) in controlling access and managing data efficiently, as well as the different conceptual models and views (internal, conceptual, and user views) that facilitate user interaction with the database. Additionally, it emphasizes the importance of the Database Administrator (DBA) in overseeing database planning, design, and maintenance.

CHAPTER 9
DATABASE MANAGEMENT SYSTEMS

Overview of the Flat-File versus Database Approach
Many so-called legacy systems are characterized by the flat-file approach to data management. In this environment, users own their data files. Exclusive ownership of data is a natural consequence of two problems associated with the legacy-system era. The first problem is a business culture that erects barriers between organizational units, inhibiting entity-wide integration of data. The second problem stems from limitations in flat-file management technology, which require data files to be structured to the unique needs of the primary user.

In the figure, the file contents are represented conceptually with letters. Each letter could signify a single data attribute (field), a record, or an entire file. Note also that data element B is present in all user files. This is called data redundancy and is the cause of three types of data management problems: data storage, data updating, and currency of information. Each of these, as well as a fourth problem (task-data dependency, which is not directly related to data redundancy), is examined next.

 DATA STORAGE. An efficient information system captures and stores data only once and makes this single source available to all users who need it. This is not possible in the flat-file environment. To meet the private data needs of users, organizations must incur the costs of both multiple collection and multiple storage procedures. Indeed, some commonly used data may be duplicated dozens, hundreds, or even thousands of times, creating excessive storage costs.

 DATA UPDATING. Organizations have a great deal of data stored on master files and reference files that require periodic updating to reflect operational and economic changes. For example, a change in a customer's name or address must be reflected in the appropriate master files. This piece of information may be important to several user departments in the organization, such as sales, billing, credit, customer services, sales promotion, and catalog sales. When users maintain separate files, any such change must be made separately for each user. This adds significantly to the cost of data management.

 CURRENCY OF INFORMATION. In contrast to the problem of performing multiple updates is the problem of failing to update the files of all users affected by a change. If update messages are not properly disseminated, then some users may not record the change and will perform their duties and make decisions based on outdated data.

 TASK-DATA DEPENDENCY. Another problem with the flat-file approach is the user's inability to obtain additional information as his or her needs change. This problem is called task-data dependency. The user's information set is constrained by the data that he or she possesses and controls.

THE DATABASE APPROACH
Figure 9-2(a) presents a simple overview of the database approach with the same users and data requirements. The most obvious change from the flat-file model is the pooling of data into a common database that is shared by all the users.

FLAT-FILE PROBLEMS SOLVED
Data sharing (the absence of ownership) is the central concept of the database approach. Let's see how this resolves the problems identified.

 No data redundancy. Each data element is stored only once, thereby eliminating data redundancy and reducing storage costs.

 Single update. Because each data element exists in only one place, it requires only a single update procedure. This reduces the time and cost of keeping the database current.

 Current values. A change any user makes to the database yields current data values for all other users.

 Task-data independence. Users have access to the full domain of data available to the firm. As users' information needs expand beyond their immediate domain, the new needs can be more easily satisfied than under the flat-file approach. Only the limitations of the data available to the firm (the entire database), and the legitimacy of their need to access it, constrain users.

CONTROLLING ACCESS TO THE DATABASE
The database approach places all the firm's information eggs in one basket. It is essential, therefore, to take very good care of the basket. The example in Figure 9-2(a) has no provision for controlling access to the database. Assume Data X contain sensitive, confidential, or secret information that only User 3 is authorized to access. How can the organization prevent others from gaining unauthorized access to it?

THE DATABASE MANAGEMENT SYSTEM
Figure 9-2(b) adds a new element to Figure 9-2(a). Standing between the users' programs and the physical database is the database management system (DBMS). The purpose of the DBMS is to provide controlled access to the database. The DBMS is a special software system that is programmed to know which data elements each user is authorized to access. The user's program sends requests for data to the DBMS, which validates and authorizes access to the database in accordance with the user's level of authority. The DBMS will deny requests for data that the user is unauthorized to access. As one might imagine, the organization's criteria, rules, and procedures for assigning user authority are important control issues for accountants to consider.
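The gatekeeping role described above can be sketched in a few lines. This is an illustrative toy, not how any real DBMS is implemented; the user names and data-element labels are assumptions invented for the example.

```python
# Toy sketch of DBMS-style access control (hypothetical names).
# Each user is granted a set of data elements; the "DBMS" denies any
# request that falls outside that set.

AUTHORIZED_ELEMENTS = {
    "user1": {"A", "B", "C"},
    "user2": {"B", "D"},
    "user3": {"B", "X"},   # only User 3 may access the sensitive Data X
}

def request_data(user, elements):
    """Return the granted elements, or raise if any element is unauthorized."""
    granted = AUTHORIZED_ELEMENTS.get(user, set())
    denied = set(elements) - granted
    if denied:
        raise PermissionError(f"{user} is not authorized for: {sorted(denied)}")
    return sorted(elements)

print(request_data("user3", {"X"}))   # User 3 is authorized for Data X
try:
    request_data("user1", {"X"})      # any other user is denied
except PermissionError as err:
    print("denied:", err)
```

The design point is simply that authorization is checked centrally, in one software layer, rather than relying on each user program to police itself.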
THREE CONCEPTUAL MODELS
Early database models are as different from modern database models as they were from traditional flat files. The most common database approaches used for business information systems are the hierarchical model, the network model, and the relational model. Because of certain conceptual similarities, the hierarchical and network databases are termed navigational models. The way that data are organized in these early database systems forces users to navigate between data elements using predefined structured paths. The relational model is far more flexible, allowing users to create new and unique paths through the database to solve a wider range of business problems. Although their limitations are severe and their ultimate demise is inevitable, hierarchical and network models still exist as legacy systems that support mission-critical functions in some companies. Most modern systems, however, employ relational databases.

Elements of the Database Environment
1. USERS. Figure 9-3 shows how users access the database in two ways. The first is via user application programs that systems professionals prepare. These programs send data access requests (calls) to the DBMS, which validates the requests and retrieves the data for processing. Under this mode of access, the presence of the DBMS is transparent to the users. Data-processing procedures (both batch and real time) for transactions such as sales, cash receipts, and purchases are essentially the same as they would be in the flat-file environment. The second method of database access is via direct query, which requires no formal user programs. The DBMS has a built-in query facility that allows authorized users to process data independent of professional programmers. The query facility provides a friendly environment for integrating and retrieving data to produce ad hoc management reports. This feature has been an attractive incentive for users to adopt the database approach.

2. THE DATABASE MANAGEMENT SYSTEM. The second element of the database approach depicted in Figure 9-3 is the DBMS. The DBMS provides a controlled environment to assist (or prevent) user access to the database and to efficiently manage the data resource. Each DBMS model accomplishes these objectives differently, but some typical features include:
 Program development. The DBMS contains application development software. Both programmers and end users may employ this feature to create applications to access the database.
 Backup and recovery. During processing, the DBMS periodically makes backup copies of the physical database. In the event of a disaster (e.g., disk failure, program error, or malicious act) that renders the database unusable, the DBMS can recover an earlier version that is known to be correct. Although some data loss may occur, without the backup and recovery feature the database would be vulnerable to total destruction.
 Database usage reporting. This feature captures statistics on what data are being used, when they are used, and who uses them. The database administrator uses this information to assign user authorization and to maintain the database.
 Database access. The most important feature of a DBMS is to permit authorized user access to the database. Figure 9-3 shows the three software modules that facilitate this task. These are the data definition language, the data manipulation language, and the query language.

THREE SOFTWARE MODULES
 Data Definition Language. Data definition language (DDL) is a programming language used to define the physical database to the DBMS. The definition includes the names and the relationship of all data elements, records, and files that constitute the database. The DDL defines the database on three levels called views: the internal view, the conceptual view (schema), and the user view (subschema).

 INTERNAL VIEW. The internal view presents the physical arrangement of records in the database. This is the lowest level of representation, which is one step removed from the physical database. The internal view describes the structure of records, the linkages between them, and the physical arrangement and sequence of records in a file. There is only one internal view of the database.

 CONCEPTUAL VIEW (SCHEMA). The conceptual view or schema represents the database logically and abstractly, rather than the way it is physically stored. This view allows users' programs to call for data without knowing or needing to specify how the data are arranged or where the data reside in the physical database. There is only one conceptual view for a database.

 USER VIEW (SUBSCHEMA). The user view defines how a particular user sees the portion of the database that he or she is authorized to access. To the user, the user view is the database. Unlike the internal and conceptual views, many distinct user views exist. For example, a user in the personnel department may view the database as a collection of employee records and is unaware of the supplier and inventory records seen by the users in the inventory control department.

DBMS OPERATION. To illustrate the roles of these views, let's look at the typical sequence of events that occurs in accessing data through a DBMS. The following description is hypothetical, and certain technical details are omitted.
1. A user program sends a request (call) for data to the DBMS. The call is written in a special data manipulation language that is embedded in the user program.
2. The DBMS analyzes the request by matching the called data elements against the user view and the conceptual view. If the data request matches, it is authorized, and processing proceeds to Step 3. If it does not match the views, access is denied.
3. The DBMS determines the data structure parameters from the internal view and passes them to the operating system, which performs the actual data retrieval. Data structure parameters describe the organization and access method (an operating system utility program) for retrieving the requested data.
4. Using the appropriate access method, the operating system interacts with the disk storage device to retrieve the data from the physical database.
5. The operating system then stores the data in a main memory buffer area managed by the DBMS.
6. The DBMS transfers the data to the user's work location in main memory. At this point, the user's program is free to access and manipulate the data.
7. When processing is complete, Steps 4, 5, and 6 are reversed to restore the processed data to the database.

 Data Manipulation Language. Data manipulation language (DML) is the proprietary programming language that a particular DBMS uses to retrieve, process, and store data. Entire user programs may be written in the DML, or, alternatively, selected DML commands can be inserted into programs that are written in universal languages, such as PL/1, COBOL, and FORTRAN. Inserting DML commands enables legacy application programs, which were originally written for the flat-file environment or earlier types of DBMSs, to be easily converted to work in the current database environment. The use of standard language programs also provides the organization with a degree of independence from the DBMS
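The validation portion of the access sequence (Steps 1 through 3) can be sketched as follows. This is a hypothetical illustration only: the attribute names, the single user view, and the dictionary lookups standing in for the internal view and operating system are all invented for the example.

```python
# Hypothetical sketch of the DBMS access sequence (names assumed).
# Step 2: the request is matched against the user view (subschema) and
# the conceptual view (schema). Steps 3-6 are collapsed into a lookup.

CONCEPTUAL_VIEW = {"EmpNum", "Name", "PayRate", "Supplier"}    # one schema
USER_VIEWS = {"personnel": {"EmpNum", "Name", "PayRate"}}      # many subschemas
PHYSICAL_DB = {"EmpNum": 100, "Name": "Jones", "PayRate": 25.0}

def dbms_call(user, requested):
    subschema = USER_VIEWS.get(user, set())
    # Deny the call unless every element is in both views (Step 2).
    if not set(requested) <= (subschema & CONCEPTUAL_VIEW):
        raise PermissionError("access denied")
    # Retrieval via the internal view and operating system is simplified
    # here to a dictionary lookup (Steps 3-6); the data are returned to
    # the user's work area.
    return {attr: PHYSICAL_DB[attr] for attr in requested}

print(dbms_call("personnel", ["Name", "PayRate"]))
```

Note how the personnel user can see employee attributes but would be denied a request for Supplier data, even though Supplier is part of the conceptual view.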
vendor. If the organization decides to switch its vendors to one that uses a different DML, it will not be necessary to rewrite all the user programs. By replacing the old DML commands with the new commands, user programs can be modified to function in the new environment.

 Query Language. The query capability of the DBMS permits end users and professional programmers to access data in the database directly without the need for conventional programs. IBM's structured query language (SQL, pronounced sequel) has emerged as the standard query language for both mainframe and microcomputer DBMSs. SQL is a fourth-generation, nonprocedural language with many commands that allow users to input, retrieve, and modify data easily. The SELECT command is a powerful tool for retrieving data. The example in Figure 9-5 illustrates the use of the SELECT command to produce a user report from a database called Inventory. SQL is an efficient data processing tool. Although not a natural English language, SQL requires far less training in computer concepts and fewer programming skills than many other programming languages. In fact, many database query systems require no SQL knowledge at all. Users select data visually by pointing and clicking at the desired attributes. The visual user interface then generates the necessary SQL commands automatically. This feature places ad hoc reporting and data processing capability in the hands of the user/manager. By reducing reliance on professional programmers, managers are better able to deal with problems that arise.

3. THE DATABASE ADMINISTRATOR
The administrative position of database administrator (DBA) does not exist in the flat-file environment. The DBA is responsible for managing the database resource. Having multiple users share a common database requires organization, coordination, rules, and guidelines to protect the integrity of the database. In large organizations, the DBA function may consist of an entire department of technical personnel under the DBA. In smaller organizations, someone within the computer services group may assume DBA responsibility. The duties of the DBA fall into the following areas: database planning, database design, database implementation, database operation and maintenance, and database change and growth. Table 9-1 presents a breakdown of specific tasks within these broad areas.

The Data Dictionary
Another important function of the DBA is the creation and maintenance of the data dictionary. The data dictionary describes every data element in the database. This enables all users (and programmers) to share a common view of the data resource and greatly facilitates the analysis of user needs.

4. THE PHYSICAL DATABASE
The fourth major element of the database approach is the physical database. This is the lowest level of the database. The physical database consists of magnetic spots on magnetic disks. The other levels of the database (e.g., the user view, conceptual view, and internal view) are abstract representations of the physical level.

At the physical level, the database is a collection of records and files. Relational databases are based on the indexed sequential file structure, which uses an index in conjunction with a sequential file organization. It facilitates both direct access to individual records and batch processing of the entire file. Multiple indexes can be used to create a cross-reference, called an inverted list, which allows even more flexible access to data.

The Relational Database Model
E. F. Codd originally proposed the principles of the relational model in the late 1960s. The formal model has its foundations in relational algebra and set theory, which provide the theoretical basis for most of the data manipulation operations used. Other theorists have proposed less rigid requirements for assessing the relational standing of a system. Accordingly, a system is relational if it:
1. Represents data in the form of two-dimensional tables, such as the database table called Customer.
2. Supports the relational algebra functions of restrict, project, and join:
 Restrict: Extracts specified rows from a specified table. This operation creates a virtual table (one that does not physically exist) that is a subset of the original table.
 Project: Extracts specified attributes (columns) from a table to create a virtual table.
 Join: Builds a new physical table from two tables, consisting of all concatenated pairs of rows from each table.

Although not the complete set of relational functions, the subset of restrict, project, and join satisfies most business information needs.

DATA MODELING CONCEPTS
In this section, we review data modeling concepts. A data model is a visual representation of an organization's data. The model represents the nature of the data and the business rules that dictate how they are used. A data model is similar to a blueprint (plan) for a house. Database developers use the completed data model, often in the form of an entity relationship diagram, to construct the physical database tables and the coding needed to make it function as prescribed. In theory, data models are designed in a top-down approach, which requires a detailed analysis of the organization's information needs. In practice, commercial DBMSs provide a predefined data model and tables that are based on business best practices. The client organization's database developers may then modify the model to suit the company's specific needs. This is often called the bottom-up approach.
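The restrict, project, and join functions described under The Relational Database Model above can be sketched over plain Python rows. This is a toy illustration of the algebra, not a DBMS implementation; the Customer and Invoice sample data are invented.

```python
# Toy relational-algebra sketch over lists of dicts (illustrative only).

def restrict(table, predicate):
    """Extract the rows satisfying a condition (a virtual subset table)."""
    return [row for row in table if predicate(row)]

def project(table, attrs):
    """Extract the named columns into a virtual table."""
    return [{a: row[a] for a in attrs} for row in table]

def join(t1, t2, key):
    """Concatenate pairs of rows from the two tables that share a key value."""
    return [{**r1, **r2} for r1 in t1 for r2 in t2 if r1[key] == r2[key]]

customer = [{"CustNum": 1, "Name": "Smith"}, {"CustNum": 2, "Name": "Jones"}]
invoice = [{"InvoiceNum": 90, "CustNum": 2, "Amount": 500.0}]

# Join invoices to customers, restrict to large amounts, project two columns.
result = project(restrict(join(invoice, customer, "CustNum"),
                          lambda r: r["Amount"] > 100),
                 ["Name", "Amount"])
print(result)   # [{'Name': 'Jones', 'Amount': 500.0}]
```

Composing the three operators this way mirrors how most business queries are answered, which is the point the section makes about this small subset of the algebra.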

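The SELECT command discussed under the Query Language module earlier can be tried against a small stand-in for the Inventory database using Python's built-in sqlite3. Because Figure 9-5 is not reproduced here, the column names and sample rows below are assumptions invented for the illustration.

```python
# Hedged SQL SELECT example against a stand-in "Inventory" table.
# Column names (ItemNum, Description, QtyOnHand) are assumed for
# illustration; Figure 9-5's actual layout is not reproduced here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory (ItemNum INTEGER, Description TEXT, QtyOnHand INTEGER)"
)
conn.executemany(
    "INSERT INTO Inventory VALUES (?, ?, ?)",
    [(1234, "Bolt cutter", 20), (1567, "Hand drill", 5)],
)

# A simple user report: items with a low quantity on hand.
rows = conn.execute(
    "SELECT ItemNum, Description FROM Inventory WHERE QtyOnHand < 10"
).fetchall()
print(rows)   # [(1567, 'Hand drill')]
```

The nonprocedural character of SQL is visible here: the query states which rows and columns are wanted, and the DBMS decides how to retrieve them.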
As information needs arise, users send formal requests for computer applications to the systems professionals (programmers) of the organization. The requests are handled through formal systems development procedures, which produce the programmed applications. By keeping access authority separate from systems development (application programming), the organization is better able to control and protect the database. Intentional and unintentional attempts at unauthorized access are more likely to be discovered when these two groups work independently.

Accountants and auditors work extensively with databases as part of their day-to-day activities. Whether preparing financial statements, tagging data attributes for XBRL reporting, or extracting financial data from tables during the conduct of audit tests, accountants and auditors need to understand how the data are structured. The purpose of this section, therefore, is to explain the basics of data modeling, which will serve to guide the accountant when performing database-related tasks. Toward this end, the subsections that follow introduce relational concepts, database terminology, and database design techniques. The final subsection employs the top-down approach to develop a data model from scratch.
Entity, Relation, Occurrence, and Attributes
An entity is anything about which the organization wishes to capture data. Entities may be physical, such as inventories, customers, and employees. They may also be conceptual, such as accounts receivable and accounts payable. Database developers identify entities and prepare a model of them. The graphical technique used to depict the model is called an entity relationship (ER) diagram. Each entity that is represented in an ER diagram corresponds to a table in the physical database. This is a high-level perspective of the overall schema, which presents only the key entities and their relationships; it does not show the data types and keys contained within the entities. The term relation describes how data in one entity are related to data in another entity.

As a matter of convention, entities are named in the singular noun form, such as Customer rather than Customers. The term occurrence (or instance) is used to describe the number of items associated with an entity. Attributes are the data elements that define an entity.

Associations and Cardinality
The term association describes the nature of the functional connection between two entities in a relation. This association is represented in an ER diagram by a verb label on the connecting line, such as ships, requests, and receives. Cardinality is the degree of association between two entities. Simply stated, cardinality describes the number of possible occurrences in one table that are associated with a single occurrence in a related table. Four basic forms of cardinality are possible: zero or one (0,1), one and only one (1,1), zero or many (0,M), and one or many (1,M). These are combined to describe the cardinality between entities in a relation.

The Physical Database Tables
The data model is the blueprint for the physical database tables that constitute the database. The tables are constructed based on the entity specifications in the model. The attributes form columns with the attribute name at the top of each column. Occurrences form rows that cut across the columns, and at their intersection is an attribute value that pertains to the occurrence. The term tuple is the formal name for a row in the table; Codd gave it a precise definition when he first introduced the term. Accordingly, properly designed tables possess the following four characteristics:
1. The value of at least one attribute in each tuple must be unique. This attribute is the primary key. The values of the other (nonkey) attributes in the row need not be unique.
2. All attribute values in any column must be of the same class.
3. Each column in a given table must be uniquely named. However, different tables may contain columns with the same name.
4. Tables must conform to the rules of normalization. This means they must be free from structural dependencies, including repeating groups, partial dependencies, and transitive dependencies.

Linkages between Relational Tables
Logically related tables need to be physically connected to achieve the associations described in the data model. This is accomplished by using foreign keys. Foreign keys are usually, but not always, embedded in the related table; the degree of cardinality between the related tables determines the method used for assigning foreign keys. With foreign keys in place, a computer program can be written to navigate among the tables of the database and provide users with the data they need to support their day-to-day tasks and decision-making responsibilities.

ANOMALIES, STRUCTURAL DEPENDENCIES, AND DATA NORMALIZATION
This section deals with why database tables need to be normalized. In other words, why is it necessary for the organization's database to form an elaborate network of normalized tables linked together? Why, instead, can we not simply consolidate the views of one user (or several) into a single common table from which all data needs may be met?

Database Anomalies
The answer to the questions asked in the previous section is that improperly normalized tables can cause DBMS processing problems that restrict, or even deny, users' access to the information they need. Such tables exhibit negative operational symptoms called anomalies. Specifically, these are the update anomaly, the insertion anomaly, and the deletion anomaly.

 UPDATE ANOMALY. The update anomaly is the result of data redundancy in an unnormalized table.

 INSERTION ANOMALY. To demonstrate the effects of the insertion anomaly, assume that a new vendor has entered the marketplace. The organization does not yet purchase from the vendor but may wish to do so in the future. In the meantime, the organization wants to add the vendor to the database.

 DELETION ANOMALY. The deletion anomaly involves the unintentional deletion of data from a table.

The presence of the deletion anomaly is less conspicuous but potentially more serious than the update and insertion anomalies. A flawed database design that prevents the insertion of records or requires the user to perform excessive updates attracts attention quickly. The deletion anomaly, however, may go undetected, leaving the user unaware of the loss of important data until it is too late. This can result in the unintentional loss of critical accounting records and the destruction of audit trails. Proper table design, therefore, is not just an operational efficiency issue; it carries internal control significance that accountants need to recognize.

Normalizing Tables in a Relational Database
The database anomalies described earlier are symptoms of structural problems within tables called dependencies. Specifically, these are known as repeating groups, partial dependencies, and transitive dependencies. The normalization process involves identifying and removing structural dependencies from the table(s) being modeled such that the resulting 3NF table designs will meet two conditions:
1. All nonkey (data) attributes in the table are dependent on (defined by) the primary key.
2. All nonkey attributes are independent of the other nonkey attributes.

In other words, a 3NF table is one in which the primary key wholly and uniquely defines each attribute in the table. Furthermore, none of the table attributes is defined by an attribute other than the primary key.
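To make the deletion anomaly concrete, the short sketch below shows how deleting the last purchase row for a vendor in a single unnormalized table silently destroys the vendor's standing data as well. The table layout and sample values are invented for illustration.

```python
# Hypothetical single-table design mixing purchase and vendor data.
# Deleting the only row for a vendor (a deletion anomaly) also loses the
# vendor's name, since it is stored nowhere else in the database.
purchases = [
    {"PONum": 1, "Part": "Bracket", "VendorNum": 22, "VendorName": "Acme Supply"},
    {"PONum": 2, "Part": "Gasket",  "VendorNum": 25, "VendorName": "Best Parts"},
]

# Delete the only purchase from vendor 25 ...
purchases = [row for row in purchases if row["PONum"] != 2]

# ... and vendor 25's data are gone from the database entirely.
remaining_vendors = {row["VendorNum"] for row in purchases}
print(25 in remaining_vendors)   # False
```

In a normalized design, the vendor's name would live in its own Vendor table, so deleting a purchase record could never destroy it; this is the internal control point the section stresses.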

User Views
A user view was defined earlier as the set of data that a particular user sees. Examples of user views are computer screens for entering or viewing data, management reports, or source documents such as an invoice or a purchase order. Views may be digital or hard copy, but in all cases, they derive from underlying database tables. Simple views may be constructed from a single table, while more complex views may require several tables.

The task of identifying all views and translating them into normalized tables is an important responsibility of database designers, as it has internal control implications. The issues and techniques related to this task are examined next.

REPRESENT THE VIEW AS A SINGLE TABLE. The next step is to represent the view as a single table that contains all of the view attributes. Figure 9-18 presents a single-table structure containing the sample data from Figure 9-17. Because the table contains customer invoices, the invoice number (INVOICE NUM) will serve as a logical primary key. Notice the attributes Ext Price and Total Due have been highlighted in Figure 9-18. The values for these attributes may be either stored or calculated. Because Ext Price is the product of two other attributes (Quantity and Unit Price), and Total Due is the sum of all Ext Price values, they both can be calculated from existing stored attributes rather than stored directly in the database table. To simplify this example, therefore, we will assume that the system will calculate these attributes and they will not be part of this analysis. Now that we have a base table to work from, the next few steps in the normalization process involve identifying and eliminating any structural dependencies that exist. If dependencies exist, correcting them will involve splitting the original single-table structure into two or more smaller and independent 3NF tables. Each of the structural dependencies and the techniques for identifying and removing them is outlined in the following sections.

REMOVE REPEATING GROUP DATA. The first step in correcting structural dependencies is to determine if the table under review contains repeating groups. Repeating group data occur when multiple values for a particular attribute exist in a specific tuple. For example, the sales invoice in Figure 9-17 contains multiple values for the attributes PROD NUM, DESCRIPTION, QUANTITY, and UNIT PRICE (we ignore Ext Price). These repeating groups represent the transaction details for the invoices of many thousands of customers. Relational database theory requires that a table's primary key uniquely identify each tuple stored in the table. PROD NUM alone cannot do this, since a particular product, such as 1234 (bolt cutter), may well have been sold to many other customers whose transactions are also in the table. By combining PROD NUM with the INVOICE NUM, however, we can uniquely define each transaction because the table will never contain two occurrences with the same combined invoice number and product number.

REMOVE PARTIAL DEPENDENCIES. Next we check to see if the resulting tables contain partial dependencies. A partial dependency occurs when one or more nonkey attributes are dependent on (defined by) only part of the primary key, rather than the whole key. This can occur only in tables that have composite (two or more attributes) primary keys. Since the Sales Invoice table has a single-attribute primary key, we can ignore it in this step of the analysis. This table is already in 2NF. The Line Item table, however, needs to be examined further. Figure 9-20 illustrates the partial dependencies in it. In the Line Item table, INVOICE NUM and PROD NUM together define the quantity sold attribute (Qunty). If we assume, however, that the price charged for product 1234 is the same for all customers, then the Unit Price attribute is common to all transactions involving product 1234. Similarly, the attribute named Description is common to all such transactions. These two attributes are not dependent on the INVOICE NUM component of the composite key. Instead, they are defined by PROD NUM only and, therefore, are partially dependent rather than wholly dependent on the primary key. We resolve this by splitting the table into two, as illustrated in Figure 9-20. The resulting Line Item table is now left with the single nonkey attribute Qunty. Product description and unit price data are placed in a new table called Inventory. Notice that the Inventory table will contain additional attributes that do not pertain to this user view. A typical inventory table may contain attributes such as reorder point, quantity on hand, supplier code, and warehouse location. This demonstrates how a single table may be used to support many different user views. We will return to this issue later. At this point, both the tables in Figure 9-20 are in 3NF. The Line Item table's primary key (INVOICE NUM, PROD NUM) wholly defines the attribute Qunty. Similarly, in the Inventory table, the Description and Unit Price attributes are wholly defined by the primary key PROD NUM.

REMOVE TRANSITIVE DEPENDENCIES. The final step in resolving structural dependencies is to remove transitive dependencies. A transitive dependency occurs in a table where nonkey attributes are dependent on another nonkey attribute and independent of the table's primary key. An example of this is illustrated by the Sales Invoice table in Figure 9-21. The primary key INVOICE NUM uniquely and wholly defines the economic event that the Order Date, Shpd Date, and Shpd Via attributes represent. The key does not, however, uniquely define the customer attributes. The attributes Cust Name, Street Address, and so on define an entity (Customer) that is independent of the specific transaction captured by a particular invoice record. For example, assume that during the period, the firm had sold to a particular customer on 10 different occasions. This would result in 10 different invoice occurrences in the table. Using the current table structure, each of these occurrences would capture the data uniquely related to the respective transaction, along with customer data that are common to all 10 transactions. Therefore, the primary key does not uniquely define customer attributes in the table. Indeed, they are independent of the primary key. We resolve this transitive dependency by splitting out the customer data and placing them in a new table called Customer. The logical key for this table is CUST NUM, which was the nonkey attribute in the former table on which the other nonkey customer attributes were dependent. With this dependency resolved, both the revised Sales Invoice table and the new Customer table are in 3NF.
Invoice table and the new Customer table are in [Link]. We
see repeating group data in many business user views, such as LINKING THE NORMALIZED TABLES At this point, the original single-
purchase orders, receiving reports, and bills of lading. Relational table structure has been reduced to the four normalized but
database theory prohibits the construction of a table in which a indepen- dent tables presented in Figure 9-22. The tables contain
single tuple (a row in the table) represents multiple values for an the sample data used in the original single- table structure
attribute (a column in the table). To represent repeating group presented in Figure 9-18. Notice how data redundancy in the
values in a single table, therefore, will require multiple rows as original single-table structure has been eliminated from the more
illustrated in Figure 9-18. Notice that the invoice attributes, which efficient structure represented here. To work together in the
are common to each occurrence of the repeating group data, will physical database, however, the tables in this data model need to
also be represented multiple times. For example, Order Date, be linked via foreign keys. This requires first determining the
Shipped Date, Customer Name, and Customer Address are stored cardinality between the tables and then assigning foreign keys.
along with each unique occurrence of Prod Num, Description, Determine Cardinality In our example, the cardinality between the
Quantity, and Unit Price. To avoid such data redundancy, the four tables is one-to-many (1:M), as explained next: 1. 2. Each
repeating group data need to be removed from the table and customer (Customer table) may be associated with one or many
placed in a separate table. Figure 9-19 shows the resulting tables. sales events (Sales Invoice table), but each invoice is for a single
One is called Sales Invoice table, with INVOICE NUM as the primary customer. Each Sales Invoice record is associated with one or more
key. The second table contains the transaction details for the Line-Item records, but each line item is associated with only one
invoice and is called Line Item table. Notice that the primary key of sales invoice.
the Line Item table is a composite key comprises two attributes:
INVOICE NUM and PROD NUM. Keep in mind that this table will Each Inventory record is associated with one or more line items (a
contain the transaction details for our example invoice as well as particular product has been sold many times to many customers),
but each Line-Item record represents only one inventory item. Each entity in Figure 9-26 has been assigned a name and a primary
When linking in a I:M relation as depicted in Figure 9-22, the key. The primary key should logically define the nonkey attributes
database designer will use the primary key from the table on the | and uniquely identify each occurrence in the entity. Sometimes this
side of the relation and embed it as a foreign key in the table of the is accomplished by using a simple sequential code such as an
M side. Notice that in the relations between the Line Item table, invoice number, check number, and purchase order number.
the Invoice table, and the Inventory table, this is already the case Sequential codes, however, are not always appropriate keys. Block
because of the Line Item table’s composite key. The Sales Invoice codes, group codes, alphabetic codes, and mnemonic codes can
table, however, needs to be modified to include CUST NUM as the also be assigned as primary keys. These techniques were discussed
foreign key, which links it to the Customer table. The rules for in detail in Chapter 2. The primary keys for the entities in our
linking normalized tables in (1:1, 1:M, or M:M) relations are example are Part Number, PO Number, and Rec Rept Number.
discussed next.
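The four linked tables can be sketched as data definition statements. The following is an illustrative sketch only, using SQLite from Python: the table and column names follow the labels of Figure 9-22, but the snake_case spellings, data types, and sample values are assumptions, not part of the text.

```python
import sqlite3

# Illustrative sketch of the four normalized tables of Figure 9-22.
# Names follow the text's labels; types and syntax are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    cust_num        INTEGER PRIMARY KEY,
    cust_name       TEXT,
    street_address  TEXT
);
CREATE TABLE sales_invoice (
    invoice_num INTEGER PRIMARY KEY,
    order_date  TEXT,
    shpd_date   TEXT,
    shpd_via    TEXT,
    -- primary key of the 1 side (Customer) embedded as a foreign key
    cust_num    INTEGER REFERENCES customer(cust_num)
);
CREATE TABLE inventory (
    prod_num    INTEGER PRIMARY KEY,
    description TEXT,
    unit_price  REAL
);
CREATE TABLE line_item (
    invoice_num INTEGER REFERENCES sales_invoice(invoice_num),
    prod_num    INTEGER REFERENCES inventory(prod_num),
    qnty        INTEGER,
    -- the composite key doubles as the links to both 1-side tables
    PRIMARY KEY (invoice_num, prod_num)
);
""")
```

Because the composite key of line_item is built from the two parent keys, no extra foreign-key column is needed there; only sales_invoice required the added CUST NUM attribute.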
Accountants and Data Modeling
Although most accountants will not be directly responsible for normalizing an organization's databases, they should have an understanding of the process and be able to determine whether financial data are normalized properly to avoid anomalies. For example, the update anomaly can generate conflicting and/or obsolete database values in accounts, the insertion anomaly can result in unrecorded transactions and incomplete audit trails, and the deletion anomaly can cause the loss of accounting records and the destruction of audit trails. Also, the conduct of many financial audit procedures involves accessing data stored in normalized base tables similar to those in Figure 9-22. An organization's financial database may consist of thousands of normalized tables, and navigating such a network requires an understanding of data structures.
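The update anomaly can be made concrete with a minimal sketch. The invoice rows and customer values below are invented for illustration: in an unnormalized table the customer's address repeats on every invoice row, so an update that touches only some rows leaves the database contradicting itself.

```python
# Unnormalized invoice rows: customer data repeats on every row.
# (Sample values are invented; they are not from the text's figures.)
rows = [
    {"invoice_num": 1, "cust_num": "C100", "address": "12 Elm St"},
    {"invoice_num": 2, "cust_num": "C100", "address": "12 Elm St"},
]

# The customer moves, but only invoice 1 gets updated.
rows[0]["address"] = "98 Oak Ave"

# The same customer now has two conflicting addresses on file.
addresses = sorted({r["address"] for r in rows if r["cust_num"] == "C100"})
print(addresses)  # ['12 Elm St', '98 Oak Ave']
```

In the normalized design the address lives once in the Customer table, so a single update keeps every view current.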
Top-Down Approach to Designing Relational Databases
This section examines the steps involved in creating a relational database using a top-down approach. Keep in mind that database design is an element of a much larger systems development process that involves extensive analysis of user needs, which are not covered at this time. That body of material is the subject of Chapters 13 and 14. Thus, our starting point is one that normally follows considerable preliminary work to specify in detail the key elements of the system under development. With this caveat, the focus will be on the following phases of database design, which are known collectively as view modeling:
1. Identify the views to be modeled.
2. Normalize the data model and add primary keys.
3. Determine cardinalities and add foreign keys.
4. Construct the physical database.
5. Prepare the physical user views.

IDENTIFY THE VIEWS TO BE MODELED
View modeling begins by identifying the relevant views of the business function in question. This involves user interviews, and other techniques, to determine what users do and what information they need to support their tasks. To demonstrate view identification, we will analyze the following key features of a simplified purchasing function:
1. The purchasing agent reviews the inventory status report for items that need to be reordered.
2. The agent selects a supplier and prepares a digital purchase order from his terminal.
3. The agent prints a copy of the purchase order and sends it to the supplier.
4. The supplier ships the inventory to the company. Upon its arrival, the receiving clerk inspects the inventory and prepares the digital receiving report. The computer system automatically updates the inventory records.

This description identifies three views: (1) Inventory Status Report, (2) Purchase Order, and (3) Receiving Report. The data attributes to be used in the views are specified in Figures 9-14, 9-25(a), and 9-25(b), respectively. Note that the Purchase Order attributes of Extended Cost and Total Cost can be calculated from other attributes and are not included in the model. Figure 9-26 presents the unnormalized entities that represent these views.

Each entity in Figure 9-26 has been assigned a name and a primary key. The primary key should logically define the nonkey attributes and uniquely identify each occurrence in the entity. Sometimes this is accomplished by using a simple sequential code such as an invoice number, check number, or purchase order number. Sequential codes, however, are not always appropriate keys. Block codes, group codes, alphabetic codes, and mnemonic codes can also be assigned as primary keys. These techniques were discussed in detail in Chapter 2. The primary keys for the entities in our example are Part Number, PO Number, and Rec Rept Number.

NORMALIZE DATA MODEL AND ADD PRIMARY KEYS
Figure 9-27 presents a normalized data model. The following section describes the normalization procedures that produced this model:
1. Repeating Group Data in Status Report Entity. The nonkey attributes of Supplier Number, Supplier Name, Supplier Address, and Supplier Tel Number are repeating group data in the Status Report. These data were removed and placed in a new table called Supplier and assigned a primary key of Supplier Number. The remaining entity, which now contains only inventory attributes, was renamed Inventory.
2. Repeating Group Data in Purchase Order Entity. The attributes of Part Number, Description, Order Quantity, and Unit Cost are repeating group data. To resolve this, these attributes were removed to a new entity called PO Item Detail, which was assigned a primary key that is a composite of Part Number and PO Number.
3. Partial Dependency in PO Item Detail Entity. An analysis of the PO Item Detail entity reveals that its composite primary key uniquely defines only Order Quantity. The attributes of Description and Unit Cost are partially dependent on Part Number and were therefore removed from the table, leaving Order Quantity as the only nonkey attribute. Unit Cost was added to the Inventory entity. Description was already a listed attribute of the Inventory entity.
4. Transitive Dependency in Purchase Order Entity. The attributes of Supplier Name, Supplier Address, and Supplier Tel Number are transitive dependencies, which are defined by Supplier Number. These attributes were therefore removed from the Purchase Order entity.
5. Repeating Group Data in Receiving Report Entity. The attributes Part Number, Quantity Received, and Condition Code are repeating groups in the Receiving Report entity and were removed to a new entity called Rec Report Item Detail. A composite key composed of Part Number and Rec Rept Number was assigned.
6. Transitive Dependency in Receiving Report Entity. The attributes Supplier Name, Supplier Address, and Supplier Tel Number are defined by Supplier Number, not by the primary key Rec Rept Number. They are Supplier entity attributes and were therefore removed from the Receiving Report entity.

DETERMINE CARDINALITIES AND ADD FOREIGN KEYS
The next step in view modeling is to determine the cardinality between entities and link the entities with foreign keys. Recall that cardinalities represent business rules. Sometimes, the rules are obvious and are the same for all organizations. For example, the normal cardinality between a Customer entity and a Sales Order entity is 1:M. This signifies that one customer may place many orders during a sales period. The cardinality would never be 1:1. This would mean that the organization restricts each customer to a single sale, which is illogical. Sometimes, the cardinality between entities is not apparent because different rules may apply in different organizations. If the database is to function properly, its designers need to understand the organization's business rules as well as the specific needs of individual users. The cardinalities and underlying business rules in our example are explained next.
CONSTRUCT THE PHYSICAL DATABASE
Figure 9-28 illustrates the 3NF table structures for the physical database. The dotted lines represent the primary and foreign keys linking the tables. The following points are worth further elaboration. Each occurrence in the Rec Report Item Detail table represents an individual item on the receiving report. The table has a combined key consisting of REC REPT NUMBER and PART NUMBER. This composite key is needed to uniquely identify the Quantity Received and Condition attributes of each item-detail record. The REC REPT NUMBER portion of the key provides the link to the Receiving Report table that contains general data about the receiving event. The PART NUMBER portion of the key is used to access the Inventory table to facilitate updating the Quantity on Hand field from the Quantity Received field of the Rec Report Item Detail table. The PO Item Detail table uses a composite primary key consisting of PO NUMBER and PART NUMBER to uniquely identify the Order Quantity attribute. The PO NUMBER component of the composite key provides a link to the Purchase Order table. The PART NUMBER element of the key is a link to the Inventory table where Description and Unit Cost data are stored. The next step is to create the physical tables and populate them with data. This is an involved step that must be carefully planned and executed, and may take many months in a large installation. Programs will need to be written to transfer organization data currently stored in flat files or legacy databases to the new relational tables. Data currently stored on paper documents will be entered into database tables manually. Once this is done, the physical user views can be produced.
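Such a transfer program might look like the following sketch. Everything here (the file layout, the table names, and the use of SQLite from Python) is an assumption made for illustration; the point is that each repeating supplier group in a legacy flat file becomes a single row in a normalized table.

```python
import csv
import io
import sqlite3

# A legacy flat file in which supplier data repeats on every inventory
# record. (The layout and values are invented for this sketch.)
flat_file = io.StringIO(
    "part_num,description,supplier_num,supplier_name\n"
    "10,Bolt,S1,Acme Supply\n"
    "20,Washer,S1,Acme Supply\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (supplier_num TEXT PRIMARY KEY, supplier_name TEXT)")
conn.execute("CREATE TABLE inventory (part_num INTEGER PRIMARY KEY, description TEXT, supplier_num TEXT)")

for rec in csv.DictReader(flat_file):
    # store each supplier only once, no matter how often it repeats
    conn.execute("INSERT OR IGNORE INTO supplier VALUES (?, ?)",
                 (rec["supplier_num"], rec["supplier_name"]))
    conn.execute("INSERT INTO inventory VALUES (?, ?, ?)",
                 (int(rec["part_num"]), rec["description"], rec["supplier_num"]))

print(conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0])   # 1
print(conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0])  # 2
```

A production conversion would add validation, error logging, and reconciliation totals, which is part of why the text warns that this step may take many months.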
PREPARE THE PHYSICAL USER VIEWS
The normalized tables should be rich enough to support the views of all users of the system being modeled. For example, the Purchase Order in Figure 9-28, which could be the data entry screen for a purchasing clerk, has been constructed from attributes located in several tables. To illustrate the relationship, the fields in the user view are cross-referenced via circled numbers to the attributes in the supporting tables. Keep in mind that these tables may also provide data for many other views not shown here, such as the receiving report, purchase requisition listing, inventory status report, and vendor purchases activity report. Database developers use SQL to identify which tables to use and which attributes to select to create the user view. The Receiving Report, Purchase Order, and Inventory Status Report views in our example would all be created in this way. The SQL commands needed to produce the inventory status report illustrated in Figure 9-14 are shown below:

  SELECT inventory.part-num, description, quantity-on-hand,
         reorder-point, EOQ, part-supplier.part-num,
         part-supplier.supplier-number, supplier.supplier-number,
         name, address, tele-num
  FROM inventory, part-supplier, supplier
  WHERE inventory.part-num = part-supplier.part-num AND
        part-supplier.supplier-number = supplier.supplier-number AND
        quantity-on-hand < reorder-point

The SELECT command identifies all of the attributes to be contained in the view. The FROM command identifies the tables used in creating the view. The WHERE command specifies how rows in the Inventory, Part-Supplier, and Supplier tables are to be matched to create the view. In this case, the three tables are algebraically joined on the primary keys PART-NUM and SUPPLIER-NUMBER. Multiple expressions may be linked with the AND, OR, and NOT operators. In this example, the last expression uses AND to restrict the records to be selected with the logical expression quantity-on-hand < reorder-point. Only records whose quantities on hand have fallen to or below their reorder points will be selected for the view, and the user will not see the many thousands of other inventory items that have adequate quantities available. These SQL commands will be saved in a user program called a query. To view the Inventory Status Report, the purchasing agent executes the query program. Each time this is done, the query builds a new view with current data from the Inventory and Vendor tables. By providing the user with his or her personal query, rather than permitting access to the underlying base tables, the user is limited only to authorized data. A report program is used to make the view visually attractive and easy to read. Column headings can be added to produce a hard-copy or computer screen report that resembles the original user report in Figure 9-14. The report program will suppress unnecessary data from the view, such as duplicated fields and the key values in the Inventory/Vendor link table. These keys are necessary to build the view, but are not needed in the actual report.

The view modeling process described previously in the chapter involved only one business function, the purchases system, and the example entities that resulted constitute only a small portion of that system. A modern company would employ thousands of views. Combining the data needs of all users into a single entity-wide schema is called view integration. This involves the consolidation of the attributes from all user views into a common set of normalized entities that meet the following conditions:
1. An entity must consist of two or more occurrences.
2. No two entities may have the same primary key. The exceptions to this are entities with composite keys that comprise the primary keys of other entities.
3. No nonkey attribute may be associated with more than one entity. For example, if Customer Address is an attribute of the Customer entity, it cannot also be an attribute of another entity. All views that use Customer Address will retrieve it from the Customer entity.

COMMERCIAL DATABASE SYSTEMS
Modeling the data needs of thousands of user views is a daunting undertaking when creating an entity-wide database from scratch. To facilitate this task, modern commercial database systems come equipped with a core schema, normalized tables, and templates for thousands of views. Commercial systems are designed to comply with proven industry best practices and to satisfy the most common needs of different client organizations. For example, all organizations that sell products to customers will need an Inventory table, a Customer table, a Supplier table, and so forth. Many of the attributes and keys in these tables are common to all organizations. Working from a commercial database package, developers can thus focus on configuring predefined views to accommodate specific user needs within their organizations rather than starting from scratch. Database vendors cannot, however, anticipate the information needs of all users in advance. Therefore, new entities and new attributes may need to be added to the core schema. Although configuring the core database in this fashion is far more efficient than working from scratch, the objective is the same. The database designer must produce a set of integrated tables that are free of the update, insertion, and deletion anomalies and sufficiently rich to serve the needs of all users.

Databases in a Distributed Environment
Chapter 1 introduced the concept of distributed data processing (DDP) as an alternative to the centralized approach. Most modern organizations use some form of distributed processing and networking to process their transactions. Some companies process all of their transactions in this way. An important consideration in planning a distributed system is the location of the organization's database. In addressing this issue, the planner has two basic options: databases can be centralized, or they can be distributed. Distributed databases fall into two categories: partitioned and replicated. This section examines issues, features, and trade-offs that should be carefully evaluated in deciding how databases should be distributed.

CENTRALIZED DATABASES
Under the centralized database approach, remote users send requests via terminals for data to the central site, which processes the requests and transmits the data back to the user. The central site performs the functions of a file manager that services the data needs of the remote users. The centralized database approach is illustrated in Figure 9-29. Earlier in the chapter, three primary advantages of the database approach were presented: the reduction of data storage costs, the elimination of multiple update procedures, and the establishment of data currency (i.e., the firm's data files reflect accurately the effects of its transactions). Achieving data currency is critical to database integrity and reliability. However, in the DDP environment, this can be a challenging task.
Data Currency in a DDP Environment
During data processing, account balances pass through a state of temporary inconsistency in which their values are incorrectly stated. This occurs during the execution of any accounting transaction.
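A minimal sketch can show what this temporary inconsistency looks like. The account names echo the chapter's AR-Control example, but the amounts and the two-step transfer are invented for illustration: between the two postings of a single transaction, a reader sees an in-flight total that is wrong.

```python
# Sketch of temporary inconsistency: between the two postings of a
# customer payment, the books are momentarily out of balance.
# (Amounts are illustrative, not from the text's figures.)
accounts = {"AR-Control": 10000, "Cash": 5000}

def post_payment(amount):
    accounts["Cash"] += amount        # step 1 posted...
    snapshot = dict(accounts)         # ...a reader here sees bad data
    accounts["AR-Control"] -= amount  # step 2 restores consistency
    return snapshot

mid = post_payment(1000)
print(mid["Cash"] + mid["AR-Control"])            # 16000: inconsistent mid-transaction
print(accounts["Cash"] + accounts["AR-Control"])  # 15000: consistent on completion
```

This is why, as the next section explains, simultaneous access to a data element by multiple sites must be prevented until the transaction completes.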
Database Lockout
To achieve data currency, simultaneous access to individual data elements by multiple sites needs to be prevented. The solution to this problem is to use a database lockout, which is a software control (usually a function of the DBMS) that prevents multiple simultaneous accesses to data. The previous example can be used to illustrate this technique: immediately upon receiving the access request from Site A for AR-Control (T1, Instruction Number 2), the central site DBMS places a lock on AR-Control to prevent access from other sites until Transaction T1 is complete. Thus, when Site B requests AR-Control (T2, Instruction Number 2), it is placed on wait status until the lock is removed. Only then can Site B access AR-Control and complete Transaction T2.
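The lockout sequence just described can be sketched as a toy lock manager. This is illustrative only; a real DBMS lock manager handles lock modes, queuing policy, and timeouts, none of which the text specifies.

```python
# Toy lock manager illustrating the example: Site A's transaction T1
# locks AR-Control, so Site B's T2 is placed on wait status.
locks = {}        # data element -> transaction holding the lock
wait_queue = []   # (transaction, data element) pairs on wait status

def request(txn, item):
    if item in locks and locks[item] != txn:
        wait_queue.append((txn, item))  # placed on wait status
        return "WAIT"
    locks[item] = txn
    return "GRANTED"

def release(txn, item):
    if locks.get(item) == txn:
        del locks[item]
        # pass the lock to the first waiting transaction, if any
        for i, (t, d) in enumerate(wait_queue):
            if d == item:
                wait_queue.pop(i)
                locks[item] = t
                break

print(request("T1", "AR-Control"))  # GRANTED: Site A locks AR-Control
print(request("T2", "AR-Control"))  # WAIT: Site B must wait for T1
release("T1", "AR-Control")         # T1 completes; the lock passes to T2
print(locks["AR-Control"])          # T2 now holds the lock
```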
DISTRIBUTED DATABASES
Distributed databases can be distributed using either the partitioned or replicated technique.

Partitioned Databases
The partitioned database approach splits the central database into segments or partitions that are distributed to their primary users. The advantages of this approach are:
- Storing data at local sites increases users' control.
- Permitting local access to data and reducing the volume of data that must be transmitted between sites improve transaction processing response time.
- Partitioned databases can reduce the potential for disaster. By having data located at several sites, the loss of a single site cannot terminate all data processing by the organization.

The partitioned approach, which is illustrated in Figure 9-30, works best for organizations that require minimal data sharing among users at remote sites. To the extent that remote users share data, the problems associated with the centralized approach still apply. The primary user must now manage requests for data from other sites. Selecting the optimum host location for the partitions will minimize data access problems. This requires an in-depth analysis of end-user data needs.

THE DEADLOCK PHENOMENON
In a distributed environment, it is possible that multiple sites will lock out each other, thus preventing each from processing its transactions. For example, Figure 9-31 illustrates three sites and their mutual data needs. Notice that Site 1 has requested (and locked) Data A and is waiting for the removal of the lock on Data C to complete its transaction. Site 2 has a lock on C and is waiting for E. Finally, Site 3 has a lock on E and is waiting for A. A deadlock occurs here because there is mutual exclusion to data, and the transactions are in a wait state until the locks are removed. This can result in transactions being incompletely processed and corruption of the database. A deadlock is a permanent condition that must be resolved by special software that analyzes each deadlock condition to determine the best solution. Because of the implications for transaction processing, accountants should be aware of the issues pertaining to deadlock resolution.

DEADLOCK RESOLUTION
Resolving a deadlock usually involves sacrificing one or more transactions. These must be terminated to complete the processing of the other transactions in the deadlock. The preempted transactions must then be reinitiated. In preempting transactions, the deadlock resolution software attempts to minimize the total cost of breaking the deadlock. Although not an easy task to automate, some of the factors that influence this decision are as follows:
1. The resources currently invested in the transaction. This may be measured by the number of updates that the transaction has already performed and that must be repeated if the transaction is terminated.
2. The transaction's stage of completion. In general, deadlock resolution software will avoid terminating transactions that are close to completion.
3. The number of deadlocks associated with the transaction. Because terminating the transaction breaks all deadlock involvement, the software should attempt to terminate transactions that are part of more than one deadlock.

Replicated Databases
In some organizations, the entire database is replicated at each site. Replicated databases are effective in companies in which there exists a high degree of data sharing but no primary user. Because common data are replicated at each site, the data traffic between sites is reduced considerably. Figure 9-32 illustrates the replicated database model. The primary justification for a replicated database is to support read-only queries. With data replicated at every site, data access for query purposes is ensured, and lockouts and delays because of network traffic are minimized. A problem arises, however, when local sites also need to update the replicated database with transactions. Because each site processes only its local transactions, different transactions will update the common data attributes that are replicated at each site, and thus, each site will possess uniquely different values after the respective updates. Using the data from the earlier example, Figure 9-33 illustrates the effect of processing credit sales for Jones at Site A and Smith at Site B. After the transactions are processed, the value shown for the common AR-Control account is inconsistent ($12,000 at Site A and $11,000 at Site B) and incorrect at both sites.

Concurrency Control
Database concurrency is the presence of complete and accurate data at all remote sites. System designers need to employ methods to ensure that transactions processed at each site are accurately reflected in the databases at all other sites. This task, while problematic, has implications for accounting records and is a matter of concern for accountants. A commonly used method for concurrency control is to serialize transactions. This involves labeling each transaction by two criteria. First, special software groups transactions into classes to identify potential conflicts. For example, read-only (query) transactions do not conflict with other classes of transactions. Similarly, AP and AR transactions are not likely to use the same data and do not conflict. However, multiple sales order transactions involving both read and write operations will potentially conflict. The second part of the control process is to time-stamp each transaction. A system-wide clock is used to keep all sites, some of which may be in different time zones, on the same logical time. Each time stamp is made unique by incorporating the site's identification number. When transactions are received at each site, they are examined first for potential conflicts. If conflicts exist, the transactions are entered into a serialization schedule. An algorithm is used to schedule updates to the database based on the transaction time stamp and class. This method permits multiple interleaved transactions to be processed at each site as if they were serial events.
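The two-part labeling scheme just described can be sketched as follows. The transaction classes, clock values, and site numbers are invented for illustration; the point is that read-only queries are exempt from serialization, while conflicting transactions are ordered by a time stamp that the site ID makes unique.

```python
# Illustrative sketch of serializing transactions: each transaction
# carries a class and a time stamp made unique by the site's ID.
# (Transaction IDs, clock values, and sites are invented.)
transactions = [
    {"id": "T7", "class": "sales_order", "clock": 1002, "site": 2},
    {"id": "T3", "class": "sales_order", "clock": 1002, "site": 1},
    {"id": "T9", "class": "query",       "clock": 1001, "site": 3},
]

# Read-only queries do not conflict with other classes, so only the
# sales order transactions must be entered into the schedule.
conflicting = [t for t in transactions if t["class"] != "query"]

# Apply conflicting updates in time-stamp order; the site ID breaks
# ties when two sites share the same logical clock value.
schedule = sorted(conflicting, key=lambda t: (t["clock"], t["site"]))
print([t["id"] for t in schedule])  # ['T3', 'T7']
```

Because every site sorts by the same (clock, site) key, all replicas apply the conflicting updates in the same order, which is what makes the interleaved transactions behave as if they were serial events.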
Distributed Databases and the Accountant
The decision to distribute databases is one that should be entered into thoughtfully. There are many issues and trade-offs to consider. Some of the most basic questions to be addressed are as follows:
- Should the organization's data be centralized or distributed?
- If data distribution is desirable, should the databases be replicated or partitioned?
- If replicated, should the databases be totally replicated or partially replicated?
- If the database is to be partitioned, how should the data segments be allocated among the sites?

The choices involved in each of these questions impact the organization's ability to maintain database integrity. The preservation of audit trails and the accuracy of accounting records are key concerns. Clearly, these are decisions that the modern accountant should understand and influence intelligently.
