Unit 3-1 Data Management (Database Concepts)
Unit 3-1 Data Management (Database Concepts)
Concepts
In This Chapter
.
DATABASE CONCEPTS
O
11.1 Introduction
11.1 INTRODUCTION
C
11.2 Purpose of Databases
&
based record keeping system. The collection of 11.4 Different Data Models
data, usually referred to as the database, contains
AI
11.5 The Relational Model
information about one particular enterprise. It 11.6 Comparison of Data Models
R
maintains any information that may be necessary to the
decision-making processes involved in the management
of that organization.
T
PA
The ways in which end users want to utilize existing data will constantly change, and
in some cases demands for new uses of the data will arise rapidly and urgently. The extent
to which these demands can be satisfied determines the overall value of the database
system.
In a typical file-processing system, permanent records are stored in various files. A
number of different application programs are written to extract records from and add
records to the appropriate files. But this scheme has a number of major limitations and
disadvantages, such as data redundancy (duplication of data), data inconsistency,
unsharable data, unstandardized data, insecure data, incorrect data etc. A database
management system is answer to all these problems as it provides a centralized control of the
data.
Let us consider some of the advantages provided by the database systems and see
how a database system overcomes the above mentioned problems.
.
1. Databases reduce the data redundancy to a large extent
O
Data redundancy means duplication of data. Non-database systems maintain
C
separate copy of data for each application. For example, in a college, student records are
maintained and the hostel also maintains records of all those students who live in hostel.
&
Though the records for hosteler students are already being maintained by
Data Redundancy Duplication the college, the hostel keeps a separate copy of it. As far as everything
AI
of data is known as Data goes well, the above mentioned file processing system works well. But
Redundancy.
R
duplication of data may lead to inconsistency or incorrect data at times.
Let us see how.
T
Suppose the permanent address of a hosteler gets changed. The hosteler student
PA
informs about it to the hostel authorities and the changed address is reflected in the
student’s record. But see there is a huge bug here. The data for the same student has not
been changed in the college records. If the student forgets to inform the college authorities
AN
or college people themselves forget to update the student’s record, the same student’s
record differs in two different files. Now, which record of the two would be considered
accurate? Say, the college has to send certain information by post. Then, what will happen?
H
The wrong data will get processed. The only way out is that every file that is maintaining
D
the same record should be updated. But because of some human errors, certain files may
be left out and this will lead to inconsistent
data. The other problem of redundancy is
Database
system DBMS unnecessary wastage of storage space.
Application The database systems do not maintain
program 1 Database separate copies of the same data. Rather, all
User 1
the data are kept at one place and all the
Application applications that require data refer to the
program 2
User 2 centrally maintained database (collection of
data). (See Fig. 11.1). Now, if any change is
Application
program 3 to be made to data, it will be made at just
one place and the same changed infor-
User 3
Application mation will be available to all the appli-
program 4
cations referring to it. Thus redun- dancy
gets controlled and so are the problems
associated with it.
Fig. 11.1 Centrally Controlled Database System.
UNIT 2-1 : Data Management
11.3
DATABASE CONCEPTS
.
management system ensures that any change made to either of the two entries is
O
automatically made to the other. This process is known as propagating updates.
C
3. Databases facilitate sharing of data
Sharing of data means that individual pieces of data in the database may be shared
&
among several different users, in the sense that each of those users may have access to the
AI
same piece of data and each of them may use it for different purposes.
The database management system makes sure that not only the existing applications
R
can share the data in the database, but also that new applications can be developed to
operate against that same stored data. Or it can be said that the data requirements of new
T
applications may be satisfied without having to create any new stored files.
PA
centrally) follow the applicable standards. There may be certain standards laid by the
company or organization using the database. Or there may be certain industry standards
that must be satisfied by the data. Similarly, there may be national or international
H
associations between data items not be destroyed. Hardware failures and various types of
accidents will occur occasionally. The storage of data and its updation, and insertion
procedures, defined by the database, are such that the system can (easily) recover from
these circumstances without harm to the data.
In addition to protecting data from systems problems, the database management
system designs certain integrity checks to ensure that data values confirm to certain
specified rules. For example, a date cannot be like 25/25/12 ; it is invalid date. Or say the
number of days-worked for an employee cannot exceed the number of working days in a
month. Therefore, a database management system defines integrity checks like range checks
to check for the values that must lie within certain range of values ; and value matching to
check for the values that must be present already. (e.g., while preparing pay slips, the
employee-number must be a valid one i.e., it must be present in the employee-master).
With the above mentioned advantages, the purpose of databases is very obvious.
That is, databases are preferred so as to ensure consistent, sharable, standardized, integrated
.
and secure data, managed effectively so as to serve the requirements of the organization
O
owning it.
C
11.3 DATABASE ABSTRACTION
&
As we know that a collection of interrelated files and a set of programs that allow
AI
users to access and modify these files is known as a database management system. A major
purpose of a database system is to provide the users only that much information that is
R
required by them. This means that the system does not disclose all the details of data,
rather it hides certain details of how the data is stored and maintained. A good database
T
system ensures easy, smooth and efficient data structures in such a way so that every type
PA
of database user : end user, application system analyst, and physical storage system analyst, is
able to access its desired information efficiently. (An end user is a person who is not a
computer-trained person but uses the database to retrieve some information. For example,
AN
in a bank database, a customer, who wants to know how much balance remains in his
account, is an end-user. An application system analyst is the one who is concerned about all of
?
H
2. Conceptual Level
This level of abstraction describes what data are actually stored in the database. It also
describes the relationships existing among data. At this level, the database is described
logically in terms of simple data-structures. The users of this level are not concerned with
.
how these logical data structures will be implemented at the physical level. Rather, they
O
just are concerned about what information is to be kept in the database.
C
3. External Level (View Level)
&
This is the level closest to the users and is concerned with the way in which the data
are viewed by individual users. Most of the users of the database are not concerned with
AI
all the information contained in the database. Instead, they need only a part of the database
relevant to them. For example, even though the bank database stores a lot many
R
information, an account holder (a user) is interested only in his account details and not with
the rest of the information stored in the database. To simply such users’ interaction with
T
the system, this level of abstraction is defined. The system, thus provides many views for
PA
the same database. The following Fig. 11.2 illustrates the interrelationship among these
three levels of abstraction.
AN
Internal
Stored-Item Length = 40
Item # Type = Byte(6), offset = 0, Index = Ix
Name Type = Byte(20), offset = 6,
Price Type = Byte(8), offset = 26,
ROQ Type = Byte(4), offset = 34,
.
improve performance of the system.
O
C
2. Logical Data Independence
Logical Data Independence refers to the ability to modify the conceptual scheme without
&
causing any changes in the schemes followed at view levels. The logical data independence
ensures that the application programs remain the same. Modi- fications at the conceptual
AI
level are necessary whenever logical structures of the database get altered because of some
unavoidable reasons. (For example, the introduction of paternity leave in the employee
R
independence. The reason being that the application programs are heavily dependent on
PA
data independence to a large extent. Both hide implementation details from the users. This
allows users to concentrate on the general structure rather than low-level implementation
details.
H
The external level and conceptual level use certain data structures to help utilize the
database efficiently. How are these data structures decided ? What data structures and
associated operators should the system support ? The answer to this very crucial question
depends upon the approach or model being used for database management. The three data
models that are used for database management are :
m Relational data model m Hierarchical data model
m Network data model
?
11.4.1 The Relational Data Model
1. What are the three levels of data ab- In relational data model, the data is organized into tables
straction ? (i.e., rows and columns). These tables are called relations.
2. What do you understand by data inde- A row in a table represents a relationship among a set of
pendence ? values. Since a table is a collection of such relationships, it is
3. What are two types of data inde- generally referred to using the mathematical term relation,
pendence ? How are they different ? from which the relational data model derives its name.
UNIT 2-1 : Data Management
11.7
DATABASE CONCEPTS
Let us see how a sample database can be represented in relational form. The sample
database being shown here has three tables (relations) : Suppliers, Items, Shipments.
Suppliers
Supp# Supp-Name Status City
S1 Britannia 10 Delhi
S2 New Bakers 30 Mumbai
S3 Mother Dairy 10 Delhi
S4 Cookz 50 Bangalore
S5 Haldiram 40 Jaipur
.
O
Items Shipments
C
Item# Item-Name Price Supp# Item# Qty-supplied
I1 Milk 15.00 S1 I2 10
&
I2 Cake 5.00 S1 I3 20
I3 Bread 9.00 S1 I6 20
AI
I4 Milk Bread 14.00 S2 I4 20
R
I5 Plain Biscuit 6.00 S2 I5 10
I6 Cream Biscuit 10.00 S3 I1 10
T
Notice, that here each supplier has a unique supplier number, exactly one name, status
value and location. Likewise, we assume that each item has a unique item number, name,
D
price ; and also that, at any given time, no more than one shipment exists for a given
supplier/item combination.
Each of these three tables closely resembles a conventional sequential file, with rows
of the table corresponding to records of the file and columns corresponding to fields of the
records. Each of these tables is actually a special case of the construct known in
mathematics as a relation. (More about relations and their characteristics, we shall learn
under section 11.5.1).
Rows of relations are generally referred to as tuples and the columns are usually
referred to as attributes.
The relational data model is based on a collection of tables (relations). The user of the
(relational) database system may query these tables, insert new tuples, delete tuples, and
modify tuples. There are several languages for expressing these operations. One such
language is relational query language. The relational algebra is a procedural language that
defines the basic operations used within the relational query language. The section 11.5,
The Relational Model deals with relational data base management systems in details. But
before that you must be introduced to two other data models : network and hierarchical.
.
S1 Britannia ... S2 New Bakers ... S3 Mother Dairy ... S4 Cookz ... S5 HaldiRam ...
O
C
10 20 20 20 10 10 10 30 30
&
AI
R
Notice that, in the shipments table the supplier S1 supplies three items I2, I3 and I6 in
the quantities 10, 20 and 20 respectively. Now, this relationship has been shown in Fig. 11.5
H
See also, since only one quantity (10) is being supplied for item I2, I2 joins with only
one quantity. If there are more values for a same value (say X), the links form a chain and
the last value in the chain points back to the same value X. See there is a chain of
10 ® 20 ® 20. The last value 20 in the chain for supplier S1 points back to S1.
In network data model, while mapping to files, links are implemented by adding
pointer fields to records that are associated via a link. Each record must have one pointer
field for each link with which it is associated.
The operations on a network database are performed through a data manipulation
language for network model. The operations that can be performed on a network database,
include find, insert, delete,
S1 Britannia 10 Delhi modify etc. The inserting or
removing records involve
10 20 20
connect, disconnect, and reconnect
operations.
Fig. 11.5 The Relationships are implemented through links in Network model.
UNIT 2-1 : Data Management
11.9
DATABASE CONCEPTS
.
to items.
O
I4 Milk Bread 14.00
The record type at the top of the tree — the
C
supplier record type in our example - is usually
I5 Plain Biscuit 6.00 20 known as the root. In general, the root may have
&
10 any number of dependents, each of these
dependents may have any number of lower-
AI
level dependents, and so on, to any number of
S5 Haldiram 40 Jaipur levels.
R
The relational model was propounded by E.F. Codd of the IBM and has since been
D
11.5.1 Terminology
Different terms used in the relational model are being discussed here.
2. For a row, each column must have an atomic (indivisible) value and also for a
row, a column cannot have more than one value.
3. All rows of a relation are distinct. That is, a relation does not contain two
rows which are identical in every column. That is, each row of the relation
can be uniquely identified by its contents.
4. The ordering of rows within a relation is immaterial. That is, we cannot
retrieve any thing by saying that from row number 5, column name is to be
accessed. There is no order maintained for rows inside a relation.
5. The columns of a relation are assigned distinct names and the ordering of
these columns is immaterial.
Domain
A domain is a pool of values from which the actual values appearing in a given
column are drawn. For example, the values appearing in the Supp# column of both the
Suppliers table and the Shipments table are drawn from the underlying domain of all valid
.
O
supplier numbers. (See Fig. 11.7). A domain is said to be atomic if elements of the domain
are considered to be indivisible units. For example, the set of integers is an atomic domain but
C
the set of all sets of integers is a nonatomic domain.
Supplier Numbers
S4
&
AI
See, the Supp# columns are drawing their
Domain S1 S2
values from supplier numbers domain
R
S6
S3 S5
T
PA
S7
Shipments
Suppliers
S1 I2 10
S1 Britannia 10 Delhi
S1 I3 20
S2 New Bakers 30 Mumbai
S1 I6 20
H
S2 I5 10 S4 Cookz 50 Bangalore
S3 I1 10 S5 Haldiram 40 Jaipur
S3 I7 10
S4 I8 30
S5 I9 30
Fig. 11.7 Supp# in Suppliers and Shipments draw values from same domain.
In the sample database, observe that relations Suppliers and Shipments have a
common domain (Supplier numbers domain) and so do Items and Shipments (Item
numbers domain).
NOTE
A crucial feature of relational See that in the table Shipments the supplier S1 is supplying 10 units of
data model is that associations item I2 and from the table Suppliers it can be said that a Delhi based
between rows are represented supplier Britannia is supplying 10 units of item I2. That is how the
solely by data values in columns association between rows (S1, I2, 10 of Shipments and S1, Britannia, 10,
drawn from a common domain. Delhi of Suppliers) can be represented.
UNIT 2-1 : Data Management
11.11
DATABASE CONCEPTS
Tuple
Tuples The rows of a relation
are known as Tuples. The rows of tables (relations) are generally referred to as Tuples (usually
pronounced to rhyme with “couples’’).
Attributes
Attributes The columns of a The columns/fields of tables (relations) are generally referred to as
relation are known as attributes.
Attributes.
Degree
The number of attributes in a relation determine the degree of a relation.
Degree The number of attri- A relation having 3 attributes is said to be a relation of degree 3.
butes in a relation is called Similarly, a relation having n attributes is said to be a relation of degree
Degree. n. Relations of degree one are said to be unary, relations of degree two
are binary, relations of degree three are ternary, ....., and relations of
degree n are n-ary.
.
O
Cardinality The number of Cardinality
C
rows in a relation is known as The number of tuples (rows) in a relation is called the cardinality of the
Cardinality. relation ; e.g., the cardinality of Shipments relation is 9. Similarly, the
&
cardinality of Suppliers relation is 5.
11.5.2 Views
AI
A view is a kind of table whose contents are taken from other tables depending upon
R
a condition. Views do not contain data of their own. The contents of a view are determined
by carrying out the execution of the given query (the given condition). Let us elaborate the
T
concept.
PA
The kinds of tables (relations) that you have come across with until now are called
base tables. These are the tables that actually contain data. There is another kind of tables,
the views. Views are tables whose contents are derived from other tables
AN
derived from one or more view is stored. Everytime a view is referred to, its contents are derived
underlying base table(s). from its underlying base table/s) using its stored definition. Thus, a view
D
Following Fig. 11.8 explains the concept of a view. Views are like windows through
which you view desired information that is actually stored in a base table.
A view can be used just like any other table. It can be queried, updated, inserted into,
deleted from, and joined with other tables and views.
Views greatly extend the control you have over your data. They are an excellent way
to give people access to some but not all of the information in a table. Even Read-only views
can also be created which means that they can be queried, but they cannot be subjected to
update commands.
(Though there is a lot that can be discussed about views yet we are not going further
in explaining views as it is beyond the scope of this chapter and the book).
.
O
say (x1, x2, x3) where x1 is an item number, x2 is the item name and x3 is the price of the
item. That means x1 is in domain D1, x2 is in domain D2, and x3 is in domain D3. In
C
general, the table Items will contain only a subset of the set of all possible rows. Therefore,
Items is a subset of
&
D1 ´ D2 ´ D3.
AI
In general, a table having n attributes must be a subset of
D1 ´ D2 ´ ...... ´ Dn - 1 ´ Dn .
R
11.5.3A Keys
T
them must be expressed in terms of their attributes. Keys come here for a rescue.
Primary Key
AN
Within a given relation, a set of one or more attributes having values that are unique
within the relation and thus are able to uniquely identify that tuple, is said to be the primary
H
Every relation does have a primary key. In our sample database, Supp# is
Primary Key A primary key is the primary key for Suppliers as it contains unique value for each tuple in
a set of one or more attributes the relation. Similarly, Item# is the primary key for Items and the
that can uniquely identify combination of Supp# and Item# is the primary key for the Shipments
tuples within the relation.
relation.
In some tables, combination of more than one attribute provides a unique
value for each row. In such tables, the group of these attributes is declared
as primary key. In such cases, the primary key consists of more than one
attribute, it is called composite-primary-key.
The primary key is nonredundant i.e., it does not have duplicate values in the
Candidate Keys All attribute same relation. The non-primary-key attributes of a table can be referred to as
combinations inside a relation non-key attributes.
that can serve as primary key
are Candidate Keys as they are Candidate Key
candidates for the primary key Occasionally we may encounter a relation in which there is more than one
position. attribute possessing the unique identification property.
UNIT 2-1 : Data Management
11.13
DATABASE CONCEPTS
In our sample database, there are two candidate keys Supp# and Supp-Name in the
Suppliers relation. Both of these attributes contain unique values for each tuple. Similarly,
in Items, Item# and Item-Name are candidate keys.
In the case of two or more candidate keys, the database analyst decides one of them as
the primary key for the relation.
Alternate Key
In case of two or more candidate keys, only one of them serves as the
Alternate Key A candidate primary key. The rest of them are alternates only.
key that is not the primary key
In Suppliers table, Supp-Name is the alternate key and in Items table
is called an Alternate Key.
Item-Name is the alternate key.
Foreign Key
A foreign key is used to represent the relationship between two tables. A foreign key
.
is a non-key attribute (or a group of non-key attributes) whose value is derived from the
O
primary-key of another table. Or in other words, a non-key attribute of a
Foreign-Key A non-key table, which is the primary-key of some other table is known as
C
attribute, whose values are foreign-key.
derived from the primary key
&
of some other table, is known The table in which this non-key attribute i.e., the foreign-key attribute
as Foreign-Key in its current exists, is called a Foreign table or Detail table, and the table that
AI
table. defines the Primary-key, which the foreign-key of detail-table refers to, is
called Primary table or Master table.
R
one or more relations as its operand(s) and produces another relation as its result. The
operations defined in relational algebra include select, project, cartesian product, union, set
AN
(a) (b)
Fig. 11.9 (a) SELECT operation ; (b) Result of s price > 14.00 (Items)
.
O
Similarly, tuples of the Suppliers where the city is “Delhi’’ can be selected as
s city = “Delhi” (Suppliers)
C
In general, in a selection predicate (condition), all
&
Item# Item-Name Price relational operators (=, ¹ , <, £, >, ³) may be used. Also,
more than one condition may be combined using the
AI
I5 Plain Biscuit 6.00
connectives and (denoted by Ù) and or (denoted by Ú).
I8 Cold Drink 8.00
Thus, to find those tuples pertaining to prices between
R
C002 Reva Reva 21000.00 22300.00 C003 Astha Ravi 25199.00 24801.00
(a) (b)
Fig. 11.12 (a) Result of s cust-name = banker-name (Customer), (b) Result of s amount > balance (Customer).
UNIT 2-1 : Data Management
11.15
DATABASE CONCEPTS
.
Cookz Bangalore That means, from table Suppliers (table
O
Haldiram Jaipur name is given in parenthesis), project
C
(b) (denoted by p) the attributes Supp-Name and
City. The relation resulting from this query
Fig. 11.13 (a) The PROJECT operation will select the vertical subset
&
is as shown in Fig. 11.13(b).
of entire table. (b) Result of pSupp-Name, City (Suppliers).
AI
However, if you give the project query as follows :
R
City Supp-Name
PA
Delhi Britannia
Mumbai New Bakers
AN
Jaipur Haldiram
D
It is obvious from the above table that you can specify the order of attributes in the
resulting relation. Since these operations result in a relation, project operation can also be
applied on a resulting relation of a query. For example, if you are interested only in the
names of those items that are costlier than ` 14.00, you may write it as
pItem-name (s prime > 14.00 (Items))
Item-Name City First the inner query is evaluated and then the
Milk Delhi outer query is evaluated using the relation that
resulted from the inner query. The result of
Ice cream Mumbai
above query is as shown below in Fig. 11.15(a).
Namkeen Bangalore
Duplicating tuples are automatically removed
Jaipur in the resulting relation. For instance, if you write
(a) (b) pCity (Suppliers)
Fig. 11.15 Result of (a) pItem-Name (s price > 14.00 (Items)) the resulting relation will be as shown in
(b) pCity (Suppliers). Fig. 11.15(b).
11.16 Support Material
.
O
in Fig. 11.16.
Fig. 11.16 Student and Instructor Relations.
C
The cartesian product of these two relations, Student ´ Instructor, will yield a
relation that will have a degree of 6 (3 + 3 : sum of degrees of Student and Instructor)
Student ´ Instructor
&
and a cardinality 4 (2 ´ 2 : product of
AI
cardinalities of two relations). The
Stud# Stud-Name Hosteler Inst# Inst-Name Subject resulting relation has been shown in
S001
R
Meenakshi Y I01 K. Lal English Fig. 11.17(a).
S001 Meenakshi Y I02 R.L. Arora Maths See the resulting relation contains all
T
two relations.
S002 Radhika N I02 R.L. Arora Maths The illustration of cartesian product of
two tables has been given in Fig. 11.17(b).
AN
Fig. 11.17 (a) The Cartesian Product of Student and Instructor tables.
Table A Table B
H
Table A ´ B is
.
O
Now, if you want to obtain union of these two relations (shown in Fig. 11.12), you may
C
write
(s cust-name = banker-name (Customer)) È (s amount > balance (Customer))
&
And the resultant relation will be as shown in Fig. 11.18(b).
AI
Drama Song
A B
R
17 Swati 14 10 Rishabh 15
PA
13 Kush 15 AÈB
Result of Song È Drama will be
AN
13 Kush 15
17 Swati 14
(a)
(b)
Fig. 11.18 (a) Union operation, (b) Result of (s cust-name = banker-name (Customer) ) È (s amount > balance (Customer) ) . (Named as NewCust)
Do remember that in the resultant relation, all the duplicating tuples will auto-
matically be removed.
containing those tuples in A but not in B. Let us assume that the relation shown in
Fig. 11.18 has been named as NewCust. Now the result of difference between Customer
(shown in Fig. 11.11) and NewCust will be as shown in Fig. 11.19(b).
Drama Song
A B
Rollno Name Age Rollno Name Age
13 Kush 15 2 Manya 15
17 Swati 14 10 Rishabh 15
13 Kush 15 A-B
Result of Drama – Song will be
.
(a) Result of Drama - Song
O
C
Cust# Cust-Name Banker-Name Amount Balance
C001 Anjuman Rakesh 17510.00 17510.00
be rewritten by replacing the denoted by Ç . That means A Ç B will yield a relation having tuples
intersection operation with a common to A and B. The result produced by Customer Ç NewCust
pair of set difference opera-
AN
(customer shown in Fig. 11.11, NewCust shown in Fig. 11.18) has been
tions as :
shown in Fig. 11.20(b).
A Ç B = A - (A - B ).
H
Drama Song
D
13 Kush 15 2 Manya 15
17 Swati 14 10 Rishabh 15
13 Kush 15
AÇB
Result of Song Ç Drama will be
Fig. 11.20 (a) Set Intersection operation, (b) Result of Customer Ç NewCust.
UNIT 2-1 : Data Management
11.19
DATABASE CONCEPTS
? 1.
2.
What is the function of select and
project operations ?
What is cartesian product ? How is it
allocation and search paths often prefer network systems.
But rarely network models have gone commercial ;
mostly they remain confined to development or testing
different from join operation ? stages. People who believe that users should be isolated
.
from the mechanisms used to access data, and those who
O
3. What does union operation do ? What
are the conditions to carry out union assign high priority to a model that is easy to understand
C
operation ? and construct, frequently favour relational systems.
4. What are the set difference and set However, few database systems are fully relational.
&
intersection operations ? How are they Instead they graft relational features on a basic
different ? hierarchical/network structure.
AI
R
L e t Us R e v i s e
T
Ò A collection of data is referred to as database and a database (management) system is basically a computer based
PA
Ò A database is implemented through three general levels : internal (closest to physical storage), conceptual (describes entire
data and its relationships) and external (closest to the users).
Ò Data independence allows modification of a scheme definition without affecting other scheme definitions.
H
There are three models available for database management : relational, network & hierarchical.
Ò A relational data model organizes the data into tables known as relations.
Ò A network data model represents data by collection of records and relationships among data are represented by
links (pointers).
Ò A hierarchical data model represents data by records organized in form of trees and the relationships among data
are represented by links.
Ò A view is a virtual table derived from one or more underlying base tables.
Ò There is no stored file created for storing a view’s contents, rather, only the view definition is stored.
Ò The relational algebra is a collection of operations on relations. The various operations of relational algebra are
select, project, cartesian product, union, set difference, set intersection, join etc.
Ò The selection extracts tuples from a relation depending upon a condition (symbol s).
Ò The projection extracts columns from a relation. (Symbol p).
Ò The cartesian product yields a relation with all possible combinations of the tuples of the two relations operated
upon. (Symbol ´).
Ò The union yields a relation containing tuples from both the operand relations. (Symbol È).
Ò The set difference finds tuples that are in one relation but not in another. (Symbol -).
Ò The set intersection finds tuples that are common to the two operand relations (Symbol Ç).
11.20 Support Material
Solved Problems
1 What is a database system ? What is its need ?
Solution. A database is a collection of interrelated data and a database system is basically a computer
based record keeping system.
A typical file processing system suffers from some major limitations like data redundancy (duplication of
data), data inconsistency, unsharable data, unstandardized data, insecure data, incorrect data etc. On the other
hand, a database system overcomes all these limitations and ensures continues efficiency.
The advantages provided by a database system are :
(i) Reduced data redundancy (ii) Controlled data inconsistency (iii) Shared data
(iv) Standardized data (v) Secured data (vi) Integrated data.
Therefore, to have the systems with increased performance and efficiency, the database systems are preferred.
2 Can you think of disadvantages of using a database system ? What are these ?
Solution. With the complex tasks to be performed by database systems, certain things may crop up that
.
O
may be termed as disadvantages of using database system. These are :
1. Security may be compromised without good controls.
C
2. Integrity may be compromised without good controls.
&
3. Extra hardware may be required.
4. Performance overhead may be significant.
AI
5. System is likely to be complex.
from and storing new information into the database. There are three different types of database system
users, differentiated by the way they expect to interact with the system.
PA
1. End User. An end user is a person who is not a computer trained person but uses the database to
retrieve some information. For example, in a railway reservation system, a customer retrieving the
AN
of the database such as which storage device, which storage technique should be used ? etc.
2. Network Model. The network model represents data by collections of records and relationships
among data are represented by links which can be viewed as pointers. The records in the database are
organized as collection of arbitrary graphs.
3. Hierarchical Model. The hierarchical model is similar to the network model in the sense that data
and relationships among data are represented by records and links respectively. It differs from the
network model in that the records are organized as collections of trees rather than arbitrary graphs.
The relational model differs from the network and hierarchical models in that it does not use pointers
or links. Instead, the relational model relates records by the values they contain.
.
not all of the information in a table.
O
7 Differentiate between Candidate Key and Primary Key in context of RDBMS. (Delhi 2008)
C
Solution. Candidate key. A candidate key is the one that is capable of becoming primary key i.e., a field or
attribute that has unique value for each row in the relation.
&
Primary key is a designated attribute or a group of attributes whose values can uniquely identify the tuples
AI
in the relation.
8 Give a suitable example of a table with sample data and illustrate Primary and Candidate Keys in it.
R
(Delhi 2012)
Solution. Example : Table : Class 11
T
PA
1083 2 Irfan 75
2011 3 Maya 63
1000 4 Shaun 60
H
999 5 Sukhi 92
D
1200 6 Zoya 86
In the table, columns AdmNo and RollNo have unique values for each row, so both are candidates to
become primary keys. Hence both of these columns are Candidate Keys. Out of these two, we can assign one
as Primary key and the other one will become alternate key.
Candidate keys : AdmNo, RollNo
Primary key : RollNo
Alternate key : AdmNo
9 Differentiate between Candidate Key and Alternate Key in context of RDBMS. (Outside Delhi 2008)
Solution. Candidate key. A candidate key is the one that is capable of becoming primary key i.e., a field or
attribute that has unique value for each row in the relation.
A candidate key that is not a primary key is called an Alternate Key.
11 What is the importance of a Primary Key in a table ? Explain with a suitable example. (Outside Delhi 2007)
Solution. A Primary Key is a set of one or more attributes that can uniquely identify tuples within the
relation. For example, in the following table Student, the column Rollno can uniquely identify each row in
the table, hence Rollno is the primary key of the following table.
.
O
13 What is relation ? Define the relational data model. (Delhi 2002)
C
Solution. A relation is a table having atomic values, unique rows and unordered rows and columns.
The relational model represents data and relationships among data by a collection of tables known as
&
relations, each of which has a number of columns with unique names.
AI
14 What is an Alternate Key ? (Delhi 2006)
Solution. A candidate key that is not a primary key is called an Alternate Key. In Suppliers table if there
R
are two candidate keys - SuppId and Supp_Name and SuppId is the primary Key then Supp_Name is the
alternate key.
T
PA
15 What do you understand by the terms Primary Key and Degree of a relation in relational database ? (Delhi 05)
Solution. Primary key is an attribute or a group of attributes whose values can uniquely identify the
tuples in the relation.
AN
16 What do you understand by the terms Candidate Key and Cardinality of a relation in relational database ?
H
Solution. Candidate key. A candidate key is the one that is capable of becoming primary key i.e., a field or
attribute that has unique value for each row in the relation.
Cardinality of a relation represents number of rows in the relation.
17 Differentiate between the terms Degree and Cardinality in context of RDBMS (Delhi 2008C)
Or
What is the difference between degree and cardinality of a table ? What is the degree and cardinality of the
following table ? (Delhi 2013)
18 What do you understand by Primary Key ? Give a suitable example of Primary Key from a table containing
some meaningful data. (Outside Delhi 2010)
Solution. A primary key is a designated attribute or group of attributes whose values can uniquely
identify the tuples in the relation.
Example Table : ITEM
.
Primary Key
O
19 What do you understand by Candidate Keys in a table ? Give a suitable example of Candidate keys from a
C
table containing some meaningful data. Or (Delhi 2010)
&
What are candidate keys in a table ? Give a suitable example of candidate keys in a table. (Delhi 2009)
Solution. A candidate key is the one that is capable of becoming primary key i.e., a field or attribute that
AI
has unique value for each row in the relation.
R
Example Table : ITEM
Ino
T
Item Quantity
PA
Candidate Keys
20 What is the purpose of a key in a table ? Give an example of a key in a table. (Outside Delhi 2009)
Soluion. An attribute/group of attributes in a table that identifies each tuple uniquely is known as a Key
(also called Primary Key).
Example Table : Item
Key
11.24 Support Material
21 What do you understand by Union and Cartesian Product operations in relational algebra ? (Delhi 2011)
Or
Explain the concept of Cartesian Product between two tables, with the help of appropriate example.
(Outside Delhi 2014)
Or
Explain the concept of Union between two tables, with the help of appropriate example. (Delhi 2014)
Soluion. Cartesian Product. The Cartesian product of two relations A and B is written as A ´ B. The
cartesian product of two relations yields a relation with all possible combinations of the tuples of the two
relations operated upon.
For example, given two relations Student and Instructor as shown below :
Student Instructor
.
O
S001 Meenakshi Y I01 K. Lal English
C
S002 Radhika N I02 R.L. Arora Maths
S003 Abhinav N
&
The cartesian product of these two relations, Student ´ Instructor, will yield a relation as :
AI
Stud# Stud-Name Hosteler Inst# Inst-Name Subject
R
Union. The union operation produces a third relation that contains tuples from both the operand
D
relations which must be union-compatible. To denote the union of two relations X and Y, we write as X È Y.
Table X Table Y
No Name No Name
1 Toy 3 Pencil
2 Drum
SQL Statement
SELECT * FROM X UNION SELECT * FROM Y ;
No Name
1 Toy
2 Drum
3 Pencil
UNIT 2-1 : Data Management
11.25
DATABASE CONCEPTS
22 What do you understand by Selection and Projection operations in relational algebra ? (O.D. 2011)
Soluion. Selection means selecting some rows (tuples) from a relation according to given condition
e.g., s price > 20.2 (Item)
Project operation yields a vertical subset of a given relation(i.e., select all tuples containing only given
columns of a relation).
e.g., pNAME, DESIG (Employee)
23 Observe the following table CANDIDATE carefully and write the name of the RDBMS operation out of
(i) SELECTION (ii) PROJECTION (iii) UNION (iv) CARTESIAN PRODUCT, which has been used to
produce the output as shown in RESULT. Also, find the Degree and Cardinality of the RESULT.
.
C1 AJAY LAW C3 ROHAN
O
C2 ADITI MEDICAL
C
C3 ROHAN EDUCATION
C4 RISHAV ENGINEERING
24 Observe the following table MEMBER carefully and write the name of the RDBMS operation out of
T
(i) SELECTION (ii) PROJECTION (iii) UNION (iv) CARTESIAN PRODUCT, which has been used to
PA
produce the output as shown in RESULT. Also, find the Degree and Cardinality of the RESULT.
25 A database is to contain information about persons and skills. At a particular time the following persons are
represented in the database, and their skills are as indicated.
Person Skills
Anuj Programming
Bhawna Operating and Programming
Chetan Engineering and Programming
David Operating and Engineering
For each person, the database contains various personal details, such as address. For each skill it contains an identification
of the appropriate basic training course, an associated job grade code, and other information. The database also contains the
date each person attended each course, where applicable (the assumption is that attendance at the course is essential
before the skill can be said to be acquired ). Sketch a relational view for this data.
11.26 Support Material
Perskill
.
Bhawna Programming
O
Chetan Engineering -
C
Chetan Programming -
David Operating -
&
David Engineering -
AI
Solved Problems
R
Thinking Skills
Solution. A relation is a table having atomic values, unique rows and unordered
rows and columns.
AN
4. What are the various data models available for database systems ?
Solution. A data model is a collection of conceptual tools for describing data, data
relationships, data semantics etc. There are generally three data models available :
relational, network and hierarchical model.
1. Relational Model. The relational model represents data and relationships
among data by a collection of tables known as relations, each of which has a
number of columns with unique names.
2. Network Model. The network model represents data by collections of records
and relationships among data are represented by links which can be viewed
as pointers. The records in the database are organized as collection of arbitrary
graphs.
3. Hierarchical Model. The hierarchical model is similar to the network model
in the sense that data and relationships among data are represented by
records and links respectively. It differs from the network model in that the
.
records are organized as collections of trees rather than arbitrary graphs.
O
The relational model differs from the network and hierarchical models in that it
C
does not use pointers or links. Instead, the relational model relates records by the
values they contain.
&
AI
Glossary
R
stored file.
Candidate Keys Attribute combinations in a relation that can serve as a primary key.
AN
Assignments
Short Answer Questions
1. Draw a diagram explaining various levels of data Write relational expressions for questions 12 – 25
abstraction. using the above relations.
2. What is meant by “data independence’’ ? Make a
12. List employees’ first and last names.
list of data independence capabilities.
13. Give all information on employees in manu-
3. What are the main differences between a file-pro-
facturing systems.
cessing system and a database management system ?
14. Just give employee names in manufacturing
4. This chapter has described some of the major
systems.
advantages of a database system. What are the
disadvantages ? 15. Give employee names outside manufacturing
systems.
5. Explain the difference between physical and
.
logical data independence. 16. List employee names and their departments for
O
employee numbers between 100 and 300.
6. Illustrate the difference between the three levels
17. List employees numbers, names and educational
C
of data abstraction.
levels for three education levels 16, 18, and 20.
7. What is a relation ? What is the difference
18. List employee number and name in the
&
between a tuple and an attribute ? (Delhi 1998)
manufacturing systems for job code 54.
8. What is a relation ? Define the properties of a
AI
relation. 19. Same request as above with education level
below 15.
9. Summarise the major differences between a
R
relation and a traditional file. 20. List those employees’ last name and monthly
salary plus bonus, whose job code is at least three
10. Give a network data-structure diagram for the
times their education level and whose annual
T
enrolment (course-name, student-name, grade) 23. List employee number, name and department
number for employees who are managers.
D