0% found this document useful (0 votes)
205 views8 pages

COP 5725 Database Systems Homework 2

This document contains instructions for Homework 2 for the Database Management Systems (COP 5725) course taught by Dr. Daisy Zhe Wang. It includes 5 questions covering topics such as functional dependencies, buffer management, disk manager, and B+ trees. Students are asked to provide solutions to exercises related to determining normal forms of relations based on given functional dependencies, analyzing page access patterns with LRU and MRU replacement policies, calculating time costs for sequential and random disk access, and manipulating a B+ tree through insertions and deletions.

Uploaded by

Chuang James
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
205 views8 pages

COP 5725 Database Systems Homework 2

This document contains instructions for Homework 2 for the Database Management Systems (COP 5725) course taught by Dr. Daisy Zhe Wang. It includes 5 questions covering topics such as functional dependencies, buffer management, disk manager, and B+ trees. Students are asked to provide solutions to exercises related to determining normal forms of relations based on given functional dependencies, analyzing page access patterns with LRU and MRU replacement policies, calculating time costs for sequential and random disk access, and manipulating a B+ tree through insertions and deletions.

Uploaded by

Chuang James
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Database Management Systems (COP 5725)

Homework 2
Instructor: Dr. Daisy Zhe Wang

TAs:
Yang Chen, Kun Li, Yibin Wang
yang, kli, [email protected] l.edu
October 24, 2012

Name:
UFID:
Email Address:

Pledge(Must be signed according to UF Honor Code)


On my honor, I have neither given nor received unauthorized aid in doing this assignment.

Signature

For grading use only:

Question: I II III IV V Total


Points: 26 18 18 22 16 100
Score:

i
COP5725, Fall 2012 Homework 2 Page 1 of 7

I. [26 points] Functional Dependencies.


Consider the following set S of functional dependencies:

A→B (1)
AB → C (2)
AC → B (3)
B→C (4)

(1) i. [2 points] Given (1) and (4), prove (2) using Armstrong axioms or closure
test.
Solution:

AB → BB Augmentation of (1)
AB → B BB = B
AB → C Transitivity of (4) and above

ii. [2 points] Given (1) and (4), prove (3) using Armstrong axioms or closure
test.
Solution:

AC → BC Augmentation of (1)
AC → B Decomposition of above

iii. [2 points] Give all candidate keys of relation R(ABC) that satisfies (1-4).
Explain your answer.
Solution: There is only one candidate key A. {A}+ = {ABC}
iv. [2 points] Give a minimal cover for the set S.
Solution: The minimal cover for S is {A → B, B → C}, because you can generate
the other FDs using these two.

For the following exercises, suppose you are given the functional dependencies set S ≡
{AB → C, C → B, C → D}.
(2) [6 points] Given S, is the relation R1 (A, B, C) in 3NF? If yes, justify. If no, specify
at least one FD which violates the definition.
Solution: The relation R1 is in 3NF. AB → C is OK because AB is a superkey and
C → B is OK because B is part of the superkey AB (C → D is not applicable because
D is not in R1 ).
(3) [6 points] Given S, which normal form(s) (BCNF, 3NF) does relation R2 (C, D)
obey?
Solution: Clearly R2 is in BCNF. C → D is the only applicable FD and BCNF is
satisfied because C is a candidate key for R2 . Since BCNF is a stronger normal form
than 3NF, it is also in 3NF.
COP5725, Fall 2012 Homework 2 Page 2 of 7

(4) Decompose the relation R(A, B, C, D) into a collection of BCNF relations. Of


course make sure that the decomposition is lossless. Specifically:
i. [2 points] How many different decompositions exist?
ii. [4 points] List all such decompositions.
Solution: There are only 2 both lossless and BCNF decompositions:
• R1 (C, B, D) and R2 (A, C)
• R1 (C, B), R2 (C, D) and R3 (A, C)

II. [18 points] Buffer Management.


Suppose you have a Buffer Pool that can hold 3 pages. There are 26 blocks on your disk;
for simplicity we will refer to each block with a letter of the English alphabet between A
and Z. An “access pattern” is a string of letters that log the requests to the Buffer Pool
for pages. For each access pattern below, write down next to it the number of I/Os that
would occur starting with an empty buffer pool each time, for both the LRU and MRU
replacement policies. (Spaces in the access patterns are there just for legibility.)

Access pattern LRU MRU

ZYXW ZYXW

ABCD DCBA ABCD

ACEG BDFH ACEG BDFH

Solution:

Access pattern LRU MRU

ZYXW ZYXW 8 5

ABCD DCBA ABCD 6 6

ACEG BDFH ACEG BDFH 16 13

III. [18 points] Disk Manager.


Consider a disk with a sector size of 512 bytes, 2000 tracks per surface, 50 sectors per
track, five double-sided platters, and average seek time of 10 msec. Suppose a block size
of 1024 bytes is chosen. A file containing 100k records of 100 bytes each is to be stored
on such a disk and that no record is allowed to span two blocks.
COP5725, Fall 2012 Homework 2 Page 3 of 7

(1) [3 points] If the disk platters rotate 5400 rpm, what is the maximum rotational
delay?
Solution: If the disk platters rotate at 5400rpm, the time required for one complete
rotation, which is the maximum rotational delay, is
1
× 60 = 0.011s.
5400
The average rotational delay is half of the rotation time, 0.006s.
Note: The definition of maximum rotational delay is the time it takes to do a full rotation
excluding any spin-up time. If you use half rotation time according to the slides, we will
not deduct points.
(2) [3 points] If one track of data can be transferred per revolution, what is the trans-
fer rate?
Solution: The capacity of a track is 25K bytes. Since one track of data can be transferred
per revolution, the data transfer rate is
25K
= 2, 250KB/s.
0.011

(3) [6 points] What is the time required to read the file sequentially? What if the
disk were capable of reading/writing from all heads in parallel?
Solution: A file containing 100,000 records of 100 bytes needs 40 cylinders or 400 tracks
in this disk. The transfer time of one track of data is 0.011 seconds. Then it takes
400 × 0.011=4.4 seconds to transfer 400 tracks.

This access seeks the track 40 times. The seek time is 40 × 0.01=0.4 seconds. Therefore,
total access time is 4.4+0.4=4.8 seconds.

If the disk were capable of reading/writing from all heads in parallel, the disk can read
10 tracks at a time. The transfer time is 10 times less, which is 0.44 seconds. Thus total
access time is 0.44+0.4=0.84 seconds.
(4) [6 points] What is the time required to read each block in the file in a random or-
der? (assuming that each block request incurs the average seek time and rotational
delay)
Solution: For any block of data, average access time = seek time+rotational delay+transfer
time.

seek time = 10ms


rotational delay = 6ms
1K
transfer time = = 0.44ms
2, 250K/s

The average access time for a block of data would be 16.44ms. For a file containing 100k
records of 100 bytes, the total access time would be 164.4 seconds.
COP5725, Fall 2012 Homework 2 Page 4 of 7

IV. [22 points] B+ Trees.


Consider a B+ tree containing the elements 2,4,8,9,11,12,13,14,16,17,18,20. The tree has
order d = 2. This means that internal nodes have at least 2 keys and 3 pointers, and at
most 4 keys and 5 pointers; leaf nodes have at least 2 data entries and at most 4 data
entries.
(1) [4 points] In the picture below, draw what the tree looks like if the leaves are as
densely packed as possible. Draw directly on the picture.

Solution:

11 16

2 4 8 9 11 12 13 14 16 17 18 20

(2) [6 points] Based on your original tree from 1, what is the minimum number of
inserts that could cause a root split?
Solution: 3
(3) [6 points] Draw the tree just after the root has split with a minimum of insertions
as you described in 2. Use any value you like that would cause the split. We have
started you off by drawing the root and internal nodes; you need to fill them in and
draw the leaves.
Solution:
COP5725, Fall 2012 Homework 2 Page 5 of 7

12

8 11 16 17

2 4 8 9 10 11 11 12 13 14 16 16 17 18 20

(4) [6 points] Given the B+ tree shown in Figure 1, draw the tree after deleting keys
19, 20, 24 in the given order.

19

5 14 24 33

2 3 5 7 8 14 16 19 20 22 24 27 29 33 34 38 39

Figure 1: Exercise IV.4

Solution:
COP5725, Fall 2012 Homework 2 Page 6 of 7

5 14 19 33

2 3 5 7 8 14 16 22 27 29 33 34 38 39

V. [16 points] Record Formats.


A patient record consists of the following fixed-length fields: the patient’s date of birth,
social-security number, and patient ID, each 9 bytes long. It also has the following
variable-length fields: name, address, and patient history.
Note: Due to some ambiguity of this question, we accept open answers as long as they
are reasonably correct.
(1) [4 points] If pointers within a record require 8 bytes, and the record length is
a 2-byte integer, how many bytes, exclusive of the space needed for the variable-
length fields, are needed for the record? You may assume that no alignment fields
is required.
Solution: We get 9 × 3=27 for the fixed-length fields, 8 × 3=24 for the pointers, 2 for
record length. The total is 53 bytes; or
we get 9 × 3=27 for the fixed-length fields, 8 × 4=32 for the pointers. The total is 59
bytes.

The following four parts are based on the following assumption: Suppose we add fields
for tests and their results. Each test consists of a test name, a date, and a test result.
Assume that each such test requires 100 bytes. Also, suppose that for each patient and
each test a result is stored with probability p. You may introduce other variables for
model convenience.
This question is open to interpretation. There could be a large number of different tests and
each test could be taken by a patient many times. Therefore, the number of test result fields
can be unbounded, and it is not clear how to abtain the answers based on the given data.
(2) [6 points] Suppose we use a hybrid scheme, where room for k test results are kept
within the record, and additional test results are found by following a pointer to
another block (or chain of blocks) where those results are kept. As a function of p,
what value of k minimizes the amount of storage used for test results?
Solution: If we pre-allocate space in the record for k, we will have to use 4 bytes for a
pointer plus 100k bytes, plus 4 bytes to a pointer to another block (or chain of blocks).
This does not seem to provide us much advantage in terms of saving the storage. If we
need less fields than k, we will waste storage. And if we need more fields than k, then
it would not matter (as far as storage size) whether we had k fields in the record and x
fields outside the record or we had all k + x fields outside the record.
(3) [6 points] The amount of space used by the repeating test-result fields is not the
only issue. Let us suppose that the figure of merit we wish to minimize is the
COP5725, Fall 2012 Homework 2 Page 7 of 7

number of bytes used, plus a penalty of 5000 if we have to store some results on
another block (and therefore will require a disk I/O for many of the test-result
accesses we need to do). Under this assumption, what is the best value of k as a
function of p?
Solution: In this case, unlike 2, we do get some benefit from avoiding storing fields
outside the record. However, once we have at least one field stored outside the record,
there is no furhter penalty for as many other fields to be stored outside the record. The
best value for k seems to be around the expected number of fields: k = np.

You might also like