Aplikasi Database MKG
#7
Edward Trihadi
Ins 7 - STMKG
23.11
Index
• Indexing mechanisms used to speed up access to desired data.
o + E.g., author catalog in library
• Search Key - attribute to set of attributes used to look up records in a file.
• An index file consists of records {called index entries) of the form
Search Key Pointer
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
• + Ordered indices: search keys are stored in sorted order
• + Hash indices: search keys are distributed uniformly across “buckets” using a
“hash function”.
Index Evaluation Metrics
• Access types supported efficiently. E.g.,
• + records with a specified value in the attribute
• + or records with an attribute value falling in a specified range of values.
• Access time
• Insertion time
• Deletion time
• Space overhead
Ordered Indices
• In an ordered index, index entries are stored sorted on the search key value. E.g.,
author catalog in library.
• Primary index: in a sequentially ordered file, the index whose search key specifies
the sequential order of the file.
• + Also called clustering index
• « The search key of a primary index is usually {but not necessarily) the primary key.
• Secondary index: an index whose search key specifies an order different from the
sequential order of the file. Also called non-clustering index.
• Index-sequential file: ordered sequential file with a primary index.
Dense Index Files
• Dense index — Index record appears for every search-key value in the
file
Spare Index Files
• Sparse Index: contains index records for only some search-key values.
• + Applicable when records are sequentially ordered on search-key
• To locate a record with search-key value K we:
• + Find index record with largest search-key value < K
Spare Index Files
• Compared to dense indices:
• + Less space and less maintenance
overhead for insertions and deletions.
• « Generally slower than dense index for
locating records.
• Good tradeoff: sparse index with an
index entry for every block in file,
corresponding to least search-key
value in the block.
Multilevel Index
• If primary index does not fit in memory, access becomes expensive.
• Solution: treat primary index kept on disk as a sequential file and
construct a sparse index on it.
• # outer index — a sparse index of primary index
• # inner index — the primary index file
• If even outer index is too large to fit in main memory, yet another
level of index can be created, and so on.
• Indices at all levels must be updated on insertion or deletion from the
file.
Multilevel Index
(Cont.}
Index Update:
Deletion
• If deleted record was the only record in the file with its particular search-
key value, the search-key is deleted from the index also.
• Single-level index deletion:
• + Dense indices — deletion of search-key:similar to file record deletion.
• + Sparse indices —
• » if an entry for the search key exists in the index, it is deleted by replacing the entry
in the index with the next search-key value in the file (in search-key order).
• » If the next search-key value already has an index entry, the entry is deleted instead
of being replaced.
Index Update: Insertion
• Single-level index insertion:
• + Perform a lookup using the search-key value appearing in the record to
be inserted.
• + Dense indices — if the search-key value does not appear in the index,
insert it.
• + Sparse indices — if index stores an entry for each block of the file, no
change needs to be made to the index unless a new block is created.
• » If a new block is created, the first search-key value appearing in the new block is
inserted into the index.
• Multilevel insertion (as well as deletion) algorithms are simple
extensions of the single-level algorithms
Secondary Indices
• Frequently, one wants to find all the records whose values in a certain
field {which is not the search-key of the primary index) satisfy some
condition.
• « Example 1: In the account relation stored sequentially by account
number, we may want to find all accounts in a particular branch
• + Example 2: as above, but where we want to find all accounts with a
specified balance or range of balances
• We can have a secondary index with an index record for each search-
key value
Secondary Indices Example
Secondary index on balance field of account
• Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
• Secondary indices have to be dense
Primary and Secondary Indices
• Indices offer substantial benefits when searching for records.
• BUT: Updating indices imposes overhead on database modification -- when a file
is modified, every index on the file must be updated,
• Sequential scan using primary index is efficient, but a sequential scan using a
secondary index is expensive
• + Each record access may fetch a new block from disk
• + Block fetch requires about 5 to 10 milliseconds
• » versus about 100 nanoseconds for memory access
STATIC HASHING
• A bucket is a unit of storage containing one or more records 9a bucket is
typically a disk block).
• In a hash file organization, we obtain the bucket of a record directly from its
search-key value using a hash function.
• Hash function h is a function from the set of all search-key values to the set of all
bucket addresses B.
• Hash function is used to locate records for access, insertion as well as deletion.
• Records with different search-key values may be mapped to the same bucket;
thus, entire bucket has to be searched sequentially to locate a record.
Hash Functions
• Worst hash function maps all search-key values to the same bucket; this makes
access time proportional to the number of search-key values in the file.
• An ideal hash function is uniform, i.e., each bucket is assigned the same number
of search-key values from the set of all possible values.
• Ideal hash function is random, so each bucket will have the same number of
records assigned to it irrespective of the actual distribution of search-key values
in the file.
• Typical hash functions perform computation on the internal binary
representation of the search-key.
• + For example, for a string search-key, the binary representations of all the characters in the
string could be added and the sum modulo the number of buckets could be returned.
Hash Indices
• Hashing can be used not only for file organization, but also for index-
structure creation.
• A hash index organizes the search keys, with their associated record
pointers, into a hash file structure.
• Strictly speaking, hash indices are always secondary indices
• « if the file itself is organized using hashing, a separate primary hash index
on it using the same search-key is unnecessary.
• + However, we use the term hash index to refer to both secondary index
structures and hash organized files.
Deficiencies of Static Hashing
• In static hashing, function # maps search-key values to a fixed set of B of bucket
addresses. Databases grow or shrink with time.
o If initial number of buckets is too small, and file grows, performance will
degrade due to too much overflows.
o If space is allocated for anticipated growth, a significant amount ofvspace will
be wasted initially {and buckets will be underfull).
o If database shrinks, again space will be wasted.
• One solution: periodic re-organization of the file with a new hash function
o Expensive, disrupts normal operations
• Better solution: allow the number of buckets to be modified dynamically.
Dynamic Hashing
• Good for database that grows and shrinks in size
• Allows the hash function to be modified dynamically
• Extendable hashing — one form of dynamic hashing
o + Hash function generates values over a large range — typically b-bit integers, with
b= 32.
o At any time use only a prefix of the hash function to index into a table of bucket
addresses.
o Let the length of the prefix be ibits, 0 <7<32.
▪ Bucket address table size = 2i Initially i = 0
▪ Value of i grows and shrinks as the size of the database grows and shrinks.
o Multiple entries in the bucket address table may point to a bucket (why?)
o Thus, actual number of buckets is < 2i
▪ The number of buckets also changes dynamically due to coalescing and splitting of buckets.
Comparison of Ordered Indexing and Hashing
• Cost of periodic re-organization
• Relative frequency of insertions and deletions
• Is it desirable to optimize average access time at the expense of worst-case access time?
• Expected type of queries:
o + Hashing is generally better at retrieving records having a specified value of the key.
o + if range queries are common, ordered indices are to be preferred
• In practice:
o « PostgreSQL supports hash indices, but discourages use due to poor performance
o + Oracle supports static hash organization, but not hash indices
o SQLServer supports only B+-trees
TUGAS
• Database penggantian Suku Cadang
• Pemeliharaan Peralatan MKG
• Data Klimatologi
• Database Peralatan Operasional
• Data Pegawai • install DB Browser.
• Data Klimat https://sqlitebrowser.org/dl/
• Barang Habis Pakai
• Mysql
•
[email protected]