B+ Tree Index Structure
B+-Tree Index File
• So far, we have seen single level indexing (primary indexing, cluster based
indexing and secondary indexing) and also multilevel indexing for accessing the
records stored in disk
• ISAM files – Ordered files with a multilevel primary/clustering index
• Index sequential files are static structure
• Disadvantage of indexed-sequential files
• Performance degrades as file grows, since many overflow blocks get created
• Periodic reorganization of entire file is required
• If the files are dynamic, we need to go for dynamic multi-level index structures
based on B+ trees.
• B+ tree indices are alternative of to index sequential files
Advantages of B+ Tree
• Advantage of B+-tree index files:
• Unlike index sequential structure, B+ tree has many fan outs i.e. number of
pointers to child nodes of a node, which minimizes the number of I/O access
• Automatically reorganizes itself with small, local, changes
• Efficient for insertion, deletion and update and search
• Keep keys in sorted order for sequential traversing
• Use a hierarchical index to minimize the number of disk reads
• Reorganization of entire file is not required to maintain performance
• (Minor) disadvantage of B+-trees:
• Extra insertion and deletion overhead, space overhead
• Advantages of B+-trees outweigh disadvantages, and they are used extensively.
Properties of B+ Tree
B+ tree is rooted tree satisfying following properties
• It is a generalization of BST
• Balanced search trees : All leaves are at the same level
• It has root, internal nodes and leaf nodes
• Root may be leaf or a node with two or more children
• Each node that is not a root or a leaf has between and n children
• A leaf node has between and n − 1 values
• Special cases: If the root is not a leaf, it has at least 2 children
If the root is a leaf (that is, there are no other nodes in the tree), it
can have between 0 and (n − 1) values
Properties of B+ Tree
• B+ tree is an extension of B tree. B+ tree has index page and data
page. In B tree, keys/ indices and records both are stored in leaf nodes
as well as internal nodes. That’s why the fan-out in B-trees is less
• However, in B+ tree, internal nodes contain the search keys/index and
leaf nodes store the data records
• All leaf nodes are linked up as a singly linked list to make the search
operation more efficient
• Structure of leaf and non-leaf nodes is different in B+ tree, while, it is
same in B tree.
Properties of B+ Tree
• B+ tree use fill factor to control growth and shrinkage. Minimum fill factor is
50%
• Support both random and sequential access of records
Root
40
20 33 51 63
10 15 20 27 33 37 40 46 51 55 63 97
Properties of B+ Tree
• Order of B+ tree is number of children of a node
• Order of internal node
The maximum number of tree pointers held in it
Maximum of (n-1) keys can be present in an internal node
• Order of leaf node
The maximum number of record pointers held in it
It is equal to the number of keys in a leaf node
Node Structure
• Typical node
• Ki are the search-key values
• Pi are pointers to children (for non-leaf nodes) or pointers to records or
buckets of records (for leaf nodes).
• The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(assume no duplicate keys)
• Node structure order =3 P1 K1 P2 K2 P3
Leaf Nodes in B+ tree
• For i = 1, 2, . . ., n–1, pointer Pi points to a file record with search-key value Ki,
• If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than or equal to
Lj’s search-key values
• Pn points to next leaf node in search-key order
Leaf Nodes in B+ tree
P1 K1 P2 K2 P3
Record of K2
Record of K1
Record of K2
Non Leaf Nodes in B+ tree
• Non leaf nodes form a multi-level sparse index on the leaf
nodes. For a non-leaf node with n pointers:
• All the search-keys in the subtree to which P1 points are less
than K1
• For 2 i n – 1, all the search-keys in the subtree to which
Pi points have values greater than or equal to Ki–1 and less
than Ki
• All the search-keys in the subtree to which Pn points have
values greater than or equal to Kn–1
Non Leaf Nodes in B+ tree
P1 K1 P2 K2 P3
S1 S2 S3
Example of B+ Tree
• Order n =6
• Leaf nodes must have between 3 and 5 values
((n-1/2 ) and n –1, with n = 6
• Non-leaf nodes other than root must have between 3 and 6 children
(n/2 and n with n =6
• Root must have at least 2 children
Observations about B+ tress
• Since the inter-node connections are done by pointers, there is no
assumption that in the B+-tree, the “logically” close blocks are
“physically” close.
• The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
• The B+-tree contains a relatively small number of levels (logarithmic
in the size of the main file), thus searches can be conducted efficiently.
• Insertions and deletions to the main file can be handled efficiently, as
the index can be restructured in logarithmic time
Searching in B+ tree
Searching in B+ tree
• Search records which has the search key value k
• Examine the root and finding the smallest search key (k1) in the node that
is greater than k
• Follow the pointer P1 for the corresponding k1 until reach to the leaf node
• If k is equal to key in node then follow the corresponding pointer to
access the record.
B+ Tree Insertion (1/4)
• Find the appropriate leaf node in which search key would appear
• If node contains key, do nothing. Insert the record with the key into file and add to
bucket a pointer to file
• Otherwise, insert the key in the node (if not full) and position it so that search key
values are sorted in the node and then add the record to file
• Example : order = 6, no. of keys = 5
10 13 14 15
10 12 13 14 15
• Insert 12
B+ Tree Insertion (2/4)
• If node is full or overflow, i.e. when number of search key values
exceeds n-1. Do the following steps
• Insert 11:
10 11 12 13 14 15
Node is overflow as the maximum number of keys= 5
B+ Tree Insertion (3/4)
If overflow happens in leaf node
• Split the node into two nodes
• 1st node contains ceil((n-1)/2) values ( n: number of pointer)
• 2nd node contains remaining values
• Copy the smallest search value of the 2nd node to parent node
13
10 11 12 13 14 15
B+ Tree Insertion (4/4)
If overflow happens in non-leaf node
• Split the node into two nodes
• 1st node contains (ceil(n/2) -1) values ( n: number of pointer/children)
• Move the smallest of remaining value , together with pointer, to parent
node
• 2nd node contains remaining values
12
10 11 13 14 15
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
• Insert
1
1
28
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1
• Insert 3,
5
29
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1
• Insert 3,
5
1 3 5
30
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1 3 5
• Insert
7
31
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1 3 5
• Insert
7
1 3 5 7
32
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5
1 3 5 7
• Insert
9
33
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5
1 3 5 7
• Insert
9
5
1 3 5 7 9
34
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5
1 3 5 7 9
• Insert
2
35
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5
1 3 5 7 9
• Insert
2
5
1 2 3 5 7 9
36
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5
1 2 3 5 7 9
• Insert
4
37
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5
1 2 3 5 7 9
• Insert
3 5
4
1 2 3 4 5 7 9
38
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
3 5
1 2 3 4 5 7 9
• Insert
6
3 5 7
1 3 5 7
2 4 6 9
40
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
3 5 7
1 2 3 4 5 6 7 9
• Insert
8
3 5 7
1 2 3 4 5 6 7 8 9
42
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
3 5 7
1 2 3 4 5 6 7 8 9
• Insert
10 7
3 5 9
7 8 9 10
1 2 3 4 5 6
44
• Example : To calculate of order p of B+ tree
• Search key field is V = 9 bytes . Block size is B = 512 bytes . Record pointer is Pr = 7 bytes . Block
pointer/tree pointer is P = 6 bytes
• An internal node can have up to p tree pointers and p-1 key fields, these must fit in a single block . Thus,
p * P + (p-1) * V ≤ B
p*6+(p-1)9 ≤ 512 29
15p ≤ 521
p ≤ 34 for intermediate nodes
• The leaf nodes have the same number of values and pointers, except they are data pointers and next
pointer.
• The order of pleaf can be calculated as follows:
Pleaf * (Pr+V) + P ≤ 512
Pleaf * (7+9) + 6 ≤ 512
Pleaf * 16 ≤ 512 - 6
Pleaf * 16 ≤ 506
Pleaf ≤ 31
• Construct a B+ tree on previous example
• Assume each node is 69% full
• p = 34, pleaf = 31
• On the average, each internal node have 0.69 * 34 ~ 23 pointers and 22 key values
• On the average leaf node has 0.69 * 31 ~ 21 data record pointers
• Root 1 node 22 key entries (22+1=23) ptrs
• Level1 23odes 23*22 (506) (506+23=529) ptrs
• Level2 23*23(529) 528*22(11638) (11638+529=12167)
• Leaf 529*23(12167) 12167*21 (255507) data record ptrs
Numerical on Secondary Index
In a file, number of records r= 30,000
Each record is fixed length with size= 100 bytes. Block size= 1024 bytes
Blocking factor (bfr) = flooring (1024/100)= 10 records/ block
Number of required blocks for r = ceiling (30000/10)= 3000
If we perform linear search on data file , it will take 3000/2= 1500 block access on average
Let we use secondary index on a non key attribute of size 9 bytes and index pointer is 6 bytes long ,
so each index record is of (9+6)= 15 bytes long
Thus blocking factor is = flooring (1024/15)= 68 entries/block
Since secondary index is dense, so number of index entries = number of records
No of required blocks in index structure = ceiling (30000/ 68) =442 blocks
If we do perform binary search then it would require ceiling (log(442))= 9 blocks access
To search a record using the index we need one additional block access in data file
Thus total block access is 10
Numerical on Multi level Index
• In multilevel index, the index file is the first level (base) of multilevel index, as an
ordered file with distinct values for each primary key.
• We can create a primary index for the first level is called the second level multilevel
index. The second level has one entry for each block, as it is a primary index (we can
use block anchors)
• The process repeats for 3rd level and so on until all entries fit in one disk block for
the top level
b1= number of blocks in first level= 442 blocks
b2= number of blocks in second level= ceiling (442/68) = 7 blocks
b3 = ceiling (7/ 68)= 1 block
So, three block access + 1 ( for record in data file)= 4 block access are needed