Java Collections Framework Deep Dive
The Java Collections Framework (JCF), introduced in JDK 1.2, is a unified architecture for
representing and manipulating collections, which are objects that group multiple elements into a
single unit. It provides a standardized way to work with data structures like lists, sets, and maps,
enabling them to be manipulated independently of the details of their representation. Before the
JCF, Java relied on basic structures like arrays, Vector, and Hashtable, which lacked a common
interface, leading to inconsistent and cumbersome code. The framework was designed to
address these shortcomings and offers several profound advantages:
● Reduces Programming Effort: The JCF provides a comprehensive library of
ready-to-use data structures and algorithms. This saves developers from the complex and
error-prone task of implementing these structures from scratch, allowing them to focus on
application logic rather than low-level data management.
● Increases Performance: The framework offers highly optimized, production-quality
implementations of fundamental data structures. These implementations are fine-tuned
for speed and reliability, leading to better overall application performance.
● Promotes Software Reuse and Interoperability: By defining a standard set of interfaces
(e.g., Collection, List, Map), the JCF enables different parts of a system and even
unrelated APIs to exchange collections seamlessly. This common language fosters
interoperability and makes it easier to build modular, reusable components.
● Consistent API: The framework provides a consistent API with a common set of methods
across different implementations. For example, methods like add(), remove(), and size()
are present in ArrayList, LinkedList, and HashSet, which reduces the learning curve and
makes the code more predictable and maintainable.
The architecture of the JCF is built upon a hierarchy of interfaces. This design separates the
abstract definition of a collection from its concrete implementation. At the highest level, the
framework is divided into two main hierarchies.
● The Iterable Interface: This is the root interface for the entire collection hierarchy. Its
primary purpose is to provide an iterator() method, which returns an Iterator object. By extending
Iterable, the Collection interface guarantees that every collection in Java can be
traversed, or iterated over, most commonly using the for-each loop construct.
● The Collection Interface: As the foundation for most of the framework, the Collection
interface declares the core methods that all of its sub-types will have. It defines
fundamental operations such as checking the size (size(), isEmpty()), querying contents
(contains()), modifying the collection (add(), remove()), and providing an iterator
(iterator()). It serves as the direct parent for the List, Set, and Queue interfaces.
● The Map Interface: A Map is an object that maps keys to values, forming key-value pairs.
A crucial architectural point, often raised in technical interviews, is that the Map interface
does not extend the Collection interface. This is a deliberate design choice. The
Collection interface's contract is built around a single element (e.g., add(E element)),
whereas a Map's fundamental operations are based on a key-value pair (e.g., put(K key,
V value)). The contracts are semantically different, so Map forms its own distinct
hierarchy.
The separation of interface from implementation is the most powerful design principle of the
JCF. It allows developers to "program to the interface," a cornerstone of robust object-oriented
design. When code is written to use an interface type, such as List<String> list = new
ArrayList<>();, the specific implementation (ArrayList) can be changed with minimal impact on
the rest of the application. If performance profiling later reveals that frequent insertions at the
beginning of the list are creating a bottleneck, the implementation can be swapped to a
LinkedList simply by changing the constructor call: List<String> list = new LinkedList<>();. The
rest of the code, which interacts with the list variable through the methods defined in the List
interface, remains unchanged. This decoupling makes the code more flexible, adaptable, and
easier to maintain over its lifecycle.
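A minimal sketch of this principle, assuming illustrative class and method names not taken from the text: the method below depends only on the List interface, so either implementation can be passed in without changing it.

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    public class ProgramToInterfaceDemo {

        // Written against the List interface, not a concrete class.
        static void printAll(List<String> items) {
            for (String item : items) {
                System.out.println(item);
            }
        }

        public static void main(String[] args) {
            // The implementation can be swapped without touching printAll().
            List<String> arrayBacked = new ArrayList<>(List.of("a", "b", "c"));
            List<String> nodeBacked = new LinkedList<>(arrayBacked);
            printAll(arrayBacked);
            printAll(nodeBacked);
        }
    }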
● Iterator: The Iterator interface provides a universal mechanism for traversing the
elements of any collection, one at a time. It abstracts away the underlying structure of the
collection, whether it's an array-backed list, a linked list, or a hash-based set. It defines
three core methods:
○ boolean hasNext(): Returns true if the iteration has more elements.
○ E next(): Returns the next element in the iteration and advances the cursor.
○ void remove(): An optional operation that removes the last element returned by
next() from the underlying collection.
● ListIterator: As a specialized sub-interface of Iterator, ListIterator is designed for use with
List implementations. It enhances the basic iterator with powerful features for list
manipulation:
○ Bidirectional Traversal: It adds hasPrevious() and previous() methods, allowing
traversal in both forward and backward directions.
○ Element Modification: It includes an optional set(E e) method to replace the last
element returned by next() or previous().
○ Element Insertion: It provides an optional add(E e) method to insert an element
into the list at the current iterator position.
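The following sketch (class name illustrative) exercises both interfaces on a small list: Iterator for forward traversal with safe removal, and ListIterator for backward traversal with in-place replacement.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.ListIterator;

    public class IteratorDemo {
        public static void main(String[] args) {
            List<Integer> numbers = new ArrayList<>(List.of(1, 2, 3, 4, 5));

            // Iterator: forward traversal, removing even values via the iterator itself.
            Iterator<Integer> it = numbers.iterator();
            while (it.hasNext()) {
                if (it.next() % 2 == 0) {
                    it.remove(); // removes the last element returned by next()
                }
            }
            System.out.println(numbers); // [1, 3, 5]

            // ListIterator: backward traversal and replacement of each element.
            ListIterator<Integer> lit = numbers.listIterator(numbers.size());
            while (lit.hasPrevious()) {
                lit.set(lit.previous() * 10); // replace the element just returned
            }
            System.out.println(numbers); // [10, 30, 50]
        }
    }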
A Critical Interview Topic: Fail-Fast vs. Fail-Safe Iterators
When a collection is modified while it is being iterated over, the behavior of the iterator can be
unpredictable. The JCF defines two primary modes of behavior to handle such concurrent
modification scenarios.
● Fail-Fast Iterators: The majority of general-purpose collection implementations in
java.util (such as ArrayList, HashSet, and HashMap) provide fail-fast iterators. These
iterators are designed to throw a ConcurrentModificationException if the collection is
structurally modified (i.e., elements are added or removed) at any time after the iterator is
created, by any means other than the iterator's own remove() method.
○ Internal Mechanism: This behavior is implemented using an internal counter within
the collection, typically named modCount. This counter is incremented every time a
structural modification occurs. When an iterator is created, it records the current
value of modCount. With each call to next(), the iterator checks if the collection's
modCount has changed. If the values do not match, it throws the exception. This is
a "best-effort" mechanism intended to detect bugs, and it is not guaranteed to work
under all concurrent scenarios.
● Fail-Safe Iterators: Collections in the java.util.concurrent package, such as
CopyOnWriteArrayList and ConcurrentHashMap, provide fail-safe iterators. These
iterators do not throw a ConcurrentModificationException because they operate on a
stable snapshot or a clone of the collection, not the original data structure itself.
○ Trade-offs: This robustness comes at a cost. Creating a copy of the collection
consumes additional memory and CPU time. Furthermore, the iterator reflects the
state of the collection at the time of its creation and will not show any modifications
made to the original collection after that point.
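A small sketch contrasting the two behaviors (class name illustrative): the ArrayList iterator detects the outside modification, while the CopyOnWriteArrayList iterator walks a snapshot and never sees the added elements.

    import java.util.ArrayList;
    import java.util.ConcurrentModificationException;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class IteratorBehaviorDemo {
        public static void main(String[] args) {
            // Fail-fast: structural modification during iteration triggers the exception.
            List<String> failFast = new ArrayList<>(List.of("a", "b", "c"));
            try {
                for (String s : failFast) {
                    failFast.remove(s); // modification outside the iterator
                }
            } catch (ConcurrentModificationException e) {
                System.out.println("Fail-fast iterator detected the modification");
            }

            // Fail-safe: the iterator traverses a snapshot, so concurrent additions
            // neither fail nor appear in this traversal.
            List<String> failSafe = new CopyOnWriteArrayList<>(List.of("a", "b", "c"));
            for (String s : failSafe) {
                failSafe.add("d"); // no exception; "d" is not seen by this iterator
            }
            System.out.println(failSafe.size()); // 6 (three "d"s were appended)
        }
    }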
A common source of confusion for newcomers is the distinction between the Collection interface
and the Collections class. The Collections class is a final utility class in the java.util package that
consists exclusively of static methods that operate on or return collections. It provides a
powerful set of algorithms and functionalities that are decoupled from the specific collection
implementations.
Key methods include:
● Sorting and Ordering: Methods like sort(), reverse(), shuffle(), and rotate() provide
algorithms for reordering the elements within a List.
● Searching: The binarySearch() method provides an efficient O(log n) algorithm to search
for an element in a sorted List.
● Data Analysis: Methods such as min(), max(), and frequency() allow for quick
calculations on the elements within a collection.
● Synchronization Wrappers: Methods like synchronizedList(), synchronizedSet(), and
synchronizedMap() return a thread-safe "wrapper" collection. Each method call on the
wrapper is delegated to the corresponding method on the backing collection, but within a
synchronized block. While this provides thread safety, it can create a performance
bottleneck in highly concurrent applications because it locks the entire collection for every
operation.
● Unmodifiable Wrappers: Methods like unmodifiableList() return a read-only view of a
collection. Attempting to modify this view will result in an
UnsupportedOperationException. This is useful for providing safe, read-only access to
internal data structures.
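A brief sketch of some of these static helpers in action (class name illustrative):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class CollectionsUtilityDemo {
        public static void main(String[] args) {
            List<Integer> numbers = new ArrayList<>(List.of(42, 7, 19, 7, 3));

            Collections.sort(numbers);                        // [3, 7, 7, 19, 42]
            int pos = Collections.binarySearch(numbers, 19);  // index of 19 in the sorted list
            int max = Collections.max(numbers);               // 42
            int sevens = Collections.frequency(numbers, 7);   // 2
            System.out.println(pos + " " + max + " " + sevens);

            // Thread-safe wrapper: every call is synchronized on the wrapper.
            List<Integer> syncView = Collections.synchronizedList(numbers);

            // Read-only view: mutating calls throw UnsupportedOperationException.
            List<Integer> readOnly = Collections.unmodifiableList(numbers);
            try {
                readOnly.add(100);
            } catch (UnsupportedOperationException e) {
                System.out.println("Read-only view rejected the modification");
            }
        }
    }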
● get(int index): O(1). Accessing an element by its index is a direct memory access
operation on the backing array, making it extremely fast.
● add(E element) (at the end): O(1) amortized. In most cases, this is a simple O(1)
operation of placing the element at the next available index. When a resize is triggered, it
becomes an O(n) operation.
● add(int index, E element): O(n). Inserting an element in the middle of the list requires
shifting all subsequent elements one position to the right to make space. This is typically
done using the highly optimized native method System.arraycopy().
● remove(int index): O(n). Similar to insertion, removing an element from the middle
requires shifting all subsequent elements one position to the left to fill the gap.
● Random Access: ArrayList implements the RandomAccess marker interface. This
interface has no methods but serves as a signal to generic algorithms that the collection
supports fast (generally constant time) indexed element access. Algorithms can check for
this interface and switch to index-based loops instead of iterator-based loops for better
performance.
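A sketch of how generic code might consult the RandomAccess marker to pick a traversal strategy; the class and method names here are illustrative, not from the original text.

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.RandomAccess;

    public class RandomAccessAware {
        // Sums a list of integers, choosing the loop style based on the marker interface.
        static long sum(List<Integer> list) {
            long total = 0;
            if (list instanceof RandomAccess) {
                // Indexed loop is cheap when get(i) is O(1), as in ArrayList.
                for (int i = 0; i < list.size(); i++) {
                    total += list.get(i);
                }
            } else {
                // Iterator-based loop avoids repeated O(n) get(i) calls on LinkedList.
                for (int value : list) {
                    total += value;
                }
            }
            return total;
        }

        public static void main(String[] args) {
            System.out.println(sum(new ArrayList<>(List.of(1, 2, 3))));
            System.out.println(sum(new LinkedList<>(List.of(1, 2, 3))));
        }
    }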
1. How does an ArrayList grow? Explain the resizing process and its performance
implications. An ArrayList grows when an element is added and its internal array is already full.
A new array is created with a capacity that is 50% larger than the old one. All elements
from the old array are copied to the new one. This is an O(n) operation, but since it
happens progressively less often as the list grows, the amortized time complexity for
adding an element is O(1).
2. What is the difference between size() and capacity in an ArrayList? The size()
method returns the number of elements currently stored in the ArrayList. The capacity is the
length of the underlying array, i.e. the total number of elements it can hold before it needs to
be resized. (Unlike Vector, ArrayList does not expose a public capacity() method.) The capacity
is always greater than or equal to the size.
3. When would you use ensureCapacity() or trimToSize()? ensureCapacity(int
minCapacity) is a performance optimization used to pre-allocate memory for the backing
array. If you know you will be adding a large number of elements, calling this method once
can prevent multiple costly resizing operations. trimToSize() is a memory optimization that
shrinks the capacity of the backing array to match the current size of the list, releasing
any unused memory. This is useful when the list has grown large but will not have more
elements added to it.
4. Why is adding or removing from the middle of an ArrayList slow? These operations
are slow because ArrayList is backed by a contiguous array. To insert an element at a
specific index, all elements from that index to the end of the list must be shifted one
position to the right. To remove an element, all subsequent elements must be shifted one
position to the left. Both are O(n) operations involving System.arraycopy().
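As a follow-up to questions 2 and 3 above, a minimal sketch of the capacity-related methods; note that ensureCapacity() and trimToSize() are declared on ArrayList itself, not on the List interface (class name illustrative).

    import java.util.ArrayList;

    public class CapacityDemo {
        public static void main(String[] args) {
            // Both methods live on ArrayList, so the variable is typed as ArrayList.
            ArrayList<Integer> list = new ArrayList<>();

            // Pre-allocate room for 1,000,000 elements to avoid repeated resizing.
            list.ensureCapacity(1_000_000);
            for (int i = 0; i < 1_000_000; i++) {
                list.add(i);
            }

            // Drop elements we no longer need, then release the unused slack.
            list.subList(1_000, list.size()).clear();
            list.trimToSize(); // capacity now matches size() == 1000
            System.out.println(list.size());
        }
    }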
● get(int index): O(n). To access an element by index, LinkedList must traverse the list from
either the beginning or the end, whichever is closer to the target index. There is no direct
index-based access.
● add(E element) (at end): O(1). This operation simply involves updating the last pointer
and the pointers of the new and former last nodes.
● add(int index, E element): O(n). This is dominated by the time it takes to traverse to the
specified index.
● remove(int index): O(n). This is also dominated by the traversal time to the index.
● Sequential Access: LinkedList does not implement the RandomAccess marker interface,
correctly signaling that its indexed access is slow and should be avoided in
performance-critical loops.
Key Interview Questions for LinkedList
1. Explain the internal data structure of a LinkedList. It is a doubly-linked list composed
of Node objects. Each Node contains the element data, a pointer to the next Node, and a
pointer to the previous Node. The LinkedList class itself maintains pointers to the first and
last nodes in the list.
2. Why is get(index) an O(n) operation in a LinkedList? Because a LinkedList does not
store elements in a contiguous memory block accessible by an index. To find the element
at a given index, it must start from the head (or tail) and follow the next (or prev) pointers
sequentially until it reaches the desired position.
3. When is LinkedList a better choice than ArrayList? LinkedList excels when the
primary operations involve frequent additions or removals from the beginning or end of
the list. Its O(1) performance for these operations makes it a superior choice for
implementing queues and deques.
4. How can a LinkedList be used as a Stack or a Queue? LinkedList implements the
Deque (double-ended queue) interface, which extends the Queue interface and provides
stack-like methods. It can be used as a stack with push() and pop(), and as a queue with
offer() and poll().
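A short sketch of LinkedList in both roles, using the Deque methods mentioned in question 4 (class name illustrative):

    import java.util.Deque;
    import java.util.LinkedList;
    import java.util.Queue;

    public class LinkedListAsStackAndQueue {
        public static void main(String[] args) {
            // LIFO: push/pop operate on the head of the list.
            Deque<String> stack = new LinkedList<>();
            stack.push("first");
            stack.push("second");
            System.out.println(stack.pop()); // second

            // FIFO: offer appends to the tail, poll removes from the head.
            Queue<String> queue = new LinkedList<>();
            queue.offer("first");
            queue.offer("second");
            System.out.println(queue.poll()); // first
        }
    }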
● Vector: The Vector class is one of the original collection types in Java, predating the
Collections Framework. Internally, it is nearly identical to ArrayList, using a dynamic,
growable array to store elements.
○ Key Difference - Synchronization: The defining characteristic of Vector is that all
of its public methods are declared as synchronized. This makes it inherently
thread-safe, as only one thread can execute a method on a Vector instance at a
time. However, this method-level locking is very coarse-grained and leads to
significant performance degradation in multi-threaded applications due to high lock
contention.
○ Growth Strategy: The default growth strategy for a Vector is to double its capacity
when it runs out of space, which can be inefficient in terms of memory usage
compared to ArrayList's 50% growth.
● Stack: The Stack class extends Vector, providing classic LIFO (Last-In-First-Out) stack
operations like push(), pop(), and peek(). By extending Vector, it inherits all of its
properties, including the method-level synchronization and its associated performance
overhead.
Both Vector and Stack are considered legacy or obsolete classes and their use is discouraged
in new code.
● The primary reason is their inefficient synchronization model. Locking the entire collection
for every single operation is a major performance bottleneck. Modern concurrent
applications achieve better performance and scalability using the classes in the
java.util.concurrent package, such as CopyOnWriteArrayList, or by using an
unsynchronized ArrayList and managing synchronization externally with more granular
locking mechanisms.
● For a stack implementation, ArrayDeque is the recommended modern alternative. It
provides the same stack functionality (push, pop) but is not synchronized and is therefore
significantly faster for single-threaded use cases.
While the theoretical complexity for middle insertions and deletions appears better for LinkedList
(as it avoids an array copy), in many real-world scenarios, ArrayList can outperform LinkedList
even for these operations. This counter-intuitive result stems from how modern computer
memory and CPUs work.
ArrayList stores its elements in a single, contiguous block of memory. When a program
accesses an element, the CPU doesn't just fetch that single element; it pulls in a larger, adjacent
block of memory (known as a "cache line") into its high-speed cache (L1, L2, L3). Because all
the elements of the ArrayList are next to each other, iterating through it is extremely
cache-friendly. The CPU can often predict which data will be needed next and pre-fetch it,
resulting in very fast access times (cache hits).
Conversely, the nodes of a LinkedList are individual objects that can be scattered randomly
throughout the heap memory. Traversing a LinkedList involves "pointer chasing," where the
CPU must follow the next reference from one node to another. Each of these nodes could be in
a completely different region of memory. This leads to poor cache locality and frequent "cache
misses," where the required data is not in the cache and must be fetched from the much slower
main memory.
Therefore, the O(n) traversal required to find an insertion point in a LinkedList can be
significantly slower in practice than the highly optimized, native System.arraycopy() operation
used by ArrayList for its O(n) shifts. This makes ArrayList the default choice for a List in most
scenarios unless the application's behavior is dominated by additions and removals specifically
at the head of the list.
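A crude sketch of this effect (class and method names illustrative). This is a rough timing comparison only; a credible measurement would use a benchmarking harness such as JMH to control for JIT warm-up, garbage collection, and other effects.

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    public class MiddleInsertTiming {
        // Rough illustration: inserts elements repeatedly at the current middle index.
        static long timeMiddleInserts(List<Integer> list, int count) {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                list.add(list.size() / 2, i);
            }
            return System.nanoTime() - start;
        }

        public static void main(String[] args) {
            int n = 20_000;
            System.out.println("ArrayList:  " + timeMiddleInserts(new ArrayList<>(), n) + " ns");
            System.out.println("LinkedList: " + timeMiddleInserts(new LinkedList<>(), n) + " ns");
        }
    }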
Operation | ArrayList Big-O | LinkedList Big-O | Memory Usage | Cache Locality | Best Use Case
get(index) | O(1) | O(n) | Lower (no pointer overhead) | Excellent | Read-heavy operations; when list size is stable.
add(end) | O(1) amortized | O(1) | Lower | Excellent | Read-heavy operations.
add(middle) | O(n) | O(n) | Higher (each node has 2 pointers) | Poor | Write-heavy operations at the ends of the list (queue/deque).
remove(end) | O(1) | O(1) | Lower | Excellent | Read-heavy operations.
remove(middle) | O(n) | O(n) | Higher | Poor | Write-heavy operations at the ends of the list.
Iteration | O(n) (faster) | O(n) | Lower | Excellent | General-purpose iteration.
Part III: The Set Interface - Unique Element Collections
The Set interface models the mathematical set abstraction. Its most defining characteristic is
that it cannot contain duplicate elements. It adds a stronger contract on the equals and
hashCode operations to ensure that instances can be meaningfully compared even if their
implementation types differ.
The internal mechanism of a HashSet is a classic interview topic that reveals a deep
understanding of the framework's design. A HashSet is essentially a wrapper around a
HashMap instance.
● Internally, HashSet maintains a private field: private transient HashMap<E,Object> map;.
● When an element is added to the HashSet using hashSet.add(element), the set internally
calls map.put(element, PRESENT). The element being added becomes the key in the
backing HashMap.
● The value associated with this key is a dummy singleton object, private static final Object
PRESENT = new Object();. This same object is used for all entries, minimizing memory
overhead.
● The uniqueness of elements in the HashSet is a direct consequence of the fact that a
HashMap does not allow duplicate keys. The add method of HashSet returns true if the
put method of the internal map returns null (indicating the key was new), and false
otherwise.
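This delegation can be observed from the return value of add(), which mirrors whether the backing map already contained the key (class name illustrative):

    import java.util.HashSet;
    import java.util.Set;

    public class HashSetAddDemo {
        public static void main(String[] args) {
            Set<String> names = new HashSet<>();
            System.out.println(names.add("Ada")); // true  -> internal map.put returned null (new key)
            System.out.println(names.add("Ada")); // false -> key already present, set unchanged
            System.out.println(names.size());     // 1
        }
    }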
1. How does a HashSet ensure that it only contains unique elements? It uses a backing
HashMap for storage. The elements added to the Set are used as keys in the Map. Since
a Map cannot have duplicate keys, the Set cannot contain duplicate elements.
2. What happens when you add an element to a HashSet? The HashSet delegates the
operation to its internal HashMap's put method. The element's hashCode() is calculated
to find a bucket. If the bucket is not empty, the equals() method is used to check for an
exact match against existing elements in that bucket. If no match is found, the new
element is added as a key with a dummy value.
3. What is the time complexity of add(), remove(), and contains() for a HashSet? The
average case time complexity is constant, O(1), because these operations are directly
mapped to the corresponding HashMap operations. The worst-case complexity is O(n) if a
poor hash function causes all elements to collide into a single bucket.
4. Why is it important to override hashCode() and equals() for objects stored in a
HashSet? It is essential for the HashSet to correctly identify logically equivalent objects
as duplicates. Without a proper implementation, the default Object class methods will be
used, which compare memory addresses, and the set will fail to recognize two distinct but
logically identical objects as the same.
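A sketch of the failure mode described in question 4, using an illustrative class that deliberately omits the overrides: two logically equal objects both end up in the set.

    import java.util.HashSet;
    import java.util.Set;

    public class MissingOverridesDemo {
        // Deliberately does NOT override equals()/hashCode(): identity semantics apply.
        static class Point {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        public static void main(String[] args) {
            Set<Point> points = new HashSet<>();
            points.add(new Point(1, 2));
            points.add(new Point(1, 2)); // logically equal, but treated as distinct
            System.out.println(points.size()); // 2 -- the "duplicate" was not detected
        }
    }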
● LinkedHashSet extends HashSet and inherits its hashing mechanism and uniqueness
guarantee.
● Its unique feature comes from its internal use of a LinkedHashMap instead of a regular
HashMap. A LinkedHashMap, in addition to being a hash table, maintains a doubly-linked
list running through all of its entries.
● This linked list connects the elements in the order they were first inserted into the set.
Consequently, when an iterator is obtained from a LinkedHashSet, it traverses the
elements in their original insertion order.
● This ordering comes with a minor performance and memory overhead compared to
HashSet because of the additional pointers that must be maintained for the linked list.
1. What is the main difference between HashSet and LinkedHashSet? LinkedHashSet
maintains the insertion order of elements, providing predictable iteration. HashSet does
not guarantee any order.
2. How does LinkedHashSet maintain insertion order? It is backed by a LinkedHashMap.
This data structure combines a hash table for fast lookups with an embedded
doubly-linked list that connects the entries in the order they are added.
3. If you re-insert an existing element into a LinkedHashSet, does its position in the
iteration order change? No. The documentation specifies that insertion order is not
affected if an element is re-inserted into the set. The linked list position is established only
upon the first insertion.
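A quick sketch of questions 1 and 3 above (class name illustrative):

    import java.util.LinkedHashSet;
    import java.util.Set;

    public class LinkedHashSetOrderDemo {
        public static void main(String[] args) {
            Set<String> set = new LinkedHashSet<>();
            set.add("banana");
            set.add("apple");
            set.add("cherry");

            // Re-inserting an existing element does not move it.
            set.add("banana");

            System.out.println(set); // [banana, apple, cherry] -- insertion order preserved
        }
    }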
● Like its hashed counterparts, TreeSet is also a wrapper around a Map implementation. It
is backed by a NavigableMap, and the standard JDK provides TreeMap for this purpose.
● A TreeMap does not use hashing. Instead, it stores its key-value pairs in a self-balancing
binary search tree, specifically a Red-Black Tree.
● A Red-Black Tree is a sophisticated data structure that maintains its balance through a
set of rules related to node coloring (red or black). This balancing ensures that the tree's
height remains logarithmic in relation to the number of elements, which guarantees that
add, remove, and contains operations have a time complexity of O(log n). The elements
in a TreeSet are the keys of the backing TreeMap.
To maintain a sorted order, TreeSet must be able to compare its elements. This can be achieved
in one of two ways:
1. Comparable: The class of the objects being stored in the TreeSet implements the
Comparable interface. This interface has a single method, compareTo(), which defines the
object's "natural ordering".
2. Comparator: If the objects do not have a natural ordering, or if a different ordering is
required, a Comparator object can be passed to the TreeSet's constructor. The
Comparator's compare() method will then be used to order the elements.
A TreeSet does not permit null elements. Attempting to add null will result in a
NullPointerException because null cannot be compared to any other object to determine its
position in the sorted tree.
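A sketch of both ordering options, assuming illustrative class and variable names: String's natural ordering on one set, and an external Comparator (by length, with an alphabetical tie-breaker so "equal" lengths are not discarded) on the other.

    import java.util.Comparator;
    import java.util.Set;
    import java.util.TreeSet;

    public class TreeSetOrderingDemo {
        public static void main(String[] args) {
            // Natural ordering: String implements Comparable, so no Comparator is needed.
            Set<String> natural = new TreeSet<>();
            natural.add("pear");
            natural.add("apple");
            natural.add("fig");
            System.out.println(natural); // [apple, fig, pear]

            // Custom ordering: a Comparator supplied to the constructor.
            Set<String> byLength = new TreeSet<>(
                    Comparator.<String>comparingInt(String::length)
                              .thenComparing(Comparator.naturalOrder()));
            byLength.add("pear");
            byLength.add("apple");
            byLength.add("fig");
            System.out.println(byLength); // [fig, pear, apple]
        }
    }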
1. How does a TreeSet maintain a sorted order? It uses a backing TreeMap, which is
implemented using a Red-Black Tree. A Red-Black Tree is a self-balancing binary search
tree that arranges elements based on their comparison, ensuring the collection remains
sorted at all times.
2. What is the difference between Comparable and Comparator? When would you use
one over the other? Comparable is implemented by the class of the objects you want to
sort, defining their single, natural order. Comparator is implemented in a separate class
and defines an external, custom ordering. Use Comparable for the most common, default
ordering of an object. Use Comparator when you need multiple different sorting strategies,
or when you cannot modify the source code of the class you want to sort.
3. Why can't you add a null element to a TreeSet? Because TreeSet needs to compare
elements to place them correctly within its internal tree structure. The null value cannot be
compared to non-null elements, which would break the sorting mechanism and result in a
NullPointerException.
1. What is the difference between a Queue and a PriorityQueue? A standard Queue (like
one implemented with LinkedList) follows a strict FIFO order. A PriorityQueue orders its
elements based on a priority determined by their natural ordering or a custom
Comparator. The element at the head is always the one with the highest priority, not
necessarily the one that was inserted first.
2. How is a PriorityQueue implemented in Java? It is implemented as a binary
min-heap stored in a resizable array; the heap property keeps the smallest element at the
root, providing efficient access to the highest-priority item.
3. How can you implement a max-heap using PriorityQueue? By default, PriorityQueue
is a min-heap. To create a max-heap, you must provide a Comparator that reverses the
natural ordering. This is easily done using new
PriorityQueue<>(Collections.reverseOrder());.
4. What is the time complexity for adding and removing elements from a
PriorityQueue? The time complexity for both add (offer) and remove (poll) operations is
O(log n), where n is the number of elements in the queue. This is due to the heap
re-balancing (sift-up/sift-down) operations.
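A short sketch of questions 1 and 3 above, showing the default min-heap ordering and the reversed Comparator used to obtain a max-heap (class name illustrative):

    import java.util.Collections;
    import java.util.PriorityQueue;

    public class PriorityQueueDemo {
        public static void main(String[] args) {
            // Default: min-heap, poll() returns the smallest element first.
            PriorityQueue<Integer> minHeap = new PriorityQueue<>();
            minHeap.offer(5);
            minHeap.offer(1);
            minHeap.offer(3);
            System.out.println(minHeap.poll()); // 1

            // Max-heap: reverse the natural ordering via a Comparator.
            PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
            maxHeap.offer(5);
            maxHeap.offer(1);
            maxHeap.offer(3);
            System.out.println(maxHeap.poll()); // 5
        }
    }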
● vs. Stack: ArrayDeque is strongly preferred over the legacy Stack class. Stack extends
Vector and inherits its synchronized methods, which introduces unnecessary performance
overhead in single-threaded contexts. ArrayDeque is not synchronized and is therefore
much faster.
● vs. LinkedList: When used as a queue or stack, ArrayDeque is generally more
performant than LinkedList. This is due to two main factors:
1. Better Cache Locality: The array backing ArrayDeque stores elements in a
contiguous memory block, which is more cache-friendly than the scattered nodes of
a LinkedList.
2. No Allocation Overhead: LinkedList must allocate a new Node object for every
element added, which incurs a small but cumulative performance cost. ArrayDeque
simply places the element in its existing array.
1. What is a Deque? A Deque, or double-ended queue, is a data structure that allows
efficient insertion and removal of elements from both the front (head) and the back (tail).
2. How does ArrayDeque provide efficient additions and removals at both ends? It
uses a resizable circular array along with head and tail pointers. This structure allows it to
add or remove elements by simply adjusting the pointers and wrapping around the array's
boundaries, avoiding the need for shifting elements.
3. Why is ArrayDeque a better choice for a stack implementation than the Stack
class? ArrayDeque is significantly faster because it is not synchronized. The Stack class
extends the synchronized Vector class, which imposes a performance penalty on every
operation.
4. Why is ArrayDeque often faster than LinkedList when used as a queue? ArrayDeque
benefits from better memory locality due to its contiguous array backing, which is more
efficient for modern CPU caches. It also avoids the object allocation overhead associated
with creating a new Node for each element in a LinkedList.
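A brief sketch of ArrayDeque in both roles discussed above, as the modern replacement for Stack and for a LinkedList-backed queue (class name illustrative):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Queue;

    public class ArrayDequeDemo {
        public static void main(String[] args) {
            // As a stack (LIFO): preferred over the legacy java.util.Stack.
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(1);
            stack.push(2);
            System.out.println(stack.pop()); // 2

            // As a queue (FIFO): usually faster than a LinkedList-backed queue.
            Queue<Integer> queue = new ArrayDeque<>();
            queue.offer(1);
            queue.offer(2);
            System.out.println(queue.poll()); // 1
        }
    }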
The equals() and hashCode() contract, defined in the Object class, establishes a strict
relationship between these two methods:
1. equals() must be an equivalence relation:
○ Reflexive: For any non-null reference value x, x.equals(x) must return true.
○ Symmetric: For any non-null reference values x and y, x.equals(y) must return true
if and only if y.equals(x) returns true.
○ Transitive: For any non-null reference values x, y, and z, if x.equals(y) returns true
and y.equals(z) returns true, then x.equals(z) must return true.
2. equals() must be consistent: For any non-null reference values x and y, multiple
invocations of x.equals(y) must consistently return true or consistently return false,
provided no information used in equals comparisons on the objects is modified.
3. Null comparison: For any non-null reference value x, x.equals(null) must return false.
4. hashCode() consistency: The hashCode() method must consistently return the same
integer value for an object, provided no information used in equals comparisons is
modified.
5. The Golden Rule: If two objects are equal according to the equals(Object) method, then
calling the hashCode() method on each of the two objects must produce the same integer
result.
6. The Collision Rule: It is not required that if two objects are unequal according to the
equals(Object) method, then calling the hashCode() method on each of the two objects
must produce distinct integer results. This scenario, where unequal objects have the
same hash code, is known as a hash collision.
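A typical implementation that satisfies this contract might look like the following sketch; the class and field names are illustrative, and equals() and hashCode() are deliberately based on the same fields.

    import java.util.Objects;

    public final class ImmutablePoint {
        private final int x;
        private final int y;

        public ImmutablePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                       // reflexive fast path
            if (!(o instanceof ImmutablePoint)) return false; // also covers o == null
            ImmutablePoint other = (ImmutablePoint) o;
            return x == other.x && y == other.y;              // fields defining logical equality
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // must use the same fields as equals()
        }
    }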
Consequences of Violation
● Failing to override hashCode() when equals() is overridden: This is the most common
mistake. If you define logical equality in equals() but use the default hashCode() from
Object (which is typically based on memory address), two logically equal objects will have
different hash codes. When you try to retrieve a value from a HashMap using a key, the
map will calculate the key's hash code and look in the wrong bucket, failing to find the
entry even though an "equal" key exists.
● Using Mutable Objects as Keys: This is a dangerous practice. If an object is used as a
key in a HashMap, and then a field used in its hashCode() calculation is modified, the
hash code will change. When you later try to access the entry with that key, the HashMap
will compute the new hash code and look in a different bucket, making the original entry
effectively lost and irretrievable. This can lead to subtle memory leaks. For this reason,
immutable classes like String and the primitive wrapper classes are excellent candidates
for map keys.
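A sketch of the mutable-key hazard described above, with an illustrative key class whose hash code depends on mutable state:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    public class MutableKeyDemo {
        // A deliberately mutable key whose hashCode() depends on mutable state.
        static class Key {
            String name;
            Key(String name) { this.name = name; }
            @Override public boolean equals(Object o) {
                return o instanceof Key && Objects.equals(name, ((Key) o).name);
            }
            @Override public int hashCode() { return Objects.hashCode(name); }
        }

        public static void main(String[] args) {
            Map<Key, String> map = new HashMap<>();
            Key key = new Key("alpha");
            map.put(key, "value");

            key.name = "beta"; // mutating the key changes its hash code

            System.out.println(map.get(key));              // null -- the stored hash no longer matches
            System.out.println(map.get(new Key("alpha"))); // null -- right bucket, but equals() now fails
            System.out.println(map.size());                // 1 -- the entry still occupies memory
        }
    }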
● Core Structure: A HashMap internally uses an array of nodes, often called buckets,
declared as transient Node<K,V>[] table;. The length of this array is always a power of two.
● The put(K key, V value) Process:
1. Handle Null Key: A null key is a special case. Its hash code is always 0, so it is
always stored in the first bucket, table[0].
2. Calculate Hash: For a non-null key, its hashCode() is calculated. To defend against
poor hash functions, HashMap applies a secondary hashing function that spreads
the bits of the original hash code more evenly. This is done by XORing the hash
code with its own higher-order bits: (h = key.hashCode()) ^ (h >>> 16).
3. Calculate Index: The index of the bucket where the entry will be stored is
calculated using a bitwise AND operation: index = (n - 1) & hash, where n is the
length of the table. Since n is always a power of two, (n - 1) is a bitmask of all 1s,
making this operation an efficient equivalent of hash % n.
4. Store Entry and Handle Collisions:
■ If table[index] is empty, a new Node object containing the hash, key, value,
and a null next pointer is created and placed at that index.
■ If table[index] is not empty, a hash collision has occurred. The HashMap
must then check if the new key is already present in that bucket. It iterates
through the existing nodes in the bucket, comparing the new key with each
existing key first by hash code and then by the equals() method.
■ If an identical key is found, its value is updated with the new value, and the
old value is returned.
■ If no identical key is found, the new Node is added to the structure within that
bucket.
● Collision Resolution:
○ Chaining: Until Java 8, HashMap exclusively used chaining to resolve collisions. All
entries that mapped to the same bucket index were stored in a singly linked list. If a
bucket contained many entries, retrieval would degrade to O(n) as it required
traversing the list.
○ Treeifying (Java 8+ Optimization): To address the worst-case performance of
chaining, Java 8 introduced a significant optimization. If the number of nodes in a
single bucket grows beyond a certain threshold (TREEIFY_THRESHOLD, which
defaults to 8), the linked list is converted into a balanced Red-Black Tree. This
transformation improves the worst-case lookup performance for that bucket from
O(n) to O(log n). If the number of nodes in the tree later drops below a certain
threshold (UNTREEIFY_THRESHOLD, default 6), it is converted back to a linked
list.
● Load Factor, Threshold, and Rehashing:
○ Capacity: The length of the internal bucket array. The default initial capacity is 16.
○ Load Factor: A measure of how full the map is allowed to get before its capacity is
automatically increased. The default is 0.75, representing a trade-off between
memory usage and performance.
○ Threshold: The maximum number of entries the HashMap can hold before it must
be resized. It is calculated as Threshold = Capacity * Load Factor. With default
values, the threshold is 16 * 0.75 = 12.
○ Rehashing: When the number of entries in the map exceeds the threshold, a
rehashing operation is triggered. A new array with double the capacity is created.
All existing entries are then re-evaluated and placed into new buckets in this larger
array. This is an expensive O(n) operation, as it involves iterating through all
existing entries and recalculating their bucket indices.
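A small sketch reproducing the hash-spreading and index arithmetic described above; HashMap performs these steps internally, and the class and method names here are illustrative.

    public class HashMapIndexDemo {
        // Mirrors HashMap's hash spreading: XOR the hash code with its upper 16 bits.
        static int spread(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int capacity = 16;                        // default initial table length (a power of two)
            int threshold = (int) (capacity * 0.75f); // default load factor -> resize after 12 entries

            String key = "example";
            int hash = spread(key);
            int index = (capacity - 1) & hash;        // equivalent to hash % capacity for powers of two

            System.out.println("hash=" + hash + ", bucket index=" + index
                    + ", resize threshold=" + threshold);
        }
    }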
1. Explain the internal working of HashMap's put() method. The method first calculates
the key's hash code, applies a secondary hash function, and then computes a bucket
index using (n-1) & hash. If the bucket is empty, the new entry is placed there. If a
collision occurs, it traverses the existing structure (linked list or tree) in the bucket, using
equals() to check for an existing key. If found, the value is updated; otherwise, the new
entry is added to the structure.
2. What is a hash collision and how does HashMap handle it? A collision occurs when
two different keys produce the same bucket index. HashMap handles this by storing
multiple entries in the same bucket. Prior to Java 8, this was done using a linked list
(chaining). Since Java 8, if the list becomes too long (more than 8 entries), it is converted
into a balanced Red-Black Tree to improve lookup performance from O(n) to O(log n).
3. What are load factor and rehashing? Why are they important? The load factor
determines the threshold at which the map will resize itself. When the number of entries
exceeds capacity * load factor, a rehashing occurs: a new, larger array is created, and all
entries are re-mapped to new buckets. These concepts manage the trade-off between
memory footprint and performance. A low load factor reduces collisions but wastes
memory, while a high load factor saves memory but increases the likelihood of collisions
and degrades performance.
4. Why should HashMap keys be immutable? If a key object is mutable and its state is
changed after being inserted into the map, its hashCode() might change. This would
cause the map to look in the wrong bucket when trying to retrieve the entry, making the
key-value pair effectively lost. Immutable keys guarantee that the hash code remains
constant.
5. What was the performance improvement for HashMap in Java 8? The key
improvement was the conversion of long linked lists within buckets into red-black trees
when the number of collisions exceeds a threshold. This changed the worst-case time
complexity for get() and put() from O(n) to O(log n), providing a significant safeguard
against performance degradation due to poor hash functions.
1. How does LinkedHashMap maintain order? It extends HashMap but overlays a
doubly-linked list on top of the entries. This linked list connects all entries and is updated
on each modification to preserve either insertion or access order.
2. What is the difference between insertion-order and access-order? Insertion-order
maintains the sequence in which keys were first added. Access-order maintains the
sequence in which entries were last accessed, moving any accessed entry to the end of
the list.
3. How would you implement an LRU cache using LinkedHashMap? Create a
LinkedHashMap instance with the accessOrder flag set to true. Then, extend the class
and override the removeEldestEntry(Map.Entry eldest) method to return true when the
map's size() exceeds the desired cache capacity. This will cause the map to automatically
remove the least recently accessed entry upon the next insertion.
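A minimal sketch of the approach from question 3; the class name and the chosen capacity are illustrative.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A minimal LRU cache: an access-order LinkedHashMap that evicts the eldest entry
    // once the configured capacity is exceeded.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true enables LRU ordering
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently accessed entry on overflow
        }

        public static void main(String[] args) {
            LruCache<Integer, String> cache = new LruCache<>(2);
            cache.put(1, "one");
            cache.put(2, "two");
            cache.get(1);           // touch 1 so that 2 becomes the eldest
            cache.put(3, "three");  // evicts key 2
            System.out.println(cache.keySet()); // [1, 3]
        }
    }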
● Unlike other map implementations, TreeMap does not use hashing. Its internal structure is
a Red-Black Tree, which is a type of self-balancing binary search tree.
● This tree structure ensures that the keys are always in a sorted order. The rules of a
binary search tree dictate that for any given node, all keys in its left subtree are smaller,
and all keys in its right subtree are larger. The "self-balancing" nature of the Red-Black
Tree guarantees that the tree never becomes too lopsided, which ensures that put(), get(),
remove(), and containsKey() operations have a time complexity of O(log n).
● Because it relies on comparisons rather than hashing, keys must either implement the
Comparable interface or a Comparator must be supplied to the TreeMap's constructor.
1. When would you use a TreeMap over a HashMap? Use a TreeMap when there is a
requirement to maintain the keys in a sorted order. This is particularly useful for scenarios
that require efficiently retrieving ranges of keys, or finding the first/last key in the map.
2. What is the time complexity of operations in a TreeMap? The time complexity for
fundamental operations like put, get, remove, and containsKey is guaranteed to be O(log n)
due to the balanced nature of the internal Red-Black Tree.
3. What is a Red-Black Tree? It is a self-balancing binary search tree where each node has
an extra bit for storing color (red or black). A set of strict rules governing the node colors
ensures that the longest path from the root to any leaf is no more than twice as long as
the shortest path. This property maintains the tree's balance and guarantees logarithmic
time complexity for operations.
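A sketch of the sorted and navigable operations that motivate choosing TreeMap, using illustrative keys and values:

    import java.util.TreeMap;

    public class TreeMapNavigationDemo {
        public static void main(String[] args) {
            TreeMap<Integer, String> scores = new TreeMap<>();
            scores.put(70, "pass");
            scores.put(85, "merit");
            scores.put(95, "distinction");

            System.out.println(scores.firstKey());       // 70 -- smallest key
            System.out.println(scores.lastKey());        // 95 -- largest key
            System.out.println(scores.floorKey(90));     // 85 -- greatest key <= 90
            System.out.println(scores.headMap(90));      // {70=pass, 85=merit} -- keys < 90
            System.out.println(scores.ceilingEntry(80)); // 85=merit -- smallest entry with key >= 80
        }
    }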
Table 4: Map Implementation Characteristics