Introduction
While Python's standard data structures (list, tuple, dict, set) are powerful, they are
designed for general-purpose use. The collections module offers more specialized
tools designed to solve specific problems with better performance and more
readable, convenient APIs. Mastering these data structures can significantly improve
the efficiency and clarity of your code.
This guide provides a detailed look at the following key data structures from the
module:
● Counter
● deque
● namedtuple
● OrderedDict
● defaultdict
1. collections.Counter
A Counter is a dictionary subclass specifically designed for counting hashable
objects. It's a collection where elements are stored as dictionary keys and their counts
are stored as dictionary values.
Key Features
● Counting Made Easy: It simplifies the process of tallying frequencies of items in
an iterable.
● Dictionary-like: It behaves much like a regular dictionary, but returns a zero
count for missing items instead of raising a KeyError.
● Arithmetic Operations: It supports arithmetic operations for combining
counters.
Initialization
You can create a Counter in several ways:
from collections import Counter
# From an iterable (like a list or string)
c1 = Counter(['apple', 'orange', 'apple', 'banana', 'orange', 'apple'])
print(c1) # Output: Counter({'apple': 3, 'orange': 2, 'banana': 1})
c2 = Counter('programming')
print(c2) # Output: Counter({'r': 2, 'g': 2, 'm': 2, 'p': 1, 'o': 1, 'a': 1, 'i': 1, 'n': 1})
# From a dictionary
c3 = Counter({'red': 4, 'blue': 2})
print(c3) # Output: Counter({'red': 4, 'blue': 2})
# Using keyword arguments
c4 = Counter(cats=4, dogs=8)
print(c4) # Output: Counter({'dogs': 8, 'cats': 4})
Common Methods
Method Description
elements() Returns an iterator over elements, repeating
each as many times as its count.
most_common([n]) Returns a list of the n most common elements
and their counts.
subtract(iterable/mapping) Subtracts counts of elements from another
iterable or mapping.
update(iterable/mapping) Adds counts of elements from another iterable
or mapping.
Code Example: Word Frequency
from collections import Counter
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_counts = Counter(words)
# Find the 3 most common words
print(f"Most common words: {word_counts.most_common(3)}")
# Using elements()
print(f"Elements iterator: {sorted(list(word_counts.elements()))}")
# Arithmetic with another counter
inventory = Counter(apple=5, orange=3)
sold = Counter(apple=2, orange=3, banana=1)
inventory.subtract(sold)
print(f"Remaining inventory: {inventory}")
# Output:
# Most common words: [('the', 2), ('quick', 1), ('brown', 1)]
# Elements iterator: ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the', 'the']
# Remaining inventory: Counter({'apple': 3, 'orange': 0, 'banana': -1})
2. collections.deque
A deque (pronounced "deck") object is a double-ended queue. It's a list-like container
optimized for fast appends and pops from both its ends.
Key Features
● High Performance: Appending or popping from either end of a deque takes
constant time, O(1). In contrast, inserting or removing from the beginning of a
standard list is slow, O(n).
● Thread-Safe: Deques support thread-safe, memory-efficient appends and pops
from either side.
● Bounded Size: A deque can be created with a maximum length (maxlen). When a
bounded deque is full, it automatically discards items from the opposite end when
new items are added.
Common Methods
Method Description Time Complexity
append(x) Add x to the right side. O(1)
appendleft(x) Add x to the left side. O(1)
pop() Remove and return an O(1)
element from the right side.
popleft() Remove and return an O(1)
element from the left side.
extend(iterable) Extend the right side by O(k)
appending from an iterable.
extendleft(iterable) Extend the left side (note: the O(k)
iterable is reversed).
rotate(n) Rotate the deque n steps to O(k)
the right. n can be negative.
Code Example: Processing a Queue and Keeping History
from collections import deque
# Basic queue operations
tasks = deque()
tasks.append("Task 1")
tasks.append("Task 2")
tasks.append("Task 3")
print(f"Initial tasks: {tasks}")
# Process a task from the left
processed_task = tasks.popleft()
print(f"Processed: {processed_task}, Remaining tasks: {tasks}")
# Keeping a history of the last N items
last_five_searches = deque(maxlen=5)
last_five_searches.append("python collections")
last_five_searches.append("python deque")
last_five_searches.append("python list performance")
last_five_searches.append("python generators")
last_five_searches.append("python decorators")
print(f"\nLast 5 searches: {last_five_searches}")
# Adding a new item pushes the oldest one out
last_five_searches.append("python async")
print(f"After one more search: {last_five_searches}")
# Rotating elements
d = deque([1, 2, 3, 4, 5])
d.rotate(2) # rotate right
print(f"\nRotated deque: {d}")
# Output:
# Initial tasks: deque(['Task 1', 'Task 2', 'Task 3'])
# Processed: Task 1, Remaining tasks: deque(['Task 2', 'Task 3'])
#
# Last 5 searches: deque(['python collections', 'python deque', 'python list
performance', 'python generators', 'python decorators'], maxlen=5)
# After one more search: deque(['python deque', 'python list performance', 'python
generators', 'python decorators', 'python async'], maxlen=5)
#
# Rotated deque: deque([4, 5, 1, 2, 3])
3. collections.namedtuple
A namedtuple is a factory function for creating tuple subclasses with named fields. It
allows you to create lightweight, immutable object types that are more readable than
regular tuples.
Key Features
● Readability: Accessing fields by name (e.g., point.x) is more descriptive than by
index (point[0]).
● Backward Compatibility: It can be used anywhere a regular tuple can, including
being unpacked.
● Lightweight: It's memory-efficient, just like a regular tuple.
Defining and Using a namedtuple
from collections import namedtuple
# Define a namedtuple type 'Point' with fields 'x' and 'y'
Point = namedtuple('Point', ['x', 'y'])
# Create an instance
p1 = Point(10, 20)
# Access fields by name
print(f"Access by name: x={p1.x}, y={p1.y}")
# Access fields by index (like a regular tuple)
print(f"Access by index: x={p1[0]}, y={p1[1]}")
# Unpack it like a regular tuple
x_val, y_val = p1
print(f"Unpacked values: x={x_val}, y={y_val}")
# namedtuples are immutable
try:
p1.x = 30
except AttributeError as e:
print(f"\nError on trying to modify: {e}")
Helper Methods
● _make(iterable): Class method to create a new instance from an existing iterable.
● _asdict(): Returns a new OrderedDict which maps field names to their values.
● _fields: A tuple of strings listing the field names.
# Using helper methods
data = [100, 200]
p2 = Point._make(data)
print(f"\nCreated with _make(): {p2}")
print(f"As dictionary: {p2._asdict()}")
print(f"Field names: {p2._fields}")
4. collections.OrderedDict
An OrderedDict is a dictionary subclass that remembers the order in which items were
inserted.
Important Note: Since Python 3.7, the standard dict type also preserves insertion
order. OrderedDict is now less critical but still has some unique features that make it
useful.
OrderedDict vs. Standard dict (Python 3.7+)
Feature OrderedDict dict
Order Preservation Yes, by design. Yes, as an implementation
detail (guaranteed in 3.7+).
Equality Test (==) Order-sensitive. Order-insensitive. dict(a=1,
OrderedDict(a=1, b=2) != b=2) == dict(b=2, a=1).
OrderedDict(b=2, a=1).
Reordering Provides move_to_end(key, No direct reordering method.
last=True).
Popping Items popitem(last=True) can pop popitem() only pops the last
from either end (LIFO/FIFO). inserted item (LIFO).
Intent Explicitly signals that order is Order is a feature, but may
critical to logic. not be critical.
Code Example
from collections import OrderedDict
# Create an OrderedDict
od = OrderedDict()
od['apple'] = 4
od['banana'] = 2
od['orange'] = 5
print(f"OrderedDict: {od}")
# Move an item
od.move_to_end('apple', last=False) # move 'apple' to the beginning
print(f"After moving 'apple' to front: {od}")
# Pop an item from the beginning
first_item = od.popitem(last=False)
print(f"Popped first item: {first_item}")
print(f"Remaining dict: {od}")
5. collections.defaultdict
A defaultdict is a dictionary subclass that calls a factory function to supply missing
values. It's a convenient way to handle missing keys without writing extra logic.
Key Feature
When a key is accessed for the first time, if it's not already in the dictionary, it is
automatically created with a default value provided by the default_factory. This avoids
KeyError.
Initialization and Usage
The defaultdict constructor takes a default_factory as its first argument. Common
factories are int, list, set, or any other callable that returns a value.
from collections import defaultdict
# Grouping items into lists
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
d[k].append(v)
print(f"Grouped items: {sorted(d.items())}")
# Counting items
# default_factory is 'int', which returns 0
word = "mississippi"
d = defaultdict(int)
for k in word:
d[k] += 1
print(f"Letter counts: {sorted(d.items())}")
Without defaultdict, the same grouping logic would be more verbose:
# Equivalent logic with a standard dict
d_standard = {}
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
for k, v in s:
if k not in d_standard:
d_standard[k] = []
d_standard[k].append(v)