Custom Data Structures
Sometimes Python’s built-in data structures aren’t enough. I’ve built custom containers for everything from LRU caches to specialized tree structures for parsing algorithms. The key is understanding Python’s data model - the special methods that make objects behave like built-in containers.
Creating custom data structures isn’t just about solving unique problems; it’s about building intuitive, Pythonic interfaces that feel natural to use.
Understanding Python’s Data Model
Python’s data model defines how objects interact with operators and built-in functions through special methods (dunder methods). These methods let your custom classes behave exactly like built-in types.
class SimpleContainer:
def __init__(self, items=None):
self._items = list(items) if items else []
def __len__(self):
return len(self._items)
def __getitem__(self, index):
return self._items[index]
def __contains__(self, item):
return item in self._items
def __repr__(self):
return f"SimpleContainer({self._items})"
# Usage - behaves like a built-in container
container = SimpleContainer([1, 2, 3, 4])
print(len(container)) # 4
print(container[1]) # 2
print(3 in container) # True
The magic happens through these special methods. When you call len(container)
, Python actually calls container.__len__()
. This protocol system makes your custom objects feel native.
Building a Custom Stack
Let’s create a stack with additional features like size limits and peek operations. A stack follows Last-In-First-Out (LIFO) ordering, like a stack of plates.
class Stack:
def __init__(self, max_size=None):
self._items = []
self._max_size = max_size
def push(self, item):
if self._max_size and len(self._items) >= self._max_size:
raise OverflowError(f"Stack overflow: maximum size {self._max_size} reached")
self._items.append(item)
def pop(self):
if not self._items:
raise IndexError("pop from empty stack")
return self._items.pop()
def peek(self):
if not self._items:
raise IndexError("peek from empty stack")
return self._items[-1]
def is_empty(self):
return len(self._items) == 0
def __len__(self):
return len(self._items)
def __bool__(self):
return bool(self._items)
def __repr__(self):
return f"Stack({self._items})"
# Usage
stack = Stack(max_size=3)
stack.push(1)
stack.push(2)
stack.push(3)
print(f"Top item: {stack.peek()}") # 3
print(f"Stack size: {len(stack)}") # 3
This stack can be used for expression evaluation, undo operations, or any algorithm that needs LIFO behavior.
Building an Iterator Protocol
Creating objects that work with Python’s iteration protocol lets them work seamlessly with for loops, list comprehensions, and other iteration contexts.
class NumberRange:
"""Custom range-like iterator with filtering"""
def __init__(self, start, stop, step=1, filter_func=None):
self.start = start
self.stop = stop
self.step = step
self.filter_func = filter_func
def __iter__(self):
current = self.start
while current < self.stop:
if self.filter_func is None or self.filter_func(current):
yield current
current += self.step
# Usage
# Even numbers from 0 to 20
even_range = NumberRange(0, 21, 1, lambda x: x % 2 == 0)
print(list(even_range)) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
# Prime numbers (simple check)
def is_prime(n):
if n < 2:
return False
for i in range(2, int(n**0.5) + 1):
if n % i == 0:
return False
return True
prime_range = NumberRange(2, 30, 1, is_prime)
print(list(prime_range)) # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
The __iter__
method makes this object iterable. Using yield
creates a generator, which is memory-efficient for large ranges.
Building a Trie (Prefix Tree)
A trie is perfect for autocomplete and prefix matching. It stores strings in a tree structure where each path from root to leaf represents a word.
class TrieNode:
def __init__(self):
self.children = {}
self.is_end_word = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for char in word.lower():
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end_word = True
def search(self, word):
node = self._find_node(word.lower())
return node is not None and node.is_end_word
def starts_with(self, prefix):
return self._find_node(prefix.lower()) is not None
def _find_node(self, prefix):
node = self.root
for char in prefix:
if char not in node.children:
return None
node = node.children[char]
return node
def autocomplete(self, prefix, max_suggestions=10):
node = self._find_node(prefix.lower())
if node is None:
return []
suggestions = []
self._collect_words(node, prefix.lower(), suggestions)
return suggestions[:max_suggestions]
def _collect_words(self, node, current_word, words):
if node.is_end_word:
words.append(current_word)
for char, child_node in node.children.items():
self._collect_words(child_node, current_word + char, words)
# Usage
trie = Trie()
words = ['apple', 'application', 'apply', 'appreciate', 'approach']
for word in words:
trie.insert(word)
print(f"Contains 'apple': {'apple' in trie}") # Need to implement __contains__
print(f"Starts with 'app': {trie.starts_with('app')}")
print(f"Autocomplete for 'app': {trie.autocomplete('app')}")
Tries are incredibly efficient for prefix operations - much faster than searching through lists of strings.
Memory-Efficient Custom Containers
Using __slots__
and generators can make custom containers more memory-efficient:
class Point:
"""Memory-efficient point class using __slots__"""
__slots__ = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
def distance_to(self, other):
return ((self.x - other.x)**2 + (self.y - other.y)**2)**0.5
def __repr__(self):
return f"Point({self.x}, {self.y})"
class PointCollection:
"""Memory-efficient collection of points"""
def __init__(self):
self._points = []
def add_point(self, x, y):
self._points.append(Point(x, y))
def points_in_radius(self, center_x, center_y, radius):
"""Generator that yields points within radius"""
center = Point(center_x, center_y)
for point in self._points:
if point.distance_to(center) <= radius:
yield point
def __len__(self):
return len(self._points)
def __iter__(self):
return iter(self._points)
# Usage
collection = PointCollection()
collection.add_point(0, 0)
collection.add_point(3, 4)
collection.add_point(10, 10)
# Find points near origin
nearby_points = list(collection.points_in_radius(0, 0, 5))
print(f"Points within radius 5 of origin: {nearby_points}")
The __slots__
declaration prevents Python from creating a __dict__
for each instance, saving memory when you have many objects.
When to Build Custom vs Use Built-in
Build custom data structures when you need specific behavior not available in built-ins, have performance requirements that demand specialized implementation, want to enforce constraints or invariants, or need domain-specific interfaces.
Use built-ins when standard behavior meets your needs, performance is adequate, code simplicity is more important than custom features, or you’re prototyping quickly.
# Custom makes sense: specialized behavior
class BoundedList:
"""List that maintains a maximum size"""
def __init__(self, max_size):
self._items = []
self._max_size = max_size
def append(self, item):
if len(self._items) >= self._max_size:
self._items.pop(0) # Remove oldest
self._items.append(item)
# Built-in is better: standard behavior
def process_data(items):
# Just use a regular list - no need for custom container
return [item * 2 for item in items if item > 0]
What’s Next
In Part 8, we’ll explore algorithms and algorithmic thinking with data structures. You’ll learn how to analyze time and space complexity, implement classic algorithms like sorting and searching, and understand how choosing the right data structure can dramatically improve algorithm performance.
Understanding algorithms alongside data structures will complete your foundation for solving complex computational problems efficiently.