Custom Data Structures

Sometimes Python’s built-in data structures aren’t enough. I’ve built custom containers for everything from LRU caches to specialized tree structures for parsing algorithms. The key is understanding Python’s data model - the special methods that make objects behave like built-in containers.

Creating custom data structures isn’t just about solving unique problems; it’s about building intuitive, Pythonic interfaces that feel natural to use.

Understanding Python’s Data Model

Python’s data model defines how objects interact with operators and built-in functions through special methods (dunder methods). These methods let your custom classes behave exactly like built-in types.

class SimpleContainer:
    def __init__(self, items=None):
        self._items = list(items) if items else []
    
    def __len__(self):
        return len(self._items)
    
    def __getitem__(self, index):
        return self._items[index]
    
    def __contains__(self, item):
        return item in self._items
    
    def __repr__(self):
        return f"SimpleContainer({self._items})"

# Usage - behaves like a built-in container
container = SimpleContainer([1, 2, 3, 4])
print(len(container))        # 4
print(container[1])          # 2
print(3 in container)        # True

The magic happens through these special methods. When you call len(container), Python actually calls container.__len__(). This protocol system makes your custom objects feel native.

Building a Custom Stack

Let’s create a stack with additional features like size limits and peek operations. A stack follows Last-In-First-Out (LIFO) ordering, like a stack of plates.

class Stack:
    def __init__(self, max_size=None):
        self._items = []
        self._max_size = max_size
    
    def push(self, item):
        if self._max_size and len(self._items) >= self._max_size:
            raise OverflowError(f"Stack overflow: maximum size {self._max_size} reached")
        self._items.append(item)
    
    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()
    
    def peek(self):
        if not self._items:
            raise IndexError("peek from empty stack")
        return self._items[-1]
    
    def is_empty(self):
        return len(self._items) == 0
    
    def __len__(self):
        return len(self._items)
    
    def __bool__(self):
        return bool(self._items)
    
    def __repr__(self):
        return f"Stack({self._items})"

# Usage
stack = Stack(max_size=3)
stack.push(1)
stack.push(2)
stack.push(3)
print(f"Top item: {stack.peek()}")  # 3
print(f"Stack size: {len(stack)}")  # 3

This stack can be used for expression evaluation, undo operations, or any algorithm that needs LIFO behavior.

Building an Iterator Protocol

Creating objects that work with Python’s iteration protocol lets them work seamlessly with for loops, list comprehensions, and other iteration contexts.

class NumberRange:
    """Custom range-like iterator with filtering"""
    
    def __init__(self, start, stop, step=1, filter_func=None):
        self.start = start
        self.stop = stop
        self.step = step
        self.filter_func = filter_func
    
    def __iter__(self):
        current = self.start
        while current < self.stop:
            if self.filter_func is None or self.filter_func(current):
                yield current
            current += self.step

# Usage
# Even numbers from 0 to 20
even_range = NumberRange(0, 21, 1, lambda x: x % 2 == 0)
print(list(even_range))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# Prime numbers (simple check)
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

prime_range = NumberRange(2, 30, 1, is_prime)
print(list(prime_range))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

The __iter__ method makes this object iterable. Using yield creates a generator, which is memory-efficient for large ranges.

Building a Trie (Prefix Tree)

A trie is perfect for autocomplete and prefix matching. It stores strings in a tree structure where each path from root to leaf represents a word.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        node = self.root
        for char in word.lower():
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_word = True
    
    def search(self, word):
        node = self._find_node(word.lower())
        return node is not None and node.is_end_word
    
    def starts_with(self, prefix):
        return self._find_node(prefix.lower()) is not None
    
    def _find_node(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node
    
    def autocomplete(self, prefix, max_suggestions=10):
        node = self._find_node(prefix.lower())
        if node is None:
            return []
        
        suggestions = []
        self._collect_words(node, prefix.lower(), suggestions)
        return suggestions[:max_suggestions]
    
    def _collect_words(self, node, current_word, words):
        if node.is_end_word:
            words.append(current_word)
        
        for char, child_node in node.children.items():
            self._collect_words(child_node, current_word + char, words)

# Usage
trie = Trie()
words = ['apple', 'application', 'apply', 'appreciate', 'approach']
for word in words:
    trie.insert(word)

print(f"Contains 'apple': {'apple' in trie}")  # Need to implement __contains__
print(f"Starts with 'app': {trie.starts_with('app')}")
print(f"Autocomplete for 'app': {trie.autocomplete('app')}")

Tries are incredibly efficient for prefix operations - much faster than searching through lists of strings.

Memory-Efficient Custom Containers

Using __slots__ and generators can make custom containers more memory-efficient:

class Point:
    """Memory-efficient point class using __slots__"""
    __slots__ = ['x', 'y']
    
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def distance_to(self, other):
        return ((self.x - other.x)**2 + (self.y - other.y)**2)**0.5
    
    def __repr__(self):
        return f"Point({self.x}, {self.y})"

class PointCollection:
    """Memory-efficient collection of points"""
    
    def __init__(self):
        self._points = []
    
    def add_point(self, x, y):
        self._points.append(Point(x, y))
    
    def points_in_radius(self, center_x, center_y, radius):
        """Generator that yields points within radius"""
        center = Point(center_x, center_y)
        for point in self._points:
            if point.distance_to(center) <= radius:
                yield point
    
    def __len__(self):
        return len(self._points)
    
    def __iter__(self):
        return iter(self._points)

# Usage
collection = PointCollection()
collection.add_point(0, 0)
collection.add_point(3, 4)
collection.add_point(10, 10)

# Find points near origin
nearby_points = list(collection.points_in_radius(0, 0, 5))
print(f"Points within radius 5 of origin: {nearby_points}")

The __slots__ declaration prevents Python from creating a __dict__ for each instance, saving memory when you have many objects.

When to Build Custom vs Use Built-in

Build custom data structures when you need specific behavior not available in built-ins, have performance requirements that demand specialized implementation, want to enforce constraints or invariants, or need domain-specific interfaces.

Use built-ins when standard behavior meets your needs, performance is adequate, code simplicity is more important than custom features, or you’re prototyping quickly.

# Custom makes sense: specialized behavior
class BoundedList:
    """List that maintains a maximum size"""
    def __init__(self, max_size):
        self._items = []
        self._max_size = max_size
    
    def append(self, item):
        if len(self._items) >= self._max_size:
            self._items.pop(0)  # Remove oldest
        self._items.append(item)

# Built-in is better: standard behavior
def process_data(items):
    # Just use a regular list - no need for custom container
    return [item * 2 for item in items if item > 0]

What’s Next

In Part 8, we’ll explore algorithms and algorithmic thinking with data structures. You’ll learn how to analyze time and space complexity, implement classic algorithms like sorting and searching, and understand how choosing the right data structure can dramatically improve algorithm performance.

Understanding algorithms alongside data structures will complete your foundation for solving complex computational problems efficiently.