Heaps and Priority Queues

Heaps solve a specific but crucial problem: maintaining the minimum (or maximum) element in a collection while allowing efficient insertions and deletions. I’ve used heaps to build task schedulers, implement Dijkstra’s algorithm, and create efficient sorting algorithms. They’re the secret behind many “find the top K” problems.

Understanding heaps isn’t just about the data structure - it’s about recognizing when you need partial ordering rather than complete sorting.

Understanding Heap Properties

A heap is a complete binary tree that satisfies the heap property. In a min-heap, parent nodes are smaller than their children; in a max-heap, parents are larger. This simple rule enables powerful algorithms.

The key insight is that heaps maintain partial order - you always know the minimum (or maximum) element, but the rest can be in any valid heap order.

class MinHeap:
    def __init__(self):
        self.heap = []
    
    def parent(self, i):
        return (i - 1) // 2
    
    def left_child(self, i):
        return 2 * i + 1
    
    def right_child(self, i):
        return 2 * i + 2
    
    def swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
    
    def peek(self):
        if not self.heap:
            raise IndexError("Heap is empty")
        return self.heap[0]
    
    def size(self):
        return len(self.heap)

Heaps are typically implemented as arrays because the complete binary tree property means we can calculate parent and child indices with simple arithmetic.

Heap Operations

Insert (Bubble Up)

When inserting into a heap, we add the element at the end and “bubble up” until the heap property is restored:

def insert(self, value):
    # Add to end of heap
    self.heap.append(value)
    # Bubble up to maintain heap property
    self._bubble_up(len(self.heap) - 1)

def _bubble_up(self, index):
    while (index > 0 and 
           self.heap[index] < self.heap[self.parent(index)]):
        self.swap(index, self.parent(index))
        index = self.parent(index)

# Add to MinHeap class
MinHeap.insert = insert
MinHeap._bubble_up = _bubble_up

Extract Min (Bubble Down)

Extracting the minimum requires more work - we replace the root with the last element, then “bubble down”:

def extract_min(self):
    if not self.heap:
        raise IndexError("Heap is empty")
    
    min_value = self.heap[0]
    # Move last element to root
    self.heap[0] = self.heap[-1]
    self.heap.pop()
    
    # Bubble down to maintain heap property
    if self.heap:
        self._bubble_down(0)
    
    return min_value

def _bubble_down(self, index):
    while self.left_child(index) < len(self.heap):
        smaller_child_index = self.left_child(index)
        
        # Find smaller child
        if (self.right_child(index) < len(self.heap) and 
            self.heap[self.right_child(index)] < self.heap[smaller_child_index]):
            smaller_child_index = self.right_child(index)
        
        # If heap property is satisfied, stop
        if self.heap[index] <= self.heap[smaller_child_index]:
            break
        
        self.swap(index, smaller_child_index)
        index = smaller_child_index

# Add to MinHeap class
MinHeap.extract_min = extract_min
MinHeap._bubble_down = _bubble_down

Both operations maintain the heap property while keeping the tree complete.

Priority Queue Implementation

Priority queues are the most common application of heaps. Python’s heapq module provides efficient heap operations:

import heapq
from dataclasses import dataclass
from typing import Any

@dataclass
class PriorityItem:
    priority: int
    item: Any
    
    def __lt__(self, other):
        return self.priority < other.priority

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._index = 0  # Tie-breaker for equal priorities
    
    def put(self, item, priority):
        heapq.heappush(self._heap, (priority, self._index, item))
        self._index += 1
    
    def get(self):
        if not self._heap:
            raise IndexError("Priority queue is empty")
        priority, index, item = heapq.heappop(self._heap)
        return item
    
    def empty(self):
        return len(self._heap) == 0

# Usage
pq = PriorityQueue()
pq.put("Low priority task", 10)
pq.put("High priority task", 1)
pq.put("Medium priority task", 5)

while not pq.empty():
    task = pq.get()
    print(f"Processing: {task}")

The index ensures that items with equal priority are processed in insertion order.

Real-World Applications

Task Scheduler

Priority queues are perfect for task scheduling systems:

import time
from datetime import datetime, timedelta

@dataclass
class Task:
    name: str
    priority: int
    scheduled_time: datetime
    function: callable
    
    def __lt__(self, other):
        # First by scheduled time, then by priority
        if self.scheduled_time != other.scheduled_time:
            return self.scheduled_time < other.scheduled_time
        return self.priority < other.priority

class TaskScheduler:
    def __init__(self):
        self.task_heap = []
    
    def schedule_task(self, name, function, priority=5, delay_seconds=0):
        scheduled_time = datetime.now() + timedelta(seconds=delay_seconds)
        task = Task(name, priority, scheduled_time, function)
        heapq.heappush(self.task_heap, task)
    
    def run_pending_tasks(self):
        current_time = datetime.now()
        executed_tasks = []
        
        while (self.task_heap and 
               self.task_heap[0].scheduled_time <= current_time):
            task = heapq.heappop(self.task_heap)
            try:
                result = task.function()
                executed_tasks.append((task.name, result))
            except Exception as e:
                print(f"Error executing {task.name}: {e}")
        
        return executed_tasks

# Example usage
def send_email():
    return "Email sent"

def backup_database():
    return "Database backed up"

scheduler = TaskScheduler()
scheduler.schedule_task("Send email", send_email, priority=2, delay_seconds=1)
scheduler.schedule_task("Backup database", backup_database, priority=1, delay_seconds=2)

# Simulate running the scheduler
for i in range(3):
    executed = scheduler.run_pending_tasks()
    for task_name, result in executed:
        print(f"Completed: {task_name} -> {result}")
    time.sleep(1)

Top K Elements Problem

Heaps excel at “find the top K” problems:

def find_k_largest(nums, k):
    """Find k largest elements using min-heap"""
    if k <= 0:
        return []
    
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:  # num is larger than smallest in heap
            heapq.heapreplace(heap, num)
    
    return sorted(heap, reverse=True)

def find_k_smallest(nums, k):
    """Find k smallest elements using max-heap (negate values)"""
    if k <= 0:
        return []
    
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, -num)  # Negate for max-heap behavior
        elif num < -heap[0]:  # num is smaller than largest in heap
            heapq.heapreplace(heap, -num)
    
    return sorted([-x for x in heap])

# Test top K functions
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(f"3 largest: {find_k_largest(numbers, 3)}")
print(f"3 smallest: {find_k_smallest(numbers, 3)}")

This approach is much more efficient than sorting the entire array when you only need the top K elements.

Heap Sort Algorithm

Heaps enable an efficient sorting algorithm:

def heap_sort(arr):
    """Sort array using heap sort algorithm"""
    def heapify(arr, n, i):
        largest = i
        left = 2 * i + 1
        right = 2 * i + 2
        
        if left < n and arr[left] > arr[largest]:
            largest = left
        
        if right < n and arr[right] > arr[largest]:
            largest = right
        
        if largest != i:
            arr[i], arr[largest] = arr[largest], arr[i]
            heapify(arr, n, largest)
    
    n = len(arr)
    
    # Build max heap
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    
    # Extract elements from heap one by one
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]  # Move max to end
        heapify(arr, i, 0)  # Heapify reduced heap
    
    return arr

# Test heap sort
test_array = [64, 34, 25, 12, 22, 11, 90]
print(f"Original: {test_array}")
sorted_array = heap_sort(test_array.copy())
print(f"Heap sorted: {sorted_array}")

Heap sort guarantees O(n log n) time complexity and uses only O(1) extra space.

When to Use Heaps

Use heaps when you need to repeatedly find min/max elements, implement priority queues or task schedulers, solve “top K” problems, build efficient sorting algorithms, or manage streaming data where you need extremes.

Consider alternatives when you need to access arbitrary elements (use arrays), need fast membership testing (use sets), need to maintain complete ordering (use sorted lists), or when memory usage is critical (heaps have some overhead).

What’s Next

In Part 11, we’ll explore graphs and network algorithms. You’ll learn about graph representations, traversal algorithms, shortest path algorithms, and how to model real-world networks and relationships.

Graphs will complete your toolkit for modeling complex relationships and solving network-based problems in everything from social networks to routing algorithms.