Heaps and Priority Queues
Heaps solve a specific but crucial problem: maintaining the minimum (or maximum) element in a collection while allowing efficient insertions and deletions. I’ve used heaps to build task schedulers, implement Dijkstra’s algorithm, and create efficient sorting algorithms. They’re the secret behind many “find the top K” problems.
Understanding heaps isn’t just about the data structure - it’s about recognizing when you need partial ordering rather than complete sorting.
Understanding Heap Properties
A heap is a complete binary tree that satisfies the heap property. In a min-heap, parent nodes are smaller than their children; in a max-heap, parents are larger. This simple rule enables powerful algorithms.
The key insight is that heaps maintain partial order - you always know the minimum (or maximum) element, but the rest can be in any valid heap order.
class MinHeap:
def __init__(self):
self.heap = []
def parent(self, i):
return (i - 1) // 2
def left_child(self, i):
return 2 * i + 1
def right_child(self, i):
return 2 * i + 2
def swap(self, i, j):
self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
def peek(self):
if not self.heap:
raise IndexError("Heap is empty")
return self.heap[0]
def size(self):
return len(self.heap)
Heaps are typically implemented as arrays because the complete binary tree property means we can calculate parent and child indices with simple arithmetic.
Heap Operations
Insert (Bubble Up)
When inserting into a heap, we add the element at the end and “bubble up” until the heap property is restored:
def insert(self, value):
# Add to end of heap
self.heap.append(value)
# Bubble up to maintain heap property
self._bubble_up(len(self.heap) - 1)
def _bubble_up(self, index):
while (index > 0 and
self.heap[index] < self.heap[self.parent(index)]):
self.swap(index, self.parent(index))
index = self.parent(index)
# Add to MinHeap class
MinHeap.insert = insert
MinHeap._bubble_up = _bubble_up
Extract Min (Bubble Down)
Extracting the minimum requires more work - we replace the root with the last element, then “bubble down”:
def extract_min(self):
if not self.heap:
raise IndexError("Heap is empty")
min_value = self.heap[0]
# Move last element to root
self.heap[0] = self.heap[-1]
self.heap.pop()
# Bubble down to maintain heap property
if self.heap:
self._bubble_down(0)
return min_value
def _bubble_down(self, index):
while self.left_child(index) < len(self.heap):
smaller_child_index = self.left_child(index)
# Find smaller child
if (self.right_child(index) < len(self.heap) and
self.heap[self.right_child(index)] < self.heap[smaller_child_index]):
smaller_child_index = self.right_child(index)
# If heap property is satisfied, stop
if self.heap[index] <= self.heap[smaller_child_index]:
break
self.swap(index, smaller_child_index)
index = smaller_child_index
# Add to MinHeap class
MinHeap.extract_min = extract_min
MinHeap._bubble_down = _bubble_down
Both operations maintain the heap property while keeping the tree complete.
Priority Queue Implementation
Priority queues are the most common application of heaps. Python’s heapq
module provides efficient heap operations:
import heapq
from dataclasses import dataclass
from typing import Any
@dataclass
class PriorityItem:
priority: int
item: Any
def __lt__(self, other):
return self.priority < other.priority
class PriorityQueue:
def __init__(self):
self._heap = []
self._index = 0 # Tie-breaker for equal priorities
def put(self, item, priority):
heapq.heappush(self._heap, (priority, self._index, item))
self._index += 1
def get(self):
if not self._heap:
raise IndexError("Priority queue is empty")
priority, index, item = heapq.heappop(self._heap)
return item
def empty(self):
return len(self._heap) == 0
# Usage
pq = PriorityQueue()
pq.put("Low priority task", 10)
pq.put("High priority task", 1)
pq.put("Medium priority task", 5)
while not pq.empty():
task = pq.get()
print(f"Processing: {task}")
The index ensures that items with equal priority are processed in insertion order.
Real-World Applications
Task Scheduler
Priority queues are perfect for task scheduling systems:
import time
from datetime import datetime, timedelta
@dataclass
class Task:
name: str
priority: int
scheduled_time: datetime
function: callable
def __lt__(self, other):
# First by scheduled time, then by priority
if self.scheduled_time != other.scheduled_time:
return self.scheduled_time < other.scheduled_time
return self.priority < other.priority
class TaskScheduler:
def __init__(self):
self.task_heap = []
def schedule_task(self, name, function, priority=5, delay_seconds=0):
scheduled_time = datetime.now() + timedelta(seconds=delay_seconds)
task = Task(name, priority, scheduled_time, function)
heapq.heappush(self.task_heap, task)
def run_pending_tasks(self):
current_time = datetime.now()
executed_tasks = []
while (self.task_heap and
self.task_heap[0].scheduled_time <= current_time):
task = heapq.heappop(self.task_heap)
try:
result = task.function()
executed_tasks.append((task.name, result))
except Exception as e:
print(f"Error executing {task.name}: {e}")
return executed_tasks
# Example usage
def send_email():
return "Email sent"
def backup_database():
return "Database backed up"
scheduler = TaskScheduler()
scheduler.schedule_task("Send email", send_email, priority=2, delay_seconds=1)
scheduler.schedule_task("Backup database", backup_database, priority=1, delay_seconds=2)
# Simulate running the scheduler
for i in range(3):
executed = scheduler.run_pending_tasks()
for task_name, result in executed:
print(f"Completed: {task_name} -> {result}")
time.sleep(1)
Top K Elements Problem
Heaps excel at “find the top K” problems:
def find_k_largest(nums, k):
"""Find k largest elements using min-heap"""
if k <= 0:
return []
heap = []
for num in nums:
if len(heap) < k:
heapq.heappush(heap, num)
elif num > heap[0]: # num is larger than smallest in heap
heapq.heapreplace(heap, num)
return sorted(heap, reverse=True)
def find_k_smallest(nums, k):
"""Find k smallest elements using max-heap (negate values)"""
if k <= 0:
return []
heap = []
for num in nums:
if len(heap) < k:
heapq.heappush(heap, -num) # Negate for max-heap behavior
elif num < -heap[0]: # num is smaller than largest in heap
heapq.heapreplace(heap, -num)
return sorted([-x for x in heap])
# Test top K functions
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(f"3 largest: {find_k_largest(numbers, 3)}")
print(f"3 smallest: {find_k_smallest(numbers, 3)}")
This approach is much more efficient than sorting the entire array when you only need the top K elements.
Heap Sort Algorithm
Heaps enable an efficient sorting algorithm:
def heap_sort(arr):
"""Sort array using heap sort algorithm"""
def heapify(arr, n, i):
largest = i
left = 2 * i + 1
right = 2 * i + 2
if left < n and arr[left] > arr[largest]:
largest = left
if right < n and arr[right] > arr[largest]:
largest = right
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
heapify(arr, n, largest)
n = len(arr)
# Build max heap
for i in range(n // 2 - 1, -1, -1):
heapify(arr, n, i)
# Extract elements from heap one by one
for i in range(n - 1, 0, -1):
arr[0], arr[i] = arr[i], arr[0] # Move max to end
heapify(arr, i, 0) # Heapify reduced heap
return arr
# Test heap sort
test_array = [64, 34, 25, 12, 22, 11, 90]
print(f"Original: {test_array}")
sorted_array = heap_sort(test_array.copy())
print(f"Heap sorted: {sorted_array}")
Heap sort guarantees O(n log n) time complexity and uses only O(1) extra space.
When to Use Heaps
Use heaps when you need to repeatedly find min/max elements, implement priority queues or task schedulers, solve “top K” problems, build efficient sorting algorithms, or manage streaming data where you need extremes.
Consider alternatives when you need to access arbitrary elements (use arrays), need fast membership testing (use sets), need to maintain complete ordering (use sorted lists), or when memory usage is critical (heaps have some overhead).
What’s Next
In Part 11, we’ll explore graphs and network algorithms. You’ll learn about graph representations, traversal algorithms, shortest path algorithms, and how to model real-world networks and relationships.
Graphs will complete your toolkit for modeling complex relationships and solving network-based problems in everything from social networks to routing algorithms.