Trees and Hierarchical Data

Trees are everywhere in computing - file systems, organizational charts, decision trees, HTML DOM, and database indexes. I’ve used trees to build everything from autocomplete systems to efficient search engines. Understanding trees isn’t just about data structures; it’s about thinking hierarchically and recursively.

Trees excel where linear data structures fail: representing relationships, enabling fast searches, and organizing data with natural hierarchies.

Tree Fundamentals

A tree is a hierarchical data structure with nodes connected by edges. Think of it like a family tree or company org chart - there’s a root at the top, and each node can have children below it.

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        self.parent = None
    
    def add_child(self, child_node):
        child_node.parent = self
        self.children.append(child_node)
    
    def is_leaf(self):
        return len(self.children) == 0
    
    def get_level(self):
        level = 0
        current = self
        while current.parent:
            level += 1
            current = current.parent
        return level

# Build a simple organizational tree
root = TreeNode("CEO")
cto = TreeNode("CTO")
cfo = TreeNode("CFO")

root.add_child(cto)
root.add_child(cfo)

dev_manager = TreeNode("Dev Manager")
cto.add_child(dev_manager)

Trees have specific terminology: the top node is the root, nodes with no children are leaves, and the depth of a node is its distance from the root.

Tree Traversal Algorithms

There are different ways to visit all nodes in a tree. The choice depends on what you’re trying to accomplish:

def traverse_preorder(node):
    """Visit root first, then children"""
    if node:
        print(node.value)
        for child in node.children:
            traverse_preorder(child)

def traverse_postorder(node):
    """Visit children first, then root"""
    if node:
        for child in node.children:
            traverse_postorder(child)
        print(node.value)

def traverse_level_order(root):
    """Visit nodes level by level"""
    from collections import deque
    if not root:
        return
    
    queue = deque([root])
    while queue:
        node = queue.popleft()
        print(node.value)
        queue.extend(node.children)

Preorder traversal is useful for copying trees, postorder for deleting trees (children before parents), and level-order for printing trees by level.

Binary Trees

Binary trees are trees where each node has at most two children (left and right). They’re the foundation for many efficient algorithms and data structures.

class BinaryTreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

class BinaryTree:
    def __init__(self, root_value=None):
        self.root = BinaryTreeNode(root_value) if root_value is not None else None
    
    def inorder_traversal(self, node=None):
        """Left -> Root -> Right"""
        if node is None:
            node = self.root
        
        result = []
        if node:
            result.extend(self.inorder_traversal(node.left))
            result.append(node.value)
            result.extend(self.inorder_traversal(node.right))
        
        return result
    
    def height(self, node=None):
        if node is None:
            node = self.root
        
        if not node:
            return -1
        
        left_height = self.height(node.left)
        right_height = self.height(node.right)
        
        return 1 + max(left_height, right_height)

# Build a binary tree
tree = BinaryTree(1)
tree.root.left = BinaryTreeNode(2)
tree.root.right = BinaryTreeNode(3)
tree.root.left.left = BinaryTreeNode(4)
tree.root.left.right = BinaryTreeNode(5)

print(f"Inorder: {tree.inorder_traversal()}")  # [4, 2, 5, 1, 3]
print(f"Height: {tree.height()}")              # 2

Binary trees enable efficient algorithms because you can eliminate half the remaining nodes with each decision.

Binary Search Trees (BST)

BSTs maintain a special property: left children are smaller than their parent, and right children are larger. This enables fast searching, insertion, and deletion.

class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None
    
    def insert(self, value):
        if not self.root:
            self.root = BSTNode(value)
        else:
            self._insert_recursive(self.root, value)
    
    def _insert_recursive(self, node, value):
        if value < node.value:
            if node.left is None:
                node.left = BSTNode(value)
            else:
                self._insert_recursive(node.left, value)
        elif value > node.value:
            if node.right is None:
                node.right = BSTNode(value)
            else:
                self._insert_recursive(node.right, value)
    
    def search(self, value):
        return self._search_recursive(self.root, value)
    
    def _search_recursive(self, node, value):
        if not node or node.value == value:
            return node
        
        if value < node.value:
            return self._search_recursive(node.left, value)
        else:
            return self._search_recursive(node.right, value)
    
    def inorder(self):
        """Returns sorted list of values"""
        result = []
        self._inorder_recursive(self.root, result)
        return result
    
    def _inorder_recursive(self, node, result):
        if node:
            self._inorder_recursive(node.left, result)
            result.append(node.value)
            self._inorder_recursive(node.right, result)

# Test BST
bst = BinarySearchTree()
values = [50, 30, 70, 20, 40, 60, 80]

for value in values:
    bst.insert(value)

print(f"Inorder traversal: {bst.inorder()}")  # [20, 30, 40, 50, 60, 70, 80] - sorted!
print(f"Search 40: {bst.search(40) is not None}")  # True

The beauty of BSTs is that an inorder traversal gives you the elements in sorted order, and searching takes O(log n) time in balanced trees.

Real-World Tree Applications

File System Tree

Trees naturally represent hierarchical file systems:

class FileSystemNode:
    def __init__(self, name, is_directory=False, size=0):
        self.name = name
        self.is_directory = is_directory
        self.children = [] if is_directory else None
        self.size = size
    
    def add_child(self, child):
        if not self.is_directory:
            raise ValueError("Cannot add child to file")
        self.children.append(child)
    
    def calculate_size(self):
        if not self.is_directory:
            return self.size
        
        return sum(child.calculate_size() for child in self.children)
    
    def find_files(self, pattern):
        import fnmatch
        results = []
        
        if not self.is_directory and fnmatch.fnmatch(self.name, pattern):
            results.append(self)
        
        if self.is_directory:
            for child in self.children:
                results.extend(child.find_files(pattern))
        
        return results

# Build file system
root = FileSystemNode("root", is_directory=True)
documents = FileSystemNode("documents", is_directory=True)
root.add_child(documents)

file1 = FileSystemNode("resume.pdf", size=1024)
file2 = FileSystemNode("notes.txt", size=512)
documents.add_child(file1)
documents.add_child(file2)

print(f"Total size: {root.calculate_size()} bytes")
pdf_files = root.find_files('*.pdf')
print(f"PDF files: {[f.name for f in pdf_files]}")

Decision Tree

Trees are perfect for representing decision-making processes:

class DecisionNode:
    def __init__(self, question=None, true_branch=None, false_branch=None, result=None):
        self.question = question
        self.true_branch = true_branch
        self.false_branch = false_branch
        self.result = result  # For leaf nodes
    
    def is_leaf(self):
        return self.result is not None

def make_decision(node, data):
    if node.is_leaf():
        return node.result
    
    # Simple evaluation: check if data value meets condition
    if evaluate_condition(node.question, data):
        return make_decision(node.true_branch, data)
    else:
        return make_decision(node.false_branch, data)

def evaluate_condition(question, data):
    # Simplified: "income > 50000"
    parts = question.split()
    feature, operator, value = parts[0], parts[1], parts[2]
    
    if operator == '>':
        return data.get(feature, 0) > int(value)
    return False

# Build decision tree for loan approval
loan_tree = DecisionNode(
    question="income > 50000",
    true_branch=DecisionNode(result="Approved"),
    false_branch=DecisionNode(result="Rejected")
)

# Test decision
applicant = {"income": 60000, "credit_score": 750}
decision = make_decision(loan_tree, applicant)
print(f"Loan decision: {decision}")

Trees make complex decision logic clear and easy to modify.

When to Use Trees

Use trees when your data has natural hierarchical relationships, you need fast searching with the ability to maintain sorted order, you’re implementing parsers or expression evaluators, or you need to represent decision logic.

Consider alternatives when data is primarily linear (use lists), you need constant-time operations (use hash tables), memory usage is critical (trees have pointer overhead), or you rarely search and mostly iterate (use lists).

What’s Next

In Part 10, we’ll explore heaps and priority queues. You’ll learn about binary heaps, heap operations, and how to implement efficient priority queues for task scheduling and algorithm optimization.

Heaps provide powerful tools for implementing priority systems, efficient sorting algorithms, and solving problems that require maintaining ordered collections with dynamic updates.