Performance Optimization
Performance optimization in WebAssembly is different from both traditional Go and JavaScript optimization. I learned this the hard way when my first “optimized” WebAssembly app was slower than the JavaScript version it replaced. The problem wasn’t WebAssembly - it was my assumptions about what makes WebAssembly fast.
WebAssembly excels at CPU-intensive tasks with predictable memory patterns. It’s not automatically faster at everything, and the boundary between JavaScript and WebAssembly has real costs.
Understanding WebAssembly Performance
The biggest performance insight: WebAssembly is fast at sustained computational work, but crossing the boundary between JavaScript and WebAssembly has overhead. If you’re making thousands of tiny function calls, you might be slower than pure JavaScript.
Here’s a simple benchmark that demonstrates this:
// Efficient: batch processing
func ProcessBatch(this js.Value, args []js.Value) interface{} {
    data := args[0]
    length := data.Get("length").Int()
    result := 0.0
    for i := 0; i < length; i++ {
        value := data.Index(i).Float()
        result += math.Sqrt(value*value + 1) // Heavy computation
    }
    return result
}

// Inefficient: many small calls
func ProcessSingle(this js.Value, args []js.Value) interface{} {
    value := args[0].Float()
    return math.Sqrt(value*value + 1)
}
The batch version processes 10,000 items in ~5ms. Calling the single version 10,000 times takes ~200ms due to boundary crossing overhead.
Memory Management Optimization
Go’s garbage collector runs in WebAssembly, but memory allocation patterns affect performance differently than in native Go. I’ve learned to minimize allocations in hot paths:
type OptimizedProcessor struct {
    buffer []float64 // Reuse this buffer
}

func (op *OptimizedProcessor) Process(data []float64) []float64 {
    // Reuse the buffer if it's large enough
    if cap(op.buffer) < len(data) {
        op.buffer = make([]float64, len(data))
    }
    op.buffer = op.buffer[:len(data)]

    for i, value := range data {
        op.buffer[i] = value * 2.0 // Some computation
    }
    return op.buffer
}
Reusing buffers reduces garbage collection pressure and improves performance in WebAssembly environments.
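When many call sites need scratch buffers, a sync.Pool can serve the same purpose as a hand-managed buffer field. A minimal sketch under that assumption (bufPool, scale, and putBuf are illustrative names, not part of the code above):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable float64 slices so hot paths can avoid a
// fresh allocation on every call.
var bufPool = sync.Pool{
	New: func() any { return make([]float64, 0, 1024) },
}

// scale doubles each value into a pooled buffer. The caller returns
// the buffer with putBuf once finished with the result.
func scale(data []float64) []float64 {
	buf := bufPool.Get().([]float64)[:0]
	for _, v := range data {
		buf = append(buf, v*2.0)
	}
	return buf
}

// putBuf resets the buffer's length and returns it to the pool.
func putBuf(b []float64) { bufPool.Put(b[:0]) }

func main() {
	out := scale([]float64{1, 2, 3})
	fmt.Println(out) // [2 4 6]
	putBuf(out)
}
```

The trade-off versus a struct field is ownership: pooled buffers must be explicitly returned, but they work across goroutines without coordination.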
Algorithmic Optimization
Choose algorithms that work well with WebAssembly’s characteristics. Cache-friendly algorithms with predictable memory access patterns perform best:
// Cache-friendly matrix multiplication
func MultiplyMatrices(a, b [][]float64) [][]float64 {
    n := len(a)
    result := make([][]float64, n)
    for i := range result {
        result[i] = make([]float64, n)
    }

    // Block-wise multiplication for better cache performance.
    // (min is a built-in as of Go 1.21; define a helper on older versions.)
    blockSize := 64
    for i := 0; i < n; i += blockSize {
        for j := 0; j < n; j += blockSize {
            for k := 0; k < n; k += blockSize {
                // Process one block
                for ii := i; ii < min(i+blockSize, n); ii++ {
                    for jj := j; jj < min(j+blockSize, n); jj++ {
                        sum := 0.0
                        for kk := k; kk < min(k+blockSize, n); kk++ {
                            sum += a[ii][kk] * b[kk][jj]
                        }
                        result[ii][jj] += sum
                    }
                }
            }
        }
    }
    return result
}
This blocked approach is much faster than naive matrix multiplication for large matrices.
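Blocking helps, but memory layout matters too: a [][]float64 scatters rows across the heap, while a flat row-major []float64 keeps the inner loop walking memory sequentially. A sketch of that layout, with my own multiplyFlat helper and an i-k-j loop order chosen so both b and result are traversed in order:

```go
package main

import "fmt"

// multiplyFlat multiplies two n×n matrices stored row-major in flat
// slices. The i-k-j loop order walks b and result sequentially, which
// keeps accesses cache-friendly without explicit blocking.
func multiplyFlat(a, b []float64, n int) []float64 {
	result := make([]float64, n*n)
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			aik := a[i*n+k]
			for j := 0; j < n; j++ {
				result[i*n+j] += aik * b[k*n+j]
			}
		}
	}
	return result
}

func main() {
	// The 2×2 identity times a test matrix returns the test matrix.
	id := []float64{1, 0, 0, 1}
	m := []float64{1, 2, 3, 4}
	fmt.Println(multiplyFlat(id, m, 2)) // [1 2 3 4]
}
```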
Build-Time Optimizations
Compiler flags significantly impact WebAssembly performance. For production builds, I use:
# Production build with optimizations
GOOS=js GOARCH=wasm go build -ldflags="-s -w" -gcflags="-l=4" -o main.wasm main.go
The -ldflags="-s -w" flags strip the symbol table and debug information to shrink the binary, and -gcflags="-l=4" enables aggressive inlining, which can improve performance at the cost of larger binary size.
Profiling WebAssembly Applications
Profiling WebAssembly apps requires different techniques than native Go. I use a combination of browser dev tools and custom instrumentation:
type Profiler struct {
    timings map[string]time.Duration
}

// NewProfiler initializes the timings map; writing to a nil map panics.
func NewProfiler() *Profiler {
    return &Profiler{timings: make(map[string]time.Duration)}
}

func (p *Profiler) Time(name string, fn func()) {
    start := time.Now()
    fn()
    duration := time.Since(start)
    p.timings[name] = duration

    // Log to browser console
    js.Global().Get("console").Call("log",
        fmt.Sprintf("%s took %v", name, duration))
}

func (p *Profiler) GetReport() map[string]interface{} {
    report := make(map[string]interface{})
    for name, duration := range p.timings {
        report[name] = duration.Milliseconds()
    }
    return report
}
This gives you timing information that shows up in browser dev tools.
Data Transfer Optimization
Minimize data transfer between JavaScript and WebAssembly. Instead of passing individual values, use typed arrays for bulk data:
func ProcessImageData(this js.Value, args []js.Value) interface{} {
    // Receive Uint8ClampedArray directly
    imageData := args[0]
    width := args[1].Int()
    height := args[2].Int()

    // Process in place when possible
    for i := 0; i < width*height*4; i += 4 {
        r := imageData.Index(i).Int()
        g := imageData.Index(i + 1).Int()
        b := imageData.Index(i + 2).Int()

        // Convert to grayscale
        gray := int(0.299*float64(r) + 0.587*float64(g) + 0.114*float64(b))
        imageData.SetIndex(i, gray)
        imageData.SetIndex(i+1, gray)
        imageData.SetIndex(i+2, gray)
    }
    return "processed"
}
Processing data in place avoids shuttling the whole buffer back and forth as JavaScript values, though each Index and SetIndex call still pays a small per-call boundary cost.
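When per-element access is still too slow, an often faster pattern is to copy the entire typed array into Go once with js.CopyBytesToGo, run a pure-Go loop, and copy the result back with js.CopyBytesToJS. The pure-Go core of that approach might look like the sketch below (grayscaleRGBA is an illustrative name; the copy calls are left out so the function also runs outside the browser):

```go
package main

import "fmt"

// grayscaleRGBA converts RGBA pixels to grayscale in place using the
// standard luminance weights. In WebAssembly the byte slice would be
// filled once with js.CopyBytesToGo and written back with
// js.CopyBytesToJS, so the boundary is crossed twice per frame
// instead of once per byte.
func grayscaleRGBA(pixels []byte) {
	for i := 0; i+3 < len(pixels); i += 4 {
		r, g, b := float64(pixels[i]), float64(pixels[i+1]), float64(pixels[i+2])
		gray := byte(0.299*r + 0.587*g + 0.114*b)
		pixels[i], pixels[i+1], pixels[i+2] = gray, gray, gray
		// The alpha channel (pixels[i+3]) is left untouched.
	}
}

func main() {
	px := []byte{255, 0, 0, 255} // one opaque red pixel
	grayscaleRGBA(px)
	fmt.Println(px) // [76 76 76 255]
}
```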
Concurrency Optimization
Go’s goroutines work in WebAssembly, but they’re cooperative rather than preemptive. Use them for I/O operations and to keep the UI responsive:
func ProcessLargeDataset(this js.Value, args []js.Value) interface{} {
    data := args[0]
    callback := args[1]

    go func() {
        // Process data in background
        result := heavyComputation(data)

        // Call JavaScript callback with result
        callback.Invoke(result)
    }()
    return "processing started"
}
This keeps the main thread responsive while processing continues. Note that the goroutine still shares WebAssembly's single thread, so a long computation should yield periodically rather than run as one uninterrupted loop.
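Because the scheduler is cooperative, a tight loop inside a goroutine can still starve the event loop. One mitigation is to process data in chunks and yield between them; sumChunked below is an illustrative sketch of that pattern, not code from the original:

```go
package main

import (
	"fmt"
	"runtime"
)

// sumChunked processes data in fixed-size chunks and yields to the
// scheduler between them. In single-threaded WebAssembly this gives
// other goroutines (and hence the browser event loop) a chance to run.
func sumChunked(data []float64, chunk int) float64 {
	total := 0.0
	for start := 0; start < len(data); start += chunk {
		end := start + chunk
		if end > len(data) {
			end = len(data)
		}
		for _, v := range data[start:end] {
			total += v
		}
		runtime.Gosched() // cooperative yield point
	}
	return total
}

func main() {
	data := make([]float64, 10000)
	for i := range data {
		data[i] = 1
	}
	fmt.Println(sumChunked(data, 256)) // 10000
}
```

The chunk size trades throughput for responsiveness: larger chunks mean fewer yields but longer pauses between them.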
Common Performance Pitfalls
I’ve made every performance mistake possible with WebAssembly:
- Too many small function calls: Batch operations instead
- Excessive memory allocations: Reuse buffers and objects
- Ignoring cache locality: Use cache-friendly algorithms
- Not profiling: Measure before optimizing
- Premature optimization: Profile first, then optimize hot paths
Measuring Performance
Always measure performance with realistic data and usage patterns. Browser dev tools provide excellent profiling capabilities for WebAssembly:
- Use the Performance tab to see where time is spent
- Check the Memory tab for garbage collection issues
- Monitor network activity for large WebAssembly downloads
- Use console.time() for custom measurements
Real-World Optimization Example
Here’s how I optimized an image processing function that was too slow:
// Before: slow due to many small operations
func BlurImageSlow(imageData js.Value, width, height int) {
    for y := 1; y < height-1; y++ {
        for x := 1; x < width-1; x++ {
            // Get surrounding pixels one by one (slow)
            r, g, b := getPixelAverage(imageData, x, y, width)
            setPixel(imageData, x, y, r, g, b)
        }
    }
}

// After: fast due to batch processing
func BlurImageFast(imageData js.Value, width, height int) {
    // Process entire rows at once
    for y := 1; y < height-1; y++ {
        processRow(imageData, y, width)
    }
}
The optimized version is 10x faster because it minimizes JavaScript calls and processes data in larger chunks.
Performance Testing
I always test performance with realistic scenarios:
func BenchmarkProcessing(this js.Value, args []js.Value) interface{} {
    sizes := []int{100, 1000, 10000, 100000}
    results := make(map[string]interface{})

    for _, size := range sizes {
        data := generateTestData(size)
        start := time.Now()
        processData(data)
        duration := time.Since(start)
        results[fmt.Sprintf("size_%d", size)] = duration.Milliseconds()
    }
    return results
}
This helps identify performance characteristics at different scales.
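The same scaling check can also be run natively with the standard library's testing.Benchmark before compiling to wasm, which gives stable per-operation numbers without a browser in the loop (processData here is a stand-in for the real workload):

```go
package main

import (
	"fmt"
	"testing"
)

// processData is a stand-in for the real workload being measured.
func processData(data []float64) float64 {
	sum := 0.0
	for _, v := range data {
		sum += v * v
	}
	return sum
}

func main() {
	// testing.Benchmark can be called outside "go test", which makes
	// quick native scaling checks easy before compiling to wasm.
	for _, size := range []int{100, 10000} {
		data := make([]float64, size)
		result := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				processData(data)
			}
		})
		fmt.Printf("size %d: %d ns/op\n", size, result.NsPerOp())
	}
}
```

Native numbers won't match wasm numbers exactly, but the shape of the curve across sizes usually carries over.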
Performance optimization in WebAssembly requires a different mindset than traditional Go optimization. The boundary between Go and JavaScript dominates performance characteristics, so design your applications to minimize crossings and batch operations effectively.
Debugging comes next - tools and techniques for finding and fixing issues in WebAssembly applications.