Performance Optimization
Performance optimization in WebAssembly is different from both traditional Go and JavaScript optimization. I learned this the hard way when my first “optimized” WebAssembly app was slower than the JavaScript version it replaced. The problem wasn’t WebAssembly - it was my assumptions about what makes WebAssembly fast.
WebAssembly excels at CPU-intensive tasks with predictable memory patterns. It’s not automatically faster at everything, and the boundary between JavaScript and WebAssembly has real costs.
Understanding WebAssembly Performance
The biggest performance insight: WebAssembly is fast at sustained computational work, but crossing the boundary between JavaScript and WebAssembly has overhead. If you’re making thousands of tiny function calls, you might be slower than pure JavaScript.
Here’s a simple benchmark that demonstrates this:
// Efficient: batch processing
func ProcessBatch(this js.Value, args []js.Value) interface{} {
    data := args[0]
    length := data.Get("length").Int()
    result := 0.0
    for i := 0; i < length; i++ {
        value := data.Index(i).Float()
        result += math.Sqrt(value*value + 1) // Heavy computation
    }
    return result
}

// Inefficient: many small calls
func ProcessSingle(this js.Value, args []js.Value) interface{} {
    value := args[0].Float()
    return math.Sqrt(value*value + 1)
}
The batch version processes 10,000 items in ~5ms. Calling the single version 10,000 times takes ~200ms due to boundary crossing overhead.
Memory Management Optimization
Go’s garbage collector runs in WebAssembly, but memory allocation patterns affect performance differently than in native Go. I’ve learned to minimize allocations in hot paths:
type OptimizedProcessor struct {
    buffer []float64 // Reuse this buffer
}

func (op *OptimizedProcessor) Process(data []float64) []float64 {
    // Reuse the buffer if it's large enough
    if cap(op.buffer) < len(data) {
        op.buffer = make([]float64, len(data))
    }
    op.buffer = op.buffer[:len(data)]

    for i, value := range data {
        op.buffer[i] = value * 2.0 // Some computation
    }
    return op.buffer
}
Reusing buffers reduces garbage collection pressure and improves performance in WebAssembly environments.
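When many call sites need scratch buffers, a sync.Pool can serve the same purpose as a hand-managed buffer field. A minimal sketch under that assumption (bufPool, scale, and putBuf are illustrative names, not part of the code above):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable float64 slices so hot paths can avoid a
// fresh allocation on every call.
var bufPool = sync.Pool{
	New: func() any { return make([]float64, 0, 1024) },
}

// scale doubles each value into a pooled buffer. The caller returns
// the buffer with putBuf once finished with the result.
func scale(data []float64) []float64 {
	buf := bufPool.Get().([]float64)[:0]
	for _, v := range data {
		buf = append(buf, v*2.0)
	}
	return buf
}

// putBuf resets the buffer's length and returns it to the pool.
func putBuf(b []float64) { bufPool.Put(b[:0]) }

func main() {
	out := scale([]float64{1, 2, 3})
	fmt.Println(out) // [2 4 6]
	putBuf(out)
}
```

The trade-off versus a struct field is ownership: pooled buffers must be explicitly returned, but they work across goroutines without coordination.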
Algorithmic Optimization
Choose algorithms that work well with WebAssembly’s characteristics. Cache-friendly algorithms with predictable memory access patterns perform best:
// Cache-friendly matrix multiplication
func MultiplyMatrices(a, b [][]float64) [][]float64 {
    n := len(a)
    result := make([][]float64, n)
    for i := range result {
        result[i] = make([]float64, n)
    }

    // Block-wise multiplication for better cache performance.
    // (min is a built-in as of Go 1.21; define a helper on older versions.)
    blockSize := 64
    for i := 0; i < n; i += blockSize {
        for j := 0; j < n; j += blockSize {
            for k := 0; k < n; k += blockSize {
                // Process one block
                for ii := i; ii < min(i+blockSize, n); ii++ {
                    for jj := j; jj < min(j+blockSize, n); jj++ {
                        sum := 0.0
                        for kk := k; kk < min(k+blockSize, n); kk++ {
                            sum += a[ii][kk] * b[kk][jj]
                        }
                        result[ii][jj] += sum
                    }
                }
            }
        }
    }
    return result
}
This blocked approach is much faster than naive matrix multiplication for large matrices.
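Blocking helps, but memory layout matters too: a [][]float64 scatters rows across the heap, while a flat row-major []float64 keeps the inner loop walking memory sequentially. A sketch of that layout, with my own multiplyFlat helper and an i-k-j loop order chosen so both b and result are traversed in order:

```go
package main

import "fmt"

// multiplyFlat multiplies two n×n matrices stored row-major in flat
// slices. The i-k-j loop order walks b and result sequentially, which
// keeps accesses cache-friendly without explicit blocking.
func multiplyFlat(a, b []float64, n int) []float64 {
	result := make([]float64, n*n)
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			aik := a[i*n+k]
			for j := 0; j < n; j++ {
				result[i*n+j] += aik * b[k*n+j]
			}
		}
	}
	return result
}

func main() {
	// The 2×2 identity times a test matrix returns the test matrix.
	id := []float64{1, 0, 0, 1}
	m := []float64{1, 2, 3, 4}
	fmt.Println(multiplyFlat(id, m, 2)) // [1 2 3 4]
}
```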
Build-Time Optimizations
Compiler flags significantly impact WebAssembly performance. For production builds, I use:
# Production build with optimizations
GOOS=js GOARCH=wasm go build -ldflags="-s -w" -gcflags="-l=4" -o main.wasm main.go
The -ldflags="-s -w" flags strip the symbol table and debug information to shrink the binary, and -gcflags="-l=4" enables aggressive inlining, which can improve performance at the cost of larger binary size.
Profiling WebAssembly Applications
Profiling WebAssembly apps requires different techniques than native Go. I use a combination of browser dev tools and custom instrumentation:
type Profiler struct {
    timings map[string]time.Duration
}

// NewProfiler initializes the timings map; writing to a nil map panics.
func NewProfiler() *Profiler {
    return &Profiler{timings: make(map[string]time.Duration)}
}

func (p *Profiler) Time(name string, fn func()) {
    start := time.Now()
    fn()
    duration := time.Since(start)
    p.timings[name] = duration

    // Log to browser console
    js.Global().Get("console").Call("log",
        fmt.Sprintf("%s took %v", name, duration))
}

func (p *Profiler) GetReport() map[string]interface{} {
    report := make(map[string]interface{})
    for name, duration := range p.timings {
        report[name] = duration.Milliseconds()
    }
    return report
}
This gives you timing information that shows up in browser dev tools.
Data Transfer Optimization
Minimize data transfer between JavaScript and WebAssembly. Instead of passing individual values, use typed arrays for bulk data:
func ProcessImageData(this js.Value, args []js.Value) interface{} {
    // Receive Uint8ClampedArray directly
    imageData := args[0]
    width := args[1].Int()
    height := args[2].Int()

    // Process in place when possible
    for i := 0; i < width*height*4; i += 4 {
        r := imageData.Index(i).Int()
        g := imageData.Index(i + 1).Int()
        b := imageData.Index(i + 2).Int()

        // Convert to grayscale
        gray := int(0.299*float64(r) + 0.587*float64(g) + 0.114*float64(b))
        imageData.SetIndex(i, gray)
        imageData.SetIndex(i+1, gray)
        imageData.SetIndex(i+2, gray)
    }
    return "processed"
}
Processing data in place avoids shuttling the whole buffer back and forth as JavaScript values, though each Index and SetIndex call still pays a small per-call boundary cost.
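When per-element access is still too slow, an often faster pattern is to copy the entire typed array into Go once with js.CopyBytesToGo, run a pure-Go loop, and copy the result back with js.CopyBytesToJS. The pure-Go core of that approach might look like the sketch below (grayscaleRGBA is an illustrative name; the copy calls are left out so the function also runs outside the browser):

```go
package main

import "fmt"

// grayscaleRGBA converts RGBA pixels to grayscale in place using the
// standard luminance weights. In WebAssembly the byte slice would be
// filled once with js.CopyBytesToGo and written back with
// js.CopyBytesToJS, so the boundary is crossed twice per frame
// instead of once per byte.
func grayscaleRGBA(pixels []byte) {
	for i := 0; i+3 < len(pixels); i += 4 {
		r, g, b := float64(pixels[i]), float64(pixels[i+1]), float64(pixels[i+2])
		gray := byte(0.299*r + 0.587*g + 0.114*b)
		pixels[i], pixels[i+1], pixels[i+2] = gray, gray, gray
		// The alpha channel (pixels[i+3]) is left untouched.
	}
}

func main() {
	px := []byte{255, 0, 0, 255} // one opaque red pixel
	grayscaleRGBA(px)
	fmt.Println(px) // [76 76 76 255]
}
```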
Concurrency Optimization
Go’s goroutines work in WebAssembly, but they’re cooperative rather than preemptive. Use them for I/O operations and to keep the UI responsive:
func ProcessLargeDataset(this js.Value, args []js.Value) interface{} {
    data := args[0]
    callback := args[1]

    go func() {
        // Process data in background
        result := heavyComputation(data)

        // Call JavaScript callback with result
        callback.Invoke(result)
    }()
    return "processing started"
}
This keeps the main thread responsive while processing continues. Note that the goroutine still shares WebAssembly's single thread, so a long computation should yield periodically rather than run as one uninterrupted loop.
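Because the scheduler is cooperative, a tight loop inside a goroutine can still starve the event loop. One mitigation is to process data in chunks and yield between them; sumChunked below is an illustrative sketch of that pattern, not code from the original:

```go
package main

import (
	"fmt"
	"runtime"
)

// sumChunked processes data in fixed-size chunks and yields to the
// scheduler between them. In single-threaded WebAssembly this gives
// other goroutines (and hence the browser event loop) a chance to run.
func sumChunked(data []float64, chunk int) float64 {
	total := 0.0
	for start := 0; start < len(data); start += chunk {
		end := start + chunk
		if end > len(data) {
			end = len(data)
		}
		for _, v := range data[start:end] {
			total += v
		}
		runtime.Gosched() // cooperative yield point
	}
	return total
}

func main() {
	data := make([]float64, 10000)
	for i := range data {
		data[i] = 1
	}
	fmt.Println(sumChunked(data, 256)) // 10000
}
```

The chunk size trades throughput for responsiveness: larger chunks mean fewer yields but longer pauses between them.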
Common Performance Pitfalls
I’ve made every performance mistake possible with WebAssembly:
- Too many small function calls: Batch operations instead
- Excessive memory allocations: Reuse buffers and objects
- Ignoring cache locality: Use cache-friendly algorithms
- Not profiling: Measure before optimizing
- Premature optimization: Profile first, then optimize hot paths
Measuring Performance
Always measure performance with realistic data and usage patterns. Browser dev tools provide excellent profiling capabilities for WebAssembly:
- Use the Performance tab to see where time is spent
- Check the Memory tab for garbage collection issues
- Monitor network activity for large WebAssembly downloads
- Use console.time() for custom measurements
Real-World Optimization Example
Here’s how I optimized an image processing function that was too slow:
// Before: slow due to many small operations
func BlurImageSlow(imageData js.Value, width, height int) {
    for y := 1; y < height-1; y++ {
        for x := 1; x < width-1; x++ {
            // Get surrounding pixels one by one (slow)
            r, g, b := getPixelAverage(imageData, x, y, width)
            setPixel(imageData, x, y, r, g, b)
        }
    }
}

// After: fast due to batch processing
func BlurImageFast(imageData js.Value, width, height int) {
    // Process entire rows at once
    for y := 1; y < height-1; y++ {
        processRow(imageData, y, width)
    }
}
The optimized version is 10x faster because it minimizes JavaScript calls and processes data in larger chunks.
Performance Testing
I always test performance with realistic scenarios:
func BenchmarkProcessing(this js.Value, args []js.Value) interface{} {
    sizes := []int{100, 1000, 10000, 100000}
    results := make(map[string]interface{})

    for _, size := range sizes {
        data := generateTestData(size)
        start := time.Now()
        processData(data)
        duration := time.Since(start)
        results[fmt.Sprintf("size_%d", size)] = duration.Milliseconds()
    }
    return results
}
This helps identify performance characteristics at different scales.
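The same scaling check can also be run natively with the standard library's testing.Benchmark before compiling to wasm, which gives stable per-operation numbers without a browser in the loop (processData here is a stand-in for the real workload):

```go
package main

import (
	"fmt"
	"testing"
)

// processData is a stand-in for the real workload being measured.
func processData(data []float64) float64 {
	sum := 0.0
	for _, v := range data {
		sum += v * v
	}
	return sum
}

func main() {
	// testing.Benchmark can be called outside "go test", which makes
	// quick native scaling checks easy before compiling to wasm.
	for _, size := range []int{100, 10000} {
		data := make([]float64, size)
		result := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				processData(data)
			}
		})
		fmt.Printf("size %d: %d ns/op\n", size, result.NsPerOp())
	}
}
```

Native numbers won't match wasm numbers exactly, but the shape of the curve across sizes usually carries over.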
Performance optimization in WebAssembly requires a different mindset than traditional Go optimization. The boundary between Go and JavaScript dominates performance characteristics, so design your applications to minimize crossings and batch operations effectively.
Debugging comes next - tools and techniques for finding and fixing issues in WebAssembly applications.