Lab 12: Performance Architecture

Time: 60 minutes | Level: Architect | Docker: golang:1.22-alpine

Overview

Go performance at scale: zero-allocation patterns (sync.Pool, stack allocation, avoid interface boxing), SIMD via assembly stubs, //go:noescape pragma, memory-mapped files, bytes.Buffer vs strings.Builder benchmarks, and pprof profiling.


Step 1: Zero-Allocation Patterns

package perf

import (
	"strconv"
	"sync"
)

// Pattern 1: sync.Pool — reuse allocations
// Cost: ~15ns per Get/Put (vs ~50ns new alloc)
var bufferPool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 0, 4096)
		return &buf
	},
}

func processRequest(data []byte) []byte {
	// Get from pool (might be reused buffer)
	bufPtr := bufferPool.Get().(*[]byte)
	buf := (*bufPtr)[:0]  // Reset length, keep capacity
	defer func() {
		*bufPtr = buf
		bufferPool.Put(bufPtr)
	}()

	// Use buf without allocation
	buf = append(buf, data...)
	buf = append(buf, " processed"...)

	// Copy result before returning buf to pool
	result := make([]byte, len(buf))
	copy(result, buf)
	return result
}

// Pattern 2: Avoid interface boxing — use concrete types
// BAD: each fmt.Sprintf allocates for interface{}
//   func process(v interface{}) string { return fmt.Sprintf("%v", v) }

// GOOD: concrete type, no boxing
func processInt(v int) string {
	return strconv.Itoa(v) // no interface boxing; small values return cached strings
}

// Pattern 3: Slice capacity hints — avoid repeated copying
func collectItems(source <-chan string, expected int) []string {
	items := make([]string, 0, expected) // Pre-allocate
	for item := range source {
		items = append(items, item)
	}
	return items
}

Step 2: strings.Builder vs bytes.Buffer


Step 3: Escape Analysis — Stack vs Heap


Step 4: SIMD and Assembly Stubs


Step 5: Memory-Mapped Files


Step 6: Profiling with pprof


Step 7: Concurrency Patterns for Performance


Step 8: Capstone — Allocation Benchmarks

📸 Verified Output:


Summary

| Technique | Mechanism | Benefit |
|---|---|---|
| sync.Pool | Reuse heap objects | Reduce GC pressure |
| strings.Builder | Grow() + single final alloc | Avoids O(n²) recopying |
| Stack allocation | Return by value, no pointers | Zero GC cost |
| Escape analysis | -gcflags="-m" | Find hidden allocs |
| Assembly SIMD | //go:noescape stubs | 4-8× throughput |
| Memory-map | syscall.Mmap | Zero-copy file reads |
| Batch processing | Group items before send | Amortize overhead |
| go test -bench | -cpuprofile -memprofile | Measure before optimizing |

Last updated