
perf: v1.2.2 — core allocation reduction + security + DX#147

Merged
FumingPower3925 merged 11 commits into main from perf/v1.2.2-alloc-reduction
Apr 1, 2026
FumingPower3925 commented Apr 1, 2026

Summary

Comprehensive v1.2.2 release combining allocation reduction, security hardening, developer experience improvements, and code quality cleanup.

Performance (v1.2.1 → v1.2.2)

| Middleware | Before (time / allocs) | After (time / allocs) | Speedup | Allocs |
|---|---|---|---|---|
| Logger | 722 ns / 10 | 319 ns / 0 | 2.3x | -100% |
| Recovery | 252 ns / 10 | 92 ns / 0 | 2.7x | -100% |
| CORS Preflight | 359 ns / 10 | 141 ns / 0 | 2.5x | -100% |
| CORS Simple | 322 ns / 12 | 96 ns / 0 | 3.4x | -100% |
| RateLimit | 406 ns / 13 | 173 ns / 1 | 2.3x | -92% |
| RequestID | 375 ns / 14 | 157 ns / 2 | 2.4x | -86% |
| Timeout | 764 ns / 17 | 657 ns / 8 | 1.2x | -53% |
| BodyLimit | 340 ns / 12 | 97 ns / 0 | 3.5x | -100% |
| BasicAuth | 451 ns / 17 | 144 ns / 2 | 3.1x | -88% |

Celeris beats Fiber v3 on all 9 middleware benchmarks.

Security

  • CL-CL smuggling prevention: H1 parser rejects duplicate Content-Length headers with conflicting values (RFC 7230 §3.3.3)
  • Symlink traversal prevention: FileFromDir resolves symlinks via filepath.EvalSymlinks + prefix recheck
  • Directory listing prevention: FileFromDir rejects directory paths with IsDir() check
  • BasicAuth buffer guard: DecodedLen pre-check prevents panic on oversized credentials
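As a rough illustration of the CL-CL rule above, the sketch below scans already-parsed header pairs and rejects a request whose Content-Length values disagree, while tolerating exact duplicates. The function `contentLength`, the error value, and the `[][2]string` header shape are assumptions made for this example; the actual H1 parser works on raw bytes and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errConflictingCL = errors.New("conflicting Content-Length headers")

// parseCL accepts only a non-empty run of ASCII digits, so "+5" is rejected.
func parseCL(v string) (int64, bool) {
	if v == "" {
		return 0, false
	}
	for i := 0; i < len(v); i++ {
		if v[i] < '0' || v[i] > '9' {
			return 0, false
		}
	}
	n, err := strconv.ParseInt(v, 10, 64)
	return n, err == nil
}

// contentLength (hypothetical) returns -1 when no Content-Length is present,
// the agreed value when all occurrences match, and an error otherwise.
func contentLength(headers [][2]string) (int64, error) {
	length := int64(-1)
	for _, h := range headers {
		if !strings.EqualFold(h[0], "content-length") {
			continue
		}
		n, ok := parseCL(h[1])
		if !ok {
			return 0, fmt.Errorf("invalid Content-Length %q", h[1])
		}
		if length >= 0 && n != length {
			return 0, errConflictingCL
		}
		length = n
	}
	return length, nil
}

func main() {
	same := [][2]string{{"Content-Length", "5"}, {"content-length", "5"}}
	diff := [][2]string{{"Content-Length", "5"}, {"content-length", "6"}}
	fmt.Println(contentLength(same))
	fmt.Println(contentLength(diff))
}
```

Rejecting only conflicting values keeps requests with harmless repeated headers working, which matches the wording "duplicate Content-Length headers with conflicting values".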

Core Optimizations

  • Inline respHdrBuf / paramBuf / handlerBuf / hdrBuf backing arrays
  • Pre-allocated keys map, lazy Data/OutboundBuffer in NewStream
  • Stream recycling in celeristest via ResetForPool
  • Config pooling with inline buffers in celeristest
  • Pre-allocated notFound/methodNotAllowed handler chains
  • Hoisted CRLF/cookie replacers, inline fast-path scans in SetHeader/AddHeader
  • clear() on backing arrays in reset for GC hygiene
  • Consolidated 100MB body limit constants

DX

  • Header(key) auto-lowercases uppercase keys
  • FormValueOK (renamed from FormValueOk, deprecated alias kept)
  • QueryBool, QueryInt64, ParamDefault convenience methods

Code Quality

  • Removed dead internal/timer/ package
  • SetCookie Builder.Grow(128) pre-allocation
  • Mock ResponseWriter copies headers (prevents clear() aliasing)

Test plan

  • go test -race ./... passes (all packages except pre-existing h2spec)
  • go vet ./... clean
  • golangci-lint clean
  • Cross-platform build (linux/amd64, linux/arm64, darwin/arm64)
  • 32-agent verification sweep (security, correctness, DX, docs, concurrency)
  • Security review: no vulnerabilities found
  • Benchstat comparison with n=10 against v1.2.1 baseline
  • Full 5-framework cross-benchmark (Celeris, Fiber, Echo, Chi, Stdlib)

Initialize respHeaders to use the inline respHdrBuf backing array in
the context pool constructor and reset(). This eliminates the heap
allocation triggered by the first SetHeader() call via append().

Fix the aliasing hazard in Blob() where respHeaders and respHdrBuf now
share the same backing array — copy user headers to a stack temporary
before overwriting the buffer with content-type and content-length.
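The aliasing hazard described here can be sketched as follows. The type `ctx`, the method `blobHeaders`, and the buffer size of 8 are assumptions for illustration only; the field names follow the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

const hdrBufSize = 8 // assumed size of the inline array

// ctx is a hypothetical stand-in for the request context.
type ctx struct {
	respHdrBuf  [hdrBufSize][2]string
	respHeaders [][2]string
}

func newCtx() *ctx {
	c := &ctx{}
	c.respHeaders = c.respHdrBuf[:0] // anchor the slice to the inline array
	return c
}

// setHeader appends without allocating while the inline array has room.
func (c *ctx) setHeader(k, v string) {
	c.respHeaders = append(c.respHeaders, [2]string{k, v})
}

// blobHeaders builds the final header list inside respHdrBuf. Because
// respHeaders aliases that array, the user headers are copied to a
// temporary first; writing content-type into slot 0 would otherwise
// overwrite the first user header before it is read.
func (c *ctx) blobHeaders(contentType string, bodyLen int) [][2]string {
	var tmp [hdrBufSize][2]string
	user := append(tmp[:0], c.respHeaders...) // grows on the heap only past 8 entries

	out := c.respHdrBuf[:0]
	out = append(out, [2]string{"content-type", contentType})
	out = append(out, [2]string{"content-length", strconv.Itoa(bodyLen)})
	out = append(out, user...)
	return out
}

func main() {
	c := newCtx()
	c.setHeader("x-request-id", "abc")
	fmt.Println(c.blobHeaders("text/plain", 5))
}
```

Without the temporary copy, the first `append` into `out` would replace `x-request-id` in place, since both slices start at `respHdrBuf[0]`.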

Initialize keys with make(map[string]any, 4) in the pool constructor
and use clear() in reset() instead of niling the map. This retains
the hash table buckets across requests, eliminating the heap allocation
on the first c.Set() call.
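A minimal sketch of the map-retention idea, assuming a simplified `ctx` type with only the `keys` field (the `Set`/`Get` methods here are illustrative, not the library's exact signatures). The `clear` builtin requires Go 1.21 or later.

```go
package main

import "fmt"

// ctx is a hypothetical, stripped-down context holding only the keys map.
type ctx struct {
	keys map[string]any
}

// newCtx plays the role of the pool constructor: the map is created once.
func newCtx() *ctx {
	return &ctx{keys: make(map[string]any, 4)}
}

func (c *ctx) Set(k string, v any) { c.keys[k] = v }

func (c *ctx) Get(k string) (any, bool) {
	v, ok := c.keys[k]
	return v, ok
}

// reset empties the map in place instead of setting it to nil, so the
// next request can call Set without triggering a new map allocation.
func (c *ctx) reset() { clear(c.keys) }

func main() {
	c := newCtx()
	c.Set("user", "alice")
	c.reset()
	_, ok := c.Get("user")
	fmt.Println(len(c.keys), ok) // prints "0 false"
}
```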

Remove eager getBuf() for OutboundBuffer from NewStream(). The buffer
is only needed when flow control prevents immediate sends. BufferOutbound()
already has a nil guard and now uses getBuf() (pool) instead of
new(bytes.Buffer).

Saves 1 alloc per NewStream() call — all test contexts and H2 streams
that don't need flow-control buffering.
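The lazy-buffer pattern might look roughly like this. The `stream` type, the `bufPool` variable, and the `release` method are assumptions for the sketch; only `getBuf` and the nil guard in `BufferOutbound` are named in the commit message.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool backing getBuf.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

func getBuf() *bytes.Buffer {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset()
	return b
}

// stream is a stripped-down stand-in for the real Stream type.
type stream struct {
	outbound *bytes.Buffer // stays nil until flow control forces buffering
}

// bufferOutbound takes a pooled buffer on first use only.
func (s *stream) bufferOutbound(p []byte) {
	if s.outbound == nil {
		s.outbound = getBuf()
	}
	s.outbound.Write(p)
}

// release returns the buffer to the pool if one was ever taken.
func (s *stream) release() {
	if s.outbound != nil {
		bufPool.Put(s.outbound)
		s.outbound = nil
	}
}

func main() {
	s := &stream{}
	fmt.Println(s.outbound == nil) // no buffer until data must be queued
	s.bufferOutbound([]byte("queued"))
	fmt.Println(s.outbound.String())
	s.release()
}
```

Streams that send immediately never touch the pool, which is where the saved allocation per `NewStream()` call comes from.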

Add hdrBuf [16][2]string to Stream struct and use it as the backing
array for Headers in NewStream, NewH1Stream, Release, ResetH1Stream,
and ResetH2StreamInline. This eliminates the make([][2]string, 0, N)
heap allocation on first pool retrieval and re-anchors to the inline
buffer on release (in case append grew beyond 16).
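Re-anchoring after growth can be sketched like this, assuming a reduced `stream` type with only the two fields named above. Clearing the array on release follows the "GC hygiene" note elsewhere in this PR; `clear` on a slice requires Go 1.21 or later.

```go
package main

import "fmt"

// stream is a hypothetical, reduced version of the Stream struct.
type stream struct {
	hdrBuf  [16][2]string
	Headers [][2]string
}

func newStream() *stream {
	s := &stream{}
	s.Headers = s.hdrBuf[:0] // no make() needed: the array lives in the struct
	return s
}

// release drops stale strings and points Headers back at the inline
// array, discarding any larger heap slice that append may have created.
func (s *stream) release() {
	clear(s.hdrBuf[:])
	s.Headers = s.hdrBuf[:0]
}

func main() {
	s := newStream()
	for i := 0; i < 20; i++ {
		s.Headers = append(s.Headers, [2]string{"k", "v"})
	}
	fmt.Println(cap(s.Headers) > 16) // true: append moved to the heap
	s.release()
	fmt.Println(len(s.Headers), cap(s.Headers)) // prints "0 16"
}
```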

Use base64.StdEncoding.Decode with a [128]byte stack buffer instead of
DecodeString which heap-allocates. Convert the auth payload string to
[]byte via unsafe.Slice(unsafe.StringData(...)) without allocation
(read-only, safe for Decode which only reads src).

Eliminates the intermediate []byte heap allocation; the two returned
strings (username, password) are individually smaller than the full
decoded buffer.

Bundle ResponseRecorder and recorderWriter into a single recorderCombo
struct managed by sync.Pool. NewContext gets the combo from the pool
instead of allocating two heap objects per call. ReleaseContext extracts
the combo via the new ctxkit.GetResponseWriter hook and returns it.

WriteResponse reuses the Body slice via append(w.rec.Body[:0], body...)
instead of allocating a new []byte each time.
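One way to picture the bundling is shown below. All type and function names other than `recorderCombo` and `WriteResponse` are invented for the sketch, and the real `WriteResponse` signature is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// responseRecorder and recorderWriter are simplified stand-ins.
type responseRecorder struct {
	Status int
	Body   []byte
}

type recorderWriter struct {
	rec *responseRecorder
}

// WriteResponse reuses the existing Body capacity instead of allocating.
func (w *recorderWriter) WriteResponse(status int, body []byte) {
	w.rec.Status = status
	w.rec.Body = append(w.rec.Body[:0], body...)
}

// recorderCombo holds both objects in one allocation.
type recorderCombo struct {
	rec    responseRecorder
	writer recorderWriter
}

var comboPool = sync.Pool{New: func() any {
	c := &recorderCombo{}
	c.writer.rec = &c.rec // wire the writer to its sibling recorder once
	return c
}}

func getCombo() *recorderCombo { return comboPool.Get().(*recorderCombo) }

// putCombo resets the recorder but keeps the Body capacity for reuse.
func putCombo(c *recorderCombo) {
	c.rec.Status = 0
	c.rec.Body = c.rec.Body[:0]
	comboPool.Put(c)
}

func main() {
	c := getCombo()
	c.writer.WriteResponse(200, []byte("hello"))
	fmt.Println(c.rec.Status, string(c.rec.Body)) // prints "200 hello"
	putCombo(c)
}
```

A caller must finish reading `Body` before the combo goes back to the pool, since the next user overwrites the same backing array.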

Comprehensive allocation elimination across the request lifecycle:

- Stream.Data lazy allocation: remove eager getBuf() from NewStream;
  Data is allocated on first write via GetBuf()/AddData()
- Stream.ResetForPool: new function for test recycling without Cancel
  (avoids race with context.WithTimeout propagation goroutines)
- Stream.HasDoneCh: detect derived contexts for safe recycling decisions
- Context.handlerBuf [8]HandlerFunc: inline handler chain buffer;
  SetHandlers uses it for chains <=8 (avoids make([]HandlerFunc))
- ctxkit.GetStream hook: enables celeristest to extract and recycle
  the stream before context release
- celeristest config pooling: sync.Pool with inline headersBuf[4] and
  handlersBuf[4]; WithHandlers uses inline buffer for <=4 handlers
- celeristest headers: append to hdrBuf instead of slice literal
- celeristest ReleaseContext: recycles stream to pool (ResetForPool)
  for streams without derived contexts; Cancel-only for streams with
  doneCh to avoid goroutine reference races
- BasicAuth: single string allocation (decoded[:i], decoded[i+1:])
  instead of two separate string() conversions
- stdlib.go, tests: s.Data.Write → s.GetBuf().Write for lazy Data

Result: 0 allocs/op on 6 of 9 middleware benchmarks (was 10-17).

Performance:
- Embedded paramBuf [4]Param on Context for inline route parameter storage
- Hoisted stripCRLF/stripCookieUnsafe replacers to package-level vars
- Inline fast-path scans in AddHeader/SetHeader (skip non-inlineable calls)
- Consolidated three 100MB constants into single maxBodySize
- Scheme() fast-path for common "http"/"https" values
- Pre-allocated notFound/methodNotAllowed handler chains on routerAdapter
- Clear backing arrays (respHdrBuf, paramBuf, hdrBuf, Trailers) in reset
  for GC hygiene — prevents stale strings from being pinned
- SetCookie Builder.Grow(128) pre-allocation
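The hoisted replacer and the inline fast-path scan can be combined roughly as below. The exact character set that `stripCRLF` removes in the library is not stated in this PR, so stripping only CR and LF is an assumption, as is the variable name `crlfReplacer`.

```go
package main

import (
	"fmt"
	"strings"
)

// crlfReplacer is built once at package level rather than on every call.
var crlfReplacer = strings.NewReplacer("\r", "", "\n", "")

// stripCRLF scans the value with a simple loop first and only pays for
// the replacer when a CR or LF is actually present. Clean values, the
// common case, are returned unchanged with no allocation.
func stripCRLF(s string) string {
	for i := 0; i < len(s); i++ {
		if s[i] == '\r' || s[i] == '\n' {
			return crlfReplacer.Replace(s)
		}
	}
	return s
}

func main() {
	fmt.Printf("%q\n", stripCRLF("text/plain"))
	fmt.Printf("%q\n", stripCRLF("ok\r\nSet-Cookie: x=1"))
}
```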

Security:
- H1 parser: reject duplicate Content-Length with conflicting values
  (CL-CL request smuggling prevention, RFC 7230 §3.3.3)
- FileFromDir: resolve symlinks via filepath.EvalSymlinks + recheck
  prefix to prevent symlink-based directory traversal
- FileFromDir: reject directory paths with IsDir() check
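The two FileFromDir checks could be combined along these lines. `resolveInDir` and `demoTree` are hypothetical helpers written for this example, not the library's code; the sketch assumes a platform where `os.Symlink` is available.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path"
	"path/filepath"
	"strings"
)

var errForbidden = errors.New("forbidden path")

// resolveInDir returns the real path of name inside baseDir, or an error.
func resolveInDir(baseDir, name string) (string, error) {
	// Resolve the base too, so both sides of the prefix check are real paths.
	base, err := filepath.EvalSymlinks(baseDir)
	if err != nil {
		return "", err
	}
	if base, err = filepath.Abs(base); err != nil {
		return "", err
	}
	// Rooting the name before cleaning drops leading ".." segments.
	candidate := filepath.Join(base, filepath.FromSlash(path.Clean("/"+name)))

	// Follow symlinks, then recheck that the target is still under base.
	resolved, err := filepath.EvalSymlinks(candidate)
	if err != nil {
		return "", err
	}
	prefix := base
	if !strings.HasSuffix(prefix, string(filepath.Separator)) {
		prefix += string(filepath.Separator)
	}
	if resolved != base && !strings.HasPrefix(resolved, prefix) {
		return "", errForbidden
	}
	// Refuse directories so no listing can be served.
	info, err := os.Stat(resolved)
	if err != nil {
		return "", err
	}
	if info.IsDir() {
		return "", errForbidden
	}
	return resolved, nil
}

// demoTree builds public/index.html plus public/link -> ../secret.txt.
func demoTree() (public string, cleanup func(), err error) {
	root, err := os.MkdirTemp("", "filefromdir")
	if err != nil {
		return "", nil, err
	}
	cleanup = func() { os.RemoveAll(root) }
	public = filepath.Join(root, "public")
	secret := filepath.Join(root, "secret.txt")
	steps := []func() error{
		func() error { return os.Mkdir(public, 0o755) },
		func() error { return os.WriteFile(filepath.Join(public, "index.html"), []byte("hi"), 0o644) },
		func() error { return os.WriteFile(secret, []byte("x"), 0o644) },
		func() error { return os.Symlink(secret, filepath.Join(public, "link")) },
	}
	for _, step := range steps {
		if err := step(); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return public, cleanup, nil
}

func main() {
	public, cleanup, err := demoTree()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	for _, name := range []string{"index.html", "link", "../secret.txt", "."} {
		_, err := resolveInDir(public, name)
		fmt.Printf("%-16q allowed=%v\n", name, err == nil)
	}
}
```

The lexical clean alone would accept `link`, because the name itself contains no `..`; only the second check after `filepath.EvalSymlinks` catches that the target sits outside the base directory.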

DX:
- Header(key) auto-lowercases uppercase keys (net/http compat)
- FormValueOK canonical name (FormValueOk deprecated alias kept)
- QueryBool(key, default) convenience method
- QueryInt64(key, default) convenience method
- ParamDefault(key, default) convenience method
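The three convenience methods might behave as in the sketch below. The PR does not spell out their parsing rules, so using `strconv.ParseBool` and `strconv.ParseInt`, and falling back to the default on a missing, empty, or unparsable value, are assumptions. The `ctx` type and `newCtx` constructor are invented for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// ctx is a hypothetical context holding parsed query and route params.
type ctx struct {
	query  url.Values
	params map[string]string
}

func newCtx(rawQuery string, params map[string]string) *ctx {
	q, _ := url.ParseQuery(rawQuery) // valid pairs are kept even on error
	return &ctx{query: q, params: params}
}

// QueryBool returns def when the key is absent or not a recognised bool.
func (c *ctx) QueryBool(key string, def bool) bool {
	b, err := strconv.ParseBool(c.query.Get(key))
	if err != nil {
		return def
	}
	return b
}

// QueryInt64 returns def when the key is absent or not a base-10 integer.
func (c *ctx) QueryInt64(key string, def int64) int64 {
	n, err := strconv.ParseInt(c.query.Get(key), 10, 64)
	if err != nil {
		return def
	}
	return n
}

// ParamDefault returns def when the route parameter is absent or empty.
func (c *ctx) ParamDefault(key, def string) string {
	if v := c.params[key]; v != "" {
		return v
	}
	return def
}

func main() {
	c := newCtx("debug=true&limit=50", map[string]string{"id": "7"})
	fmt.Println(c.QueryBool("debug", false), c.QueryInt64("limit", 10))
	fmt.Println(c.ParamDefault("id", "0"), c.ParamDefault("slug", "none"))
}
```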

Code quality:
- Removed dead internal/timer/ package
- Mock ResponseWriter copies headers (prevents clear() aliasing)
- Blob header assembly uses len(c.respHdrBuf) instead of magic 8

Tests:
- TestQueryBool (16 cases), TestQueryInt64 (8 cases), TestParamDefault
- TestWithHandlers (4 tests: chain order, many handlers, error, abort)
- TestFormValueOkDeprecated
- TestParseRequest_DuplicateContentLength (3 subtests × 2 modes)

base64.StdEncoding.Decode panics (rather than returning an error) when the decoded
output exceeds the destination buffer. Add a DecodedLen pre-check
before decoding to gracefully return ok=false for credentials exceeding
the 128-byte stack buffer, preventing a panic on crafted Authorization
headers with long payloads.
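Put together with the stack-buffer decode from the earlier BasicAuth commit, the guarded path might look like the sketch below. `parseBasic` is a hypothetical name, and the payload is assumed to be the part of the Authorization header after the `Basic ` prefix. `unsafe.StringData` requires Go 1.20 or later.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
	"unsafe"
)

// parseBasic decodes a base64 "user:pass" payload without an intermediate
// heap []byte. It reports ok=false instead of panicking on long input.
func parseBasic(payload string) (user, pass string, ok bool) {
	var buf [128]byte
	// Guard: Decode may write up to DecodedLen bytes and would panic
	// with an out-of-range index if that exceeded len(buf).
	if payload == "" || base64.StdEncoding.DecodedLen(len(payload)) > len(buf) {
		return "", "", false
	}
	// View the string bytes without copying; Decode only reads src.
	src := unsafe.Slice(unsafe.StringData(payload), len(payload))
	n, err := base64.StdEncoding.Decode(buf[:], src)
	if err != nil {
		return "", "", false
	}
	// One string conversion; user and pass are substrings of it.
	decoded := string(buf[:n])
	i := strings.IndexByte(decoded, ':')
	if i < 0 {
		return "", "", false
	}
	return decoded[:i], decoded[i+1:], true
}

func main() {
	fmt.Println(parseBasic("dXNlcjpwYXNz")) // prints "user pass true"
	_, _, ok := parseBasic(strings.Repeat("A", 200))
	fmt.Println(ok) // prints "false"
}
```

The pre-check is conservative: `DecodedLen` returns the maximum possible output, so a payload that would decode to exactly 128 bytes or fewer but reports a larger bound is rejected too.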

FumingPower3925 self-assigned this Apr 1, 2026
FumingPower3925 merged commit 00274dc into main Apr 1, 2026
10 checks passed
FumingPower3925 deleted the perf/v1.2.2-alloc-reduction branch April 1, 2026 11:57