Every test earns its keep. How certree thinks about testing, and why the test suite is shaped the way it is.
Most projects treat test count as a proxy for quality. More tests, more confidence, right? Not exactly.
certree learned this the hard way. The test suite grew to roughly three lines of test code for every line of source. That sounds thorough. In practice, it meant:
- Property tests that used gopter but tested deterministic functions with two possible inputs. Random generation over a boolean is not property-based testing -- it is a coin flip with extra steps.
- Unit tests that verified the same behavior as a property test sitting in the next file over. Two tests, one truth, double the maintenance.
- Structural checks dressed up as properties. "Every flag has metadata" is a fact you can assert once, not a property you need to verify across random inputs.
The test suite was not wrong. It was wasteful. A systematic optimization cut the suite by roughly a third while maintaining the same coverage -- in some areas, coverage actually improved because the remaining tests were better targeted. Every surviving test earns its place. A test that does not catch bugs you would otherwise miss is not earning its keep. It is overhead: slower CI, harder refactors, more noise when something actually breaks.
certree organizes tests into three layers. Each layer has a job, and tests belong in the layer where they do the most good.
        /\
       /  \     Integration tests (handful)
      /    \    (component workflows, E2E)
     /------\
    /        \    Property tests (modest layer)
   /          \   (invariants across generated inputs)
  /------------\
 /              \    Unit tests (vast majority)
/________________\   (specific examples, edge cases, error paths)
The bulk of the suite is unit tests. Property tests form a smaller but critical middle layer. Integration tests are intentionally few -- each one exercises a cross-component boundary that neither unit nor property tests can reach. A separate benchmark suite tracks performance regressions across all three packages.
Unit tests verify that specific inputs produce expected outputs. They are fast, focused, and easy to debug when they fail. A failing unit test tells you exactly what broke.
Use unit tests for:
- Known input/output pairs ("this PEM produces this certificate")
- Edge cases (empty input, nil values, boundary conditions)
- Error paths ("malformed PEM returns an error, not a panic")
- Structural facts ("every flag belongs to exactly one group")
Unit tests are the right choice when the input space is small or when you need to verify a specific scenario. If there are only three valid inputs, write three test cases. Do not spin up a random generator to pick among them.
Property tests define invariants that must hold across all valid inputs, then verify those invariants against randomly generated examples. They are certree's primary tool for catching bugs that handwritten examples miss.
A good property test earns its keep by exploring a combinatorial space that no reasonable number of unit tests could cover. The chain builder, for example, must correctly identify trust paths regardless of certificate order, chain depth, or cross-signing topology. Writing unit tests for every combination is impractical. Writing a property that says "every certificate in the input appears in at least one output path" and letting gopter generate thousands of certificate sets -- that scales.
Use property tests for:
- Invariants across large input spaces ("parsed certificates preserve all fields")
- Roundtrip properties ("parse then serialize produces the original")
- Relational properties ("adding a cross-signed intermediate multiplies paths")
- Boundary exploration ("chain depth limits are enforced for any depth value")
Do not use property tests for:
- Deterministic functions with trivially small input spaces
- Behavior that depends on a single known configuration
- Third-party library behavior (stdlib, pflag, etc.)
- Structural facts that can be asserted once in a unit test
The litmus test: if you replaced the random generator with a single hardcoded example and the test would be equally effective, it should be a unit test.
certree uses gopter for property-based testing. gopter provides generators for building random inputs and a shrinking mechanism for minimizing failing cases. When a property test fails, it gives you a minimal counterexample -- the simplest input that violates the invariant. This is often more valuable than a passing test, because it reveals assumptions you did not know you were making. The project maintains custom generators in pkg/certree/testutil for domain-specific types like certificates, trust paths, and analysis results.
Property tests run with reduced iterations in -short mode (10 instead of 100) so development feedback stays fast. They are never skipped entirely -- even a few iterations catch regressions that unit tests miss.
Integration tests exercise multiple components together in realistic workflows. They are the slowest and most expensive layer, so certree uses them sparingly -- but they catch problems that no amount of unit or property testing reveals.
Integration tests verify that the parser, chain builder, validator, and render components work together correctly. They use real certificate data and exercise the full pipeline from input to output.
These tests are never deleted during optimization. They represent end-to-end confidence that the system works as a whole.
Tests live next to the code they test, following Go convention:
parser.go -> parser_test.go (unit tests)
parser.go -> parser_property_test.go (property tests)
integration_test.go (integration tests, one per package)
Each source file has at most one property test file, no exceptions. During the optimization, several split-out property test files were merged back into their parent property test files to enforce this rule. Bugfix regression tests live alongside the other property tests for the same source file, organized into logical sections within the file.
pkg/certree/testutil provides shared certificate generators, PEM encoding helpers, and gopter generators. Before writing a local test helper, check testutil -- if it does not have what you need, add it there rather than duplicating it in your test file. See the package godoc or pkg/certree/testutil/testutil.go for the full API.
Mocks and helpers that reference package-internal types (like *Certificate, TrustStore, AIAFetcher) cannot live in testutil due to Go's circular import restriction. These stay as local helpers in the test files that define them, and are shared across test files within the same package via Go's standard _test.go compilation model.
Generating certificates is expensive (RSA key generation, signing). Tests that need many certificates -- especially property tests running hundreds of iterations -- should cache generated certificates rather than creating fresh ones each time.
The project uses patterns like sync.Once templates and count-keyed caches to avoid redundant key generation. This keeps property tests fast enough to run on every commit without sacrificing the breadth of random exploration.
Benchmarks follow the same "earn their keep" philosophy. They exist to catch performance regressions and guide optimization decisions -- not as vanity metrics.
Each package that has performance-sensitive code has a single bench_test.go file. The library benchmarks cover the core hot paths: certificate wrapping, fingerprinting, PEM encoding, JSON serialization, parsing at multiple scales, chain building, cross-sign detection, trust store queries, validation, simulation, analysis creation, single-source pipeline, and batch processing. The render benchmarks cover tree rendering (default and detailed modes), comparison, diff, impact summary, and ANSI-aware string utilities. The CLI benchmarks cover flag registration, config construction, analyzer building, the full E2E pipeline, tree and JSON render dispatch, field parsing, and CN filtering with compiled pattern matching. Sub-benchmarks (like BenchmarkParsePEM/1cert and BenchmarkParsePEM/10certs) test scaling without duplicating setup.
# Run benchmarks for a specific package
go test -bench=. -benchmem -run='NOMATCH' ./pkg/certree/

If you have not read Go benchmark output before: ns/op is how long the function takes, B/op is how much memory it allocates, and allocs/op is how many heap allocations it makes. Fewer allocations mean less garbage collector pressure.
A well-tested certree component has:
- Unit tests for specific scenarios, edge cases, and error paths
- Property tests for invariants across the input space -- but only where random generation provides value beyond what handwritten examples would
- No redundancy between the two -- if a property test covers a space, unit tests focus on edge cases the property might miss, not on re-verifying the same invariant
- Shared infrastructure from testutil, not local helpers that duplicate it
- Benchmarks for hot-path functions, living in bench_test.go, catching regressions that correctness tests cannot
The test suite should be something you trust. When tests pass, you should believe the code works. When a test fails, you should believe there is a real bug. Tests that cry wolf -- that fail for irrelevant reasons or pass despite real problems -- undermine that trust.
See also: Design Philosophy, Certificate Trust Paths, gopter, Go Testing.