Problem
The current random() function in policyengine_core/commons/formulas.py uses a global execution counter (count_random_calls) to differentiate random streams:
seeds = np.abs(entity_ids * 100 + population.simulation.count_random_calls)
This creates a "ripple effect": adding, removing, or reordering variables that call random() changes the random values for ALL subsequent variables. This makes it impossible to:
- Compare policy versions with confidence (random noise shifts underneath)
- Isolate the effect of a specific policy change
- Run variables in parallel without counter synchronization
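The ripple effect can be reproduced outside the library with a toy model. This is a minimal sketch, not the actual policyengine-core implementation: `ToySimulation` and `counter_random` are hypothetical stand-ins, and the standard-library `random.Random` is used in place of the NumPy generator so the snippet has no dependencies. Only the seed formula is taken from the line quoted above.

```python
import random as stdlib_random


class ToySimulation:
    """Hypothetical stand-in that holds only the global call counter."""

    def __init__(self):
        self.count_random_calls = 0


def counter_random(simulation, entity_ids):
    # Same seed formula as quoted above: entity_id * 100 + global call count.
    simulation.count_random_calls += 1
    seeds = [abs(i * 100 + simulation.count_random_calls) for i in entity_ids]
    # One deterministic generator per entity (stdlib stand-in for NumPy).
    return [stdlib_random.Random(seed).random() for seed in seeds]


entity_ids = [1, 2, 3]

# Run A: the takeup variable is the only caller of random().
sim_a = ToySimulation()
takeup_a = counter_random(sim_a, entity_ids)

# Run B: an unrelated variable happens to call random() first.
sim_b = ToySimulation()
unrelated_b = counter_random(sim_b, entity_ids)
takeup_b = counter_random(sim_b, entity_ids)

# The takeup draws differ between runs even though takeup itself is unchanged.
print(takeup_a != takeup_b)
# The new variable has inherited the draws takeup had in run A.
print(unrelated_b == takeup_a)
```

In this toy setup the unrelated variable takes over counter value 1, so every later caller is shifted onto a different stream.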
Proposed Solution: Name-Based Salting
Replace the global counter with the variable name (accessible via population.simulation.tracer.stack[-1]["name"]):
base_seed = stable_hash(f"{variable_name}:{per_variable_call_count}")
seeds = entity_ids ^ base_seed
Benefits:
- Order-independent: Adding/removing variables doesn't affect others
- True reproducibility: Same variable + per-variable call index + entity ID = same value, always
- Parallelizable: No global state to synchronize
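A minimal sketch of the name-based approach, under stated assumptions: the proposal does not specify `stable_hash`, so a truncated SHA-256 digest is assumed here (Python's built-in `hash()` is salted per process for strings and would not be reproducible across runs). `name_salted_random` and the variable names `takes_up_a` and `takes_up_b` are hypothetical, entity IDs are assumed to be non-negative integers, and the standard-library `random.Random` again stands in for the NumPy generator.

```python
import hashlib
import random as stdlib_random


def stable_hash(text):
    # Assumed implementation: first 4 bytes of SHA-256 as a non-negative int.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def name_salted_random(variable_name, per_variable_call_count, entity_ids):
    # The seed depends only on the variable name, its own call index,
    # and the entity ID. No global counter is involved.
    base_seed = stable_hash(f"{variable_name}:{per_variable_call_count}")
    seeds = [i ^ base_seed for i in entity_ids]
    return [stdlib_random.Random(seed).random() for seed in seeds]


entity_ids = [1, 2, 3]

# The variable evaluated on its own.
alone = name_salted_random("takes_up_a", 0, entity_ids)

# The same variable evaluated after another variable has drawn values.
other = name_salted_random("takes_up_b", 0, entity_ids)
after_other = name_salted_random("takes_up_a", 0, entity_ids)

print(alone == after_other)  # order of evaluation no longer matters
print(alone != other)        # different names give different streams
```

Because nothing is shared between calls, the same function could be evaluated for different variables in any order, or concurrently, without coordination.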
Breaking Change
This will change random values for all existing simulations using random(). Downstream packages (policyengine-us, policyengine-uk) will see different takeup modeling results.
Questions for Maintainers
- Is this change acceptable given the breaking nature?
- Should we provide a legacy_random() for transition?
- Any concerns about the tracer stack approach?