Commit d4e7201
* feat(api,ui): expose seeder date range and scale controls (#82) (#83)
Surface the existing GenerateParams knobs in the admin Data Seeder panel
(scenario, date range, store/product counts, seed, sparsity) so operators
no longer have to drop to the CLI to seed a different year. Form state
persists in localStorage and a reset-to-defaults button is provided.
Also fixes a latent service-layer bug: when overriding stores/products on
a scenario preset, _build_config_from_params replaced the whole
DimensionConfig and silently dropped scenario-customized store_regions,
store_types, product_categories, and product_brands. Now uses
dataclasses.replace so only the count fields change. Adds two regression
tests covering holiday_rush + custom store/product counts.
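A minimal sketch of the bug and its fix, assuming a frozen `DimensionConfig` dataclass. The count field names (`store_count`, `product_count`) and all default values are hypothetical. Only the four customizable field names come from this commit. Rebuilding the config from scratch resets every field you did not pass. `dataclasses.replace` copies the preset and changes only the named fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DimensionConfig:
    # Count field names are illustrative; the real DimensionConfig may differ.
    store_count: int = 10
    product_count: int = 50
    # The scenario-customizable fields named in the commit message.
    store_regions: tuple[str, ...] = ("North", "South")
    store_types: tuple[str, ...] = ("supermarket",)
    product_categories: tuple[str, ...] = ("grocery",)
    product_brands: tuple[str, ...] = ("house",)


# A hypothetical preset that customizes more than just the counts.
HOLIDAY_RUSH = DimensionConfig(
    store_regions=("North", "South", "East", "West"),
    product_categories=("grocery", "gifts", "seasonal"),
)


def override_counts_buggy(base: DimensionConfig, stores: int, products: int) -> DimensionConfig:
    # Builds a fresh config, so every non-count field silently reverts to its default.
    return DimensionConfig(store_count=stores, product_count=products)


def override_counts_fixed(base: DimensionConfig, stores: int, products: int) -> DimensionConfig:
    # Copies the preset and changes only the two count fields.
    return replace(base, store_count=stores, product_count=products)
```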
* feat(docs,repo): land claude.md and docs/_base reference suite (#86) (#87)
Closes #86.
Generated via the /w7_generating-claudemd skill in HEURISTIC_MODE
(docs/_kB/repo-map/ KB not yet present).
Adds:
- CLAUDE.md (116 lines, 812 words; references .claude/rules/* and
docs/_base/* via @imports — within the 150-line / 1800-word skill
budget)
- docs/_base/ARCHITECTURE.md (system boundaries, components, comm
patterns, deploy chain)
- docs/_base/API_CONTRACTS.md (HTTP surface across 12 slices +
WebSocket + external integrations)
- docs/_base/RUNBOOKS.md (common incidents, release/rollback,
WSL/pnpm traps from prior session HANDOFF)
- docs/_base/SECURITY.md (threat model, hard rules from
security-patterns.md, scanning matrix)
- docs/_base/RULES.md (Change Authority Matrix + invariants +
forbidden patterns, consolidated from .claude/rules/*)
- docs/_base/DOMAIN_MODEL.md (bounded contexts, aggregates,
invariants, ubiquitous language)
- docs/_base/DEV_GUIDE.md (human-maintained stub — {FILL IN} markers
for a maintainer to complete)
- docs/_base/REPO_MAP_INDEX.md (index across README, PHASE docs,
ADRs, PRPs, .claude/, docs/_base/)
- docs/_base/PIPELINE_CONTRACT.md (CI/CD stages, merge gates,
release flow)
.gitignore adjustments:
- Remove `CLAUDE.md` (was blocking the doc from being shared and
from being read by Claude in fresh clones)
- Add `CLAUDE.local.md` (personal-prefs file — local-only by design)
- Stale `.claude` duplicates on lines 2 and 5 left for a separate
cleanup PR (deduping won't change behavior since `.claude/` is
already tracked)
Re-run the skill after a future mapping-repo-context run to drop the
remaining 5 [UNVERIFIED] meta-flags.
* feat(data,api): seeder Phase 1 — exogenous signals, multi-seasonality, changepoints, returns, substitution (#88)
* fix(ci): pin third-party github actions by sha (#84)
Closes #84.
Per .claude/rules/security-patterns.md: "Pin third-party GitHub Actions by
full 40-char SHA"; first-party actions/* may use major-version.
Pinned (third-party):
- googleapis/release-please-action@v5
→ @45996ed1f6d02564a971a2fa1b5860e934307cf7 # v5.0.0
- astral-sh/setup-uv@v7 (×8 across all five workflows)
→ @37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
- github/codeql-action/upload-sarif@v4
→ @c6f931105cb2c34c8f901cc885ba1e2e259cf745 # v4.34.0
Left as major-tag (first-party actions/* — rule-permitted):
- actions/checkout@v6
- actions/upload-artifact@v7
Dependabot watches .github/workflows/ weekly and will bump these forward.
* chore(repo): gitignore local session artifacts (#90)
* fix(ci): pin uv run with --frozen to stop transient resolution failures (#95) (#96)
Every uv run invocation in ci.yml, schema-validation.yml, and
phase-snapshot.yml now uses --frozen. Without it, uv re-resolves the
dependency graph at command time and crashes when a freshly published
pydantic-ai-slim version's [mistral] extra requires a mistralai version
that does not yet exist on PyPI — observed on PR #93's most recent push
where all five blocking CI jobs went red 75 minutes after a green run
on the same branch with the same lockfile.
dependency-check.yml's pip-audit calls deliberately retain the
re-resolve behavior; that workflow's purpose is to pick up newly
published vulnerabilities.
uv sync --frozen --all-extras --dev was already in place to install
the lock; this patch propagates the same intent to every subsequent
uv run.
* feat(data,db): seeder phase 2 — retail-depth foundation + lifecycle generator (#92) (#93)
* feat(data,db): seeder phase 2 chunk A — retail-depth schema + configs (#92)
Lays the foundation for Phase 2 retail depth without changing any
generator behaviour:
- Alembic migration a8b9c0d1e234 adds sales_daily.channel (NOT NULL,
server default 'in_store'), product lifecycle fields (lifecycle_stage,
launch_date, discontinue_date, pack_size, subcategory), a promotion
kind discriminator with JSONB bundle_member_product_ids, and a new
replenishment_event table. All changes are additive; retail_standard
rows are unchanged.
- The ORM mirrors the schema, including a load-bearing
JSONB(none_as_null=True): a Python None is persisted as SQL NULL
rather than as the JSON value null, so the bundle-members CHECK fires.
- Five new config dataclasses (ChannelConfig, LifecycleConfig,
BundleConfig, MarkdownConfig, LeadTimeConfig) wired to
SeederConfig with disabled defaults so all existing scenarios
produce byte-identical row counts.
- 25 integration tests cover the new CHECK + nullability
constraints; 8 unit tests guard the config defaults + regression
invariant across every ScenarioPreset.
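This sketch shows why `none_as_null` is load-bearing. It uses SQLite from the standard library in place of Postgres. The CHECK text is a hypothetical stand-in because the migration's actual constraint is not quoted here. The JSON text `'null'` is a non-NULL value, so a member-less bundle passes an `IS NOT NULL` check. A real SQL NULL is rejected.

```python
import json
import sqlite3

# Hypothetical CHECK: bundle/bogo promotions must carry a member list.
DDL = """
CREATE TABLE promotion (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    bundle_member_product_ids TEXT,
    CHECK (kind NOT IN ('bundle', 'bogo') OR bundle_member_product_ids IS NOT NULL)
)
"""


def serialize_members(members, none_as_null: bool):
    """Mimic how a JSON column type turns a Python value into a bound parameter."""
    if members is None and none_as_null:
        return None  # a real SQL NULL
    return json.dumps(members)  # None becomes the 4-character JSON text 'null'


def insert_promotion(conn: sqlite3.Connection, kind: str, members, none_as_null: bool) -> bool:
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute(
            "INSERT INTO promotion (kind, bundle_member_product_ids) VALUES (?, ?)",
            (kind, serialize_members(members, none_as_null)),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# JSON 'null' is a non-NULL string, so a member-less bundle slips through.
print(insert_promotion(conn, "bundle", None, none_as_null=False))  # True
# With none_as_null semantics the same row is rejected.
print(insert_promotion(conn, "bundle", None, none_as_null=True))  # False
```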
* feat(data): seeder phase 2 chunk B (1/5) — product lifecycle generator (#92)
First slice of Phase 2 generators. Strict regression invariant: with
``LifecycleConfig.enable=False`` (default) ProductGenerator's output
and rng-draw sequence are byte-identical to pre-Phase-2.
- ProductGenerator gains optional ``lifecycle_config`` + ``date_range``
parameters. When enabled, each product row carries
``subcategory``, ``pack_size``, ``launch_date``,
``discontinue_date``, ``lifecycle_stage``.
- New ``LifecycleGenerator`` (pure compute, no DB) computes per-(product,
date) demand multipliers across intro/growth/maturity/decline/
discontinued segments. Disabled path returns 1.0 without touching rng.
- 14 new unit tests cover the regression invariant + each ramp segment
+ discontinue override + reproducibility under enabled mode.
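The sketch below shows one way a per-(product, date) lifecycle multiplier could be computed across the five segments. It assumes linear ramps. Every config field name and default here is hypothetical, not the real `LifecycleConfig`. The disabled path returns 1.0 before any other logic runs. The discontinue override returns 0.0 from the discontinue date onward.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LifecycleSketchConfig:
    # All field names and defaults are hypothetical.
    enable: bool = False
    intro_days: int = 30        # launch -> end of intro
    growth_days: int = 60       # end of intro -> start of maturity
    decline_days: int = 90      # final N days before discontinue
    intro_floor: float = 0.2    # multiplier on launch day
    growth_start: float = 0.6   # multiplier at the intro/growth boundary
    decline_floor: float = 0.3  # multiplier approached just before discontinue


def lifecycle_multiplier(
    cfg: LifecycleSketchConfig, day: date, launch: Optional[date], discontinue: Optional[date]
) -> float:
    if not cfg.enable:
        return 1.0  # disabled path: constant, no randomness involved
    if launch is not None and day < launch:
        return 0.0  # not yet launched
    if discontinue is not None and day >= discontinue:
        return 0.0  # discontinued
    ramp = 1.0  # maturity unless a launch ramp applies
    if launch is not None:
        age = (day - launch).days
        if age < cfg.intro_days:  # intro: intro_floor -> growth_start
            ramp = cfg.intro_floor + (cfg.growth_start - cfg.intro_floor) * age / cfg.intro_days
        elif age < cfg.intro_days + cfg.growth_days:  # growth: growth_start -> 1.0
            ramp = cfg.growth_start + (1.0 - cfg.growth_start) * (age - cfg.intro_days) / cfg.growth_days
    fade = 1.0
    if discontinue is not None:
        days_left = (discontinue - day).days
        if days_left <= cfg.decline_days:  # decline: 1.0 -> decline_floor
            fade = cfg.decline_floor + (1.0 - cfg.decline_floor) * days_left / cfg.decline_days
    # If intro and decline overlap, the stronger attenuation wins.
    return min(ramp, fade)
```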
Remaining chunk B work (next commits on this branch):
- BundleGenerator (BOGO + bundle promotions)
- MarkdownGenerator (clearance pricing)
- ReplenishmentGenerator (lead-time-driven replenishment_event rows)
- SalesDailyGenerator channel split + lifecycle multiplier integration
* feat(data): seeder phase 2 chunk B (2/5) — bundle/BOGO generator (#92)
Second slice of Phase 2 generators. Same regression invariant as B 1/5:
with ``BundleConfig.enable=False`` (default) ``BundleGenerator.apply``
leaves both the promotion list and the rng state byte-identical.
- New ``BundleGenerator`` (pure compute, no DB) wraps
``PromotionGenerator``'s output: per-promo ``bundle_probability``
chance to convert to ``kind='bundle'`` or ``kind='bogo'`` (split by
``bogo_share_within_bundles``), drawing 2–``max_bundle_size`` member
product IDs (host excluded) and a discount in
``[bundle_discount_pct_min, bundle_discount_pct_max]`` quantized to
``Numeric(5, 4)``. ``discount_amount`` is cleared on converted rows
to keep the row internally consistent with the new ``discount_pct``.
- Locked rng order per converted promo: ``random()`` (convert?) →
``random()`` (bogo?) → ``randint()`` (n_members) → ``sample()``
(members) → ``uniform()`` (discount). Per-host pool-too-small skip
happens before any rng draw so the stream stays stable across runs
where only the product pool shrinks.
- 18 new unit tests cover the regression invariant (no mutation, no
rng consumption) + kind allow-list + member-pool sourcing + count
+ discount range + BOGO/bundle split at extremes + reproducibility
+ best-effort skip for small pools + config validation.
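Here is a sketch of the locked draw order described above, using Python's `random.Random`. The config field names come from this commit. `BundleSketchConfig`, the promo dict shape, the defaults, and the skip threshold (pool smaller than `max_bundle_size`) are assumptions. The size check runs before any draw. The disabled path returns the input list untouched, so `rng.getstate()` is unchanged.

```python
import random
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal


@dataclass(frozen=True)
class BundleSketchConfig:
    enable: bool = False
    bundle_probability: float = 0.2
    bogo_share_within_bundles: float = 0.3
    max_bundle_size: int = 4
    bundle_discount_pct_min: float = 0.10
    bundle_discount_pct_max: float = 0.30


FOUR_PLACES = Decimal("0.0001")  # scale of Numeric(5, 4)


def apply_bundles(promos, product_ids, cfg, rng):
    if not cfg.enable:
        return promos  # same list object, zero rng draws
    out = []
    for promo in promos:
        # Sorted pool excluding the host product, so sample() is order-independent.
        pool = sorted(p for p in product_ids if p != promo["product_id"])
        # Assumed skip rule, checked before any rng draw.
        if len(pool) < cfg.max_bundle_size:
            out.append(promo)
            continue
        if rng.random() >= cfg.bundle_probability:  # 1. convert?
            out.append(promo)
            continue
        is_bogo = rng.random() < cfg.bogo_share_within_bundles  # 2. bogo?
        n_members = rng.randint(2, cfg.max_bundle_size)  # 3. how many members
        members = rng.sample(pool, n_members)  # 4. which members
        raw = rng.uniform(cfg.bundle_discount_pct_min, cfg.bundle_discount_pct_max)  # 5. discount
        pct = Decimal(str(raw)).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
        out.append({
            **promo,
            "kind": "bogo" if is_bogo else "bundle",
            "bundle_member_product_ids": sorted(members),
            "discount_pct": pct,
            "discount_amount": None,  # cleared so only discount_pct applies
        })
    return out
```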
Remaining chunk B work:
- MarkdownGenerator (clearance pricing — needs Open Q on inventory
age coupling resolved before starting)
- ReplenishmentGenerator (lead-time-driven replenishment_event rows)
- SalesDailyGenerator channel split + lifecycle multiplier integration
* feat(data): seeder phase 2 chunk B (3/5) — markdown generator (#92)
Third slice of Phase 2 generators. Same regression invariant as B 1/5
and B 2/5: with ``MarkdownConfig.enable=False`` (default) the
generator emits empty containers and consumes zero rng state.
- New ``MarkdownGenerator`` (pure compute, no DB) emits
``Promotion(kind='markdown')`` rows + companion ``PriceHistory``
drop rows + a ``markdown_dates`` lookup keyed by
``(store_id, product_id)`` for the future ``SalesDailyGenerator``
lift integration in chunk B 5/5.
- Two triggers ship in this slice:
- ``lifecycle_decline`` — chain-wide markdown (``store_id=None``)
starting on the first date a product enters the ``decline`` stage
according to a passed-in ``LifecycleGenerator``. Skips products
without lifecycle attrs; emits no rows when lifecycle is disabled.
- ``stockout_risk`` — per-``(store, product)`` markdown ending the
day before each observed stockout, lasting ``markdown_duration_days``
days, clamped to the seeded range start. Overlapping windows are
deduped within each ``(store, product)`` series.
- ``trigger='age_days'`` is deferred — raises ``NotImplementedError``
pointing at issue #94 (follow-up). The default trigger remains
``lifecycle_decline`` so scenarios that just flip the enable bit
still produce meaningful output.
- Even the enabled path is fully deterministic (no rng draws). The
``rng`` constructor parameter is kept for API consistency with peer
Phase 2 generators in case future variants need randomness.
- 21 new unit tests cover the regression invariant + lifecycle_decline
correctness (chain-wide, skipping missing lifecycle, clamp-to-range,
no decline = no output) + stockout_risk correctness (per-store,
end-day-before-stockout, overlap dedupe, clamp-to-start, unknown
product, dict-order independence) + age_days NotImplementedError +
config validation (depth bounds, duration bounds).
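This sketch covers the `stockout_risk` window arithmetic for a single `(store, product)` series. Each window ends the day before a stockout and spans `markdown_duration_days` days inclusive. It is clamped to the range start. The function name is hypothetical. So is the overlap rule: the commit says windows are "deduped", and this version merges an overlapping window into the previous one. Skipping a stockout on the very first seeded day is also an assumption.

```python
from datetime import date, timedelta


def stockout_markdown_windows(stockout_dates, range_start: date, duration_days: int):
    """Return sorted, inclusive (start, end) markdown windows for one series."""
    windows = []
    for stockout in sorted(set(stockout_dates)):  # independent of input order
        end = stockout - timedelta(days=1)  # ends the day before the stockout
        if end < range_start:
            continue  # a stockout on the first seeded day leaves no room
        start = max(end - timedelta(days=duration_days - 1), range_start)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it rather than emit a duplicate.
            windows[-1] = (windows[-1][0], end)
        else:
            windows.append((start, end))
    return windows
```

For example, with `range_start=date(2024, 1, 1)` and a 7-day duration, a stockout on 2024-01-20 yields one window from 2024-01-13 to 2024-01-19.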
Remaining chunk B work:
- ReplenishmentGenerator (lead-time-driven replenishment_event rows)
- SalesDailyGenerator channel + lifecycle multiplier integration
* feat(data): seeder phase 2 chunk B (4/5) — replenishment generator (#92)
Fourth slice of Phase 2 generators. Same regression invariant: with
``LeadTimeConfig.enable=False`` (default) the generator returns ``[]``
and consumes zero rng state.
- New ``ReplenishmentGenerator`` (pure compute, no DB) emits
``replenishment_event`` dicts. Per ``(store, product)`` it places
a PO every ``order_frequency_days`` starting at ``dates[0]``. Each
PO consumes two locked rng draws:
``gauss(mean_lead_time_days, lead_time_sigma_days)`` clamped to
``>= 0`` → ``gauss(fill_rate_mean, fill_rate_sigma)`` clamped to
``[0, 1]``. ``ordered_qty = base_demand * (order_frequency_days +
safety_stock_days)``; ``received_qty = round(ordered_qty *
fill_rate)`` defensively clamped to ``[0, ordered_qty]``.
- Receipts whose ``date_received = date_placed + lead_time_days``
fall past ``dates[-1]`` are dropped to keep the FK to ``calendar``
valid.
- Sorted iteration over ``(store_id, product_id)`` makes the rng
stream stable regardless of input ordering.
- 21 new unit tests cover the regression invariant + record shape +
ordered_qty formula + dates-within-range + reproducibility +
input-order independence + extreme fill rates (zero/full) + zero
lead time + output sort order + 7 config-validation cases.
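A minimal sketch of the PO loop described above. It assumes `dates` is a contiguous daily calendar and that `base_demand` maps `(store_id, product_id)` to average daily units. It also assumes lead times are rounded to whole days. `LeadTimeSketchConfig` and its defaults are hypothetical; the field names follow the commit. Both draws happen before the out-of-range check, so a dropped receipt still consumes its two draws.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LeadTimeSketchConfig:
    enable: bool = False
    order_frequency_days: int = 7
    safety_stock_days: int = 3
    mean_lead_time_days: float = 4.0
    lead_time_sigma_days: float = 1.5
    fill_rate_mean: float = 0.95
    fill_rate_sigma: float = 0.05


def replenishment_events(base_demand, dates, cfg, rng):
    if not cfg.enable:
        return []  # disabled path: no rng draws
    first, last = dates[0], dates[-1]
    step = timedelta(days=cfg.order_frequency_days)
    events = []
    for store_id, product_id in sorted(base_demand):  # stable rng stream
        daily = base_demand[(store_id, product_id)]
        ordered_qty = round(daily * (cfg.order_frequency_days + cfg.safety_stock_days))
        placed = first
        while placed <= last:
            # Two locked draws per PO: lead time, then fill rate.
            lead = max(0.0, rng.gauss(cfg.mean_lead_time_days, cfg.lead_time_sigma_days))
            fill = min(1.0, max(0.0, rng.gauss(cfg.fill_rate_mean, cfg.fill_rate_sigma)))
            received_date = placed + timedelta(days=round(lead))
            if received_date <= last:  # keep the FK to calendar valid
                received_qty = min(ordered_qty, max(0, round(ordered_qty * fill)))
                events.append({
                    "store_id": store_id,
                    "product_id": product_id,
                    "date_placed": placed,
                    "date_received": received_date,
                    "ordered_qty": ordered_qty,
                    "received_qty": received_qty,
                })
            placed += step
    return events
```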
Downstream coupling: a follow-up commit will adjust
``InventorySnapshotGenerator`` to consume these events so realistic
stockout windows emerge between scheduled receipts. This slice only
emits the rows.
Remaining chunk B work:
- SalesDailyGenerator channel split + lifecycle multiplier integration
* feat(data): seeder phase 2 chunk B (5a/6) — lifecycle multiplier into sales (#92)
First half of B 5/5 (split per Open Q3 — channel integration deferred
until semantics are confirmed). Wires the LifecycleGenerator multiplier
into ``SalesDailyGenerator`` while preserving the byte-identical
regression invariant.
- ``SalesDailyGenerator.__init__`` gains optional
``lifecycle: LifecycleGenerator | None = None``. Defaults preserve
pre-Phase-2 behavior for every existing caller.
- ``SalesDailyGenerator.generate`` gains optional
``product_lifecycle_data: dict[int, tuple[date | None, date | None]]
| None = None``. Missing or unspecified entries fall back to
``(None, None)`` so the multiplier evaluates to 1.0.
- ``_compute_demand`` gains ``product_discontinue_date`` and applies
the lifecycle multiplier guarded by ``self.lifecycle is not None
and self.lifecycle.enabled``. The pre-Phase-2 ``new_product_ramp_days``
linear ramp is suppressed when lifecycle is enabled, preventing
double-attenuation at launch.
- 10 new tests cover the regression invariant (no kwargs / explicit
None / disabled config / no rng consumption when disabled), enabled
correctness (pre-launch zero, post-discontinue zero, intro < maturity,
decline < maturity), legacy-ramp suppression (no double-apply
when lifecycle on; still fires when lifecycle is None), and the
lookup fallback (missing product_id evaluates to 1.0).
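The guard and legacy-ramp suppression can be sketched as a free function. `StubLifecycle` and its `multiplier_for(day, launch, discontinue)` signature stand in for `LifecycleGenerator` and are assumptions. So is the exact shape of the legacy linear ramp, including its use of the launch date as the anchor. The point shown is that at most one of the two attenuations applies to any row.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StubLifecycle:
    """Hypothetical stand-in for LifecycleGenerator."""
    enabled: bool
    factor: float = 0.25

    def multiplier_for(self, day: date, launch: Optional[date], discontinue: Optional[date]) -> float:
        return self.factor  # a real generator would evaluate the lifecycle curve


def legacy_ramp(day: date, anchor: Optional[date], ramp_days: int) -> float:
    """Illustrative pre-Phase-2 linear ramp from launch up to 1.0 (anchor date assumed)."""
    if anchor is None or ramp_days <= 0:
        return 1.0
    age = (day - anchor).days
    return min(1.0, max(0.0, (age + 1) / ramp_days))


def compute_demand(base, day, product_id, product_lifecycle_data, lifecycle, new_product_ramp_days):
    # Missing or unspecified products fall back to (None, None).
    launch, discontinue = (product_lifecycle_data or {}).get(product_id, (None, None))
    if lifecycle is not None and lifecycle.enabled:
        # Lifecycle owns the launch ramp, so the legacy ramp is skipped (no double-attenuation).
        return base * lifecycle.multiplier_for(day, launch, discontinue)
    return base * legacy_ramp(day, launch, new_product_ramp_days)
```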
The B 5b/6 channel integration is held until Open Q3 resolves
between (b) dominant per row, (c) random per row from channel_mix
weights, or (d) aggregated with primary channel column.
Remaining Phase 2 work:
- B 5b/6 — SalesDailyGenerator channel split (pending Q3)
- Chunk C — DataSeeder orchestration + endpoints + integration tests
* feat(data): seeder phase 2 chunk B (5b/6) — channel split into sales (#92)
Second half of B 5/5. Resolves Open Q3 with semantic (c): each emitted
``sales_daily`` row gets its ``channel`` drawn from ``channel_mix`` via
``rng.choices``, preserving the existing ``(date, store, product)``
grain.
- ``SalesDailyGenerator.__init__`` gains optional
``channels: ChannelConfig | None = None``. Disabled / unset path
consumes zero new rng draws and emits rows without a ``channel`` key
(DB ``server_default='in_store'`` applies), preserving the
byte-identical regression invariant.
- ``generate()`` runs ``_validate_channels()`` once at entry. Rejects
channels outside the SQL allow-list, negative weights, all-zero mix,
negative ``online_promo_uplift``, or ``online_substitution_to_instore``
outside ``[0, 1]``.
- Per emitted row (after stockout-skip): ``_maybe_apply_channel``
builds the effective mix (``online_substitution_to_instore`` shifts
weight from in_store → online during promos), draws a channel via
``rng.choices``, and applies ``online_promo_uplift`` to online rows
on promo dates. One rng draw per emitted row.
- 19 new tests cover regression invariant (no kwarg, disabled config,
no rng consumption) + channel distribution (subset of mix keys,
single-channel deterministic, dominant most common, zero-weight
never chosen) + online promo uplift (fires for online + promo,
not for in_store) + substitution shift (more online during promo,
zero substitution = no shift) + 6 validation cases + row shape
(channel key present/absent).
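A sketch of the per-row channel draw follows. It takes the commit's description at face value: `online_substitution_to_instore` moves that fraction of in_store weight to online on promo dates. Treating `online_promo_uplift` as a multiplicative `(1 + uplift)` on a hypothetical `quantity` field is an assumption. The function names are also illustrative. `random.Random.choices` with `k=1` consumes exactly one `random()` call, which matches "one rng draw per emitted row".

```python
import random


def effective_mix(channel_mix, on_promo, substitution):
    """On promo dates, shift a share of in_store weight to online."""
    mix = dict(channel_mix)
    if on_promo and "in_store" in mix and "online" in mix:
        moved = mix["in_store"] * substitution
        mix["in_store"] -= moved
        mix["online"] += moved
    return mix


def assign_channel(row, channel_mix, on_promo, substitution, online_promo_uplift, rng):
    mix = effective_mix(channel_mix, on_promo, substitution)
    names = sorted(mix)  # fixed order keeps the draw reproducible
    channel = rng.choices(names, weights=[mix[n] for n in names], k=1)[0]  # one draw
    out = dict(row, channel=channel)  # the input row is not mutated
    if channel == "online" and on_promo:
        # Multiplicative uplift is an assumption.
        out["quantity"] = round(row["quantity"] * (1.0 + online_promo_uplift))
    return out
```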
Phase 2 chunk B complete (5/6 paired slices + 1/6 follow-up #94).
Next: Chunk C — DataSeeder orchestration + new endpoints + integration
tests + docs.
* feat(data,api): seeder phase 2 chunk c1 — orchestration + endpoints (#92)
Extend GenerateParams with 5 enable flags + channel_mix / lifecycle /
bundle / markdown / lead-time fields; channel_mix validator enforces the
SQL allow-list and at least one positive weight. Service layer translates
the new params into ChannelConfig / LifecycleConfig / BundleConfig /
MarkdownConfig / LeadTimeConfig overrides.
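The channel_mix checks reduce to a few set and comparison tests, sketched here as a plain function. In the API this logic might live inside a Pydantic `field_validator` on GenerateParams. The allow-list contents are an assumption; the real list comes from the `sales_daily.channel` SQL constraint. The negative-weight check mirrors the generator-side validation described in B 5b/6.

```python
# Assumed allow-list; the authoritative one is the sales_daily.channel CHECK.
ALLOWED_CHANNELS = frozenset({"in_store", "online"})


def validate_channel_mix(channel_mix: dict[str, float]) -> dict[str, float]:
    unknown = sorted(set(channel_mix) - ALLOWED_CHANNELS)
    if unknown:
        raise ValueError(f"unknown channel(s): {unknown}")
    if any(weight < 0 for weight in channel_mix.values()):
        raise ValueError("channel weights must be non-negative")
    if not any(weight > 0 for weight in channel_mix.values()):
        raise ValueError("channel_mix needs at least one positive weight")
    return channel_mix
```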
DataSeeder.generate_full now wires LifecycleGenerator + BundleGenerator
+ MarkdownGenerator + ReplenishmentGenerator + ChannelConfig. Product
lifecycle dates are fetched alongside base_price in a single query and
threaded into SalesDailyGenerator. A new _normalize_promotion_records
helper enforces a uniform key set across the mixed pct_off / bundle /
bogo / markdown promo records so the bulk pg_insert builds a valid
multi-row VALUES clause. delete_data drops replenishment_event first
(leaf table). verify_data_integrity gains 3 Phase 2 invariants: bundle
member-ID consistency, lifecycle date ordering, replenishment fill
rate. append_data mirrors the new return signature and fetches
lifecycle dates from existing products.
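A multi-row INSERT ... VALUES needs every row to supply the same columns. One way to get there is to take the union of keys across all promo records and fill the gaps. In this sketch, `normalize_promotion_records` and the `defaults` mapping are hypothetical. Missing keys fall back to `defaults` and then to `None`, which inserts SQL NULL. Columns that rely on a server default would need an explicit entry in `defaults`.

```python
from typing import Any, Mapping, Optional


def normalize_promotion_records(
    records: list[dict[str, Any]], defaults: Optional[Mapping[str, Any]] = None
) -> list[dict[str, Any]]:
    """Give every record the same key set so one multi-row VALUES clause fits all."""
    defaults = defaults or {}
    all_keys = sorted({key for record in records for key in record})
    return [{key: record.get(key, defaults.get(key)) for key in all_keys} for record in records]
```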
New endpoints: GET /seeder/channels returns the SQL allow-list; GET
/dimensions/products/{id}/lifecycle-curve returns the reference
demand-multiplier curve via LifecycleGenerator.multiplier_for, using
default LifecycleConfig ramp parameters and the product's own
launch_date / discontinue_date. SeederStatus + SeederResult both grow
a replenishment_events count.
Disabled-path regression invariant preserved: every Phase 2 flag
defaults off and consumes zero rng when off.
* feat(data,docs): seeder phase 2 chunk c2 — integration tests + docs (#92)
test_phase2_integration.py covers the disabled-path regression
(no Phase 2 rows when toggles are off), per-feature enabled tests
(lifecycle populates dates, bundles convert promotions with
bundle_member_product_ids non-NULL, markdowns can emit rows when
lifecycle is also on, replenishment respects received_qty <=
ordered_qty, multichannel writes distinct channels), full-on
verify_data_integrity returning an empty error list, and delete
ordering that wipes replenishment_event without FK violations.
Tests are marked @pytest.mark.integration so they only run against
the real docker-compose Postgres.
docs/DATA-SEEDER.md adds a Phase 2 retail-depth section documenting
all five toggles with example JSON payloads, the two new endpoints
(GET /seeder/channels, GET /dimensions/products/{id}/lifecycle-curve),
and three new Data Integrity checks.
* feat(release): trigger v0.2.8 release for seeder phases 1+2 (#98)
1 parent: 7f97c5e
0 files changed