Commit d4e7201

feat(release): trigger v0.2.8 release for seeder phases 1+2 (#98) (#99)

* feat(api,ui): expose seeder date range and scale controls (#82) (#83)

Surface the existing GenerateParams knobs in the admin Data Seeder panel (scenario, date range, store/product counts, seed, sparsity) so operators no longer have to drop to the CLI to seed a different year. Form state persists in localStorage, and a reset-to-defaults button is provided.

Also fixes a latent service-layer bug: when overriding stores/products on a scenario preset, _build_config_from_params replaced the whole DimensionConfig and silently dropped the scenario-customized store_regions, store_types, product_categories, and product_brands. It now uses dataclasses.replace so only the count fields change. Adds two regression tests covering holiday_rush + custom store/product counts.

* feat(docs,repo): land claude.md and docs/_base reference suite (#86) (#87)

Closes #86. Generated via the /w7_generating-claudemd skill in HEURISTIC_MODE (the docs/_kB/repo-map/ KB is not yet present).

Adds:
- CLAUDE.md (116 lines, 812 words; references .claude/rules/* and docs/_base/* via @imports; within the 150-line / 1800-word skill budget)
- docs/_base/ARCHITECTURE.md (system boundaries, components, comm patterns, deploy chain)
- docs/_base/API_CONTRACTS.md (HTTP surface across 12 slices + WebSocket + external integrations)
- docs/_base/RUNBOOKS.md (common incidents, release/rollback, WSL/pnpm traps from prior session HANDOFF)
- docs/_base/SECURITY.md (threat model, hard rules from security-patterns.md, scanning matrix)
- docs/_base/RULES.md (Change Authority Matrix + invariants + forbidden patterns, consolidated from .claude/rules/*)
- docs/_base/DOMAIN_MODEL.md (bounded contexts, aggregates, invariants, ubiquitous language)
- docs/_base/DEV_GUIDE.md (human-maintained stub; {FILL IN} markers for a maintainer to complete)
- docs/_base/REPO_MAP_INDEX.md (index across README, PHASE docs, ADRs, PRPs, .claude/, docs/_base/)
- docs/_base/PIPELINE_CONTRACT.md (CI/CD stages, merge gates, release flow)

.gitignore adjustments:
- Remove the `CLAUDE.md` entry (it was blocking the doc from being shared and from being read by Claude in fresh clones)
- Add `CLAUDE.local.md` (personal-prefs file, local-only by design)
- Stale `.claude` duplicates on lines 2 and 5 are left for a separate cleanup PR (deduping won't change behavior since `.claude/` is already tracked)

Re-run the skill after a future mapping-repo-context run to drop the remaining 5 [UNVERIFIED] meta-flags.

* feat(data,api): seeder Phase 1 — exogenous signals, multi-seasonality, changepoints, returns, substitution (#88)

* fix(ci): pin third-party github actions by sha (#84)

Closes #84. Per .claude/rules/security-patterns.md: "Pin third-party GitHub Actions by full 40-char SHA"; first-party actions/* may use a major-version tag.

Pinned (third-party):
- googleapis/release-please-action@v5 → @45996ed1f6d02564a971a2fa1b5860e934307cf7 # v5.0.0
- astral-sh/setup-uv@v7 (×8 across all five workflows) → @37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
- github/codeql-action/upload-sarif@v4 → @c6f931105cb2c34c8f901cc885ba1e2e259cf745 # v4.34.0

Left as major tags (first-party actions/*, permitted by the rule):
- actions/checkout@v6
- actions/upload-artifact@v7

Dependabot watches .github/workflows/ weekly and will bump these forward.

* chore(repo): gitignore local session artifacts (#90)

* fix(ci): pin uv run with --frozen to stop transient resolution failures (#95) (#96)

Every `uv run` invocation in ci.yml, schema-validation.yml, and phase-snapshot.yml now uses --frozen. Without it, uv re-resolves the dependency graph at command time. It then crashes when a freshly published pydantic-ai-slim version's [mistral] extra requires a mistralai version that does not yet exist on PyPI. This was observed on PR #93's most recent push, where all five blocking CI jobs went red 75 minutes after a green run on the same branch with the same lockfile.
dependency-check.yml's pip-audit calls deliberately keep the re-resolve behavior, because that workflow exists to pick up newly published vulnerabilities. `uv sync --frozen --all-extras --dev` was already in place to install the lock. This patch carries the same intent to every subsequent `uv run`.

* feat(data,db): seeder phase 2 — retail-depth foundation + lifecycle generator (#92) (#93)

* feat(data,db): seeder phase 2 chunk A — retail-depth schema + configs (#92)

Lays the foundation for Phase 2 retail depth without changing any generator behavior:
- Alembic migration a8b9c0d1e234 adds sales_daily.channel (NOT NULL, server default 'in_store'), product lifecycle fields (lifecycle_stage, launch_date, discontinue_date, pack_size, subcategory), a promotion kind discriminator with JSONB bundle_member_product_ids, and a new replenishment_event table. All changes are additive; retail_standard rows are unchanged.
- The ORM mirrors the schema, including a load-bearing JSONB(none_as_null=True) so the bundle-members CHECK fires.
- Five new config dataclasses (ChannelConfig, LifecycleConfig, BundleConfig, MarkdownConfig, LeadTimeConfig) are wired into SeederConfig with disabled defaults, so all existing scenarios produce byte-identical row counts.
- 25 integration tests cover the new CHECK + nullability constraints; 8 unit tests guard the config defaults + the regression invariant across every ScenarioPreset.

* feat(data): seeder phase 2 chunk B (1/5) — product lifecycle generator (#92)

First slice of Phase 2 generators. Strict regression invariant: with ``LifecycleConfig.enable=False`` (default), ProductGenerator's output and rng-draw sequence are byte-identical to pre-Phase-2.
- ProductGenerator gains optional ``lifecycle_config`` + ``date_range`` parameters. When enabled, each product row carries ``subcategory``, ``pack_size``, ``launch_date``, ``discontinue_date``, ``lifecycle_stage``.
- New ``LifecycleGenerator`` (pure compute, no DB) computes per-(product, date) demand multipliers across intro/growth/maturity/decline/discontinued segments. The disabled path returns 1.0 without touching rng.
- 14 new unit tests cover the regression invariant + each ramp segment + discontinue override + reproducibility under enabled mode.

Remaining chunk B work (next commits on this branch):
- BundleGenerator (BOGO + bundle promotions)
- MarkdownGenerator (clearance pricing)
- ReplenishmentGenerator (lead-time-driven replenishment_event rows)
- SalesDailyGenerator channel split + lifecycle multiplier integration

* feat(data): seeder phase 2 chunk B (2/5) — bundle/BOGO generator (#92)

Second slice of Phase 2 generators. Same regression invariant as B 1/5: with ``BundleConfig.enable=False`` (default), ``BundleGenerator.apply`` leaves both the promotion list and the rng state byte-identical.
- New ``BundleGenerator`` (pure compute, no DB) wraps ``PromotionGenerator``'s output. Each promo has a ``bundle_probability`` chance to convert to ``kind='bundle'`` or ``kind='bogo'`` (split by ``bogo_share_within_bundles``). A converted promo draws 2 to ``max_bundle_size`` member product IDs (host excluded) and a discount in ``[bundle_discount_pct_min, bundle_discount_pct_max]``, quantized to ``Numeric(5, 4)``. ``discount_amount`` is cleared on converted rows to keep each row internally consistent with the new ``discount_pct``.
- Locked rng order per converted promo: ``random()`` (convert?) → ``random()`` (bogo?) → ``randint()`` (n_members) → ``sample()`` (members) → ``uniform()`` (discount). The per-host pool-too-small skip happens before any rng draw, so the stream stays stable across runs where only the product pool shrinks.
- 18 new unit tests cover the regression invariant (no mutation, no rng consumption) + kind allow-list + member-pool sourcing + count + discount range + BOGO/bundle split at extremes + reproducibility + best-effort skip for small pools + config validation.
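Here is a minimal Python sketch of the conversion step, following the locked draw order listed above. Everything in it is illustrative and is not the project's `BundleGenerator.apply`. The `BundleSettings` field names come from the commit; their default values do not. The function name `convert_to_bundles`, the promo dict keys, and capping the member count at the pool size are all assumptions.

```python
import random
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal


@dataclass(frozen=True)
class BundleSettings:
    """Hypothetical stand-in for BundleConfig (field names from the commit, defaults invented)."""
    enable: bool = False
    bundle_probability: float = 0.2
    bogo_share_within_bundles: float = 0.5
    max_bundle_size: int = 4
    bundle_discount_pct_min: float = 0.10
    bundle_discount_pct_max: float = 0.30


MIN_MEMBERS = 2
PCT_QUANTUM = Decimal("0.0001")  # four decimal places, as in Numeric(5, 4)


def convert_to_bundles(promos, product_ids, cfg, rng):
    """Return promos with some rows converted to bundle/bogo (hypothetical helper)."""
    if not cfg.enable:
        return promos  # disabled: same list object, zero rng draws
    out = []
    for promo in promos:
        host = promo["product_id"]
        pool = sorted(pid for pid in product_ids if pid != host)
        if len(pool) < MIN_MEMBERS:
            out.append(promo)  # pool-too-small skip happens before any draw
            continue
        if rng.random() >= cfg.bundle_probability:  # draw 1: convert?
            out.append(promo)
            continue
        is_bogo = rng.random() < cfg.bogo_share_within_bundles  # draw 2: bogo?
        upper = min(cfg.max_bundle_size, len(pool))  # assumption: cap at pool size
        n_members = rng.randint(MIN_MEMBERS, upper)  # draw 3: member count
        members = rng.sample(pool, n_members)  # draw 4: distinct members
        raw = rng.uniform(cfg.bundle_discount_pct_min, cfg.bundle_discount_pct_max)  # draw 5
        out.append({
            **promo,  # copy, so the caller's dict is never mutated
            "kind": "bogo" if is_bogo else "bundle",
            "bundle_member_product_ids": members,
            "discount_pct": Decimal(str(raw)).quantize(PCT_QUANTUM, rounding=ROUND_HALF_UP),
            "discount_amount": None,  # cleared so the row agrees with discount_pct
        })
    return out
```

Sorting the pool before `sample()` keeps conversions reproducible for a given seed, however the caller orders `product_ids`.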
Remaining chunk B work:
- MarkdownGenerator (clearance pricing; the Open Q on inventory-age coupling must be resolved before starting)
- ReplenishmentGenerator (lead-time-driven replenishment_event rows)
- SalesDailyGenerator channel split + lifecycle multiplier integration

* feat(data): seeder phase 2 chunk B (3/5) — markdown generator (#92)

Third slice of Phase 2 generators. Same regression invariant as B 1/5 and B 2/5: with ``MarkdownConfig.enable=False`` (default), the generator emits empty containers and consumes zero rng state.
- New ``MarkdownGenerator`` (pure compute, no DB) emits three things: ``Promotion(kind='markdown')`` rows, companion ``PriceHistory`` drop rows, and a ``markdown_dates`` lookup keyed by ``(store_id, product_id)``. The lookup is for the future ``SalesDailyGenerator`` lift integration in chunk B 5/5.
- Two triggers ship in this slice:
  - ``lifecycle_decline``: a chain-wide markdown (``store_id=None``) starting on the first date a product enters the ``decline`` stage, according to a passed-in ``LifecycleGenerator``. It skips products without lifecycle attrs and emits no rows when lifecycle is disabled.
  - ``stockout_risk``: a per-``(store, product)`` markdown that ends the day before each observed stockout, lasts ``markdown_duration_days`` days, and is clamped to the seeded range start. Overlapping windows are deduped within each ``(store, product)`` series.
- ``trigger='age_days'`` is deferred: it raises ``NotImplementedError`` pointing at issue #94 (follow-up). The default trigger remains ``lifecycle_decline``, so scenarios that just flip the enable bit still produce meaningful output.
- Even the enabled path is fully deterministic (no rng draws). The ``rng`` constructor parameter is kept for API consistency with the peer Phase 2 generators, in case future variants need randomness.
- 21 new unit tests cover:
  - the regression invariant;
  - lifecycle_decline correctness (chain-wide, skipping missing lifecycle, clamp-to-range, no decline = no output);
  - stockout_risk correctness (per-store, end-day-before-stockout, overlap dedupe, clamp-to-start, unknown product, dict-order independence);
  - the age_days NotImplementedError;
  - config validation (depth bounds, duration bounds).

Remaining chunk B work:
- ReplenishmentGenerator (lead-time-driven replenishment_event rows)
- SalesDailyGenerator channel + lifecycle multiplier integration

* feat(data): seeder phase 2 chunk B (4/5) — replenishment generator (#92)

Fourth slice of Phase 2 generators. Same regression invariant: with ``LeadTimeConfig.enable=False`` (default), the generator returns ``[]`` and consumes zero rng state.
- New ``ReplenishmentGenerator`` (pure compute, no DB) emits ``replenishment_event`` dicts. For each ``(store, product)`` it places a PO every ``order_frequency_days``, starting at ``dates[0]``.
  - Each PO consumes two locked rng draws: first ``gauss(mean_lead_time_days, lead_time_sigma_days)`` clamped to ``>= 0``, then ``gauss(fill_rate_mean, fill_rate_sigma)`` clamped to ``[0, 1]``.
  - ``ordered_qty = base_demand * (order_frequency_days + safety_stock_days)``.
  - ``received_qty = round(ordered_qty * fill_rate)``, defensively clamped to ``[0, ordered_qty]``.
- Receipts whose ``date_received = date_placed + lead_time_days`` falls past ``dates[-1]`` are dropped to keep the FK to ``calendar`` valid.
- Sorted iteration over ``(store_id, product_id)`` makes the rng stream stable regardless of input ordering.
- 21 new unit tests cover the regression invariant + record shape + ordered_qty formula + dates-within-range + reproducibility + input-order independence + extreme fill rates (zero/full) + zero lead time + output sort order + 7 config-validation cases.
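The PO schedule and the two locked draws per PO can be sketched as follows. This is not the project's `ReplenishmentGenerator`. The name `generate_replenishment`, the `LeadTimeSettings` defaults, and the `base_demand` mapping shape are assumptions. So is rounding both the lead time and `ordered_qty` to integers, and treating `dates` as a contiguous daily calendar.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LeadTimeSettings:
    """Hypothetical stand-in for LeadTimeConfig (field names from the commit, defaults invented)."""
    enable: bool = False
    order_frequency_days: int = 7
    safety_stock_days: int = 3
    mean_lead_time_days: float = 4.0
    lead_time_sigma_days: float = 1.5
    fill_rate_mean: float = 0.95
    fill_rate_sigma: float = 0.05


def generate_replenishment(dates: list[date],
                           base_demand: dict[tuple[int, int], float],
                           cfg: LeadTimeSettings,
                           rng: random.Random) -> list[dict]:
    """Emit replenishment_event-like dicts; base_demand maps (store_id, product_id) -> units/day."""
    if not cfg.enable or not dates:
        return []  # disabled: zero rng draws
    first, last = dates[0], dates[-1]
    step = timedelta(days=cfg.order_frequency_days)
    events = []
    for store_id, product_id in sorted(base_demand):  # sorted keys => stable rng stream
        ordered_qty = round(base_demand[(store_id, product_id)]
                            * (cfg.order_frequency_days + cfg.safety_stock_days))
        placed = first
        while placed <= last:
            # Two locked draws per PO, taken even when the receipt is later dropped.
            lead = max(0.0, rng.gauss(cfg.mean_lead_time_days, cfg.lead_time_sigma_days))
            fill = min(1.0, max(0.0, rng.gauss(cfg.fill_rate_mean, cfg.fill_rate_sigma)))
            lead_days = round(lead)  # assumption: lead times are whole days
            received_qty = min(ordered_qty, max(0, round(ordered_qty * fill)))
            date_received = placed + timedelta(days=lead_days)
            if date_received <= last:  # a later receipt would break the calendar FK
                events.append({
                    "store_id": store_id,
                    "product_id": product_id,
                    "date_placed": placed,
                    "date_received": date_received,
                    "lead_time_days": lead_days,
                    "ordered_qty": ordered_qty,
                    "received_qty": received_qty,
                })
            placed += step
    return events
```

Both draws happen before the calendar check. As a result, dropping a late receipt never shifts the rng stream for later POs.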
Downstream coupling: a follow-up commit will adjust ``InventorySnapshotGenerator`` to consume these events, so that realistic stockout windows emerge between scheduled receipts. This slice only emits the rows.

Remaining chunk B work:
- SalesDailyGenerator channel split + lifecycle multiplier integration

* feat(data): seeder phase 2 chunk B (5a/6) — lifecycle multiplier into sales (#92)

First half of B 5/5. The slice was split per Open Q3: channel integration is deferred until its semantics are confirmed. This half wires the LifecycleGenerator multiplier into ``SalesDailyGenerator`` while preserving the byte-identical regression invariant.
- ``SalesDailyGenerator.__init__`` gains optional ``lifecycle: LifecycleGenerator | None = None``. Defaults preserve pre-Phase-2 behavior for every existing caller.
- ``SalesDailyGenerator.generate`` gains optional ``product_lifecycle_data: dict[int, tuple[date | None, date | None]] | None = None``. Missing or unspecified entries fall back to ``(None, None)``, so the multiplier evaluates to 1.0.
- ``_compute_demand`` gains ``product_discontinue_date``. It applies the lifecycle multiplier, guarded by ``self.lifecycle is not None and self.lifecycle.enabled``. The pre-Phase-2 ``new_product_ramp_days`` linear ramp is suppressed when lifecycle is enabled, which prevents double attenuation at launch.
- 10 new tests cover:
  - the regression invariant (no kwargs / explicit None / disabled config / no rng consumption when disabled);
  - enabled correctness (pre-launch zero, post-discontinue zero, intro < maturity, decline < maturity);
  - legacy-ramp suppression (no double-apply when lifecycle is on; still fires when lifecycle is None);
  - the lookup fallback (a missing product_id evaluates to 1.0).

The B 5b/6 channel integration is held until Open Q3 is resolved. The options are (b) the dominant channel per row, (c) a random channel per row from channel_mix weights, or (d) aggregated rows with a primary channel column.
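One way to picture the multiplier and the legacy-ramp suppression is the piecewise-linear sketch below. The commit gives neither the real segment shapes nor the LifecycleConfig fields, so `LifecycleRamp`, its defaults, `lifecycle_multiplier`, `compute_demand`, and the legacy ramp formula are all hypothetical. Only the behaviors the tests name come from the text:
- zero before launch and after discontinue;
- intro and decline below maturity;
- 1.0 when no dates are known;
- no double attenuation at launch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LifecycleRamp:
    """Hypothetical ramp parameters (not the real LifecycleConfig)."""
    enabled: bool = True
    intro_days: int = 30
    growth_days: int = 60
    decline_days: int = 90
    intro_level: float = 0.2
    decline_floor: float = 0.3


def lifecycle_multiplier(day: date, launch: Optional[date],
                         discontinue: Optional[date], ramp: LifecycleRamp) -> float:
    """Piecewise-linear intro/growth/maturity/decline/discontinued multiplier (sketch)."""
    if launch is not None and day < launch:
        return 0.0  # pre-launch
    if discontinue is not None and day >= discontinue:
        return 0.0  # discontinued
    if discontinue is not None:
        days_left = (discontinue - day).days
        if days_left <= ramp.decline_days:  # decline: fall linearly toward decline_floor
            return ramp.decline_floor + (1.0 - ramp.decline_floor) * days_left / ramp.decline_days
    if launch is not None:
        age = (day - launch).days
        if age < ramp.intro_days:
            return ramp.intro_level  # intro plateau
        if age < ramp.intro_days + ramp.growth_days:  # growth: climb linearly to 1.0
            frac = (age - ramp.intro_days) / ramp.growth_days
            return ramp.intro_level + (1.0 - ramp.intro_level) * frac
    return 1.0  # maturity, or no lifecycle dates at all


def compute_demand(base: float, day: date, launch: Optional[date], discontinue: Optional[date],
                   lifecycle: Optional[LifecycleRamp], new_product_ramp_days: int = 14) -> float:
    """Apply either the lifecycle multiplier or the legacy launch ramp, never both."""
    if lifecycle is not None and lifecycle.enabled:
        return base * lifecycle_multiplier(day, launch, discontinue, lifecycle)
    if launch is not None and new_product_ramp_days > 0:
        age = (day - launch).days
        if 0 <= age < new_product_ramp_days:  # hypothetical legacy linear ramp
            return base * (age + 1) / new_product_ramp_days
    return base
```

The guard mirrors the commit's `lifecycle is not None and lifecycle.enabled`. A `None` or disabled lifecycle falls through to the legacy ramp unchanged.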
Remaining Phase 2 work:
- B 5b/6: SalesDailyGenerator channel split (pending Q3)
- Chunk C: DataSeeder orchestration + endpoints + integration tests

* feat(data): seeder phase 2 chunk B (5b/6) — channel split into sales (#92)

Second half of B 5/5. Resolves Open Q3 with semantic (c): each emitted ``sales_daily`` row gets its ``channel`` drawn from ``channel_mix`` via ``rng.choices``, preserving the existing ``(date, store, product)`` grain.
- ``SalesDailyGenerator.__init__`` gains optional ``channels: ChannelConfig | None = None``. The disabled / unset path consumes zero new rng draws and emits rows without a ``channel`` key, so the DB ``server_default='in_store'`` applies. This preserves the byte-identical regression invariant.
- ``generate()`` runs ``_validate_channels()`` once at entry. It rejects:
  - channels outside the SQL allow-list;
  - negative weights;
  - an all-zero mix;
  - a negative ``online_promo_uplift``;
  - an ``online_substitution_to_instore`` outside ``[0, 1]``.
- Per emitted row (after the stockout skip), ``_maybe_apply_channel`` does three things. It builds the effective mix, where ``online_substitution_to_instore`` shifts weight from in_store → online during promos. It draws a channel via ``rng.choices``. It applies ``online_promo_uplift`` to online rows on promo dates. This costs one rng draw per emitted row.
- 19 new tests cover:
  - the regression invariant (no kwarg, disabled config, no rng consumption);
  - channel distribution (subset of mix keys, single-channel deterministic, dominant most common, zero-weight never chosen);
  - online promo uplift (fires for online + promo, not for in_store);
  - substitution shift (more online during promo, zero substitution = no shift);
  - 6 validation cases;
  - row shape (channel key present/absent).

Phase 2 chunk B is complete (5/6 paired slices + 1/6 follow-up #94). Next: Chunk C (DataSeeder orchestration + new endpoints + integration tests + docs).
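Below is a sketch of the per-row channel step. It is built on three assumptions. First, the allow-list is reduced to the two channel names that appear in the commit (`in_store`, `online`). Second, the substitution shift is modeled as moving `substitution × in_store weight` over to `online`. Third, `quantity` is a hypothetical row field. `validate_channels` and `maybe_apply_channel` are illustrative stand-ins for `_validate_channels` and `_maybe_apply_channel`, not their actual code.

```python
import random
from typing import Mapping, Optional

ALLOWED_CHANNELS = frozenset({"in_store", "online"})  # assumption: the real allow-list lives in SQL


def validate_channels(mix: Mapping[str, float], online_promo_uplift: float,
                      substitution: float) -> None:
    """Reject the bad configurations the commit lists; run once, before any rows."""
    unknown = set(mix) - ALLOWED_CHANNELS
    if unknown:
        raise ValueError(f"channels outside allow-list: {sorted(unknown)}")
    if any(w < 0 for w in mix.values()):
        raise ValueError("channel weights must be non-negative")
    if not any(w > 0 for w in mix.values()):
        raise ValueError("channel_mix needs at least one positive weight")
    if online_promo_uplift < 0:
        raise ValueError("online_promo_uplift must be >= 0")
    if not 0.0 <= substitution <= 1.0:
        raise ValueError("online_substitution_to_instore must be within [0, 1]")


def maybe_apply_channel(row: dict, mix: Optional[Mapping[str, float]], on_promo: bool,
                        rng: random.Random, online_promo_uplift: float = 0.0,
                        substitution: float = 0.0) -> dict:
    """Tag one sales row with a channel; mix=None models the disabled path."""
    if mix is None:
        return row  # no channel key and no rng draw: DB server_default='in_store' applies
    weights = dict(mix)
    if on_promo and substitution > 0 and weights.get("in_store", 0.0) > 0:
        shift = weights["in_store"] * substitution  # move weight in_store -> online on promo days
        weights["in_store"] -= shift
        weights["online"] = weights.get("online", 0.0) + shift
    channels = sorted(weights)  # fixed population order keeps the draw reproducible
    channel = rng.choices(channels, weights=[weights[c] for c in channels], k=1)[0]  # one draw
    out = {**row, "channel": channel}
    if channel == "online" and on_promo:
        out["quantity"] = round(row["quantity"] * (1 + online_promo_uplift))  # hypothetical field
    return out
```

Sorting the channel names before calling `random.Random.choices` pins the population order. For a given seed, the single draw per row then always maps to the same channel.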
* feat(data,api): seeder phase 2 chunk c1 — orchestration + endpoints (#92)

Extend GenerateParams with 5 enable flags + channel_mix / lifecycle / bundle / markdown / lead-time fields. The channel_mix validator enforces the SQL allow-list and at least one positive weight. The service layer translates the new params into ChannelConfig / LifecycleConfig / BundleConfig / MarkdownConfig / LeadTimeConfig overrides.

DataSeeder.generate_full now wires LifecycleGenerator + BundleGenerator + MarkdownGenerator + ReplenishmentGenerator + ChannelConfig. Other changes in this chunk:
- Product lifecycle dates are fetched alongside base_price in a single query and threaded into SalesDailyGenerator.
- A new _normalize_promotion_records helper enforces a uniform key set across the mixed pct_off / bundle / bogo / markdown promo records, so the bulk pg_insert builds a valid multi-row VALUES clause.
- delete_data drops replenishment_event first (leaf table).
- verify_data_integrity gains 3 Phase 2 invariants: bundle member-ID consistency, lifecycle date ordering, and replenishment fill rate.
- append_data mirrors the new return signature and fetches lifecycle dates from existing products.

New endpoints:
- GET /seeder/channels returns the SQL allow-list.
- GET /dimensions/products/{id}/lifecycle-curve returns the reference demand-multiplier curve via LifecycleGenerator.multiplier_for. It uses the default LifecycleConfig ramp parameters and the product's own launch_date / discontinue_date.

SeederStatus and SeederResult both gain a replenishment_events count.

The disabled-path regression invariant is preserved: every Phase 2 flag defaults to off and consumes zero rng when off.
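The c1 text says `_normalize_promotion_records` exists so that a mixed batch of pct_off / bundle / bogo / markdown dicts shares one key set before the bulk insert. A minimal sketch of that idea follows. The public name, the optional `defaults` mapping, and first-seen key ordering are assumptions rather than the helper's real signature.

```python
from typing import Any, Iterable, Mapping, Optional


def normalize_promotion_records(records: Iterable[Mapping[str, Any]],
                                defaults: Optional[Mapping[str, Any]] = None) -> list[dict[str, Any]]:
    """Give every record the same keys, in first-seen order (hypothetical sketch)."""
    records = list(records)
    defaults = defaults or {}
    # dict.fromkeys de-duplicates while keeping first-seen key order.
    keys = list(dict.fromkeys(key for rec in records for key in rec))
    # Missing keys get a default (None unless overridden). For a JSONB(none_as_null=True)
    # column such as bundle_member_product_ids, None is written as SQL NULL.
    return [{key: rec.get(key, defaults.get(key)) for key in keys} for rec in records]
```

A per-key `defaults` mapping matters for columns where an explicit `None` would violate a NOT NULL constraint. In that case, `None` is not an acceptable fill value.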
* feat(data,docs): seeder phase 2 chunk c2 — integration tests + docs (#92)

test_phase2_integration.py covers:
- the disabled-path regression (no Phase 2 rows when toggles are off);
- per-feature enabled tests: lifecycle populates dates; bundles convert promotions with non-NULL bundle_member_product_ids; markdowns can emit rows when lifecycle is also on; replenishment respects received_qty <= ordered_qty; multichannel writes distinct channels;
- a full-on verify_data_integrity run returning an empty error list;
- delete ordering that wipes replenishment_event without FK violations.

Tests are marked @pytest.mark.integration, so they only run against the real docker-compose Postgres.

docs/DATA-SEEDER.md adds a Phase 2 retail-depth section. It documents all five toggles with example JSON payloads, the two new endpoints (GET /seeder/channels, GET /dimensions/products/{id}/lifecycle-curve), and the three new Data Integrity checks.

* feat(release): trigger v0.2.8 release for seeder phases 1+2 (#98)
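verify_data_integrity gained three Phase 2 invariants in c1, and the c2 full-on test expects them to return an empty error list. They can be expressed over in-memory rows as shown below. The real checks run against Postgres. `check_phase2_invariants` and the row dict shapes are assumptions. So are the exact rules used for bundle member-ID consistency: bundle/bogo rows must list existing, non-host members, and every other kind must leave the column NULL.

```python
from datetime import date
from typing import Any, Iterable, Mapping


def check_phase2_invariants(products: Iterable[Mapping[str, Any]],
                            promotions: Iterable[Mapping[str, Any]],
                            replenishment_events: Iterable[Mapping[str, Any]]) -> list[str]:
    """In-memory sketch of three Phase 2 checks; an empty list means clean data."""
    products = list(products)
    known_ids = {p["product_id"] for p in products}
    errors: list[str] = []
    # 1. Bundle member-ID consistency (assumed rules, see lead-in).
    for promo in promotions:
        members = promo.get("bundle_member_product_ids")
        pid = promo.get("promo_id")
        if promo.get("kind") in ("bundle", "bogo"):
            if not members:
                errors.append(f"promotion {pid}: {promo['kind']} without members")
            elif set(members) - known_ids:
                errors.append(f"promotion {pid}: unknown member ids {sorted(set(members) - known_ids)}")
            elif promo.get("product_id") in members:
                errors.append(f"promotion {pid}: host product listed as a member")
        elif members is not None:
            errors.append(f"promotion {pid}: members set on kind={promo.get('kind')!r}")
    # 2. Lifecycle date ordering: a product cannot be discontinued before launch.
    for p in products:
        launch, discontinue = p.get("launch_date"), p.get("discontinue_date")
        if launch is not None and discontinue is not None and discontinue < launch:
            errors.append(f"product {p['product_id']}: discontinue_date before launch_date")
    # 3. Replenishment fill rate: 0 <= received_qty <= ordered_qty.
    for ev in replenishment_events:
        if not 0 <= ev["received_qty"] <= ev["ordered_qty"]:
            errors.append(f"replenishment {ev['store_id']}/{ev['product_id']} "
                          f"placed {ev['date_placed']}: fill rate outside [0, 1]")
    return errors


if __name__ == "__main__":
    products = [{"product_id": 1, "launch_date": date(2024, 1, 1), "discontinue_date": date(2024, 6, 30)},
                {"product_id": 2, "launch_date": None, "discontinue_date": None}]
    promos = [{"promo_id": 7, "product_id": 1, "kind": "bundle", "bundle_member_product_ids": [2]}]
    events = [{"store_id": 1, "product_id": 1, "date_placed": date(2024, 1, 1),
               "ordered_qty": 50, "received_qty": 47}]
    print(check_phase2_invariants(products, promos, events))  # prints []
```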

1 parent 7f97c5e, commit d4e7201

0 files changed
