From 5324d7f3d9c5fd053a2ddf3a0be99ce1a9b5faa6 Mon Sep 17 00:00:00 2001 From: cafzal Date: Tue, 5 May 2026 15:39:00 -0700 Subject: [PATCH 01/22] Add reproducibility runbooks to four multi-reasoner templates Each runbook is an agent prompt sequence to recreate the template's multi-reasoner pipeline using the bundled CSVs in ../data/, mapping each stage to the template's actual concepts, properties, and outputs: - telco_network_recovery (5-stage: descriptive -> rules -> graph -> predictive -> prescriptive; mirrors PR #56's existing structure) - energy_grid_planning (4-stage: predictive -> graph -> rules -> prescriptive with InvestmentLevel scenarios) - supply_chain_resilience (4-stage: blast-radius -> graph -> rules -> min-cost flow + scenarios) - machine_maintenance (5-stage: querying -> graph -> rules -> prescriptive maintenance schedule -> resilience cross-training) Reproducible against the bundled template CSVs; one-line notes on swapping to a Snowflake schema for users wiring to their own data. --- v1/energy_grid_planning/references/runbook.md | 440 +++++++++++++++++ v1/machine_maintenance/references/runbook.md | 215 +++++++++ .../references/runbook.md | 359 ++++++++++++++ .../references/runbook.md | 449 ++++++++++++++++++ 4 files changed, 1463 insertions(+) create mode 100644 v1/energy_grid_planning/references/runbook.md create mode 100644 v1/machine_maintenance/references/runbook.md create mode 100644 v1/supply_chain_resilience/references/runbook.md create mode 100644 v1/telco_network_recovery/references/runbook.md diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md new file mode 100644 index 0000000..4dc729a --- /dev/null +++ b/v1/energy_grid_planning/references/runbook.md @@ -0,0 +1,440 @@ +# Runbook: Energy Grid Planning — Multi-Reasoner Walkthrough + +Walk-through of the chained-reasoner pattern this template is built on. 
One realistic business thread — **ERCOT processes 10 hyperscaler interconnection requests against the 12-substation Texas grid** — traced across four RAI reasoner families, each stage writing properties back to the same ontology that downstream stages consume. + +The template's combined script (`energy_grid_planning.py`) implements stages 1–4 directly; this runbook walks through how an agent would derive the same pipeline prompt-by-prompt, skill-by-skill, so a non-OR reader can follow the full reasoning thread end-to-end. + +--- + +## TL;DR — the chain in one screen + +``` +ERCOT has 10 hyperscaler interconnection requests totalling 2,930 MW +on a 12-substation grid with one structurally constrained bottleneck (DFW). +The chain produces a Pareto frontier across 5 budget levels — the knee at +$300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus. + + ───────────────────────────────────────────────────────────────── + STAGE 1 Predictive ──► Substation.predicted_load (12) + DFW: 1,100 → 1,700 MW (+54.6%) ── breaches + 1,600 MW capacity at 24mo. The only + substation predicted to breach. + ───────────────────────────────────────────────────────────────── + STAGE 2 Graph ──► Substation.betweenness (12) + (WCC/ Substation.grid_community (3 regions) + Louvain/ Substation.is_structurally_critical (3) + centrality) DFW, Houston, San Antonio dominate. 7 of 10 + DC requests target critical substations. + ───────────────────────────────────────────────────────────────── + STAGE 3 Rules ──► DataCenterRequest.is_compliant (2) + fails_capacity / fails_structural / + fails_low_carbon flags written back. + Only Crusoe (Midland) and Oracle + (Corpus Christi) pass all three. + ───────────────────────────────────────────────────────────────── + STAGE 4 Prescriptive ──► DataCenterRequest.x_approve (per InvestmentLevel) + SubstationUpgrade.x_upgrade (per InvestmentLevel) + OPTIMAL across 5 budget levels in one solve. + Knee $300M · 5 DCs · 1,500 MW · $264M net. 
+ Google + Lambda never approved — DFW full. + ───────────────────────────────────────────────────────────────── +``` + +A single-reasoner approach can't answer this. Predictive alone forecasts demand but doesn't decide. Graph alone finds bottlenecks but doesn't weigh revenue. Rules alone flag noncompliance but can't optimize. Prescriptive alone has no way to weigh "critical" without graph + predictive upstream. + +--- + +## Prerequisites + +The template ships everything needed to run the chain end-to-end: + +- Bundled CSVs in `../data/` (12 substations, 15 generators, 18 transmission lines, 10 DC requests, 10 substation upgrades, and supporting load/forecast tables). To run against your own Snowflake schema, swap CSV loaders for `model.Table(...)` references. +- `raiconfig.yaml` pointed at your RAI-enabled Snowflake account. +- The combined script `../energy_grid_planning.py` runs the full chain. + +The narrative below follows the actual stage outputs of `energy_grid_planning.py`. Steps 1–3 below are an agent-style walk-through of how the ontology would be built from scratch via skills like `/rai-build-starter-ontology` and `/rai-discovery`; the template ships a pre-built ontology in `energy_grid_planning.py` for users who want to skip ahead to the reasoner stages. + +--- + +## Walk-through (agent-style) + +Steps are sequential — each depends on prior steps. Steps without a skill are presentation-only. + +| # | Step | Skill | Prompt | Expected Output | +|---|------|-------|--------|-----------------| +| 1 | Ontology | `/rai-build-starter-ontology` | "Build an ontology for ERCOT grid infrastructure planning from the bundled CSVs (or your Snowflake schema)." | 13 concepts: `Substation`, `Generator`, `TransmissionLine`, `LoadZone`, `DemandPeriod`, `RenewableProfile`, `MaintenanceWindow`, `Customer`, `DataCenterRequest`, `SubstationUpgrade`, `DemandForecast`, `LoadHistory`, `DCAnnouncement`. 
12 substations, 15 generators, 18 transmission lines, 10 DC requests (2,930 MW). | +| 2 | Visualize | — | "Show the ontology as an ASCII diagram." | Concept map with `Substation` as central hub — `Generator`, `TransmissionLine`, `DataCenterRequest`, `SubstationUpgrade`, `DemandForecast` all relate to it. | +| 3 | Discovery | `/rai-discovery` | "What questions can we answer with this ontology? We're evaluating data center interconnection requests." | 4 reasoning paths: substation load forecast (predictive), grid topology + structural criticality (graph), interconnection compliance (rules), joint approval + upgrade investment (prescriptive). | +| 4 | Explore: generation mix | `/rai-querying` | "What's our current generation mix by fuel type? How much renewable capacity vs fossil?" | 15 generators, 8,135 MW total. Nuclear leads (2,560 MW, 31.5%), then gas (2,290 MW, 28.1%), wind (1,250 MW, 15.4%), coal (1,020 MW, 12.5%), solar (630 MW, 7.7%), battery (300 MW), hydro (85 MW). Renewable: 2,265 MW (28%). Requests with 100% low-carbon mandates (Google, Crusoe) face a structural constraint. | +| 5 | Explore: capacity headroom | `/rai-querying` | "Which substations have the most and least spare capacity right now, before any new DC load?" | Tightest: Houston Ship Channel (69.4% utilized, 550 MW headroom), Austin Energy (68.9%, 280 MW), DFW (68.8%, 500 MW). Most spare: Midland-Permian (38.2%, 680 MW headroom), Lubbock (44.3%, 390 MW). DFW has only 500 MW headroom but 1,100 MW of DC requests stacked on it. | +| 6 | Explore: DC request landscape | `/rai-querying` | "Summarize the 10 DC requests — total MW per substation, revenue per MW, low-carbon requirements." | 2,930 MW total, $528M/yr revenue across 6 substations. DFW most stacked (1,100 MW, 3 requests: Google $195K/MW, xAI $210K/MW, Lambda $150K/MW). xAI is highest revenue ($210K/MW/yr, $105M/yr total). Google and Crusoe require 100% low-carbon. 
Top 3 substations (DFW, Houston, San Antonio) account for 78% of requested MW. | +| 7 | Stage 1 — Predict | `/rai-querying` (or GNN) | "Forecast which substations are losing headroom fastest and which will breach capacity." | `Substation.predicted_load` written for all 12. DFW: 1,700 MW predicted vs 1,600 MW capacity at 24 months (+54.6% growth). Houston Ship Channel: 1,797 MW (+43.8%, within capacity). DFW is the only substation predicted to breach. | +| 8 | Stage 2 — Graph | `/rai-graph-analysis` | "Build a graph on the transmission grid. Find structurally critical substations using betweenness, degree, and eigenvector centrality. Identify connected components and Louvain communities." | 1 connected component, 3 communities (North Texas, West Texas, Gulf Coast). Top 3 by combined centrality rank: DFW, Houston, San Antonio — all flagged `is_structurally_critical`. 7 of 10 DC requests target critical substations. | +| 9 | Stage 3 — Rules | `/rai-rules-authoring` | "Check each request against capacity (using `predicted_load`), low-carbon mandate, and structural risk (using `is_structurally_critical`)." | 3 declarative `Relationship` rules (`fails_capacity`, `fails_structural`, `fails_low_carbon`) + composite `is_compliant`. All 10 pass low-carbon. 2 compliant: Crusoe (Midland) and Oracle (Corpus Christi). 8 flagged on capacity + structural risk. | +| 10 | Stage 4 — Optimize | `/rai-prescriptive-problem-formulation` | "Which DCs to approve and which upgrades to fund across 5 budget levels ($200M-$600M)? Use `predicted_load` for capacity. Show the Pareto frontier — DCs, MW, revenue, net value at each level." | Pareto frontier across `InvestmentLevel` Scenario Concept. Knee at $300M (5 DCs, 1,500 MW, $264M net value). xAI Colossus unlocks at $300M. Google and Lambda never approved (DFW full). | +| 11 | Results | `/rai-prescriptive-results-interpretation` | "How do approvals and upgrades vary by investment level? Where's the knee? What's marginal return per $M?"
| Per-level DC list + selected upgrades, queried via `model.select(...).where(x_approve > 0.5)`. Marginal: $200→$300M = $995K/$M (knee); declines to $400K/$M by $600M. | + +--- + +## Stage 1 — Predictive: substation load forecasting + +**Method:** load max forecasted load per substation as `Substation.predicted_load`. The template aggregates `DemandForecast.predicted_load_mw` over forecast horizons (6/12/18/24 months) and writes the max back to the substation. A pre-trained GNN can replace the table lookup; the script falls back gracefully when the GNN model registry is unavailable. + +``` +Substation load forecast (max across 6/12/18/24-month horizons) + + Houston Ship Channel ──── pred 1,797 MW ████████████ +43.8% safe + Dallas-Fort Worth ──── pred 1,700 MW ██████████ +54.6% ▲ breach 24mo + San Antonio Metro ──── pred 1,069 MW ███████ +37.1% safe + Austin Energy ──── pred 819 MW █████ +32.1% safe + Waco Gateway ──── pred 600 MW ████ +22.4% safe + Corpus Christi Coast ──── pred 600 MW ████ +11.1% safe + Midland-Permian ──── pred 520 MW ███ +23.8% safe + El Paso Border ──── pred 470 MW ███ +14.6% safe + Abilene Central ──── pred 400 MW ███ +14.3% safe + Lubbock West Texas ──── pred 360 MW ██ +16.1% safe + Brownsville Valley ──── pred 355 MW ██ +10.9% safe + Amarillo Panhandle ──── pred 315 MW ██ +12.5% safe + ▲ + ──────────────────────────────────────────│────────────────────── + DFW is the only substation predicted to │ + breach capacity (1,700 vs 1,600 MW cap). │ + Stage 3 rules and Stage 4 capacity │ + constraint both consume this property. 
│ + ────────────────────────────────────────────────────────────────── + +✓ Substation.predicted_load written back to all 12 substations +``` + +**Stage 3 rule and Stage 4 constraint with the predictive term:** + +```python +# Both reuse the same effective_load expression +effective_load = Substation.predicted_load | Substation.current_load_mw +``` + +**Caveats:** +- The bundled forecasts are pre-computed in `data/demand_forecasts.csv` to keep the template self-contained. To wire in a live GNN, point `GNN(...)` at your model registry — the template's `try` block already handles both paths. +- DFW's projected breach partially encodes the stacked DC request itself ("we expect demand to keep rising at the substation everyone is targeting"). For an independent baseline, train on a pre-announcement slice and compare. + +--- + +## Stage 2 — Graph: grid topology & structural vulnerability + +**Construction** — `Substation` as the node concept directly (no mirror concept): +- Node concept: `Substation` (12 nodes) +- Edges: active `TransmissionLine` rows, `from_substation` → `to_substation` +- Direction: undirected, unweighted + +**Algorithms:** weakly connected components, Louvain community detection, betweenness/degree/eigenvector centrality. Combined rank picks the top-3 as `is_structurally_critical`. 
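The WCC + centrality pass can be reproduced outside RAI in a few dozen lines. The sketch below is a plain-Python Brandes betweenness implementation plus a BFS component finder, run over a toy four-node adjacency map — the node names are illustrative, not the template's 12-substation edge list:

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph given as node -> set(neighbours)."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    q.append(w)
        comps.append(comp)
    return comps

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0); sigma[s] = 1   # shortest-path counts
        dist = dict.fromkeys(adj, -1); dist[s] = 0
        q = deque([s])
        while q:                                      # BFS from s
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                                  # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2.0 for v, c in bc.items()}        # undirected: each pair counted twice

# Toy star: Waco as hub (illustrative nodes only)
adj = {"DFW": {"Waco"}, "Waco": {"DFW", "Austin", "Abilene"},
       "Austin": {"Waco"}, "Abilene": {"Waco"}}
```

On the real data, the adjacency map would be built from active `TransmissionLine` rows (`from_substation` → `to_substation`, undirected), with the scores written back as `Substation.betweenness` by the graph skill.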
+ +``` +Connectivity ──► 1 component (12 of 12 reachable) + +Louvain communities ──► 3 regions + + Region 1 — North Texas Dallas-Fort Worth, Austin Energy, Waco Gateway + Region 2 — West Texas Midland-Permian, Lubbock, El Paso, Amarillo, Abilene + Region 3 — Gulf Coast Houston Ship Channel, San Antonio Metro, + Corpus Christi Coast, Brownsville Valley + +Centrality (top-3 marked is_structurally_critical) + + #1 Dallas-Fort Worth betw 31.67 ████████████ [CRITICAL] + #2 Houston Ship Channel betw 15.83 ██████ [CRITICAL] + #3 San Antonio Metro betw 4.33 ██ [CRITICAL] + #4 Austin Energy betw ~3.0 + … + #12 Brownsville Valley betw 0.00 + + ────────────────────────────────────────────────────────────────── + DC requests targeting structurally critical substations: + Microsoft Horizon Campus (350 MW) ─► Houston [CRITICAL] + Meta Bayou DC (300 MW) ─► Houston [CRITICAL] + Google Metroplex DC (400 MW) ─► DFW [CRITICAL] + xAI Colossus Texas (500 MW) ─► DFW [CRITICAL] + Lambda Labs DFW (200 MW) ─► DFW [CRITICAL] + Amazon SA Cloud (280 MW) ─► San Antonio [CRITICAL] + Apple iCloud Texas (250 MW) ─► San Antonio [CRITICAL] + ──────────── + 7 of 10 DC requests sit on the 3 most structurally critical nodes. + ────────────────────────────────────────────────────────────────── + +✓ Substation.betweenness / degree_centrality / eigenvector_centrality +✓ Substation.grid_community written back (12 rows) +✓ Substation.is_structurally_critical written back (3 rows) +``` + +--- + +## Stage 3 — Rules: interconnection queue compliance + +Three declarative `Relationship` rules consume Stages 1–2 enrichments. Each is written as a `model.where(...).define(...)` block; a composite `is_compliant` fires only when none of the three failure flags fire. + +**Rule 1 — Capacity** (consumes `Substation.predicted_load` from Stage 1): + +```python +DataCenterRequest.fails_capacity = model.Relationship(...) 
+effective_load = SubRef.predicted_load | SubRef.current_load_mw +model.where( + DataCenterRequest.substation(SubRef), + DataCenterRequest.requested_mw + effective_load > SubRef.max_capacity_mw, +).define(DataCenterRequest.fails_capacity()) +``` + +**Rule 2 — Structural risk** (consumes `Substation.is_structurally_critical` from Stage 2): + +```python +model.where( + DataCenterRequest.substation(SubRef), + SubRef.is_structurally_critical(), +).define(DataCenterRequest.fails_structural()) +``` + +**Rule 3 — Low-carbon mandate** (zero-emission share at the substation must meet the request's requirement; nuclear + renewable count): + +```python +model.where( + DataCenterRequest.substation(SubRef), + (SubRef.low_carbon_gen_mw | 0.0) * 100 + < DataCenterRequest.low_carbon_requirement_pct * (SubRef.total_gen_mw | 0.001), +).define(DataCenterRequest.fails_low_carbon()) +``` + +``` + DC Request Hyper Q# MW Cap LowC Crit OK? + ────────────────────────────────────────────────────────────────────── + Microsoft Horizon Campus Microsoft 1 350 FAIL PASS FAIL N + Meta Bayou DC Meta 2 300 FAIL PASS FAIL N + Google Metroplex DC Google 3 400 FAIL PASS FAIL N + xAI Colossus Texas xAI 4 500 FAIL PASS FAIL N + Lambda Labs DFW Lambda Labs 5 200 FAIL PASS FAIL N + Amazon SA Cloud Amazon 6 280 FAIL PASS FAIL N + Apple iCloud Texas Apple 7 250 FAIL PASS FAIL N + CoreWeave Austin GPU CoreWeave 8 320 FAIL PASS PASS N + Crusoe Permian DC Crusoe Energy 9 180 PASS PASS PASS Y + Oracle Coastal DC Oracle 10 150 PASS PASS PASS Y + ────────────────────────────────────────────────────────────────────── + Summary: 2 compliant, 8 flagged +``` + +Every request passes low-carbon — ERCOT's nuclear (STP, Comanche Peak) plus its wind/solar fleet provides enough zero-emission generation. The two compliant requests sit on substations that are neither structurally critical (Stage 2) nor predicted to breach (Stage 1). Eight requests need either upgrades, redirection, or both — Stage 4 picks the optimal mix. 
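Stripped of the ontology machinery, the three rule bodies are three boolean predicates. A minimal stand-alone sketch — dict rows stand in for `DataCenterRequest` / `Substation`; the capacity and criticality numbers come from the tables above, while the generation-mix figures and xAI's low-carbon percentage are illustrative:

```python
def compliance(req, sub):
    """Plain-Python mirror of the three Stage 3 rule bodies."""
    # Rule 1 — capacity, using Stage 1's predicted_load (fallback: current load)
    effective_load = sub.get("predicted_load", sub["current_load_mw"])
    fails_capacity = req["requested_mw"] + effective_load > sub["max_capacity_mw"]
    # Rule 2 — structural risk, using Stage 2's criticality flag
    fails_structural = sub["is_structurally_critical"]
    # Rule 3 — zero-emission share at the substation vs the request's mandate
    fails_low_carbon = (sub.get("low_carbon_gen_mw", 0.0) * 100
                        < req["low_carbon_requirement_pct"] * sub.get("total_gen_mw", 0.001))
    return not (fails_capacity or fails_structural or fails_low_carbon)

# DFW is predicted to breach and structurally critical; Midland is neither.
dfw     = {"predicted_load": 1700, "current_load_mw": 1100, "max_capacity_mw": 1600,
           "is_structurally_critical": True,  "low_carbon_gen_mw": 800,  "total_gen_mw": 2000}
midland = {"predicted_load": 520,  "current_load_mw": 420,  "max_capacity_mw": 1100,
           "is_structurally_critical": False, "low_carbon_gen_mw": 2000, "total_gen_mw": 2000}
xai    = {"requested_mw": 500, "low_carbon_requirement_pct": 0}    # pct illustrative
crusoe = {"requested_mw": 180, "low_carbon_requirement_pct": 100}
```

With these rows, `compliance(crusoe, midland)` passes all three checks while `compliance(xai, dfw)` fails on both capacity and structural grounds — matching the table.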
+ +``` +✓ DataCenterRequest.fails_capacity written back +✓ DataCenterRequest.fails_structural written back +✓ DataCenterRequest.fails_low_carbon written back +✓ DataCenterRequest.is_compliant written back (2 rows) +``` + +--- + +## Stage 4 — Prescriptive: joint DC approval + upgrade MIP + +``` +FORMULATION + + Scenario Concept + InvestmentLevel(name, budget_cap) 5 levels: $200M, $300M, $400M, $500M, $600M + + Decision variables (binary, indexed by InvestmentLevel) + DataCenterRequest.x_approve(InvestmentLevel) 50 binaries (10 DCs × 5 levels) + SubstationUpgrade.x_upgrade(InvestmentLevel) 50 binaries (10 upgrades × 5 levels) + + Constraints + 1. Substation capacity per InvestmentLevel + max_capacity − predicted_load + Σ(x_upgrade · capacity_increase) + ≥ Σ(x_approve · requested_mw) per (Substation, InvestmentLevel) + 2. Budget per InvestmentLevel + Σ(x_upgrade · cost_million) ≤ budget_cap per InvestmentLevel + + Objective (maximize) + Σ x_approve · annual_revenue_per_mw · requested_mw summed over all levels + +────────────────────────────────────────────────────────────────────── +SOLVE (HiGHS, single solve, all 5 levels in one MIP) → OPTIMAL +────────────────────────────────────────────────────────────────────── +``` + +``` +PARETO FRONTIER (queried directly from ontology) + + Level Budget DCs DC MW Revenue $/yr Upg $M Net Value + ────── ──────── ──── ────── ───────────── ──────── ─────────── + $200M $200M 4 1,000 $174,350,000 $190.0 $164,850,000 + $300M $300M 5 1,500 $279,350,000 $300.0 $264,350,000 ◄ KNEE + $400M $400M 6 1,800 $328,850,000 $385.0 $309,600,000 + $500M $500M 7 2,080 $376,450,000 $430.0 $354,950,000 + $600M $600M 8 2,330 $420,200,000 $505.0 $394,950,000 + + Per-level DC approvals (cumulative as budget increases) + + $200M: Microsoft (350) · CoreWeave (320) · Crusoe (180) · Oracle (150) ── 1,000 MW + $300M: + xAI Colossus (500) ── 1,500 MW + $400M: + Meta Bayou (300) ── 1,800 MW + $500M: + Amazon SA (280) ── 2,080 MW + $600M: + Apple iCloud (250) ── 
2,330 MW + + Never approved at any level: + Google Metroplex DC (400 MW, DFW) + Lambda Labs DFW (200 MW, DFW) + ────────────────────────────────── + DFW is physically full once xAI takes 500 MW. + + Marginal value per $M added budget + + $200 → $300M ── $995K / $M ◄ knee — xAI Colossus unlocks at $300M + $300 → $400M ── $453K / $M + $400 → $500M ── $454K / $M + $500 → $600M ── $400K / $M + +✓ DataCenterRequest.x_approve(InvestmentLevel) written back +✓ SubstationUpgrade.x_upgrade(InvestmentLevel) written back + Each variable is now a queryable property of the model — no + parsing of solver output, no per-level re-solve loops. +``` + +### Reading the solve + +- **Knee at $300M.** xAI Colossus alone delivers $105M/yr ($210K × 500 MW), the single highest-revenue request. Unlocking it requires the $110M UPG-003 upgrade at DFW. Below $300M, xAI doesn't fit; above $300M, the next-best DCs (Meta, Amazon, Apple) deliver $0.4M/$M-ish — diminishing returns. +- **DFW is the binding bottleneck.** With xAI's 500 MW absorbing the post-upgrade headroom, neither Google (400 MW) nor Lambda (200 MW) ever fit — they share DFW. The optimizer correctly prioritizes revenue per MW over queue position. +- **All compliant requests get approved by $200M.** The 2 fully compliant requests from Stage 3 (Crusoe, Oracle) sit on sub-critical substations; they're cheap upgrades and clear the bar at the lowest budget. +- **`predicted_load` propagates the chain.** Capacity slack at every substation reflects Stage 1's forecast, not raw historical load — the same signal the rules engine used. + +--- + +## Stage 5 — Interpretation + +``` +THE PLAN, IN BUSINESS TERMS + + • Recommended budget: $300M (the knee). + 5 DCs, 1,500 MW connected, $279M/yr revenue, $264M net value. + • Marginal $100M from $200M unlocks xAI Colossus alone — $105M/yr — at + $995K return per $1M spent. Beyond the knee, marginal return roughly + halves. 
+ • Google (400 MW) and Lambda (200 MW) are structurally infeasible at any + budget level — DFW is full once xAI takes its 500 MW. Either redirect + them to a different substation, or de-prioritize. + • Crusoe (Midland) and Oracle (Corpus Christi) approve cheaply at any + budget — they were the only two that passed all three rules in Stage 3. + +ONTOLOGY ENRICHMENT — what each stage contributed back + + Stage 1 (predictive) Substation.predicted_load [12] + + Stage 2 (graph) Substation.betweenness [12] + Substation.degree_centrality [12] + Substation.eigenvector_centrality [12] + Substation.grid_community [12] + Substation.is_structurally_critical [3] + + Stage 3 (rules) DataCenterRequest.fails_capacity [8] + DataCenterRequest.fails_structural [7] + DataCenterRequest.fails_low_carbon [0] + DataCenterRequest.is_compliant [2] + Substation.low_carbon_gen_mw [12] + Substation.total_gen_mw [12] + + Stage 4 (prescriptive) DataCenterRequest.x_approve [50] + SubstationUpgrade.x_upgrade [50] + + ────────────────────────────────────────────────────────────────── + Each stage reads what the previous stage wrote. + Re-running any downstream stage automatically picks up enrichments. + No glue code, no DataFrame round-trip — same ontology throughout. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## The chain — accretive ontology enrichment + +``` +THE ENERGY GRID PLANNING CHAIN + + STAGE 1 PREDICTIVE + "Where is demand growing? Who breaches first?" + reads: DemandForecast.predicted_load_mw, Substation.current_load_mw + writes: Substation.predicted_load ── per substation + │ + ▼ + STAGE 2 GRAPH (WCC / Louvain / centrality) + "How is the grid connected? Which substations are structural bottlenecks?" 
+ reads: Substation nodes, TransmissionLine edges (active) + writes: Substation.betweenness / degree_centrality / eigenvector_centrality + Substation.grid_community ── 3 regions + Substation.is_structurally_critical ── 3 substations + │ + ▼ + STAGE 3 RULES (declarative Relationships) + "Which DC requests pass capacity, structural, low-carbon checks?" + reads: Substation.predicted_load ◄── Stage 1 + Substation.is_structurally_critical ◄── Stage 2 + Generator.emissions_rate, DataCenterRequest.low_carbon_requirement_pct + writes: DataCenterRequest.fails_capacity / fails_structural / fails_low_carbon + DataCenterRequest.is_compliant ── 2 requests + │ + ▼ + STAGE 4 PRESCRIPTIVE (HiGHS MIP, Scenario Concept) + "Which DCs to approve and which upgrades to fund across 5 budget levels?" + reads: Substation.predicted_load ◄── Stage 1 + SubstationUpgrade.cost_million / capacity_increase_mw + DataCenterRequest.requested_mw / annual_revenue_per_mw + InvestmentLevel.budget_cap ── 5 levels + writes: DataCenterRequest.x_approve(InvestmentLevel) + SubstationUpgrade.x_upgrade(InvestmentLevel) + │ + ▼ + Pareto frontier, + queryable directly from the ontology. + + ────────────────────────────────────────────────────────────────── + No glue. No DataFrame ping-pong. No per-level re-solve loop. + Four reasoners, one ontology, one accretive thread. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## Why the chain matters (vs. 
any single stage) + +| Stage alone | What it tells you | What it doesn't | +|---|---|---| +| Predictive | "DFW will breach in 24 months" | What to do; which requests matter | +| Graph alone | "DFW, Houston, San Antonio are bottlenecks" | Whether they have headroom; what to approve | +| Rules alone | (won't fire — no `predicted_load`, no `is_structurally_critical`) | Pipeline misses | +| Prescriptive alone | (no flagged set, no critical-node info, no forecast) | Whole pipeline misses | + +| Combined | Output | +|---|---| +| Predictive → Graph | Forecasted load + structural bottleneck map | +| + Rules | Per-request compliance vs. capacity / structural / low-carbon | +| + Prescriptive | Pareto frontier across 5 budget levels in one solve | + +**Multi-reasoner chaining grounded in (and contributing to) the ontology.** + +--- + +## Data Reference + +**Substations with DC requests:** + +| Substation | Location | Capacity | DC Requests | DC MW | +|------------|----------|----------|-------------|-------| +| SUB-001 | Houston Ship Channel | 1,800 MW | Microsoft (350), Meta (300) | 650 MW | +| SUB-002 | Dallas-Fort Worth | 1,600 MW | Google (400), xAI (500), Lambda (200) | 1,100 MW | +| SUB-003 | San Antonio Metro | 1,200 MW | Amazon (280), Apple (250) | 530 MW | +| SUB-004 | Austin Energy | 900 MW | CoreWeave (320) | 320 MW | +| SUB-005 | Midland-Permian | 1,100 MW | Crusoe (180) | 180 MW | +| SUB-007 | Corpus Christi Coast | 800 MW | Oracle (150) | 150 MW | + +**ERCOT regions (Louvain):** North Texas (DFW, Austin, Waco) | West Texas (Midland, Lubbock, El Paso, Amarillo, Abilene) | Gulf Coast (Houston, San Antonio, Corpus Christi, Brownsville) + +**DFW breach:** 1,600 MW capacity, 1,700 MW predicted (24mo), +54.6% growth, 1,100 MW DC requests stacked on top. Google and Lambda permanently infeasible. + +**Upgrades:** 10 available, $630M total, 2,900 MW combined capacity. Only $300M of upgrades are needed at the knee. 
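The per-$M marginal returns quoted in the Stage 4 readout fall straight out of the frontier's net-value column. A small sketch, with the net values copied from the Pareto table:

```python
# Net value ($/yr) at each budget level ($M), copied from the Stage 4 Pareto table
NET_VALUE = {200: 164_850_000, 300: 264_350_000, 400: 309_600_000,
             500: 354_950_000, 600: 394_950_000}

def marginal_per_million(frontier):
    """Incremental net value per $1M of budget between consecutive levels."""
    levels = sorted(frontier)
    return {(lo, hi): (frontier[hi] - frontier[lo]) / (hi - lo)
            for lo, hi in zip(levels, levels[1:])}
```

The $200M→$300M step yields $995K/$M — the knee — and the $500M→$600M step yields $400K/$M, reproducing the figures above.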
+ +- **Source data**: bundled CSVs in `../data/` (12 substations, 15 generators, 18 transmission lines, 10 DC requests, 10 upgrade options, plus historical load and forecast tables). +- **Ontology**: defined in `../energy_grid_planning.py` (13 concepts). +- **Stages**: implemented in `../energy_grid_planning.py` as a single combined script with stage banners. diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md new file mode 100644 index 0000000..98760d9 --- /dev/null +++ b/v1/machine_maintenance/references/runbook.md @@ -0,0 +1,215 @@ +# Runbook: Machine Maintenance — Multi-Reasoner Walkthrough + +Walk-through of the chained-reasoner pattern this template is built on. One realistic plant-floor thread — **scheduling preventive maintenance for a 30-machine, 3-plant operation** — traced across querying, graph, rules, and prescriptive reasoners, each stage writing properties back to the same ontology that downstream stages consume. + +The template's combined script (`machine_maintenance.py`) implements all five stages directly; this runbook expands the surrounding narrative — what each prompt asks, what shape of output to expect, and how each enrichment feeds the next — so a reader can follow the reasoning thread end-to-end without re-running the script. + +--- + +## TL;DR — the chain in one screen + +``` +Plant_B looks worst on OEE (61.4%). Plant_A looks mid-tier (68.2%). +The chain shows Plant_A is actually the highest-risk plant — and that +all 3 Turbine techs sit in one city, a $3,200 fix away from resolved. + + ───────────────────────────────────────────────────────────────── + STAGE 0 Querying ──► Machine.performance_ratio (30) + Machine.quality_ratio (30) + Machine.anomaly_count (30) + MachinePeriod.predicted_fp (120) + Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4% + 7 of 9 sensor anomalies are at Plant_A. 
+ ───────────────────────────────────────────────────────────────── + STAGE 1 Graph ──► Machine.betweenness (30) + 30 machines → 1 connected component. + Pumps tie for top centrality (24.0). + ───────────────────────────────────────────────────────────────── + STAGE 2 Rules ──► Machine.is_overdue_maintenance (6) + Machine.is_high_risk (1) + Machine.is_chronic_downtime (3) + Machine.risk_tier (30) + M013 (Pump, Plant_A) = Critical (3 of 3). + ───────────────────────────────────────────────────────────────── + STAGE 3 Prescriptive ──► MachinePeriod.x_maintain (120 binary) + MachinePeriod.x_vulnerable (120 binary) + TechnicianMachinePeriod.x_assigned + OPTIMAL · 20 jobs · 4 periods · $605,241 + ───────────────────────────────────────────────────────────────── + STAGE 4 Resilience ──► Concentration analysis on the solve + Turbine: all 3 techs in Houston_TX. + 67% of scheduled Turbine jobs travel. + Cross-train T006 (Chicago) — $3,200 / 5 wks. + ───────────────────────────────────────────────────────────────── +``` + +A single-reasoner approach can't surface this. OEE alone says Plant_B is the problem. Sensor counts alone don't quantify forward risk. Rules alone flag machines but don't allocate scarce technician time. The optimizer alone produces a feasible schedule — but doesn't know that a single weather event in Houston blocks all on-site Turbine work. + +--- + +## Setup + +See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `machine_maintenance.py`. 
+ +**Prerequisites** + +- Template's `data/` CSVs available (or your own Snowflake schema with equivalent tables — `machines`, `technicians`, `qualifications`, `availability`, `parts_inventory`, `certification_expiry`, `sensors`, `sensor_readings`, `failure_predictions`, `downtime_events`, `production_runs`, `training_options`) +- `raiconfig.yaml` pointing at your RAI engine +- Python ≥ 3.10 with `relationalai >= 1.0.14` + +--- + +## Workflow + +The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skill order. Each row maps to a section of the script. + +| # | Step | Skill | Prompt | Expected Output | +|---|------|-------|--------|-----------------| +| 1 | Build ontology | `/rai-build-starter-ontology` | "Build a RAI ontology for a manufacturing maintenance scheduling problem from the CSVs in `data/`. Concepts: Machine, Technician, Qualification, Period, MachinePeriod, TechnicianPeriod, TechnicianMachinePeriod, PartsInventory, CertificationExpiry, Sensor, SensorReading, FailurePrediction, DowntimeEvent, ProductionRun." | Model `machine_maintenance` with 14 user-facing concepts. 30 Machine rows (3 plants × 10 machines, 5 types × 6). 10 Technician rows (4 Chicago_IL, 3 Houston_TX, 3 Phoenix_AZ). 16 Qualification rows. 4 Period rows. | +| 2 | Discovery | `/rai-discovery` | "What questions can we answer with this ontology? We want to schedule preventive maintenance and surface hidden operational risk." | Querying: OEE by facility, anomaly counts, failure trajectory deltas. Graph: machine dependency clusters, bottleneck centrality on shared-technician edges. Rules: overdue, high-risk, chronic-downtime flags chained to a composite risk tier. Prescriptive: maintain × period × technician assignment minimizing failure + labor + travel cost. Resilience: post-solve concentration analysis. | +| 3 | Stage 0 — OEE proxy | `/rai-querying` | "Compute OEE proxy (Performance × Quality) by facility. 
Performance is total_actual / total_planned across ProductionRuns; Quality is total_good / total_actual." | Plant_C: Perf 81.3%, Qual 98.1%, OEE 79.8%. Plant_A: Perf 69.8%, Qual 97.8%, OEE 68.2%. Plant_B: Perf 62.6%, Qual 98.1%, OEE 61.4%. Quality is uniform; Performance is the differentiator. | +| 4 | Stage 0 — Sensor anomalies | `/rai-querying` | "List machines with above-threshold sensor readings (`SensorReading.is_anomaly == 1`), grouped by facility." | 9 anomaly readings across 5 machines. Plant_A: 7 (M013 Pump:3, M001 Turbine:2, M016 Turbine:2). Plant_B: 1 (M002 Compressor). Plant_C: 1 (M006 Turbine). Plant_A's anomaly load is 7× Plant_B's despite Plant_A's higher OEE. | +| 5 | Stage 0 — Failure trajectories | `/rai-querying` | "For each machine, compute the failure-probability delta from period 1 to period 4 from `FailurePrediction`. Show the steepest 6." | M001 (Turbine, Plant_A): 0.102 → 0.332 (+0.230, bearing_wear). M013 (Pump, Plant_A): 0.435 → 0.663 (+0.228, impeller_erosion). M016 (Turbine, Plant_A): 0.263 → 0.482 (+0.219, bearing_wear). All three steepest curves are Plant_A. Stored back as `MachinePeriod.predicted_fp` (120 rows) for Stage 3's objective. | +| 6 | Stage 1 — Dependency graph | `/rai-graph-analysis` | "Build a graph with `Machine` as `node_concept`. Two machines are adjacent when at least one technician is qualified for both machine types. Run weakly connected components." | 30 nodes, edges joined via `Qualification`. WCC: 1 cluster of 30 — every machine is reachable from every other through shared qualifications. No isolated subgraphs. | +| 7 | Stage 1 — Bottleneck centrality | `/rai-graph-analysis` | "Compute betweenness centrality on the dependency graph. Normalize and store as `Machine.betweenness`." | Pump-type machines tie at the top (raw betweenness 24.0 → normalized 1.0): M003 (Plant_C), M008 (Plant_B), M013 (Plant_A). Turbines, Generators, Motors, Compressors lower. 
`Machine.betweenness` written back for all 30 machines and consumed by Stage 3's failure-cost multiplier. |
+| 8 | Stage 2 — Compliance flags | `/rai-rules-authoring` | "Define six derived flags: overdue (`remaining_useful_life < maintenance_duration_hours`), high-risk (`failure_probability > 0.3 AND criticality >= 4`), anomalous (`anomaly_count > 0`), chronic-downtime (`downtime_event_count > 8`), parts-reorder (`stock_level <= min_order_qty`), expiring-cert (`days_remaining < 30`). Use `model.where(...).define(...)`." | Overdue (6): M002, M006, M013, M016, M022, M025 (RUL below required maintenance hours). High-risk (1): M013 (fp=0.435, crit=4). Anomalous (5): M013, M001, M016, M002, M006. Chronic downtime (3, threshold > 8 events): M001 (12 events), M016 (11), M013 (10). Parts reorder (4): P001, P003, P004, P006. Expiring certs (5): T001 Compressor 22d, T004 Pump 8d, T003 Compressor 15d, T006 Motor 25d, T009 Motor 12d. |
+| 9 | Stage 2 — Composite risk tier | `/rai-rules-authoring` | "Chain `is_chronic_downtime`, `is_high_risk`, `is_overdue_maintenance` into `Machine.risk_tier`: Critical if all three, Elevated if exactly two, Standard otherwise. Enumerate all 8 combinations using `model.not_()` for negation." | Critical (1): M013 (Pump, Plant_A) — chronic + high-risk + overdue. Elevated (1): M016 (Turbine, Plant_A) — chronic + overdue, not high-risk. Standard (28): rest. Plant_A holds the only Critical and the only Elevated machine despite ranking second on OEE. |
+| 10 | Stage 3 — Formulation | `/rai-prescriptive-problem-formulation` | "Formulate the maintenance schedule. Decision variables: `MachinePeriod.x_maintain` (bin), `MachinePeriod.x_vulnerable` (bin), `TechnicianMachinePeriod.x_assigned` (bin) — restricted to qualified pairs.
Constraints: cumulative coverage (`Σ x_maintain[m,1..τ] + x_vulnerable[m,τ] = 1` per machine and period), assignment-maintenance linkage (`Σ x_assigned over techs = x_maintain` per (m,τ)), technician hour capacity (`Σ x_assigned · duration ≤ available_hours`), parts/bay capacity (`≤ 5 jobs per period`), and overdue deadline (`Σ x_maintain[m, τ ≤ 2] ≥ 1` for every overdue machine — feeds from Stage 2). Objective: minimize failure_cost + labor_cost + travel_cost." | 120 `x_maintain` binaries (30 machines × 4 periods). 120 `x_vulnerable` binaries. ~250 `x_assigned` binaries (qualification-restricted). 5 constraint families. Failure cost uses `MachinePeriod.predicted_fp` (Stage 0) × `Machine.criticality` × `(1 + 2.0 × Machine.betweenness)` (Stage 1). |
+| 11 | Stage 3 — Solve | `/rai-prescriptive-solver-management` | "Solve with HiGHS, time limit 120s, assert OPTIMAL." | OPTIMAL. Objective = $605,240.61. 20 maintenance jobs scheduled across 4 periods (capacity-binding at 5 jobs/period). All six overdue machines (including Plant_A's M013 and M016) maintained by period 2 — overdue constraint satisfied. |
+| 12 | Stage 3 — Schedule readout | `/rai-prescriptive-results-interpretation` | "Show the period-by-period schedule and technician assignments. Flag any travel (`base_location != machine.location`)." | Period 1 includes M002 (Plant_B), M006 (Plant_C), M013 (Plant_A), M016 (Plant_A) — high-priority overdue/critical machines. Periods 2–4 cover the remaining 16 jobs. Multiple Turbine assignments require travel because all 3 Turbine-qualified techs (T001, T002, T003) are based in Houston_TX while Turbines exist at all 3 plants. Travel cost is paid at $50/hr × duration. |
+| 13 | Stage 4 — Concentration analysis | `/rai-graph-analysis`, `/rai-querying` | "From the qualification table, find machine types whose qualified technicians are all in one location. For each concentrated type, count how many scheduled jobs require travel."
| Compressor: techs in Chicago_IL, Houston_TX (gap: Phoenix_AZ). Generator: Chicago_IL, Phoenix_AZ (gap: Houston_TX). Motor: Chicago_IL, Phoenix_AZ (gap: Houston_TX). Pump: Chicago_IL, Phoenix_AZ (gap: Houston_TX). **Turbine: all 3 techs in Houston_TX — CONCENTRATED.** Of 3 scheduled Turbine jobs, 2 require travel (67%). 4 of 6 Turbines are at remote plants. The optimizer found the cheapest plan but cannot fix the structural fragility — losing T001's Compressor cert (22 days remaining) doesn't break Turbines, but losing any of T001/T002/T003 from Houston shrinks Turbine coverage by a third. | +| 14 | Stage 4 — Cross-training recommendation | `/rai-prescriptive-results-interpretation` | "From `training_options.csv`, find the cheapest Turbine-cross-training candidate based outside Houston_TX." | Best candidate: **T006 (Fiona_Garcia, Senior, Chicago_IL) — $3,200 / 5 weeks.** Other non-Houston options: T005 ($3,500/6w, Chicago), T008 ($3,800/6w, Phoenix), T009 ($4,200/8w, Phoenix), T004 ($5,500/10w, Chicago). Training T006 adds the first non-Houston Turbine tech, eliminates the single-point-of-failure for Plant_B and Plant_C Turbines, and pays back the first time travel or a cert lapse would have idled a Turbine job. The prescriptive reasoner produced the schedule; the resilience layer produced the structural action item. 
| + +--- + +## Stage outputs — what each reasoner contributes back + +``` +ONTOLOGY ENRICHMENT — what each stage wrote back + + Stage 0 (querying) Machine.total_planned_qty [30] + Machine.total_actual_qty [30] + Machine.total_good_qty [30] + Machine.performance_ratio [30] + Machine.quality_ratio [30] + Machine.total_downtime_minutes [30] + Machine.downtime_event_count [30] + Machine.anomaly_count [30] + MachinePeriod.predicted_fp [120] + + Stage 1 (graph) Machine.betweenness_raw [30] + Machine.betweenness (normalized) [30] + + Stage 2 (rules) Machine.is_overdue_maintenance [6] + Machine.is_high_risk [1] + Machine.is_anomalous [5] + Machine.is_chronic_downtime [3] + Machine.risk_tier [30] + PartsInventory.needs_reorder [4] + CertificationExpiry.is_expiring [5] + + Stage 3 (prescriptive) MachinePeriod.x_maintain [120 binary] + MachinePeriod.x_vulnerable [120 binary] + TechnicianMachinePeriod.x_assigned + + Stage 4 (analysis) (terminal — prints concentration risk and + costed cross-training recommendation) + + ────────────────────────────────────────────────────────────────── + Each stage reads what the previous stage wrote. + Re-running any downstream stage automatically picks up enrichments. + No glue code, no DataFrame round-trip — same ontology throughout. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## The chain — accretive ontology enrichment + +``` +THE MACHINE-MAINTENANCE CHAIN + + STAGE 0 QUERYING + "Where does the operation actually hurt?" + reads: ProductionRun, SensorReading, FailurePrediction + writes: Machine.performance_ratio / quality_ratio + Machine.anomaly_count + Machine.downtime_event_count / total_downtime_minutes + MachinePeriod.predicted_fp + │ + ▼ + STAGE 1 GRAPH (betweenness centrality) + "Which machines are scheduling bottlenecks?" + reads: Qualification, Machine (as node_concept) + writes: Machine.betweenness ── normalized 0..1 + │ + ▼ + STAGE 2 RULES + "Which machines violate which compliance flags?" 
+ reads: Machine.failure_probability, Machine.criticality, + Machine.remaining_useful_life, Machine.maintenance_duration_hours, + Machine.anomaly_count, Machine.downtime_event_count, + PartsInventory.stock_level, CertificationExpiry.days_remaining + writes: Machine.is_overdue_maintenance / is_high_risk / + is_chronic_downtime / is_anomalous + Machine.risk_tier ── Critical / Elevated / Standard + PartsInventory.needs_reorder + CertificationExpiry.is_expiring + │ + ▼ + STAGE 3 PRESCRIPTIVE (HiGHS MIP) + "What's the optimal maintain-and-assign plan?" + reads: MachinePeriod.predicted_fp ──► failure cost (period-specific) + Machine.betweenness ──► failure cost multiplier + Machine.is_overdue_maintenance ──► hard deadline constraint + Qualification (assignment scope), TechnicianPeriod.capacity_hours + writes: MachinePeriod.x_maintain ── 20 jobs flagged + MachinePeriod.x_vulnerable + TechnicianMachinePeriod.x_assigned + │ + ▼ + STAGE 4 RESILIENCE + "Where is the schedule structurally fragile?" + reads: Solution variables, Qualification, Technician, TrainingOption + writes: (terminal — concentration risk + cross-training recommendation) + + ────────────────────────────────────────────────────────────────── + No glue. No DataFrame ping-pong. No re-derivation per-reasoner. + Five reasoners, one ontology, one accretive thread. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## Why the chain matters (vs. 
any single stage) + +| Stage alone | What it tells you | What it doesn't | +|---|---|---| +| Querying | "Plant_B has worst OEE; Plant_A has the most anomalies" | Whether anomalies translate to scheduling priority | +| Graph alone | "Pumps are the most central machine type" | Which Pump matters most or what to do | +| Rules alone | "M013 is Critical-tier" | How to fit M013 plus 19 others into a 4-period horizon with 10 techs | +| Prescriptive alone | (won't run — no per-period failure curve, no centrality weight, no overdue flag) | Whole pipeline misses | +| Resilience alone | (won't run — no schedule to analyze) | — | + +| Combined | Output | +|---|---| +| Querying → Graph | Per-machine signals plus structural bottleneck weight | +| + Rules | Composite risk tier + a hard "must maintain by period 2" constraint | +| + Prescriptive | $605K plan, 20 jobs, Plant_A's Critical machine handled in Period 1 | +| + Resilience | Cross-train T006 for $3,200 → eliminates Houston Turbine concentration | + +**Multi-reasoner chaining grounded in (and contributing to) the ontology.** + +--- + +## Optional extension — operator-shift assignment + +A second optimization pass — assigning operators to line-shifts to maximize a skill-match bonus, subject to a same-facility constraint — is a natural follow-on but not part of this template's main script. It would consume the same `Machine` and facility ontology and add `Operator`, `Shift`, and `OperatorShift` concepts. Out of scope for this runbook. + +--- + +## Data Reference + +- **Source data**: bundled CSVs in `../data/` (30 machines across 3 plants × 5 machine types, 10 technicians across 3 cities, 16 qualifications, 4 planning periods, 60 sensors with 240 readings, 120 per-period failure predictions, 129 downtime events, 120 production runs, 13 cross-training options). 
To run against your own Snowflake schema instead, swap the `read_csv(...)` loads for `model.Table(...)` references in `machine_maintenance.py`; the rest of the pipeline is unchanged. +- **Stages**: implemented in `../machine_maintenance.py` as a single combined script with stage banners (Stage 0 → Stage 4). +- **Ontology**: 14 user-facing concepts. Run `inspect.schema(model)` after the pipeline (see template README) to dump the full concept/property/relationship surface, filtering out reasoner-owned concepts (`Variable`, `Constraint`, etc.) and the auto-generated `graph_Edge` from Stage 1. diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md new file mode 100644 index 0000000..0fac804 --- /dev/null +++ b/v1/supply_chain_resilience/references/runbook.md @@ -0,0 +1,359 @@ +# Runbook: Supply Chain Resilience — Multi-Reasoner Walkthrough + +Walk-through of the chained-reasoner pattern this template is built on. One realistic business thread — **risk-adjusted network flow with disruption scenarios** — traced across four RAI reasoning stages, each writing properties back to the same ontology that downstream stages consume. + +The template's combined script (`supply_chain_resilience.py`) implements all four stages directly. This runbook expands the surrounding narrative — what each stage finds, why the next stage needs it, what the optimizer does with the enrichment — so a non-OR reader can follow the full reasoning thread end-to-end. + +--- + +## TL;DR — the chain in one screen + +``` +Two HIGH-priority customers depend on 6 upstream suppliers — one of which +(PowerCell, B003) is flagged "watch" by rules. The chain produces a +$1,865 baseline plan, then quantifies disruption: top hub offline = +88.5%, +watch->avoid downgrade = +0.0% (optimizer already routed around it). 
+ + ───────────────────────────────────────────────────────────────── + STAGE 0 Reachability ──► Business.is_high_priority_customer (2) + Upstream supplier dependency map for + each HIGH-priority customer (B008, B009). + ───────────────────────────────────────────────────────────────── + STAGE 1 Graph ──► Site.centrality (normalized) + Top hubs: S004 TechAssembly 1.000, + S006 West Coast DC 0.776, S003 PowerCell 0.735. + 2 weakly-connected components. + ───────────────────────────────────────────────────────────────── + STAGE 2 Rules ──► Business.is_unreliable (1) + Business.has_high_delay_risk (2) + Business.is_watch_level (2) + Demand.is_escalated (9) + [X] B017 avoid · [!] B003 watch + 37 of 262 shipments late (14%). + ───────────────────────────────────────────────────────────────── + STAGE 3 Prescriptive ──► Operation.x_flow / Demand.x_unmet + OPTIMAL · $1,865 · 8 active flows · 0 unmet + + 2 scenario re-solves (S004 offline, watch->avoid) + ───────────────────────────────────────────────────────────────── +``` + +A single-reasoner approach can't answer this. Reachability alone names the suppliers in scope but doesn't rank them. Graph alone ranks hubs but doesn't decide flow. Rules alone classifies suppliers but doesn't route around them. Prescriptive alone has no way to hard-block bad suppliers, surcharge watch suppliers, or penalize bottleneck hubs without those upstream signals. + +--- + +## Setup + +See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `supply_chain_resilience.py` against the bundled CSVs in `../data/`. + +--- + +## Stage 0 — Reachability: blast-radius pre-analysis + + +**Construction** — directed `Business` graph, edges from `Business.ships_to` (derived from `Shipment.supplier` -> `Shipment.customer`). + +**Targets** — `Business.is_high_priority_customer` is set wherever a `Demand` with `priority == "HIGH"` is placed by that business. 
From the bundled data, this fires for **2 buyers**: B008 MegaCorp Enterprise and B009 TechGiant Inc (9 HIGH-priority demands between them, all for ProPhone X1 / ProTab T1). + +**Algorithm** — `biz_graph.reachable(to=target_customer)` filtered to nodes with `business_type == "SUPPLIER"`. + +``` +Upstream supplier dependencies (HIGH-priority customers) + + MegaCorp Enterprise (B008) depends on 6 suppliers: + - CellChem China (reliability 78%) ← will become AVOID + - EuroCell Poland (reliability 88%) + - EuroChip Germany (reliability 94%) + - EuroDisplay Czech (reliability 91%) + - GlassCorp Korea (reliability 89%) + - WaferTech Taiwan (reliability 97%) + + TechGiant Inc (B009) depends on 6 suppliers: (same set) + + ────────────────────────────────────────────────────────────────── + Both HIGH-priority customers share the same 6-supplier upstream. + CellChem (B017) sits in BOTH dependency cones — Stage 2 will flag + it AVOID, and Stage 3's baseline already excludes it. + ────────────────────────────────────────────────────────────────── + +✓ Business.is_high_priority_customer written back (2 buyers) +``` + +The point of running reachability before the MILP: when the scenario in Stage 3 downgrades watch suppliers to avoid, the cost delta has to be read against this dependency map. If a downgraded supplier sits in zero customer cones, the optimizer just shrugs and the cost stays flat — which is exactly what happens here for the watch-tier supplier B003. + +--- + +## Stage 1 — Graph: site centrality + connected components + + +**Construction:** +- Node concept: `Site` (31 sites) +- Edges: built from `Operation` rows where `op_type == "SHIP"` (undirected, unweighted) +- Aggregator: `"sum"` (collapse parallel ship lanes between the same pair) + +**Algorithms:** `weakly_connected_component()` for cluster discovery, then `eigenvector_centrality()` for hub importance. 
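To make the centrality call concrete, here is a plain-Python sketch, not the RAI graph reasoner's implementation: the lane pairs and the damped (A + I) power iteration are illustrative assumptions, and only the max-normalization to [0, 1] mirrors what Stage 1 writes back as `Site.centrality`.

```python
from collections import defaultdict

def eigenvector_centrality(edges, iters=100):
    """Damped power iteration (A + I) on an undirected graph of (u, v) pairs.
    Scores are max-normalized to [0, 1], like Site.centrality in Stage 1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = {n: 1.0 for n in adj}
    for _ in range(iters):
        # (A + I) update: converges even on bipartite structures
        nxt = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        top = max(nxt.values())
        score = {n: s / top for n, s in nxt.items()}
    return score

# Toy hub-and-spoke echoing the template's IDs: three factories feed S004,
# which feeds the S006 DC. Edges are invented for illustration.
lanes = [("S001", "S004"), ("S002", "S004"), ("S003", "S004"), ("S004", "S006")]
scores = eigenvector_centrality(lanes)
# The convergence point S004 max-normalizes to 1.0; every degree-1
# neighbor of the hub settles at 0.5.
```

`weakly_connected_component()` is then just connected components on the same adjacency (a BFS from each unvisited node), which is how the 25-site and 6-site clusters below fall out.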
+ +``` +Connected components: 2 + + Component A: 25 sites (APAC + AMERICAS, joined by S004->S006 long-haul) + Component B: 6 sites (EMEA distribution loop) + +Top critical sites — eigenvector centrality (FACTORY/DC only) + + S004 TechAssembly Factory (FACTORY, APAC) 0.5016 ████████████ ★ central hub + S006 West Coast DC (DC, AMERICAS) 0.3895 █████████ + S003 PowerCell Facility (FACTORY, APAC) 0.3688 █████████ ⚠ also Stage 2 watch + S002 DisplayCorp Plant (FACTORY, APAC) 0.3145 ████████ + S001 ChipTech Factory (FACTORY, APAC) 0.3145 ████████ + S012 SiliconWorks Factory (FACTORY, APAC) 0.2456 ██████ + S013 ScreenTech Plant (FACTORY, APAC) 0.2456 ██████ + S014 EnergyPlus Facility (FACTORY, APAC) 0.2280 ██████ + + ────────────────────────────────────────────────────────────────── + S004 TechAssembly is the convergence point: every APAC component + factory ships into it before finished goods radiate to DCs. + S003 PowerCell shows up here AND in Stage 2 — structural and + behavioural risk overlap on the same supplier. + ────────────────────────────────────────────────────────────────── + +✓ Site.centrality written back, normalized to [0, 1] + (S004 = 1.000, S006 = 0.776, S003 = 0.735, ...) 
+``` + +--- + +## Stage 2 — Rules: supplier risk classification + + +**Late-shipment context** (computed in pandas, not RAI): + +``` +Late shipments: 37 of 262 (14%) + + B006 West Coast DC 7 late ████████ + B007 East Coast DC 5 late ██████ + B004 TechAssembly 4 late █████ + B022 EMEA DC Central 3 late ████ + B003 PowerCell 2 late ███ ← also high predicted Q1 + B017 CellChem 2 late ███ ← also low reliability +``` + +**Properties added to the ontology** (via `model.where(...).define(...)`): + +```python +# Rule 1 — reliability gate +m.where(Business.reliability_score < 0.80).define(Business.is_unreliable()) + +# Rule 2 — ML delay-risk gate (Q1-2025 GNN predictions) +m.where( + DelayPrediction.supplier_business(Business), + DelayPrediction.fiscal_quarter == "Q1-2025", + DelayPrediction.predicted_delay_prob > 0.15, +).define(Business.has_high_delay_risk()) + +# Rule 3 — union into watch level (rule chaining) +m.where(Business.is_unreliable()).define(Business.is_watch_level()) +m.where(Business.has_high_delay_risk()).define(Business.is_watch_level()) + +# Rule 4 — escalate HIGH-priority demands +m.where(Demand.priority == "HIGH").define(Demand.is_escalated()) +``` + +**Resulting classification** (Python combines the two RAI flags into avoid/watch/reliable): + +``` +Supplier risk classification + + [X] B017 CellChem China reliability 0.78 Q1 delay 0.22 ── AVOID + [!] 
B003 PowerCell Ltd reliability 0.81 Q1 delay 0.28 ── WATCH + [ ] B005 GlobalBuild Inc reliability 0.85 ── reliable + [ ] B014 EnergyPlus India reliability 0.85 ── reliable + [ ] B024 EuroAssembly Pol reliability 0.87 ── reliable + [ ] B020 EuroCell Poland reliability 0.88 ── reliable + [ ] B002 DisplayCorp reliability 0.88 ── reliable + [ ] B016 GlassCorp Korea reliability 0.89 ── reliable + [ ] B004 TechAssembly Co reliability 0.90 ── reliable + [ ] B019 EuroDisplay Czech reliability 0.91 ── reliable + [ ] B012 SiliconWorks Corp reliability 0.91 ── reliable + [ ] B013 ScreenTech Japan reliability 0.93 ── reliable + [ ] B018 EuroChip Germany reliability 0.94 ── reliable + [ ] B001 ChipTech Industries reliability 0.95 ── reliable + [ ] B015 WaferTech Taiwan reliability 0.97 ── reliable + + AVOID = is_unreliable AND has_high_delay_risk (both flags) + WATCH = is_unreliable OR has_high_delay_risk (one flag) + +Escalated demands (HIGH priority): 9 ── all from B008 / B009 + +✓ Business.is_unreliable [1 supplier: B017] +✓ Business.has_high_delay_risk [2 suppliers: B003, B017] +✓ Business.is_watch_level [2 suppliers: B003, B017] +✓ Demand.is_escalated [9 demands] +``` + +Stage 3 reads `is_watch_level` for the surcharge term and `is_unreliable AND has_high_delay_risk` (collapsed to the `avoid` set) for the hard block. + +--- + +## Stage 3 — Prescriptive: risk-adjusted minimum-cost flow + + +``` +FORMULATION + + Decision variables + Operation.x_flow (continuous, 70 ops, 0 ≤ x ≤ capacity_per_day) + Demand.x_unmet (continuous slack, 20 demands, ≥ 0) + + Constraints + 1. Demand satisfaction + Σ x_flow into customer-site for the demanded SKU + x_unmet ≥ quantity + 2. Avoid suppliers blocked + For every operation sourced from B017 (CellChem): x_flow == 0 + 3. 
(Scenario only) Site offline / extra blocks
+
+  Objective (minimize)
+    Σ Operation.cost_per_unit · x_flow                  ── transport
+    + RISK_SURCHARGE · Σ x_flow on watch-supplier ops   ── Stage 2
+    + CENTRALITY_WEIGHT · Σ x_flow · Site.centrality    ── Stage 1
+    + UNMET_PENALTY · Σ x_unmet                         ── slack
+
+  Tunables: UNMET_PENALTY=100, RISK_SURCHARGE=5, CENTRALITY_WEIGHT=2,
+            DELAY_PROB_THRESHOLD=0.15, RELIABILITY_THRESHOLD=0.80,
+            PREDICTION_QUARTER="Q1-2025"
+
+──────────────────────────────────────────────────────────────────────
+SOLVE (HiGHS)  →  OPTIMAL   8 active flows   $1,865.00   0 unmet
+──────────────────────────────────────────────────────────────────────
+
+✓ Operation.x_flow / Demand.x_unmet written back as model properties.
+```
+
+What the baseline buys: enough finished-goods flow on the shortest cost-weighted lanes to cover all 20 demand orders in full. CellChem's operations (B017) are hard-blocked. PowerCell (B003) operations carry a +5/unit surcharge — the optimizer accepts a small amount of B003-sourced flow only when no cheaper non-watch alternative exists.
+
+### Reading the solve
+
+- **8 active flows from 70 candidate operations** — the network is sparse at optimum; most capacity is idle.
+- **$1,865 total cost** — pure transport without the risk terms would run ~$1,500; the centrality and watch-surcharge terms add the remaining ~$365.
+- **Zero unmet demand** — capacity is plentiful, so the slack term is inactive at baseline.
+
+---
+
+## Scenario analysis — quantify disruption
+
+
+The same `solve_flow(...)` function re-runs with modified constraints.
Two scenarios surface different aspects of the chain's value: + +``` +SCENARIO COMPARISON + + Scenario Status Cost Δ vs baseline Unmet + ──────────────────────── ──────── ────────── ────────────── ───── + Baseline OPTIMAL $1,865.00 — 0 + Site S004 offline OPTIMAL $3,515.00 +88.5% 0 + Watch->Avoid OPTIMAL $1,865.00 +0.0% 0 +``` + +**Scenario A — top-centrality site offline (S004 TechAssembly).** The optimizer reroutes finished goods through S005 GlobalBuild Plant (Mexico) and longer EMEA lanes; cost jumps 88.5% but all demand is still covered. This is the **structural-risk** signal: losing the highest-centrality node forces expensive secondary routing. + +**Scenario B — downgrade all watch suppliers to avoid.** Adds B003 PowerCell to the hard-block set. **Cost is unchanged.** Why? B003 wasn't on any optimal lane — the centrality penalty + risk surcharge already discouraged the optimizer from routing through it at baseline. **This asymmetry is the punchline:** structural risk (Stage 1) costs 88.5% to disrupt; behavioural risk on already-deprioritized suppliers (Stage 2) costs 0% to harden against. The chain reveals which mitigations actually move the needle. + +(Cross-check the Stage 0 dependency map: B003 wasn't in either HIGH-priority customer's *direct* upstream — it ships components to manufacturers, who then route via the optimizer's preferred S004/S005 corridor. Reachability surfaced the supplier; centrality + objective weights ensured baseline never relied on it.) + +--- + +## The chain — accretive ontology enrichment + +``` +THE SUPPLY-CHAIN-RESILIENCE CHAIN + + STAGE 0 REACHABILITY (directed Business graph) + "Which suppliers do my high-priority customers transitively depend on?" 
+ reads: Shipment.supplier / .customer ──► Business.ships_to (derived) + Demand.priority == "HIGH" ──► Business.is_high_priority_customer + writes: Business.is_high_priority_customer ── 2 buyers + │ + ▼ + STAGE 1 GRAPH (eigenvector centrality, WCC) + "Which sites are network bottlenecks?" + reads: Operation (op_type == "SHIP"), Site + writes: Site.centrality ── normalized [0,1] per site + │ + ▼ + STAGE 2 RULES (chained derivations) + "Which suppliers are risky, and which demands are escalated?" + reads: Business.reliability_score, DelayPrediction, Demand.priority + writes: Business.is_unreliable ── 1 supplier + Business.has_high_delay_risk ── 2 suppliers + Business.is_watch_level ── 2 suppliers + Demand.is_escalated ── 9 demands + │ + ▼ + STAGE 3 PRESCRIPTIVE (HiGHS LP) + "What's the minimum-cost flow plan that respects all of the above?" + reads: Site.centrality ──► objective coefficient (penalty) + Business.is_watch_level ──► objective coefficient (surcharge) + {avoid suppliers} ──► hard block (x_flow == 0) + Operation cost / capacity / SKU, Demand quantity / SKU / business + writes: Operation.x_flow ── 8 active flows + Demand.x_unmet ── 0 across all 20 demands + │ + ▼ + Re-solve per scenario (S004 offline, watch->avoid) + → cost-of-disruption table + + ────────────────────────────────────────────────────────────────── + No glue. No DataFrame ping-pong. No re-derivation per-reasoner. + Four stages, one ontology, one accretive thread. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## Why the chain matters (vs. 
any single stage) + +| Stage alone | What it tells you | What it doesn't | +|---|---|---| +| Reachability alone | "These 6 suppliers feed my critical customers" | Which are risky; which the optimizer would have used anyway | +| Graph alone | "S004 is the central hub" | Whether losing it is recoverable; at what cost | +| Rules alone | "B017 avoid, B003 watch" | Whether routing actually depends on them | +| Prescriptive alone | (degenerate — no risk filter, no bottleneck weight) | Picks cheapest lanes regardless of supplier risk or hub fragility | + +| Combined | Output | +|---|---| +| Reachability -> Graph | Customer-supplier dependency map + structural hub ranking | +| + Rules | Per-supplier risk class (avoid / watch / reliable) on top of the dependency map | +| + Prescriptive | Risk-adjusted min-cost flow ($1,865) + scenario deltas (+88.5% on S004 offline, +0% on watch->avoid) | + +**Multi-reasoner chaining grounded in (and contributing to) the ontology.** + +--- + +## Optional extension — predictive forecasting + +The bundled `delay_prediction.csv` is treated as already-trained ML output (a quarterly per-supplier delay probability table, `model_version='gnn_v2.0'`). A natural extension to the template is to replace the static CSV with a live GNN that retrains on `Shipment.delay_days` history. Skill: `/rai-predictive-modeling` + `/rai-predictive-training`. Reference templates with end-to-end GNN training: `templates/v1/fraud-detection`, `templates/v1/retail_planning`. The downstream stages (rules + prescriptive) wouldn't change — they read `DelayPrediction.predicted_delay_prob` either way. + +--- + +## Agent prompt sequence — recreate this template skill-by-skill + +Each row is a single agent prompt. Skills are loaded in order; each writes properties the next stage reads. 
+ +| # | Skill | Prompt | What it produces | +|---|-------|--------|------------------| +| 1 | `/rai-build-starter-ontology` | "Build a starter ontology for a supply-chain dataset with 7 CSVs: site, business, operation, sku, demand, shipment, delay_prediction. Use Site/Business/Operation/SKU/Demand/Shipment/DelayPrediction as concepts. Render the result as an ASCII concept-relationship diagram." | Concepts, properties, relationships matching `supply_chain_resilience.py` lines 57–250 | +| 2 | `/rai-discovery` | "Given this ontology, what questions can each reasoner family answer? Group by graph / rules / prescriptive." | A reasoner-routing plan that covers Stages 0–3 below | +| 3 | `/rai-graph-analysis` | "Build a directed Business graph from Shipment.supplier->Shipment.customer. Run upstream reachability from every Business with a HIGH-priority demand. List the suppliers each high-priority customer transitively depends on." | Stage 0 — `Business.is_high_priority_customer`, blast-radius dependency map | +| 4 | `/rai-graph-analysis` | "Build an undirected Site graph from Operation rows where op_type == 'SHIP'. Compute weakly-connected components, then eigenvector centrality (filter to FACTORY/DC). Normalize and write the score back as Site.centrality." | Stage 1 — `Site.centrality` | +| 5 | `/rai-rules-authoring` | "Define three derived relationships on Business: is_unreliable (reliability_score < 0.80), has_high_delay_risk (any DelayPrediction for Q1-2025 with predicted_delay_prob > 0.15), and is_watch_level (union of the two). Also flag Demand.is_escalated for HIGH priority. Print the avoid (both flags) / watch (one flag) / reliable classification." | Stage 2 — risk flags, escalation flag | +| 6 | `/rai-prescriptive-problem-formulation` | "Formulate a minimum-cost network flow on Operation.x_flow with Demand.x_unmet slack. Constraint: inbound flow at customer site for demanded SKU + slack >= quantity. Hard-block operations sourced from avoid-tier businesses. 
Objective: transport cost + RISK_SURCHARGE * watch-supplier flow + CENTRALITY_WEIGHT * Σ flow · Site.centrality + UNMET_PENALTY * Σ unmet. Describe the formulation before solving — decision variables, constraints, objective, tunables." | Stage 3 formulation | +| 7 | `/rai-prescriptive-solver-management` | "Solve with HiGHS, time limit 120s. Report status, objective value, count of active flows, and total unmet demand." | Baseline solve — OPTIMAL, $1,865, 8 flows, 0 unmet | +| 8 | `/rai-prescriptive-results-interpretation` | "Re-solve two scenarios side-by-side: (a) top-centrality site offline (S004 TechAssembly), (b) all watch-level suppliers downgraded to avoid. Show cost delta vs baseline as a table. Explain why the deltas are asymmetric." | Scenario table + interpretation (S004 offline = +88.5%, watch->avoid = +0%) | + +--- + +## Data Reference + +- **Source data**: bundled CSVs in `../data/` — 31 sites across APAC / AMERICAS / EMEA, 31 businesses (6 suppliers, 6 component manufacturers, 2 manufacturers, 8 warehouses, 9 buyers), 9 SKUs (raw materials -> components -> finished goods ProPhone X1 / ProTab T1), 70 operations (SHIP + TRANSFER), 20 demand orders (9 HIGH, 5 MEDIUM, 6 LOW), 262 historical shipments (37 late), 36 quarterly delay predictions (4 quarters × 9 suppliers). To run against your own Snowflake schema instead, swap `read_csv(...)` for typed `model.Table(...)` loads against the equivalent table set. +- **Ontology**: defined inline in `../supply_chain_resilience.py` (lines 57–250) — 7 concepts plus the derived `Business.ships_to` and `Operation.source_business` relationships. +- **Stages**: implemented in `../supply_chain_resilience.py` as a single combined script with stage banners (`STAGE 0` through `STAGE 3` plus `SCENARIO ANALYSIS`). 
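For intuition on how the Stage 3 objective trades off its terms, here is a deliberately tiny plain-Python sketch. It is not the template's HiGHS model: with a single demand and a single commodity, the LP collapses to filling lanes in order of effective unit cost, and the lane dicts below are invented for illustration; only the tunables (RISK_SURCHARGE=5, CENTRALITY_WEIGHT=2, UNMET_PENALTY=100) come from the formulation.

```python
def plan_flow(lanes, demand, risk_surcharge=5, centrality_weight=2, unmet_penalty=100):
    """Exact solution of the single-demand, single-commodity relaxation:
    greedily fill the cheapest admissible lanes by effective unit cost."""
    def unit_cost(lane):
        # transport + watch surcharge + centrality penalty, as in the objective
        return (lane["cost"]
                + (risk_surcharge if lane["watch"] else 0)
                + centrality_weight * lane["centrality"])

    flows, cost, remaining = {}, 0.0, demand
    for lane in sorted((l for l in lanes if not l["avoid"]), key=unit_cost):
        take = min(remaining, lane["capacity"])
        if take > 0:
            flows[lane["id"]] = take
            cost += take * unit_cost(lane)
            remaining -= take
    return flows, cost + remaining * unmet_penalty, remaining

# Invented lanes echoing the narrative: B017 hard-blocked, B003 surcharged.
lanes = [
    {"id": "via_B017", "cost": 1.0, "capacity": 20, "watch": False, "avoid": True,  "centrality": 0.70},
    {"id": "via_B003", "cost": 2.0, "capacity": 20, "watch": True,  "avoid": False, "centrality": 0.735},
    {"id": "via_S005", "cost": 4.0, "capacity": 20, "watch": False, "avoid": False, "centrality": 0.30},
]
flows, cost, unmet = plan_flow(lanes, demand=10)
# B003's surcharge (2 + 5 + 2*0.735 = 8.47/unit) makes the nominally
# pricier S005 lane (4 + 2*0.30 = 4.60/unit) the optimal route.
```

The same mechanics explain the watch->avoid scenario's +0.0% delta: when the surcharge already prices a lane out of the optimum, hard-blocking it changes nothing.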
diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md new file mode 100644 index 0000000..d1b3003 --- /dev/null +++ b/v1/telco_network_recovery/references/runbook.md @@ -0,0 +1,449 @@ +# Runbook: Telco WEST Recovery — Multi-Reasoner Walkthrough + +Walk-through of the chained-reasoner pattern this template is built on. One realistic business thread — **WEST region recovery** — traced across all five RAI reasoner families, each stage writing properties back to the same ontology that downstream stages consume. + +The template's combined script (`telco_network_recovery.py`) implements stages 2–5 directly; this runbook expands the surrounding narrative including the descriptive Stage 1 diagnosis and the Stage 6 interpretation, so a non-OR reader can follow the full reasoning thread end-to-end. + +--- + +## TL;DR — the chain in one screen + +``` +WEST is bleeding $791K/quarter from a network operations crisis. +The chain produces a $5M plan that recovers 122 Gbps capacity +across all 15 critical towers, prioritized by social blast radius. + + ───────────────────────────────────────────────────────────────── + STAGE 1 Descriptive ──► WEST: Q3-Q4 revenue −22% to −26%, + avail 94.6 vs 99.5, 15 of 81 DEGRADED. + Retention angle? No — 0 high-risk + subs; this is operational. + ───────────────────────────────────────────────────────────────── + STAGE 2 Rules ──► CellTower.is_critical_restore (15) + 4 derived health metrics + a compound + flag: WEST + DEGRADED + health < 0.85. + ───────────────────────────────────────────────────────────────── + STAGE 3 Graph ──► Subscriber.influence_score (PageRank) + CellTower.weighted_impact (15) + 404 distinct subs (33% of base) route + calls through a critical tower. 
+ ───────────────────────────────────────────────────────────────── + STAGE 4 Predictive ──► CellTower.projected_demand_growth (15) + (GNN) WEST: 0.993× ── shrinking ~0.7%/yr + while 8 other regions sit at +0.59 to +0.75%/day. + ───────────────────────────────────────────────────────────────── + STAGE 5 Prescriptive ──► TowerUpgradeOption.selected (15) + OPTIMAL · 12 GOLD · 2 SILVER · 1 BRONZE + $4.96M of $5M (binding) · 122 Gbps + 164 of 200 install-weeks (slack) + ───────────────────────────────────────────────────────────────── +``` + +A single-reasoner approach can't answer this. Descriptive alone tells the story but no plan. Rules alone identifies broken towers but not which matter most. Graph alone ranks subscribers but doesn't decide. Predictive alone forecasts but doesn't act. Prescriptive alone has no way to weigh "important" without rules + graph upstream. + +--- + +## Setup + +See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `telco_network_recovery.py`. + +--- + +## Stage 1 — Descriptive: diagnose WEST + + +``` +Q4 2024 — Daily KPIs by region + + avg avail % avg churn rate avg daily revenue + ────────────── ──────────────── ────────────────── + WEST 94.60 ███ 0.0256 █████████ $72,558 ███████ + CENTRAL 99.52 ████████ 0.0046 █ $101,517 ██████████ + SOUTH 99.53 ████████ 0.0049 █ $102,030 ██████████ + EAST 99.55 ████████ 0.0049 █ $103,840 ██████████ + NORTH 99.56 ████████ 0.0050 █ $103,810 ██████████ + NORTHEAST 99.56 ████████ 0.0050 █ $99,569 ██████████ + SOUTHEAST 99.58 ████████ 0.0050 █ $100,429 ██████████ + NORTHWEST 99.54 ████████ 0.0049 █ $100,995 ██████████ + SOUTHWEST 99.55 ████████ 0.0049 █ $101,716 ██████████ + ▲ + WEST is 5× worse on availability AND churn, │ + ~30% lower on daily revenue. 
│ + │ + Q3-Q4 monthly revenue gaps stack to $791K: ──────┘ + + Sep 2024 forecast $1.40M → actual $1.04M ── −25.9% (−$362K) + Oct 2024 forecast $0.94M → actual $0.70M ── −25.1% (−$236K) + Nov 2024 forecast $0.88M → actual $0.68M ── −22.0% (−$193K) + + WEST tower fleet (81 towers) + + ACTIVE ████████████████████████ 49 + DEGRADED ███████ 15 ← 15 critical_restore + MAINTENANCE ████████ 17 + + Top performance offenders (NetworkPerformance, all WEST DEGRADED): + TWR-0015 ── 8.87% loss 190ms (424 measurements) + TWR-0014 ── 8.75% loss 189ms (273 measurements) + TWR-0010 ── 8.73% loss 188ms (429 measurements) + (… all 15 cluster 8.1-8.9% loss / 185-200ms latency) + ──────────────────────────────────────── + Healthy WEST towers sit at <1% loss / ~30ms latency. + + Retention angle? Zero subs in collections, zero with churn_risk ≥ 0.5. + High-risk subscribers concentrate in SOUTH/NORTH/CENTRAL, not WEST. + This is a NETWORK-side crisis, not subscriber retention. +``` + +Subscriber.churn_risk_score is a static feature that hasn't caught up to WEST's empirical churn (TimeSeriesMetric.churn_rate of 2.6% is 5x other regions). The thread focuses on infrastructure recovery. 
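The per-region rollup behind the KPI table above is a plain group-by-average; here is a stdlib sketch on invented rows shaped like the daily KPI metric (region, availability, churn, revenue), showing how the outlier region surfaces:

```python
from collections import defaultdict
from statistics import mean

# Invented daily KPI rows: (region, availability_pct, churn_rate, daily_revenue)
rows = [
    ("WEST", 94.6, 0.0256, 72_558),
    ("WEST", 94.7, 0.0250, 72_100),
    ("EAST", 99.5, 0.0049, 103_840),
    ("EAST", 99.6, 0.0050, 103_500),
]

# Group rows by region
by_region = defaultdict(list)
for region, avail, churn, revenue in rows:
    by_region[region].append((avail, churn, revenue))

# Average each KPI per region
summary = {
    region: {
        "avg_avail": mean(v[0] for v in vals),
        "avg_churn": mean(v[1] for v in vals),
        "avg_revenue": mean(v[2] for v in vals),
    }
    for region, vals in by_region.items()
}

# The outlier check: rank regions by average churn
worst = max(summary, key=lambda r: summary[r]["avg_churn"])
print(worst, summary[worst])
```

The real stage runs the same aggregation as ontology queries over a full year of rows; the toy version only shows the group-by-mean shape that makes WEST stand out.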
+ +--- + +## Stage 2 — Rules: flag critical_restore towers + + +**Properties added to the ontology** (via `model.define(...)`): +- `CellTower.avg_packet_loss` (Float) — `aggs.avg(NetworkPerformance.packet_loss_pct).per(CellTower)` +- `CellTower.avg_latency_ms` (Float) +- `CellTower.avg_error_rate` (Float) +- `CellTower.avg_health_score` (Float) — across attached equipment via two-hop join + +**Rule** — `CellTower.is_critical_restore` (unary `Relationship`): + +```python +# Branch 1: WEST + DEGRADED + degraded equipment health +m.where( + CellTower.region == "WEST", + CellTower.status == "DEGRADED", + CellTower.avg_health_score < 0.85, # health is 0-1 scale +).define(CellTower.is_critical_restore()) + +# Branch 2: WEST + poor performance + degraded health (catches ACTIVE-but-failing) +m.where( + CellTower.region == "WEST", + CellTower.avg_packet_loss > 5.0, + CellTower.avg_health_score < 0.85, +).define(CellTower.is_critical_restore()) +``` + +``` +RULE FIRES → 15 towers flagged is_critical_restore (all WEST DEGRADED) + + TWR-0010 ── health 0.48 loss 8.73% cap_gbps 18 ███ + TWR-0015 ── health 0.60 loss 8.87% cap_gbps 60 ██████ + TWR-0009 ── health 0.62 loss 8.49% cap_gbps 17 ███ + TWR-0012 ── health 0.63 loss 8.59% cap_gbps 84 █████████ + TWR-0008 ── health 0.64 loss 8.51% cap_gbps 71 ████████ + TWR-0014 ── health 0.66 loss 8.75% cap_gbps 36 █████ + TWR-0002 ── health 0.66 loss 8.56% cap_gbps 17 ███ + TWR-0001 ── health 0.66 loss 8.56% cap_gbps 31 █████ + TWR-0011 ── health 0.66 loss 8.68% cap_gbps 61 ███████ + TWR-0005 ── health 0.68 loss 8.12% cap_gbps 94 ██████████ + TWR-0003 ── health 0.69 loss 8.45% cap_gbps 43 █████ + TWR-0007 ── health 0.72 loss 8.54% cap_gbps 17 ███ + TWR-0013 ── health 0.77 loss 8.62% cap_gbps 94 ██████████ + TWR-0006 ── health 0.78 loss 8.51% cap_gbps 37 █████ + TWR-0004 ── health 0.81 loss 8.54% cap_gbps 64 ███████ + +✓ is_critical_restore written back to CellTower (15 rows) +✓ avg_packet_loss / avg_latency_ms / avg_error_rate / 
avg_health_score + written back to all CellTowers (250 rows) +``` + +Branch 2 didn't fire — none of WEST's ACTIVE towers fall below health 0.85. The 15 flagged are exactly the WEST DEGRADED set, all with packet loss 8.1–8.9% and latency 185–200ms — sharp gap from the rest of WEST. + +--- + +## Stage 3 — Graph: subscriber influence + tower blast radius + + +**Construction** — Pattern 3 (`edge_concept`): +- Node concept: `Subscriber` (1,200 nodes) +- Edge concept: `CallDetailRecord`, with `caller`→`callee` (directed) +- Aggregator: `"sum"` (collapse parallel calls between the same pair) + +**Algorithm:** `pagerank()` (default for directed graphs). + +``` +PageRank — top 10 social influencers (of 1,200 subscribers) + + SUB-CON-00900 CONSUMER $3,793 LTV ████████████ 0.002963 + SUB-CON-00723 CONSUMER $3,049 LTV ████████████ 0.002956 + SUB-CON-00262 CONSUMER $3,764 LTV ███████████ 0.002790 + SUB-CON-00274 CONSUMER $2,850 LTV ███████████ 0.002695 + SUB-ENT-0038 ENTERPRISE $283,233 LTV ██████████ 0.002637 ★ + SUB-CON-00705 CONSUMER $765 LTV ██████████ 0.002599 + SUB-CON-00393 CONSUMER $3,219 LTV ██████████ 0.002581 + SUB-ENT-0001 ENTERPRISE $393,340 LTV ██████████ 0.002575 ★ + SUB-CON-01066 CONSUMER $1,146 LTV ██████████ 0.002570 + SUB-CON-00762 CONSUMER $307 LTV ██████████ 0.002525 + + ★ Top enterprise accounts also rank — heavy inbound call traffic. + PageRank captures structural influence independent of LTV. 
+ +Per-critical-tower blast radius (sorted by weighted_impact) + + TWR-0014 61 subs ████████████ 0.0502 ← largest social footprint + TWR-0008 56 subs ██████████ 0.0430 + TWR-0011 48 subs ██████████ 0.0428 + TWR-0012 50 subs █████████ 0.0394 + TWR-0003 43 subs █████████ 0.0393 + TWR-0013 46 subs █████████ 0.0379 + TWR-0004 46 subs █████████ 0.0378 + TWR-0010 48 subs █████████ 0.0375 + TWR-0015 45 subs ████████ 0.0361 + TWR-0002 46 subs ████████ 0.0331 + TWR-0007 44 subs ████████ 0.0330 + TWR-0005 45 subs ████████ 0.0330 + TWR-0009 44 subs ████████ 0.0330 + TWR-0001 41 subs ████████ 0.0322 + TWR-0006 41 subs ████████ 0.0316 + + ────────────────────────────────────────────────────────────────── + 404 distinct subscribers (33% of the 1,200-sub base) route at least + one call through a critical WEST tower. TWR-0014's failure ripples + to 61 subs whose combined social influence is highest. + ────────────────────────────────────────────────────────────────── + +✓ Subscriber.influence_score written back to all 1,200 subscribers +✓ CellTower.impact_count + weighted_impact written back to CellTower +``` + +--- + +## Stage 4 — Predictive: forecast WEST capacity demand + + +**Method:** GNN node regression on `TimeSeriesMetric` (composite key `metric_date` + `region`). Target: `subscriber_growth_rate`. Features: the other 12 daily KPIs + 3 lag features (`prev_day_growth`, `prev_week_growth`, `growth_7d_mean`) + `region` as a category. Graph: same-region 1-day-lag temporal edges. Train < 2024-11-01 (includes the Sep–Oct WEST decline onset); validate on Nov 2024; test on Dec 2024. 
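The three lag features named above can be sketched as a single trailing-window pass over each region's daily series (the helper and its toy series below are illustrative; the template computes these in pandas before loading):

```python
# Trailing-window lag features over an oldest-first daily series.
# Feature names mirror the runbook; the series itself is made up.
def lag_features(series):
    feats = []
    for t in range(len(series)):
        prev_day = series[t - 1] if t >= 1 else None
        prev_week = series[t - 7] if t >= 7 else None
        window = series[max(0, t - 7):t]  # trailing 7 days, excluding today
        feats.append({
            "prev_day_growth": prev_day,
            "prev_week_growth": prev_week,
            "growth_7d_mean": sum(window) / len(window) if window else None,
        })
    return feats

west = [-0.001 * t for t in range(10)]  # toy declining growth-rate series
f = lag_features(west)
print(f[0], f[8])
```

Entries earlier than the lag horizon stay `None`, mirroring the rows a training split would drop before fitting.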
+ +``` +Per-region GNN-predicted subscriber-growth-rate (Dec 2024 test horizon) + + CENTRAL ───── +0.0075 ████████ ▲ + EAST ───── +0.0073 ████████ │ + NORTH ───── +0.0071 ████████ │ 8 regions cluster + NORTHEAST ───── +0.0070 ████████ │ +0.59 to +0.75%/day + NORTHWEST ───── +0.0067 ████████ │ (mean predicted growth) + SOUTH ───── +0.0065 ████████ │ + SOUTHEAST ───── +0.0063 ███████ │ + SOUTHWEST ───── +0.0059 ███████ ▼ + ▲ + │ + WEST ───── −0.0071 ← anomaly: contracting + multiplier 0.993× + + ────────────────────────────────────────────────────────────────── + WEST projection: 0.7% demand decline over the test horizon. + Stage 5 picks up this multiplier as the 3rd objective coefficient. + ────────────────────────────────────────────────────────────────── + +✓ CellTower.projected_demand_growth written back to all 15 critical towers + (uniform 0.992871 — regional, not per-tower) +``` + +**Stage 5 objective with the predictive term:** + +``` +objective = sum( selected[t,tier] * + capacity_increase_gbps[t,tier] * + weighted_impact[t] * + projected_demand_growth[t] ) # ← Stage 4 contribution +``` + +**Snowflake setup for the GNN:** the template's main script computes lag features (prev-day, prev-week, 7-day mean) and same-region 1-day-lag temporal edges in pandas before loading, so no extra Snowflake DDL is required. To run on your own Snowflake schema instead of the bundled CSV, the equivalent SQL would be a typed copy of the time-series table plus per-region `LAG()` window functions. + +**Caveats:** +- The GNN was tuned for a single 80-epoch run with seed-42 reproducibility; production deployment would expand to a multi-seed average + a held-out holdout window. +- The WEST projection partially encodes the same network-degradation pattern Stages 2/3 flagged ("things have gotten worse and we expect them to keep getting worse if we don't act"). For an independent baseline, train on a pre-degradation slice (H1 2024 only) and compare. 
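The Stage 5 objective shown above, together with the at-most-one-tier, budget, and install-week caps, can be exercised on a three-tower toy instance by exhaustive enumeration (tier uplifts, costs, weeks, and impact weights below are invented; the template solves the real 45-binary MIP with a solver):

```python
from itertools import product

# Invented toy instance: tier -> (capacity uplift Gbps, cost $, install weeks)
TIERS = {"BRONZE": (3, 100_000, 2), "SILVER": (5, 200_000, 3), "GOLD": (9, 400_000, 5)}
TOWERS = {"T1": 0.050, "T2": 0.040, "T3": 0.033}  # weighted_impact per tower
GROWTH = 0.993        # uniform projected_demand_growth multiplier
BUDGET, WEEKS = 700_000, 10

def objective(plan):
    # sum(selected * capacity_increase * weighted_impact * projected_demand_growth)
    return sum(TIERS[tier][0] * TOWERS[tower] * GROWTH
               for tower, tier in plan.items() if tier is not None)

best = None
for choice in product([None, *TIERS], repeat=len(TOWERS)):  # at most one tier per tower
    plan = dict(zip(TOWERS, choice))
    cost = sum(TIERS[t][1] for t in choice if t is not None)
    weeks = sum(TIERS[t][2] for t in choice if t is not None)
    if cost <= BUDGET and weeks <= WEEKS:
        if best is None or objective(plan) > objective(best):
            best = plan
print(best)  # highest-impact tower gets GOLD; budget pushes cheaper tiers elsewhere
```

Even at this scale the solve's behavior shows up: the highest-impact tower earns GOLD, and the binding budget forces cheaper tiers onto the lower-impact towers.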
+
+---
+
+## Stage 5 — Prescriptive: tower upgrade selection MIP
+
+
+```
+FORMULATION
+
+  Decision variable
+      TowerUpgradeOption.selected  (binary)
+      45 binaries = 15 critical-restore towers × {BRONZE, SILVER, GOLD}
+
+  Constraints
+    1. At-most-one tier per tower    sum(selected).per(CellTower) ≤ 1
+    2. Total cost                    Σ selected · cost ≤ $5,000,000
+    3. Total install_weeks           Σ selected · install_weeks ≤ 200
+
+  Objective (maximize)
+    Σ selected · capacity_increase_gbps · weighted_impact · projected_demand_growth
+      └────── Stage 2 (rules) ─────┘└── Stage 3 ──┘└── Stage 4 ──┘
+
+──────────────────────────────────────────────────────────────────────
+SOLVE (Gurobi) → OPTIMAL    15 towers selected    122 Gbps    $4,956,843
+──────────────────────────────────────────────────────────────────────
+
+Tower-tier assignment (sorted by weighted_impact)
+
+  TWR-0014 ── GOLD    +6 Gbps   $350,864  wgt 0.0502  ████████████
+  TWR-0008 ── GOLD   +10 Gbps   $416,455  wgt 0.0430  ██████████
+  TWR-0011 ── GOLD    +9 Gbps   $481,914  wgt 0.0428  ██████████
+  TWR-0012 ── GOLD    +8 Gbps   $445,825  wgt 0.0394  █████████
+  TWR-0003 ── GOLD   +11 Gbps   $360,785  wgt 0.0393  █████████
+  TWR-0013 ── GOLD    +9 Gbps   $273,831  wgt 0.0379  █████████
+  TWR-0004 ── GOLD    +9 Gbps   $275,353  wgt 0.0378  █████████
+  TWR-0010 ── GOLD   +12 Gbps   $332,694  wgt 0.0375  █████████
+  TWR-0015 ── GOLD   +11 Gbps   $438,932  wgt 0.0361  ████████
+  TWR-0002 ── GOLD   +11 Gbps   $420,363  wgt 0.0331  ████████
+  TWR-0007 ── GOLD    +9 Gbps   $416,640  wgt 0.0330  ████████
+  TWR-0005 ── SILVER  +3 Gbps   $220,435  wgt 0.0330  ████████ ⚐
+  TWR-0009 ── BRONZE  +3 Gbps    $97,784  wgt 0.0330  ████████ ⚐
+  TWR-0001 ── GOLD    +6 Gbps   $274,561  wgt 0.0322  ████████
+  TWR-0006 ── SILVER  +5 Gbps   $150,407  wgt 0.0316  ████████ ⚐
+
+  ⚐ Lowest weighted_impact towers — solver buys cheaper tiers
+    to free budget for the higher-impact GOLDs.
+ +Budget gauge + Cost ████████████████████████████████████████ $4,956,843 / $5,000,000 ── BINDING + Install-wks █████████████████████████████████ 164 / 200 (slack: 36) + +Headline metrics + Capacity restored: 122 Gbps + Tier mix: 12 GOLD · 2 SILVER · 1 BRONZE + Towers covered: 15 of 15 (no triage tradeoff) + Subs serviced: 404 distinct (33% of all 1,200) + Objective without Stage 4 (Σ capacity × weighted_impact): 4.6024 + Objective with Stage 4 (× 0.992871 uniform multiplier): 4.5696 + +✓ TowerUpgradeOption.selected written back — the optimization output + is now a queryable property of the model. +``` + +(Full decision matrix: `outputs/stage5_solution.csv`.) + +### Reading the solve + +- **GOLD dominates** (12/15) — for towers with high social blast radius, GOLD's 6–12 Gbps uplift outweighs its higher cost. +- **Budget is binding** ($4.96M / $5M) — relaxing to $6M would let TWR-0009 jump from BRONZE to GOLD ($481K → +9 Gbps) and lift the objective meaningfully. +- **Install-weeks are not binding** (164/200) — schedule is the looser constraint; budget holds back the plan. +- **Stage 4's uniform multiplier doesn't shift tiers** — the forecast says WEST is contracting (-0.7%), so every upgrade is slightly less valuable in absolute terms, but relative tower priority is unchanged. **A non-uniform forecast would be the more revealing test of the chain's value** — if some WEST towers sat in growth pockets and others in decline, the tier mix would shift accordingly. + +--- + +## Stage 6 — Interpretation + + +``` +THE PLAN, IN BUSINESS TERMS + + • 122 Gbps of network capacity restored across all 15 critical towers + within the $5M capex budget. + • Every WEST DEGRADED tower gets an upgrade — no triage tradeoff. + • Service-affected subscribers drop from 404 to ~0 over the install + schedule (164 crew-weeks; 4-month rollout at 2 crews of 5). + • Budget binding — if CFO can flex to $6M, promote TWR-0009 to GOLD + for +9 Gbps marginal lift. 
+ +ONTOLOGY ENRICHMENT — what each stage contributed back + + Stage 2 (rules) CellTower.is_critical_restore [15] + CellTower.avg_packet_loss [250] + CellTower.avg_latency_ms [250] + CellTower.avg_error_rate [250] + CellTower.avg_health_score [250] + + Stage 3 (graph) Subscriber.influence_score [1,200] + CellTower.impact_count [120] + CellTower.weighted_impact [120] + + Stage 4 (predictive) CellTower.projected_demand_growth [15] + + Stage 5 (prescriptive) TowerUpgradeOption.selected [45] + + ────────────────────────────────────────────────────────────────── + Each stage reads what the previous stage wrote. + Re-running any downstream stage automatically picks up enrichments. + No glue code, no DataFrame round-trip — same ontology throughout. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## The chain — accretive ontology enrichment + +``` +THE WEST RECOVERY CHAIN + + STAGE 1 DESCRIPTIVE + "Where is the bleed coming from?" + reads: RevenueForecast, TimeSeriesMetric, NetworkPerformance, CellTower + writes: (situational summary — no ontology mutation) + │ + ▼ + STAGE 2 RULES + "Which towers are critical to restore?" + reads: NetworkPerformance, EquipmentHealth, NetworkEquipment, CellTower + writes: CellTower.is_critical_restore ── 15 towers flagged + CellTower.avg_packet_loss / latency_ms / error_rate / health_score + │ + ▼ + STAGE 3 GRAPH (PageRank) + "Whose service depends on these towers — and who is socially central?" + reads: CallDetailRecord (caller→callee), CDR.routed_through(CellTower) + writes: Subscriber.influence_score ── per subscriber + CellTower.impact_count ── distinct subs served + CellTower.weighted_impact ── Σ subscriber influence + │ + ▼ + STAGE 4 PREDICTIVE (GNN node regression) + "What does the forecast say about future demand?" 
+ reads: TimeSeriesMetric.subscriber_growth_rate × 365d × 9 regions + + 12 daily KPIs + 3 lag features + same-region temporal edges + writes: CellTower.projected_demand_growth ── per critical tower + │ + ▼ + STAGE 5 PRESCRIPTIVE (gurobi MIP) + "What's the optimal $5M tier-selection plan?" + reads: CellTower.is_critical_restore ──► decision-variable scope + CellTower.weighted_impact ──► objective coefficient + CellTower.projected_demand_growth ──► objective coefficient + TowerUpgradeOption.cost / capacity_increase / install_weeks + writes: TowerUpgradeOption.selected ── 15 upgrades chosen + │ + ▼ + Actionable plan, + grounded end-to-end in the same ontology. + + ────────────────────────────────────────────────────────────────── + No glue. No DataFrame ping-pong. No re-derivation per-reasoner. + Five reasoners, one ontology, one accretive thread. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## Why the chain matters (vs. any single stage) + +| Stage alone | What it tells you | What it doesn't | +|---|---|---| +| Descriptive | "WEST is broken" | Which towers, how to fix | +| Rules alone | "These 15 towers are critical" | Which matter most; what to do | +| Graph alone | "These subscribers are influential" | Which towers serve them | +| Predictive alone | "WEST demand is contracting" | Where to spend the recovery budget | +| Prescriptive alone | (won't run — no flagged set, no impact weights, no forecast) | Whole pipeline misses | + +| Combined | Output | +|---|---| +| Descriptive → Rules | Crisis scoped + critical-tower set flagged | +| + Graph | Each flagged tower scored by social blast radius | +| + Predictive | Forward-looking demand multiplier per tower | +| + Prescriptive | $5M plan, 122 Gbps, all 15 covered, prioritized by social impact | + +**Multi-reasoner chaining grounded in (and contributing to) the ontology.** + +--- + +## Data Reference + +- **Source data**: bundled CSVs in `../data/` (the main template ships ~1.2 MB 
of synthetic-but-realistic telco data — 250 cell towers, 1,200 subscribers, 6,000 CDRs, 3,285 daily KPI rows across 9 regions). +- **Ontology**: the template's main script uses a focused 7-concept subset of a broader 18-concept telco knowledge graph (PostalArea, Subscriber, Contract, BillingEvent, CellTower, NetworkEquipment, EquipmentHealth, NetworkEvent, CallDetailRecord, SupplierOrder, Campaign, PromotionRedemption, RevenueForecast, NetworkPerformance, SupportTicket, TimeSeriesMetric, TowerUpgradeOption, Part) — sufficient for the four-stage chain. +- **Stages**: implemented in `../telco_network_recovery.py` as a single combined script with stage banners. From c391890995d7757b900dad8ab14f1d6c7840c83a Mon Sep 17 00:00:00 2001 From: cafzal Date: Wed, 6 May 2026 08:55:58 -0700 Subject: [PATCH 02/22] Hybrid runbook pass: discovery + per-stage skill/prompt blocks Apply the dual-audience plan in dev_temp/pr57_runbook_hybrid_plan.md to all five runbooks. Each runbook now serves both the stakeholder (narrative + ASCII visualizations) and the practitioner (explicit skill + prompt to recreate the stage): - "How to read this runbook" preface explaining the dual purpose - Step 0 discovery section using rai-discovery to scope sub-questions to reasoner families before any chain stage runs - Skill / Prompt boxed callout immediately under every Stage heading (rai-querying, rai-rules-authoring, rai-graph-analysis, rai-prescriptive-*) - "Adapting this recipe to a new domain" closing section Adds portfolio_balancing/references/runbook.md (4 stages: rules -> graph clustering -> bi-objective Markowitz frontier -> crisis-regime stress test, all aligned to the template's actual 8-stock dataset and epsilon-rate frontier sweep). Numbers cross-referenced against each template's README and main script; reflects the templates as shipped (not the larger demos they were sourced from). Prompts use domain-natural language with no Concept.property syntax inside the prompt strings. 
--- v1/energy_grid_planning/references/runbook.md | 78 +++ v1/machine_maintenance/references/runbook.md | 79 +++ v1/portfolio_balancing/references/runbook.md | 498 ++++++++++++++++++ .../references/runbook.md | 53 ++ .../references/runbook.md | 57 +- 5 files changed, 764 insertions(+), 1 deletion(-) create mode 100644 v1/portfolio_balancing/references/runbook.md diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md index 4dc729a..d963954 100644 --- a/v1/energy_grid_planning/references/runbook.md +++ b/v1/energy_grid_planning/references/runbook.md @@ -44,6 +44,41 @@ A single-reasoner approach can't answer this. Predictive alone forecasts demand --- +## How to read this runbook + +This runbook serves two audiences: + +- **Reading top-to-bottom**: the narrative + ASCII visualizations show what + the chain produces stage-by-stage, with the same business framing the + stakeholder would see. +- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the + start of each stage is the recipe — load that RAI agent skill, give it + that prompt against the bundled demo data, and the agent will reproduce + the stage. + +--- + +## Step 0 — Scope the question with `rai-discovery` + +> **Skill:** `rai-discovery` · +> **Prompt:** "We have 10 hyperscaler interconnection requests totalling 2,930 MW queued against a 12-substation ERCOT-style Texas grid. Classify the sub-questions we need to answer to decide which to approve, which substation upgrades to fund, and at what budget level — map each sub-question to the reasoner family that should handle it." + +Discovery classifies the question by reasoner family and tells you which +downstream skills to load: + +| Sub-question | Reasoner | Skill | +|---|---|---| +| Which substations are losing headroom fastest and will breach capacity? 
| Predictive | _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ | +| Which substations are structural bottlenecks on the transmission grid? Which communities? | Graph | `rai-graph-analysis` | +| Which DC requests fail capacity, structural-risk, or low-carbon compliance? | Rules | `rai-rules-authoring` | +| Across budget levels, which DCs should we approve and which upgrades should we fund? | Prescriptive | `rai-prescriptive-problem-formulation` | +| What does the Pareto frontier tell us about the knee, marginal value, and binding constraints? | Prescriptive (post-solve) | `rai-prescriptive-results-interpretation` | + +Discovery's output is a *plan*, not code. Everything that follows +materializes that plan. + +--- + ## Prerequisites The template ships everything needed to run the chain end-to-end: @@ -78,6 +113,9 @@ Steps are sequential — each depends on prior steps. Steps without a skill are ## Stage 1 — Predictive: substation load forecasting +> **Skill:** _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ · +> **Prompt:** "Forecast each substation's future peak load by aggregating the maximum predicted load across the 6/12/18/24-month forecast horizons in the demand-forecast table, and write the result back to every substation as a derived load-projection property. The downstream rules engine and optimizer both need to read this same forecasted headroom — fall back to the substation's current load only when no forecast row exists. Flag substations whose predicted load exceeds their nameplate capacity within the horizon and report which one breaches first." + **Method:** load max forecasted load per substation as `Substation.predicted_load`. The template aggregates `DemandForecast.predicted_load_mw` over forecast horizons (6/12/18/24 months) and writes the max back to the substation. 
A pre-trained GNN can replace the table lookup; the script falls back gracefully when the GNN model registry is unavailable. ``` @@ -121,6 +159,9 @@ effective_load = Substation.predicted_load | Substation.current_load_mw ## Stage 2 — Graph: grid topology & structural vulnerability +> **Skill:** `rai-graph-analysis` · +> **Prompt:** "Build an undirected, unweighted graph using `Substation` directly as the node concept and active transmission lines as edges. Run weakly connected components to confirm grid connectivity, Louvain community detection to surface ERCOT regional clusters, and the betweenness/degree/eigenvector centrality trio. Combine the three centrality ranks into a composite rank and flag the top 3 substations as structurally critical, writing the centrality scores, community label, and criticality flag back to each substation." + **Construction** — `Substation` as the node concept directly (no mirror concept): - Node concept: `Substation` (12 nodes) - Edges: active `TransmissionLine` rows, `from_substation` → `to_substation` @@ -169,6 +210,9 @@ Centrality (top-3 marked is_structurally_critical) ## Stage 3 — Rules: interconnection queue compliance +> **Skill:** `rai-rules-authoring` · +> **Prompt:** "Author three declarative compliance rules per data-center request, each consuming an upstream enrichment. Rule 1 fails capacity when the request's MW plus the substation's forecasted load (with current load as fallback) exceeds the substation's nameplate capacity. Rule 2 fails structural risk when the target substation is flagged structurally critical from Stage 2. Rule 3 fails the low-carbon mandate when the substation's zero-emission generation share is below the request's required percentage; sum capacity for generators with emissions rate of zero. Add a composite `is_compliant` flag that fires only when none of the three failure flags fire." + Three declarative `Relationship` rules consume Stages 1–2 enrichments. 
Each is written as a `model.where(...).define(...)` block; a composite `is_compliant` fires only when none of the three failure flags fire. **Rule 1 — Capacity** (consumes `Substation.predicted_load` from Stage 1): @@ -231,6 +275,9 @@ Every request passes low-carbon — ERCOT's nuclear (STP, Comanche Peak) plus it ## Stage 4 — Prescriptive: joint DC approval + upgrade MIP +> **Skill:** `rai-prescriptive-problem-formulation` · +> **Prompt:** "Formulate a single MIP that picks DC approvals and substation upgrades jointly across five budget scenarios at $200M, $300M, $400M, $500M, and $600M. Model the budget as an `InvestmentLevel` Scenario Concept and index both the binary approve and binary upgrade decision variables by it, so one solve produces the full Pareto frontier — no per-budget re-solve loop. Constrain per substation per scenario that approved DC load fits within nameplate capacity minus forecasted load plus selected upgrade headroom, and per scenario that selected upgrade costs stay within the scenario's budget cap. Maximize total annual interconnection revenue summed across all scenarios." + ``` FORMULATION @@ -305,6 +352,9 @@ PARETO FRONTIER (queried directly from ontology) ## Stage 5 — Interpretation +> **Skill:** `rai-prescriptive-results-interpretation` · +> **Prompt:** "Read the per-scenario decision-variable properties straight from the ontology — approved DCs, selected upgrades, total MW, revenue, and amortized upgrade cost — and present the Pareto frontier across the five investment levels. Compute marginal value per added budget dollar between adjacent levels and call out the knee where the marginal jump is largest. Translate the result into business language: which DCs unlock at the knee, which substation is the binding bottleneck, and which requests are infeasible at every budget level." 
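To see why indexing the decision variables by an `InvestmentLevel` scenario yields a frontier rather than a single plan, here is a stylized stdlib sweep over budget levels on one substation and three invented requests (the template expresses all scenarios inside a single MIP solve instead of this per-level loop; every number below is illustrative):

```python
from itertools import combinations

# Invented toy data: (name, MW load, $M/yr interconnection revenue)
REQUESTS = [("DC_A", 300, 30), ("DC_B", 500, 55), ("DC_C", 400, 40)]
HEADROOM = 500        # substation capacity minus forecasted load, in MW
UPGRADE = (250, 600)  # ($M cost, added MW of headroom)
LEVELS = [0, 200, 300]  # budget caps, $M

def best_plan(budget):
    best = (0, (), False)  # (revenue, approved names, upgraded?)
    for upgraded in (False, True):
        cost = UPGRADE[0] if upgraded else 0
        if cost > budget:
            continue  # budget constraint: upgrade must fit the scenario's cap
        cap = HEADROOM + (UPGRADE[1] if upgraded else 0)
        for k in range(len(REQUESTS) + 1):
            for subset in combinations(REQUESTS, k):
                # capacity constraint: approved load fits available headroom
                if sum(mw for _, mw, _ in subset) <= cap:
                    revenue = sum(rev for _, _, rev in subset)
                    if revenue > best[0]:
                        best = (revenue, tuple(name for name, _, _ in subset), upgraded)
    return best

frontier = {budget: best_plan(budget) for budget in LEVELS}
print(frontier)  # revenue jumps once the budget clears the upgrade cost
```

The shape matches the template's story: below the upgrade cost, the plan is stuck at the largest request that fits existing headroom, and the frontier jumps once the budget clears the upgrade, which is exactly the knee the interpretation stage looks for.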
+ ``` THE PLAN, IN BUSINESS TERMS @@ -416,6 +466,34 @@ THE ENERGY GRID PLANNING CHAIN --- +## Adapting this recipe to a new domain + +The chain pattern transfers cleanly. To rebuild for a different problem: + +1. Re-run `rai-discovery` on the new business question — does it actually + need all 4 reasoner families, or is one or two sufficient? +2. Strip the demo ontology to the concepts the new chain needs (lean is + better for type inference and solver compile time). +3. Stage 1 (Predictive) is optional — if you have forecast tables already, + a simple `aggs.max(...).per(...)` derived property is enough; swap in + a GNN later by pointing the predictive reasoner at your model registry. +4. Stages 2–4 are the load-bearing chain: graph centrality flags + structurally critical nodes, rules consume both the predictive forecast + and the criticality flag to fail or pass each request, and the + prescriptive MIP reads the same forecast as a capacity baseline while + indexing decision variables by an `InvestmentLevel` Scenario Concept so + one solve produces the full Pareto frontier. +5. Keep the validation checks at every stage: assert the predicted-load + write covers all nodes, the top-N critical set looks plausible against + the topology, the compliance table has at least one PASS and one FAIL, + and the optimizer reports OPTIMAL with a non-zero objective. + +The shape this template demonstrates — *each reasoner writes a property +the next reasoner reads* — is what makes the chain accretive rather than +serial. The agent skills are how you reliably author each link. + +--- + ## Data Reference **Substations with DC requests:** diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md index 98760d9..42c4abc 100644 --- a/v1/machine_maintenance/references/runbook.md +++ b/v1/machine_maintenance/references/runbook.md @@ -47,6 +47,34 @@ A single-reasoner approach can't surface this. 
OEE alone says Plant_B is the pro --- +## How to read this runbook + +This runbook serves two audiences: + +- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. +- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data, and the agent will reproduce the stage. + +--- + +## Step 0 — Scope the question with `rai-discovery` + +> **Skill:** `rai-discovery` · +> **Prompt:** "We need to schedule preventive maintenance for a multi-plant operation: 30 machines across 3 plants, 10 technicians across 3 cities, a 4-period horizon. Classify the sub-questions we must answer, map each to the right reasoner family, and tell us which downstream skills to load. Where does OEE alone mislead us, and what structural risks (qualifications, technician concentration) won't a pure optimizer surface?" + +Discovery classifies the question by reasoner family and tells you which downstream skills to load: + +| Sub-question | Reasoner | Skill | +|---|---|---| +| Where does the operation actually hurt — OEE, anomalies, failure trajectories? | Querying / Descriptive | `rai-querying` | +| Which machines are scheduling bottlenecks given shared technician pools? | Graph | `rai-graph-analysis` | +| Which machines are overdue, high-risk, chronic, or composite-Critical? | Rules | `rai-rules-authoring` | +| What's the optimal maintain-and-assign plan across 4 periods? | Prescriptive | `rai-prescriptive-problem-formulation` | +| Where is the schedule structurally fragile, and what cross-training fixes it? | Prescriptive (re-solve / interpretation) | `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` | + +Discovery's output is a *plan*, not code. Everything that follows materializes that plan. 
+ +--- + ## Setup See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `machine_maintenance.py`. @@ -82,6 +110,43 @@ The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skil --- +## Stage 0 — Querying: operational intelligence + +> **Skill:** `rai-querying` · +> **Prompt:** "From the loaded production runs, sensor readings, and per-period failure predictions, compute an OEE proxy by facility (Performance × Quality), list machines with above-threshold sensor anomalies grouped by facility, and identify the six machines with the steepest failure-probability rise from period 1 to period 4. Write the per-period failure prediction back as a `MachinePeriod` property so downstream stages can read it." + +This stage establishes the operational baseline. Plant_C leads at 79.8% OEE; Plant_B trails at 61.4%. But Plant_A — middle of the OEE pack at 68.2% — owns 7 of 9 sensor anomalies and the three steepest failure trajectories (M001, M013, M016). The querying stage writes nine derived properties on `Machine` plus `MachinePeriod.predicted_fp` (120 rows), and Stage 3 reads `predicted_fp` directly into the failure-cost objective term. + +## Stage 1 — Graph: dependency clusters and bottleneck centrality + +> **Skill:** `rai-graph-analysis` · +> **Prompt:** "Build an undirected graph using `Machine` directly as the node concept. Two machines are adjacent when at least one technician is qualified for both of their machine types. Run weakly connected components to find dependency clusters, then compute betweenness centrality, normalize it to 0..1 against the max, and store both raw and normalized scores back on the machine." + +The 30 machines form a single connected component — every machine is reachable through shared qualifications. 
Pump-type machines tie at the top of betweenness (raw 24.0, normalized 1.0): M003 (Plant_C), M008 (Plant_B), M013 (Plant_A). The normalized centrality is consumed by Stage 3's failure-cost multiplier `(1 + 2.0 × betweenness)`, so leaving a bottleneck Pump vulnerable is markedly more expensive than leaving a peripheral Motor vulnerable. + +## Stage 2 — Rules: compliance flags and composite risk tier + +> **Skill:** `rai-rules-authoring` · +> **Prompt:** "Author six derived flags on the existing concepts: overdue (remaining useful life below required maintenance hours), high-risk (failure probability above 0.3 and criticality at least 4), anomalous (any sensor anomaly), chronic-downtime (more than 8 events), parts-reorder (stock at or below minimum order), and certification-expiring (under 30 days remaining). Then chain three of those flags — chronic, high-risk, overdue — into a `risk_tier` property of Critical / Elevated / Standard, exhaustively enumerating all eight three-flag combinations using negation." + +Six machines overdue, one high-risk (M013), three chronic-downtime, four parts-reorder, five expiring certs. The composite tier surfaces a single Critical machine — M013 (Pump, Plant_A) — and a single Elevated machine — M016 (Turbine, Plant_A). The overdue flag is consumed by Stage 3 as a hard constraint: every overdue machine must be scheduled by period 2. + +## Stage 3 — Prescriptive: maintenance schedule + +> **Skill:** `rai-prescriptive-problem-formulation` · +> **Prompt:** "Schedule preventive maintenance over 4 periods. Decide for each machine and period whether to maintain it, whether it remains vulnerable, and which qualified technician services it.
Enforce cumulative coverage (each machine is either maintained or vulnerable through every period), assignment-maintenance linkage (one tech per maintained job), per-tech hour capacity, a 5-jobs-per-period parts/bay limit, and the hard rule that every overdue machine from Stage 2 is maintained by period 2. Minimize failure cost (period-specific failure probability × parts cost × criticality, scaled up by `(1 + 2.0 × betweenness)`) plus labor cost (duration × hourly rate) plus travel cost ($50/hour for non-co-located assignments)." + +The solver returns OPTIMAL with objective $605,240.61 and 20 maintenance jobs across the four periods (capacity-binding at 5 jobs/period). M013 and M016 — Plant_A's Critical and Elevated machines — are both scheduled by period 1, satisfying the overdue deadline. Several Turbine assignments require travel because all three Turbine-qualified techs are based in Houston_TX while four of six Turbines sit at Plant_A and Plant_C. The optimizer pays the travel cost; it cannot restructure the qualification pool. + +## Stage 4 — Resilience: concentration sweep and cross-training + +> **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` · +> **Prompt:** "Take the optimal schedule and stress-test the qualification structure underneath it. For each machine type, identify whether all qualified technicians are concentrated in one location, count the scheduled jobs of that type that required travel, and surface the geographic single-point-of-failure. Then re-rank the cross-training options for the concentrated type by cost, prefer non-local candidates, and recommend the cheapest one with its training cost and duration. This is a follow-up sensitivity sweep over the existing solve, not a new formulation." + +Turbine is the concentrated type — all three qualified techs (T001, T002, T003) sit in Houston_TX, and 67% of scheduled Turbine jobs already require travel. 
The recommended fix: cross-train T006 (Senior, Chicago_IL) for $3,200 over 5 weeks. That single addition eliminates the Houston single-point-of-failure for Turbine work at Plant_B and Plant_C, and pays back the first time a weather event, illness, or expiring cert would have idled a Turbine job that the optimizer would otherwise have left uncovered. + +--- + ## Stage outputs — what each reasoner contributes back ``` @@ -202,6 +267,20 @@ THE MACHINE-MAINTENANCE CHAIN --- +## Adapting this recipe to a new domain + +The chain pattern transfers cleanly. To rebuild for a different scheduling-with-resilience problem: + +1. Re-run `rai-discovery` on the new business question — does it actually need all five reasoner families, or is one or two sufficient? A pure dispatch problem may only need querying + prescriptive; a pure compliance problem may only need rules. +2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). Keep the cross-product concept (`MachinePeriod`-equivalent) — it's where most decision variables and per-period derived properties live. +3. Stage 1 (querying) is required scaffolding: the optimization objective leans on derived per-period signals, not raw inputs. +4. Stages 2–5 are the load-bearing chain: graph centrality writes a multiplier the cost objective consumes; rules write a hard deadline the optimizer must satisfy; the optimizer writes solution variables the resilience sweep reads; the resilience layer doesn't re-solve a new problem — it stress-tests the structure underneath the existing solve and recommends a structural fix. +5. Keep the validation checks at every stage: assert flagged-set size, betweenness top-N looks plausible, the OPTIMAL gate, the objective is not zero, and the resilience pass surfaces at least one actionable recommendation when concentration exists. 
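A minimal sketch of the stage-gate checks in step 5, assuming hypothetical result shapes (`flags`, `solve_status`, `objective`, `recommendations` are illustrative stand-ins, not the template's actual API):

```python
# Hypothetical stage-gate checks mirroring step 5 above. The argument
# shapes are illustrative stand-ins, not the template's real objects.
def validate_chain(flags, centrality, solve_status, objective,
                   recommendations, concentration_exists, fleet_size=30):
    # Rules stage: flag rule should match something, but not everything.
    assert 0 < len(flags) < fleet_size, "degenerate flag rule"
    # Graph stage: normalized betweenness must land in 0..1.
    assert all(0.0 <= c <= 1.0 for c in centrality.values())
    # Prescriptive stage: hard OPTIMAL gate and a non-trivial objective.
    assert solve_status == "OPTIMAL", f"solver returned {solve_status}"
    assert objective > 0.0, "zero objective usually means an empty model"
    # Resilience stage: concentration must surface a recommendation.
    if concentration_exists:
        assert recommendations, "concentration found but no fix proposed"
    return True

ok = validate_chain(
    flags={"M013": "overdue", "M016": "overdue"},
    centrality={"M003": 1.0, "M008": 1.0, "M013": 1.0, "M020": 0.4},
    solve_status="OPTIMAL",
    objective=605_240.61,
    recommendations=[("T006", 3200, "5 weeks")],
    concentration_exists=True,
)
```

Each assert corresponds to one of the validation checks listed above; in a real port these would read the properties the stages wrote back.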
+ +The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link. + +--- + ## Optional extension — operator-shift assignment A second optimization pass — assigning operators to line-shifts to maximize a skill-match bonus, subject to a same-facility constraint — is a natural follow-on but not part of this template's main script. It would consume the same `Machine` and facility ontology and add `Operator`, `Shift`, and `OperatorShift` concepts. Out of scope for this runbook. diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md new file mode 100644 index 0000000..ea6f43c --- /dev/null +++ b/v1/portfolio_balancing/references/runbook.md @@ -0,0 +1,498 @@ +# Runbook: Portfolio Balancing — Multi-Reasoner Walkthrough + +Walk-through of the chained-reasoner pattern this template is built on. One realistic portfolio thread — **rebalance an 8-stock universe under compliance + crisis stress** — traced across rules, graph, and prescriptive reasoners, each stage writing properties back to the same ontology that downstream stages consume. + +The template's combined script (`portfolio_balancing.py`) implements all four stages directly; this runbook expands the surrounding narrative — what each prompt asks, what shape of output to expect, and how each enrichment feeds the next — so a reader can follow the reasoning thread end-to-end without re-running the script. + +--- + +## TL;DR — the chain in one screen + +``` +The current book breaks compliance on 4 holdings + 2 sectors. Naive +"diversification" hides correlated bets. The chain collapses 8 stocks +into 5 distinct cluster representatives, traces the efficient frontier +under base + crisis covariance, and shows crisis vol sits 25-30% above +base at every lambda — without the cluster collapse, the gap would grow. 
+ + ───────────────────────────────────────────────────────────────── + STAGE 1 Rules ──► Holding.is_overconcentrated (4) + Holding.is_sector_concentrated (2) + User.is_high_risk_trader (2) + 4 holdings > 15% of balance, 2 sectors + > 30%, 2 traders with risk > 0.8 + flagged. + ───────────────────────────────────────────────────────────────── + STAGE 2 Graph ──► Stock.variance / volatility / correlation + Stock.cluster, Stock.is_representative (5) + 4 edges (|rho| >= 0.3), 5 Louvain clusters, + intra +0.683 vs inter +0.131. + ───────────────────────────────────────────────────────────────── + STAGE 3 Prescriptive ──► Stock.x_quantity (per Scenario) + (QP) 6 scenarios = 3 budgets x 2 regimes. + Anchors + 5 epsilon points = 7-point + frontier per scenario. Knee at eps_1. + ───────────────────────────────────────────────────────────────── + STAGE 4 Stress ──► Stock.regime_covar (PSD-preserving) + Crisis vol 25-30% above base at every + lambda. Gap peaks mid-frontier, narrows + toward the concentrated end. + ───────────────────────────────────────────────────────────────── +``` + +A single-reasoner approach can't answer this. Rules alone flag broken positions but don't rebuild the book. Graph alone clusters stocks but doesn't allocate. Prescriptive alone solves a Markowitz QP — but on the full universe it stacks near-duplicate cluster members, and on a single covariance it ignores regime risk. Each stage feeds the next: rules surface the violations, graph collapses redundant bets, prescriptive optimizes over the reduced universe across both regimes in one solve. + +--- + +## How to read this runbook + +This runbook serves two audiences: + +- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. 
+- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data, and the agent will reproduce the stage. + +--- + +## Step 0 — Scope the question with `rai-discovery` + +> **Skill:** `rai-discovery` · +> **Prompt:** "We have an 8-stock portfolio book that breaks compliance on 4 holdings and 2 sectors. We want to rebuild the book under a Markowitz mean-variance objective with position and sector caps, deduplicate redundant bets via correlation clustering, and stress-test the resulting frontier under a crisis regime where correlations spike. Classify the sub-questions by reasoner family and tell us which RAI skills to load." + +Discovery classifies the question by reasoner family and tells you which downstream skills to load: + +| Sub-question | Reasoner | Skill | +|---|---|---| +| Where is the current book breaking compliance (per-stock, per-sector, per-trader)? | Rules | `rai-rules-authoring` | +| Which stocks are really the same bet (correlation clusters + cluster representative)? | Graph | `rai-graph-analysis` | +| What's the optimal allocation under position + sector caps for each (budget, regime) scenario? | Prescriptive | `rai-prescriptive-problem-formulation` | +| How does the optimal portfolio degrade under a PSD-preserving crisis covariance? | Prescriptive (re-solve) | `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` | + +Discovery's output is a *plan*, not code. Everything that follows materializes that plan. + +--- + +## Setup + +See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `portfolio_balancing.py` against the bundled CSVs in `../data/`. 
+ +**Prerequisites** + +- Template's `data/` CSVs available — `returns.csv` (8 stocks), `covar.csv` (64 covariance entries), `users.csv`, `accounts.csv`, `holdings.csv`, `transactions.csv`. Or your own Snowflake schema with equivalent tables. +- `raiconfig.yaml` pointing at your RAI engine +- Python >= 3.10 with `relationalai >= 1.0.14` + +--- + +## Workflow + +The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skill order. Each row maps to a section of the script. + +| # | Step | Skill | Prompt | Expected Output | +|---|------|-------|--------|-----------------| +| 1 | Build ontology | `/rai-build-starter-ontology` | "Build a RAI ontology for portfolio balancing from the CSVs in `data/`. Concepts: Stock (index, ticker, sector, returns, covar), Sector (derived from Stock sectors), User (with risk_score), Account (balance, account_type), Holding (quantity, purchase_price, value), Transaction (amount, category, flagged indicator)." | Model `portfolio` with 6 user-facing concepts. 8 Stocks across 5 sectors (Technology x3, Healthcare x2, Financials, Energy, Consumer Staples). 6 Users, 4 Accounts, 15 Holdings, 21 Transactions. Pairwise `Stock.covar(i, j)` two-arg property loaded from the long-form covariance CSV (64 entries, symmetric). | +| 2 | Discovery | `/rai-discovery` | "What questions can we answer with this ontology? We want to surface compliance violations on the current book, then rebuild it under a Markowitz objective with crisis-regime stress." | Rules: position-limit, sector-concentration, high-risk-trader flags as derived Relationships. Graph: covariance clustering (Louvain on |correlation| >= threshold) -> per-cluster representative by Sharpe -> investable-universe collapse. Prescriptive: bi-objective QP via epsilon constraint over the representative-only universe, indexed by a `Scenario` Concept that combines budget and regime. Stress: same `solve_epsilon` call under a PSD-preserving crisis covariance. 
Predictive: DATA_GAP (no time series). | +| 3 | Stage 1 — Compliance: overconcentrated holdings | `/rai-rules-authoring` | "Define `Holding.value = quantity * purchase_price`, then flag `Holding.is_overconcentrated` whenever `value > POSITION_LIMIT * Account.balance` (default 15%)." | 4 holdings flagged: AAPL (Account 1, 18.0%), MSFT (Account 1, 16.0%), JNJ (Account 4, 16.0%), PFE (Account 4, 16.2%). All sit in Account 1 (Alice Chen, balance $100K) or Account 4 (Dan Wilson, balance $80K) — both accounts have multiple positions clipping the 15% cap. | +| 4 | Stage 1 — Compliance: sector concentration | `/rai-rules-authoring` | "Aggregate `Holding.value` per (Account, Sector) and flag `Holding.is_sector_concentrated` whenever the sector total > `SECTOR_LIMIT * Account.balance` (default 30%)." | 2 (account, sector) pairs flagged: Account 1 Technology 34.0% (AAPL + MSFT), Account 4 Healthcare 32.2% (JNJ + PFE). Pattern: the same accounts flagged in step 3 stack their positions within a sector. Stage 3 enforces the same 30% cap as a hard constraint. | +| 5 | Stage 1 — Compliance: high-risk traders | `/rai-rules-authoring` | "Flag `User.is_high_risk_trader` whenever `risk_score > 0.8` AND > 5 flagged transactions." | 2 users flagged: Alice Chen (risk 0.85), Eve Taylor (risk 0.92). Both have multiple `Transaction.is_flagged == True` rows in the bundled data. Standalone signal — used as a reviewer trigger, not a solver constraint. | +| 6 | Stage 2 — Derive volatility + correlation in PyRel | `/rai-ontology-design` | "Add `Stock.variance` (covariance diagonal where i == j), `Stock.volatility = sqrt(variance)`, and pairwise `Stock.correlation(i, j) = covar(i, j) / (vol_i * vol_j)`. All in PyRel — no numpy precompute." | `Stock.variance` and `Stock.volatility` written for all 8 stocks. `Stock.correlation` two-arg property populated from the 64 covariance pairs. Volatilities span ~0.06 to ~0.10 (small dataset, daily-scale).
Storing in the ontology lets every downstream stage read the same source of truth instead of recomputing. | +| 7 | Stage 2 — Cluster the correlation graph | `/rai-graph-analysis` | "Build an undirected `Graph` with `Stock` as `node_concept`. Add an edge wherever `|correlation| >= CORR_THRESHOLD` (default 0.3) and `i < j` (deduplicate symmetric pairs). Run Louvain and persist `Stock.cluster`." | 4 edges retained (8 stocks, sparse graph). Louvain yields 5 communities: Cluster 1 = {JNJ, PFE} (Healthcare), Cluster 2 = {AAPL, MSFT, GOOGL} (Technology), plus singletons Cluster 3 = {JPM}, Cluster 4 = {PG}, Cluster 5 = {XOM}. Intra-cluster avg correlation = +0.683, inter-cluster = +0.131 (~5x separation — clean). | +| 8 | Stage 2 — Pick cluster representatives by Sharpe | `/rai-graph-analysis` | "Define `Stock.sharpe = returns / volatility`. For each cluster, the stock whose Sharpe equals the cluster max is the representative — set `Stock.is_representative`. Singletons are their own representative. Define `Stock.is_non_representative` as the positive complement (the prescriptive rewriter doesn't accept `model.not_(...)` inside a solver `.where()`)." | 5 representatives picked from 8 stocks: Cluster 1 PFE (Healthcare, Sharpe 0.530), Cluster 2 GOOGL (Technology, 0.605), Cluster 3 JPM (Financials, 0.500), Cluster 4 PG (Consumer Staples, 0.444), Cluster 5 XOM (Energy, 0.588). Investable universe shrinks from 8 to 5. Non-representatives — AAPL, MSFT, JNJ — get `Stock.is_non_representative` and are forced to zero in Stage 3. | +| 9 | Stage 3 — Scenario + regime + decision variable | `/rai-prescriptive-problem-formulation` | "Define `Regime` (`base`, `crisis`) and a `Scenario` Concept indexed by name with `budget` and `regime` properties. Load 6 scenarios = {500, 1000, 2000} x {base, crisis}. 
Add `Stock.regime_covar(i, j, Regime) = base covariance for base; alpha * covar(i,j) + (1 - alpha) * vol_i * vol_j for crisis` (PSD-preserving correlation shrinkage in covariance units, alpha = 0.7). Decision variable: `Stock.x_quantity(Stock, Scenario)` continuous." | 1 `Stock.x_quantity` property, 8 stocks x 6 scenarios = 48 continuous variables (18 of which will be hard-zero from the representative-only constraint: 3 non-representatives x 6 scenarios). Regime-conditioned covariance lives in the ontology — the QP picks each scenario's matching regime without any branching in solver code. | +| 10 | Stage 3 — Compliance constraints on the decision variable | `/rai-prescriptive-problem-formulation` | "Add: non-negative (`x_qty >= 0`); fully invested per scenario (`sum(x) == Scenario.budget`); per-rep position cap (`x_qty <= REP_POSITION_LIMIT * Scenario.budget`, default 0.30); per-sector cap (`sum(x_qty per sector) <= SECTOR_LIMIT * Scenario.budget`, default 0.30); representative-only (`x_qty == 0` where `Stock.is_non_representative()`)." | 5 constraint families. `REP_POSITION_LIMIT = 0.30` is intentionally higher than Stage 1's `POSITION_LIMIT = 0.15`: a representative carries its cluster's combined exposure, and feasibility requires `REP_POSITION_LIMIT * num_reps >= 1.0` (5 x 0.30 = 1.5, OK). `SECTOR_LIMIT = 0.30` is reused verbatim from Stage 1's compliance threshold — same parameter binds the existing book and the rebuilt one. | +| 11 | Stage 3 — Anchors + epsilon sweep | `/rai-prescriptive-solver-management` | "Anchor 1: minimize risk (no return constraint). Anchor 2: maximize return. Compute return-rate range across all 6 scenarios. Then sweep 5 interior epsilon-rate points uniformly across the rate range and re-solve `min risk s.t. return_rate >= eps_rate`. Use Ipopt, time limit 60s." | Per-scenario rate range: [0.0634, 0.0840] per unit invested (regime-independent because `Stock.returns` is regime-independent — only the covariance differs).
7 solves total: 2 anchors + 5 epsilon points = 42 optimal portfolios (6 scenarios x 7 points), all `LOCALLY_SOLVED`. base_500 anchor 1: return 32.43, risk 1160.39. base_500 anchor 2: return 42.00. crisis_500 anchor 1: return 31.69, risk 1913.60 — same investable universe, different regime covariance. | +| 12 | Stage 3 — Pareto + knee | `/rai-prescriptive-results-interpretation` | "For each scenario, list the 7-point frontier (return, risk). Print marginal `delta_risk / delta_return` between adjacent points and identify the knee — where the marginal jumps most." | base_500: returns 32.43 -> 33.41 -> 35.12 -> 36.84 -> 38.56 -> 40.28; risk 1160 -> 1177 -> 1263 -> 1386 -> 1546 -> 1742. Marginal climbs 16.85 -> 49.94 -> 71.72 -> 93.03 -> 114.43. Knee at Point 2 (`eps_1`) — marginal cost jumps ~3x beyond. base_1000 / base_2000 / crisis_* show the same shape (risk scales as budget^2 because the QP is quadratic, but the rate-form frontier is identical). | +| 13 | Stage 4 — Crisis stress comparison | `/rai-prescriptive-results-interpretation` | "From the Stage 3 sweep, emit a side-by-side `vol_base` vs `vol_crisis` table per (budget, lambda). Vol = sqrt(risk). Compute the absolute and percentage gap." | Same 7-point frontier resolved at each budget x regime. Crisis vol sits ~25-30% above base at every lambda. Budget 500: at `min_risk` vol_base 34.06 / vol_crisis 43.74 (+28.4%); at `eps_1` 34.30 / 44.54 (+29.8%); peaks at `eps_1`-`eps_2`; narrows to +25.2% at `eps_5`. The gap peaks mid-frontier and narrows toward the concentrated end — the inversion is the payoff of the cluster collapse: at the concentrated end the optimizer holds the highest-Sharpe distinct bet per cluster (weighted toward Energy/Consumer Staples here, which carry lower crisis correlations than the middle of the frontier). Without the representative collapse, the concentrated end would stack near-duplicates and the crisis gap would grow instead of shrink. 
| + +--- + +## Stage 1 — Rules: compliance scan + +> **Skill:** `rai-rules-authoring` · +> **Prompt:** "Add a derived `Holding.value` from `quantity * purchase_price`, then author three compliance flags as Relationships on the existing book. Flag a holding as overconcentrated when its value exceeds 15% of the parent account's balance. Flag it as sector-concentrated when the per-account, per-sector total of holding values exceeds 30% of balance. Flag a user as a high-risk trader when their risk score is above 0.8 and they have more than five flagged transactions." + +``` +COMPLIANCE VIOLATIONS — current book (4 accounts, 15 holdings, 6 users) + + Rule 1: Holding.is_overconcentrated (position > 15% of balance) + ──────────────────────────────────────────────────────────────── + AAPL Account 1 $18,000 / $100,000 18.0% ─── Alice Chen + MSFT Account 1 $16,000 / $100,000 16.0% + JNJ Account 4 $12,800 / $80,000 16.0% ─── Dan Wilson + PFE Account 4 $13,000 / $80,000 16.2% + + Rule 2: Holding.is_sector_concentrated (sector > 30% of balance) + ──────────────────────────────────────────────────────────────── + Account 1 Technology $34,000 / $100,000 34.0% + Account 4 Healthcare $25,800 / $80,000 32.2% + + Rule 3: User.is_high_risk_trader (risk_score > 0.8 AND >5 flagged txns) + ──────────────────────────────────────────────────────────────── + Alice Chen risk_score 0.85 ── flagged transactions + Eve Taylor risk_score 0.92 + + ────────────────────────────────────────────────────────────────── + The same accounts that breach the per-stock cap also breach the + sector cap — Stage 3 will use the SECTOR_LIMIT (0.30) as a hard + constraint when rebuilding both books. The trader flag is a + reviewer signal, not a solver input. + ────────────────────────────────────────────────────────────────── + + Holding.is_overconcentrated [4] + Holding.is_sector_concentrated [2] + User.is_high_risk_trader [2] +``` + +`POSITION_LIMIT` (0.15) and `SECTOR_LIMIT` (0.30) are top-level constants. 
`SECTOR_LIMIT` is reused verbatim by Stage 3; `POSITION_LIMIT` is replaced in Stage 3 by `REP_POSITION_LIMIT = 0.30` because a representative carries its cluster's combined exposure (Stage 1's 0.15 would be infeasible, since 5 reps x 0.15 = 0.75 cannot cover a fully invested budget, and even 0.20 would pin it exactly at 5 x 0.20 = 1.00; 0.30 leaves headroom). + +--- + +## Stage 2 — Graph: covariance clustering + cluster representatives + +> **Skill:** `rai-graph-analysis` · +> **Prompt:** "Derive per-stock variance from the covariance diagonal, volatility as the square root of variance, and pairwise correlation as covariance over the product of volatilities — all in PyRel, no numpy precompute. Build an undirected stock graph with an edge wherever the absolute correlation is at least 0.3 and the index pair is deduplicated. Run Louvain community detection and persist the cluster id back onto each stock. Then derive a Sharpe ratio per stock (returns over volatility), and flag the stock whose Sharpe equals the cluster maximum as the cluster representative; flag the others as non-representative so the optimizer can force them to zero." + +**Construction** — undirected, unweighted graph: +- Node concept: `Stock` (8 nodes) +- Edges built from the derived `Stock.correlation(i, j)` property where `|correlation| >= 0.3` and `i < j` +- Aggregator: `"sum"` (no parallel edges expected) + +**Algorithm:** `louvain()` for community detection. + +**Volatility, correlation, and crisis covariance are all PyRel derived properties** — no numpy precompute. The covariance matrix loaded from `covar.csv` is the only solver input not derived from another property.
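Outside RAI, the same derivations can be sanity-checked in plain Python (illustrative two-stock numbers, not the bundled `covar.csv`):

```python
import math

# Plain-Python sketch of Stage 2's derivations. The template does this
# in PyRel; the covariance values below are illustrative only.
covar = {
    ("AAPL", "AAPL"): 0.0064, ("MSFT", "MSFT"): 0.0049,
    ("AAPL", "MSFT"): 0.0042, ("MSFT", "AAPL"): 0.0042,
}
stocks = {"AAPL", "MSFT"}

variance = {s: covar[(s, s)] for s in stocks}                # diagonal, i == j
volatility = {s: math.sqrt(v) for s, v in variance.items()}  # sqrt(variance)

def correlation(i, j):
    # correlation(i, j) = covar(i, j) / (vol_i * vol_j)
    return covar[(i, j)] / (volatility[i] * volatility[j])

rho = correlation("AAPL", "MSFT")   # 0.0042 / (0.08 * 0.07) = 0.75
edge = abs(rho) >= 0.30             # passes the CORR_THRESHOLD edge test
```

The keep-in-the-ontology point stands regardless: these derived values feed the edge test, the Sharpe ratio, and the crisis covariance without being recomputed per stage.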
+ +``` +DERIVED IN PYREL + Stock.variance <- covar(i, j) where i == j + Stock.volatility <- sqrt(variance) + Stock.correlation(i,j) <- covar(i, j) / (vol_i * vol_j) + +CORRELATION GRAPH + Edges with |correlation| >= 0.30: 4 + Stocks above threshold pairwise: {AAPL,MSFT}, {AAPL,GOOGL}, + {MSFT,GOOGL}, {JNJ,PFE} + +LOUVAIN COMMUNITIES 5 clusters + Cluster 1 (size 2): JNJ (Healthcare), PFE (Healthcare) + Cluster 2 (size 3): AAPL (Technology), MSFT (Technology), + GOOGL (Technology) + Cluster 3 (size 1): JPM (Financials) ← singleton + Cluster 4 (size 1): PG (Consumer Staples) ← singleton + Cluster 5 (size 1): XOM (Energy) ← singleton + + Avg correlation: intra-cluster = +0.683 + inter-cluster = +0.131 ── ~5x separation +``` + +Singletons (Cluster 3-5) are their own representatives. The non-trivial choice happens inside Cluster 1 (JNJ vs PFE) and Cluster 2 (AAPL/MSFT/GOOGL): + +``` +REPRESENTATIVE = HIGHEST SHARPE PER CLUSTER (returns / volatility) + + Cluster 1 (Healthcare): + JNJ Sharpe 0.500 PFE Sharpe 0.530 ← REP + + Cluster 2 (Technology): + AAPL Sharpe 0.582 MSFT Sharpe 0.560 + GOOGL Sharpe 0.605 ← REP + + Cluster 3 (Financials): JPM Sharpe 0.500 ← REP (singleton) + Cluster 4 (Consumer Staples): PG Sharpe 0.444 ← REP (singleton) + Cluster 5 (Energy): XOM Sharpe 0.588 ← REP (singleton) + + ────────────────────────────────────────────────────────────────── + Investable universe collapses 8 -> 5. + AAPL, MSFT, JNJ get Stock.is_non_representative — Stage 3 forces + their decision variables to zero. + + This is "collapse, don't cap" — Stage 3 doesn't allow the full 8 + with caps inside a redundant cluster; it removes the duplicates + before the optimizer sees them. 
+ ────────────────────────────────────────────────────────────────── + + ✓ Stock.variance / volatility / correlation written back [8 / 8 / 64] + ✓ Stock.cluster, Stock.sharpe, Stock.cluster_max_sharpe [8 each] + ✓ Stock.is_representative [5] + ✓ Stock.is_non_representative [3] +``` + +--- + +## Stage 3 — Prescriptive: bi-objective QP with epsilon constraint + +> **Skill:** `rai-prescriptive-problem-formulation` · +> **Prompt:** "Formulate a Markowitz mean-variance QP with a continuous decision variable `Stock.x_quantity` indexed by a `Scenario` Concept that combines budget and regime — load the six scenarios for budgets {500, 1000, 2000} crossed with regimes {base, crisis}. Add five constraint families per scenario: non-negativity, fully invested at the scenario budget, per-representative position cap at 30% of budget, per-sector cap at 30% of budget, and a hard zero on every non-representative stock. Anchor the frontier with a min-risk solve and a max-return solve, then sweep five interior epsilon-rate points uniformly across the resulting return-rate range, each one re-solving min-risk subject to a per-scenario return floor that scales with budget. Use Ipopt with a 60-second time limit." + +``` +FORMULATION + + Decision variable + Stock.x_quantity(Stock, Scenario) continuous, >= 0 + 8 stocks x 6 scenarios = 48 vars + 40 forced to 0 by Stock.is_non_representative() + 8 active = 5 representatives x — wait: 5 reps x 6 scenarios = 30 + (the 18 singleton non-rep slots are also forced — same effect) + + Scenarios (3 budgets x 2 regimes = 6 tuples) + base_500 base_1000 base_2000 crisis_500 crisis_1000 crisis_2000 + + Constraints (per scenario) + 1. Non-negative x_qty >= 0 + 2. Fully invested sum(x_qty) == Scenario.budget + 3. Per-rep position x_qty <= 0.30 * Scenario.budget + 4. Per-sector sum(x_qty per sector) <= 0.30 * Scenario.budget + 5. 
Representative-only x_qty == 0 where is_non_representative + + Risk objective (regime-aware) + minimize Sigma_ij regime_covar(i, j, Scenario.regime) * x_i * x_j + └── PyRel-derived per regime; PSD-preserving for crisis ──┘ + + Return constraint (epsilon, scaled by budget) + sum(Stock.returns * x_qty) >= eps_rate * Scenario.budget + +────────────────────────────────────────────────────────────────────── +SOLVE (Ipopt, time limit 60s) → LOCALLY_SOLVED + Anchor 1 (min risk) + Anchor 2 (max return) + 5 epsilon points + = 7 solves, 42 optimal portfolios (one per scenario per point) +────────────────────────────────────────────────────────────────────── + +ANCHOR 1 — minimize risk (no return floor) + base_500 return 32.43 risk 1,160.39 + base_1000 return 64.87 risk 4,641.57 + base_2000 return 129.73 risk 18,566.28 + crisis_500 return 31.69 risk 1,913.60 ← higher risk, same universe + crisis_1000 return 63.37 risk 7,654.40 + crisis_2000 return 126.75 risk 30,617.59 + +ANCHOR 2 — maximize return + base_500 / crisis_500 return 42.00 + base_1000 / crisis_1000 return 84.00 + base_2000 / crisis_2000 return 168.00 + (returns are regime-independent) + +Return-rate range [0.0634, 0.0840] per unit invested +Epsilon sweep 5 interior points evenly spaced across the range +``` + +--- + +## Stage 3 — Reading the frontier (per scenario) + +> **Skill:** `rai-prescriptive-results-interpretation` · +> **Prompt:** "For each of the six scenarios, list the seven-point Pareto frontier (return, risk) from the anchors and epsilon points. Compute the marginal `delta_risk / delta_return` between adjacent points and identify the knee — the point where the marginal cost jumps the most relative to the previous step. Confirm the rate-form frontier shape is consistent across budgets (risk scales as budget squared because the QP is quadratic in x, but the rate-form curve and knee location are budget-independent)." 
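The knee rule in the prompt above can be sketched in plain Python against the base_500 frontier printed below; because it recomputes marginals from rounded (return, risk) values, the numbers differ slightly from the script's own output:

```python
# Knee detection as the prompt describes: compute the marginal
# delta_risk / delta_return between adjacent frontier points and find
# where it jumps most. Values are this runbook's base_500 frontier.
frontier = [  # (return, risk)
    (32.43, 1160.39), (33.41, 1176.78), (35.12, 1262.61),
    (36.84, 1385.89), (38.56, 1545.79), (40.28, 1742.47),
]

marginals = [
    (risk_b - risk_a) / (ret_b - ret_a)
    for (ret_a, risk_a), (ret_b, risk_b) in zip(frontier, frontier[1:])
]
# Knee = point whose outgoing marginal jumps most relative to its
# incoming marginal (~3x here, between eps_1 and eps_2).
jumps = [later / earlier for earlier, later in zip(marginals, marginals[1:])]
knee_index = jumps.index(max(jumps)) + 1   # frontier[1] == eps_1
```

The same logic works per scenario: the rate-form frontier is budget-independent, so the knee lands at `eps_1` for every budget.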
+ +``` +EFFICIENT FRONTIER — base_500 (budget = 500, regime = base) + + # Label Return Risk + ──────────────────────────────────────── + 1 min_risk 32.43 1,160.39 + 2 eps_1 33.41 1,176.78 ← KNEE + 3 eps_2 35.12 1,262.61 + 4 eps_3 36.84 1,385.89 + 5 eps_4 38.56 1,545.79 + 6 eps_5 40.28 1,742.47 + 7 max_return 42.00 (separate anchor) + + Marginal delta_risk / delta_return: + min_risk → eps_1 16.85 + eps_1 → eps_2 49.94 ← +3.0x ── KNEE + eps_2 → eps_3 71.72 + eps_3 → eps_4 93.03 + eps_4 → eps_5 114.43 + + ────────────────────────────────────────────────────────────────── + base_1000 / base_2000 / crisis_* show the SAME shape — risk scales + as budget^2 (the QP is quadratic in x), but the rate-form frontier + and the knee location are budget-independent. + ────────────────────────────────────────────────────────────────── + + ✓ Stock.x_quantity written back, indexed by (Stock, Scenario) +``` + +--- + +## Stage 4 — Crisis stress test + +> **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` · +> **Prompt:** "Define a regime-conditioned covariance `Stock.regime_covar(i, j, Regime)` derived in PyRel: for the base regime use the input covariance unchanged, and for the crisis regime use a PSD-preserving correlation shrinkage toward all-ones with weight 0.7 on the base covariance plus 0.3 on the outer product of volatilities. The Stage 3 solver call is unchanged — `Scenario.regime` selects the matching `regime_covar` per scenario, so all six base and crisis scenarios solve in one call. After the sweep, emit a side-by-side `vol_base` versus `vol_crisis` table per (budget, epsilon point) where `vol = sqrt(risk)`, with the absolute and percentage gap, and explain whether the gap peaks mid-frontier or at the concentrated end." 
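The PSD claim in the prompt can be verified numerically on a toy 2x2 matrix (illustrative values, not the bundled covariance):

```python
import math

# Sketch of the crisis covariance from the prompt above:
#   Sigma_crisis(i,j) = alpha * Sigma(i,j) + (1 - alpha) * vol_i * vol_j
# i.e. correlation shrinkage toward all-ones, in covariance units.
# A convex combination of two PSD matrices (Sigma and the rank-one
# outer product vol vol^T) is PSD, so the crisis QP stays well-posed.
alpha = 0.7
sigma = [[0.0064, 0.0042],
         [0.0042, 0.0049]]
vol = [math.sqrt(sigma[0][0]), math.sqrt(sigma[1][1])]

crisis = [[alpha * sigma[i][j] + (1 - alpha) * vol[i] * vol[j]
           for j in range(2)] for i in range(2)]

# For a symmetric 2x2, PSD <=> non-negative diagonal and determinant.
det = crisis[0][0] * crisis[1][1] - crisis[0][1] * crisis[1][0]
assert crisis[0][0] >= 0 and det >= 0

# Diagonal is unchanged (vol_i * vol_i == variance), so volatilities
# are preserved and only correlations move toward +1:
rho_base = sigma[0][1] / (vol[0] * vol[1])        # 0.75
rho_crisis = crisis[0][1] / (vol[0] * vol[1])     # 0.7*0.75 + 0.3 = 0.825
```

This is why the swap needs no new model: the ontology holds both covariances, and `Scenario.regime` picks which one the quadratic objective reads.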
+ +Same `solve_epsilon` call, no separate model — `Scenario.regime` selects between two `Stock.regime_covar` definitions: + +- `base`: Sigma(i, j) +- `crisis`: alpha * Sigma(i, j) + (1 - alpha) * vol_i * vol_j (alpha = 0.7) + +The crisis formula is correlation shrinkage toward all-ones (`rho_crisis = alpha * rho + (1 - alpha) * J`) re-expressed in covariance units. PSD is preserved by construction (convex combination of PSD matrices), so every lambda solves cleanly. + +``` +VOLATILITY COMPARISON vol = sqrt(risk) + + Budget 500: + Label vol_base vol_crisis gap gap_% + ──────────────────────────────────────────────────── + min_risk 34.06 43.74 +9.68 +28.4% + eps_1 34.30 44.54 +10.24 +29.8% ← peak + eps_2 35.53 46.11 +10.58 +29.8% + eps_3 37.23 47.94 +10.72 +28.8% + eps_4 39.32 49.99 +10.68 +27.2% + eps_5 41.74 52.27 +10.53 +25.2% + + Budget 1000 + Budget 2000: same gap_% pattern (vol scales with budget; + the rate-form gap is budget-independent). + + ────────────────────────────────────────────────────────────────── + Crisis vol sits 25-30% above base at EVERY lambda. The gap PEAKS in + the middle of the frontier (eps_1..eps_2 at +29.8%) and NARROWS at + the concentrated end (eps_5 at +25.2%). + + Why the inversion: at the concentrated end the optimizer is picking + the highest-Sharpe distinct bet per cluster — the bundled data + weights this toward Energy + Consumer Staples, which happen to + carry lower crisis correlations than the middle of the frontier. + Without the representative collapse, the concentrated end would + stack near-duplicates and the crisis gap would GROW, not shrink. + ────────────────────────────────────────────────────────────────── + + ✓ Stock.regime_covar written back (64 base + 64 crisis = 128 entries) +``` + +**Why `Scenario` is a Concept, not a loop:** all 6 (budget, regime) tuples solve in a single call to the solver, against the matching `regime_covar`. 
Adding a fourth regime or a fifth budget is a data edit in `scenario_data`, not a change to `solve_epsilon`. Scenarios are data. + +--- + +## Stage outputs — what each reasoner contributes back + +``` +ONTOLOGY ENRICHMENT — what each stage wrote back + + Stage 1 (rules) Holding.value [15] + Holding.is_overconcentrated [4] + Holding.is_sector_concentrated [2] + User.is_high_risk_trader [2] + + Stage 2 (graph) Stock.variance [8] + Stock.volatility [8] + Stock.correlation (i, j) [64] + Stock.cluster [8] + Stock.sharpe [8] + Stock.cluster_max_sharpe [8] + Stock.is_representative [5] + Stock.is_non_representative [3] + + Stage 3 (prescriptive) Stock.regime_covar (i, j, Regime) [128] + Stock.x_quantity (Stock, Scenario) [48] + + Stage 4 (stress) (terminal — vol_base vs vol_crisis table) + + ────────────────────────────────────────────────────────────────── + Each stage reads what the previous stage wrote. + Re-running any downstream stage automatically picks up enrichments. + No glue code, no DataFrame round-trip — same ontology throughout. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## The chain — accretive ontology enrichment + +``` +THE PORTFOLIO-BALANCING CHAIN + + STAGE 1 RULES + "Where is the current book breaking compliance?" + reads: Holding, Account, User, Transaction, Stock.sector + writes: Holding.value + Holding.is_overconcentrated / is_sector_concentrated + User.is_high_risk_trader + │ + ▼ + STAGE 2 GRAPH (Louvain) + "Which stocks are really the same bet?" + reads: Stock.covar (loaded), Stock.returns + writes: Stock.variance / volatility / correlation (i, j) + Stock.cluster ── 5 communities + Stock.sharpe / cluster_max_sharpe + Stock.is_representative ── 5 of 8 stocks + Stock.is_non_representative ── forced to zero in Stage 3 + │ + ▼ + STAGE 3 PRESCRIPTIVE (Ipopt QP) + "What's the optimal allocation under position + sector caps, + for each (budget, regime) scenario?" 
+ reads: Stock.is_representative ──► decision-variable scope + Stock.is_non_representative ──► hard-zero constraint + Stock.returns ──► epsilon return target + Stock.regime_covar ──► quadratic risk objective + Scenario.budget / regime ──► per-scenario constraints + SECTOR_LIMIT (= Stage 1's) ──► hard sector cap + writes: Stock.regime_covar (PSD-preserving) [base + crisis] + Stock.x_quantity (Stock, Scenario) [42 active portfolios] + │ + ▼ + STAGE 4 STRESS (same solver, regime swap) + "How does the optimal portfolio degrade under a crisis covariance?" + reads: Stock.regime_covar (regime = "crisis") + Stage 3's pareto results + writes: (terminal — vol gap table) + + ────────────────────────────────────────────────────────────────── + No glue. No DataFrame ping-pong. No re-derivation per-reasoner. + Three reasoner families, one ontology, one accretive thread. + Stage 4 is a regime swap on the same solve — not a separate model. + ────────────────────────────────────────────────────────────────── +``` + +--- + +## Why the chain matters (vs. 
any single stage) + +| Stage alone | What it tells you | What it doesn't | +|---|---|---| +| Rules alone | "4 holdings + 2 sectors break compliance" | How to rebuild the book | +| Graph alone | "AAPL/MSFT/GOOGL move together; JNJ/PFE move together" | Which to keep, how much to allocate | +| Prescriptive alone (full universe) | A "diversified" frontier that stacks near-duplicates inside a cluster | That two of those names are one bet — and that the crisis gap will grow under stress | +| Prescriptive alone (single regime) | A base-case efficient frontier | What it costs you when correlations spike | + +| Combined | Output | +|---|---| +| Rules → Graph | Compliance violations + redundant-bet map | +| + Prescriptive (rep-only universe) | 7-point Pareto frontier per scenario; knee at eps_1 | +| + Stress (Scenario.regime swap) | Crisis vol 25-30% above base, gap narrows at concentrated end (the cluster-collapse payoff) | + +**Multi-reasoner chaining grounded in (and contributing to) the ontology.** + +--- + +## Crisis Regime Construction + +**Do not** naively scale off-diagonal covariance by a constant — it frequently breaks positive semidefiniteness, the lambda=0 pure min-variance solve fails (Ipopt rejects non-convex QP), and the whole frontier anchors wrong. PSD-preserving alternatives: + +| Approach | Formula | Preserves PSD? | +|----------|---------|----------------| +| **Correlation shrinkage toward all-ones** (this template) | `rho' = alpha * rho + (1 - alpha) * J`, alpha in [0.6, 0.9]. 
In covariance units: `cov'(i,j) = alpha * cov(i,j) + (1 - alpha) * vol_i * vol_j` | Yes (convex combination of PSD matrices) | +| Eigenvalue flooring | Eigendecompose, replace lambda_i with max(lambda_i, eps), recompose | Yes by construction | +| Scale off-diagonals + PSD projection | Scale, then find nearest PSD matrix via eigendecomposition | Yes after projection | +| ~~Uniform 1.5x off-diagonals~~ | `Sigma'[i,j] = 1.5 * Sigma[i,j]` for i != j | **No** — frequently non-PSD | + +`CRISIS_ALPHA = 0.7` produces a clearly "crisis-like" regime (intra-cluster correlations bump toward 1) while keeping the QP well-conditioned at every lambda. Lower values (0.5-0.6) give more severe crises; values > 0.9 get close to the base case. + +--- + +## Data Reference + +- **Source data**: bundled CSVs in `../data/` (8 stocks across 5 sectors with 64-entry symmetric covariance, 6 users, 4 accounts, 15 holdings, 21 transactions). To run against your own Snowflake schema instead, swap the `read_csv(...)` loads for `model.Table(...)` references in `portfolio_balancing.py`; the rest of the pipeline is unchanged. +- **Stages**: implemented in `../portfolio_balancing.py` as a single combined script with stage banners (Stage 1 → Stage 4). +- **Ontology**: 6 user-facing concepts (`Stock`, `Sector`, `User`, `Account`, `Holding`, `Transaction`) plus the `Regime` and `Scenario` concepts introduced in Stage 3. Run `inspect.schema(model)` after the pipeline (see template README) to dump the full concept/property/relationship surface, filtering out reasoner-owned concepts (`Variable`, `Constraint`, etc.) and the auto-generated `graph_Edge` from Stage 2. + +--- + +## Adapting this recipe to a new domain + +The chain pattern transfers cleanly. To rebuild for a different problem: + +1. Re-run `rai-discovery` on the new business question — does it actually need all four reasoner roles (rules, graph, prescriptive, prescriptive re-solve), or is one or two sufficient? 
A pure compliance audit stops after Stage 1; a rebalancer without redundant bets in the universe can skip Stage 2. +2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). For portfolio variants, the load-bearing concepts are the asset entity (here `Stock`), its pairwise covariance, an entity that holds compliance thresholds (here `Account`), and the `Scenario` Concept that parameterizes the optimizer. +3. Stage 1 (rules) is where every threshold the optimizer will later enforce gets named once and reused — keep `SECTOR_LIMIT` (or its equivalent) shared between the compliance scan and the prescriptive constraints, so the rebuilt book obeys the same caps the diagnostic flagged. +4. Stages 2–4 are the load-bearing chain: Graph collapses redundant bets via clustering + per-cluster representative selection, writing `is_representative` and `is_non_representative` flags the optimizer reads. Prescriptive uses those flags to scope the decision variable and adds the position + sector caps. The stress-test stage is the *same solver call* under a regime-swapped covariance — `Scenario.regime` and `Stock.regime_covar(i, j, Regime)` keep base and crisis as data, not separate models. +5. Keep the validation checks at every stage: assert flagged-set size, the cluster count and intra-vs-inter average correlation gap, anchor solves return `LOCALLY_SOLVED`, the return-rate range is non-degenerate, and `REP_POSITION_LIMIT * num_representatives >= 1.0` so the fully-invested constraint stays feasible. +6. When constructing a stress regime, never naively scale off-diagonal covariance — eigenvalue flooring, projection to nearest PSD, or correlation shrinkage toward all-ones (this template) all preserve PSD; arbitrary scaling does not, and the min-risk anchor will fail. + +The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. 
The agent skills are how you reliably author each link. diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md index 0fac804..3d91928 100644 --- a/v1/supply_chain_resilience/references/runbook.md +++ b/v1/supply_chain_resilience/references/runbook.md @@ -41,6 +41,35 @@ A single-reasoner approach can't answer this. Reachability alone names the suppl --- +## How to read this runbook + +This runbook serves two audiences: + +- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. +- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data in `../data/`, and the agent will reproduce the stage. + +The two views share one ontology: each stage writes properties back to the model that the next stage reads. The skill blocks are how an engineer rebuilds the chain link-by-link; the narrative is how a stakeholder reads what the chain produced. + +--- + +## Step 0 — Scope the question with `rai-discovery` + +> **Skill:** `rai-discovery` · +> **Prompt:** "Given a supply-chain ontology with sites, businesses (suppliers / manufacturers / warehouses / buyers), shipping operations, SKUs, demand orders, historical shipments, and per-supplier quarterly delay predictions, what questions can each reasoner family answer for a risk-adjusted routing problem? I need to know which high-priority customers are exposed to which suppliers, which sites are bottlenecks, which suppliers are unreliable, and what the minimum-cost flow looks like once those risks are priced in." + +Discovery classifies the question by reasoner family and tells you which downstream skills to load: + +| Sub-question | Reasoner | Skill | +|---|---|---| +| "Which suppliers do my high-priority customers transitively depend on?" 
| Graph (reachability) | `rai-graph-analysis` | +| "Which sites are network bottlenecks?" | Graph (centrality, components) | `rai-graph-analysis` | +| "Which suppliers are risky? Which demands are escalated?" | Rules | `rai-rules-authoring` | +| "What's the minimum-cost flow plan that respects all of the above, and how much do disruptions cost?" | Prescriptive | `rai-prescriptive-problem-formulation`, `rai-prescriptive-solver-management`, `rai-prescriptive-results-interpretation` | + +Discovery's output is a *plan*, not code. Everything that follows materializes that plan. + +--- + ## Setup See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `supply_chain_resilience.py` against the bundled CSVs in `../data/`. @@ -49,6 +78,8 @@ See the template's main `README.md` for installation, RAI connection setup, and ## Stage 0 — Reachability: blast-radius pre-analysis +> **Skill:** `rai-graph-analysis` · +> **Prompt:** "Build a **directed** business graph from shipment records, with edges going from each shipment's supplier to its customer. Then run upstream reachability **into** every business that holds at least one HIGH-priority demand, and filter the reachable set to nodes whose business type is SUPPLIER. The directed-not-undirected choice matters here: we want the suppliers that feed each high-priority customer's upstream cone, not the symmetric neighborhood. List, per high-priority customer, the suppliers it transitively depends on and their reliability scores." **Construction** — directed `Business` graph, edges from `Business.ships_to` (derived from `Shipment.supplier` -> `Shipment.customer`). 
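Stated outside RAI, the upstream-reachability pass is a reverse BFS over the `ships_to` edges. A minimal sketch with hypothetical business IDs (seven nodes, not the bundled 31):

```python
from collections import defaultdict, deque

# Hypothetical shipment-derived ships_to edges: (supplier, customer)
ships_to = [
    ("SUP-A", "CM-1"), ("SUP-B", "CM-1"), ("CM-1", "MFG-1"),
    ("SUP-C", "CM-2"), ("CM-2", "MFG-1"),
    ("MFG-1", "WH-1"), ("WH-1", "BUYER-1"),
]
business_type = {"SUP-A": "SUPPLIER", "SUP-B": "SUPPLIER", "SUP-C": "SUPPLIER",
                 "CM-1": "COMPONENT_MFG", "CM-2": "COMPONENT_MFG",
                 "MFG-1": "MANUFACTURER", "WH-1": "WAREHOUSE", "BUYER-1": "BUYER"}

inbound = defaultdict(set)  # customer -> businesses that ship to it
for src, dst in ships_to:
    inbound[dst].add(src)

def upstream_suppliers(customer):
    # Walk edges backwards from the customer, then keep SUPPLIER nodes only.
    seen, queue = set(), deque([customer])
    while queue:
        for up in inbound[queue.popleft()]:
            if up not in seen:
                seen.add(up)
                queue.append(up)
    return {b for b in seen if business_type[b] == "SUPPLIER"}

print(sorted(upstream_suppliers("BUYER-1")))  # ['SUP-A', 'SUP-B', 'SUP-C']
```

The direction is the point: an undirected neighborhood would also pull in downstream nodes, inflating the exposure set beyond the customer's actual supply cone.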
@@ -84,6 +115,8 @@ The point of running reachability before the MILP: when the scenario in Stage 3 ## Stage 1 — Graph: site centrality + connected components +> **Skill:** `rai-graph-analysis` · +> **Prompt:** "Build an **undirected, unweighted** site graph using SHIP-type operations as edges between source and output sites, with a sum aggregator to collapse parallel ship lanes. Compute weakly-connected components to surface fragmentation, then eigenvector centrality to rank hubs — restrict the centrality output to FACTORY and DC sites (drop STORE / OFFICE). Normalize the centrality scores to [0, 1] and write them back to the ontology as a per-site property so downstream stages can use them as a bottleneck weight." **Construction:** - Node concept: `Site` (31 sites) @@ -124,6 +157,8 @@ Top critical sites — eigenvector centrality (FACTORY/DC only) ## Stage 2 — Rules: supplier risk classification +> **Skill:** `rai-rules-authoring` · +> **Prompt:** "Add three derived flags to Business and one to Demand. A business is unreliable when its reliability score is below 0.80. A business has high delay risk when at least one Q1-2025 delay prediction for that supplier exceeds 0.15. A business is watch-level when either of those holds. A demand is escalated when its priority is HIGH. Chain the rules so watch-level fires from either underlying flag. Downstream, suppliers with **both** flags are the avoid set (hard-blocked in routing) and suppliers with **either** flag are the watch set (surcharged)." **Late-shipment context** (computed in pandas, not RAI): @@ -197,6 +232,8 @@ Stage 3 reads `is_watch_level` for the surcharge term and `is_unreliable AND has ## Stage 3 — Prescriptive: risk-adjusted minimum-cost flow +> **Skill:** `rai-prescriptive-problem-formulation` · +> **Prompt:** "Formulate a continuous minimum-cost network flow over the operations. 
The decision variable is per-operation flow, bounded by each operation's daily capacity, with a non-negative unmet-demand slack per demand order. Constraint: for each demand, inbound flow at the customer's site for the demanded SKU plus its slack must cover the order quantity. Hard-block any operation whose source business is in the avoid set (both Stage-2 flags fire). The objective minimizes transport cost plus a $5/unit surcharge on flow through watch-level suppliers, plus a centrality-weighted penalty on flow into bottleneck sites (using the normalized score from Stage 1, weight 2.0), plus a $100/unit unmet-demand penalty." ``` FORMULATION @@ -241,6 +278,8 @@ The baseline buys: enough finished-goods flow on the shortest cost-weighted lane ## Scenario analysis — quantify disruption +> **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` · +> **Prompt:** "Re-solve the same formulation under two disruptions and compare them to the baseline. Scenario A: take the highest-centrality site offline by adding a zero-flow constraint on every operation sourced from that site. Scenario B: downgrade every watch-level supplier to avoid by adding zero-flow blocks for every supplier in the union of the two Stage-2 flags. Report status, objective, active flow count, and unmet demand for each. Then explain *why* the cost deltas are asymmetric — the structural-vs-behavioural distinction is the punchline." The same `solve_flow(...)` function re-runs with modified constraints. Two scenarios surface different aspects of the chain's value: @@ -357,3 +396,17 @@ Each row is a single agent prompt. 
Skills are loaded in order; each writes prope - **Source data**: bundled CSVs in `../data/` — 31 sites across APAC / AMERICAS / EMEA, 31 businesses (6 suppliers, 6 component manufacturers, 2 manufacturers, 8 warehouses, 9 buyers), 9 SKUs (raw materials -> components -> finished goods ProPhone X1 / ProTab T1), 70 operations (SHIP + TRANSFER), 20 demand orders (9 HIGH, 5 MEDIUM, 6 LOW), 262 historical shipments (37 late), 36 quarterly delay predictions (4 quarters × 9 suppliers). To run against your own Snowflake schema instead, swap `read_csv(...)` for typed `model.Table(...)` loads against the equivalent table set. - **Ontology**: defined inline in `../supply_chain_resilience.py` (lines 57–250) — 7 concepts plus the derived `Business.ships_to` and `Operation.source_business` relationships. - **Stages**: implemented in `../supply_chain_resilience.py` as a single combined script with stage banners (`STAGE 0` through `STAGE 3` plus `SCENARIO ANALYSIS`). + +--- + +## Adapting this recipe to a new domain + +The chain pattern transfers cleanly. To rebuild for a different problem: + +1. Re-run `rai-discovery` on the new business question — does it actually need all four reasoner families, or is one or two sufficient? Reachability + prescriptive alone is a viable shortcut if you don't need a centrality-based bottleneck weight or a per-entity risk classifier. +2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). The load-bearing concepts here are the node concept of each graph (`Business`, `Site`), the entity that carries risk flags (`Business`), and the entity that carries decision variables (`Operation`, `Demand`). +3. Stage 0 (reachability) is *optional context* — it scopes the conversation around the disruption table without changing any constraint. Keep it when stakeholders need to see the dependency cone; drop it when the optimizer's hard blocks already make exposure obvious. +4. 
Stages 1–3 are the load-bearing chain: graph centrality writes a per-site bottleneck weight, rules write per-supplier risk flags, and prescriptive consumes both — centrality as an objective coefficient, the avoid set as a hard block, the watch set as a surcharge. The scenario re-solves are cheap once the formulation exists. +5. Keep the validation checks at every stage: assert the source-business derivation populates, the centrality top-N looks plausible, the avoid/watch sets are non-empty (or non-empty by design), the baseline objective is not zero, and the disruption deltas have the sign you expected. + +The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link. diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md index d1b3003..9486f52 100644 --- a/v1/telco_network_recovery/references/runbook.md +++ b/v1/telco_network_recovery/references/runbook.md @@ -2,7 +2,16 @@ Walk-through of the chained-reasoner pattern this template is built on. One realistic business thread — **WEST region recovery** — traced across all five RAI reasoner families, each stage writing properties back to the same ontology that downstream stages consume. -The template's combined script (`telco_network_recovery.py`) implements stages 2–5 directly; this runbook expands the surrounding narrative including the descriptive Stage 1 diagnosis and the Stage 6 interpretation, so a non-OR reader can follow the full reasoning thread end-to-end. +The template's combined script (`telco_network_recovery.py`) implements the predictive, rules, graph, and prescriptive stages directly; this runbook frames them with a descriptive Stage 1 diagnosis and a Stage 6 interpretation, so a non-OR reader can follow the full reasoning thread end-to-end. 
+ +--- + +## How to read this runbook + +This runbook serves two audiences: + +- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. +- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data in `../data/`, and the agent will reproduce the stage. --- @@ -43,6 +52,26 @@ A single-reasoner approach can't answer this. Descriptive alone tells the story --- +## Step 0 — Scope the question with `rai-discovery` + +> **Skill:** `rai-discovery` · +> **Prompt:** "WEST is missing revenue targets while every other region grows. Diagnose whether this is a subscriber-retention problem or a network-operations problem, then produce a defensible tower-upgrade plan inside a $5M capex cap and a 200 install-crew-week schedule that prioritizes towers by both who depends on them and where regional demand is heading." + +Discovery classifies the question by reasoner family and tells you which downstream skills to load: + +| Sub-question | Reasoner | Skill | +|---|---|---| +| Where is the revenue bleed coming from, and is it retention or operational? | Descriptive | `rai-querying` | +| Which WEST towers are technically broken enough to be in scope for upgrade? | Rules | `rai-rules-authoring` | +| Whose service depends on each broken tower — what is the social blast radius? | Graph (PageRank) | `rai-graph-analysis` | +| Is regional demand growing or contracting at those sites over the forward horizon? | Predictive (GNN) | _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ | +| Given the cost cap and install-crew budget, which tier should each critical tower receive? 
| Prescriptive (MIP) | `rai-prescriptive-problem-formulation` | +| Which constraint is binding, and what would change if we relaxed it? | Prescriptive (post-solve) | `rai-prescriptive-results-interpretation` | + +Discovery's output is a *plan*, not code. Everything that follows materializes that plan. + +--- + ## Setup See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `telco_network_recovery.py`. @@ -51,6 +80,8 @@ See the template's main `README.md` for installation, RAI connection setup, and ## Stage 1 — Descriptive: diagnose WEST +> **Skill:** `rai-querying` · +> **Prompt:** "Run a regional triage on the daily KPIs and tower fleet for Q3–Q4 2024. Compare WEST against the other eight regions on network availability, churn rate, daily revenue, and the revenue-forecast vs. actual gap, and break down the WEST tower fleet by status. Identify the top performance offenders by per-tower packet loss and latency, and check whether any WEST subscribers carry elevated churn risk. Conclude whether this looks like a retention problem or an operational one." ``` Q4 2024 — Daily KPIs by region @@ -101,6 +132,8 @@ Subscriber.churn_risk_score is a static feature that hasn't caught up to WEST's ## Stage 2 — Rules: flag critical_restore towers +> **Skill:** `rai-rules-authoring` · +> **Prompt:** "Define per-tower derived averages for packet loss, latency, and error rate from the network-performance measurements, plus an average equipment-health score from the two-hop join through network equipment to equipment-health snapshots. Then add a unary critical-restore flag on the tower that fires when the tower is in WEST and either DEGRADED with average health below 0.85, or shows average packet loss above 5% with the same health threshold (so an ACTIVE-but-failing tower is still caught)." 
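Restated imperatively, the two rule branches reduce to a single predicate. A plain-Python sketch with illustrative field names — the template itself expresses this declaratively via `model.define(...)`:

```python
def is_critical_restore(tower):
    # Stage 2 flag, restated imperatively (field names are illustrative).
    unhealthy = tower["avg_health_score"] < 0.85
    degraded_branch = tower["status"] == "DEGRADED" and unhealthy
    lossy_branch = tower["avg_packet_loss"] > 5.0 and unhealthy  # ACTIVE-but-failing case
    return tower["region"] == "WEST" and (degraded_branch or lossy_branch)

# Hypothetical towers
degraded = {"region": "WEST", "status": "DEGRADED",
            "avg_health_score": 0.71, "avg_packet_loss": 3.2}
active_failing = {"region": "WEST", "status": "ACTIVE",
                  "avg_health_score": 0.80, "avg_packet_loss": 6.5}
healthy = {"region": "WEST", "status": "ACTIVE",
           "avg_health_score": 0.95, "avg_packet_loss": 1.0}

assert is_critical_restore(degraded)        # branch 1 fires
assert is_critical_restore(active_failing)  # branch 2 catches the ACTIVE tower
assert not is_critical_restore(healthy)
```

The shared health threshold is what keeps the two branches coherent: a tower escapes the flag only by being healthy, not by merely reporting an ACTIVE status.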
**Properties added to the ontology** (via `model.define(...)`): - `CellTower.avg_packet_loss` (Float) — `aggs.avg(NetworkPerformance.packet_loss_pct).per(CellTower)` @@ -156,6 +189,8 @@ Branch 2 didn't fire — none of WEST's ACTIVE towers fall below health 0.85. Th ## Stage 3 — Graph: subscriber influence + tower blast radius +> **Skill:** `rai-graph-analysis` · +> **Prompt:** "Build a directed subscriber-to-subscriber call graph from the call-detail records, with caller pointing to callee and parallel calls between the same pair summed into a single edge. Score each subscriber with PageRank and write that influence back to the subscriber. Then per critical-restore tower, aggregate the distinct subscribers whose calls route through it and the sum of their PageRank — that's the social blast radius if the tower fails." **Construction** — Pattern 3 (`edge_concept`): - Node concept: `Subscriber` (1,200 nodes) @@ -213,6 +248,8 @@ Per-critical-tower blast radius (sorted by weighted_impact) ## Stage 4 — Predictive: forecast WEST capacity demand +> **Skill:** _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ · +> **Prompt:** "Train a regression GNN on per-region daily KPIs predicting subscriber growth rate. Use same-region 1-day-lag temporal edges, region as a category feature, and three lag features (previous-day growth, previous-week growth, and a 7-day rolling mean) computed before load. Train on rows before November 2024, validate on November, test on December, then bind each region's mean predicted growth back to every cell tower in that region as a per-tower demand multiplier." **Method:** GNN node regression on `TimeSeriesMetric` (composite key `metric_date` + `region`). Target: `subscriber_growth_rate`. Features: the other 12 daily KPIs + 3 lag features (`prev_day_growth`, `prev_week_growth`, `growth_7d_mean`) + `region` as a category. Graph: same-region 1-day-lag temporal edges. 
Train < 2024-11-01 (includes the Sep–Oct WEST decline onset); validate on Nov 2024; test on Dec 2024. @@ -260,6 +297,8 @@ objective = sum( selected[t,tier] * ## Stage 5 — Prescriptive: tower upgrade selection MIP +> **Skill:** `rai-prescriptive-problem-formulation` · +> **Prompt:** "Pick at most one upgrade tier (BRONZE, SILVER, or GOLD) per critical-restore tower using a binary decision variable on the tower-upgrade-option junction. Stay within a $5M total cost cap and 200 total install crew-weeks. Maximize the sum across selected options of capacity-increase × tower weighted-impact × tower projected-demand-growth, so the optimizer favors towers that are both broken and high-blast-radius, scaled by the regional demand forecast. Solve with Gurobi." ``` FORMULATION @@ -331,6 +370,8 @@ Headline metrics ## Stage 6 — Interpretation +> **Skill:** `rai-prescriptive-results-interpretation` · +> **Prompt:** "Summarize the optimal plan in business terms: total cost vs. budget, capacity restored, tier mix, towers covered, and how many subscribers stop being served by a critical tower over the install schedule. Identify which constraint is binding and what would change if it were relaxed by 10–20% (which tower would jump tiers, what the marginal capacity lift would be). List the per-stage ontology enrichments so the reader can see what each reasoner contributed back." ``` THE PLAN, IN BUSINESS TERMS @@ -442,6 +483,20 @@ THE WEST RECOVERY CHAIN --- +## Adapting this recipe to a new domain + +The chain pattern transfers cleanly. To rebuild for a different problem: + +1. Re-run `rai-discovery` on the new business question — does it actually need all 5 reasoner families, or is one or two sufficient? +2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). +3. 
Stage 1 (descriptive triage) is *optional but high-leverage*: it scopes the problem and rules out a misdiagnosis (e.g., is this a retention crisis or a network crisis?) before any rule, graph, GNN, or solver runs. +4. Stages 2–5 are the load-bearing chain: rules write the flag that scopes graph aggregations and the solver's decision variables; graph writes the per-entity impact weight that becomes a solver objective coefficient; predictive writes the forward-looking multiplier that becomes the second objective coefficient; prescriptive composes both upstream signals into the final plan. +5. Keep the validation checks at every stage: assert flagged-set size, PageRank top-N looks plausible, the GNN forecast separates the anomalous segment from the rest, the solve status is OPTIMAL, the objective is not zero, and at least one constraint is binding (otherwise you're under-constrained). + +The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link. + +--- + ## Data Reference - **Source data**: bundled CSVs in `../data/` (the main template ships ~1.2 MB of synthetic-but-realistic telco data — 250 cell towers, 1,200 subscribers, 6,000 CDRs, 3,285 daily KPI rows across 9 regions). From 20c4bfafdc680e33dd22b89f831bcf28aa9c5c02 Mon Sep 17 00:00:00 2001 From: cafzal Date: Wed, 6 May 2026 09:13:45 -0700 Subject: [PATCH 03/22] Rewrite runbook prompts in user-question style Replace the prior verbose, mechanical prompts with short natural questions a user would actually type, modeled on the reasoner-eval QA catalog. Where an eval QA exists for the same skill+pattern, the runbook prompt mirrors that question directly. Also fix the predictive skill names: rai-predictive-modeling + rai-predictive-training (the public skills are available; the "no public skill yet" placeholder was stale). 
Each prompt is now: - 1-2 sentences asking what the user wants to know - Domain-natural language - Aligned to a real eval-style question Affects all five runbooks: telco_network_recovery, energy_grid_planning, supply_chain_resilience, machine_maintenance, portfolio_balancing. --- v1/energy_grid_planning/references/runbook.md | 14 +++++++------- v1/machine_maintenance/references/runbook.md | 12 ++++++------ v1/portfolio_balancing/references/runbook.md | 12 ++++++------ v1/supply_chain_resilience/references/runbook.md | 12 ++++++------ v1/telco_network_recovery/references/runbook.md | 16 ++++++++-------- 5 files changed, 33 insertions(+), 33 deletions(-) diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md index d963954..b98db6b 100644 --- a/v1/energy_grid_planning/references/runbook.md +++ b/v1/energy_grid_planning/references/runbook.md @@ -61,7 +61,7 @@ This runbook serves two audiences: ## Step 0 — Scope the question with `rai-discovery` > **Skill:** `rai-discovery` · -> **Prompt:** "We have 10 hyperscaler interconnection requests totalling 2,930 MW queued against a 12-substation ERCOT-style Texas grid. Classify the sub-questions we need to answer to decide which to approve, which substation upgrades to fund, and at what budget level — map each sub-question to the reasoner family that should handle it." +> **Prompt:** "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" Discovery classifies the question by reasoner family and tells you which downstream skills to load: @@ -113,8 +113,8 @@ Steps are sequential — each depends on prior steps. 
Steps without a skill are
## Stage 1 — Predictive: substation load forecasting
-> **Skill:** _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ ·
-> **Prompt:** "Forecast each substation's future peak load by aggregating the maximum predicted load across the 6/12/18/24-month forecast horizons in the demand-forecast table, and write the result back to every substation as a derived load-projection property. The downstream rules engine and optimizer both need to read this same forecasted headroom — fall back to the substation's current load only when no forecast row exists. Flag substations whose predicted load exceeds their nameplate capacity within the horizon and report which one breaches first."
+> **Skill:** `rai-predictive-modeling` + `rai-predictive-training` ·
+> **Prompt:** "Can we forecast substation load growth over the next 24 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it."
**Method:** load max forecasted load per substation as `Substation.predicted_load`. The template aggregates `DemandForecast.predicted_load_mw` over forecast horizons (6/12/18/24 months) and writes the max back to the substation. A pre-trained GNN can replace the table lookup; the script falls back gracefully when the GNN model registry is unavailable.
@@ -160,7 +160,7 @@ effective_load = Substation.predicted_load | Substation.current_load_mw
## Stage 2 — Graph: grid topology & structural vulnerability
> **Skill:** `rai-graph-analysis` ·
-> **Prompt:** "Build an undirected, unweighted graph using `Substation` directly as the node concept and active transmission lines as edges. Run weakly connected components to confirm grid connectivity, Louvain community detection to surface ERCOT regional clusters, and the betweenness/degree/eigenvector centrality trio.
Combine the three centrality ranks into a composite rank and flag the top 3 substations as structurally critical, writing the centrality scores, community label, and criticality flag back to each substation." +> **Prompt:** "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." **Construction** — `Substation` as the node concept directly (no mirror concept): - Node concept: `Substation` (12 nodes) @@ -211,7 +211,7 @@ Centrality (top-3 marked is_structurally_critical) ## Stage 3 — Rules: interconnection queue compliance > **Skill:** `rai-rules-authoring` · -> **Prompt:** "Author three declarative compliance rules per data-center request, each consuming an upstream enrichment. Rule 1 fails capacity when the request's MW plus the substation's forecasted load (with current load as fallback) exceeds the substation's nameplate capacity. Rule 2 fails structural risk when the target substation is flagged structurally critical from Stage 2. Rule 3 fails the low-carbon mandate when the substation's zero-emission generation share is below the request's required percentage; sum capacity for generators with emissions rate of zero. Add a composite `is_compliant` flag that fires only when none of the three failure flags fire." +> **Prompt:** "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?" Three declarative `Relationship` rules consume Stages 1–2 enrichments. Each is written as a `model.where(...).define(...)` block; a composite `is_compliant` fires only when none of the three failure flags fire. 
@@ -276,7 +276,7 @@ Every request passes low-carbon — ERCOT's nuclear (STP, Comanche Peak) plus it

 ## Stage 4 — Prescriptive: joint DC approval + upgrade MIP

 > **Skill:** `rai-prescriptive-problem-formulation` ·
-> **Prompt:** "Formulate a single MIP that picks DC approvals and substation upgrades jointly across five budget scenarios at $200M, $300M, $400M, $500M, and $600M. Model the budget as an `InvestmentLevel` Scenario Concept and index both the binary approve and binary upgrade decision variables by it, so one solve produces the full Pareto frontier — no per-budget re-solve loop. Constrain per substation per scenario that approved DC load fits within nameplate capacity minus forecasted load plus selected upgrade headroom, and per scenario that selected upgrade costs stay within the scenario's budget cap. Maximize total annual interconnection revenue summed across all scenarios."
+> **Prompt:** "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades."

 ```
 FORMULATION
@@ -353,7 +353,7 @@ PARETO FRONTIER (queried directly from ontology)

 ## Stage 5 — Interpretation

 > **Skill:** `rai-prescriptive-results-interpretation` ·
-> **Prompt:** "Read the per-scenario decision-variable properties straight from the ontology — approved DCs, selected upgrades, total MW, revenue, and amortized upgrade cost — and present the Pareto frontier across the five investment levels. Compute marginal value per added budget dollar between adjacent levels and call out the knee where the marginal jump is largest. Translate the result into business language: which DCs unlock at the knee, which substation is the binding bottleneck, and which requests are infeasible at every budget level."
+> **Prompt:** "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?"

 ```
 THE PLAN, IN BUSINESS TERMS
diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md
index 42c4abc..3f8cc04 100644
--- a/v1/machine_maintenance/references/runbook.md
+++ b/v1/machine_maintenance/references/runbook.md
@@ -59,7 +59,7 @@ This runbook serves two audiences:

 ## Step 0 — Scope the question with `rai-discovery`

 > **Skill:** `rai-discovery` ·
-> **Prompt:** "We need to schedule preventive maintenance for a multi-plant operation: 30 machines across 3 plants, 10 technicians across 3 cities, a 4-period horizon. Classify the sub-questions we must answer, map each to the right reasoner family, and tell us which downstream skills to load. Where does OEE alone mislead us, and what structural risks (qualifications, technician concentration) won't a pure optimizer surface?"
+> **Prompt:** "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?"

 Discovery classifies the question by reasoner family and tells you which downstream skills to load:

@@ -113,35 +113,35 @@ The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skil

 ## Stage 0 — Querying: operational intelligence

 > **Skill:** `rai-querying` ·
-> **Prompt:** "From the loaded production runs, sensor readings, and per-period failure predictions, compute an OEE proxy by facility (Performance × Quality), list machines with above-threshold sensor anomalies grouped by facility, and identify the six machines with the steepest failure-probability rise from period 1 to period 4. Write the per-period failure prediction back as a `MachinePeriod` property so downstream stages can read it."
+> **Prompt:** "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?"

 This stage establishes the operational baseline. Plant_C leads at 79.8% OEE; Plant_B trails at 61.4%. But Plant_A — middle of the OEE pack at 68.2% — owns 7 of 9 sensor anomalies and the three steepest failure trajectories (M001, M013, M016). The querying stage writes nine derived properties on `Machine` plus `MachinePeriod.predicted_fp` (120 rows), and Stage 3 reads `predicted_fp` directly into the failure-cost objective term.

 ## Stage 1 — Graph: dependency clusters and bottleneck centrality

 > **Skill:** `rai-graph-analysis` ·
-> **Prompt:** "Build an undirected graph using `Machine` directly as the node concept. Two machines are adjacent when at least one technician is qualified for both of their machine types. Run weakly connected components to find dependency clusters, then compute betweenness centrality, normalize it to 0..1 against the max, and store both raw and normalized scores back on the machine."
+> **Prompt:** "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines."

 The 30 machines form a single connected component — every machine is reachable through shared qualifications. Pump-type machines tie at the top of betweenness (raw 24.0, normalized 1.0): M003 (Plant_C), M008 (Plant_B), M013 (Plant_A). The normalized centrality is consumed by Stage 3's failure-cost multiplier `(1 + 2.0 × betweenness)`, so leaving a bottleneck Pump vulnerable is markedly more expensive than leaving a peripheral Motor vulnerable.
 ## Stage 2 — Rules: compliance flags and composite risk tier

 > **Skill:** `rai-rules-authoring` ·
-> **Prompt:** "Author seven derived flags on the existing concepts: overdue (remaining useful life below required maintenance hours), high-risk (failure probability above 0.3 and criticality at least 4), anomalous (any sensor anomaly), chronic-downtime (more than 8 events), parts-reorder (stock at or below minimum order), and certification-expiring (under 30 days remaining). Then chain three of those flags — chronic, high-risk, overdue — into a `risk_tier` property of Critical / Elevated / Standard, exhaustively enumerating all eight three-flag combinations using negation."
+> **Prompt:** "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard."

 Six machines overdue, one high-risk (M013), three chronic-downtime, four parts-reorder, five expiring certs. The composite tier surfaces a single Critical machine — M013 (Pump, Plant_A) — and a single Elevated machine — M016 (Turbine, Plant_A). The overdue flag is consumed by Stage 3 as a hard constraint: every overdue machine must be scheduled by period 2.

 ## Stage 3 — Prescriptive: maintenance schedule

 > **Skill:** `rai-prescriptive-problem-formulation` ·
-> **Prompt:** "Schedule preventive maintenance over 4 periods. Decide for each machine and period whether to maintain it, whether it remains vulnerable, and which qualified technician services it. Enforce cumulative coverage (each machine is either maintained or vulnerable through every period), assignment-maintenance linkage (one tech per maintained job), per-tech hour capacity, a 5-jobs-per-period parts/bay limit, and the hard rule that every overdue machine from Stage 2 is maintained by period 2. Minimize failure cost (period-specific failure probability × parts cost × criticality, scaled up by `(1 + 2.0 × betweenness)`) plus labor cost (duration × hourly rate) plus travel cost ($50/hour for non-co-located assignments)."
+> **Prompt:** "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel."

 The solver returns OPTIMAL with objective $605,240.61 and 20 maintenance jobs across the four periods (capacity-binding at 5 jobs/period). M013 and M016 — Plant_A's Critical and Elevated machines — are both scheduled by period 1, satisfying the overdue deadline. Several Turbine assignments require travel because all three Turbine-qualified techs are based in Houston_TX while four of six Turbines sit at Plant_A and Plant_C. The optimizer pays the travel cost; it cannot restructure the qualification pool.

 ## Stage 4 — Resilience: concentration sweep and cross-training

 > **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` ·
-> **Prompt:** "Take the optimal schedule and stress-test the qualification structure underneath it. For each machine type, identify whether all qualified technicians are concentrated in one location, count the scheduled jobs of that type that required travel, and surface the geographic single-point-of-failure. Then re-rank the cross-training options for the concentrated type by cost, prefer non-local candidates, and recommend the cheapest one with its training cost and duration. This is a follow-up sensitivity sweep over the existing solve, not a new formulation."
+> **Prompt:** "For each machine type, are all qualified technicians concentrated in one location? How many scheduled jobs required travel, and what's the cheapest cross-training option to eliminate the single-point-of-failure?"

 Turbine is the concentrated type — all three qualified techs (T001, T002, T003) sit in Houston_TX, and 67% of scheduled Turbine jobs already require travel. The recommended fix: cross-train T006 (Senior, Chicago_IL) for $3,200 over 5 weeks. That single addition eliminates the Houston single-point-of-failure for Turbine work at Plant_B and Plant_C, and pays back the first time a weather event, illness, or expiring cert would have idled a Turbine job that the optimizer would otherwise have left uncovered.

diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md
index ea6f43c..6640840 100644
--- a/v1/portfolio_balancing/references/runbook.md
+++ b/v1/portfolio_balancing/references/runbook.md
@@ -55,7 +55,7 @@ This runbook serves two audiences:

 ## Step 0 — Scope the question with `rai-discovery`

 > **Skill:** `rai-discovery` ·
-> **Prompt:** "We have an 8-stock portfolio book that breaks compliance on 4 holdings and 2 sectors. We want to rebuild the book under a Markowitz mean-variance objective with position and sector caps, deduplicate redundant bets via correlation clustering, and stress-test the resulting frontier under a crisis regime where correlations spike. Classify the sub-questions by reasoner family and tell us which RAI skills to load."
+> **Prompt:** "Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?"

 Discovery classifies the question by reasoner family and tells you which downstream skills to load:

@@ -107,7 +107,7 @@ The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skil

 ## Stage 1 — Rules: compliance scan

 > **Skill:** `rai-rules-authoring` ·
-> **Prompt:** "Add a derived `Holding.value` from `quantity * purchase_price`, then author three compliance flags as Relationships on the existing book. Flag a holding as overconcentrated when its value exceeds 15% of the parent account's balance. Flag it as sector-concentrated when the per-account, per-sector total of holding values exceeds 30% of balance. Flag a user as a high-risk trader when their risk score is above 0.8 and they have more than five flagged transactions."
+> **Prompt:** "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions."

 ```
 COMPLIANCE VIOLATIONS — current book (4 accounts, 15 holdings, 6 users)
@@ -148,7 +148,7 @@ COMPLIANCE VIOLATIONS — current book (4 accounts, 15 holdings, 6 users)

 ## Stage 2 — Graph: covariance clustering + cluster representatives

 > **Skill:** `rai-graph-analysis` ·
-> **Prompt:** "Derive per-stock variance from the covariance diagonal, volatility as the square root of variance, and pairwise correlation as covariance over the product of volatilities — all in PyRel, no numpy precompute. Build an undirected stock graph with an edge wherever the absolute correlation is at least 0.3 and the index pair is deduplicated. Run Louvain community detection and persist the cluster id back onto each stock. Then derive a Sharpe ratio per stock (returns over volatility), and flag the stock whose Sharpe equals the cluster maximum as the cluster representative; flag the others as non-representative so the optimizer can force them to zero."
+> **Prompt:** "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization."

 **Construction** — undirected, unweighted graph:
 - Node concept: `Stock` (8 nodes)
@@ -219,7 +219,7 @@ REPRESENTATIVE = HIGHEST SHARPE PER CLUSTER (returns / volatility)

 ## Stage 3 — Prescriptive: bi-objective QP with epsilon constraint

 > **Skill:** `rai-prescriptive-problem-formulation` ·
-> **Prompt:** "Formulate a Markowitz mean-variance QP with a continuous decision variable `Stock.x_quantity` indexed by a `Scenario` Concept that combines budget and regime — load the six scenarios for budgets {500, 1000, 2000} crossed with regimes {base, crisis}. Add five constraint families per scenario: non-negativity, fully invested at the scenario budget, per-representative position cap at 30% of budget, per-sector cap at 30% of budget, and a hard zero on every non-representative stock. Anchor the frontier with a min-risk solve and a max-return solve, then sweep five interior epsilon-rate points uniformly across the resulting return-rate range, each one re-solving min-risk subject to a per-scenario return floor that scales with budget. Use Ipopt with a 60-second time limit."
+> **Prompt:** "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets × 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range."

 ```
 FORMULATION
@@ -277,7 +277,7 @@ Epsilon sweep 5 interior points evenly spaced across the range

 ## Stage 3 — Reading the frontier (per scenario)

 > **Skill:** `rai-prescriptive-results-interpretation` ·
-> **Prompt:** "For each of the six scenarios, list the seven-point Pareto frontier (return, risk) from the anchors and epsilon points. Compute the marginal `delta_risk / delta_return` between adjacent points and identify the knee — the point where the marginal cost jumps the most relative to the previous step. Confirm the rate-form frontier shape is consistent across budgets (risk scales as budget squared because the QP is quadratic in x, but the rate-form curve and knee location are budget-independent)."
+> **Prompt:** "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most? Is the rate-form frontier shape consistent across budgets?"

 ```
 EFFICIENT FRONTIER — base_500 (budget = 500, regime = base)
@@ -313,7 +313,7 @@ EFFICIENT FRONTIER — base_500 (budget = 500, regime = base)

 ## Stage 4 — Crisis stress test

 > **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` ·
-> **Prompt:** "Define a regime-conditioned covariance `Stock.regime_covar(i, j, Regime)` derived in PyRel: for the base regime use the input covariance unchanged, and for the crisis regime use a PSD-preserving correlation shrinkage toward all-ones with weight 0.7 on the base covariance plus 0.3 on the outer product of volatilities. The Stage 3 solver call is unchanged — `Scenario.regime` selects the matching `regime_covar` per scenario, so all six base and crisis scenarios solve in one call. After the sweep, emit a side-by-side `vol_base` versus `vol_crisis` table per (budget, epsilon point) where `vol = sqrt(risk)`, with the absolute and percentage gap, and explain whether the gap peaks mid-frontier or at the concentrated end."
+> **Prompt:** "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point — does the gap peak mid-frontier or at the concentrated end?"

 Same `solve_epsilon` call, no separate model — `Scenario.regime` selects between two `Stock.regime_covar` definitions:

diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md
index 3d91928..cd2670f 100644
--- a/v1/supply_chain_resilience/references/runbook.md
+++ b/v1/supply_chain_resilience/references/runbook.md
@@ -55,7 +55,7 @@ The two views share one ontology: each stage writes properties back to the model

 ## Step 0 — Scope the question with `rai-discovery`

 > **Skill:** `rai-discovery` ·
-> **Prompt:** "Given a supply-chain ontology with sites, businesses (suppliers / manufacturers / warehouses / buyers), shipping operations, SKUs, demand orders, historical shipments, and per-supplier quarterly delay predictions, what questions can each reasoner family answer for a risk-adjusted routing problem? I need to know which high-priority customers are exposed to which suppliers, which sites are bottlenecks, which suppliers are unreliable, and what the minimum-cost flow looks like once those risks are priced in."
+> **Prompt:** "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?"

 Discovery classifies the question by reasoner family and tells you which downstream skills to load:

@@ -79,7 +79,7 @@ See the template's main `README.md` for installation, RAI connection setup, and

 ## Stage 0 — Reachability: blast-radius pre-analysis

 > **Skill:** `rai-graph-analysis` ·
-> **Prompt:** "Build a **directed** business graph from shipment records, with edges going from each shipment's supplier to its customer. Then run upstream reachability **into** every business that holds at least one HIGH-priority demand, and filter the reachable set to nodes whose business type is SUPPLIER. The directed-not-undirected choice matters here: we want the suppliers that feed each high-priority customer's upstream cone, not the symmetric neighborhood. List, per high-priority customer, the suppliers it transitively depends on and their reliability scores."
+> **Prompt:** "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores."

 **Construction** — directed `Business` graph, edges from `Business.ships_to` (derived from `Shipment.supplier` -> `Shipment.customer`).

@@ -116,7 +116,7 @@ The point of running reachability before the MILP: when the scenario in Stage 3

 ## Stage 1 — Graph: site centrality + connected components

 > **Skill:** `rai-graph-analysis` ·
-> **Prompt:** "Build an **undirected, unweighted** site graph using SHIP-type operations as edges between source and output sites, with a sum aggregator to collapse parallel ship lanes. Compute weakly-connected components to surface fragmentation, then eigenvector centrality to rank hubs — restrict the centrality output to FACTORY and DC sites (drop STORE / OFFICE). Normalize the centrality scores to [0, 1] and write them back to the ontology as a per-site property so downstream stages can use them as a bottleneck weight."
+> **Prompt:** "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight."

 **Construction:**
 - Node concept: `Site` (31 sites)
@@ -158,7 +158,7 @@ Top critical sites — eigenvector centrality (FACTORY/DC only)

 ## Stage 2 — Rules: supplier risk classification

 > **Skill:** `rai-rules-authoring` ·
-> **Prompt:** "Add three derived flags to Business and one to Demand. A business is unreliable when its reliability score is below 0.80. A business has high delay risk when at least one Q1-2025 delay prediction for that supplier exceeds 0.15. A business is watch-level when either of those holds. A demand is escalated when its priority is HIGH. Chain the rules so watch-level fires from either underlying flag. Downstream, suppliers with **both** flags are the avoid set (hard-blocked in routing) and suppliers with **either** flag are the watch set (surcharged)."
+> **Prompt:** "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)."

 **Late-shipment context** (computed in pandas, not RAI):

@@ -233,7 +233,7 @@ Stage 3 reads `is_watch_level` for the surcharge term and `is_unreliable AND has

 ## Stage 3 — Prescriptive: risk-adjusted minimum-cost flow

 > **Skill:** `rai-prescriptive-problem-formulation` ·
-> **Prompt:** "Formulate a continuous minimum-cost network flow over the operations. The decision variable is per-operation flow, bounded by each operation's daily capacity, with a non-negative unmet-demand slack per demand order. Constraint: for each demand, inbound flow at the customer's site for the demanded SKU plus its slack must cover the order quantity. Hard-block any operation whose source business is in the avoid set (both Stage-2 flags fire). The objective minimizes transport cost plus a $5/unit surcharge on flow through watch-level suppliers, plus a centrality-weighted penalty on flow into bottleneck sites (using the normalized score from Stage 1, weight 2.0), plus a $100/unit unmet-demand penalty."
+> **Prompt:** "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit."

 ```
 FORMULATION
@@ -279,7 +279,7 @@ The baseline buys: enough finished-goods flow on the shortest cost-weighted lane

 ## Scenario analysis — quantify disruption

 > **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` ·
-> **Prompt:** "Re-solve the same formulation under two disruptions and compare them to the baseline. Scenario A: take the highest-centrality site offline by adding a zero-flow constraint on every operation sourced from that site. Scenario B: downgrade every watch-level supplier to avoid by adding zero-flow blocks for every supplier in the union of the two Stage-2 flags. Report status, objective, active flow count, and unmet demand for each. Then explain *why* the cost deltas are asymmetric — the structural-vs-behavioural distinction is the punchline."
+> **Prompt:** "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?"

 The same `solve_flow(...)` function re-runs with modified constraints. Two scenarios surface different aspects of the chain's value:

diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md
index 9486f52..188533e 100644
--- a/v1/telco_network_recovery/references/runbook.md
+++ b/v1/telco_network_recovery/references/runbook.md
@@ -55,7 +55,7 @@ A single-reasoner approach can't answer this. Descriptive alone tells the story

 ## Step 0 — Scope the question with `rai-discovery`

 > **Skill:** `rai-discovery` ·
-> **Prompt:** "WEST is missing revenue targets while every other region grows. Diagnose whether this is a subscriber-retention problem or a network-operations problem, then produce a defensible tower-upgrade plan inside a $5M capex cap and a 200 install-crew-week schedule that prioritizes towers by both who depends on them and where regional demand is heading."
+> **Prompt:** "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?"

 Discovery classifies the question by reasoner family and tells you which downstream skills to load:

@@ -81,7 +81,7 @@ See the template's main `README.md` for installation, RAI connection setup, and

 ## Stage 1 — Descriptive: diagnose WEST

 > **Skill:** `rai-querying` ·
-> **Prompt:** "Run a regional triage on the daily KPIs and tower fleet for Q3–Q4 2024. Compare WEST against the other eight regions on network availability, churn rate, daily revenue, and the revenue-forecast vs. actual gap, and break down the WEST tower fleet by status. Identify the top performance offenders by per-tower packet loss and latency, and check whether any WEST subscribers carry elevated churn risk. Conclude whether this looks like a retention problem or an operational one."
+> **Prompt:** "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?"

 ```
 Q4 2024 — Daily KPIs by region
@@ -133,7 +133,7 @@ Subscriber.churn_risk_score is a static feature that hasn't caught up to WEST's

 ## Stage 2 — Rules: flag critical_restore towers

 > **Skill:** `rai-rules-authoring` ·
-> **Prompt:** "Define per-tower derived averages for packet loss, latency, and error rate from the network-performance measurements, plus an average equipment-health score from the two-hop join through network equipment to equipment-health snapshots. Then add a unary critical-restore flag on the tower that fires when the tower is in WEST and either DEGRADED with average health below 0.85, or shows average packet loss above 5% with the same health threshold (so an ACTIVE-but-failing tower is still caught)."
+> **Prompt:** "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85."

 **Properties added to the ontology** (via `model.define(...)`):
 - `CellTower.avg_packet_loss` (Float) — `aggs.avg(NetworkPerformance.packet_loss_pct).per(CellTower)`
@@ -190,7 +190,7 @@ Branch 2 didn't fire — none of WEST's ACTIVE towers fall below health 0.85. Th

 ## Stage 3 — Graph: subscriber influence + tower blast radius

 > **Skill:** `rai-graph-analysis` ·
-> **Prompt:** "Build a directed subscriber-to-subscriber call graph from the call-detail records, with caller pointing to callee and parallel calls between the same pair summed into a single edge. Score each subscriber with PageRank and write that influence back to the subscriber. Then per critical-restore tower, aggregate the distinct subscribers whose calls route through it and the sum of their PageRank — that's the social blast radius if the tower fails."
+> **Prompt:** "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails."

 **Construction** — Pattern 3 (`edge_concept`):
 - Node concept: `Subscriber` (1,200 nodes)
@@ -248,8 +248,8 @@ Per-critical-tower blast radius (sorted by weighted_impact)

 ## Stage 4 — Predictive: forecast WEST capacity demand

-> **Skill:** _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ ·
-> **Prompt:** "Train a regression GNN on per-region daily KPIs predicting subscriber growth rate. Use same-region 1-day-lag temporal edges, region as a category feature, and three lag features (previous-day growth, previous-week growth, and a 7-day rolling mean) computed before load. Train on rows before November 2024, validate on November, test on December, then bind each region's mean predicted growth back to every cell tower in that region as a per-tower demand multiplier."
+> **Skill:** `rai-predictive-modeling` + `rai-predictive-training` ·
+> **Prompt:** "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history (subscriber_growth_rate, churn_rate, marketing_spend, network_availability_pct, total_calls). Bind each region's forecast back to its towers as a demand multiplier."

 **Method:** GNN node regression on `TimeSeriesMetric` (composite key `metric_date` + `region`). Target: `subscriber_growth_rate`. Features: the other 12 daily KPIs + 3 lag features (`prev_day_growth`, `prev_week_growth`, `growth_7d_mean`) + `region` as a category. Graph: same-region 1-day-lag temporal edges. Train < 2024-11-01 (includes the Sep–Oct WEST decline onset); validate on Nov 2024; test on Dec 2024.

@@ -298,7 +298,7 @@ objective = sum( selected[t,tier] *

 ## Stage 5 — Prescriptive: tower upgrade selection MIP

 > **Skill:** `rai-prescriptive-problem-formulation` ·
-> **Prompt:** "Pick at most one upgrade tier (BRONZE, SILVER, or GOLD) per critical-restore tower using a binary decision variable on the tower-upgrade-option junction. Stay within a $5M total cost cap and 200 total install crew-weeks. Maximize the sum across selected options of capacity-increase × tower weighted-impact × tower projected-demand-growth, so the optimizer favors towers that are both broken and high-blast-radius, scaled by the regional demand forecast. Solve with Gurobi."
+> **Prompt:** "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth."

 ```
 FORMULATION
@@ -371,7 +371,7 @@ Headline metrics

 ## Stage 6 — Interpretation

 > **Skill:** `rai-prescriptive-results-interpretation` ·
-> **Prompt:** "Summarize the optimal plan in business terms: total cost vs. budget, capacity restored, tier mix, towers covered, and how many subscribers stop being served by a critical tower over the install schedule. Identify which constraint is binding and what would change if it were relaxed by 10–20% (which tower would jump tiers, what the marginal capacity lift would be). List the per-stage ontology enrichments so the reader can see what each reasoner contributed back."
+> **Prompt:** "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?"

 ```
 THE PLAN, IN BUSINESS TERMS

From 9681e2cf7935cf9f57afef1d8a87804ef01add33 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 6 May 2026 09:24:46 -0700
Subject: [PATCH 04/22] Inline skill invocation in prompt callouts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the two-line `**Skill:** ... · **Prompt:** "..."` callout with a
single-line `> /rai-skill "question"` form that mirrors how a user actually
invokes a skill in chat. Multi-skill stages stay readable as
`> /rai-A + /rai-B "question"`. One change per stage callout, all five
runbooks. No content rewrite.
--- v1/energy_grid_planning/references/runbook.md | 18 ++++++---------- v1/machine_maintenance/references/runbook.md | 18 ++++++---------- v1/portfolio_balancing/references/runbook.md | 18 ++++++---------- .../references/runbook.md | 18 ++++++---------- .../references/runbook.md | 21 +++++++------------ 5 files changed, 31 insertions(+), 62 deletions(-) diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md index b98db6b..f1329ac 100644 --- a/v1/energy_grid_planning/references/runbook.md +++ b/v1/energy_grid_planning/references/runbook.md @@ -60,8 +60,7 @@ This runbook serves two audiences: ## Step 0 — Scope the question with `rai-discovery` -> **Skill:** `rai-discovery` · -> **Prompt:** "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" +> `/rai-discovery` "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" Discovery classifies the question by reasoner family and tells you which downstream skills to load: @@ -113,8 +112,7 @@ Steps are sequential — each depends on prior steps. Steps without a skill are ## Stage 1 — Predictive: substation load forecasting -> **Skill:** `rai-predictive-modeling` + `rai-predictive-training` · -> **Prompt:** "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it." +> `/rai-predictive-modeling` + `/rai-predictive-training` "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? 
Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it." **Method:** load max forecasted load per substation as `Substation.predicted_load`. The template aggregates `DemandForecast.predicted_load_mw` over forecast horizons (6/12/18/24 months) and writes the max back to the substation. A pre-trained GNN can replace the table lookup; the script falls back gracefully when the GNN model registry is unavailable. @@ -159,8 +157,7 @@ effective_load = Substation.predicted_load | Substation.current_load_mw ## Stage 2 — Graph: grid topology & structural vulnerability -> **Skill:** `rai-graph-analysis` · -> **Prompt:** "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." +> `/rai-graph-analysis` "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." **Construction** — `Substation` as the node concept directly (no mirror concept): - Node concept: `Substation` (12 nodes) @@ -210,8 +207,7 @@ Centrality (top-3 marked is_structurally_critical) ## Stage 3 — Rules: interconnection queue compliance -> **Skill:** `rai-rules-authoring` · -> **Prompt:** "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?" +> `/rai-rules-authoring` "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?" 
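The three-flag screen in the prompt above can be sketched in plain Python. This is an illustrative sketch only, not the template's RAI rules DSL (which expresses the same logic as `model.where(...).define(...)` blocks); the field names and the Crusoe/DFW numbers are assumptions drawn loosely from the runbook's figures.

```python
# Illustrative plain-Python version of the Stage 3 compliance screen.
# Field names are assumptions, not the template's actual schema.

def screen_request(req, substation, region):
    # (1) substation must still have headroom after predicted load
    fails_capacity = (
        substation["predicted_load"] + req["requested_mw"]
        > substation["capacity_mw"]
    )
    # (2) a 100% low-carbon mandate needs a 25%+ renewable region
    fails_low_carbon = (
        req["requires_low_carbon"] and region["renewable_pct"] < 0.25
    )
    # (3) the substation shouldn't be among the most structurally critical
    fails_structural = substation["is_structurally_critical"]
    return {
        "fails_capacity": fails_capacity,
        "fails_low_carbon": fails_low_carbon,
        "fails_structural": fails_structural,
        "is_compliant": not (
            fails_capacity or fails_low_carbon or fails_structural
        ),
    }

# A Crusoe-like request on Midland: headroom, renewables, non-critical.
crusoe = screen_request(
    {"requested_mw": 200, "requires_low_carbon": True},
    {"predicted_load": 800, "capacity_mw": 1_500,
     "is_structurally_critical": False},
    {"renewable_pct": 0.40},
)

# A DFW-like request: predicted 1,700 MW vs 1,600 MW capacity, and the
# substation is flagged structurally critical -> both flags fire.
dfw = screen_request(
    {"requested_mw": 500, "requires_low_carbon": False},
    {"predicted_load": 1_700, "capacity_mw": 1_600,
     "is_structurally_critical": True},
    {"renewable_pct": 0.28},
)
```

The composite mirrors the runbook's `is_compliant`: it fires only when none of the three failure flags fire.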
Three declarative `Relationship` rules consume Stages 1–2 enrichments. Each is written as a `model.where(...).define(...)` block; a composite `is_compliant` fires only when none of the three failure flags fire. @@ -275,8 +271,7 @@ Every request passes low-carbon — ERCOT's nuclear (STP, Comanche Peak) plus it ## Stage 4 — Prescriptive: joint DC approval + upgrade MIP -> **Skill:** `rai-prescriptive-problem-formulation` · -> **Prompt:** "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades." +> `/rai-prescriptive-problem-formulation` "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades." ``` FORMULATION @@ -352,8 +347,7 @@ PARETO FRONTIER (queried directly from ontology) ## Stage 5 — Interpretation -> **Skill:** `rai-prescriptive-results-interpretation` · -> **Prompt:** "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?" +> `/rai-prescriptive-results-interpretation` "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?" ``` THE PLAN, IN BUSINESS TERMS diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md index 3f8cc04..ca6201c 100644 --- a/v1/machine_maintenance/references/runbook.md +++ b/v1/machine_maintenance/references/runbook.md @@ -58,8 +58,7 @@ This runbook serves two audiences: ## Step 0 — Scope the question with `rai-discovery` -> **Skill:** `rai-discovery` · -> **Prompt:** "We need to schedule preventive maintenance for 30 machines across 3 plants. 
Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?" +> `/rai-discovery` "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?" Discovery classifies the question by reasoner family and tells you which downstream skills to load: @@ -112,36 +111,31 @@ The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skil ## Stage 0 — Querying: operational intelligence -> **Skill:** `rai-querying` · -> **Prompt:** "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?" +> `/rai-querying` "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?" This stage establishes the operational baseline. Plant_C leads at 79.8% OEE; Plant_B trails at 61.4%. But Plant_A — middle of the OEE pack at 68.2% — owns 7 of 9 sensor anomalies and the three steepest failure trajectories (M001, M013, M016). The querying stage writes nine derived properties on `Machine` plus `MachinePeriod.predicted_fp` (120 rows), and Stage 3 reads `predicted_fp` directly into the failure-cost objective term. ## Stage 1 — Graph: dependency clusters and bottleneck centrality -> **Skill:** `rai-graph-analysis` · -> **Prompt:** "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines." +> `/rai-graph-analysis` "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines." The 30 machines form a single connected component — every machine is reachable through shared qualifications. 
Pump-type machines tie at the top of betweenness (raw 24.0, normalized 1.0): M003 (Plant_C), M008 (Plant_B), M013 (Plant_A). The normalized centrality is consumed by Stage 3's failure-cost multiplier `(1 + 2.0 × betweenness)`, so leaving a bottleneck Pump vulnerable is markedly more expensive than leaving a peripheral Motor vulnerable. ## Stage 2 — Rules: compliance flags and composite risk tier -> **Skill:** `rai-rules-authoring` · -> **Prompt:** "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard." +> `/rai-rules-authoring` "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard." Six machines overdue, one high-risk (M013), three chronic-downtime, four parts-reorder, five expiring certs. The composite tier surfaces a single Critical machine — M013 (Pump, Plant_A) — and a single Elevated machine — M016 (Turbine, Plant_A). The overdue flag is consumed by Stage 3 as a hard constraint: every overdue machine must be scheduled by period 2. ## Stage 3 — Prescriptive: maintenance schedule -> **Skill:** `rai-prescriptive-problem-formulation` · -> **Prompt:** "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel." +> `/rai-prescriptive-problem-formulation` "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. 
Minimize expected failure cost weighted by criticality and centrality, plus labor and travel." The solver returns OPTIMAL with objective $605,240.61 and 20 maintenance jobs across the four periods (capacity-binding at 5 jobs/period). M013 and M016 — Plant_A's Critical and Elevated machines — are both scheduled by period 1, satisfying the overdue deadline. Several Turbine assignments require travel because all three Turbine-qualified techs are based in Houston_TX while four of six Turbines sit at Plant_A and Plant_C. The optimizer pays the travel cost; it cannot restructure the qualification pool. ## Stage 4 — Resilience: concentration sweep and cross-training -> **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` · -> **Prompt:** "For each machine type, are all qualified technicians concentrated in one location? How many scheduled jobs required travel, and what's the cheapest cross-training option to eliminate the single-point-of-failure?" +> `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "For each machine type, are all qualified technicians concentrated in one location? How many scheduled jobs required travel, and what's the cheapest cross-training option to eliminate the single-point-of-failure?" Turbine is the concentrated type — all three qualified techs (T001, T002, T003) sit in Houston_TX, and 67% of scheduled Turbine jobs already require travel. The recommended fix: cross-train T006 (Senior, Chicago_IL) for $3,200 over 5 weeks. That single addition eliminates the Houston single-point-of-failure for Turbine work at Plant_B and Plant_C, and pays back the first time a weather event, illness, or expiring cert would have idled a Turbine job that the optimizer would otherwise have left uncovered. 
diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md index 6640840..3df9efa 100644 --- a/v1/portfolio_balancing/references/runbook.md +++ b/v1/portfolio_balancing/references/runbook.md @@ -54,8 +54,7 @@ This runbook serves two audiences: ## Step 0 — Scope the question with `rai-discovery` -> **Skill:** `rai-discovery` · -> **Prompt:** "Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?" +> `/rai-discovery` "Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?" Discovery classifies the question by reasoner family and tells you which downstream skills to load: @@ -106,8 +105,7 @@ The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skil ## Stage 1 — Rules: compliance scan -> **Skill:** `rai-rules-authoring` · -> **Prompt:** "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions." +> `/rai-rules-authoring` "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions." ``` COMPLIANCE VIOLATIONS — current book (4 accounts, 15 holdings, 6 users) @@ -147,8 +145,7 @@ COMPLIANCE VIOLATIONS — current book (4 accounts, 15 holdings, 6 users) ## Stage 2 — Graph: covariance clustering + cluster representatives -> **Skill:** `rai-graph-analysis` · -> **Prompt:** "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. 
Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization." +> `/rai-graph-analysis` "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization." **Construction** — undirected, unweighted graph: - Node concept: `Stock` (8 nodes) @@ -218,8 +215,7 @@ REPRESENTATIVE = HIGHEST SHARPE PER CLUSTER (returns / volatility) ## Stage 3 — Prescriptive: bi-objective QP with epsilon constraint -> **Skill:** `rai-prescriptive-problem-formulation` · -> **Prompt:** "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets × 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range." +> `/rai-prescriptive-problem-formulation` "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets × 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range." ``` FORMULATION @@ -276,8 +272,7 @@ Epsilon sweep 5 interior points evenly spaced across the range ## Stage 3 — Reading the frontier (per scenario) -> **Skill:** `rai-prescriptive-results-interpretation` · -> **Prompt:** "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most? Is the rate-form frontier shape consistent across budgets?" +> `/rai-prescriptive-results-interpretation` "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most? Is the rate-form frontier shape consistent across budgets?" 
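Knee-finding as described in this prompt can be sketched as a second-difference scan over the frontier: the knee is the interior point where the slope (marginal risk per unit return) jumps the most between consecutive segments. The frontier values below are made up for illustration, not the template's solve output.

```python
# Hedged sketch of knee detection on a (return, risk) Pareto frontier.
# Input points are illustrative, not the template's actual frontier.

def find_knee(frontier):
    # frontier: list of (ret, risk) pairs sorted by increasing return
    slopes = [
        (r2 - r1) / (e2 - e1)  # marginal risk per unit return
        for (e1, r1), (e2, r2) in zip(frontier, frontier[1:])
    ]
    jumps = [b - a for a, b in zip(slopes, slopes[1:])]
    # The knee is the point at which the outgoing slope exceeds the
    # incoming slope by the largest margin.
    return frontier[jumps.index(max(jumps)) + 1]

points = [(0.05, 0.08), (0.06, 0.085), (0.07, 0.095),
          (0.08, 0.13), (0.09, 0.20)]
knee = find_knee(points)  # -> (0.08, 0.13)
```

Past the knee, each extra unit of return buys disproportionately more volatility, which is the "is the shape consistent across budgets" question the prompt then asks.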
``` EFFICIENT FRONTIER — base_500 (budget = 500, regime = base) @@ -312,8 +307,7 @@ EFFICIENT FRONTIER — base_500 (budget = 500, regime = base) ## Stage 4 — Crisis stress test -> **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` · -> **Prompt:** "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point — does the gap peak mid-frontier or at the concentrated end?" +> `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point — does the gap peak mid-frontier or at the concentrated end?" Same `solve_epsilon` call, no separate model — `Scenario.regime` selects between two `Stock.regime_covar` definitions: diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md index cd2670f..877c451 100644 --- a/v1/supply_chain_resilience/references/runbook.md +++ b/v1/supply_chain_resilience/references/runbook.md @@ -54,8 +54,7 @@ The two views share one ontology: each stage writes properties back to the model ## Step 0 — Scope the question with `rai-discovery` -> **Skill:** `rai-discovery` · -> **Prompt:** "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?" +> `/rai-discovery` "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?" 
Discovery classifies the question by reasoner family and tells you which downstream skills to load: @@ -78,8 +77,7 @@ See the template's main `README.md` for installation, RAI connection setup, and ## Stage 0 — Reachability: blast-radius pre-analysis -> **Skill:** `rai-graph-analysis` · -> **Prompt:** "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores." +> `/rai-graph-analysis` "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores." **Construction** — directed `Business` graph, edges from `Business.ships_to` (derived from `Shipment.supplier` -> `Shipment.customer`). @@ -115,8 +113,7 @@ The point of running reachability before the MILP: when the scenario in Stage 3 ## Stage 1 — Graph: site centrality + connected components -> **Skill:** `rai-graph-analysis` · -> **Prompt:** "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight." +> `/rai-graph-analysis` "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight." **Construction:** - Node concept: `Site` (31 sites) @@ -157,8 +154,7 @@ Top critical sites — eigenvector centrality (FACTORY/DC only) ## Stage 2 — Rules: supplier risk classification -> **Skill:** `rai-rules-authoring` · -> **Prompt:** "Rate each supplier's delivery reliability. 
Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)." +> `/rai-rules-authoring` "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)." **Late-shipment context** (computed in pandas, not RAI): @@ -232,8 +228,7 @@ Stage 3 reads `is_watch_level` for the surcharge term and `is_unreliable AND has ## Stage 3 — Prescriptive: risk-adjusted minimum-cost flow -> **Skill:** `rai-prescriptive-problem-formulation` · -> **Prompt:** "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit." +> `/rai-prescriptive-problem-formulation` "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit." ``` FORMULATION @@ -278,8 +273,7 @@ The baseline buys: enough finished-goods flow on the shortest cost-weighted lane ## Scenario analysis — quantify disruption -> **Skill:** `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` · -> **Prompt:** "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?" 
+> `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?" The same `solve_flow(...)` function re-runs with modified constraints. Two scenarios surface different aspects of the chain's value: diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md index 188533e..8cb676e 100644 --- a/v1/telco_network_recovery/references/runbook.md +++ b/v1/telco_network_recovery/references/runbook.md @@ -54,8 +54,7 @@ A single-reasoner approach can't answer this. Descriptive alone tells the story ## Step 0 — Scope the question with `rai-discovery` -> **Skill:** `rai-discovery` · -> **Prompt:** "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?" +> `/rai-discovery` "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?" Discovery classifies the question by reasoner family and tells you which downstream skills to load: @@ -80,8 +79,7 @@ See the template's main `README.md` for installation, RAI connection setup, and ## Stage 1 — Descriptive: diagnose WEST -> **Skill:** `rai-querying` · -> **Prompt:** "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?" +> `/rai-querying` "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?" 
``` Q4 2024 — Daily KPIs by region @@ -132,8 +130,7 @@ Subscriber.churn_risk_score is a static feature that hasn't caught up to WEST's ## Stage 2 — Rules: flag critical_restore towers -> **Skill:** `rai-rules-authoring` · -> **Prompt:** "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85." +> `/rai-rules-authoring` "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85." **Properties added to the ontology** (via `model.define(...)`): - `CellTower.avg_packet_loss` (Float) — `aggs.avg(NetworkPerformance.packet_loss_pct).per(CellTower)` @@ -189,8 +186,7 @@ Branch 2 didn't fire — none of WEST's ACTIVE towers fall below health 0.85. Th ## Stage 3 — Graph: subscriber influence + tower blast radius -> **Skill:** `rai-graph-analysis` · -> **Prompt:** "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." +> `/rai-graph-analysis` "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." **Construction** — Pattern 3 (`edge_concept`): - Node concept: `Subscriber` (1,200 nodes) @@ -248,8 +244,7 @@ Per-critical-tower blast radius (sorted by weighted_impact) ## Stage 4 — Predictive: forecast WEST capacity demand -> **Skill:** `rai-predictive-modeling` + `rai-predictive-training` · -> **Prompt:** "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history (subscriber_growth_rate, churn_rate, marketing_spend, network_availability_pct, total_calls). 
Bind each region's forecast back to its towers as a demand multiplier." +> `/rai-predictive-modeling` + `/rai-predictive-training` "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history (subscriber_growth_rate, churn_rate, marketing_spend, network_availability_pct, total_calls). Bind each region's forecast back to its towers as a demand multiplier." **Method:** GNN node regression on `TimeSeriesMetric` (composite key `metric_date` + `region`). Target: `subscriber_growth_rate`. Features: the other 12 daily KPIs + 3 lag features (`prev_day_growth`, `prev_week_growth`, `growth_7d_mean`) + `region` as a category. Graph: same-region 1-day-lag temporal edges. Train < 2024-11-01 (includes the Sep–Oct WEST decline onset); validate on Nov 2024; test on Dec 2024. @@ -297,8 +292,7 @@ objective = sum( selected[t,tier] * ## Stage 5 — Prescriptive: tower upgrade selection MIP -> **Skill:** `rai-prescriptive-problem-formulation` · -> **Prompt:** "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth." +> `/rai-prescriptive-problem-formulation` "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth." ``` FORMULATION @@ -370,8 +364,7 @@ Headline metrics ## Stage 6 — Interpretation -> **Skill:** `rai-prescriptive-results-interpretation` · -> **Prompt:** "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?" 
+> `/rai-prescriptive-results-interpretation` "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?" ``` THE PLAN, IN BUSINESS TERMS From 78323e33e9249d0a4ad316b6538a4550da99eace Mon Sep 17 00:00:00 2001 From: cafzal Date: Wed, 6 May 2026 09:27:19 -0700 Subject: [PATCH 05/22] Trim runbooks to overview + workflow table + data footer Each runbook is now ~50 lines: 1-paragraph intro, the TL;DR chain ASCII, a workflow table (skill + prompt + expected output per step), and a brief data footer. All per-stage narrative subsections, "how to read" / "adapting" / "why the chain matters" sections, and duplicate enrichment diagrams are gone. Total: 5 files, ~2000 lines removed, ~260 retained. --- v1/energy_grid_planning/references/runbook.md | 486 +----------------- v1/machine_maintenance/references/runbook.md | 257 +-------- v1/portfolio_balancing/references/runbook.md | 465 +---------------- .../references/runbook.md | 383 +------------- .../references/runbook.md | 473 +---------------- 5 files changed, 64 insertions(+), 2000 deletions(-) diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md index f1329ac..87242eb 100644 --- a/v1/energy_grid_planning/references/runbook.md +++ b/v1/energy_grid_planning/references/runbook.md @@ -1,12 +1,8 @@ # Runbook: Energy Grid Planning — Multi-Reasoner Walkthrough -Walk-through of the chained-reasoner pattern this template is built on. One realistic business thread — **ERCOT processes 10 hyperscaler interconnection requests against the 12-substation Texas grid** — traced across four RAI reasoner families, each stage writing properties back to the same ontology that downstream stages consume. +ERCOT processes 10 hyperscaler interconnection requests (2,930 MW) against a 12-substation Texas grid. 
The chain forecasts substation load, finds structural bottlenecks, screens compliance, and produces a Pareto frontier across 5 budget levels — no single reasoner can answer this end-to-end. -The template's combined script (`energy_grid_planning.py`) implements stages 1–4 directly; this runbook walks through how an agent would derive the same pipeline prompt-by-prompt, skill-by-skill, so a non-OR reader can follow the full reasoning thread end-to-end. - ---- - -## TL;DR — the chain in one screen +## The chain ``` ERCOT has 10 hyperscaler interconnection requests totalling 2,930 MW @@ -40,473 +36,17 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus. ───────────────────────────────────────────────────────────────── ``` -A single-reasoner approach can't answer this. Predictive alone forecasts demand but doesn't decide. Graph alone finds bottlenecks but doesn't weigh revenue. Rules alone flag noncompliance but can't optimize. Prescriptive alone has no way to weigh "critical" without graph + predictive upstream. - ---- - -## How to read this runbook - -This runbook serves two audiences: - -- **Reading top-to-bottom**: the narrative + ASCII visualizations show what - the chain produces stage-by-stage, with the same business framing the - stakeholder would see. -- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the - start of each stage is the recipe — load that RAI agent skill, give it - that prompt against the bundled demo data, and the agent will reproduce - the stage. - ---- - -## Step 0 — Scope the question with `rai-discovery` - -> `/rai-discovery` "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" 
- -Discovery classifies the question by reasoner family and tells you which -downstream skills to load: - -| Sub-question | Reasoner | Skill | -|---|---|---| -| Which substations are losing headroom fastest and will breach capacity? | Predictive | _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ | -| Which substations are structural bottlenecks on the transmission grid? Which communities? | Graph | `rai-graph-analysis` | -| Which DC requests fail capacity, structural-risk, or low-carbon compliance? | Rules | `rai-rules-authoring` | -| Across budget levels, which DCs should we approve and which upgrades should we fund? | Prescriptive | `rai-prescriptive-problem-formulation` | -| What does the Pareto frontier tell us about the knee, marginal value, and binding constraints? | Prescriptive (post-solve) | `rai-prescriptive-results-interpretation` | - -Discovery's output is a *plan*, not code. Everything that follows -materializes that plan. - ---- - -## Prerequisites - -The template ships everything needed to run the chain end-to-end: - -- Bundled CSVs in `../data/` (12 substations, 15 generators, 18 transmission lines, 10 DC requests, 10 substation upgrades, and supporting load/forecast tables). To run against your own Snowflake schema, swap CSV loaders for `model.Table(...)` references. -- `raiconfig.yaml` pointed at your RAI-enabled Snowflake account. -- The combined script `../energy_grid_planning.py` runs the full chain. - -The narrative below follows the actual stage outputs of `energy_grid_planning.py`. Steps 1–3 below are an agent-style walk-through of how the ontology would be built from scratch via skills like `/rai-build-starter-ontology` and `/rai-discovery`; the template ships a pre-built ontology in `energy_grid_planning.py` for users who want to skip ahead to the reasoner stages. - ---- - -## Walk-through (agent-style) - -Steps are sequential — each depends on prior steps. 
Steps without a skill are presentation-only. - -| # | Step | Skill | Prompt | Expected Output | -|---|------|-------|--------|-----------------| -| 1 | Ontology | `/rai-build-starter-ontology` | "Build an ontology for ERCOT grid infrastructure planning from the bundled CSVs (or your Snowflake schema)." | 13 concepts: `Substation`, `Generator`, `TransmissionLine`, `LoadZone`, `DemandPeriod`, `RenewableProfile`, `MaintenanceWindow`, `Customer`, `DataCenterRequest`, `SubstationUpgrade`, `DemandForecast`, `LoadHistory`, `DCAnnouncement`. 12 substations, 15 generators, 18 transmission lines, 10 DC requests (2,930 MW). | -| 2 | Visualize | — | "Show the ontology as an ASCII diagram." | Concept map with `Substation` as central hub — `Generator`, `TransmissionLine`, `DataCenterRequest`, `SubstationUpgrade`, `DemandForecast` all relate to it. | -| 3 | Discovery | `/rai-discovery` | "What questions can we answer with this ontology? We're evaluating data center interconnection requests." | 4 reasoning paths: substation load forecast (predictive), grid topology + structural criticality (graph), interconnection compliance (rules), joint approval + upgrade investment (prescriptive). | -| 4 | Explore: generation mix | `/rai-querying` | "What's our current generation mix by fuel type? How much renewable capacity vs fossil?" | 15 generators, 8,135 MW total. Nuclear leads (2,560 MW, 31.5%), then gas (2,290 MW, 28.1%), wind (1,250 MW, 15.4%), coal (1,020 MW, 12.5%), solar (630 MW, 7.7%), battery (300 MW), hydro (85 MW). Renewable: 2,265 MW (28%). Requests with 100% low-carbon mandates (Google, Crusoe) face a structural constraint. | -| 5 | Explore: capacity headroom | `/rai-querying` | "Which substations have the most and least spare capacity right now, before any new DC load?" | Tightest: Houston Ship Channel (69.4% utilized, 550 MW headroom), Austin Energy (68.9%, 280 MW), DFW (68.8%, 500 MW). Most spare: Midland-Permian (38.2%, 680 MW headroom), Lubbock (44.3%, 390 MW). 
DFW has only 500 MW headroom but 1,100 MW of DC requests stacked on it. | -| 6 | Explore: DC request landscape | `/rai-querying` | "Summarize the 10 DC requests — total MW per substation, revenue per MW, low-carbon requirements." | 2,930 MW total, $528M/yr revenue across 6 substations. DFW most stacked (1,100 MW, 3 requests: Google $195K/MW, xAI $210K/MW, Lambda $150K/MW). xAI is highest revenue ($210K/MW/yr, $105M/yr total). Google and Crusoe require 100% low-carbon. Top 3 substations (DFW, Houston, San Antonio) account for 78% of requested MW. | -| 7 | Stage 1 — Predict | `/rai-querying` (or GNN) | "Forecast which substations are losing headroom fastest and which will breach capacity." | `Substation.predicted_load` written for all 12. DFW: 1,700 MW predicted vs 1,600 MW capacity at 24 months (+54.6% growth). Houston Ship Channel: 1,797 MW (+43.8%, within capacity). The only substation predicted to breach. | -| 8 | Stage 2 — Graph | `/rai-graph-analysis` | "Build a graph on the transmission grid. Find structurally critical substations using betweenness, degree, and eigenvector centrality. Identify connected components and Louvain communities." | 1 connected component, 3 communities (North Texas, West Texas, Gulf Coast). Top 3 by combined centrality rank: DFW, Houston, San Antonio — all flagged `is_structurally_critical`. 7 of 10 DC requests target critical substations. | -| 9 | Stage 3 — Rules | `/rai-rules-authoring` | "Check each request against capacity (using `predicted_load`), low-carbon mandate, and structural risk (using `is_structurally_critical`)." | 3 declarative `Relationship` rules (`fails_capacity`, `fails_structural`, `fails_low_carbon`) + composite `is_compliant`. All 10 pass low-carbon. 2 compliant: Crusoe (Midland) and Oracle (Corpus Christi). 8 flagged on capacity + structural risk. | -| 10 | Stage 4 — Optimize | `/rai-prescriptive-problem-formulation` | "Which DCs to approve and which upgrades to fund across 5 budget levels ($200M-$600M)? 
Use `predicted_load` for capacity. Show the Pareto frontier — DCs, MW, revenue, net value at each level." | Pareto frontier across `InvestmentLevel` Scenario Concept. Knee at $300M (5 DCs, 1,500 MW, $264M net value). xAI Colossus unlocks at $300M. Google and Lambda never approved (DFW full). | -| 11 | Results | `/rai-prescriptive-results-interpretation` | "How do approvals and upgrades vary by investment level? Where's the knee? What's marginal return per $M?" | Per-level DC list + selected upgrades, queried via `model.select(...).where(x_approve > 0.5)`. Marginal: $200→$300M = $995K/$M (knee); declines to $400K/$M by $600M. | - ---- - -## Stage 1 — Predictive: substation load forecasting - -> `/rai-predictive-modeling` + `/rai-predictive-training` "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it." - -**Method:** load max forecasted load per substation as `Substation.predicted_load`. The template aggregates `DemandForecast.predicted_load_mw` over forecast horizons (6/12/18/24 months) and writes the max back to the substation. A pre-trained GNN can replace the table lookup; the script falls back gracefully when the GNN model registry is unavailable. 
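The max-over-horizons write-back just described can be sketched in plain Python. The rows below are illustrative (shaped to echo DFW's numbers); the template itself expresses this as a derived property aggregating over `DemandForecast`:

```python
# DemandForecast-style rows: (substation_id, horizon_months, predicted_load_mw).
forecasts = [
    ("SUB-002", 6, 1280.0), ("SUB-002", 12, 1430.0),
    ("SUB-002", 18, 1590.0), ("SUB-002", 24, 1700.0),
    ("SUB-007", 6, 560.0), ("SUB-007", 24, 600.0),
]
# Current load per substation; SUB-005 deliberately has no forecast rows.
current_load = {"SUB-002": 1100.0, "SUB-007": 540.0, "SUB-005": 420.0}

# Substation.predicted_load = max forecast across all horizons.
predicted = {}
for sub, _, mw in forecasts:
    predicted[sub] = max(predicted.get(sub, 0.0), mw)

# effective_load = predicted_load | current_load_mw: coalesce to current load
# when no forecast exists, mirroring the expression Stages 3 and 4 reuse.
effective_load = {sub: predicted.get(sub, load) for sub, load in current_load.items()}
```

The coalescing fallback is what lets the downstream rules and optimizer run even for substations the predictive stage never scored.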
- -``` -Substation load forecast (max across 6/12/18/24-month horizons) - - Houston Ship Channel ──── pred 1,797 MW ████████████ +43.8% safe - Dallas-Fort Worth ──── pred 1,700 MW ██████████ +54.6% ▲ breach 24mo - San Antonio Metro ──── pred 1,069 MW ███████ +37.1% safe - Austin Energy ──── pred 819 MW █████ +32.1% safe - Waco Gateway ──── pred 600 MW ████ +22.4% safe - Corpus Christi Coast ──── pred 600 MW ████ +11.1% safe - Midland-Permian ──── pred 520 MW ███ +23.8% safe - El Paso Border ──── pred 470 MW ███ +14.6% safe - Abilene Central ──── pred 400 MW ███ +14.3% safe - Lubbock West Texas ──── pred 360 MW ██ +16.1% safe - Brownsville Valley ──── pred 355 MW ██ +10.9% safe - Amarillo Panhandle ──── pred 315 MW ██ +12.5% safe - ▲ - ──────────────────────────────────────────│────────────────────── - DFW is the only substation predicted to │ - breach capacity (1,700 vs 1,600 MW cap). │ - Stage 3 rules and Stage 4 capacity │ - constraint both consume this property. │ - ────────────────────────────────────────────────────────────────── - -✓ Substation.predicted_load written back to all 12 substations -``` - -**Stage 3 rule and Stage 4 constraint with the predictive term:** - -```python -# Both reuse the same effective_load expression -effective_load = Substation.predicted_load | Substation.current_load_mw -``` - -**Caveats:** -- The bundled forecasts are pre-computed in `data/demand_forecasts.csv` to keep the template self-contained. To wire in a live GNN, point `GNN(...)` at your model registry — the template's `try` block already handles both paths. -- DFW's projected breach partially encodes the stacked DC request itself ("we expect demand to keep rising at the substation everyone is targeting"). For an independent baseline, train on a pre-announcement slice and compare. - ---- - -## Stage 2 — Graph: grid topology & structural vulnerability - -> `/rai-graph-analysis` "Which substations are most critical to power flow based on grid topology? 
Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." - -**Construction** — `Substation` as the node concept directly (no mirror concept): -- Node concept: `Substation` (12 nodes) -- Edges: active `TransmissionLine` rows, `from_substation` → `to_substation` -- Direction: undirected, unweighted - -**Algorithms:** weakly connected components, Louvain community detection, betweenness/degree/eigenvector centrality. Combined rank picks the top-3 as `is_structurally_critical`. - -``` -Connectivity ──► 1 component (12 of 12 reachable) - -Louvain communities ──► 3 regions - - Region 1 — North Texas Dallas-Fort Worth, Austin Energy, Waco Gateway - Region 2 — West Texas Midland-Permian, Lubbock, El Paso, Amarillo, Abilene - Region 3 — Gulf Coast Houston Ship Channel, San Antonio Metro, - Corpus Christi Coast, Brownsville Valley - -Centrality (top-3 marked is_structurally_critical) - - #1 Dallas-Fort Worth betw 31.67 ████████████ [CRITICAL] - #2 Houston Ship Channel betw 15.83 ██████ [CRITICAL] - #3 San Antonio Metro betw 4.33 ██ [CRITICAL] - #4 Austin Energy betw ~3.0 - … - #12 Brownsville Valley betw 0.00 - - ────────────────────────────────────────────────────────────────── - DC requests targeting structurally critical substations: - Microsoft Horizon Campus (350 MW) ─► Houston [CRITICAL] - Meta Bayou DC (300 MW) ─► Houston [CRITICAL] - Google Metroplex DC (400 MW) ─► DFW [CRITICAL] - xAI Colossus Texas (500 MW) ─► DFW [CRITICAL] - Lambda Labs DFW (200 MW) ─► DFW [CRITICAL] - Amazon SA Cloud (280 MW) ─► San Antonio [CRITICAL] - Apple iCloud Texas (250 MW) ─► San Antonio [CRITICAL] - ──────────── - 7 of 10 DC requests sit on the 3 most structurally critical nodes. 
- ────────────────────────────────────────────────────────────────── - -✓ Substation.betweenness / degree_centrality / eigenvector_centrality -✓ Substation.grid_community written back (12 rows) -✓ Substation.is_structurally_critical written back (3 rows) -``` - ---- - -## Stage 3 — Rules: interconnection queue compliance - -> `/rai-rules-authoring` "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?" - -Three declarative `Relationship` rules consume Stages 1–2 enrichments. Each is written as a `model.where(...).define(...)` block; a composite `is_compliant` fires only when none of the three failure flags fire. - -**Rule 1 — Capacity** (consumes `Substation.predicted_load` from Stage 1): - -```python -DataCenterRequest.fails_capacity = model.Relationship(...) -effective_load = SubRef.predicted_load | SubRef.current_load_mw -model.where( - DataCenterRequest.substation(SubRef), - DataCenterRequest.requested_mw + effective_load > SubRef.max_capacity_mw, -).define(DataCenterRequest.fails_capacity()) -``` - -**Rule 2 — Structural risk** (consumes `Substation.is_structurally_critical` from Stage 2): - -```python -model.where( - DataCenterRequest.substation(SubRef), - SubRef.is_structurally_critical(), -).define(DataCenterRequest.fails_structural()) -``` - -**Rule 3 — Low-carbon mandate** (zero-emission share at the substation must meet the request's requirement; nuclear + renewable count): - -```python -model.where( - DataCenterRequest.substation(SubRef), - (SubRef.low_carbon_gen_mw | 0.0) * 100 - < DataCenterRequest.low_carbon_requirement_pct * (SubRef.total_gen_mw | 0.001), -).define(DataCenterRequest.fails_low_carbon()) -``` - -``` - DC Request Hyper Q# MW Cap LowC Crit OK? 
- ────────────────────────────────────────────────────────────────────── - Microsoft Horizon Campus Microsoft 1 350 FAIL PASS FAIL N - Meta Bayou DC Meta 2 300 FAIL PASS FAIL N - Google Metroplex DC Google 3 400 FAIL PASS FAIL N - xAI Colossus Texas xAI 4 500 FAIL PASS FAIL N - Lambda Labs DFW Lambda Labs 5 200 FAIL PASS FAIL N - Amazon SA Cloud Amazon 6 280 FAIL PASS FAIL N - Apple iCloud Texas Apple 7 250 FAIL PASS FAIL N - CoreWeave Austin GPU CoreWeave 8 320 FAIL PASS PASS N - Crusoe Permian DC Crusoe Energy 9 180 PASS PASS PASS Y - Oracle Coastal DC Oracle 10 150 PASS PASS PASS Y - ────────────────────────────────────────────────────────────────────── - Summary: 2 compliant, 8 flagged -``` - -Every request passes low-carbon — ERCOT's nuclear (STP, Comanche Peak) plus its wind/solar fleet provides enough zero-emission generation. The two compliant requests sit on substations that are neither structurally critical (Stage 2) nor predicted to breach (Stage 1). Eight requests need either upgrades, redirection, or both — Stage 4 picks the optimal mix. - -``` -✓ DataCenterRequest.fails_capacity written back -✓ DataCenterRequest.fails_structural written back -✓ DataCenterRequest.fails_low_carbon written back -✓ DataCenterRequest.is_compliant written back (2 rows) -``` - ---- - -## Stage 4 — Prescriptive: joint DC approval + upgrade MIP - -> `/rai-prescriptive-problem-formulation` "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades." 
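The formulation that follows indexes decisions by budget level so one solve yields the whole frontier. A brute-force toy version of the same shape (a made-up 3-DC, 2-upgrade instance in plain Python, not the RAI prescriptive API or the template data) makes the constraint structure concrete:

```python
from itertools import product

# Toy instance: (substation, requested MW, annual revenue $M) per DC request.
dcs = [("S1", 500, 105.0), ("S1", 400, 78.0), ("S2", 180, 27.0)]
# (substation, added MW, cost $M) per upgrade option.
upgrades = [("S1", 600, 110.0), ("S2", 200, 40.0)]
headroom = {"S1": 0.0, "S2": 150.0}  # capacity minus predicted load

def solve(budget):
    """Brute-force the best total revenue under one budget level."""
    best = 0.0
    for approve in product([0, 1], repeat=len(dcs)):
        for fund in product([0, 1], repeat=len(upgrades)):
            # Budget constraint over funded upgrades.
            if sum(f * cost for f, (_, _, cost) in zip(fund, upgrades)) > budget:
                continue
            # Per-substation capacity check after funded upgrades.
            feasible = all(
                sum(a * mw for a, (s, mw, _) in zip(approve, dcs) if s == sub)
                <= headroom[sub]
                + sum(f * add for f, (s, add, _) in zip(fund, upgrades) if s == sub)
                for sub in headroom
            )
            if feasible:
                best = max(best, sum(a * rev for a, (_, _, rev) in zip(approve, dcs)))
    return best

# One point per budget level: the shape of the Pareto frontier.
pareto = {b: solve(b) for b in (50, 150, 200)}
```

The MIP replaces this enumeration with binary variables and lets HiGHS search, but the feasibility logic (budget cap plus per-substation capacity after upgrades) is exactly what the constraint families above encode.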
- -``` -FORMULATION - - Scenario Concept - InvestmentLevel(name, budget_cap) 5 levels: $200M, $300M, $400M, $500M, $600M - - Decision variables (binary, indexed by InvestmentLevel) - DataCenterRequest.x_approve(InvestmentLevel) 50 binaries (10 DCs × 5 levels) - SubstationUpgrade.x_upgrade(InvestmentLevel) 50 binaries (10 upgrades × 5 levels) - - Constraints - 1. Substation capacity per InvestmentLevel - max_capacity − predicted_load + Σ(x_upgrade · capacity_increase) - ≥ Σ(x_approve · requested_mw) per (Substation, InvestmentLevel) - 2. Budget per InvestmentLevel - Σ(x_upgrade · cost_million) ≤ budget_cap per InvestmentLevel - - Objective (maximize) - Σ x_approve · annual_revenue_per_mw · requested_mw summed over all levels - -────────────────────────────────────────────────────────────────────── -SOLVE (HiGHS, single solve, all 5 levels in one MIP) → OPTIMAL -────────────────────────────────────────────────────────────────────── -``` - -``` -PARETO FRONTIER (queried directly from ontology) - - Level Budget DCs DC MW Revenue $/yr Upg $M Net Value - ────── ──────── ──── ────── ───────────── ──────── ─────────── - $200M $200M 4 1,000 $174,350,000 $190.0 $164,850,000 - $300M $300M 5 1,500 $279,350,000 $300.0 $264,350,000 ◄ KNEE - $400M $400M 6 1,800 $328,850,000 $385.0 $309,600,000 - $500M $500M 7 2,080 $376,450,000 $430.0 $354,950,000 - $600M $600M 8 2,330 $420,200,000 $505.0 $394,950,000 - - Per-level DC approvals (cumulative as budget increases) - - $200M: Microsoft (350) · CoreWeave (320) · Crusoe (180) · Oracle (150) ── 1,000 MW - $300M: + xAI Colossus (500) ── 1,500 MW - $400M: + Meta Bayou (300) ── 1,800 MW - $500M: + Amazon SA (280) ── 2,080 MW - $600M: + Apple iCloud (250) ── 2,330 MW - - Never approved at any level: - Google Metroplex DC (400 MW, DFW) - Lambda Labs DFW (200 MW, DFW) - ────────────────────────────────── - DFW is physically full once xAI takes 500 MW. 
- - Marginal value per $M added budget - - $200 → $300M ── $995K / $M ◄ knee — xAI Colossus unlocks at $300M - $300 → $400M ── $453K / $M - $400 → $500M ── $454K / $M - $500 → $600M ── $400K / $M - -✓ DataCenterRequest.x_approve(InvestmentLevel) written back -✓ SubstationUpgrade.x_upgrade(InvestmentLevel) written back - Each variable is now a queryable property of the model — no - parsing of solver output, no per-level re-solve loops. -``` - -### Reading the solve - -- **Knee at $300M.** xAI Colossus alone delivers $105M/yr ($210K × 500 MW), the single highest-revenue request. Unlocking it requires the $110M UPG-003 upgrade at DFW. Below $300M, xAI doesn't fit; above $300M, the next-best DCs (Meta, Amazon, Apple) deliver $0.4M/$M-ish — diminishing returns. -- **DFW is the binding bottleneck.** With xAI's 500 MW absorbing the post-upgrade headroom, neither Google (400 MW) nor Lambda (200 MW) ever fit — they share DFW. The optimizer correctly prioritizes revenue per MW over queue position. -- **All compliant requests get approved by $200M.** The 2 fully compliant requests from Stage 3 (Crusoe, Oracle) sit on sub-critical substations; they're cheap upgrades and clear the bar at the lowest budget. -- **`predicted_load` propagates the chain.** Capacity slack at every substation reflects Stage 1's forecast, not raw historical load — the same signal the rules engine used. - ---- - -## Stage 5 — Interpretation - -> `/rai-prescriptive-results-interpretation` "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?" - -``` -THE PLAN, IN BUSINESS TERMS - - • Recommended budget: $300M (the knee). - 5 DCs, 1,500 MW connected, $279M/yr revenue, $264M net value. - • Marginal $100M from $200M unlocks xAI Colossus alone — $105M/yr — at - $995K return per $1M spent. Beyond the knee, marginal return roughly - halves. 
- • Google (400 MW) and Lambda (200 MW) are structurally infeasible at any - budget level — DFW is full once xAI takes its 500 MW. Either redirect - them to a different substation, or de-prioritize. - • Crusoe (Midland) and Oracle (Corpus Christi) approve cheaply at any - budget — they were the only two that passed all three rules in Stage 3. - -ONTOLOGY ENRICHMENT — what each stage contributed back - - Stage 1 (predictive) Substation.predicted_load [12] - - Stage 2 (graph) Substation.betweenness [12] - Substation.degree_centrality [12] - Substation.eigenvector_centrality [12] - Substation.grid_community [12] - Substation.is_structurally_critical [3] - - Stage 3 (rules) DataCenterRequest.fails_capacity [8] - DataCenterRequest.fails_structural [7] - DataCenterRequest.fails_low_carbon [0] - DataCenterRequest.is_compliant [2] - Substation.low_carbon_gen_mw [12] - Substation.total_gen_mw [12] - - Stage 4 (prescriptive) DataCenterRequest.x_approve [50] - SubstationUpgrade.x_upgrade [50] - - ────────────────────────────────────────────────────────────────── - Each stage reads what the previous stage wrote. - Re-running any downstream stage automatically picks up enrichments. - No glue code, no DataFrame round-trip — same ontology throughout. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## The chain — accretive ontology enrichment - -``` -THE ENERGY GRID PLANNING CHAIN - - STAGE 1 PREDICTIVE - "Where is demand growing? Who breaches first?" - reads: DemandForecast.predicted_load_mw, Substation.current_load_mw - writes: Substation.predicted_load ── per substation - │ - ▼ - STAGE 2 GRAPH (WCC / Louvain / centrality) - "How is the grid connected? Which substations are structural bottlenecks?" 
- reads: Substation nodes, TransmissionLine edges (active) - writes: Substation.betweenness / degree_centrality / eigenvector_centrality - Substation.grid_community ── 3 regions - Substation.is_structurally_critical ── 3 substations - │ - ▼ - STAGE 3 RULES (declarative Relationships) - "Which DC requests pass capacity, structural, low-carbon checks?" - reads: Substation.predicted_load ◄── Stage 1 - Substation.is_structurally_critical ◄── Stage 2 - Generator.emissions_rate, DataCenterRequest.low_carbon_requirement_pct - writes: DataCenterRequest.fails_capacity / fails_structural / fails_low_carbon - DataCenterRequest.is_compliant ── 2 requests - │ - ▼ - STAGE 4 PRESCRIPTIVE (HiGHS MIP, Scenario Concept) - "Which DCs to approve and which upgrades to fund across 5 budget levels?" - reads: Substation.predicted_load ◄── Stage 1 - SubstationUpgrade.cost_million / capacity_increase_mw - DataCenterRequest.requested_mw / annual_revenue_per_mw - InvestmentLevel.budget_cap ── 5 levels - writes: DataCenterRequest.x_approve(InvestmentLevel) - SubstationUpgrade.x_upgrade(InvestmentLevel) - │ - ▼ - Pareto frontier, - queryable directly from the ontology. - - ────────────────────────────────────────────────────────────────── - No glue. No DataFrame ping-pong. No per-level re-solve loop. - Four reasoners, one ontology, one accretive thread. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## Why the chain matters (vs. 
any single stage) - -| Stage alone | What it tells you | What it doesn't | -|---|---|---| -| Predictive | "DFW will breach in 24 months" | What to do; which requests matter | -| Graph alone | "DFW, Houston, San Antonio are bottlenecks" | Whether they have headroom; what to approve | -| Rules alone | (won't fire — no `predicted_load`, no `is_structurally_critical`) | Pipeline misses | -| Prescriptive alone | (no flagged set, no critical-node info, no forecast) | Whole pipeline misses | - -| Combined | Output | -|---|---| -| Predictive → Graph | Forecasted load + structural bottleneck map | -| + Rules | Per-request compliance vs. capacity / structural / low-carbon | -| + Prescriptive | Pareto frontier across 5 budget levels in one solve | - -**Multi-reasoner chaining grounded in (and contributing to) the ontology.** - ---- - -## Adapting this recipe to a new domain - -The chain pattern transfers cleanly. To rebuild for a different problem: - -1. Re-run `rai-discovery` on the new business question — does it actually - need all 4 reasoner families, or is one or two sufficient? -2. Strip the demo ontology to the concepts the new chain needs (lean is - better for type inference and solver compile time). -3. Stage 1 (Predictive) is optional — if you have forecast tables already, - a simple `aggs.max(...).per(...)` derived property is enough; swap in - a GNN later by pointing the predictive reasoner at your model registry. -4. Stages 2–4 are the load-bearing chain: graph centrality flags - structurally critical nodes, rules consume both the predictive forecast - and the criticality flag to fail or pass each request, and the - prescriptive MIP reads the same forecast as a capacity baseline while - indexing decision variables by an `InvestmentLevel` Scenario Concept so - one solve produces the full Pareto frontier. -5. 
Keep the validation checks at every stage: assert the predicted-load - write covers all nodes, the top-N critical set looks plausible against - the topology, the compliance table has at least one PASS and one FAIL, - and the optimizer reports OPTIMAL with a non-zero objective. - -The shape this template demonstrates — *each reasoner writes a property -the next reasoner reads* — is what makes the chain accretive rather than -serial. The agent skills are how you reliably author each link. - ---- - -## Data Reference - -**Substations with DC requests:** - -| Substation | Location | Capacity | DC Requests | DC MW | -|------------|----------|----------|-------------|-------| -| SUB-001 | Houston Ship Channel | 1,800 MW | Microsoft (350), Meta (300) | 650 MW | -| SUB-002 | Dallas-Fort Worth | 1,600 MW | Google (400), xAI (500), Lambda (200) | 1,100 MW | -| SUB-003 | San Antonio Metro | 1,200 MW | Amazon (280), Apple (250) | 530 MW | -| SUB-004 | Austin Energy | 900 MW | CoreWeave (320) | 320 MW | -| SUB-005 | Midland-Permian | 1,100 MW | Crusoe (180) | 180 MW | -| SUB-007 | Corpus Christi Coast | 800 MW | Oracle (150) | 150 MW | - -**ERCOT regions (Louvain):** North Texas (DFW, Austin, Waco) | West Texas (Midland, Lubbock, El Paso, Amarillo, Abilene) | Gulf Coast (Houston, San Antonio, Corpus Christi, Brownsville) +## Workflow -**DFW breach:** 1,600 MW capacity, 1,700 MW predicted (24mo), +54.6% growth, 1,100 MW DC requests stacked on top. Google and Lambda permanently infeasible. +| # | Skill + prompt | Expected output | +|---|----------------|-----------------| +| 0 | `/rai-discovery` "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" | Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners. 
|
+| 1 | `/rai-predictive-modeling` + `/rai-predictive-training` "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it." | `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%). |
+| 2 | `/rai-graph-analysis` "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." | 1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes. |
+| 3 | `/rai-rules-authoring` "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?" | `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged. |
+| 4 | `/rai-prescriptive-problem-formulation` "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades." | OPTIMAL MIP across 5 `InvestmentLevel` values in one solve; `x_approve` and `x_upgrade` written back per level. |
+| 5 | `/rai-prescriptive-results-interpretation` "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?" | Pareto frontier with knee at $300M (5 DCs, 1,500 MW, $264M net); marginal $995K/$M at knee, declining to $400K/$M by $600M; Google + Lambda never approved (DFW full). |

-**Upgrades:** 10 available, $630M total, 2,900 MW combined capacity. Only $300M of upgrades are needed at the knee.
+## Data

-- **Source data**: bundled CSVs in `../data/` (12 substations, 15 generators, 18 transmission lines, 10 DC requests, 10 upgrade options, plus historical load and forecast tables).
-- **Ontology**: defined in `../energy_grid_planning.py` (13 concepts).
-- **Stages**: implemented in `../energy_grid_planning.py` as a single combined script with stage banners.
+Bundled CSVs in `../data/`: 12 substations, 15 generators, 18 transmission lines, 10 DC requests (2,930 MW), 10 substation upgrades ($630M total), plus historical load and forecast tables. Full chain implemented in `../energy_grid_planning.py`.

diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md
index ca6201c..d4efa03 100644
--- a/v1/machine_maintenance/references/runbook.md
+++ b/v1/machine_maintenance/references/runbook.md
@@ -1,12 +1,8 @@
 # Runbook: Machine Maintenance — Multi-Reasoner Walkthrough

-Walk-through of the chained-reasoner pattern this template is built on. One realistic plant-floor thread — **scheduling preventive maintenance for a 30-machine, 3-plant operation** — traced across querying, graph, rules, and prescriptive reasoners, each stage writing properties back to the same ontology that downstream stages consume.
+This template schedules preventive maintenance for a 30-machine, 3-plant operation. OEE alone misranks the plants; sensor counts don't quantify forward risk; rules flag machines but don't allocate scarce technician time; the optimizer produces a feasible schedule but can't see that all Turbine techs sit in one city. The chain threads querying, graph, rules, and prescriptive reasoners through one ontology so each stage's enrichments feed the next.
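The OEE proxy behind the opening claim (Performance × Quality) is one line of arithmetic. The unit counts below are invented so the result reproduces Plant_B's 62.6% performance and ~98.1% quality from the chain summary:

```python
def oee_proxy(planned_units, actual_units, good_units):
    """OEE proxy = Performance x Quality; the template's proxy omits Availability."""
    performance = actual_units / planned_units  # total_actual / total_planned
    quality = good_units / actual_units         # total_good / total_actual
    return performance * quality

# Plant_B-shaped numbers: Perf 62.6%, Qual ~98.1%, so OEE proxy ~61.4%.
plant_b = oee_proxy(planned_units=10_000, actual_units=6_260, good_units=6_141)
```

Because quality is nearly uniform across plants, the proxy is dominated by performance, which is exactly why OEE alone misranks the plants once forward-looking risk enters the picture.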
-The template's combined script (`machine_maintenance.py`) implements all five stages directly; this runbook expands the surrounding narrative — what each prompt asks, what shape of output to expect, and how each enrichment feeds the next — so a reader can follow the reasoning thread end-to-end without re-running the script. - ---- - -## TL;DR — the chain in one screen +## The chain ``` Plant_B looks worst on OEE (61.4%). Plant_A looks mid-tier (68.2%). @@ -43,246 +39,17 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved. ───────────────────────────────────────────────────────────────── ``` -A single-reasoner approach can't surface this. OEE alone says Plant_B is the problem. Sensor counts alone don't quantify forward risk. Rules alone flag machines but don't allocate scarce technician time. The optimizer alone produces a feasible schedule — but doesn't know that a single weather event in Houston blocks all on-site Turbine work. - ---- - -## How to read this runbook - -This runbook serves two audiences: - -- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. -- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data, and the agent will reproduce the stage. - ---- - -## Step 0 — Scope the question with `rai-discovery` - -> `/rai-discovery` "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?" - -Discovery classifies the question by reasoner family and tells you which downstream skills to load: - -| Sub-question | Reasoner | Skill | -|---|---|---| -| Where does the operation actually hurt — OEE, anomalies, failure trajectories? 
| Querying / Descriptive | `rai-querying` | -| Which machines are scheduling bottlenecks given shared technician pools? | Graph | `rai-graph-analysis` | -| Which machines are overdue, high-risk, chronic, or composite-Critical? | Rules | `rai-rules-authoring` | -| What's the optimal maintain-and-assign plan across 4 periods? | Prescriptive | `rai-prescriptive-problem-formulation` | -| Where is the schedule structurally fragile, and what cross-training fixes it? | Prescriptive (re-solve / interpretation) | `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` | - -Discovery's output is a *plan*, not code. Everything that follows materializes that plan. - ---- - -## Setup - -See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `machine_maintenance.py`. - -**Prerequisites** - -- Template's `data/` CSVs available (or your own Snowflake schema with equivalent tables — `machines`, `technicians`, `qualifications`, `availability`, `parts_inventory`, `certification_expiry`, `sensors`, `sensor_readings`, `failure_predictions`, `downtime_events`, `production_runs`, `training_options`) -- `raiconfig.yaml` pointing at your RAI engine -- Python ≥ 3.10 with `relationalai >= 1.0.14` - ---- - ## Workflow -The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skill order. Each row maps to a section of the script. - -| # | Step | Skill | Prompt | Expected Output | -|---|------|-------|--------|-----------------| -| 1 | Build ontology | `/rai-build-starter-ontology` | "Build a RAI ontology for a manufacturing maintenance scheduling problem from the CSVs in `data/`. Concepts: Machine, Technician, Qualification, Period, MachinePeriod, TechnicianPeriod, TechnicianMachinePeriod, PartsInventory, CertificationExpiry, Sensor, SensorReading, FailurePrediction, DowntimeEvent, ProductionRun." 
| Model `machine_maintenance` with 14 user-facing concepts. 30 Machine rows (3 plants × 10 machines, 5 types × 6). 10 Technician rows (4 Chicago_IL, 3 Houston_TX, 3 Phoenix_AZ). 16 Qualification rows. 4 Period rows. | -| 2 | Discovery | `/rai-discovery` | "What questions can we answer with this ontology? We want to schedule preventive maintenance and surface hidden operational risk." | Querying: OEE by facility, anomaly counts, failure trajectory deltas. Graph: machine dependency clusters, bottleneck centrality on shared-technician edges. Rules: overdue, high-risk, chronic-downtime flags chained to a composite risk tier. Prescriptive: maintain × period × technician assignment minimizing failure + labor + travel cost. Resilience: post-solve concentration analysis. | -| 3 | Stage 0 — OEE proxy | `/rai-querying` | "Compute OEE proxy (Performance × Quality) by facility. Performance is total_actual / total_planned across ProductionRuns; Quality is total_good / total_actual." | Plant_C: Perf 81.3%, Qual 98.1%, OEE 79.8%. Plant_A: Perf 69.8%, Qual 97.8%, OEE 68.2%. Plant_B: Perf 62.6%, Qual 98.1%, OEE 61.4%. Quality is uniform; Performance is the differentiator. | -| 4 | Stage 0 — Sensor anomalies | `/rai-querying` | "List machines with above-threshold sensor readings (`SensorReading.is_anomaly == 1`), grouped by facility." | 9 anomaly readings across 5 machines. Plant_A: 7 (M013 Pump:3, M001 Turbine:2, M016 Turbine:2). Plant_B: 1 (M002 Compressor). Plant_C: 1 (M006 Turbine). Plant_A's anomaly load is 7× Plant_B's despite Plant_A's higher OEE. | -| 5 | Stage 0 — Failure trajectories | `/rai-querying` | "For each machine, compute the failure-probability delta from period 1 to period 4 from `FailurePrediction`. Show the steepest 6." | M001 (Turbine, Plant_A): 0.102 → 0.332 (+0.230, bearing_wear). M013 (Pump, Plant_A): 0.435 → 0.663 (+0.228, impeller_erosion). M016 (Turbine, Plant_A): 0.263 → 0.482 (+0.219, bearing_wear). All three steepest curves are Plant_A. 
Stored back as `MachinePeriod.predicted_fp` (120 rows) for Stage 3's objective. | -| 6 | Stage 1 — Dependency graph | `/rai-graph-analysis` | "Build a graph with `Machine` as `node_concept`. Two machines are adjacent when at least one technician is qualified for both machine types. Run weakly connected components." | 30 nodes, edges joined via `Qualification`. WCC: 1 cluster of 30 — every machine is reachable from every other through shared qualifications. No isolated subgraphs. | -| 7 | Stage 1 — Bottleneck centrality | `/rai-graph-analysis` | "Compute betweenness centrality on the dependency graph. Normalize and store as `Machine.betweenness`." | Pump-type machines tie at the top (raw betweenness 24.0 → normalized 1.0): M003 (Plant_C), M008 (Plant_B), M013 (Plant_A). Turbines, Generators, Motors, Compressors lower. `Machine.betweenness` written back for all 30 machines and consumed by Stage 3's failure-cost multiplier. | -| 8 | Stage 2 — Compliance flags | `/rai-rules-authoring` | "Define six derived flags: overdue (`remaining_useful_life < maintenance_duration_hours`), high-risk (`failure_probability > 0.3 AND criticality >= 4`), anomalous (`anomaly_count > 0`), chronic-downtime (`downtime_event_count > 8`), parts-reorder (`stock_level <= min_order_qty`), expiring-cert (`days_remaining < 30`). Use `model.where(...).define(...)`." | Overdue (6): M002, M006, M013, M016, M022, M025 (RUL below required maintenance hours). High-risk (1): M013 (fp=0.435, crit=4). Anomalous (5): M013, M001, M016, M002, M006. Chronic downtime (3, threshold > 8 events): M001 (12 events), M016 (11), M013 (10). Parts reorder (4): P001, P003, P004, P006. Expiring certs (5): T001 Compressor 22d, T004 Pump 8d, T003 Compressor 15d, T006 Motor 25d, T009 Motor 12d. | -| 9 | Stage 2 — Composite risk tier | `/rai-rules-authoring` | "Chain `is_chronic_downtime`, `is_high_risk`, `is_overdue_maintenance` into `Machine.risk_tier`: Critical if all three, Elevated if exactly two, Standard otherwise.
Enumerate all 8 combinations using `model.not_()` for negation." | Critical (1): M013 (Pump, Plant_A) — chronic + high-risk + overdue. Elevated (1): M016 (Turbine, Plant_A) — chronic + overdue, not high-risk. Standard (28): rest. Plant_A holds the only Critical and the only Elevated machine despite ranking second on OEE. | -| 10 | Stage 3 — Formulation | `/rai-prescriptive-problem-formulation` | "Formulate the maintenance schedule. Decision variables: `MachinePeriod.x_maintain` (bin), `MachinePeriod.x_vulnerable` (bin), `TechnicianMachinePeriod.x_assigned` (bin) — restricted to qualified pairs. Constraints: cumulative coverage (`Σ x_maintain[m,1..τ] + x_vulnerable[m,τ] = 1` per machine and period), assignment-maintenance linkage (`Σ x_assigned over techs = x_maintain` per (m,τ)), technician hour capacity (`Σ x_assigned · duration ≤ available_hours`), parts/bay capacity (`≤ 5 jobs per period`), and overdue deadline (`Σ x_maintain[m, τ ≤ 2] ≥ 1` for every overdue machine — feeds from Stage 2). Objective: minimize failure_cost + labor_cost + travel_cost." | 120 `x_maintain` binaries (30 machines × 4 periods). 120 `x_vulnerable` binaries. ~250 `x_assigned` binaries (qualification-restricted). 5 constraint families. Failure cost uses `MachinePeriod.predicted_fp` (Stage 0) × `Machine.criticality` × `(1 + 2.0 × Machine.betweenness)` (Stage 1). | -| 11 | Stage 3 — Solve | `/rai-prescriptive-solver-management` | "Solve with HiGHS, time limit 120s, assert OPTIMAL." | OPTIMAL. Objective = $605,240.61. 20 maintenance jobs scheduled across 4 periods (capacity-binding at 5 jobs/period). All six overdue machines (M002, M006, M013, M016, M022, M025) maintained by period 2 — overdue constraint satisfied. | -| 12 | Stage 3 — Schedule readout | `/rai-prescriptive-results-interpretation` | "Show the period-by-period schedule and technician assignments. Flag any travel (`base_location != machine.location`)."
| Period 1 includes M002 (Plant_B), M006 (Plant_C), M013 (Plant_A), M016 (Plant_A) — high-priority overdue/critical machines. Periods 2–4 cover the remaining 16 jobs. Multiple Turbine assignments require travel because all 3 Turbine-qualified techs (T001, T002, T003) are based in Houston_TX while Turbines exist at all 3 plants. Travel cost is paid at $50/hr × duration. | -| 13 | Stage 4 — Concentration analysis | `/rai-graph-analysis`, `/rai-querying` | "From the qualification table, find machine types whose qualified technicians are all in one location. For each concentrated type, count how many scheduled jobs require travel." | Compressor: techs in Chicago_IL, Houston_TX (gap: Phoenix_AZ). Generator: Chicago_IL, Phoenix_AZ (gap: Houston_TX). Motor: Chicago_IL, Phoenix_AZ (gap: Houston_TX). Pump: Chicago_IL, Phoenix_AZ (gap: Houston_TX). **Turbine: all 3 techs in Houston_TX — CONCENTRATED.** Of 3 scheduled Turbine jobs, 2 require travel (67%). 4 of 6 Turbines are at remote plants. The optimizer found the cheapest plan but cannot fix the structural fragility — losing T001's Compressor cert (22 days remaining) doesn't break Turbines, but losing any of T001/T002/T003 from Houston shrinks Turbine coverage by a third. | -| 14 | Stage 4 — Cross-training recommendation | `/rai-prescriptive-results-interpretation` | "From `training_options.csv`, find the cheapest Turbine-cross-training candidate based outside Houston_TX." | Best candidate: **T006 (Fiona_Garcia, Senior, Chicago_IL) — $3,200 / 5 weeks.** Other non-Houston options: T005 ($3,500/6w, Chicago), T008 ($3,800/6w, Phoenix), T009 ($4,200/8w, Phoenix), T004 ($5,500/10w, Chicago). Training T006 adds the first non-Houston Turbine tech, eliminates the single-point-of-failure for Plant_B and Plant_C Turbines, and pays back the first time travel or a cert lapse would have idled a Turbine job. The prescriptive reasoner produced the schedule; the resilience layer produced the structural action item. 
| - ---- - -## Stage 0 — Querying: operational intelligence - -> `/rai-querying` "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?" - -This stage establishes the operational baseline. Plant_C leads at 79.8% OEE; Plant_B trails at 61.4%. But Plant_A — middle of the OEE pack at 68.2% — owns 7 of 9 sensor anomalies and the three steepest failure trajectories (M001, M013, M016). The querying stage writes nine derived properties on `Machine` plus `MachinePeriod.predicted_fp` (120 rows), and Stage 3 reads `predicted_fp` directly into the failure-cost objective term. - -## Stage 1 — Graph: dependency clusters and bottleneck centrality - -> `/rai-graph-analysis` "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines." - -The 30 machines form a single connected component — every machine is reachable through shared qualifications. Pump-type machines tie at the top of betweenness (raw 24.0, normalized 1.0): M003 (Plant_C), M008 (Plant_B), M013 (Plant_A). The normalized centrality is consumed by Stage 3's failure-cost multiplier `(1 + 2.0 × betweenness)`, so leaving a bottleneck Pump vulnerable is markedly more expensive than leaving a peripheral Motor vulnerable. - -## Stage 2 — Rules: compliance flags and composite risk tier - -> `/rai-rules-authoring` "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard." - -Six machines overdue, one high-risk (M013), three chronic-downtime, four parts-reorder, five expiring certs. The composite tier surfaces a single Critical machine — M013 (Pump, Plant_A) — and a single Elevated machine — M016 (Turbine, Plant_A). 
The overdue flag is consumed by Stage 3 as a hard constraint: every overdue machine must be scheduled by period 2. - -## Stage 3 — Prescriptive: maintenance schedule - -> `/rai-prescriptive-problem-formulation` "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel." - -The solver returns OPTIMAL with objective $605,240.61 and 20 maintenance jobs across the four periods (capacity-binding at 5 jobs/period). M013 and M016 — Plant_A's Critical and Elevated machines — are both scheduled by period 1, satisfying the overdue deadline. Several Turbine assignments require travel because all three Turbine-qualified techs are based in Houston_TX while four of six Turbines sit at Plant_A and Plant_C. The optimizer pays the travel cost; it cannot restructure the qualification pool. - -## Stage 4 — Resilience: concentration sweep and cross-training - -> `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "For each machine type, are all qualified technicians concentrated in one location? How many scheduled jobs required travel, and what's the cheapest cross-training option to eliminate the single-point-of-failure?" - -Turbine is the concentrated type — all three qualified techs (T001, T002, T003) sit in Houston_TX, and 67% of scheduled Turbine jobs already require travel. The recommended fix: cross-train T006 (Senior, Chicago_IL) for $3,200 over 5 weeks. That single addition eliminates the Houston single-point-of-failure for Turbine work at Plant_B and Plant_C, and pays back the first time a weather event, illness, or expiring cert would have idled a Turbine job that the optimizer would otherwise have left uncovered. 
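The Stage 4 recommendation reduces to a filter-and-min over the training options once they are loaded. A plain-Python sketch of that selection logic (illustrative only; the template reads `training_options.csv` through the ontology), using the candidate figures quoted above:

```python
# Cross-training candidate selection: cheapest Turbine training option whose
# technician is based outside Houston_TX (the concentrated site).
# Tuples: (technician, base_location, cost_usd, duration_weeks).
options = [
    ("T004", "Chicago_IL", 5500, 10),
    ("T005", "Chicago_IL", 3500, 6),
    ("T006", "Chicago_IL", 3200, 5),
    ("T008", "Phoenix_AZ", 3800, 6),
    ("T009", "Phoenix_AZ", 4200, 8),
]

# Keep only non-Houston candidates, then take the cheapest.
non_houston = [o for o in options if o[1] != "Houston_TX"]
best = min(non_houston, key=lambda o: o[2])
print(best[0], best[2])  # -> T006 3200
```

With the real CSV rows in place of the hard-coded list, the same two lines reproduce the template's recommendation.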
- ---- - -## Stage outputs — what each reasoner contributes back - -``` -ONTOLOGY ENRICHMENT — what each stage wrote back - - Stage 0 (querying) Machine.total_planned_qty [30] - Machine.total_actual_qty [30] - Machine.total_good_qty [30] - Machine.performance_ratio [30] - Machine.quality_ratio [30] - Machine.total_downtime_minutes [30] - Machine.downtime_event_count [30] - Machine.anomaly_count [30] - MachinePeriod.predicted_fp [120] - - Stage 1 (graph) Machine.betweenness_raw [30] - Machine.betweenness (normalized) [30] - - Stage 2 (rules) Machine.is_overdue_maintenance [6] - Machine.is_high_risk [1] - Machine.is_anomalous [5] - Machine.is_chronic_downtime [3] - Machine.risk_tier [30] - PartsInventory.needs_reorder [4] - CertificationExpiry.is_expiring [5] - - Stage 3 (prescriptive) MachinePeriod.x_maintain [120 binary] - MachinePeriod.x_vulnerable [120 binary] - TechnicianMachinePeriod.x_assigned - - Stage 4 (analysis) (terminal — prints concentration risk and - costed cross-training recommendation) - - ────────────────────────────────────────────────────────────────── - Each stage reads what the previous stage wrote. - Re-running any downstream stage automatically picks up enrichments. - No glue code, no DataFrame round-trip — same ontology throughout. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## The chain — accretive ontology enrichment - -``` -THE MACHINE-MAINTENANCE CHAIN - - STAGE 0 QUERYING - "Where does the operation actually hurt?" - reads: ProductionRun, SensorReading, FailurePrediction - writes: Machine.performance_ratio / quality_ratio - Machine.anomaly_count - Machine.downtime_event_count / total_downtime_minutes - MachinePeriod.predicted_fp - │ - ▼ - STAGE 1 GRAPH (betweenness centrality) - "Which machines are scheduling bottlenecks?" - reads: Qualification, Machine (as node_concept) - writes: Machine.betweenness ── normalized 0..1 - │ - ▼ - STAGE 2 RULES - "Which machines violate which compliance flags?" 
- reads: Machine.failure_probability, Machine.criticality, - Machine.remaining_useful_life, Machine.maintenance_duration_hours, - Machine.anomaly_count, Machine.downtime_event_count, - PartsInventory.stock_level, CertificationExpiry.days_remaining - writes: Machine.is_overdue_maintenance / is_high_risk / - is_chronic_downtime / is_anomalous - Machine.risk_tier ── Critical / Elevated / Standard - PartsInventory.needs_reorder - CertificationExpiry.is_expiring - │ - ▼ - STAGE 3 PRESCRIPTIVE (HiGHS MIP) - "What's the optimal maintain-and-assign plan?" - reads: MachinePeriod.predicted_fp ──► failure cost (period-specific) - Machine.betweenness ──► failure cost multiplier - Machine.is_overdue_maintenance ──► hard deadline constraint - Qualification (assignment scope), TechnicianPeriod.capacity_hours - writes: MachinePeriod.x_maintain ── 20 jobs flagged - MachinePeriod.x_vulnerable - TechnicianMachinePeriod.x_assigned - │ - ▼ - STAGE 4 RESILIENCE - "Where is the schedule structurally fragile?" - reads: Solution variables, Qualification, Technician, TrainingOption - writes: (terminal — concentration risk + cross-training recommendation) - - ────────────────────────────────────────────────────────────────── - No glue. No DataFrame ping-pong. No re-derivation per-reasoner. - Five reasoners, one ontology, one accretive thread. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## Why the chain matters (vs. 
any single stage) - -| Stage alone | What it tells you | What it doesn't | -|---|---|---| -| Querying | "Plant_B has worst OEE; Plant_A has the most anomalies" | Whether anomalies translate to scheduling priority | -| Graph alone | "Pumps are the most central machine type" | Which Pump matters most or what to do | -| Rules alone | "M013 is Critical-tier" | How to fit M013 plus 19 others into a 4-period horizon with 10 techs | -| Prescriptive alone | (won't run — no per-period failure curve, no centrality weight, no overdue flag) | Everything the pipeline would have produced | -| Resilience alone | (won't run — no schedule to analyze) | — | - -| Combined | Output | -|---|---| -| Querying → Graph | Per-machine signals plus structural bottleneck weight | -| + Rules | Composite risk tier + a hard "must maintain by period 2" constraint | -| + Prescriptive | $605K plan, 20 jobs, Plant_A's Critical machine handled in Period 1 | -| + Resilience | Cross-train T006 for $3,200 → eliminates Houston Turbine concentration | - -**Multi-reasoner chaining grounded in (and contributing to) the ontology.** - ---- - -## Adapting this recipe to a new domain - -The chain pattern transfers cleanly. To rebuild for a different scheduling-with-resilience problem: - -1. Re-run `rai-discovery` on the new business question — does it actually need all five reasoner families, or are one or two sufficient? A pure dispatch problem may only need querying + prescriptive; a pure compliance problem may only need rules. -2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). Keep the cross-product concept (`MachinePeriod`-equivalent) — it's where most decision variables and per-period derived properties live. -3. Stage 0 (querying) is required scaffolding: the optimization objective leans on derived per-period signals, not raw inputs. -4. 
Stages 1–4 are the load-bearing chain: graph centrality writes a multiplier the cost objective consumes; rules write a hard deadline the optimizer must satisfy; the optimizer writes solution variables the resilience sweep reads; the resilience layer doesn't re-solve a new problem — it stress-tests the structure underneath the existing solve and recommends a structural fix. -5. Keep the validation checks at every stage: assert flagged-set sizes, sanity-check the betweenness top-N, gate on OPTIMAL, assert a nonzero objective, and confirm the resilience pass surfaces at least one actionable recommendation when concentration exists. - -The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link. - ---- - -## Optional extension — operator-shift assignment - -A second optimization pass — assigning operators to line-shifts to maximize a skill-match bonus, subject to a same-facility constraint — is a natural follow-on but not part of this template's main script. It would consume the same `Machine` and facility ontology and add `Operator`, `Shift`, and `OperatorShift` concepts. Out of scope for this runbook. - ---- +| # | Skill + prompt | Expected output | +|---|----------------|-----------------| +| 0 | `/rai-discovery` "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?" | Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills. | +| 1 | `/rai-querying` "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?" | Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4%; 7 of 9 anomalies at Plant_A; `MachinePeriod.predicted_fp` written for 120 rows. 
| +| 2 | `/rai-graph-analysis` "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines." | 30 machines → 1 connected component; Pumps tie at top betweenness (24.0 raw, 1.0 normalized); `Machine.betweenness` stored. | +| 3 | `/rai-rules-authoring` "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard." | 6 overdue, 1 high-risk, 3 chronic; M013 (Pump, Plant_A) = Critical; M016 (Turbine, Plant_A) = Elevated. | +| 4 | `/rai-prescriptive-problem-formulation` "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel." | 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × criticality × (1 + 2.0 × betweenness)`. | +| 5 | `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Solve with HiGHS, then for each machine type check whether all qualified techs sit in one location and recommend the cheapest cross-training fix." | OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks. | -## Data Reference +## Data -- **Source data**: bundled CSVs in `../data/` (30 machines across 3 plants × 5 machine types, 10 technicians across 3 cities, 16 qualifications, 4 planning periods, 60 sensors with 240 readings, 120 per-period failure predictions, 129 downtime events, 120 production runs, 13 cross-training options). 
To run against your own Snowflake schema instead, swap the `read_csv(...)` loads for `model.Table(...)` references in `machine_maintenance.py`; the rest of the pipeline is unchanged. -- **Stages**: implemented in `../machine_maintenance.py` as a single combined script with stage banners (Stage 0 → Stage 4). -- **Ontology**: 14 user-facing concepts. Run `inspect.schema(model)` after the pipeline (see template README) to dump the full concept/property/relationship surface, filtering out reasoner-owned concepts (`Variable`, `Constraint`, etc.) and the auto-generated `graph_Edge` from Stage 1. +Bundled CSVs in `../data/`: 30 machines (3 plants × 5 types), 10 technicians (3 cities), 16 qualifications, 4 periods, 60 sensors / 240 readings, 120 failure predictions, 129 downtime events, 120 production runs, 13 training options. All five stages run in `../machine_maintenance.py`. diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md index 3df9efa..4a89259 100644 --- a/v1/portfolio_balancing/references/runbook.md +++ b/v1/portfolio_balancing/references/runbook.md @@ -1,12 +1,8 @@ # Runbook: Portfolio Balancing — Multi-Reasoner Walkthrough -Walk-through of the chained-reasoner pattern this template is built on. One realistic portfolio thread — **rebalance an 8-stock universe under compliance + crisis stress** — traced across rules, graph, and prescriptive reasoners, each stage writing properties back to the same ontology that downstream stages consume. +Rebalance an 8-stock book under compliance + crisis stress. Rules surface broken positions, graph collapses redundant bets via correlation clustering, prescriptive solves a Markowitz QP across 6 (budget, regime) scenarios. No single reasoner does all three: rules don't allocate, graph doesn't optimize, prescriptive on the full universe stacks near-duplicate cluster members. 
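One mechanism worth seeing up front: the crisis regime used in the QP is built by correlation shrinkage toward 1 in covariance units, `alpha * covar(i, j) + (1 - alpha) * vol_i * vol_j` with `alpha = 0.7`. Because the result is a convex combination of two positive semidefinite matrices, it stays PSD, and the diagonal (the variances) is unchanged. A pure-Python sketch on a toy 2-asset matrix (the numbers are illustrative, not the bundled data):

```python
import math

ALPHA = 0.7  # shrinkage weight from the template; 1.0 reproduces the base regime

def crisis_covar(cov):
    # Convex combination of the base covariance and the rank-1 matrix
    # vol_i * vol_j (all correlations = 1), so the result stays PSD and the
    # diagonal is unchanged: alpha*var + (1 - alpha)*vol**2 == var.
    n = len(cov)
    vols = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[ALPHA * cov[i][j] + (1 - ALPHA) * vols[i] * vols[j]
             for j in range(n)] for i in range(n)]

base = [[0.0100, 0.0024],
        [0.0024, 0.0064]]  # toy 2-asset covariance; correlation = 0.30
crisis = crisis_covar(base)
# Variances preserved; the off-diagonal rises to 0.00408,
# i.e. crisis correlation = 0.7 * 0.30 + 0.3 * 1.0 = 0.51.
```

The same construction, written as a PyRel property, is what Stage 3 stores as the regime-conditioned covariance.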
-The template's combined script (`portfolio_balancing.py`) implements all four stages directly; this runbook expands the surrounding narrative — what each prompt asks, what shape of output to expect, and how each enrichment feeds the next — so a reader can follow the reasoning thread end-to-end without re-running the script. - ---- - -## TL;DR — the chain in one screen +## The chain ``` The current book breaks compliance on 4 holdings + 2 sectors. Naive @@ -39,454 +35,17 @@ base at every lambda — without the cluster collapse, the gap would grow. ───────────────────────────────────────────────────────────────── ``` -A single-reasoner approach can't answer this. Rules alone flag broken positions but don't rebuild the book. Graph alone clusters stocks but doesn't allocate. Prescriptive alone solves a Markowitz QP — but on the full universe it stacks near-duplicate cluster members, and on a single covariance it ignores regime risk. Each stage feeds the next: rules surface the violations, graph collapses redundant bets, prescriptive optimizes over the reduced universe across both regimes in one solve. - ---- - -## How to read this runbook - -This runbook serves two audiences: - -- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. -- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data, and the agent will reproduce the stage. - ---- - -## Step 0 — Scope the question with `rai-discovery` - -> `/rai-discovery` "Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?" 
- -Discovery classifies the question by reasoner family and tells you which downstream skills to load: - -| Sub-question | Reasoner | Skill | -|---|---|---| -| Where is the current book breaking compliance (per-stock, per-sector, per-trader)? | Rules | `rai-rules-authoring` | -| Which stocks are really the same bet (correlation clusters + cluster representative)? | Graph | `rai-graph-analysis` | -| What's the optimal allocation under position + sector caps for each (budget, regime) scenario? | Prescriptive | `rai-prescriptive-problem-formulation` | -| How does the optimal portfolio degrade under a PSD-preserving crisis covariance? | Prescriptive (re-solve) | `rai-prescriptive-solver-management` + `rai-prescriptive-results-interpretation` | - -Discovery's output is a *plan*, not code. Everything that follows materializes that plan. - ---- - -## Setup - -See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `portfolio_balancing.py` against the bundled CSVs in `../data/`. - -**Prerequisites** - -- Template's `data/` CSVs available — `returns.csv` (8 stocks), `covar.csv` (64 covariance entries), `users.csv`, `accounts.csv`, `holdings.csv`, `transactions.csv`. Or your own Snowflake schema with equivalent tables. -- `raiconfig.yaml` pointing at your RAI engine -- Python >= 3.10 with `relationalai >= 1.0.14` - ---- - ## Workflow -The runbook walks the same chain stage-by-stage, prompt-by-prompt, in agent-skill order. Each row maps to a section of the script. - -| # | Step | Skill | Prompt | Expected Output | -|---|------|-------|--------|-----------------| -| 1 | Build ontology | `/rai-build-starter-ontology` | "Build a RAI ontology for portfolio balancing from the CSVs in `data/`. 
Concepts: Stock (index, ticker, sector, returns, covar), Sector (derived from Stock sectors), User (with risk_score), Account (balance, account_type), Holding (quantity, purchase_price, value), Transaction (amount, category, flagged indicator)." | Model `portfolio` with 6 user-facing concepts. 8 Stocks across 5 sectors (Technology x3, Healthcare x2, Financials, Energy, Consumer Staples). 6 Users, 4 Accounts, 15 Holdings, 21 Transactions. Pairwise `Stock.covar(i, j)` two-arg property loaded from the long-form covariance CSV (64 entries, symmetric). | -| 2 | Discovery | `/rai-discovery` | "What questions can we answer with this ontology? We want to surface compliance violations on the current book, then rebuild it under a Markowitz objective with crisis-regime stress." | Rules: position-limit, sector-concentration, high-risk-trader flags as derived Relationships. Graph: covariance clustering (Louvain on |correlation| >= threshold) -> per-cluster representative by Sharpe -> investable-universe collapse. Prescriptive: bi-objective QP via epsilon constraint over the representative-only universe, indexed by a `Scenario` Concept that combines budget and regime. Stress: same `solve_epsilon` call under a PSD-preserving crisis covariance. Predictive: DATA_GAP (no time series). | -| 3 | Stage 1 — Compliance: overconcentrated holdings | `/rai-rules-authoring` | "Define `Holding.value = quantity * purchase_price`, then flag `Holding.is_overconcentrated` whenever `value > POSITION_LIMIT * Account.balance` (default 15%)." | 4 holdings flagged: AAPL (Account 1, 18.0%), MSFT (Account 1, 16.0%), JNJ (Account 4, 16.0%), PFE (Account 4, 16.2%). All four sit in Account 1 (Alice Chen, balance $100K) or Account 4 (Dan Wilson, balance $80K) — both accounts hold multiple positions clipping the 15% cap.
| -| 4 | Stage 1 — Compliance: sector concentration | `/rai-rules-authoring` | "Aggregate `Holding.value` per (Account, Sector) and flag `Holding.is_sector_concentrated` whenever the sector total > `SECTOR_LIMIT * Account.balance` (default 30%)." | 2 (account, sector) pairs flagged: Account 1 Technology 34.0% (AAPL + MSFT), Account 4 Healthcare 32.2% (JNJ + PFE). Pattern: the same accounts flagged by the position-limit rule stack within a sector. Stage 3 enforces the same 30% cap as a hard constraint. | -| 5 | Stage 1 — Compliance: high-risk traders | `/rai-rules-authoring` | "Flag `User.is_high_risk_trader` whenever `risk_score > 0.8` AND > 5 flagged transactions." | 2 users flagged: Alice Chen (risk 0.85), Eve Taylor (risk 0.92). Both have multiple `Transaction.is_flagged == True` rows in the bundled data. Standalone signal — used as a reviewer trigger, not a solver constraint. | -| 6 | Stage 2 — Derive volatility + correlation in PyRel | `/rai-ontology-design` | "Add `Stock.variance` (covariance diagonal where i == j), `Stock.volatility = sqrt(variance)`, and pairwise `Stock.correlation(i, j) = covar(i, j) / (vol_i * vol_j)`. All in PyRel — no numpy precompute." | `Stock.variance` and `Stock.volatility` written for all 8 stocks. `Stock.correlation` two-arg property populated from the 64 covariance pairs. Volatilities span ~0.06 to ~0.10 (small dataset, daily-scale). Storing in the ontology lets every downstream stage read the same source of truth instead of recomputing. | -| 7 | Stage 2 — Cluster the correlation graph | `/rai-graph-analysis` | "Build an undirected `Graph` with `Stock` as `node_concept`. Add an edge wherever `|correlation| >= CORR_THRESHOLD` (default 0.3) and `i < j` (deduplicate symmetric pairs). Run Louvain and persist `Stock.cluster`." | 4 edges retained (8 stocks, sparse graph).
Louvain yields 5 communities: Cluster 1 = {JNJ, PFE} (Healthcare), Cluster 2 = {AAPL, MSFT, GOOGL} (Technology), plus singletons Cluster 3 = {JPM}, Cluster 4 = {PG}, Cluster 5 = {XOM}. Intra-cluster avg correlation = +0.683, inter-cluster = +0.131 (~5x separation — clean). | -| 8 | Stage 2 — Pick cluster representatives by Sharpe | `/rai-graph-analysis` | "Define `Stock.sharpe = returns / volatility`. For each cluster, the stock whose Sharpe equals the cluster max is the representative — set `Stock.is_representative`. Singletons are their own representative. Define `Stock.is_non_representative` as the positive complement (the prescriptive rewriter doesn't accept `model.not_(...)` inside a solver `.where()`)." | 5 representatives picked from 8 stocks: Cluster 1 PFE (Healthcare, Sharpe 0.530), Cluster 2 GOOGL (Technology, 0.605), Cluster 3 JPM (Financials, 0.500), Cluster 4 PG (Consumer Staples, 0.444), Cluster 5 XOM (Energy, 0.588). Investable universe shrinks from 8 to 5. Non-representatives — AAPL, MSFT, JNJ — get `Stock.is_non_representative` and are forced to zero in Stage 3. | -| 9 | Stage 3 — Scenario + regime + decision variable | `/rai-prescriptive-problem-formulation` | "Define `Regime` (`base`, `crisis`) and a `Scenario` Concept indexed by name with `budget` and `regime` properties. Load 6 scenarios = {500, 1000, 2000} x {base, crisis}. Add `Stock.regime_covar(i, j, Regime) = base covariance for base; alpha * covar(i,j) + (1 - alpha) * vol_i * vol_j for crisis` (PSD-preserving correlation shrinkage in covariance units, alpha = 0.7). Decision variable: `Stock.x_quantity(Stock, Scenario)` continuous." | 1 `Stock.x_quantity` property, 8 stocks x 6 scenarios = 48 continuous variables (18 of which, the 3 non-representatives across 6 scenarios, will be hard-zero from the representative-only constraint). Regime-conditioned covariance lives in the ontology — the QP picks each scenario's matching regime without any branching in solver code.
| -| 10 | Stage 3 — Compliance constraints on the decision variable | `/rai-prescriptive-problem-formulation` | "Add: non-negative (`x_qty >= 0`); fully invested per scenario (`sum(x) == Scenario.budget`); per-rep position cap (`x_qty <= REP_POSITION_LIMIT * Scenario.budget`, default 0.30); per-sector cap (`sum(x_qty per sector) <= SECTOR_LIMIT * Scenario.budget`, default 0.30); representative-only (`x_qty == 0` where `Stock.is_non_representative()`)." | 5 constraint families. `REP_POSITION_LIMIT = 0.30` is intentionally higher than Stage 1's `POSITION_LIMIT = 0.15`: a representative carries its cluster's combined exposure, and feasibility requires `REP_POSITION_LIMIT * num_reps >= 1.0` (5 x 0.30 = 1.5, OK). `SECTOR_LIMIT = 0.30` is reused verbatim from Stage 1's compliance threshold — same parameter binds the existing book and the rebuilt one. | -| 11 | Stage 3 — Anchors + epsilon sweep | `/rai-prescriptive-solver-management` | "Anchor 1: minimize risk (no return constraint). Anchor 2: maximize return. Compute return-rate range across all 6 scenarios. Then sweep 5 interior epsilon-rate points uniformly across the rate range and re-solve `min risk s.t. return_rate >= eps_rate`. Use Ipopt, time limit 60s." | Per-scenario rate range: [0.0634, 0.0840] per unit invested (regime-independent because `Stock.returns` is regime-independent — only the covariance differs). 7 solves per scenario (2 anchors + 5 epsilon points) = 42 optimal portfolios (6 scenarios x 7 points), all `LOCALLY_SOLVED`. base_500 anchor 1: return 32.43, risk 1160.39. base_500 anchor 2: return 42.00. crisis_500 anchor 1: return 31.69, risk 1913.60 — same investable universe, different regime covariance. | -| 12 | Stage 3 — Pareto + knee | `/rai-prescriptive-results-interpretation` | "For each scenario, list the 7-point frontier (return, risk). Print marginal `delta_risk / delta_return` between adjacent points and identify the knee — where the marginal jumps most."
| base_500: returns 32.43 -> 33.41 -> 35.12 -> 36.84 -> 38.56 -> 40.28; risk 1160 -> 1177 -> 1263 -> 1386 -> 1546 -> 1742. Marginal climbs 16.85 -> 49.94 -> 71.72 -> 93.03 -> 114.43. Knee at Point 2 (`eps_1`) — marginal cost jumps ~3x beyond. base_1000 / base_2000 / crisis_* show the same shape (risk scales as budget^2 because the QP is quadratic, but the rate-form frontier is identical). | -| 13 | Stage 4 — Crisis stress comparison | `/rai-prescriptive-results-interpretation` | "From the Stage 3 sweep, emit a side-by-side `vol_base` vs `vol_crisis` table per (budget, lambda). Vol = sqrt(risk). Compute the absolute and percentage gap." | Same 7-point frontier resolved at each budget x regime. Crisis vol sits ~25-30% above base at every lambda. Budget 500: at `min_risk` vol_base 34.06 / vol_crisis 43.74 (+28.4%); at `eps_1` 34.30 / 44.54 (+29.8%); peaks at `eps_1`-`eps_2`; narrows to +25.2% at `eps_5`. The gap peaks mid-frontier and narrows toward the concentrated end — the inversion is the payoff of the cluster collapse: at the concentrated end the optimizer holds the highest-Sharpe distinct bet per cluster (weighted toward Energy/Consumer Staples here, which carry lower crisis correlations than the middle of the frontier). Without the representative collapse, the concentrated end would stack near-duplicates and the crisis gap would grow instead of shrink. | - ---- - -## Stage 1 — Rules: compliance scan - -> `/rai-rules-authoring` "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions." 
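Outside the RAI stack, the three thresholds in this prompt can be prototyped with a few pandas group-bys. A minimal sketch on hypothetical mini-book data (column names and figures assumed for illustration, not the template's PyRel rules):

```python
import pandas as pd

# Hypothetical mini-book; the bundled CSVs carry the same shape at larger scale.
holdings = pd.DataFrame({
    "account": ["A1", "A1", "A1", "A4", "A4"],
    "symbol":  ["AAPL", "MSFT", "JPM", "JNJ", "PFE"],
    "sector":  ["Technology", "Technology", "Financials", "Healthcare", "Healthcare"],
    "value":   [18_000, 16_000, 10_000, 12_800, 13_000],
})
balances = {"A1": 100_000, "A4": 80_000}
users = pd.DataFrame({"name": ["Alice", "Bob"],
                      "risk_score": [0.85, 0.40],
                      "flagged_txns": [7, 1]})

holdings["balance"] = holdings["account"].map(balances)

# Rule 1: position worth more than 15% of its account balance.
holdings["is_overconcentrated"] = holdings["value"] > 0.15 * holdings["balance"]

# Rule 2: the account's per-sector total worth more than 30% of the balance.
sector_total = holdings.groupby(["account", "sector"])["value"].transform("sum")
holdings["is_sector_concentrated"] = sector_total > 0.30 * holdings["balance"]

# Rule 3: risk score above 0.8 AND more than five flagged transactions.
users["is_high_risk_trader"] = (users["risk_score"] > 0.8) & (users["flagged_txns"] > 5)

print(holdings.loc[holdings["is_overconcentrated"], "symbol"].tolist())
# -> ['AAPL', 'MSFT', 'JNJ', 'PFE']
```

The PyRel version below writes these flags back as ontology properties instead of DataFrame columns; the predicate logic is the same.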
- -``` -COMPLIANCE VIOLATIONS — current book (4 accounts, 15 holdings, 6 users) - - Rule 1: Holding.is_overconcentrated (position > 15% of balance) - ──────────────────────────────────────────────────────────────── - AAPL Account 1 $18,000 / $100,000 18.0% ─── Alice Chen - MSFT Account 1 $16,000 / $100,000 16.0% - JNJ Account 4 $12,800 / $80,000 16.0% ─── Dan Wilson - PFE Account 4 $13,000 / $80,000 16.2% - - Rule 2: Holding.is_sector_concentrated (sector > 30% of balance) - ──────────────────────────────────────────────────────────────── - Account 1 Technology $34,000 / $100,000 34.0% - Account 4 Healthcare $25,800 / $80,000 32.2% - - Rule 3: User.is_high_risk_trader (risk_score > 0.8 AND >5 flagged txns) - ──────────────────────────────────────────────────────────────── - Alice Chen risk_score 0.85 ── flagged transactions - Eve Taylor risk_score 0.92 - - ────────────────────────────────────────────────────────────────── - The same accounts that breach the per-stock cap also breach the - sector cap — Stage 3 will use the SECTOR_LIMIT (0.30) as a hard - constraint when rebuilding both books. The trader flag is a - reviewer signal, not a solver input. - ────────────────────────────────────────────────────────────────── - - Holding.is_overconcentrated [4] - Holding.is_sector_concentrated [2] - User.is_high_risk_trader [2] -``` - -`POSITION_LIMIT` (0.15) and `SECTOR_LIMIT` (0.30) are top-level constants. `SECTOR_LIMIT` is reused verbatim by Stage 3; `POSITION_LIMIT` is replaced in Stage 3 by `REP_POSITION_LIMIT = 0.30` because a representative carries its cluster's combined exposure (and 5 reps x 0.20 = 1.00 would already pin the budget — 0.30 leaves headroom). - ---- - -## Stage 2 — Graph: covariance clustering + cluster representatives - -> `/rai-graph-analysis` "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization." 
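The cluster-then-pick-representative logic can be sketched with networkx on a toy correlation matrix (values hypothetical; the template derives `Stock.correlation` in PyRel and runs Louvain inside RAI):

```python
import numpy as np
import networkx as nx

# Hypothetical correlations over 5 tickers, not the template's covar.csv.
tickers = ["AAPL", "MSFT", "GOOGL", "JNJ", "PFE"]
rho = np.array([
    [1.00, 0.75, 0.70, 0.10, 0.05],
    [0.75, 1.00, 0.65, 0.08, 0.06],
    [0.70, 0.65, 1.00, 0.12, 0.09],
    [0.10, 0.08, 0.12, 1.00, 0.60],
    [0.05, 0.06, 0.09, 0.60, 1.00],
])
sharpe = {"AAPL": 0.58, "MSFT": 0.56, "GOOGL": 0.61, "JNJ": 0.50, "PFE": 0.53}

G = nx.Graph()
G.add_nodes_from(tickers)
for i in range(len(tickers)):
    for j in range(i + 1, len(tickers)):
        if abs(rho[i, j]) >= 0.3:          # redundant-bet threshold
            G.add_edge(tickers[i], tickers[j])

communities = nx.community.louvain_communities(G, seed=42)
# One representative per community: the highest-Sharpe member.
reps = sorted(max(c, key=sharpe.get) for c in communities)
print(reps)  # -> ['GOOGL', 'PFE']
```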
- -**Construction** — undirected, unweighted graph: -- Node concept: `Stock` (8 nodes) -- Edges built from the derived `Stock.correlation(i, j)` property where `|correlation| >= 0.3` and `i < j` -- Aggregator: `"sum"` (no parallel edges expected) - -**Algorithm:** `louvain()` for community detection. - -**Volatility, correlation, and crisis covariance are all PyRel derived properties** — no numpy precompute. The covariance matrix loaded from `covar.csv` is the only solver input not derived from another property. - -``` -DERIVED IN PYREL - Stock.variance <- covar(i, j) where i == j - Stock.volatility <- sqrt(variance) - Stock.correlation(i,j) <- covar(i, j) / (vol_i * vol_j) - -CORRELATION GRAPH - Edges with |correlation| >= 0.30: 4 - Stocks above threshold pairwise: {AAPL,MSFT}, {AAPL,GOOGL}, - {MSFT,GOOGL}, {JNJ,PFE} - -LOUVAIN COMMUNITIES 5 clusters - Cluster 1 (size 2): JNJ (Healthcare), PFE (Healthcare) - Cluster 2 (size 3): AAPL (Technology), MSFT (Technology), - GOOGL (Technology) - Cluster 3 (size 1): JPM (Financials) ← singleton - Cluster 4 (size 1): PG (Consumer Staples) ← singleton - Cluster 5 (size 1): XOM (Energy) ← singleton - - Avg correlation: intra-cluster = +0.683 - inter-cluster = +0.131 ── ~5x separation -``` - -Singletons (Cluster 3-5) are their own representatives. The non-trivial choice happens inside Cluster 1 (JNJ vs PFE) and Cluster 2 (AAPL/MSFT/GOOGL): - -``` -REPRESENTATIVE = HIGHEST SHARPE PER CLUSTER (returns / volatility) - - Cluster 1 (Healthcare): - JNJ Sharpe 0.500 PFE Sharpe 0.530 ← REP - - Cluster 2 (Technology): - AAPL Sharpe 0.582 MSFT Sharpe 0.560 - GOOGL Sharpe 0.605 ← REP - - Cluster 3 (Financials): JPM Sharpe 0.500 ← REP (singleton) - Cluster 4 (Consumer Staples): PG Sharpe 0.444 ← REP (singleton) - Cluster 5 (Energy): XOM Sharpe 0.588 ← REP (singleton) - - ────────────────────────────────────────────────────────────────── - Investable universe collapses 8 -> 5. 
- AAPL, MSFT, JNJ get Stock.is_non_representative — Stage 3 forces - their decision variables to zero. - - This is "collapse, don't cap" — Stage 3 doesn't allow the full 8 - with caps inside a redundant cluster; it removes the duplicates - before the optimizer sees them. - ────────────────────────────────────────────────────────────────── - - ✓ Stock.variance / volatility / correlation written back [8 / 8 / 64] - ✓ Stock.cluster, Stock.sharpe, Stock.cluster_max_sharpe [8 each] - ✓ Stock.is_representative [5] - ✓ Stock.is_non_representative [3] -``` - ---- - -## Stage 3 — Prescriptive: bi-objective QP with epsilon constraint - -> `/rai-prescriptive-problem-formulation` "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets × 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range." - -``` -FORMULATION - - Decision variable - Stock.x_quantity(Stock, Scenario) continuous, >= 0 - 8 stocks x 6 scenarios = 48 vars - 18 forced to 0 by Stock.is_non_representative() (3 non-reps x 6 scenarios) - 30 active = 5 representatives x 6 scenarios - - Scenarios (3 budgets x 2 regimes = 6 tuples) - base_500 base_1000 base_2000 crisis_500 crisis_1000 crisis_2000 - - Constraints (per scenario) - 1. Non-negative x_qty >= 0 - 2. Fully invested sum(x_qty) == Scenario.budget - 3. Per-rep position x_qty <= 0.30 * Scenario.budget - 4. Per-sector sum(x_qty per sector) <= 0.30 * Scenario.budget - 5.
Representative-only x_qty == 0 where is_non_representative - - Risk objective (regime-aware) - minimize Sigma_ij regime_covar(i, j, Scenario.regime) * x_i * x_j - └── PyRel-derived per regime; PSD-preserving for crisis ──┘ - - Return constraint (epsilon, scaled by budget) - sum(Stock.returns * x_qty) >= eps_rate * Scenario.budget - -────────────────────────────────────────────────────────────────────── -SOLVE (Ipopt, time limit 60s) → LOCALLY_SOLVED - Anchor 1 (min risk) + Anchor 2 (max return) + 5 epsilon points - = 7 solves, 42 optimal portfolios (one per scenario per point) -────────────────────────────────────────────────────────────────────── - -ANCHOR 1 — minimize risk (no return floor) - base_500 return 32.43 risk 1,160.39 - base_1000 return 64.87 risk 4,641.57 - base_2000 return 129.73 risk 18,566.28 - crisis_500 return 31.69 risk 1,913.60 ← higher risk, same universe - crisis_1000 return 63.37 risk 7,654.40 - crisis_2000 return 126.75 risk 30,617.59 - -ANCHOR 2 — maximize return - base_500 / crisis_500 return 42.00 - base_1000 / crisis_1000 return 84.00 - base_2000 / crisis_2000 return 168.00 - (returns are regime-independent) - -Return-rate range [0.0634, 0.0840] per unit invested -Epsilon sweep 5 interior points evenly spaced across the range -``` - ---- - -## Stage 3 — Reading the frontier (per scenario) - -> `/rai-prescriptive-results-interpretation` "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most? Is the rate-form frontier shape consistent across budgets?" 
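The anchor-plus-epsilon-sweep recipe in the formulation above can be reproduced outside Ipopt with scipy's SLSQP on a toy universe. All numbers here are hypothetical (four representatives instead of five), so only the mechanics carry over:

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.080, 0.063, 0.070, 0.075])        # hypothetical returns per unit
cov = np.array([[0.040, 0.006, 0.004, 0.005],
                [0.006, 0.020, 0.002, 0.003],
                [0.004, 0.002, 0.030, 0.004],
                [0.005, 0.003, 0.004, 0.035]])
budget = 500.0
cap = 0.30 * budget                                # cap * 4 reps = 600 >= budget: feasible

def min_risk(eps_rate=None):
    """Minimize variance, optionally s.t. return_rate >= eps_rate (epsilon constraint)."""
    cons = [{"type": "eq", "fun": lambda x: x.sum() - budget}]
    if eps_rate is not None:
        cons.append({"type": "ineq", "fun": lambda x: mu @ x - eps_rate * budget})
    res = minimize(lambda x: x @ cov @ x, np.full(4, budget / 4),
                   bounds=[(0.0, cap)] * 4, constraints=cons, method="SLSQP")
    assert res.success
    return res.x

# Anchor 1: unconstrained min-risk. Anchor 2: max return = fill caps greedily by mu.
x_min = min_risk()
x_max = np.zeros(4)
for i in np.argsort(mu)[::-1]:
    x_max[i] = min(cap, budget - x_max.sum())
lo, hi = mu @ x_min / budget, mu @ x_max / budget   # return-rate range

# Sweep 5 interior epsilon points; risk is non-decreasing along the frontier.
frontier = [float(x @ cov @ x) for eps in np.linspace(lo, hi, 7)[1:-1]
            for x in [min_risk(eps)]]
assert all(r1 <= r2 + 1e-4 * (1 + r2) for r1, r2 in zip(frontier, frontier[1:]))
```

The template's actual solve does this once across all (budget, regime) scenarios in a single call; the sketch loops per epsilon point instead.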
- -``` -EFFICIENT FRONTIER — base_500 (budget = 500, regime = base) - - # Label Return Risk - ──────────────────────────────────────── - 1 min_risk 32.43 1,160.39 - 2 eps_1 33.41 1,176.78 ← KNEE - 3 eps_2 35.12 1,262.61 - 4 eps_3 36.84 1,385.89 - 5 eps_4 38.56 1,545.79 - 6 eps_5 40.28 1,742.47 - 7 max_return 42.00 (separate anchor) - - Marginal delta_risk / delta_return: - min_risk → eps_1 16.85 - eps_1 → eps_2 49.94 ← +3.0x ── KNEE - eps_2 → eps_3 71.72 - eps_3 → eps_4 93.03 - eps_4 → eps_5 114.43 - - ────────────────────────────────────────────────────────────────── - base_1000 / base_2000 / crisis_* show the SAME shape — risk scales - as budget^2 (the QP is quadratic in x), but the rate-form frontier - and the knee location are budget-independent. - ────────────────────────────────────────────────────────────────── - - ✓ Stock.x_quantity written back, indexed by (Stock, Scenario) -``` - ---- - -## Stage 4 — Crisis stress test - -> `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point — does the gap peak mid-frontier or at the concentrated end?" - -Same `solve_epsilon` call, no separate model — `Scenario.regime` selects between two `Stock.regime_covar` definitions: - -- `base`: Sigma(i, j) -- `crisis`: alpha * Sigma(i, j) + (1 - alpha) * vol_i * vol_j (alpha = 0.7) - -The crisis formula is correlation shrinkage toward all-ones (`rho_crisis = alpha * rho + (1 - alpha) * J`) re-expressed in covariance units. PSD is preserved by construction (convex combination of PSD matrices), so every lambda solves cleanly. 
- -``` -VOLATILITY COMPARISON vol = sqrt(risk) - - Budget 500: - Label vol_base vol_crisis gap gap_% - ──────────────────────────────────────────────────── - min_risk 34.06 43.74 +9.68 +28.4% - eps_1 34.30 44.54 +10.24 +29.8% ← peak - eps_2 35.53 46.11 +10.58 +29.8% - eps_3 37.23 47.94 +10.72 +28.8% - eps_4 39.32 49.99 +10.68 +27.2% - eps_5 41.74 52.27 +10.53 +25.2% - - Budget 1000 + Budget 2000: same gap_% pattern (vol scales with budget; - the rate-form gap is budget-independent). - - ────────────────────────────────────────────────────────────────── - Crisis vol sits 25-30% above base at EVERY lambda. The gap PEAKS in - the middle of the frontier (eps_1..eps_2 at +29.8%) and NARROWS at - the concentrated end (eps_5 at +25.2%). - - Why the inversion: at the concentrated end the optimizer is picking - the highest-Sharpe distinct bet per cluster — the bundled data - weights this toward Energy + Consumer Staples, which happen to - carry lower crisis correlations than the middle of the frontier. - Without the representative collapse, the concentrated end would - stack near-duplicates and the crisis gap would GROW, not shrink. - ────────────────────────────────────────────────────────────────── - - ✓ Stock.regime_covar written back (64 base + 64 crisis = 128 entries) -``` - -**Why `Scenario` is a Concept, not a loop:** all 6 (budget, regime) tuples solve in a single call to the solver, against the matching `regime_covar`. Adding a fourth regime or a fifth budget is a data edit in `scenario_data`, not a change to `solve_epsilon`. Scenarios are data. 
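The PSD claim is cheap to verify numerically. A small numpy sketch with made-up volatilities and correlations (not the template's `covar.csv`):

```python
import numpy as np

vol = np.array([0.20, 0.25, 0.30])                 # hypothetical per-stock volatilities
rho = np.array([[1.00, 0.90, 0.80],
                [0.90, 1.00, 0.85],
                [0.80, 0.85, 1.00]])
base = np.outer(vol, vol) * rho                    # base covariance (PSD)

alpha = 0.7                                        # CRISIS_ALPHA
# Correlation shrinkage toward all-ones, expressed in covariance units:
# cov'(i,j) = alpha * cov(i,j) + (1 - alpha) * vol_i * vol_j
crisis = alpha * base + (1 - alpha) * np.outer(vol, vol)
assert np.linalg.eigvalsh(crisis).min() >= -1e-12  # convex combo of PSD matrices

# Naive 1.5x scaling of the off-diagonals pushes implied correlations above 1
# and breaks PSD, the failure mode the Crisis Regime Construction section calls out.
naive = base.copy()
naive[~np.eye(3, dtype=bool)] *= 1.5
assert np.linalg.eigvalsh(naive).min() < 0         # indefinite: Ipopt would reject it
```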
- ---- - -## Stage outputs — what each reasoner contributes back - -``` -ONTOLOGY ENRICHMENT — what each stage wrote back - - Stage 1 (rules) Holding.value [15] - Holding.is_overconcentrated [4] - Holding.is_sector_concentrated [2] - User.is_high_risk_trader [2] - - Stage 2 (graph) Stock.variance [8] - Stock.volatility [8] - Stock.correlation (i, j) [64] - Stock.cluster [8] - Stock.sharpe [8] - Stock.cluster_max_sharpe [8] - Stock.is_representative [5] - Stock.is_non_representative [3] - - Stage 3 (prescriptive) Stock.regime_covar (i, j, Regime) [128] - Stock.x_quantity (Stock, Scenario) [48] - - Stage 4 (stress) (terminal — vol_base vs vol_crisis table) - - ────────────────────────────────────────────────────────────────── - Each stage reads what the previous stage wrote. - Re-running any downstream stage automatically picks up enrichments. - No glue code, no DataFrame round-trip — same ontology throughout. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## The chain — accretive ontology enrichment - -``` -THE PORTFOLIO-BALANCING CHAIN - - STAGE 1 RULES - "Where is the current book breaking compliance?" - reads: Holding, Account, User, Transaction, Stock.sector - writes: Holding.value - Holding.is_overconcentrated / is_sector_concentrated - User.is_high_risk_trader - │ - ▼ - STAGE 2 GRAPH (Louvain) - "Which stocks are really the same bet?" - reads: Stock.covar (loaded), Stock.returns - writes: Stock.variance / volatility / correlation (i, j) - Stock.cluster ── 5 communities - Stock.sharpe / cluster_max_sharpe - Stock.is_representative ── 5 of 8 stocks - Stock.is_non_representative ── forced to zero in Stage 3 - │ - ▼ - STAGE 3 PRESCRIPTIVE (Ipopt QP) - "What's the optimal allocation under position + sector caps, - for each (budget, regime) scenario?" 
- reads: Stock.is_representative ──► decision-variable scope - Stock.is_non_representative ──► hard-zero constraint - Stock.returns ──► epsilon return target - Stock.regime_covar ──► quadratic risk objective - Scenario.budget / regime ──► per-scenario constraints - SECTOR_LIMIT (= Stage 1's) ──► hard sector cap - writes: Stock.regime_covar (PSD-preserving) [base + crisis] - Stock.x_quantity (Stock, Scenario) [42 active portfolios] - │ - ▼ - STAGE 4 STRESS (same solver, regime swap) - "How does the optimal portfolio degrade under a crisis covariance?" - reads: Stock.regime_covar (regime = "crisis") - Stage 3's pareto results - writes: (terminal — vol gap table) - - ────────────────────────────────────────────────────────────────── - No glue. No DataFrame ping-pong. No re-derivation per-reasoner. - Three reasoner families, one ontology, one accretive thread. - Stage 4 is a regime swap on the same solve — not a separate model. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## Why the chain matters (vs. 
any single stage) - -| Stage alone | What it tells you | What it doesn't | -|---|---|---| -| Rules alone | "4 holdings + 2 sectors break compliance" | How to rebuild the book | -| Graph alone | "AAPL/MSFT/GOOGL move together; JNJ/PFE move together" | Which to keep, how much to allocate | -| Prescriptive alone (full universe) | A "diversified" frontier that stacks near-duplicates inside a cluster | That two of those names are one bet — and that the crisis gap will grow under stress | -| Prescriptive alone (single regime) | A base-case efficient frontier | What it costs you when correlations spike | - -| Combined | Output | -|---|---| -| Rules → Graph | Compliance violations + redundant-bet map | -| + Prescriptive (rep-only universe) | 7-point Pareto frontier per scenario; knee at eps_1 | -| + Stress (Scenario.regime swap) | Crisis vol 25-30% above base, gap narrows at concentrated end (the cluster-collapse payoff) | - -**Multi-reasoner chaining grounded in (and contributing to) the ontology.** - ---- - -## Crisis Regime Construction - -**Do not** naively scale off-diagonal covariance by a constant — it frequently breaks positive semidefiniteness, the lambda=0 pure min-variance solve fails (Ipopt rejects non-convex QP), and the whole frontier anchors wrong. PSD-preserving alternatives: - -| Approach | Formula | Preserves PSD? | -|----------|---------|----------------| -| **Correlation shrinkage toward all-ones** (this template) | `rho' = alpha * rho + (1 - alpha) * J`, alpha in [0.6, 0.9]. 
In covariance units: `cov'(i,j) = alpha * cov(i,j) + (1 - alpha) * vol_i * vol_j` | Yes (convex combination of PSD matrices) | -| Eigenvalue flooring | Eigendecompose, replace lambda_i with max(lambda_i, eps), recompose | Yes by construction | -| Scale off-diagonals + PSD projection | Scale, then find nearest PSD matrix via eigendecomposition | Yes after projection | -| ~~Uniform 1.5x off-diagonals~~ | `Sigma'[i,j] = 1.5 * Sigma[i,j]` for i != j | **No** — frequently non-PSD | - -`CRISIS_ALPHA = 0.7` produces a clearly "crisis-like" regime (intra-cluster correlations bump toward 1) while keeping the QP well-conditioned at every lambda. Lower values (0.5-0.6) give more severe crises; values > 0.9 get close to the base case. - ---- - -## Data Reference - -- **Source data**: bundled CSVs in `../data/` (8 stocks across 5 sectors with 64-entry symmetric covariance, 6 users, 4 accounts, 15 holdings, 21 transactions). To run against your own Snowflake schema instead, swap the `read_csv(...)` loads for `model.Table(...)` references in `portfolio_balancing.py`; the rest of the pipeline is unchanged. -- **Stages**: implemented in `../portfolio_balancing.py` as a single combined script with stage banners (Stage 1 → Stage 4). -- **Ontology**: 6 user-facing concepts (`Stock`, `Sector`, `User`, `Account`, `Holding`, `Transaction`) plus the `Regime` and `Scenario` concepts introduced in Stage 3. Run `inspect.schema(model)` after the pipeline (see template README) to dump the full concept/property/relationship surface, filtering out reasoner-owned concepts (`Variable`, `Constraint`, etc.) and the auto-generated `graph_Edge` from Stage 2. - ---- - -## Adapting this recipe to a new domain - -The chain pattern transfers cleanly. To rebuild for a different problem: +| # | Skill + prompt | Expected output | +|---|----------------|-----------------| +| 0 | `/rai-discovery` "Our 8-stock book breaks compliance and concentrates risk. 
Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?" | Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve. | +| 1 | `/rai-rules-authoring` "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions." | 4 holdings flagged (AAPL/MSFT on Account 1, JNJ/PFE on Account 4); 2 (account, sector) pairs flagged (Account 1 Tech 34.0%, Account 4 Healthcare 32.2%); 2 users flagged (Alice Chen 0.85, Eve Taylor 0.92). | +| 2 | `/rai-graph-analysis` "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization." | 4 edges (\|rho\| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`. | +| 3 | `/rai-prescriptive-problem-formulation` "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range." | 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt. | +| 4 | `/rai-prescriptive-results-interpretation` "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most?" | base_500 frontier: returns 32.43 -> 40.28, risk 1160 -> 1742. Marginal `delta_risk/delta_return` jumps ~3x at eps_1 (knee).
Same shape across all 6 scenarios — risk scales as budget^2, rate-form frontier is budget-independent. | +| 5 | `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?" | Crisis vol +25-30% above base at every lambda (budget 500: min_risk 34.06 -> 43.74, eps_1 34.30 -> 44.54 peak). Gap peaks mid-frontier, narrows to +25.2% at eps_5 — the cluster-collapse payoff. | -1. Re-run `rai-discovery` on the new business question — does it actually need all four reasoner roles (rules, graph, prescriptive, prescriptive re-solve), or is one or two sufficient? A pure compliance audit stops after Stage 1; a rebalancer without redundant bets in the universe can skip Stage 2. -2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). For portfolio variants, the load-bearing concepts are the asset entity (here `Stock`), its pairwise covariance, an entity that holds compliance thresholds (here `Account`), and the `Scenario` Concept that parameterizes the optimizer. -3. Stage 1 (rules) is where every threshold the optimizer will later enforce gets named once and reused — keep `SECTOR_LIMIT` (or its equivalent) shared between the compliance scan and the prescriptive constraints, so the rebuilt book obeys the same caps the diagnostic flagged. -4. Stages 2–4 are the load-bearing chain: Graph collapses redundant bets via clustering + per-cluster representative selection, writing `is_representative` and `is_non_representative` flags the optimizer reads. Prescriptive uses those flags to scope the decision variable and adds the position + sector caps.
The stress-test stage is the *same solver call* under a regime-swapped covariance — `Scenario.regime` and `Stock.regime_covar(i, j, Regime)` keep base and crisis as data, not separate models. -5. Keep the validation checks at every stage: assert flagged-set size, the cluster count and intra-vs-inter average correlation gap, anchor solves return `LOCALLY_SOLVED`, the return-rate range is non-degenerate, and `REP_POSITION_LIMIT * num_representatives >= 1.0` so the fully-invested constraint stays feasible. -6. When constructing a stress regime, never naively scale off-diagonal covariance — eigenvalue flooring, projection to nearest PSD, or correlation shrinkage toward all-ones (this template) all preserve PSD; arbitrary scaling does not, and the min-risk anchor will fail. +## Data -The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link. +Bundled CSVs in `../data/`: `returns.csv` (8 stocks across 5 sectors), `covar.csv` (64 symmetric covariance entries), plus `users.csv` (6), `accounts.csv` (4), `holdings.csv` (15), `transactions.csv` (21). All four stages run in `../portfolio_balancing.py`. diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md index 877c451..c7a68ad 100644 --- a/v1/supply_chain_resilience/references/runbook.md +++ b/v1/supply_chain_resilience/references/runbook.md @@ -1,12 +1,8 @@ # Runbook: Supply Chain Resilience — Multi-Reasoner Walkthrough -Walk-through of the chained-reasoner pattern this template is built on. One realistic business thread — **risk-adjusted network flow with disruption scenarios** — traced across four RAI reasoning stages, each writing properties back to the same ontology that downstream stages consume. +Risk-adjusted network flow with disruption scenarios, traced across four RAI reasoning stages. 
Each stage writes properties back to the same ontology that downstream stages consume, so the optimizer can hard-block bad suppliers, surcharge watch suppliers, and weight bottleneck hubs using upstream graph and rules signals. -The template's combined script (`supply_chain_resilience.py`) implements all four stages directly. This runbook expands the surrounding narrative — what each stage finds, why the next stage needs it, what the optimizer does with the enrichment — so a non-OR reader can follow the full reasoning thread end-to-end. - ---- - -## TL;DR — the chain in one screen +## The chain ``` Two HIGH-priority customers depend on 6 upstream suppliers — one of which @@ -37,370 +33,17 @@ watch->avoid downgrade = +0.0% (optimizer already routed around it). ───────────────────────────────────────────────────────────────── ``` -A single-reasoner approach can't answer this. Reachability alone names the suppliers in scope but doesn't rank them. Graph alone ranks hubs but doesn't decide flow. Rules alone classifies suppliers but doesn't route around them. Prescriptive alone has no way to hard-block bad suppliers, surcharge watch suppliers, or penalize bottleneck hubs without those upstream signals. - ---- - -## How to read this runbook - -This runbook serves two audiences: - -- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. -- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data in `../data/`, and the agent will reproduce the stage. - -The two views share one ontology: each stage writes properties back to the model that the next stage reads. The skill blocks are how an engineer rebuilds the chain link-by-link; the narrative is how a stakeholder reads what the chain produced. 
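Stage 0's blast-radius question reduces to graph ancestry. A toy networkx sketch with hypothetical shipment edges (the template derives the real edges from `Shipment.supplier` -> `Shipment.customer`):

```python
import networkx as nx

# Hypothetical supplier -> customer shipment edges.
ships_to = [
    ("CellChem", "PowerCell"), ("PowerCell", "TechAssembly"),
    ("WaferTech", "ChipTech"), ("ChipTech", "TechAssembly"),
    ("TechAssembly", "MegaCorp"),
]
suppliers = {"CellChem", "PowerCell", "WaferTech", "ChipTech", "TechAssembly"}

G = nx.DiGraph(ships_to)
# Upstream dependency cone of a HIGH-priority customer = all ancestors in the graph.
upstream = nx.ancestors(G, "MegaCorp") & suppliers
print(sorted(upstream))
# -> ['CellChem', 'ChipTech', 'PowerCell', 'TechAssembly', 'WaferTech']
```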
- ---- - -## Step 0 — Scope the question with `rai-discovery` - -> `/rai-discovery` "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?" - -Discovery classifies the question by reasoner family and tells you which downstream skills to load: - -| Sub-question | Reasoner | Skill | -|---|---|---| -| "Which suppliers do my high-priority customers transitively depend on?" | Graph (reachability) | `rai-graph-analysis` | -| "Which sites are network bottlenecks?" | Graph (centrality, components) | `rai-graph-analysis` | -| "Which suppliers are risky? Which demands are escalated?" | Rules | `rai-rules-authoring` | -| "What's the minimum-cost flow plan that respects all of the above, and how much do disruptions cost?" | Prescriptive | `rai-prescriptive-problem-formulation`, `rai-prescriptive-solver-management`, `rai-prescriptive-results-interpretation` | - -Discovery's output is a *plan*, not code. Everything that follows materializes that plan. - ---- - -## Setup - -See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `supply_chain_resilience.py` against the bundled CSVs in `../data/`. - ---- - -## Stage 0 — Reachability: blast-radius pre-analysis - -> `/rai-graph-analysis` "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores." - -**Construction** — directed `Business` graph, edges from `Business.ships_to` (derived from `Shipment.supplier` -> `Shipment.customer`). - -**Targets** — `Business.is_high_priority_customer` is set wherever a `Demand` with `priority == "HIGH"` is placed by that business. 
From the bundled data, this fires for **2 buyers**: B008 MegaCorp Enterprise and B009 TechGiant Inc (9 HIGH-priority demands between them, all for ProPhone X1 / ProTab T1). - -**Algorithm** — `biz_graph.reachable(to=target_customer)` filtered to nodes with `business_type == "SUPPLIER"`. - -``` -Upstream supplier dependencies (HIGH-priority customers) - - MegaCorp Enterprise (B008) depends on 6 suppliers: - - CellChem China (reliability 78%) ← will become AVOID - - EuroCell Poland (reliability 88%) - - EuroChip Germany (reliability 94%) - - EuroDisplay Czech (reliability 91%) - - GlassCorp Korea (reliability 89%) - - WaferTech Taiwan (reliability 97%) - - TechGiant Inc (B009) depends on 6 suppliers: (same set) - - ────────────────────────────────────────────────────────────────── - Both HIGH-priority customers share the same 6-supplier upstream. - CellChem (B017) sits in BOTH dependency cones — Stage 2 will flag - it AVOID, and Stage 3's baseline already excludes it. - ────────────────────────────────────────────────────────────────── - -✓ Business.is_high_priority_customer written back (2 buyers) -``` - -The point of running reachability before the MILP: when the scenario in Stage 3 downgrades watch suppliers to avoid, the cost delta has to be read against this dependency map. If a downgraded supplier sits in zero customer cones, the optimizer just shrugs and the cost stays flat — which is exactly what happens here for the watch-tier supplier B003. - ---- - -## Stage 1 — Graph: site centrality + connected components - -> `/rai-graph-analysis` "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight." 
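Eigenvector centrality rewards connections to other well-connected sites, not raw degree. A networkx sketch on a hypothetical lane graph (site IDs echo the template's, the scores do not):

```python
import networkx as nx

# Hypothetical ship lanes between sites; parallel lanes collapse in an undirected Graph.
lanes = [("S001", "S004"), ("S002", "S004"), ("S003", "S004"),
         ("S004", "S006"), ("S006", "S007"), ("S001", "S002")]
G = nx.Graph(lanes)

raw = nx.eigenvector_centrality(G, max_iter=1000)
assert max(raw, key=raw.get) == "S004"   # the convergence hub dominates

# Normalize to [0, 1] the way the template persists Site.centrality.
top = max(raw.values())
centrality = {site: score / top for site, score in raw.items()}
print(centrality["S004"])                # -> 1.0
```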
- -**Construction:** -- Node concept: `Site` (31 sites) -- Edges: built from `Operation` rows where `op_type == "SHIP"` (undirected, unweighted) -- Aggregator: `"sum"` (collapse parallel ship lanes between the same pair) - -**Algorithms:** `weakly_connected_component()` for cluster discovery, then `eigenvector_centrality()` for hub importance. - -``` -Connected components: 2 - - Component A: 25 sites (APAC + AMERICAS, joined by S004->S006 long-haul) - Component B: 6 sites (EMEA distribution loop) - -Top critical sites — eigenvector centrality (FACTORY/DC only) - - S004 TechAssembly Factory (FACTORY, APAC) 0.5016 ████████████ ★ central hub - S006 West Coast DC (DC, AMERICAS) 0.3895 █████████ - S003 PowerCell Facility (FACTORY, APAC) 0.3688 █████████ ⚠ also Stage 2 watch - S002 DisplayCorp Plant (FACTORY, APAC) 0.3145 ████████ - S001 ChipTech Factory (FACTORY, APAC) 0.3145 ████████ - S012 SiliconWorks Factory (FACTORY, APAC) 0.2456 ██████ - S013 ScreenTech Plant (FACTORY, APAC) 0.2456 ██████ - S014 EnergyPlus Facility (FACTORY, APAC) 0.2280 ██████ - - ────────────────────────────────────────────────────────────────── - S004 TechAssembly is the convergence point: every APAC component - factory ships into it before finished goods radiate to DCs. - S003 PowerCell shows up here AND in Stage 2 — structural and - behavioural risk overlap on the same supplier. - ────────────────────────────────────────────────────────────────── - -✓ Site.centrality written back, normalized to [0, 1] - (S004 = 1.000, S006 = 0.776, S003 = 0.735, ...) -``` - ---- - -## Stage 2 — Rules: supplier risk classification - -> `/rai-rules-authoring` "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)." 
- -**Late-shipment context** (computed in pandas, not RAI): - -``` -Late shipments: 37 of 262 (14%) - - B006 West Coast DC 7 late ████████ - B007 East Coast DC 5 late ██████ - B004 TechAssembly 4 late █████ - B022 EMEA DC Central 3 late ████ - B003 PowerCell 2 late ███ ← also high predicted Q1 - B017 CellChem 2 late ███ ← also low reliability -``` - -**Properties added to the ontology** (via `model.where(...).define(...)`): - -```python -# Rule 1 — reliability gate -m.where(Business.reliability_score < 0.80).define(Business.is_unreliable()) - -# Rule 2 — ML delay-risk gate (Q1-2025 GNN predictions) -m.where( - DelayPrediction.supplier_business(Business), - DelayPrediction.fiscal_quarter == "Q1-2025", - DelayPrediction.predicted_delay_prob > 0.15, -).define(Business.has_high_delay_risk()) - -# Rule 3 — union into watch level (rule chaining) -m.where(Business.is_unreliable()).define(Business.is_watch_level()) -m.where(Business.has_high_delay_risk()).define(Business.is_watch_level()) - -# Rule 4 — escalate HIGH-priority demands -m.where(Demand.priority == "HIGH").define(Demand.is_escalated()) -``` - -**Resulting classification** (Python combines the two RAI flags into avoid/watch/reliable): - -``` -Supplier risk classification - - [X] B017 CellChem China reliability 0.78 Q1 delay 0.22 ── AVOID - [!] 
B003 PowerCell Ltd reliability 0.81 Q1 delay 0.28 ── WATCH - [ ] B005 GlobalBuild Inc reliability 0.85 ── reliable - [ ] B014 EnergyPlus India reliability 0.85 ── reliable - [ ] B024 EuroAssembly Pol reliability 0.87 ── reliable - [ ] B020 EuroCell Poland reliability 0.88 ── reliable - [ ] B002 DisplayCorp reliability 0.88 ── reliable - [ ] B016 GlassCorp Korea reliability 0.89 ── reliable - [ ] B004 TechAssembly Co reliability 0.90 ── reliable - [ ] B019 EuroDisplay Czech reliability 0.91 ── reliable - [ ] B012 SiliconWorks Corp reliability 0.91 ── reliable - [ ] B013 ScreenTech Japan reliability 0.93 ── reliable - [ ] B018 EuroChip Germany reliability 0.94 ── reliable - [ ] B001 ChipTech Industries reliability 0.95 ── reliable - [ ] B015 WaferTech Taiwan reliability 0.97 ── reliable - - AVOID = is_unreliable AND has_high_delay_risk (both flags) - WATCH = is_unreliable OR has_high_delay_risk (one flag) - -Escalated demands (HIGH priority): 9 ── all from B008 / B009 - -✓ Business.is_unreliable [1 supplier: B017] -✓ Business.has_high_delay_risk [2 suppliers: B003, B017] -✓ Business.is_watch_level [2 suppliers: B003, B017] -✓ Demand.is_escalated [9 demands] -``` - -Stage 3 reads `is_watch_level` for the surcharge term and `is_unreliable AND has_high_delay_risk` (collapsed to the `avoid` set) for the hard block. - ---- - -## Stage 3 — Prescriptive: risk-adjusted minimum-cost flow - -> `/rai-prescriptive-problem-formulation` "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit." - -``` -FORMULATION - - Decision variables - Operation.x_flow (continuous, 70 ops, 0 ≤ x ≤ capacity_per_day) - Demand.x_unmet (continuous slack, 20 demands, ≥ 0) - - Constraints - 1. Demand satisfaction - Σ x_flow into customer-site for the demanded SKU + x_unmet ≥ quantity - 2. 
Avoid suppliers blocked - For every operation sourced from B017 (CellChem): x_flow == 0 - 3. (Scenario only) Site offline / extra blocks - - Objective (minimize) - Σ Operation.cost_per_unit · x_flow ── transport - + RISK_SURCHARGE · Σ x_flow on watch-supplier ops ── Stage 2 - + CENTRALITY_WEIGHT · Σ x_flow · Site.centrality ── Stage 1 - + UNMET_PENALTY · Σ x_unmet ── slack - - Tunables: UNMET_PENALTY=100, RISK_SURCHARGE=5, CENTRALITY_WEIGHT=2, - DELAY_PROB_THRESHOLD=0.15, RELIABILITY_THRESHOLD=0.80, - PREDICTION_QUARTER="Q1-2025" - -────────────────────────────────────────────────────────────────────── -SOLVE (HiGHS) → OPTIMAL 8 active flows $1,865.00 0 unmet -────────────────────────────────────────────────────────────────────── - -✓ Operation.x_flow / Demand.x_unmet written back as model properties. -``` - -The baseline buys: enough finished-goods flow on the shortest cost-weighted lanes to cover all 20 demand orders in full. CellChem's operations (B017) are hard-blocked. PowerCell (B003) operations carry a +5/unit surcharge — the optimizer accepts a small amount of B003-sourced flow only when no cheaper non-watch alternative exists. - -### Reading the solve - -- **8 active flows from 70 candidate operations** — the network is sparse at optimum; most capacity is idle. -- **$1,865 total cost** vs. unconstrained transport cost would be ~$1,500 — the centrality and watch-surcharge terms together add ~$365. -- **Zero unmet demand** — capacity is plentiful, so the slack term is inactive at baseline. - ---- - -## Scenario analysis — quantify disruption - -> `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?" - -The same `solve_flow(...)` function re-runs with modified constraints. 
Two scenarios surface different aspects of the chain's value: - -``` -SCENARIO COMPARISON - - Scenario Status Cost Δ vs baseline Unmet - ──────────────────────── ──────── ────────── ────────────── ───── - Baseline OPTIMAL $1,865.00 — 0 - Site S004 offline OPTIMAL $3,515.00 +88.5% 0 - Watch->Avoid OPTIMAL $1,865.00 +0.0% 0 -``` - -**Scenario A — top-centrality site offline (S004 TechAssembly).** The optimizer reroutes finished goods through S005 GlobalBuild Plant (Mexico) and longer EMEA lanes; cost jumps 88.5% but all demand is still covered. This is the **structural-risk** signal: losing the highest-centrality node forces expensive secondary routing. - -**Scenario B — downgrade all watch suppliers to avoid.** Adds B003 PowerCell to the hard-block set. **Cost is unchanged.** Why? B003 wasn't on any optimal lane — the centrality penalty + risk surcharge already discouraged the optimizer from routing through it at baseline. **This asymmetry is the punchline:** structural risk (Stage 1) costs 88.5% to disrupt; behavioural risk on already-deprioritized suppliers (Stage 2) costs 0% to harden against. The chain reveals which mitigations actually move the needle. - -(Cross-check the Stage 0 dependency map: B003 wasn't in either HIGH-priority customer's *direct* upstream — it ships components to manufacturers, who then route via the optimizer's preferred S004/S005 corridor. Reachability surfaced the supplier; centrality + objective weights ensured baseline never relied on it.) - ---- - -## The chain — accretive ontology enrichment - -``` -THE SUPPLY-CHAIN-RESILIENCE CHAIN - - STAGE 0 REACHABILITY (directed Business graph) - "Which suppliers do my high-priority customers transitively depend on?" 
- reads: Shipment.supplier / .customer ──► Business.ships_to (derived) - Demand.priority == "HIGH" ──► Business.is_high_priority_customer - writes: Business.is_high_priority_customer ── 2 buyers - │ - ▼ - STAGE 1 GRAPH (eigenvector centrality, WCC) - "Which sites are network bottlenecks?" - reads: Operation (op_type == "SHIP"), Site - writes: Site.centrality ── normalized [0,1] per site - │ - ▼ - STAGE 2 RULES (chained derivations) - "Which suppliers are risky, and which demands are escalated?" - reads: Business.reliability_score, DelayPrediction, Demand.priority - writes: Business.is_unreliable ── 1 supplier - Business.has_high_delay_risk ── 2 suppliers - Business.is_watch_level ── 2 suppliers - Demand.is_escalated ── 9 demands - │ - ▼ - STAGE 3 PRESCRIPTIVE (HiGHS LP) - "What's the minimum-cost flow plan that respects all of the above?" - reads: Site.centrality ──► objective coefficient (penalty) - Business.is_watch_level ──► objective coefficient (surcharge) - {avoid suppliers} ──► hard block (x_flow == 0) - Operation cost / capacity / SKU, Demand quantity / SKU / business - writes: Operation.x_flow ── 8 active flows - Demand.x_unmet ── 0 across all 20 demands - │ - ▼ - Re-solve per scenario (S004 offline, watch->avoid) - → cost-of-disruption table - - ────────────────────────────────────────────────────────────────── - No glue. No DataFrame ping-pong. No re-derivation per-reasoner. - Four stages, one ontology, one accretive thread. - ────────────────────────────────────────────────────────────────── -``` - ---- - -## Why the chain matters (vs. 
any single stage) - -| Stage alone | What it tells you | What it doesn't | -|---|---|---| -| Reachability alone | "These 6 suppliers feed my critical customers" | Which are risky; which the optimizer would have used anyway | -| Graph alone | "S004 is the central hub" | Whether losing it is recoverable; at what cost | -| Rules alone | "B017 avoid, B003 watch" | Whether routing actually depends on them | -| Prescriptive alone | (degenerate — no risk filter, no bottleneck weight) | Picks cheapest lanes regardless of supplier risk or hub fragility | - -| Combined | Output | -|---|---| -| Reachability -> Graph | Customer-supplier dependency map + structural hub ranking | -| + Rules | Per-supplier risk class (avoid / watch / reliable) on top of the dependency map | -| + Prescriptive | Risk-adjusted min-cost flow ($1,865) + scenario deltas (+88.5% on S004 offline, +0% on watch->avoid) | - -**Multi-reasoner chaining grounded in (and contributing to) the ontology.** - ---- - -## Optional extension — predictive forecasting - -The bundled `delay_prediction.csv` is treated as already-trained ML output (a quarterly per-supplier delay probability table, `model_version='gnn_v2.0'`). A natural extension to the template is to replace the static CSV with a live GNN that retrains on `Shipment.delay_days` history. Skill: `/rai-predictive-modeling` + `/rai-predictive-training`. Reference templates with end-to-end GNN training: `templates/v1/fraud-detection`, `templates/v1/retail_planning`. The downstream stages (rules + prescriptive) wouldn't change — they read `DelayPrediction.predicted_delay_prob` either way. - ---- - -## Agent prompt sequence — recreate this template skill-by-skill - -Each row is a single agent prompt. Skills are loaded in order; each writes properties the next stage reads. 
- -| # | Skill | Prompt | What it produces | -|---|-------|--------|------------------| -| 1 | `/rai-build-starter-ontology` | "Build a starter ontology for a supply-chain dataset with 7 CSVs: site, business, operation, sku, demand, shipment, delay_prediction. Use Site/Business/Operation/SKU/Demand/Shipment/DelayPrediction as concepts. Render the result as an ASCII concept-relationship diagram." | Concepts, properties, relationships matching `supply_chain_resilience.py` lines 57–250 | -| 2 | `/rai-discovery` | "Given this ontology, what questions can each reasoner family answer? Group by graph / rules / prescriptive." | A reasoner-routing plan that covers Stages 0–3 below | -| 3 | `/rai-graph-analysis` | "Build a directed Business graph from Shipment.supplier->Shipment.customer. Run upstream reachability from every Business with a HIGH-priority demand. List the suppliers each high-priority customer transitively depends on." | Stage 0 — `Business.is_high_priority_customer`, blast-radius dependency map | -| 4 | `/rai-graph-analysis` | "Build an undirected Site graph from Operation rows where op_type == 'SHIP'. Compute weakly-connected components, then eigenvector centrality (filter to FACTORY/DC). Normalize and write the score back as Site.centrality." | Stage 1 — `Site.centrality` | -| 5 | `/rai-rules-authoring` | "Define three derived relationships on Business: is_unreliable (reliability_score < 0.80), has_high_delay_risk (any DelayPrediction for Q1-2025 with predicted_delay_prob > 0.15), and is_watch_level (union of the two). Also flag Demand.is_escalated for HIGH priority. Print the avoid (both flags) / watch (one flag) / reliable classification." | Stage 2 — risk flags, escalation flag | -| 6 | `/rai-prescriptive-problem-formulation` | "Formulate a minimum-cost network flow on Operation.x_flow with Demand.x_unmet slack. Constraint: inbound flow at customer site for demanded SKU + slack >= quantity. Hard-block operations sourced from avoid-tier businesses. 
Objective: transport cost + RISK_SURCHARGE * watch-supplier flow + CENTRALITY_WEIGHT * Σ flow · Site.centrality + UNMET_PENALTY * Σ unmet. Describe the formulation before solving — decision variables, constraints, objective, tunables." | Stage 3 formulation | -| 7 | `/rai-prescriptive-solver-management` | "Solve with HiGHS, time limit 120s. Report status, objective value, count of active flows, and total unmet demand." | Baseline solve — OPTIMAL, $1,865, 8 flows, 0 unmet | -| 8 | `/rai-prescriptive-results-interpretation` | "Re-solve two scenarios side-by-side: (a) top-centrality site offline (S004 TechAssembly), (b) all watch-level suppliers downgraded to avoid. Show cost delta vs baseline as a table. Explain why the deltas are asymmetric." | Scenario table + interpretation (S004 offline = +88.5%, watch->avoid = +0%) | - ---- - -## Data Reference - -- **Source data**: bundled CSVs in `../data/` — 31 sites across APAC / AMERICAS / EMEA, 31 businesses (6 suppliers, 6 component manufacturers, 2 manufacturers, 8 warehouses, 9 buyers), 9 SKUs (raw materials -> components -> finished goods ProPhone X1 / ProTab T1), 70 operations (SHIP + TRANSFER), 20 demand orders (9 HIGH, 5 MEDIUM, 6 LOW), 262 historical shipments (37 late), 36 quarterly delay predictions (4 quarters × 9 suppliers). To run against your own Snowflake schema instead, swap `read_csv(...)` for typed `model.Table(...)` loads against the equivalent table set. -- **Ontology**: defined inline in `../supply_chain_resilience.py` (lines 57–250) — 7 concepts plus the derived `Business.ships_to` and `Operation.source_business` relationships. -- **Stages**: implemented in `../supply_chain_resilience.py` as a single combined script with stage banners (`STAGE 0` through `STAGE 3` plus `SCENARIO ANALYSIS`). - ---- - -## Adapting this recipe to a new domain +## Workflow -The chain pattern transfers cleanly. 
To rebuild for a different problem: +| # | Skill + prompt | Expected output | +|---|----------------|-----------------| +| 0 | `/rai-discovery` "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?" | Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive). | +| 1 | `/rai-graph-analysis` "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores." | `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone. | +| 2 | `/rai-graph-analysis` "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight." | `Site.centrality` normalized [0,1]: S004=1.000, S006=0.776, S003=0.735; 2 weakly-connected components. | +| 3 | `/rai-rules-authoring` "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)." | `is_unreliable` (1: B017), `has_high_delay_risk` (2: B003, B017), `is_watch_level` (2), `Demand.is_escalated` (9). | +| 4 | `/rai-prescriptive-problem-formulation` "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit." 
| Min-cost flow LP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty. | +| 5 | `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?" | Baseline OPTIMAL $1,865 / 8 flows / 0 unmet; S004 offline +88.5%; watch->avoid +0.0% (B003 already off optimal lanes). |
-1. Re-run `rai-discovery` on the new business question — does it actually need all four reasoner families, or is one or two sufficient? Reachability + prescriptive alone is a viable shortcut if you don't need a centrality-based bottleneck weight or a per-entity risk classifier. -2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time). The load-bearing concepts here are the node concept of each graph (`Business`, `Site`), the entity that carries risk flags (`Business`), and the entity that carries decision variables (`Operation`, `Demand`). -3. Stage 0 (reachability) is *optional context* — it scopes the conversation around the disruption table without changing any constraint. Keep it when stakeholders need to see the dependency cone; drop it when the optimizer's hard blocks already make exposure obvious. -4. Stages 1–3 are the load-bearing chain: graph centrality writes a per-site bottleneck weight, rules write per-supplier risk flags, and prescriptive consumes both — centrality as an objective coefficient, the avoid set as a hard block, the watch set as a surcharge. The scenario re-solves are cheap once the formulation exists. -5.
Keep the validation checks at every stage: assert the source-business derivation populates, the centrality top-N looks plausible, the avoid/watch sets are non-empty (or non-empty by design), the baseline objective is not zero, and the disruption deltas have the sign you expected. +## Data -The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link. +Bundled CSVs in `../data/`: 31 sites (APAC/AMERICAS/EMEA), 31 businesses, 9 SKUs, 70 operations, 20 demand orders, 262 shipments (37 late), 36 quarterly delay predictions. Combined script with stage banners: `../supply_chain_resilience.py`. diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md index 8cb676e..39b28ac 100644 --- a/v1/telco_network_recovery/references/runbook.md +++ b/v1/telco_network_recovery/references/runbook.md @@ -1,21 +1,8 @@ # Runbook: Telco WEST Recovery — Multi-Reasoner Walkthrough -Walk-through of the chained-reasoner pattern this template is built on. One realistic business thread — **WEST region recovery** — traced across all five RAI reasoner families, each stage writing properties back to the same ontology that downstream stages consume. +A regional telco is bleeding $791K/quarter from WEST while every other region grows. No single reasoner can answer where to spend a $5M recovery budget: descriptive scopes the crisis, rules flag broken towers, graph weights them by social blast radius, predictive forecasts forward demand, and prescriptive composes all three signals into the upgrade plan. Each stage writes derived properties back to the same ontology that downstream stages consume. 
-The template's combined script (`telco_network_recovery.py`) implements the predictive, rules, graph, and prescriptive stages directly; this runbook frames them with a descriptive Stage 1 diagnosis and a Stage 6 interpretation, so a non-OR reader can follow the full reasoning thread end-to-end. - ---- - -## How to read this runbook - -This runbook serves two audiences: - -- **Reading top-to-bottom**: the narrative + ASCII visualizations show what the chain produces stage-by-stage, with the same business framing the stakeholder would see. -- **Per-stage skill blocks**: the boxed `Skill / Prompt` callout at the start of each stage is the recipe — load that RAI agent skill, give it that prompt against the bundled demo data in `../data/`, and the agent will reproduce the stage. - ---- - -## TL;DR — the chain in one screen +## The chain ``` WEST is bleeding $791K/quarter from a network operations crisis. @@ -48,450 +35,18 @@ across all 15 critical towers, prioritized by social blast radius. ───────────────────────────────────────────────────────────────── ``` -A single-reasoner approach can't answer this. Descriptive alone tells the story but no plan. Rules alone identifies broken towers but not which matter most. Graph alone ranks subscribers but doesn't decide. Predictive alone forecasts but doesn't act. Prescriptive alone has no way to weigh "important" without rules + graph upstream. - ---- - -## Step 0 — Scope the question with `rai-discovery` - -> `/rai-discovery` "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?" - -Discovery classifies the question by reasoner family and tells you which downstream skills to load: - -| Sub-question | Reasoner | Skill | -|---|---|---| -| Where is the revenue bleed coming from, and is it retention or operational? | Descriptive | `rai-querying` | -| Which WEST towers are technically broken enough to be in scope for upgrade? 
| Rules | `rai-rules-authoring` | -| Whose service depends on each broken tower — what is the social blast radius? | Graph (PageRank) | `rai-graph-analysis` | -| Is regional demand growing or contracting at those sites over the forward horizon? | Predictive (GNN) | _no public skill yet — see `v1/subscriber_retention/` and `v1/demand_forecasting/` as worked-example references_ | -| Given the cost cap and install-crew budget, which tier should each critical tower receive? | Prescriptive (MIP) | `rai-prescriptive-problem-formulation` | -| Which constraint is binding, and what would change if we relaxed it? | Prescriptive (post-solve) | `rai-prescriptive-results-interpretation` | - -Discovery's output is a *plan*, not code. Everything that follows materializes that plan. - ---- - -## Setup - -See the template's main `README.md` for installation, RAI connection setup, and how to run the script. The narrative below follows the actual stage outputs of `telco_network_recovery.py`. - ---- - -## Stage 1 — Descriptive: diagnose WEST - -> `/rai-querying` "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?" - -``` -Q4 2024 — Daily KPIs by region - - avg avail % avg churn rate avg daily revenue - ────────────── ──────────────── ────────────────── - WEST 94.60 ███ 0.0256 █████████ $72,558 ███████ - CENTRAL 99.52 ████████ 0.0046 █ $101,517 ██████████ - SOUTH 99.53 ████████ 0.0049 █ $102,030 ██████████ - EAST 99.55 ████████ 0.0049 █ $103,840 ██████████ - NORTH 99.56 ████████ 0.0050 █ $103,810 ██████████ - NORTHEAST 99.56 ████████ 0.0050 █ $99,569 ██████████ - SOUTHEAST 99.58 ████████ 0.0050 █ $100,429 ██████████ - NORTHWEST 99.54 ████████ 0.0049 █ $100,995 ██████████ - SOUTHWEST 99.55 ████████ 0.0049 █ $101,716 ██████████ - ▲ - WEST is 5× worse on availability AND churn, │ - ~30% lower on daily revenue. 
│ - │ - Q3-Q4 monthly revenue gaps stack to $791K: ──────┘ - - Sep 2024 forecast $1.40M → actual $1.04M ── −25.9% (−$362K) - Oct 2024 forecast $0.94M → actual $0.70M ── −25.1% (−$236K) - Nov 2024 forecast $0.88M → actual $0.68M ── −22.0% (−$193K) - - WEST tower fleet (81 towers) - - ACTIVE ████████████████████████ 49 - DEGRADED ███████ 15 ← 15 critical_restore - MAINTENANCE ████████ 17 - - Top performance offenders (NetworkPerformance, all WEST DEGRADED): - TWR-0015 ── 8.87% loss 190ms (424 measurements) - TWR-0014 ── 8.75% loss 189ms (273 measurements) - TWR-0010 ── 8.73% loss 188ms (429 measurements) - (… all 15 cluster 8.1-8.9% loss / 185-200ms latency) - ──────────────────────────────────────── - Healthy WEST towers sit at <1% loss / ~30ms latency. - - Retention angle? Zero subs in collections, zero with churn_risk ≥ 0.5. - High-risk subscribers concentrate in SOUTH/NORTH/CENTRAL, not WEST. - This is a NETWORK-side crisis, not subscriber retention. -``` - -Subscriber.churn_risk_score is a static feature that hasn't caught up to WEST's empirical churn (TimeSeriesMetric.churn_rate of 2.6% is 5x other regions). The thread focuses on infrastructure recovery. - ---- - -## Stage 2 — Rules: flag critical_restore towers - -> `/rai-rules-authoring` "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85." 
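Stripped of the ontology machinery, the two branches of that rule are boolean conjunctions over per-tower aggregates. A standalone sketch (thresholds from the prompt; the first tower dict mirrors TWR-0010's aggregates, but both records are illustrative, not the bundled data loader):

```python
# Plain-Python sketch of the two critical-restore branches described above.
# This is an illustration, not the template's RAI rule code.
def is_critical_restore(tower):
    in_west = tower["region"] == "WEST"
    unhealthy = tower["avg_health_score"] < 0.85  # equipment health, 0-1 scale
    branch1 = in_west and tower["status"] == "DEGRADED" and unhealthy
    branch2 = in_west and tower["avg_packet_loss"] > 5.0 and unhealthy
    return branch1 or branch2

degraded = {"region": "WEST", "status": "DEGRADED",
            "avg_health_score": 0.48, "avg_packet_loss": 8.73}
healthy = {"region": "WEST", "status": "ACTIVE",
           "avg_health_score": 0.95, "avg_packet_loss": 0.8}
print(is_critical_restore(degraded))  # True  (branch 1; branch 2 also fires)
print(is_critical_restore(healthy))   # False (neither branch fires)
```

The ontology version that follows wires the same thresholds through `model.where(...).define(...)`, so `is_critical_restore` becomes a queryable property downstream stages can read.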
- -**Properties added to the ontology** (via `model.define(...)`): -- `CellTower.avg_packet_loss` (Float) — `aggs.avg(NetworkPerformance.packet_loss_pct).per(CellTower)` -- `CellTower.avg_latency_ms` (Float) -- `CellTower.avg_error_rate` (Float) -- `CellTower.avg_health_score` (Float) — across attached equipment via two-hop join - -**Rule** — `CellTower.is_critical_restore` (unary `Relationship`): - -```python -# Branch 1: WEST + DEGRADED + degraded equipment health -m.where( - CellTower.region == "WEST", - CellTower.status == "DEGRADED", - CellTower.avg_health_score < 0.85, # health is 0-1 scale -).define(CellTower.is_critical_restore()) - -# Branch 2: WEST + poor performance + degraded health (catches ACTIVE-but-failing) -m.where( - CellTower.region == "WEST", - CellTower.avg_packet_loss > 5.0, - CellTower.avg_health_score < 0.85, -).define(CellTower.is_critical_restore()) -``` - -``` -RULE FIRES → 15 towers flagged is_critical_restore (all WEST DEGRADED) - - TWR-0010 ── health 0.48 loss 8.73% cap_gbps 18 ███ - TWR-0015 ── health 0.60 loss 8.87% cap_gbps 60 ██████ - TWR-0009 ── health 0.62 loss 8.49% cap_gbps 17 ███ - TWR-0012 ── health 0.63 loss 8.59% cap_gbps 84 █████████ - TWR-0008 ── health 0.64 loss 8.51% cap_gbps 71 ████████ - TWR-0014 ── health 0.66 loss 8.75% cap_gbps 36 █████ - TWR-0002 ── health 0.66 loss 8.56% cap_gbps 17 ███ - TWR-0001 ── health 0.66 loss 8.56% cap_gbps 31 █████ - TWR-0011 ── health 0.66 loss 8.68% cap_gbps 61 ███████ - TWR-0005 ── health 0.68 loss 8.12% cap_gbps 94 ██████████ - TWR-0003 ── health 0.69 loss 8.45% cap_gbps 43 █████ - TWR-0007 ── health 0.72 loss 8.54% cap_gbps 17 ███ - TWR-0013 ── health 0.77 loss 8.62% cap_gbps 94 ██████████ - TWR-0006 ── health 0.78 loss 8.51% cap_gbps 37 █████ - TWR-0004 ── health 0.81 loss 8.54% cap_gbps 64 ███████ - -✓ is_critical_restore written back to CellTower (15 rows) -✓ avg_packet_loss / avg_latency_ms / avg_error_rate / avg_health_score - written back to all CellTowers (250 rows) -``` - 
-Branch 2 didn't fire — none of WEST's ACTIVE towers fall below health 0.85. The 15 flagged are exactly the WEST DEGRADED set, all with packet loss 8.1–8.9% and latency 185–200ms — sharp gap from the rest of WEST. - ---- - -## Stage 3 — Graph: subscriber influence + tower blast radius - -> `/rai-graph-analysis` "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." - -**Construction** — Pattern 3 (`edge_concept`): -- Node concept: `Subscriber` (1,200 nodes) -- Edge concept: `CallDetailRecord`, with `caller`→`callee` (directed) -- Aggregator: `"sum"` (collapse parallel calls between the same pair) - -**Algorithm:** `pagerank()` (default for directed graphs). - -``` -PageRank — top 10 social influencers (of 1,200 subscribers) - - SUB-CON-00900 CONSUMER $3,793 LTV ████████████ 0.002963 - SUB-CON-00723 CONSUMER $3,049 LTV ████████████ 0.002956 - SUB-CON-00262 CONSUMER $3,764 LTV ███████████ 0.002790 - SUB-CON-00274 CONSUMER $2,850 LTV ███████████ 0.002695 - SUB-ENT-0038 ENTERPRISE $283,233 LTV ██████████ 0.002637 ★ - SUB-CON-00705 CONSUMER $765 LTV ██████████ 0.002599 - SUB-CON-00393 CONSUMER $3,219 LTV ██████████ 0.002581 - SUB-ENT-0001 ENTERPRISE $393,340 LTV ██████████ 0.002575 ★ - SUB-CON-01066 CONSUMER $1,146 LTV ██████████ 0.002570 - SUB-CON-00762 CONSUMER $307 LTV ██████████ 0.002525 - - ★ Top enterprise accounts also rank — heavy inbound call traffic. - PageRank captures structural influence independent of LTV. 
- -Per-critical-tower blast radius (sorted by weighted_impact) - - TWR-0014 61 subs ████████████ 0.0502 ← largest social footprint - TWR-0008 56 subs ██████████ 0.0430 - TWR-0011 48 subs ██████████ 0.0428 - TWR-0012 50 subs █████████ 0.0394 - TWR-0003 43 subs █████████ 0.0393 - TWR-0013 46 subs █████████ 0.0379 - TWR-0004 46 subs █████████ 0.0378 - TWR-0010 48 subs █████████ 0.0375 - TWR-0015 45 subs ████████ 0.0361 - TWR-0002 46 subs ████████ 0.0331 - TWR-0007 44 subs ████████ 0.0330 - TWR-0005 45 subs ████████ 0.0330 - TWR-0009 44 subs ████████ 0.0330 - TWR-0001 41 subs ████████ 0.0322 - TWR-0006 41 subs ████████ 0.0316 - - ────────────────────────────────────────────────────────────────── - 404 distinct subscribers (33% of the 1,200-sub base) route at least - one call through a critical WEST tower. TWR-0014's failure ripples - to 61 subs whose combined social influence is highest. - ────────────────────────────────────────────────────────────────── - -✓ Subscriber.influence_score written back to all 1,200 subscribers -✓ CellTower.impact_count + weighted_impact written back to CellTower -``` - ---- - -## Stage 4 — Predictive: forecast WEST capacity demand - -> `/rai-predictive-modeling` + `/rai-predictive-training` "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history (subscriber_growth_rate, churn_rate, marketing_spend, network_availability_pct, total_calls). Bind each region's forecast back to its towers as a demand multiplier." - -**Method:** GNN node regression on `TimeSeriesMetric` (composite key `metric_date` + `region`). Target: `subscriber_growth_rate`. Features: the other 12 daily KPIs + 3 lag features (`prev_day_growth`, `prev_week_growth`, `growth_7d_mean`) + `region` as a category. Graph: same-region 1-day-lag temporal edges. Train < 2024-11-01 (includes the Sep–Oct WEST decline onset); validate on Nov 2024; test on Dec 2024. 
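The three lag features named above can be computed without the template's pandas pipeline; a pure-Python sketch over one region's daily series (the values are illustrative, not the bundled TimeSeriesMetric data):

```python
from statistics import mean

# One region's daily subscriber-growth series, oldest first (illustrative).
history = [0.006, 0.007, 0.005, 0.008, 0.007, 0.006, 0.009, 0.007]

def lag_features(series, i):
    """prev-day, prev-week, and trailing 7-day-mean features for day i."""
    prev_day = series[i - 1] if i >= 1 else None
    prev_week = series[i - 7] if i >= 7 else None
    window = series[max(0, i - 7):i]          # strictly before day i
    mean_7d = mean(window) if window else None
    return prev_day, prev_week, mean_7d

prev_day, prev_week, mean_7d = lag_features(history, 7)
print(prev_day, prev_week, round(mean_7d, 4))  # 0.009 0.006 0.0069
```

The same shifted lookups, applied per region, are what the `LAG()` window functions mentioned below would produce on a Snowflake copy of the table.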
- -``` -Per-region GNN-predicted subscriber-growth-rate (Dec 2024 test horizon) - - CENTRAL ───── +0.0075 ████████ ▲ - EAST ───── +0.0073 ████████ │ - NORTH ───── +0.0071 ████████ │ 8 regions cluster - NORTHEAST ───── +0.0070 ████████ │ +0.59 to +0.75%/day - NORTHWEST ───── +0.0067 ████████ │ (mean predicted growth) - SOUTH ───── +0.0065 ████████ │ - SOUTHEAST ───── +0.0063 ███████ │ - SOUTHWEST ───── +0.0059 ███████ ▼ - ▲ - │ - WEST ───── −0.0071 ← anomaly: contracting - multiplier 0.993× - - ────────────────────────────────────────────────────────────────── - WEST projection: 0.7% demand decline over the test horizon. - Stage 5 picks up this multiplier as the 3rd objective coefficient. - ────────────────────────────────────────────────────────────────── - -✓ CellTower.projected_demand_growth written back to all 15 critical towers - (uniform 0.992871 — regional, not per-tower) -``` - -**Stage 5 objective with the predictive term:** - -``` -objective = sum( selected[t,tier] * - capacity_increase_gbps[t,tier] * - weighted_impact[t] * - projected_demand_growth[t] ) # ← Stage 4 contribution -``` - -**Snowflake setup for the GNN:** the template's main script computes lag features (prev-day, prev-week, 7-day mean) and same-region 1-day-lag temporal edges in pandas before loading, so no extra Snowflake DDL is required. To run on your own Snowflake schema instead of the bundled CSV, the equivalent SQL would be a typed copy of the time-series table plus per-region `LAG()` window functions. - -**Caveats:** -- The GNN was tuned for a single 80-epoch run with seed-42 reproducibility; production deployment would expand to a multi-seed average + a held-out holdout window. -- The WEST projection partially encodes the same network-degradation pattern Stages 2/3 flagged ("things have gotten worse and we expect them to keep getting worse if we don't act"). For an independent baseline, train on a pre-degradation slice (H1 2024 only) and compare. 
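Structurally, the Stage 5 objective shown above is a pick-at-most-one-tier-per-tower selection under a budget cap. Its shape can be seen in a brute-force toy before reaching for a MIP solver (two towers, made-up capacities, costs, and impacts; none of these numbers come from the template):

```python
from itertools import product

# Toy tier-selection instance: per-tower {tier: (capacity_gbps, cost)}.
# All numbers are invented for illustration; the real model has 15 towers,
# 45 binaries, and an install-weeks constraint on top of the budget.
options = {"T1": {"BRONZE": (3, 100), "SILVER": (5, 180), "GOLD": (9, 300)},
           "T2": {"BRONZE": (3, 90), "SILVER": (4, 150), "GOLD": (8, 280)}}
impact = {"T1": 0.05, "T2": 0.03}   # stands in for weighted_impact
growth = 0.993                      # stands in for projected_demand_growth
BUDGET = 400

best = (None, -1.0)
for choice in product([None, "BRONZE", "SILVER", "GOLD"], repeat=2):
    cost = value = 0.0
    for tower, tier in zip(options, choice):  # at most one tier per tower
        if tier is None:
            continue
        cap, tier_cost = options[tower][tier]
        cost += tier_cost
        value += cap * impact[tower] * growth  # Stage 5 objective term
    if cost <= BUDGET and value > best[1]:
        best = (choice, value)

print(best[0])  # ('GOLD', 'BRONZE'): the cheaper tier lands on the low-impact tower
```

Enumeration stops scaling almost immediately (4^15 combinations for the real instance), which is why the template hands the formulation to Gurobi.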
-
----
-## Stage 5 — Prescriptive: tower upgrade selection MIP
-
-> `/rai-prescriptive-problem-formulation` "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth."
-
-```
-FORMULATION
-
-  Decision variable
-    TowerUpgradeOption.selected (binary)
-    45 binaries = 15 critical-restore towers × {BRONZE, SILVER, GOLD}
-
-  Constraints
-    1. At-most-one tier per tower   sum(selected).per(CellTower) ≤ 1
-    2. Total cost                   Σ selected · cost ≤ $5,000,000
-    3. Total install_weeks          Σ selected · install_weeks ≤ 200
-
-  Objective (maximize)
-    Σ selected · capacity_increase_gbps · weighted_impact · projected_demand_growth
-      └────── Stage 2 (rules) ────┘└── Stage 3 ──┘└── Stage 4 ──┘
-
-──────────────────────────────────────────────────────────────────────
-SOLVE (Gurobi)  →  OPTIMAL   15 upgrades selected   122 Gbps   $4,956,843
-──────────────────────────────────────────────────────────────────────
-
-Tower-tier assignment (sorted by weighted_impact)
-
-  TWR-0014 ── GOLD    +6 Gbps   $350,864   wgt 0.0502  ████████████
-  TWR-0008 ── GOLD   +10 Gbps   $416,455   wgt 0.0430  ██████████
-  TWR-0011 ── GOLD    +9 Gbps   $481,914   wgt 0.0428  ██████████
-  TWR-0012 ── GOLD    +8 Gbps   $445,825   wgt 0.0394  █████████
-  TWR-0003 ── GOLD   +11 Gbps   $360,785   wgt 0.0393  █████████
-  TWR-0013 ── GOLD    +9 Gbps   $273,831   wgt 0.0379  █████████
-  TWR-0004 ── GOLD    +9 Gbps   $275,353   wgt 0.0378  █████████
-  TWR-0010 ── GOLD   +12 Gbps   $332,694   wgt 0.0375  █████████
-  TWR-0015 ── GOLD   +11 Gbps   $438,932   wgt 0.0361  ████████
-  TWR-0002 ── GOLD   +11 Gbps   $420,363   wgt 0.0331  ████████
-  TWR-0007 ── GOLD    +9 Gbps   $416,640   wgt 0.0330  ████████
-  TWR-0005 ── SILVER  +3 Gbps   $220,435   wgt 0.0330  ████████ ⚐
-  TWR-0009 ── BRONZE  +3 Gbps    $97,784   wgt 0.0330  ████████ ⚐
-  TWR-0001 ── GOLD    +6 Gbps   $274,561   wgt 0.0322  ████████
-  TWR-0006 ── SILVER  +5 Gbps   $150,407   wgt 0.0316  ████████ ⚐
-
-  ⚐ Lowest weighted_impact towers — solver buys cheaper tiers
-    to free budget for the higher-impact GOLDs.
-
-Budget gauge
-  Cost        ████████████████████████████████████████ $4,956,843 / $5,000,000 ── BINDING
-  Install-wks █████████████████████████████████        164 / 200 (slack: 36)
-
-Headline metrics
-  Capacity restored:  122 Gbps
-  Tier mix:           12 GOLD · 2 SILVER · 1 BRONZE
-  Towers covered:     15 of 15 (no triage tradeoff)
-  Subs serviced:      404 distinct (33% of all 1,200)
-  Objective without Stage 4 (Σ capacity × weighted_impact): 4.6024
-  Objective with Stage 4 (× 0.992871 uniform multiplier):   4.5696
-
-✓ TowerUpgradeOption.selected written back — the optimization output
-  is now a queryable property of the model.
-```
-
-(Full decision matrix: `outputs/stage5_solution.csv`.)
-
-### Reading the solve
-
-- **GOLD dominates** (12/15) — for towers with high social blast radius, GOLD's 6–12 Gbps uplift outweighs its higher cost.
-- **Budget is binding** ($4.96M / $5M) — relaxing to $6M would let TWR-0009 jump from BRONZE to GOLD ($481K → +9 Gbps) and lift the objective meaningfully.
-- **Install-weeks are not binding** (164/200) — schedule is the looser constraint; budget holds back the plan.
-- **Stage 4's uniform multiplier doesn't shift tiers** — the forecast says WEST is contracting (-0.7%), so every upgrade is slightly less valuable in absolute terms, but relative tower priority is unchanged. **A non-uniform forecast would be the more revealing test of the chain's value** — if some WEST towers sat in growth pockets and others in decline, the tier mix would shift accordingly.
-
----
-## Stage 6 — Interpretation
-
-> `/rai-prescriptive-results-interpretation` "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?"
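At 45 binaries the Stage 5 instance needs a real MIP solver, but the selection logic is small enough to sanity-check by brute force first. A minimal sketch on a hypothetical three-tower instance (made-up capacities, costs, weights, and caps, not the template's data):

```python
from itertools import product

# Hypothetical mini-instance: (capacity_gbps, cost_usd, install_weeks) per tier.
options = {
    "TWR-A": {"BRONZE": (3, 100_000, 4), "SILVER": (5, 180_000, 6), "GOLD": (9, 320_000, 9)},
    "TWR-B": {"BRONZE": (2, 90_000, 4), "SILVER": (4, 160_000, 6), "GOLD": (8, 300_000, 9)},
    "TWR-C": {"BRONZE": (3, 110_000, 4), "SILVER": (6, 200_000, 6), "GOLD": (10, 340_000, 9)},
}
weighted_impact = {"TWR-A": 0.050, "TWR-B": 0.043, "TWR-C": 0.038}
growth = 0.993                      # uniform demand multiplier, as in Stage 4
BUDGET, WEEKS = 700_000, 20

towers = list(options)
best, best_plan = -1.0, None
# None encodes "no upgrade", which gives the at-most-one-tier-per-tower rule.
for tiers in product([None, "BRONZE", "SILVER", "GOLD"], repeat=len(towers)):
    picks = [(t, tier) for t, tier in zip(towers, tiers) if tier is not None]
    cost = sum(options[t][tier][1] for t, tier in picks)
    weeks = sum(options[t][tier][2] for t, tier in picks)
    if cost > BUDGET or weeks > WEEKS:
        continue                    # violates budget or install-weeks cap
    obj = sum(options[t][tier][0] * weighted_impact[t] * growth for t, tier in picks)
    if obj > best:
        best, best_plan = obj, dict(picks)

print(best_plan)  # → {'TWR-A': 'GOLD', 'TWR-C': 'GOLD'} under these caps
```

The three sums are exactly the MIP's feasibility and objective logic; only the enumeration is replaced by branch-and-bound at real scale.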
-
-```
-THE PLAN, IN BUSINESS TERMS
-
-  • 122 Gbps of network capacity restored across all 15 critical towers
-    within the $5M capex budget.
-  • Every WEST DEGRADED tower gets an upgrade — no triage tradeoff.
-  • Service-affected subscribers drop from 404 to ~0 over the install
-    schedule (164 crew-weeks; 4-month rollout at 2 crews of 5).
-  • Budget binding — if CFO can flex to $6M, promote TWR-0009 to GOLD
-    for +9 Gbps marginal lift.
-
-ONTOLOGY ENRICHMENT — what each stage contributed back
-
-  Stage 2 (rules)         CellTower.is_critical_restore     [15]
-                          CellTower.avg_packet_loss         [250]
-                          CellTower.avg_latency_ms          [250]
-                          CellTower.avg_error_rate          [250]
-                          CellTower.avg_health_score        [250]
-
-  Stage 3 (graph)         Subscriber.influence_score        [1,200]
-                          CellTower.impact_count            [120]
-                          CellTower.weighted_impact         [120]
-
-  Stage 4 (predictive)    CellTower.projected_demand_growth [15]
-
-  Stage 5 (prescriptive)  TowerUpgradeOption.selected       [45]
-
-  ──────────────────────────────────────────────────────────────────
-  Each stage reads what the previous stage wrote.
-  Re-running any downstream stage automatically picks up enrichments.
-  No glue code, no DataFrame round-trip — same ontology throughout.
-  ──────────────────────────────────────────────────────────────────
-```
-
----
-## The chain — accretive ontology enrichment
-
-```
-THE WEST RECOVERY CHAIN
-
-  STAGE 1  DESCRIPTIVE
-  "Where is the bleed coming from?"
-  reads:  RevenueForecast, TimeSeriesMetric, NetworkPerformance, CellTower
-  writes: (situational summary — no ontology mutation)
-      │
-      ▼
-  STAGE 2  RULES
-  "Which towers are critical to restore?"
-  reads:  NetworkPerformance, EquipmentHealth, NetworkEquipment, CellTower
-  writes: CellTower.is_critical_restore ── 15 towers flagged
-          CellTower.avg_packet_loss / latency_ms / error_rate / health_score
-      │
-      ▼
-  STAGE 3  GRAPH (PageRank)
-  "Whose service depends on these towers — and who is socially central?"
-  reads:  CallDetailRecord (caller→callee), CDR.routed_through(CellTower)
-  writes: Subscriber.influence_score ── per subscriber
-          CellTower.impact_count     ── distinct subs served
-          CellTower.weighted_impact  ── Σ subscriber influence
-      │
-      ▼
-  STAGE 4  PREDICTIVE (GNN node regression)
-  "What does the forecast say about future demand?"
-  reads:  TimeSeriesMetric.subscriber_growth_rate × 365d × 9 regions
-          + 12 daily KPIs + 3 lag features + same-region temporal edges
-  writes: CellTower.projected_demand_growth ── per critical tower
-      │
-      ▼
-  STAGE 5  PRESCRIPTIVE (Gurobi MIP)
-  "What's the optimal $5M tier-selection plan?"
-  reads:  CellTower.is_critical_restore     ──► decision-variable scope
-          CellTower.weighted_impact         ──► objective coefficient
-          CellTower.projected_demand_growth ──► objective coefficient
-          TowerUpgradeOption.cost / capacity_increase / install_weeks
-  writes: TowerUpgradeOption.selected ── 15 upgrades chosen
-      │
-      ▼
-  Actionable plan,
-  grounded end-to-end in the same ontology.
-
-  ──────────────────────────────────────────────────────────────────
-  No glue. No DataFrame ping-pong. No re-derivation per-reasoner.
-  Five reasoners, one ontology, one accretive thread.
-  ──────────────────────────────────────────────────────────────────
-```
-
----
-## Why the chain matters (vs. any single stage)
-
-| Stage alone | What it tells you | What it doesn't |
-|---|---|---|
-| Descriptive | "WEST is broken" | Which towers, how to fix |
-| Rules alone | "These 15 towers are critical" | Which matter most; what to do |
-| Graph alone | "These subscribers are influential" | Which towers serve them |
-| Predictive alone | "WEST demand is contracting" | Where to spend the recovery budget |
-| Prescriptive alone | (won't run — no flagged set, no impact weights, no forecast) | Everything — the rest of the pipeline is missing |
-
-| Combined | Output |
-|---|---|
-| Descriptive → Rules | Crisis scoped + critical-tower set flagged |
-| + Graph | Each flagged tower scored by social blast radius |
-| + Predictive | Forward-looking demand multiplier per tower |
-| + Prescriptive | $5M plan, 122 Gbps, all 15 covered, prioritized by social impact |
-
-**Multi-reasoner chaining grounded in (and contributing to) the ontology.**
-
----
-## Adapting this recipe to a new domain
-
-The chain pattern transfers cleanly. To rebuild for a different problem:
-
-1. Re-run `rai-discovery` on the new business question — does it actually need all 5 reasoner families, or are one or two sufficient?
-2. Strip the demo ontology to the concepts the new chain needs (lean is better for type inference and solver compile time).
-3. Stage 1 (descriptive triage) is *optional but high-leverage*: it scopes the problem and rules out a misdiagnosis (e.g., is this a retention crisis or a network crisis?) before any rule, graph, GNN, or solver runs.
-4. Stages 2–5 are the load-bearing chain: rules write the flag that scopes graph aggregations and the solver's decision variables; graph writes the per-entity impact weight that becomes a solver objective coefficient; predictive writes the forward-looking multiplier that becomes the second objective coefficient; prescriptive composes both upstream signals into the final plan.
-5. Keep the validation checks at every stage: assert flagged-set size, PageRank top-N looks plausible, the GNN forecast separates the anomalous segment from the rest, the solve status is OPTIMAL, the objective is not zero, and at least one constraint is binding (otherwise you're under-constrained).
-
-The shape this template demonstrates — *each reasoner writes a property the next reasoner reads* — is what makes the chain accretive rather than serial. The agent skills are how you reliably author each link.
+## Workflow
 
----
+| # | Skill + prompt | Expected output |
+|---|----------------|-----------------|
+| 0 | `/rai-discovery` "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?" | Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation). |
+| 1 | `/rai-querying` "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?" | WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention. |
+| 2 | `/rai-rules-authoring` "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85." | `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers. |
+| 3 | `/rai-graph-analysis` "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." | `Subscriber.influence_score` (PageRank) on all 1,200 subs; `CellTower.weighted_impact` on 15 critical towers; 404 distinct subs (33% of base) route through a critical tower; TWR-0014 has the largest footprint (61 subs, 0.0502). |
+| 4 | `/rai-predictive-modeling` + `/rai-predictive-training` "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier." | GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.993× (contracting ~0.7%); 8 other regions +0.59 to +0.75%/day; written to `CellTower.projected_demand_growth` for 15 critical towers. |
+| 5 | `/rai-prescriptive-problem-formulation` "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth." | OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. |
+| 6 | `/rai-prescriptive-results-interpretation` "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?" | Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout. |
 
-## Data Reference
+## Data
 
-- **Source data**: bundled CSVs in `../data/` (the main template ships ~1.2 MB of synthetic-but-realistic telco data — 250 cell towers, 1,200 subscribers, 6,000 CDRs, 3,285 daily KPI rows across 9 regions).
-- **Ontology**: the template's main script uses a focused 7-concept subset of a broader 18-concept telco knowledge graph (PostalArea, Subscriber, Contract, BillingEvent, CellTower, NetworkEquipment, EquipmentHealth, NetworkEvent, CallDetailRecord, SupplierOrder, Campaign, PromotionRedemption, RevenueForecast, NetworkPerformance, SupportTicket, TimeSeriesMetric, TowerUpgradeOption, Part) — sufficient for the five-stage chain.
-- **Stages**: implemented in `../telco_network_recovery.py` as a single combined script with stage banners.
+Bundled CSVs in `../data/`: 250 cell towers (15 WEST DEGRADED), 1,200 subscribers, 6,000 directed CDRs, ~5,000 NetworkPerformance measurements, 544 NetworkEquipment + EquipmentHealth rows, 360 TowerUpgradeOptions (3 tiers × 120 in-scope towers), 3,285 daily KPI rows (365 days × 9 regions). All stages run end-to-end via `../telco_network_recovery.py`.

From 95277d58820b97e23bbba63c15d58682e5bf4772 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 6 May 2026 09:28:58 -0700
Subject: [PATCH 06/22] Convert workflow tables to bullets for easier copy

Bullet format gives each prompt its own line so users can triple-click
to select and copy. Expected output renders as a paragraph under each
bullet. No content changes, just structural.
---
 v1/energy_grid_planning/references/runbook.md | 32 ++++++++++++----
 v1/machine_maintenance/references/runbook.md  | 32 ++++++++++++----
 v1/portfolio_balancing/references/runbook.md  | 32 ++++++++++++----
 .../references/runbook.md                     | 32 ++++++++++++----
 .../references/runbook.md                     | 37 ++++++++++++++-----
 5 files changed, 124 insertions(+), 41 deletions(-)

diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md
index 87242eb..20443d5 100644
--- a/v1/energy_grid_planning/references/runbook.md
+++ b/v1/energy_grid_planning/references/runbook.md
@@ -38,14 +38,30 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus.
 ## Workflow
 
-| # | Skill + prompt | Expected output |
-|---|----------------|-----------------|
-| 0 | `/rai-discovery` "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" | Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners. |
-| 1 | `/rai-predictive-modeling` + `/rai-predictive-training` "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it." | `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%). |
-| 2 | `/rai-graph-analysis` "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." | 1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes. |
-| 3 | `/rai-rules-authoring` "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?" | `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged. |
-| 4 | `/rai-prescriptive-problem-formulation` "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades." | OPTIMAL MIP across 5 `InvestmentLevel` values in one solve; `x_approve` and `x_upgrade` written back per level. |
-| 5 | `/rai-prescriptive-results-interpretation` "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?" | Pareto frontier with knee at $300M (5 DCs, 1,500 MW, $264M net); marginal $995K/$M at knee, declining to $400K/$M by $600M; Google + Lambda never approved (DFW full). |
+- `/rai-discovery` "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?"
+
+  Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners.
+
+- `/rai-predictive-modeling` + `/rai-predictive-training` "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it."
+
+  `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%).
+
+- `/rai-graph-analysis` "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology."
+
+  1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes.
+
+- `/rai-rules-authoring` "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?"
+
+  `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged.
+
+- `/rai-prescriptive-problem-formulation` "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades."
+
+  OPTIMAL MIP across 5 `InvestmentLevel` values in one solve; `x_approve` and `x_upgrade` written back per level.
+
+- `/rai-prescriptive-results-interpretation` "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?"
+
+  Pareto frontier with knee at $300M (5 DCs, 1,500 MW, $264M net); marginal $995K/$M at knee, declining to $400K/$M by $600M; Google + Lambda never approved (DFW full).
+
 
 ## Data
 
diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md
index d4efa03..c059acd 100644
--- a/v1/machine_maintenance/references/runbook.md
+++ b/v1/machine_maintenance/references/runbook.md
@@ -41,14 +41,30 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved.
 
 ## Workflow
 
-| # | Skill + prompt | Expected output |
-|---|----------------|-----------------|
-| 0 | `/rai-discovery` "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?" | Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills. |
-| 1 | `/rai-querying` "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?" | Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4%; 7 of 9 anomalies at Plant_A; `MachinePeriod.predicted_fp` written for 120 rows. |
-| 2 | `/rai-graph-analysis` "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines." | 30 machines → 1 connected component; Pumps tie at top betweenness (24.0 raw, 1.0 normalized); `Machine.betweenness` stored. |
-| 3 | `/rai-rules-authoring` "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard." | 6 overdue, 1 high-risk, 3 chronic; M013 (Pump, Plant_A) = Critical; M016 (Turbine, Plant_A) = Elevated. |
-| 4 | `/rai-prescriptive-problem-formulation` "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel." | 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × criticality × (1 + 2.0 × betweenness)`. |
-| 5 | `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Solve with HiGHS, then for each machine type check whether all qualified techs sit in one location and recommend the cheapest cross-training fix." | OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks. |
+- `/rai-discovery` "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?"
+
+  Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills.
+
+- `/rai-querying` "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?"
+
+  Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4%; 7 of 9 anomalies at Plant_A; `MachinePeriod.predicted_fp` written for 120 rows.
+
+- `/rai-graph-analysis` "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines."
+
+  30 machines → 1 connected component; Pumps tie at top betweenness (24.0 raw, 1.0 normalized); `Machine.betweenness` stored.
+
+- `/rai-rules-authoring` "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard."
+
+  6 overdue, 1 high-risk, 3 chronic; M013 (Pump, Plant_A) = Critical; M016 (Turbine, Plant_A) = Elevated.
+
+- `/rai-prescriptive-problem-formulation` "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel."
+
+  120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × criticality × (1 + 2.0 × betweenness)`.
+
+- `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Solve with HiGHS, then for each machine type check whether all qualified techs sit in one location and recommend the cheapest cross-training fix."
+
+  OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks.
+
 
 ## Data
 
diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md
index 4a89259..636145f 100644
--- a/v1/portfolio_balancing/references/runbook.md
+++ b/v1/portfolio_balancing/references/runbook.md
@@ -37,14 +37,30 @@ base at every lambda — without the cluster collapse, the gap would grow.
 
 ## Workflow
 
-| # | Skill + prompt | Expected output |
-|---|----------------|-----------------|
-| 0 | `/rai-discovery` "Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?" | Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve. |
-| 1 | `/rai-rules-authoring` "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions." | 4 holdings flagged (AAPL/MSFT on Account 1, JNJ/PFE on Account 4); 2 (account, sector) pairs flagged (Account 1 Tech 34.0%, Account 4 Healthcare 32.2%); 2 users flagged (Alice Chen 0.85, Eve Taylor 0.92). |
-| 2 | `/rai-graph-analysis` "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization." | 4 edges (|rho| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`. |
-| 3 | `/rai-prescriptive-problem-formulation` "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range." | 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt. |
-| 4 | `/rai-prescriptive-results-interpretation` "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most?" | base_500 frontier: returns 32.43 -> 40.28, risk 1160 -> 1742. Marginal `delta_risk/delta_return` jumps ~3x at eps_1 (knee). Same shape across all 6 scenarios — risk scales as budget^2, rate-form frontier is budget-independent. |
-| 5 | `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?" | Crisis vol +28-30% above base at every lambda (budget 500: min_risk 34.06 -> 43.74, eps_1 34.30 -> 44.54 peak). Gap peaks mid-frontier, narrows to +25.2% at eps_5 — the cluster-collapse payoff. |
+- `/rai-discovery` "Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?"
+
+  Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve.
+
+- `/rai-rules-authoring` "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions."
+
+  4 holdings flagged (AAPL/MSFT on Account 1, JNJ/PFE on Account 4); 2 (account, sector) pairs flagged (Account 1 Tech 34.0%, Account 4 Healthcare 32.2%); 2 users flagged (Alice Chen 0.85, Eve Taylor 0.92).
+
+- `/rai-graph-analysis` "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization."
+
+  4 edges (|rho| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`.
+
+- `/rai-prescriptive-problem-formulation` "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range."
+
+  48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt.
+
+- `/rai-prescriptive-results-interpretation` "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most?"
+
+  base_500 frontier: returns 32.43 -> 40.28, risk 1160 -> 1742. Marginal `delta_risk/delta_return` jumps ~3x at eps_1 (knee). Same shape across all 6 scenarios — risk scales as budget^2, rate-form frontier is budget-independent.
+
+- `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?"
+
+  Crisis vol +28-30% above base at every lambda (budget 500: min_risk 34.06 -> 43.74, eps_1 34.30 -> 44.54 peak). Gap peaks mid-frontier, narrows to +25.2% at eps_5 — the cluster-collapse payoff.
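The crisis-regime construction named in the stress-test prompt, a 0.7/0.3 blend of the base covariance with its all-correlations-one limit, can be sketched in a few lines of numpy. Toy numbers only, not the template's estimated covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical base covariance for 4 assets: symmetric positive definite.
A = rng.normal(size=(4, 4))
base_cov = A @ A.T + 4 * np.eye(4)
vols = np.sqrt(np.diag(base_cov))

# Crisis regime: pull correlations toward 1. With unit correlations the
# covariance is exactly the outer product of the per-asset volatilities.
crisis_cov = 0.7 * base_cov + 0.3 * np.outer(vols, vols)

w = np.full(4, 0.25)  # equal-weight toy portfolio
base_vol = float(np.sqrt(w @ base_cov @ w))
crisis_vol = float(np.sqrt(w @ crisis_cov @ w))
print(f"vol expansion: {crisis_vol / base_vol - 1:+.1%}")
```

For long-only weights the crisis volatility can never fall below the base volatility, since raising correlations toward 1 only removes diversification; that is why the re-solve shows an expansion at every frontier point.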
+
 
 ## Data
 
diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md
index c7a68ad..331bd3c 100644
--- a/v1/supply_chain_resilience/references/runbook.md
+++ b/v1/supply_chain_resilience/references/runbook.md
@@ -35,14 +35,30 @@ watch->avoid downgrade = +0.0% (optimizer already routed around it).
 
 ## Workflow
 
-| # | Skill + prompt | Expected output |
-|---|----------------|-----------------|
-| 0 | `/rai-discovery` "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?" | Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive). |
-| 1 | `/rai-graph-analysis` "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores." | `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone. |
-| 2 | `/rai-graph-analysis` "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight." | `Site.centrality` normalized [0,1]: S004=1.000, S006=0.776, S003=0.735; 2 weakly-connected components. |
-| 3 | `/rai-rules-authoring` "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)." | `is_unreliable` (1: B017), `has_high_delay_risk` (2: B003, B017), `is_watch_level` (2), `Demand.is_escalated` (9). |
-| 4 | `/rai-prescriptive-problem-formulation` "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit." | MILP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty. |
-| 5 | `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?" | Baseline OPTIMAL $1,865 / 8 flows / 0 unmet; S004 offline +88.5%; watch->avoid +0.0% (B003 already off optimal lanes). |
+- `/rai-discovery` "We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?"
+
+  Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive).
+
+- `/rai-graph-analysis` "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores."
+
+  `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone.
+
+- `/rai-graph-analysis` "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight."
+
+  `Site.centrality` normalized [0,1]: S004=1.000, S006=0.776, S003=0.735; 2 weakly-connected components.
+
+- `/rai-rules-authoring` "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)."
+
+  `is_unreliable` (1: B017), `has_high_delay_risk` (2: B003, B017), `is_watch_level` (2), `Demand.is_escalated` (9).
+
+- `/rai-prescriptive-problem-formulation` "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit."
+
+  MILP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty.
+
+- `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?"
+
+  Baseline OPTIMAL $1,865 / 8 flows / 0 unmet; S004 offline +88.5%; watch->avoid +0.0% (B003 already off optimal lanes).
+
 
 ## Data
 
diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md
index 39b28ac..cf88bc2 100644
--- a/v1/telco_network_recovery/references/runbook.md
+++ b/v1/telco_network_recovery/references/runbook.md
@@ -37,15 +37,34 @@ across all 15 critical towers, prioritized by social blast radius.
 
 ## Workflow
 
-| # | Skill + prompt | Expected output |
-|---|----------------|-----------------|
-| 0 | `/rai-discovery` "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?" | Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation). |
-| 1 | `/rai-querying` "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?" | WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention. |
-| 2 | `/rai-rules-authoring` "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85." | `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers. |
-| 3 | `/rai-graph-analysis` "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." | `Subscriber.influence_score` (PageRank) on all 1,200 subs; `CellTower.weighted_impact` on 15 critical towers; 404 distinct subs (33% of base) route through a critical tower; TWR-0014 has the largest footprint (61 subs, 0.0502). |
-| 4 | `/rai-predictive-modeling` + `/rai-predictive-training` "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier." | GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.993× (contracting ~0.7%); 8 other regions +0.59 to +0.75%/day; written to `CellTower.projected_demand_growth` for 15 critical towers. |
-| 5 | `/rai-prescriptive-problem-formulation` "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth." | OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. |
-| 6 | `/rai-prescriptive-results-interpretation` "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?" | Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout. |
+- `/rai-discovery` "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?"
+
+  Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation).
+
+- `/rai-querying` "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?"
+
+  WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention.
+
+- `/rai-rules-authoring` "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85."
+ + `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers. + +- `/rai-graph-analysis` "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." + + `Subscriber.influence_score` (PageRank) on all 1,200 subs; `CellTower.weighted_impact` on 15 critical towers; 404 distinct subs (33% of base) route through a critical tower; TWR-0014 has the largest footprint (61 subs, 0.0502). + +- `/rai-predictive-modeling` + `/rai-predictive-training` "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier." + + GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.993× (contracting ~0.7%); 8 other regions +0.59 to +0.75%/day; written to `CellTower.projected_demand_growth` for 15 critical towers. + +- `/rai-prescriptive-problem-formulation` "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth." + + OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. + +- `/rai-prescriptive-results-interpretation` "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?" 
+ + Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout. + ## Data From 818a4174d3a021d324792a4e533ff011851c8fe5 Mon Sep 17 00:00:00 2001 From: cafzal Date: Wed, 6 May 2026 09:30:54 -0700 Subject: [PATCH 07/22] Reformat workflow steps with descriptive headers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Each step is now `### N. <description>` followed by two bullets: - Prompt: `<prompt>` (in code formatting, no quotes — easy to triple-click and copy as a single agent invocation) - Response: `<response>` Same content, clearer structure for skim + copy. --- v1/energy_grid_planning/references/runbook.md | 31 +++++++++------- v1/machine_maintenance/references/runbook.md | 31 +++++++++------- v1/portfolio_balancing/references/runbook.md | 31 +++++++++------- .../references/runbook.md | 31 +++++++++------- .../references/runbook.md | 36 +++++++++++-------- 5 files changed, 93 insertions(+), 67 deletions(-) diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md index 20443d5..31984f5 100644 --- a/v1/energy_grid_planning/references/runbook.md +++ b/v1/energy_grid_planning/references/runbook.md @@ -38,30 +38,35 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus. ## Workflow -- `/rai-discovery` "We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?" +### 0. Discovery - Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners. +- Prompt: `/rai-discovery We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?` +- Response: Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners.
-- `/rai-predictive-modeling` + `/rai-predictive-training` "Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it." +### 1. Forecast substation load - `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%). +- Prompt: `/rai-predictive-modeling + /rai-predictive-training Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it.` +- Response: `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%). -- `/rai-graph-analysis` "Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology." +### 2. Find structural bottlenecks - 1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes. +- Prompt: `/rai-graph-analysis Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology.` +- Response: 1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes. -- `/rai-rules-authoring` "Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. 
Which requests pass all three?" +### 3. Screen DC requests - `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged. +- Prompt: `/rai-rules-authoring Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?` +- Response: `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged. -- `/rai-prescriptive-problem-formulation` "Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades." +### 4. Approve DCs and fund upgrades - OPTIMAL MIP across 5 `InvestmentLevel` values in one solve; `x_approve` and `x_upgrade` written back per level. +- Prompt: `/rai-prescriptive-problem-formulation Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades.` +- Response: OPTIMAL MIP across 5 `InvestmentLevel` values in one solve; `x_approve` and `x_upgrade` written back per level. -- `/rai-prescriptive-results-interpretation` "Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?" - - Pareto frontier with knee at $300M (5 DCs, 1,500 MW, $264M net); marginal $995K/$M at knee, declining to $400K/$M by $600M; Google + Lambda never approved (DFW full). +### 5. 
Read the frontier +- Prompt: `/rai-prescriptive-results-interpretation Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?` +- Response: Pareto frontier with knee at $300M (5 DCs, 1,500 MW, $264M net); marginal $995K/$M at knee, declining to $400K/$M by $600M; Google + Lambda never approved (DFW full). ## Data diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md index c059acd..b9a52c8 100644 --- a/v1/machine_maintenance/references/runbook.md +++ b/v1/machine_maintenance/references/runbook.md @@ -41,30 +41,35 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved. ## Workflow -- `/rai-discovery` "We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?" +### 0. Discovery - Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills. +- Prompt: `/rai-discovery We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?` +- Response: Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills. -- `/rai-querying` "What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?" +### 1. Diagnose plant operations - Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4%; 7 of 9 anomalies at Plant_A; `MachinePeriod.predicted_fp` written for 120 rows. +- Prompt: `/rai-querying What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?` +- Response: Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4%; 7 of 9 anomalies at Plant_A; `MachinePeriod.predicted_fp` written for 120 rows. 
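The OEE figure that step 1 reports is the standard availability × performance × quality product, averaged per plant. A toy roll-up with invented row values (not the template's MachinePeriod data):

```python
# Hypothetical machine-period rows (values invented for illustration).
rows = [
    {"plant": "Plant_A", "availability": 0.85, "performance": 0.90, "quality": 0.92},
    {"plant": "Plant_A", "availability": 0.78, "performance": 0.88, "quality": 0.95},
    {"plant": "Plant_B", "availability": 0.70, "performance": 0.85, "quality": 0.93},
]

def oee(r):
    # OEE for one row is the product of its three component rates.
    return r["availability"] * r["performance"] * r["quality"]

# Average OEE per plant, the roll-up the querying step reports.
by_plant = {}
for r in rows:
    by_plant.setdefault(r["plant"], []).append(oee(r))
avg_oee = {p: sum(v) / len(v) for p, v in by_plant.items()}
```

Against a Snowflake-backed schema the same roll-up would be a `GROUP BY plant` over the machine-period table.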
-- `/rai-graph-analysis` "Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines." +### 2. Find scheduling bottlenecks - 30 machines → 1 connected component; Pumps tie at top betweenness (24.0 raw, 1.0 normalized); `Machine.betweenness` stored. +- Prompt: `/rai-graph-analysis Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines.` +- Response: 30 machines → 1 connected component; Pumps tie at top betweenness (24.0 raw, 1.0 normalized); `Machine.betweenness` stored. -- `/rai-rules-authoring` "Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard." +### 3. Classify machine risk - 6 overdue, 1 high-risk, 3 chronic; M013 (Pump, Plant_A) = Critical; M016 (Turbine, Plant_A) = Elevated. +- Prompt: `/rai-rules-authoring Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard.` +- Response: 6 overdue, 1 high-risk, 3 chronic; M013 (Pump, Plant_A) = Critical; M016 (Turbine, Plant_A) = Elevated. -- `/rai-prescriptive-problem-formulation` "Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel." +### 4. 
Schedule maintenance - 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × criticality × (1 + 2.0 × betweenness)`. +- Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel.` +- Response: 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × criticality × (1 + 2.0 × betweenness)`. -- `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Solve with HiGHS, then for each machine type check whether all qualified techs sit in one location and recommend the cheapest cross-training fix." - - OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks. +### 5. Stress-test concentration +- Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Solve with HiGHS, then for each machine type check whether all qualified techs sit in one location and recommend the cheapest cross-training fix.` +- Response: OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks. ## Data diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md index 636145f..efea678 100644 --- a/v1/portfolio_balancing/references/runbook.md +++ b/v1/portfolio_balancing/references/runbook.md @@ -37,30 +37,35 @@ base at every lambda — without the cluster collapse, the gap would grow. ## Workflow -- `/rai-discovery` "Our 8-stock book breaks compliance and concentrates risk. 
Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?" +### 0. Discovery - Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve. +- Prompt: `/rai-discovery Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?` +- Response: Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve. -- `/rai-rules-authoring` "Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions." +### 1. Compliance scan - 4 holdings flagged (AAPL/MSFT on Account 1, JNJ/PFE on Account 4); 2 (account, sector) pairs flagged (Account 1 Tech 34.0%, Account 4 Healthcare 32.2%); 2 users flagged (Alice Chen 0.85, Eve Taylor 0.92). +- Prompt: `/rai-rules-authoring Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions.` +- Response: 4 holdings flagged (AAPL/MSFT on Account 1, JNJ/PFE on Account 4); 2 (account, sector) pairs flagged (Account 1 Tech 34.0%, Account 4 Healthcare 32.2%); 2 users flagged (Alice Chen 0.85, Eve Taylor 0.92). -- `/rai-graph-analysis` "Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization." +### 2. Cluster correlated bets - 4 edges (|rho| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 
5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`. +- Prompt: `/rai-graph-analysis Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization.` +- Response: 4 edges (|rho| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`. -- `/rai-prescriptive-problem-formulation` "Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range." +### 3. Solve mean-variance frontier - 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt. +- Prompt: `/rai-prescriptive-problem-formulation Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range.` +- Response: 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt. -- `/rai-prescriptive-results-interpretation` "For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most?" +### 4. Read the frontier - base_500 frontier: returns 32.43 -> 40.28, risk 1160 -> 1742. Marginal `delta_risk/delta_return` jumps ~3x at eps_1 (knee). Same shape across all 6 scenarios — risk scales as budget^2, rate-form frontier is budget-independent. 
+- Prompt: `/rai-prescriptive-results-interpretation For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most?` +- Response: base_500 frontier: returns 32.43 -> 40.28, risk 1160 -> 1742. Marginal `delta_risk/delta_return` jumps ~3x at eps_1 (knee). Same shape across all 6 scenarios — risk scales as budget^2, rate-form frontier is budget-independent. -- `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?" - - Crisis vol +28-30% above base at every lambda (budget 500: min_risk 34.06 -> 43.74, eps_1 34.30 -> 44.54 peak). Gap peaks mid-frontier, narrows to +25.2% at eps_5 — the cluster-collapse payoff. +### 5. Stress under crisis +- Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?` +- Response: Crisis vol +28-30% above base at every lambda (budget 500: min_risk 34.06 -> 43.74, eps_1 34.30 -> 44.54 peak). Gap peaks mid-frontier, narrows to +25.2% at eps_5 — the cluster-collapse payoff. ## Data diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md index 331bd3c..39b7713 100644 --- a/v1/supply_chain_resilience/references/runbook.md +++ b/v1/supply_chain_resilience/references/runbook.md @@ -35,30 +35,35 @@ watch->avoid downgrade = +0.0% (optimizer already routed around it). ## Workflow -- `/rai-discovery` "We need a risk-adjusted routing plan. 
What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?" +### 0. Discovery - Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive). +- Prompt: `/rai-discovery We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?` +- Response: Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive). -- `/rai-graph-analysis` "If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores." +### 1. Map upstream supplier exposure - `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone. +- Prompt: `/rai-graph-analysis If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores.` +- Response: `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone. -- `/rai-graph-analysis` "Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight." +### 2. Rank network hubs - `Site.centrality` normalized [0,1]: S004=1.000, S006=0.776, S003=0.735; 2 weakly-connected components. +- Prompt: `/rai-graph-analysis Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? 
Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight.` +- Response: `Site.centrality` normalized [0,1]: S004=1.000, S006=0.776, S003=0.735; 2 weakly-connected components. -- `/rai-rules-authoring` "Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged)." +### 3. Classify supplier reliability - `is_unreliable` (1: B017), `has_high_delay_risk` (2: B003, B017), `is_watch_level` (2), `Demand.is_escalated` (9). +- Prompt: `/rai-rules-authoring Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged).` +- Response: `is_unreliable` (1: B017), `has_high_delay_risk` (2: B003, B017), `is_watch_level` (2), `Demand.is_escalated` (9). -- `/rai-prescriptive-problem-formulation` "Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit." +### 4. Solve risk-adjusted flow - MILP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty. +- Prompt: `/rai-prescriptive-problem-formulation Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. 
Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit.` +- Response: MILP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty. -- `/rai-prescriptive-solver-management` + `/rai-prescriptive-results-interpretation` "Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?" - - Baseline OPTIMAL $1,865 / 8 flows / 0 unmet; S004 offline +88.5%; watch->avoid +0.0% (B003 already off optimal lanes). +### 5. Quantify disruption scenarios +- Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?` +- Response: Baseline OPTIMAL $1,865 / 8 flows / 0 unmet; S004 offline +88.5%; watch->avoid +0.0% (B003 already off optimal lanes). ## Data diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md index cf88bc2..9eefd0e 100644 --- a/v1/telco_network_recovery/references/runbook.md +++ b/v1/telco_network_recovery/references/runbook.md @@ -37,34 +37,40 @@ across all 15 critical towers, prioritized by social blast radius. ## Workflow -- `/rai-discovery` "WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?" +### 0. Discovery - Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation). +- Prompt: `/rai-discovery WEST is missing revenue while every other region grows. 
What questions do we need to answer to figure out where to spend $5M to fix it?` +- Response: Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation). -- `/rai-querying` "Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?" +### 1. Diagnose WEST - WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention. +- Prompt: `/rai-querying Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?` +- Response: WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention. -- `/rai-rules-authoring` "Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85." +### 2. Flag critical-restore towers - `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers. +- Prompt: `/rai-rules-authoring Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85.` +- Response: `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers. 
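That critical-restore rule is a two-branch boolean over the derived tower metrics. A plain-Python sketch of the predicate; the two tower rows are hypothetical, with packet loss expressed in percent to match the 5% threshold in the prompt:

```python
def is_critical_restore(tower):
    # Branch 1: WEST region + DEGRADED status + unhealthy equipment.
    west_degraded = (tower["region"] == "WEST"
                     and tower["status"] == "DEGRADED"
                     and tower["avg_health_score"] < 0.85)
    # Branch 2: lossy (> 5% packet loss) + unhealthy equipment, any region.
    lossy_unhealthy = (tower["avg_packet_loss"] > 5.0
                       and tower["avg_health_score"] < 0.85)
    return west_degraded or lossy_unhealthy

# Hypothetical tower rows (field names follow the derived metrics above).
towers = [
    {"id": "TWR-0014", "region": "WEST", "status": "DEGRADED",
     "avg_health_score": 0.71, "avg_packet_loss": 8.3},
    {"id": "TWR-0101", "region": "EAST", "status": "ACTIVE",
     "avg_health_score": 0.97, "avg_packet_loss": 0.4},
]
flagged = [t["id"] for t in towers if is_critical_restore(t)]
```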
-- `/rai-graph-analysis` "Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails." +### 3. Score subscriber blast radius - `Subscriber.influence_score` (PageRank) on all 1,200 subs; `CellTower.weighted_impact` on 15 critical towers; 404 distinct subs (33% of base) route through a critical tower; TWR-0014 has the largest footprint (61 subs, 0.0502). +- Prompt: `/rai-graph-analysis Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails.` +- Response: `Subscriber.influence_score` (PageRank) on all 1,200 subs; `CellTower.weighted_impact` on 15 critical towers; 404 distinct subs (33% of base) route through a critical tower; TWR-0014 has the largest footprint (61 subs, 0.0502). -- `/rai-predictive-modeling` + `/rai-predictive-training` "Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier." +### 4. Forecast regional demand - GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.993× (contracting ~0.7%); 8 other regions +0.59 to +0.75%/day; written to `CellTower.projected_demand_growth` for 15 critical towers. +- Prompt: `/rai-predictive-modeling + /rai-predictive-training Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier.` +- Response: GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.993× (contracting ~0.7%); 8 other regions +0.59 to +0.75%/day; written to `CellTower.projected_demand_growth` for 15 critical towers. 
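Stage 3's blast radius chains two small computations: PageRank over the subscriber call graph, then a per-tower sum of subscriber scores through a routes-through mapping. A stdlib power-iteration sketch; the four subscribers, call edges, and tower assignments are all invented:

```python
# Hypothetical call graph and tower routing (not the bundled CSVs).
calls = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
routes_through = {"A": "TWR-0014", "B": "TWR-0014", "C": "TWR-0002", "D": "TWR-0002"}

nodes = sorted({n for edge in calls for n in edge})
out = {n: [t for s, t in calls if s == n] for n in nodes}

# PageRank by power iteration with damping 0.85.
d = 0.85
pr = {n: 1.0 / len(nodes) for n in nodes}
for _ in range(50):
    nxt = {n: (1 - d) / len(nodes) for n in nodes}
    for n in nodes:
        targets = out[n] or nodes          # dangling nodes link everywhere
        share = d * pr[n] / len(targets)
        for t in targets:
            nxt[t] += share
    pr = nxt

# Blast radius: total subscriber influence routed through each tower.
impact = {}
for sub, twr in routes_through.items():
    impact[twr] = impact.get(twr, 0.0) + pr[sub]
```

Ranking `impact` descending reproduces the TWR-0014-style "largest footprint" ordering on real data.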
-- `/rai-prescriptive-problem-formulation` "Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth." +### 5. Optimize tier selection - OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. +- Prompt: `/rai-prescriptive-problem-formulation Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth.` +- Response: OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. -- `/rai-prescriptive-results-interpretation` "Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?" - - Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout. +### 6. Interpret the plan +- Prompt: `/rai-prescriptive-results-interpretation Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?` +- Response: Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout. 
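The tier-selection model the last two stages exercise is small enough to sanity-check outside the solver. A minimal stdlib sketch under toy numbers (tower IDs reused for flavor; none of the costs, capacities, or caps below come from the bundled CSVs): enumerate at most one tier per tower, drop any plan that breaks the budget or install-week caps, and keep the plan with the highest blast-radius- and demand-weighted capacity.

```python
from itertools import product

# Toy instance: illustrative numbers only, not the template's bundled CSVs.
# Tier -> (cost $, install-weeks, capacity Gbps restored).
TIERS = {"BRONZE": (120_000, 4, 3.0), "SILVER": (260_000, 8, 6.0), "GOLD": (420_000, 12, 9.0)}
TOWERS = {  # tower -> (weighted_impact, projected_demand_growth)
    "TWR-0014": (0.0502, 0.9998),
    "TWR-0021": (0.0310, 0.9998),
    "TWR-0009": (0.0188, 0.9998),
}
BUDGET, WEEKS = 900_000, 24

best = None
for plan in product([None, "BRONZE", "SILVER", "GOLD"], repeat=len(TOWERS)):
    cost = sum(TIERS[t][0] for t in plan if t)
    weeks = sum(TIERS[t][1] for t in plan if t)
    if cost > BUDGET or weeks > WEEKS:
        continue  # infeasible: budget or install-week cap broken
    # Objective: capacity restored, weighted by blast radius x demand multiplier.
    value = sum(TIERS[t][2] * w * g
                for t, (w, g) in zip(plan, TOWERS.values()) if t)
    if best is None or value > best[0]:
        best = (value, plan, cost, weeks)

value, plan, cost, weeks = best
print(dict(zip(TOWERS, plan)), cost, weeks)
```

The real formulation hands the same objective and caps to a MIP solver; brute force only works here because the toy instance has 4^3 candidate plans.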
 ## Data

From 317960d8aecefff5867117d0b0d93f01bae4650c Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 6 May 2026 09:36:39 -0700
Subject: [PATCH 08/22] Audit-pass fixes against template scripts and READMEs

- telco: reorder to summit-demo workflow (descriptive -> rules -> graph -> predictive -> prescriptive -> interpret); fix WEST multiplier 0.993x -> 0.9998x; fix other-region growth range to +0.45-0.91%/day; clarify projected_demand_growth is written to all 250 towers via region join, not just 15
- energy: drop fabricated 36-month forecast horizon -> 24-month; rewrite Stage 3 low-carbon prompt to describe the actual rule (per-DC requirement vs zero-emission share) instead of a fabricated 25%/100% threshold
- machine_maintenance: drop unsupported "Turbines need on-site qualified technician" hard constraint -- script penalizes travel cost, doesn't enforce co-location; add parts_cost factor to failure cost formula
- portfolio: tighten crisis vol-gap range to actual 25-30% (peak +29.8% at eps_1, low +25.2% at eps_5)
- supply_chain: no factual changes (verified clean)
---
 v1/energy_grid_planning/references/runbook.md   |  4 ++--
 v1/machine_maintenance/references/runbook.md    |  4 ++--
 v1/portfolio_balancing/references/runbook.md    |  9 +++++----
 v1/telco_network_recovery/references/runbook.md | 10 +++++-----
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md
index 31984f5..49d9653 100644
--- a/v1/energy_grid_planning/references/runbook.md
+++ b/v1/energy_grid_planning/references/runbook.md
@@ -45,7 +45,7 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus.
 
 ### 1. Forecast substation load
 
-- Prompt: `/rai-predictive-modeling + /rai-predictive-training Can we forecast substation load growth over the next 36 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it.`
+- Prompt: `/rai-predictive-modeling + /rai-predictive-training Can we forecast substation load growth over the next 24 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it.`
 - Response: `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%).
 
 ### 2. Find structural bottlenecks
@@ -55,7 +55,7 @@
 
 ### 3. Screen DC requests
 
-- Prompt: `/rai-rules-authoring Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) if 100% low-carbon required, region must have 25%+ renewable, (3) substation shouldn't be most structurally critical. Which requests pass all three?`
+- Prompt: `/rai-rules-authoring Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) substation's low-carbon (renewable + nuclear) generation share must meet the DC's low-carbon requirement, (3) substation shouldn't be one of the top-3 structurally critical. Which requests pass all three?`
 - Response: `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged.
 
 ### 4. Approve DCs and fund upgrades
diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md
index b9a52c8..85f7293 100644
--- a/v1/machine_maintenance/references/runbook.md
+++ b/v1/machine_maintenance/references/runbook.md
@@ -63,8 +63,8 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved.
 
 ### 4. Schedule maintenance
 
-- Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and Turbines need an on-site qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel.`
-- Response: 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × criticality × (1 + 2.0 × betweenness)`.
+- Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and each maintained machine needs exactly one qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel (penalized when the technician is not co-located).`
+- Response: 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × parts_cost × criticality × (1 + 2.0 × betweenness)`.
 
 ### 5. Stress-test concentration
diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md
index efea678..5937cec 100644
--- a/v1/portfolio_balancing/references/runbook.md
+++ b/v1/portfolio_balancing/references/runbook.md
@@ -29,9 +29,10 @@ base at every lambda — without the cluster collapse, the gap would grow.
            frontier per scenario. Knee at eps_1.
   ─────────────────────────────────────────────────────────────────
   STAGE 4  Stress      ──► Stock.regime_covar (PSD-preserving)
-                           Crisis vol 25-30% above base at every
-                           lambda. Gap peaks mid-frontier, narrows
-                           toward the concentrated end.
+                           Crisis vol 25-30% above base across
+                           the frontier. Gap peaks mid-frontier
+                           (eps_1..eps_2 at +29.8%), narrows
+                           toward the concentrated end (+25.2%).
   ─────────────────────────────────────────────────────────────────
 ```
@@ -65,7 +66,7 @@
 
 ### 5. Stress under crisis
 
 - Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?`
-- Response: Crisis vol +28-30% above base at every lambda (budget 500: min_risk 34.06 -> 43.74, eps_1 34.30 -> 44.54 peak). Gap peaks mid-frontier, narrows to +25.2% at eps_5 — the cluster-collapse payoff.
+- Response: Crisis vol 25-30% above base across the frontier (budget 500: min_risk 34.06 -> 43.74 at +28.4%, eps_1 34.30 -> 44.54 at +29.8% peak). Gap peaks mid-frontier (eps_1..eps_2 at +29.8%), narrows to +25.2% at eps_5 — the cluster-collapse payoff.
 
 ## Data
diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md
index 9eefd0e..33b1edd 100644
--- a/v1/telco_network_recovery/references/runbook.md
+++ b/v1/telco_network_recovery/references/runbook.md
@@ -1,6 +1,6 @@
 # Runbook: Telco WEST Recovery — Multi-Reasoner Walkthrough
 
-A regional telco is bleeding $791K/quarter from WEST while every other region grows. No single reasoner can answer where to spend a $5M recovery budget: descriptive scopes the crisis, rules flag broken towers, graph weights them by social blast radius, predictive forecasts forward demand, and prescriptive composes all three signals into the upgrade plan. Each stage writes derived properties back to the same ontology that downstream stages consume.
+A regional telco is bleeding $791K/quarter from WEST while every other region grows. No single reasoner can answer where to spend a $5M recovery budget: descriptive scopes the crisis, rules flag broken towers, graph weights them by social blast radius, predictive forecasts forward demand, and prescriptive composes all four signals into the upgrade plan. Each stage writes derived properties back to the same ontology that downstream stages consume.
 
 ## The chain
 
@@ -24,9 +24,9 @@ across all 15 critical towers, prioritized by social blast radius.
            404 distinct subs (33% of base) route calls through a critical tower.
   ─────────────────────────────────────────────────────────────────
-  STAGE 4  Predictive  ──► CellTower.projected_demand_growth (15)
-           (GNN)           WEST: 0.993× ── shrinking ~0.7%/yr
-                           while 8 other regions sit at +0.59 to +0.75%/day.
+  STAGE 4  Predictive  ──► CellTower.projected_demand_growth (250)
+           (GNN)           WEST: 0.9998× ── flat/slightly contracting
+                           while 8 other regions sit at +0.45 to +0.91%/day.
   ─────────────────────────────────────────────────────────────────
   STAGE 5  Prescriptive ──► TowerUpgradeOption.selected (15)
            OPTIMAL · 12 GOLD · 2 SILVER · 1 BRONZE
@@ -60,7 +60,7 @@ across all 15 critical towers, prioritized by social blast radius.
 
 ### 4. Forecast regional demand
 
 - Prompt: `/rai-predictive-modeling + /rai-predictive-training Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier.`
-- Response: GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.993× (contracting ~0.7%); 8 other regions +0.59 to +0.75%/day; written to `CellTower.projected_demand_growth` for 15 critical towers.
+- Response: GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.9998× (flat/slightly contracting); 8 other regions +0.45 to +0.91%/day; written to `CellTower.projected_demand_growth` for all 250 towers via region join.
 
 ### 5. Optimize tier selection
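The portfolio hunk above pins down the crisis construction: blend the base covariance with the outer product of volatilities (the correlation-one covariance) at 0.7/0.3. A stdlib sketch with a toy 3-asset covariance (illustrative numbers, not the template's 8-stock data) shows why the blend preserves PSD-ness and variances while portfolio vol expands:

```python
import math

def crisis_covariance(cov, w_base=0.7):
    """Blend the base covariance with the outer product of volatilities,
    i.e. the covariance the book would have if every correlation were 1.0.
    Diagonals are preserved exactly (0.7*v + 0.3*v = v); off-diagonals
    move toward rho = 1. Both terms are PSD, so the convex blend is PSD."""
    n = len(cov)
    s = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[w_base * cov[i][j] + (1 - w_base) * s[i] * s[j] for j in range(n)]
            for i in range(n)]

def portfolio_vol(w, cov):
    return math.sqrt(sum(wi * wj * cov[i][j]
                         for i, wi in enumerate(w)
                         for j, wj in enumerate(w)))

# Toy 3-asset covariance, illustrative only.
cov = [[0.0400, 0.0060, 0.0020],
       [0.0060, 0.0900, 0.0120],
       [0.0020, 0.0120, 0.0625]]
weights = [0.4, 0.3, 0.3]
base = portfolio_vol(weights, cov)
crisis = portfolio_vol(weights, crisis_covariance(cov))
print(f"base {base:.4f}  crisis {crisis:.4f}  expansion +{crisis / base - 1:.1%}")
```

The expansion size depends on how diversified the weights are, which is exactly why the runbook's gap peaks mid-frontier and narrows toward the concentrated end.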
From ec8ae9a032e45706c4c7c7fc84613a80610f5598 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 6 May 2026 09:41:47 -0700
Subject: [PATCH 09/22] Strip mechanical bits from prompts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Prompts should describe what the user wants, not how the agent should do it. The skill (with the agent) handles solver choice, formula construction, and implementation details.

- telco Stage 5: drop the explicit Σ formula
- portfolio Stage 2: 'force the rest to zero' -> 'only invest in those'
- portfolio Stage 3: drop 'anchor / sweep / forced to zero' agent scaffolding; show 7 frontier points instead
- supply chain Stage 3: rephrase as 'find the minimum-cost shipping plan' (no 'Solve a ... LP'), 'don't ship from avoid suppliers', 'prefer non-bottleneck sites'
- machine_maintenance Stage 5: drop 'Solve with HiGHS' (mechanical)
- machine_maintenance Stage 4: simplify cost-formula language
---
 v1/machine_maintenance/references/runbook.md     | 4 ++--
 v1/portfolio_balancing/references/runbook.md     | 4 ++--
 v1/supply_chain_resilience/references/runbook.md | 2 +-
 v1/telco_network_recovery/references/runbook.md  | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md
index 85f7293..6a26b52 100644
--- a/v1/machine_maintenance/references/runbook.md
+++ b/v1/machine_maintenance/references/runbook.md
@@ -63,12 +63,12 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved.
 
 ### 4. Schedule maintenance
 
-- Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and each maintained machine needs exactly one qualified technician. Minimize expected failure cost weighted by criticality and centrality, plus labor and travel (penalized when the technician is not co-located).`
+- Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and each maintained machine needs a qualified technician. Minimize expected failure cost (weighted by criticality and centrality) plus labor and travel.`
 - Response: 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × parts_cost × criticality × (1 + 2.0 × betweenness)`.
 
 ### 5. Stress-test concentration
 
-- Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Solve with HiGHS, then for each machine type check whether all qualified techs sit in one location and recommend the cheapest cross-training fix.`
+- Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation For each machine type, check whether all qualified technicians sit in one location and recommend the cheapest cross-training fix.`
 - Response: OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks.
 
 ## Data
diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md
index 5937cec..85293ba 100644
--- a/v1/portfolio_balancing/references/runbook.md
+++ b/v1/portfolio_balancing/references/runbook.md
@@ -50,12 +50,12 @@ base at every lambda — without the cluster collapse, the gap would grow.
 
 ### 2. Cluster correlated bets
 
-- Prompt: `/rai-graph-analysis Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and force the rest to zero in optimization.`
+- Prompt: `/rai-graph-analysis Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and only invest in those.`
 - Response: 4 edges (|rho| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`.
 
 ### 3. Solve mean-variance frontier
 
-- Prompt: `/rai-prescriptive-problem-formulation Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, non-representatives forced to zero. Anchor with min-risk and max-return, then sweep 5 epsilon points across the return range.`
+- Prompt: `/rai-prescriptive-problem-formulation Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, only invest in cluster representatives. Show 7 points per frontier.`
 - Response: 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt.
 
 ### 4. Read the frontier
diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md
index 39b7713..82bd56c 100644
--- a/v1/supply_chain_resilience/references/runbook.md
+++ b/v1/supply_chain_resilience/references/runbook.md
@@ -57,7 +57,7 @@
 
 ### 4. Solve risk-adjusted flow
 
-- Prompt: `/rai-prescriptive-problem-formulation Solve a minimum-cost flow that fulfills all open demand orders at minimum total transport cost. Hard-block 'avoid' suppliers, surcharge 'watch' suppliers $5/unit, weight bottleneck sites by their centrality, and penalize unmet demand at $100/unit.`
+- Prompt: `/rai-prescriptive-problem-formulation Find the minimum-cost shipping plan that fulfills all open demand. Don't ship from 'avoid' suppliers at all, add a $5/unit surcharge for 'watch' suppliers, prefer non-bottleneck sites, and charge $100/unit for any unmet demand.`
 - Response: MILP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty.
 
 ### 5. Quantify disruption scenarios
diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md
index 33b1edd..9288476 100644
--- a/v1/telco_network_recovery/references/runbook.md
+++ b/v1/telco_network_recovery/references/runbook.md
@@ -64,7 +64,7 @@ across all 15 critical towers, prioritized by social blast radius.
 
 ### 5. Optimize tier selection
 
-- Prompt: `/rai-prescriptive-problem-formulation Recover WEST capacity within $5M and 200 install-weeks, prioritizing towers by social blast radius and forward-looking demand. From TowerUpgradeOption, pick at most one upgrade tier (BRONZE/SILVER/GOLD) per critical-restore tower, maximizing Σ capacity_increase × weighted_impact × projected_demand_growth.`
+- Prompt: `/rai-prescriptive-problem-formulation Recover WEST capacity within $5M and 200 install-weeks. For each critical-restore tower, pick at most one upgrade tier (BRONZE, SILVER, or GOLD) to maximize capacity restored, weighted by each tower's blast radius and the regional demand forecast.`
 - Response: OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered.
 
 ### 6. Interpret the plan
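The reworded supply-chain prompt above prices four things: transport, a $5/unit watch surcharge, a centrality-based bottleneck weight, and a $100/unit unmet penalty. A greedy stdlib stand-in for the MILP (toy lanes; the $10/unit-per-centrality-point weight is an assumption, since the runbook only fixes the surcharge and the unmet penalty):

```python
# Toy stand-in for the Stage 4 objective (supply chain). The $5/unit watch
# surcharge and $100/unit unmet penalty come from the runbook; the
# centrality weight and all lane data are invented for illustration.
WATCH_SURCHARGE = 5.0
UNMET_PENALTY = 100.0
CENTRALITY_WEIGHT = 10.0  # assumption: linear $/unit weight on origin centrality

def lane_cost(transport, status, centrality):
    """Effective $/unit on a lane; None means the lane is hard-blocked."""
    if status == "avoid":
        return None  # never ship from 'avoid' suppliers
    cost = transport + CENTRALITY_WEIGHT * centrality
    if status == "watch":
        cost += WATCH_SURCHARGE
    return cost

lanes = [  # (supplier, transport $/unit, risk status, origin centrality, capacity)
    ("B003", 3.0, "watch", 0.735, 40),
    ("B017", 2.0, "avoid", 0.776, 50),
    ("B020", 4.0, "ok", 0.100, 30),
]
demand = 60

priced = []
for supplier, transport, status, centrality, cap in lanes:
    cost = lane_cost(transport, status, centrality)
    if cost is not None:
        priced.append((cost, supplier, cap))
priced.sort()  # greedy: cheapest effective lane first

plan, remaining = [], demand
for cost, supplier, cap in priced:
    take = min(cap, remaining)
    if take:
        plan.append((supplier, take, cost))
        remaining -= take
total = sum(q * c for _, q, c in plan) + UNMET_PENALTY * remaining
print(plan, "unmet:", remaining, "total:", round(total, 2))
```

The real model solves flows jointly (a greedy pass can misallocate shared capacity), but the cost structure is the same one the MILP's objective prices.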
From 26833b4080579bac5e7d6e668378cba58dec11b3 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 6 May 2026 09:55:22 -0700
Subject: [PATCH 10/22] Add ontology-build step at the front of each runbook

Step 1 is now /rai-build-starter-ontology against the bundled CSVs. Discovery, the chain stages, and interpretation shift to 2..N. Reflects that users start with the demo data and need the ontology materialized before any reasoner skill can run.
---
 v1/energy_grid_planning/references/runbook.md | 17 +++++++++++------
 v1/machine_maintenance/references/runbook.md  | 17 +++++++++++------
 v1/portfolio_balancing/references/runbook.md  | 17 +++++++++++------
 .../references/runbook.md                     | 17 +++++++++++------
 .../references/runbook.md                     | 19 ++++++++++++-------
 5 files changed, 56 insertions(+), 31 deletions(-)

diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md
index 49d9653..d4615f5 100644
--- a/v1/energy_grid_planning/references/runbook.md
+++ b/v1/energy_grid_planning/references/runbook.md
@@ -38,32 +38,37 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus.
 
 ## Workflow
 
-### 0. Discovery
+### 1. Build ontology
+
+- Prompt: `/rai-build-starter-ontology Build an ontology for grid infrastructure planning from the CSVs in ../data/ covering substations, generators, transmission lines, demand forecasts, data center requests, and substation upgrades.`
+- Response: Concepts: `Substation`, `Generator`, `TransmissionLine`, `LoadZone`, `DemandPeriod`, `RenewableProfile`, `MaintenanceWindow`, `Customer`, `DataCenterRequest`, `SubstationUpgrade`, `DemandForecast`, `LoadHistory`, `DCAnnouncement` — bound to the bundled CSVs (12 substations, 10 DC requests, 18 transmission lines).
+
+### 2. Discovery
 
 - Prompt: `/rai-discovery We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?`
 - Response: Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners.
 
-### 1. Forecast substation load
+### 3. Forecast substation load
 
 - Prompt: `/rai-predictive-modeling + /rai-predictive-training Can we forecast substation load growth over the next 24 months based on historical demand, planned generator additions, and the data center request pipeline? Bind each substation's predicted peak load back to the ontology so the rules engine and optimizer can read it.`
 - Response: `Substation.predicted_load` for all 12; DFW breaches at 1,700 MW vs 1,600 MW cap at 24 months (+54.6%).
 
-### 2. Find structural bottlenecks
+### 4. Find structural bottlenecks
 
 - Prompt: `/rai-graph-analysis Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology.`
 - Response: 1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes.
 
-### 3. Screen DC requests
+### 5. Screen DC requests
 
 - Prompt: `/rai-rules-authoring Screen each data center request against three criteria: (1) substation must have enough capacity after predicted load, (2) substation's low-carbon (renewable + nuclear) generation share must meet the DC's low-carbon requirement, (3) substation shouldn't be one of the top-3 structurally critical. Which requests pass all three?`
 - Response: `fails_capacity` / `fails_structural` / `fails_low_carbon` + `is_compliant`; 2 pass (Crusoe, Oracle), 8 flagged.
 
-### 4. Approve DCs and fund upgrades
+### 6. Approve DCs and fund upgrades
 
 - Prompt: `/rai-prescriptive-problem-formulation Decide which data center requests to approve and which substation upgrades to fund at $200M, $300M, $400M, $500M, and $600M investment levels. Maximize annual revenue. A request can only be approved if its substation has enough capacity after upgrades.`
 - Response: OPTIMAL MIP across 5 `InvestmentLevel` values in one solve; `x_approve` and `x_upgrade` written back per level.
 
-### 5. Read the frontier
+### 7. Read the frontier
 
 - Prompt: `/rai-prescriptive-results-interpretation Which data centers get approved, which upgrades are selected, and where's the biggest return on investment at each budget level?`
 - Response: Pareto frontier with knee at $300M (5 DCs, 1,500 MW, $264M net); marginal $995K/$M at knee, declining to $400K/$M by $600M; Google + Lambda never approved (DFW full).
diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md
index 6a26b52..a48efa9 100644
--- a/v1/machine_maintenance/references/runbook.md
+++ b/v1/machine_maintenance/references/runbook.md
@@ -41,32 +41,37 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved.
 
 ## Workflow
 
-### 0. Discovery
+### 1. Build ontology
+
+- Prompt: `/rai-build-starter-ontology Build a manufacturing maintenance ontology from the CSVs in ../data/ covering machines, technicians, qualifications, periods, sensor readings, failure predictions, downtime events, and production runs.`
+- Response: Concepts: `Machine`, `Technician`, `Qualification`, `Period`, `MachinePeriod`, `TechnicianPeriod`, `TechnicianMachinePeriod`, `Sensor`, `SensorReading`, `FailurePrediction`, `DowntimeEvent`, `ProductionRun`, `PartsInventory`, `TrainingOption` — bound to the bundled CSVs (30 machines × 3 plants, 10 technicians, 4 periods).
+
+### 2. Discovery
 
 - Prompt: `/rai-discovery We need to schedule preventive maintenance for 30 machines across 3 plants. Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?`
 - Response: Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills.
 
-### 1. Diagnose plant operations
+### 3. Diagnose plant operations
 
 - Prompt: `/rai-querying What's the OEE by plant? Which machines have the most sensor anomalies, and which are most likely to fail by the end of the planning horizon?`
 - Response: Plant_C 79.8% > Plant_A 68.2% > Plant_B 61.4%; 7 of 9 anomalies at Plant_A; `MachinePeriod.predicted_fp` written for 120 rows.
 
-### 2. Find scheduling bottlenecks
+### 4. Find scheduling bottlenecks
 
 - Prompt: `/rai-graph-analysis Which machines share qualified technicians, and which are bottlenecks in the qualification network? Compute centrality and write it back to each machine so the optimizer can weight critical machines.`
 - Response: 30 machines → 1 connected component; Pumps tie at top betweenness (24.0 raw, 1.0 normalized); `Machine.betweenness` stored.
 
-### 3. Classify machine risk
+### 5. Classify machine risk
 
 - Prompt: `/rai-rules-authoring Rate each machine's risk: chronic if >8 downtime events, high-risk if failure prob >0.3 AND criticality 4+, plus overdue for maintenance. All three flags = Critical, two = Elevated, otherwise Standard.`
 - Response: 6 overdue, 1 high-risk, 3 chronic; M013 (Pump, Plant_A) = Critical; M016 (Turbine, Plant_A) = Elevated.
 
-### 4. Schedule maintenance
+### 6. Schedule maintenance
 
 - Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and each maintained machine needs a qualified technician. Minimize expected failure cost (weighted by criticality and centrality) plus labor and travel.`
 - Response: 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × parts_cost × criticality × (1 + 2.0 × betweenness)`.
 
-### 5. Stress-test concentration
+### 7. Stress-test concentration
 
 - Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation For each machine type, check whether all qualified technicians sit in one location and recommend the cheapest cross-training fix.`
 - Response: OPTIMAL · 20 jobs · $605,241; Turbine concentrated in Houston_TX (67% of jobs travel); cross-train T006 (Chicago_IL, Senior) for $3,200 / 5 weeks.
diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md
index 85293ba..cc108ff 100644
--- a/v1/portfolio_balancing/references/runbook.md
+++ b/v1/portfolio_balancing/references/runbook.md
@@ -38,32 +38,37 @@ base at every lambda — without the cluster collapse, the gap would grow.
 
 ## Workflow
 
-### 0. Discovery
+### 1. Build ontology
+
+- Prompt: `/rai-build-starter-ontology Build a portfolio ontology from the CSVs in ../data/ covering stocks, sectors, the covariance matrix, accounts, holdings, users, and transactions.`
+- Response: Concepts: `Stock`, `Sector`, `StockPair` (compound id on stock_i / stock_j with covariance), `User`, `Account`, `Holding`, `Transaction`, plus the `Scenario` Concept used by Stage 3 — bound to the bundled CSVs (8 stocks, 64 covariance entries, 6 scenarios).
+
+### 2. Discovery
 
 - Prompt: `/rai-discovery Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?`
 - Response: Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve.
 
-### 1. Compliance scan
+### 3. Compliance scan
 
 - Prompt: `/rai-rules-authoring Flag any holding worth more than 15% of its account, any sector worth more than 30% of the account, and any user with a risk score above 0.8 and more than five flagged transactions.`
 - Response: 4 holdings flagged (AAPL/MSFT on Account 1, JNJ/PFE on Account 4); 2 (account, sector) pairs flagged (Account 1 Tech 34.0%, Account 4 Healthcare 32.2%); 2 users flagged (Alice Chen 0.85, Eve Taylor 0.92).
 
-### 2. Cluster correlated bets
+### 4. Cluster correlated bets
 
 - Prompt: `/rai-graph-analysis Cluster stocks by correlation — anything above 0.3 absolute is a redundant bet. Pick one representative per cluster (highest Sharpe ratio) and only invest in those.`
 - Response: 4 edges (|rho| >= 0.3), 5 Louvain clusters, intra +0.683 vs inter +0.131. 5 representatives picked: PFE, GOOGL, JPM, PG, XOM. AAPL/MSFT/JNJ flagged `is_non_representative`.
 
-### 3. Solve mean-variance frontier
+### 5. Solve mean-variance frontier
 
 - Prompt: `/rai-prescriptive-problem-formulation Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, only invest in cluster representatives. Show 7 points per frontier.`
 - Response: 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt.
 
-### 4. Read the frontier
+### 6. Read the frontier
 
 - Prompt: `/rai-prescriptive-results-interpretation For each scenario, list the seven-point Pareto frontier and find the knee — where does the marginal risk per unit return jump the most?`
 - Response: base_500 frontier: returns 32.43 -> 40.28, risk 1160 -> 1742. Marginal `delta_risk/delta_return` jumps ~3x at eps_1 (knee). Same shape across all 6 scenarios — risk scales as budget^2, rate-form frontier is budget-independent.
 
-### 5. Stress under crisis
+### 7. Stress under crisis
 
 - Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Stress-test the frontier under crisis: shrink correlations toward all-ones with weight 0.7 on base covariance + 0.3 on outer-product. How much volatility expansion at each frontier point?`
 - Response: Crisis vol 25-30% above base across the frontier (budget 500: min_risk 34.06 -> 43.74 at +28.4%, eps_1 34.30 -> 44.54 at +29.8% peak). Gap peaks mid-frontier (eps_1..eps_2 at +29.8%), narrows to +25.2% at eps_5 — the cluster-collapse payoff.
diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md
index 82bd56c..85f8901 100644
--- a/v1/supply_chain_resilience/references/runbook.md
+++ b/v1/supply_chain_resilience/references/runbook.md
@@ -35,32 +35,37 @@ watch->avoid downgrade = +0.0% (optimizer already routed around it).
 
 ## Workflow
 
-### 0. Discovery
+### 1. Build ontology
+
+- Prompt: `/rai-build-starter-ontology Build a supply chain ontology from the CSVs in ../data/ covering sites, businesses, SKUs, shipping operations, demand orders, historical shipments, and quarterly delay predictions.`
+- Response: Concepts: `Site`, `Business`, `StockKeepingUnit`, `Operation`, `Demand`, `Shipment`, `Inventory`, `BillOfMaterial`, `DelayPrediction` — bound to the bundled CSVs (31 sites, 31 businesses, 9 SKUs, 262 shipments).
+
+### 2. Discovery
 
 - Prompt: `/rai-discovery We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?`
 - Response: Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive).
 
-### 1. Map upstream supplier exposure
+### 3. Map upstream supplier exposure
 
 - Prompt: `/rai-graph-analysis If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores.`
 - Response: `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone.
 
-### 2. Rank network hubs
+### 4. Rank network hubs
 
 - Prompt: `/rai-graph-analysis Which sites are the most influential hubs in the supply network — sites that connect to other influential sites, not just sites with many direct connections? Persist the centrality score back to each site so the optimizer can use it as a bottleneck weight.`
 - Response: `Site.centrality` normalized [0,1]: S004=1.000, S006=0.776, S003=0.735; 2 weakly-connected components.
 
-### 3. Classify supplier reliability
+### 5. Classify supplier reliability
 
 - Prompt: `/rai-rules-authoring Rate each supplier's delivery reliability. Flag any with reliability score below 0.80 as unreliable, any with a Q1 delay prediction above 0.15 as high-delay-risk, and call them 'watch-level' if either fires. Suppliers with **both** flags are 'avoid' (hard-blocked downstream); suppliers with **either** flag are 'watch' (surcharged).`
 - Response: `is_unreliable` (1: B017), `has_high_delay_risk` (2: B003, B017), `is_watch_level` (2), `Demand.is_escalated` (9).
 
-### 4. Solve risk-adjusted flow
+### 6. Solve risk-adjusted flow
 
 - Prompt: `/rai-prescriptive-problem-formulation Find the minimum-cost shipping plan that fulfills all open demand. Don't ship from 'avoid' suppliers at all, add a $5/unit surcharge for 'watch' suppliers, prefer non-bottleneck sites, and charge $100/unit for any unmet demand.`
 - Response: MILP on `Operation.x_flow` + `Demand.x_unmet`; objective = transport + risk surcharge + centrality weight + unmet penalty.
 
-### 5. Quantify disruption scenarios
+### 7. Quantify disruption scenarios
 
 - Prompt: `/rai-prescriptive-solver-management + /rai-prescriptive-results-interpretation Re-solve with the highest-centrality site offline, and again with watch-level suppliers downgraded to avoid. What's the cost delta in each, and why are they asymmetric?`
 - Response: Baseline OPTIMAL $1,865 / 8 flows / 0 unmet; S004 offline +88.5%; watch->avoid +0.0% (B003 already off optimal lanes).
diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md
index 9288476..9504ffa 100644
--- a/v1/telco_network_recovery/references/runbook.md
+++ b/v1/telco_network_recovery/references/runbook.md
@@ -37,37 +37,42 @@ across all 15 critical towers, prioritized by social blast radius.
 
 ## Workflow
 
-### 0. Discovery
+### 1. Build ontology
+
+- Prompt: `/rai-build-starter-ontology Build a telco network ontology from the CSVs in ../data/ covering cell towers, subscribers, call records, equipment health, network performance, time-series KPIs, and tower upgrade options.`
+- Response: Concepts: `CellTower`, `Subscriber`, `Contract`, `BillingEvent`, `CallDetailRecord`, `NetworkEquipment`, `EquipmentHealth`, `NetworkEvent`, `NetworkPerformance`, `TimeSeriesMetric`, `RevenueForecast`, `TowerUpgradeOption`, `Part`, `SupplierOrder`, `PostalArea`, `Campaign`, `PromotionRedemption` — bound to the bundled CSVs.
+
+### 2. Discovery
 
 - Prompt: `/rai-discovery WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?`
 - Response: Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation).
 
-### 1. Diagnose WEST
+### 3. Diagnose WEST
 
 - Prompt: `/rai-querying Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?`
 - Response: WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention.
 
-### 2. Flag critical-restore towers
+### 4. Flag critical-restore towers
 
 - Prompt: `/rai-rules-authoring Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85.`
 - Response: `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers.
 
-### 3. Score subscriber blast radius
+### 5. Score subscriber blast radius
 
 - Prompt: `/rai-graph-analysis Who are our most socially influential subscribers based on call patterns? For each critical-restore tower, count the distinct subscribers whose calls route through it and rank by total PageRank influence — that's the blast radius if it fails.`
 - Response: `Subscriber.influence_score` (PageRank) on all 1,200 subs; `CellTower.weighted_impact` on 15 critical towers; 404 distinct subs (33% of base) route through a critical tower; TWR-0014 has the largest footprint (61 subs, 0.0502).
 
-### 4. Forecast regional demand
+### 6. Forecast regional demand
 
 - Prompt: `/rai-predictive-modeling + /rai-predictive-training Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier.`
 - Response: GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.9998× (flat/slightly contracting); 8 other regions +0.45 to +0.91%/day; written to `CellTower.projected_demand_growth` for all 250 towers via region join.
 
-### 5. Optimize tier selection
+### 7.
Optimize tier selection - Prompt: `/rai-prescriptive-problem-formulation Recover WEST capacity within $5M and 200 install-weeks. For each critical-restore tower, pick at most one upgrade tier (BRONZE, SILVER, or GOLD) to maximize capacity restored, weighted by each tower's blast radius and the regional demand forecast.` - Response: OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. -### 6. Interpret the plan +### 8. Interpret the plan - Prompt: `/rai-prescriptive-results-interpretation Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?` - Response: Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout. From c9984fff0b7451ba9e3bb3d9397bb5da7eb4c3f5 Mon Sep 17 00:00:00 2001 From: cafzal Date: Wed, 6 May 2026 10:03:54 -0700 Subject: [PATCH 11/22] Audit-pass corrections + rename Discovery step MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 5-runbook audit against template scripts and READMEs. Telco: fix concept list to the 9 the script defines (drop Contract/BillingEvent/etc. that are not in the script, add RegionMetric and TemporalEdge); sharpen Steps 2,3,4,6,7,8 prompts and responses; correct TWR-0009 BRONZE->GOLD delta to +5 Gbps (BRONZE=3, GOLD=8). Energy: sharpen Stage 4 graph prompt to ask for WCC + Louvain + centrality (script computes all three); reword "structurally constrained bottleneck" to clarify DFW is the binding capacity bottleneck specifically. 
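For reference, the Stage 4 graph pass this paragraph describes (WCC + Louvain communities + a combined centrality rank, top 3 flagged structurally critical) can be sketched outside the RAI stack with plain networkx. This is a toy illustration only: a made-up 6-substation grid stands in for the template's 12-substation data, and direct networkx calls stand in for `/rai-graph-analysis`:

```python
import networkx as nx
from networkx.algorithms import community

# Toy grid: 6 substations with hypothetical edges (NOT the template's data).
edges = [
    ("DFW", "Houston"), ("DFW", "Austin"), ("DFW", "Midland"),
    ("Houston", "SanAntonio"), ("Austin", "SanAntonio"),
    ("Midland", "Odessa"),
]
G = nx.Graph(edges)

# Connectivity check (WCC on an undirected graph = connected components).
n_components = nx.number_connected_components(G)

# Regional structure via Louvain community detection.
communities = community.louvain_communities(G, seed=42)

# Combined centrality rank over betweenness / degree / eigenvector.
metrics = [
    nx.betweenness_centrality(G),
    nx.degree_centrality(G),
    nx.eigenvector_centrality(G, max_iter=1000),
]

def rank(scores):
    # Map node -> rank position (0 = most central on this metric).
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {node: i for i, node in enumerate(ordered)}

# Sum the per-metric ranks; the 3 lowest totals are "structurally critical".
combined = {n: sum(rank(m)[n] for m in metrics) for n in G}
structurally_critical = sorted(combined, key=combined.get)[:3]
```

In the template the equivalent scores are persisted back to the ontology (`Substation.betweenness`, `Substation.grid_community`, `Substation.is_structurally_critical`) so the downstream rules and prescriptive stages can consume them; here they just live in local dicts.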
Supply chain: fix concept list to actual 7 (drop StockKeepingUnit /Inventory/BillOfMaterial that aren't Concepts, rename to SKU); sharpen Step 2 discovery prompt + response to enumerate the 5 chained reasoning steps; name the 6 SUPPLIER-typed upstream nodes in Step 3. Machine maintenance: fix concept list (drop TrainingOption — used as DataFrame, not Concept; add CertificationExpiry); correct x_assigned binary count from ~250 to 384 (96 qualified pairs x 4 periods). Portfolio: drop fictitious StockPair Concept (script uses binary property Stock.covar(Stock,Stock) instead); add Regime to the Stage 5 Concept callout; enumerate the 6 actual constraint families. Step rename: '### 2. Discovery' -> '### 2. Discover reasoner questions' across all five runbooks for clearer step labelling. --- v1/energy_grid_planning/references/runbook.md | 8 ++-- v1/machine_maintenance/references/runbook.md | 8 ++-- v1/portfolio_balancing/references/runbook.md | 6 +-- .../references/runbook.md | 10 ++--- .../references/runbook.md | 38 +++++++++---------- 5 files changed, 35 insertions(+), 35 deletions(-) diff --git a/v1/energy_grid_planning/references/runbook.md b/v1/energy_grid_planning/references/runbook.md index d4615f5..9a5d5ba 100644 --- a/v1/energy_grid_planning/references/runbook.md +++ b/v1/energy_grid_planning/references/runbook.md @@ -6,7 +6,7 @@ ERCOT processes 10 hyperscaler interconnection requests (2,930 MW) against a 12- ``` ERCOT has 10 hyperscaler interconnection requests totalling 2,930 MW -on a 12-substation grid with one structurally constrained bottleneck (DFW). +on a 12-substation grid where DFW is the binding capacity bottleneck. The chain produces a Pareto frontier across 5 budget levels — the knee at $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus. @@ -43,7 +43,7 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus. 
- Prompt: `/rai-build-starter-ontology Build an ontology for grid infrastructure planning from the CSVs in ../data/ covering substations, generators, transmission lines, demand forecasts, data center requests, and substation upgrades.` - Response: Concepts: `Substation`, `Generator`, `TransmissionLine`, `LoadZone`, `DemandPeriod`, `RenewableProfile`, `MaintenanceWindow`, `Customer`, `DataCenterRequest`, `SubstationUpgrade`, `DemandForecast`, `LoadHistory`, `DCAnnouncement` — bound to the bundled CSVs (12 substations, 10 DC requests, 18 transmission lines). -### 2. Discovery +### 2. Discover reasoner questions - Prompt: `/rai-discovery We have 10 hyperscaler interconnection requests against a 12-substation grid. Which to approve, which substation upgrades to fund, at what budget level?` - Response: Plan routing sub-questions to predictive, graph, rules, and prescriptive reasoners. @@ -55,8 +55,8 @@ $300M unlocks 5 DCs (1,500 MW, $264M net value) including xAI Colossus. ### 4. Find structural bottlenecks -- Prompt: `/rai-graph-analysis Which substations are most critical to power flow based on grid topology? Use centrality on the transmission graph, then flag the top 3 as structurally critical and persist the scores back to the ontology.` -- Response: 1 component, 3 Louvain communities; DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests on critical nodes. +- Prompt: `/rai-graph-analysis Which substations are most critical to power flow based on grid topology? Check connectivity (WCC), regional structure (Louvain communities), and centrality (betweenness/degree/eigenvector); then flag the top 3 by combined centrality rank as structurally critical and persist the scores back to the ontology.` +- Response: 1 connected component, 3 Louvain communities (North Texas, West Texas, Gulf Coast); DFW, Houston, San Antonio flagged `is_structurally_critical`; 7 of 10 DC requests target critical nodes. ### 5. 
Screen DC requests diff --git a/v1/machine_maintenance/references/runbook.md b/v1/machine_maintenance/references/runbook.md index a48efa9..b42c9f4 100644 --- a/v1/machine_maintenance/references/runbook.md +++ b/v1/machine_maintenance/references/runbook.md @@ -43,10 +43,10 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved. ### 1. Build ontology -- Prompt: `/rai-build-starter-ontology Build a manufacturing maintenance ontology from the CSVs in ../data/ covering machines, technicians, qualifications, periods, sensor readings, failure predictions, downtime events, and production runs.` -- Response: Concepts: `Machine`, `Technician`, `Qualification`, `Period`, `MachinePeriod`, `TechnicianPeriod`, `TechnicianMachinePeriod`, `Sensor`, `SensorReading`, `FailurePrediction`, `DowntimeEvent`, `ProductionRun`, `PartsInventory`, `TrainingOption` — bound to the bundled CSVs (30 machines × 3 plants, 10 technicians, 4 periods). +- Prompt: `/rai-build-starter-ontology Build a manufacturing maintenance ontology from the CSVs in ../data/ covering machines, technicians, qualifications, periods, sensor readings, failure predictions, downtime events, production runs, parts inventory, and certification expiry.` +- Response: Concepts: `Machine`, `Technician`, `Qualification`, `Period`, `MachinePeriod`, `TechnicianPeriod`, `TechnicianMachinePeriod`, `Sensor`, `SensorReading`, `FailurePrediction`, `DowntimeEvent`, `ProductionRun`, `PartsInventory`, `CertificationExpiry` — bound to the bundled CSVs (30 machines × 3 plants, 10 technicians, 4 periods). `training_options.csv` is loaded as a DataFrame (read in Stage 4), not modeled as a concept. -### 2. Discovery +### 2. Discover reasoner questions - Prompt: `/rai-discovery We need to schedule preventive maintenance for 30 machines across 3 plants. 
Where does OEE alone mislead us, and what structural risks won't a pure optimizer surface?` - Response: Plan routing sub-questions to querying, graph, rules, prescriptive, and resilience skills. @@ -69,7 +69,7 @@ all 3 Turbine techs sit in one city, a $3,200 fix away from resolved. ### 6. Schedule maintenance - Prompt: `/rai-prescriptive-problem-formulation Schedule preventive maintenance for all 30 machines across 4 periods, capped at 5 jobs per period. Every overdue machine gets maintained by period 2, and each maintained machine needs a qualified technician. Minimize expected failure cost (weighted by criticality and centrality) plus labor and travel.` -- Response: 120 `x_maintain` + 120 `x_vulnerable` + ~250 `x_assigned` binaries; 5 constraint families; failure cost uses `predicted_fp × parts_cost × criticality × (1 + 2.0 × betweenness)`. +- Response: 120 `x_maintain` + 120 `x_vulnerable` + 384 `x_assigned` binaries (96 qualified tech×machine pairs × 4 periods); 5 constraint families (cumulative coverage, assignment-maintenance linkage, technician hours, parts/bay capacity, overdue deadline); failure cost uses `x_vulnerable × predicted_fp × parts_cost × criticality × (1 + 2.0 × betweenness)`. ### 7. Stress-test concentration diff --git a/v1/portfolio_balancing/references/runbook.md b/v1/portfolio_balancing/references/runbook.md index cc108ff..13df481 100644 --- a/v1/portfolio_balancing/references/runbook.md +++ b/v1/portfolio_balancing/references/runbook.md @@ -41,9 +41,9 @@ base at every lambda — without the cluster collapse, the gap would grow. ### 1. 
Build ontology - Prompt: `/rai-build-starter-ontology Build a portfolio ontology from the CSVs in ../data/ covering stocks, sectors, the covariance matrix, accounts, holdings, users, and transactions.` -- Response: Concepts: `Stock`, `Sector`, `StockPair` (compound id on stock_i / stock_j with covariance), `User`, `Account`, `Holding`, `Transaction`, plus the `Scenario` Concept used by Stage 3 — bound to the bundled CSVs (8 stocks, 64 covariance entries, 6 scenarios). +- Response: Concepts: `Stock` (with binary `Stock.covar(Stock, Stock)` property carrying covariance), `Sector`, `User`, `Account`, `Holding`, `Transaction` — bound to the bundled CSVs (8 stocks, 64 covariance entries). Stage 3 adds the `Regime` and `Scenario` Concepts (2 regimes x 3 budgets = 6 scenarios). -### 2. Discovery +### 2. Discover reasoner questions - Prompt: `/rai-discovery Our 8-stock book breaks compliance and concentrates risk. Rebuild it under Markowitz mean-variance with caps, deduplicate redundant bets via correlation clustering, and stress-test under crisis. What questions does each reasoner family handle?` - Response: Plan: rules for compliance flags, graph for correlation clustering + representatives, prescriptive QP indexed by Scenario, stress as regime-swap re-solve. @@ -61,7 +61,7 @@ base at every lambda — without the cluster collapse, the gap would grow. ### 5. Solve mean-variance frontier - Prompt: `/rai-prescriptive-problem-formulation Build a Markowitz mean-variance frontier across 6 scenarios = 3 budgets x 2 regimes. Position cap 30% of budget, sector cap 30%, only invest in cluster representatives. Show 7 points per frontier.` -- Response: 48 decision vars (8 stocks x 6 scenarios), 5 constraint families. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt. +- Response: 48 decision vars (`Stock.x_quantity`, 8 stocks x 6 scenarios; non-reps forced to 0). 
Constraint families: non-negativity, budget equality (sum = budget per scenario), position cap (30%), sector cap (30%), non-representative = 0, plus epsilon return-rate floor on sweep solves. Return-rate range [0.0634, 0.0840]. 7 solves x 6 scenarios = 42 `LOCALLY_SOLVED` portfolios via Ipopt. ### 6. Read the frontier diff --git a/v1/supply_chain_resilience/references/runbook.md b/v1/supply_chain_resilience/references/runbook.md index 85f8901..66e583d 100644 --- a/v1/supply_chain_resilience/references/runbook.md +++ b/v1/supply_chain_resilience/references/runbook.md @@ -38,17 +38,17 @@ watch->avoid downgrade = +0.0% (optimizer already routed around it). ### 1. Build ontology - Prompt: `/rai-build-starter-ontology Build a supply chain ontology from the CSVs in ../data/ covering sites, businesses, SKUs, shipping operations, demand orders, historical shipments, and quarterly delay predictions.` -- Response: Concepts: `Site`, `Business`, `StockKeepingUnit`, `Operation`, `Demand`, `Shipment`, `Inventory`, `BillOfMaterial`, `DelayPrediction` — bound to the bundled CSVs (31 sites, 31 businesses, 9 SKUs, 262 shipments). +- Response: Concepts: `Site`, `Business`, `SKU`, `Operation`, `Demand`, `Shipment`, `DelayPrediction` — bound to the bundled CSVs (31 sites, 31 businesses, 9 SKUs, 70 operations, 20 demands, 262 shipments, 36 delay predictions). -### 2. Discovery +### 2. Discover reasoner questions -- Prompt: `/rai-discovery We need a risk-adjusted routing plan. What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, and what does the minimum-cost flow look like once those risks are priced in?` -- Response: Reasoner-routing plan covering Stages 0–3 (graph, rules, prescriptive). +- Prompt: `/rai-discovery We need a risk-adjusted routing plan. 
What's our exposure to each supplier, which sites are bottlenecks, which suppliers are unreliable, what does the minimum-cost flow look like once those risks are priced in, and how robust is that plan to disruptions?` +- Response: Reasoner-routing plan: (1) Graph reachability for upstream supplier exposure, (2) Graph centrality for hub identification, (3) Rules for supplier reliability classification, (4) Prescriptive MILP for risk-adjusted flow, (5) Scenario re-solves for disruption quantification. ### 3. Map upstream supplier exposure - Prompt: `/rai-graph-analysis If a key supplier goes offline, which downstream buyers and finished products are at risk? For each HIGH-priority customer, list the suppliers it transitively depends on through the shipment graph, with their reliability scores.` -- Response: `Business.is_high_priority_customer` (2 buyers: B008, B009); shared 6-supplier upstream cone. +- Response: `Business.is_high_priority_customer` (2 buyers: B008 MegaCorp Enterprise, B009 TechGiant Inc); each transitively depends on the same 6 SUPPLIER-typed upstream nodes (B015, B016, B017, B018, B019, B020). ### 4. Rank network hubs diff --git a/v1/telco_network_recovery/references/runbook.md b/v1/telco_network_recovery/references/runbook.md index 9504ffa..02e85b9 100644 --- a/v1/telco_network_recovery/references/runbook.md +++ b/v1/telco_network_recovery/references/runbook.md @@ -1,19 +1,19 @@ # Runbook: Telco WEST Recovery — Multi-Reasoner Walkthrough -A regional telco is bleeding $791K/quarter from WEST while every other region grows. No single reasoner can answer where to spend a $5M recovery budget: descriptive scopes the crisis, rules flag broken towers, graph weights them by social blast radius, predictive forecasts forward demand, and prescriptive composes all four signals into the upgrade plan. Each stage writes derived properties back to the same ontology that downstream stages consume. 
+WEST revenue collapsed ~29% in Q4 2024 (a ~$2.7M shortfall vs the other-regions average) while every other region held flat or grew. No single reasoner can answer where to spend a $5M recovery budget: descriptive scopes the crisis, rules flag broken towers, graph weights them by social blast radius, predictive forecasts forward demand, and prescriptive composes all four signals into the upgrade plan. Each stage writes derived properties back to the same ontology that downstream stages consume. ## The chain ``` -WEST is bleeding $791K/quarter from a network operations crisis. +WEST Q4 revenue is down ~29% (~$2.7M gap vs the other-regions avg). The chain produces a $5M plan that recovers 122 Gbps capacity across all 15 critical towers, prioritized by social blast radius. ───────────────────────────────────────────────────────────────── - STAGE 1 Descriptive ──► WEST: Q3-Q4 revenue −22% to −26%, + STAGE 1 Descriptive ──► WEST: Q4 revenue −29% vs H1 baseline, avail 94.6 vs 99.5, 15 of 81 DEGRADED. - Retention angle? No — 0 high-risk - subs; this is operational. + Retention angle? No — this is + operational, not subscriber churn. ───────────────────────────────────────────────────────────────── STAGE 2 Rules ──► CellTower.is_critical_restore (15) 4 derived health metrics + a compound @@ -39,23 +39,23 @@ across all 15 critical towers, prioritized by social blast radius. ### 1. Build ontology -- Prompt: `/rai-build-starter-ontology Build a telco network ontology from the CSVs in ../data/ covering cell towers, subscribers, call records, equipment health, network performance, time-series KPIs, and tower upgrade options.` -- Response: Concepts: `CellTower`, `Subscriber`, `Contract`, `BillingEvent`, `CallDetailRecord`, `NetworkEquipment`, `EquipmentHealth`, `NetworkEvent`, `NetworkPerformance`, `TimeSeriesMetric`, `RevenueForecast`, `TowerUpgradeOption`, `Part`, `SupplierOrder`, `PostalArea`, `Campaign`, `PromotionRedemption` — bound to the bundled CSVs. 
+- Prompt: `/rai-build-starter-ontology Build a telco network ontology from the eight CSVs in ../data/: cell_towers, network_equipment, equipment_health, network_performance, subscribers, call_detail_records, tower_upgrade_options, time_series_metrics. The time-series file has one row per (date, region) — model that as a composite-key concept and add a same-region 1-day-lag edge concept to support temporal GNN message passing downstream.` +- Response: Concepts: `CellTower`, `NetworkEquipment`, `EquipmentHealth`, `NetworkPerformance`, `Subscriber`, `CallDetailRecord` (edge concept: caller → callee, routed_through tower), `TowerUpgradeOption` (composite key tower_id+tier), `RegionMetric` (composite key metric_date+region), `TemporalEdge` (composite key src_date+src_region+dst_date+dst_region) — all bound to the bundled CSVs. -### 2. Discovery +### 2. Discover reasoner questions -- Prompt: `/rai-discovery WEST is missing revenue while every other region grows. What questions do we need to answer to figure out where to spend $5M to fix it?` -- Response: Routes sub-questions to descriptive (revenue diagnosis), rules (critical-tower flag), graph (PageRank blast radius), predictive (demand forecast), prescriptive (tier-selection MIP + post-solve interpretation). +- Prompt: `/rai-discovery WEST is missing revenue targets while every other region grows. We have a $5M capex budget and 200 install crew-weeks to allocate to tower upgrades. 
Which RAI reasoners do we need, in what order, to land on a defensible upgrade plan grounded in the available data (towers, subscribers, calls, equipment health, performance, daily KPIs, and tiered upgrade options)?`
+- Response: Plans the 5-reasoner chain on the shared ontology — descriptive (`/rai-querying`) to scope the WEST crisis and rule out a retention angle; rules (`/rai-rules-authoring`) to flag critical-restore towers; graph (`/rai-graph-analysis`) to score subscriber influence and aggregate per-tower blast radius; predictive (`/rai-predictive-modeling` + `/rai-predictive-training`) to forecast per-region growth and bind it as a per-tower demand multiplier; prescriptive (`/rai-prescriptive-problem-formulation` + `/rai-prescriptive-results-interpretation`) to compose all three signals into the tier-selection MIP and explain the binding constraint.

### 3. Diagnose WEST

-- Prompt: `/rai-querying Where are we missing revenue targets? Which 10 cell towers have the worst average packet loss over 2024, and which region has the worst Q4 network availability?`
-- Response: WEST avail 94.6 vs 99.5 elsewhere; Q3-Q4 revenue −22% to −26% ($791K gap); 15 of 81 WEST towers DEGRADED at 8.1–8.9% packet loss; zero high-risk subs — operational, not retention.
+- Prompt: `/rai-querying Compare quarterly DAILY_REVENUE_USD by region. Which region has the worst Q4 2024 network availability? Show the WEST cell tower fleet broken down by status, and the average packet loss for the DEGRADED ones.`
+- Response: WEST Q4 avail 94.6% vs 99.5% in every other region; WEST Q4 revenue $6.6M vs ~$9.0–9.5M everywhere else (≈$2.7M Q4 deficit, −29% vs WEST's own H1 baseline); 81 WEST towers split into 49 ACTIVE / 17 MAINTENANCE / 15 DEGRADED, with per-tower average packet loss across the DEGRADED set ranging from 7.6% to 10.3% (median ~8.2%). Subscriber-churn signals stay flat — this is an operational network failure, not retention.

### 4. 
Flag critical-restore towers -- Prompt: `/rai-rules-authoring Flag CellTowers as 'critical-restore' if region is WEST AND status is DEGRADED AND avg equipment health is below 0.85, OR if avg packet loss > 5% with health below 0.85.` -- Response: `CellTower.is_critical_restore` fires on 15 WEST DEGRADED towers; 4 derived health metrics (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) written to all 250 towers. +- Prompt: `/rai-rules-authoring First derive per-tower averages for packet loss, latency, error rate (from NetworkPerformance) and average equipment health (via NetworkEquipment → EquipmentHealth). Then flag CellTower.is_critical_restore on either of two branches: (1) region == WEST AND status == DEGRADED AND avg_health_score < 0.85, OR (2) region == WEST AND avg_packet_loss > 5% AND avg_health_score < 0.85 (catches ACTIVE-but-failing).` +- Response: 4 derived health properties (`avg_packet_loss`, `avg_latency_ms`, `avg_error_rate`, `avg_health_score`) computed for all 250 towers via `aggs.avg(...).per(CellTower)`. The two-branch `CellTower.is_critical_restore` relationship fires on 15 towers — all 15 are WEST + DEGRADED + health < 0.85, so Branch 1 alone produces the same set, but Branch 2 is kept as a guard against ACTIVE-but-failing failure modes. ### 5. Score subscriber blast radius @@ -64,18 +64,18 @@ across all 15 critical towers, prioritized by social blast radius. ### 6. Forecast regional demand -- Prompt: `/rai-predictive-modeling + /rai-predictive-training Predict next-quarter subscriber-growth-rate per region using TimeSeriesMetric history. Bind each region's forecast back to its towers as a demand multiplier.` -- Response: GNN node regression on 365d × 9 regions with 1-day-lag temporal edges; WEST multiplier 0.9998× (flat/slightly contracting); 8 other regions +0.45 to +0.91%/day; written to `CellTower.projected_demand_growth` for all 250 towers via region join. 
+- Prompt: `/rai-predictive-modeling + /rai-predictive-training Train a regression GNN on RegionMetric (one row per date+region) to predict next-quarter SUBSCRIBER_GROWTH_RATE per region. Use TemporalEdge (same-region 1-day lag) for message passing, region as a category feature, and lag features (prev-day, prev-week, 7-day mean) as continuous inputs. Train < 2024-11-01, validate on Nov, test on Dec. Mean each region's Dec predictions, convert to 1+x multiplier, and bind back to CellTower.projected_demand_growth via region.` +- Response: GNN node regression on 365d × 9 regions with same-region 1-day-lag temporal edges; per-region mean of the Dec test predictions yields WEST multiplier ≈0.9998× (flat/slightly contracting) while the 8 other regions sit at +0.45% to +0.91%/day. The multiplier is loaded into a `RegionGrowth` concept and joined to `CellTower.projected_demand_growth` via region — populating all 250 towers (CellTower covers 5 regions; the other 4 RegionMetric regions are forecast but have no towers to bind to). ### 7. Optimize tier selection -- Prompt: `/rai-prescriptive-problem-formulation Recover WEST capacity within $5M and 200 install-weeks. For each critical-restore tower, pick at most one upgrade tier (BRONZE, SILVER, or GOLD) to maximize capacity restored, weighted by each tower's blast radius and the regional demand forecast.` -- Response: OPTIMAL · 12 GOLD / 2 SILVER / 1 BRONZE · 122 Gbps restored · $4.96M of $5M (binding) · 164 of 200 install-weeks (slack) · all 15 towers covered. +- Prompt: `/rai-prescriptive-problem-formulation Build a tower-upgrade MIP scoped to options where TowerUpgradeOption.for_tower(CellTower) AND CellTower.is_critical_restore(). Decision variable TowerUpgradeOption.selected is binary, keyed by (tower_id, tier). Constraints: at most one tier per tower, total cost ≤ $5M, total install_weeks ≤ 200. 
Maximize sum(selected · capacity_increase_gbps · CellTower.weighted_impact · CellTower.projected_demand_growth) — three coefficients, one from each upstream stage.`
+- Response: Status OPTIMAL with all 15 critical towers covered (one tier each). Tier mix: 12 GOLD / 2 SILVER / 1 BRONZE. Total capacity restored 122 Gbps. Total cost $4,956,843 of the $5M budget (binding). Total install crew-weeks 164 of 200 (slack). The tier mix skews toward GOLD because GOLD's cost per restored Gbps stays competitive once each Gbps is weighted by `weighted_impact` and `projected_demand_growth` in the objective.

### 8. Interpret the plan

- Prompt: `/rai-prescriptive-results-interpretation Summarize the plan: total cost, capacity restored, tier mix, towers covered. Which constraint is binding, and what would relaxing it by 10-20% unlock?`
-- Response: Budget binds at $4.96M/$5M; flexing to $6M would promote TWR-0009 BRONZE→GOLD (+9 Gbps); install-weeks have 36-week slack; 404 service-affected subs drop to ~0 over the 4-month rollout.
+- Response: Budget binds at $4.96M/$5M (only $43K of headroom); flexing the budget to $6M unlocks the TWR-0009 BRONZE→GOLD swap (+5 Gbps for ~$395K incremental cost). Install-weeks have 36 weeks of slack (164/200) so crew capacity is not the bottleneck. All 15 critical towers are covered, so the 404 service-affected subscribers identified by the graph stage are addressed within the rollout window.

## Data

From c20345871c6b5e999363966f71c0143268623791 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 6 May 2026 10:45:20 -0700
Subject: [PATCH 12/22] Move runbook.md to template root

Runbooks live alongside the template script, README, and data/ directory now (was under references/). Updated relative paths inside each runbook from ../data/ -> data/ and ../