PlanExeOrg · 82deutschmark · Feb 25, 2026 · Feb 25, 2026 · Feb 25, 2026 · Feb 25, 2026
diff --git a/docs/proposals/1-triage.md b/docs/proposals/1-triage.md
@@ -0,0 +1,74 @@
+---
+title: PlanExe Proposal Triage — 80/20 Landscape
+date: 2026-02-25
+status: working note
+author: Egon + Larry
+---
+
+# Overview
+
+Simon asked us to triage the proposal space with an 80:20 lens. The goal of this note is to capture:
+1. Which proposals deliver outsized value (the 20% that unlock 80% of the architecture)
+2. Which other proposals are nearby in the graph and could reuse their artifacts or reasoning
+3. High-leverage parameter tweaks, code tweaks, and second/third order effects
+4. Gaps in the current docs and ideas for new proposals
+5. Relevant questions/tasks you might not have asked yet
+
+We focused on the most recent proposals ("67+" cluster) plus the ones directly touching the validation/orchestration story that FermiSanityCheck will unlock.
+
+# High-Leverage Proposals (the 20%)
+
+1. **#07 Elo Ranking System (1,751 lines)** – Core ranking mechanism for comparing idea variants, plan quality, and post-plan summaries. Louis-level heuristics here inform nearly every downstream comparison use case.
+2. **#63 Luigi Agent Integration & #64 Post-plan Orchestration Layer** – These three documents (#63, #64, #66) describe how PlanExe schedules, retries, and enriches its Luigi DAG. Any change to the DAG (including FermiSanityCheck or arcgentica-style loops) ripples through this cluster.
+3. **#62 Agent-first Frontend Discoverability (609 lines)** – Defines the agent UX, which depends on the scoring/ranking engine (#07) and the reliability signals that our validation cluster will provide.
+4. **#69 Arcgentica Agent Patterns (279 lines)** – The arcgentica comparison is already referencing our validation work and sets the guardrails for self-evaluation/soft-autonomy.
+5. **#41 Autonomous Execution of Plan & #05 Semantic Plan Search Graph** – These represent core system-level capabilities (distributed execution and semantic search) whose outputs feed the ranking and reporting layers.
+
+These documents together unlock most of the architectural work. They interlock around: planning quality signals (#07, #69, Fermi), orchestration (#63, #64, #66), and the interfaces (#62, #41, #05).
+
+# Related Proposals & Reuse Opportunities
+
+- **#07 Elo Ranking + #62 Agent-first Frontend** can share heuristics. Instead of reinventing ranking weights in #62, reuse the cost/feasibility tradeoffs defined in #07 plus FermiSanityCheck flags as features.
+- **#63-66 orchestration cluster** already describe Luigi tasks. The validation loop doc should be cross-referenced there to show where FermiSanityCheck sits in the DAG and how downstream tasks like WBS, Scheduler, ExpertOrchestrator should consume the validation report.
+- **#69 + #56 (Adversarial Red Team) + #43 (Assumption Drift Monitor)** form a validation cluster. FermiSanityCheck is the front line; these others are observers (red team, drift monitor) that should consume the validation report and escalate to human review.
+- **#32 Gantt Parallelization & #33 CBS** could re-use the same thresholds as FermiSanityCheck when calculating duration plausibility (e.g., if duration falls outside the published feasible range, highlight the same issue in the Gantt UI).
+
+# 80:20 Tweaks & Parameter Changes
+
+- **Ranking weights (#07)** – adjust cost vs. feasibility vs. confidence to surface plans that pass quantitative grounding. No rewrite needed; just new weights (e.g., penalize plans where FermiSanityCheck flags >3 assumptions).
+- **Batch size thresholds (#63)** – the Luigi DAG currently runs every task. We can gate the WBS tasks with a flag that only fires if FermiSanityCheck passes or fails softly, enabling a smaller workflow for low-risk inputs without re-architecting.
+- **Risk terminology alignment (#38 & #44)** – harmonize the words used in the risk propagation network and investor audit pack so they can share visualization tooling, reducing duplicate explanations.
+
+# Second/Third Order Effects
+
+- **Validation loop → downstream trust**: Once FermiSanityCheck is in place, client reports (e.g., #60 plan-to-repo, #41 autonomous execution) can annotate numbers with the validation status, reducing rework.
+- **Arcgentica/agent patterns**: Hardening PlanExe encourages stricter typed outputs (#69). This lets the UI (#08) and ranking engine (#07) rely on structured data instead of parsing Markdown.
+- **Quantitative grounding improves ranking** (#07, #62) which in turn makes downstream dashboards (#60, #62) more actionable and reduces QA overhead.
+- **Clustering proposals** (#63-66, #69, #56) around validation/orchestration helps the next human reviewer (Simon) make a single decision that affects multiple docs.
+
+# Gaps & Future Proposal Ideas
+
+- **FermiSanityCheck Implementation Roadmap** – Document how MakeAssumptions output becomes QuantifiedAssumption, where heuristics live, and how Luigi tasks consume the validation_report. (We have the spec in `planexe-validation-loop-spec.md` but not a public proposal yet.)
+- **Validation Observability Dashboard** – A proposal capturing how the validation report is surfaced to humans (per #44, #60). Could cover alerts (Slack/Discord) when FermiSanityCheck fails or when repeated fails accumulate.
+- **Arbitration Workflow** – When FermiSanityCheck fails and ReviewPlan still thinks the plan is OK, we need a human-in-the-loop workflow. This is not yet documented anywhere.
+
+# Questions You Might Not Be Asking
+
+1. What are the acceptance criteria for FermiSanityCheck? (confidence levels, heuristics, why 100× spans?)
+2. Who owns the validation report downstream? Should ExpertOrchestrator or Governance phases be responsible for acting on it?
+3. Does FermiSanityCheck expire per run or is it stored for audit trails (per #42 evidence traceability)?
+4. Can we reuse the same heuristics for other tasks (#32 Gantt, #34 finance) to maximize payoff?
+5. How do we rank the outputs once FermiSanityCheck is added? Should ranking (#07) penalize low confidence even if the costs look good?
+6. Do we need a battle plan for manual overrides when FermiSanityCheck is overzealous (e.g., ROI assumptions where domain experts know the average is >100×)?
+
+# Tasks We Can Own Now
+
+- Extract the QuantifiedAssumption schema (claim, lower_bound, upper_bound, unit, confidence, evidence) and add it to PlanExe’s assumption bundle.
+- Implement a FermiSanityCheck Luigi task that runs immediately after MakeAssumptions and produces validation_report.json.
+- Hook the validation report into DistillAssumptions / ReviewAssumptions by adding a `validation_passed` flag.
+- Update #69 and #56 docs with references to the validation report to keep the narrative cohesive.
+- Create the proposed dashboard proposal (validation observability) to track how many plans fail numeric sanity each week.
+
+# Summary
+
+The high-leverage 20% of proposals are: ranking (#07), orchestration (#63-66), UI (#62), arcgentica patterns (#69), and autonomous execution/search (#41, #05). We can activate them by implementing FermiSanityCheck, aligning their heuristics, and surfacing the new validation signals in the UI/dashboards. The docs already cover most of the research; now we need a short, focused proposal/clustering doc (this one) plus the Fermi implementation and dashboards. After Simon approves, we can execute the chosen cluster.
diff --git a/worker_plan/worker_plan_api/filenames.py b/worker_plan/worker_plan_api/filenames.py
@@ -37,6 +37,8 @@ class FilenameEnum(str, Enum):
     REVIEW_ASSUMPTIONS_MARKDOWN = "003-9-review_assumptions.md"
     CONSOLIDATE_ASSUMPTIONS_FULL_MARKDOWN = "003-10-consolidate_assumptions_full.md"
     CONSOLIDATE_ASSUMPTIONS_SHORT_MARKDOWN = "003-11-consolidate_assumptions_short.md"
+    FERMI_SANITY_CHECK_REPORT = "003-12-fermi_sanity_check_report.json"
+    FERMI_SANITY_CHECK_SUMMARY = "003-13-fermi_sanity_check_summary.md"
     PRE_PROJECT_ASSESSMENT_RAW = "004-1-pre_project_assessment_raw.json"
     PRE_PROJECT_ASSESSMENT = "004-2-pre_project_assessment.json"
     PROJECT_PLAN_RAW = "005-1-project_plan_raw.json"

diff --git a/worker_plan/worker_plan_internal/assume/domain_normalizer.py b/worker_plan/worker_plan_internal/assume/domain_normalizer.py
@@ -0,0 +1,284 @@
+"""
+Author: Larry (Claude Opus 4.6)
+Date: 2026-02-25
+PURPOSE: Domain-aware normalization for FermiSanityCheck. Loads domain profiles (YAML),
+auto-detects project domain from assumptions, and normalizes currency/units/confidence
+to standard metric/English output for AI agents.
+SRP/DRY check: Pass - Consumes QuantifiedAssumption schema + domain profile YAML.
+Outputs normalized assumptions ready for validation.
+"""
+
+import logging
+import yaml
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import List, Optional, Dict, Any
+from pathlib import Path
+
+from worker_plan_internal.assume.quantified_assumptions import (
+    QuantifiedAssumption,
+    ConfidenceLevel,
+)
+
+LOGGER = logging.getLogger(__name__)
+
+# Find domain profiles YAML
+DOMAIN_PROFILES_PATH = Path(__file__).parent.parent / "docs" / "domain-profiles" / "domain-profile-schema.md"
+
+
+class DomainProfile:
+    """Represents a single domain profile (carpenter, dentist, etc.)"""
+
+    def __init__(self, profile_dict: Dict[str, Any]):
+        self.id = profile_dict.get("id")
+        self.name = profile_dict.get("name")
+        self.description = profile_dict.get("description")
+
+        # Currency
+        currency_cfg = profile_dict.get("currency", {})
+        self.default_currency = currency_cfg.get("default", "USD")
+        self.currency_aliases = set(currency_cfg.get("aliases", []))
+        self.currency_aliases.add(self.default_currency.lower())
+
+        # Units
+        units_cfg = profile_dict.get("units", {})
+        self.metric_first = units_cfg.get("metric", True)
+        self.unit_conversions = {}
+        for conv in units_cfg.get("convert", []):
+            self.unit_conversions[conv["from"].lower()] = {
+                "to": conv["to"],
+                "factor": conv["factor"],
+            }
+
+        # Heuristics
+        heuristics = profile_dict.get("heuristics", {})
+        self.budget_keywords = set(heuristics.get("budget_keywords", []))
+        self.timeline_keywords = set(heuristics.get("timeline_keywords", []))
+        self.team_keywords = set(heuristics.get("team_keywords", []))
+
+        confidence_kw = heuristics.get("confidence_keywords", {})
+        self.high_confidence_words = set(confidence_kw.get("high", []))
+        self.medium_confidence_words = set(confidence_kw.get("medium", []))
+        self.low_confidence_words = set(confidence_kw.get("low", []))
+
+        # Detection
+        detection = profile_dict.get("detection", {})
+        self.currency_signals = set(detection.get("currency_signals", []))
+        self.unit_signals = set(detection.get("unit_signals", []))
+        self.keyword_signals = set(detection.get("keyword_signals", []))
+
+    def score_match(self, currency_found: List[str], units_found: List[str], keywords_found: List[str]) -> int:
+        """Score how well this profile matches the found signals."""
+        score = 0
+        for c in currency_found:
+            if c.lower() in [s.lower() for s in self.currency_signals]:
+                score += 10
+        for u in units_found:
+            if u.lower() in [s.lower() for s in self.unit_signals]:
+                score += 5
+        for k in keywords_found:
+            if k.lower() in [s.lower() for s in self.keyword_signals]:
+                score += 3
+        return score
+
+
+@dataclass
+class NormalizedAssumption:
+    """Assumption after domain-aware normalization."""
+    assumption_id: str
+    original_claim: str
+    normalized_claim: str
+    domain_id: str
+    currency: str  # Normalized to domain default
+    currency_eur_equivalent: Optional[float] = None  # For comparison
+    unit: str = "metric"  # All converted to metric
+    confidence: ConfidenceLevel = ConfidenceLevel.medium
+    notes: List[str] = field(default_factory=list)
+
+
+class DomainNormalizer:
+    """Loads domain profiles and normalizes assumptions to metric/currency/confidence."""
+
+    def __init__(self, profiles_yaml_path: Optional[str] = None):
+        self.profiles: Dict[str, DomainProfile] = {}
+        self.default_profile = None
+
+        path = Path(profiles_yaml_path) if profiles_yaml_path else DOMAIN_PROFILES_PATH
+        self._load_profiles(path)
+
+    def _load_profiles(self, yaml_path: Path) -> None:
+        """Load domain profiles from YAML file."""
+        if not yaml_path.exists():
+            LOGGER.warning(f"Domain profiles not found at {yaml_path}; using defaults")
+            self._create_default_profiles()
+            return
+
+        try:
+            with open(yaml_path, "r") as f:
+                content = f.read()
+                # Extract YAML from markdown code block
+                if "```yaml" in content:
+                    yaml_start = content.index("```yaml") + 7
+                    yaml_end = content.index("```", yaml_start)
+                    yaml_str = content[yaml_start:yaml_end]
+                else:
+                    yaml_str = content
+
+                data = yaml.safe_load(yaml_str)
+                if data and "profiles" in data:
+                    for profile_dict in data["profiles"]:
+                        profile = DomainProfile(profile_dict)
+                        self.profiles[profile.id] = profile
+                        if not self.default_profile:
+                            self.default_profile = profile
+
+            LOGGER.info(f"Loaded {len(self.profiles)} domain profiles from {yaml_path}")
+        except Exception as e:
+            LOGGER.error(f"Error loading domain profiles: {e}; using defaults")
+            self._create_default_profiles()
+
+    def _create_default_profiles(self) -> None:
+        """Create minimal default profiles if YAML not available."""
+        default_profile_dict = {
+            "id": "default",
+            "name": "General Business",
+            "description": "Default profile for unclassified projects.",
+            "currency": {"default": "USD", "aliases": ["usd", "$"]},
+            "units": {"metric": True, "convert": []},
+            "heuristics": {
+                "budget_keywords": ["budget", "cost"],
+                "timeline_keywords": ["days", "weeks"],
+                "team_keywords": ["team", "people"],
+                "confidence_keywords": {
+                    "high": ["guarantee", "have done"],
+                    "medium": ["plan to", "expect"],
+                    "low": ["estimate", "maybe"],
+                },
+            },
+            "detection": {
+                "currency_signals": ["USD", "$"],
+                "unit_signals": [],
+                "keyword_signals": [],
+            },
+        }
+        self.default_profile = DomainProfile(default_profile_dict)
+        self.profiles["default"] = self.default_profile
+
+    def detect_domain(self, assumption: QuantifiedAssumption) -> DomainProfile:
+        """Auto-detect domain profile from assumption metadata."""
+        # Extract signals from assumption
+        currency_found = []
+        if assumption.unit:
+            currency_found.append(assumption.unit)
+
+        units_found = []
+        if assumption.unit:
+            units_found.append(assumption.unit)
+
+        keywords_found = []
+        # Extract keywords from claim + evidence
+        claim_lower = assumption.claim.lower()
+        evidence_lower = (assumption.evidence or "").lower()
+        combined = f"{claim_lower} {evidence_lower}".split()
+
+        # Score all profiles
+        scores = {}
+        for profile_id, profile in self.profiles.items():
+            score = profile.score_match(currency_found, units_found, combined)
+            scores[profile_id] = score
+
+        # Pick highest scoring profile
+        if scores:
+            best_profile_id = max(scores, key=scores.get)
+            if scores[best_profile_id] > 0:
+                return self.profiles[best_profile_id]
+
+        return self.default_profile
+
+    def normalize_currency(
+        self, value: Optional[float], from_currency: str, to_profile: DomainProfile
+    ) -> tuple[Optional[float], Optional[float]]:
+        """
+        Convert currency to profile default.
+        Returns (normalized_value, eur_equivalent).
+        """
+        if value is None:
+            return None, None
+
+        # Placeholder conversion rates (in production, use real FX API)
+        fx_rates = {
+            "USD": 0.92,  # USD → EUR
+            "DKK": 0.124,  # DKK → EUR
+            "EUR": 1.0,
+        }
+
+        # For now, assume all inputs are in the detected currency or profile default
+        normalized = value
+        eur_equiv = value * fx_rates.get(to_profile.default_currency, 1.0)
+
+        return normalized, eur_equiv
+
+    def normalize_unit(self, value: Optional[float], from_unit: str, to_profile: DomainProfile) -> Optional[float]:
+        """Convert unit to metric (based on profile conversions)."""
+        if value is None or not from_unit:
+            return value
+
+        from_unit_lower = from_unit.lower()
+        if from_unit_lower in to_profile.unit_conversions:
+            conversion = to_profile.unit_conversions[from_unit_lower]
+            return value * conversion["factor"]
+
+        return value
+
+    def normalize_confidence(self, assumption: QuantifiedAssumption, domain: DomainProfile) -> ConfidenceLevel:
+        """Re-assess confidence level based on domain keywords."""
+        claim_lower = assumption.claim.lower()
+        evidence_lower = (assumption.evidence or "").lower()
+        combined = f"{claim_lower} {evidence_lower}"
+
+        # Check high confidence
+        if any(word in combined for word in domain.high_confidence_words):
+            return ConfidenceLevel.high
+
+        # Check low confidence
+        if any(word in combined for word in domain.low_confidence_words):
+            return ConfidenceLevel.low
+
+        # Default to medium
+        return ConfidenceLevel.medium
+
+    def normalize(self, assumption: QuantifiedAssumption) -> NormalizedAssumption:
+        """Normalize a QuantifiedAssumption to domain standards."""
+        domain = self.detect_domain(assumption)
+
+        # Normalize currency
+        norm_currency, eur_equiv = self.normalize_currency(assumption.lower_bound, assumption.unit or "", domain)
+
+        # Normalize unit (keep as "metric" for now)
+        norm_unit = "metric"
+
+        # Re-assess confidence per domain
+        norm_confidence = self.normalize_confidence(assumption, domain)
+
+        # Build normalized claim
+        norm_claim = f"{assumption.claim} [normalized to {domain.id} domain]"
+
+        notes = []
+        if domain.id != "default":
+            notes.append(f"Auto-detected domain: {domain.name}")
+
+        return NormalizedAssumption(
+            assumption_id=assumption.assumption_id,
+            original_claim=assumption.claim,
+            normalized_claim=norm_claim,
+            domain_id=domain.id,
+            currency=domain.default_currency,
+            currency_eur_equivalent=eur_equiv,
+            unit=norm_unit,
+            confidence=norm_confidence,
+            notes=notes,
+        )
+
+    def normalize_batch(self, assumptions: List[QuantifiedAssumption]) -> List[NormalizedAssumption]:
+        """Normalize a batch of assumptions."""
+        return [self.normalize(assumption) for assumption in assumptions]