
PR #566: XGBoost v2 training + Marcel regression + data-driven blend weights #435

Merged

jaayslaughter-cpu merged 1 commit into main from pr-566-xgb-v2-marcel on May 15, 2026
Conversation

@jaayslaughter-cpu (Owner) commented May 15, 2026

PR #566: XGBoost v2 training + Marcel regression + data-driven blend weights

4 changes

1. scripts/xgb_k_training.py — Replaced with v2

  • Recent-season weighting: 2026=4×, 2025=2×, 2024=1.5×, 2022-2023=1×. Historical data no longer weighted equally — corrects for league K-rate drift since 2022.
  • Feature alignment fixed: Training now uses sv_era, sv_k_pct, sv_bb_pct, sv_whiff_pct (matches xgb_k_layer.py inference names exactly). Adds 4 previously-zero features: l3_ks, l3_ip, l5_ip, days_rest.
  • opp_whiff added to hit model: Pitcher SwStr% was missing from hit model training data — now populated from FanGraphs.
  • Ledger-first training: Loads real PropIQ graded legs from bet_ledger as primary source (3× weight bonus over Statcast); falls back to pybaseball only when ledger has <500 rows. After 500+ K legs accumulate, model trains primarily on actual PropIQ outcomes.
  • DB persistence: Models saved to xgb_model_store (base64-encoded PKL) — survives Railway restarts.
  • --status flag: python scripts/xgb_k_training.py --status shows Brier scores + blend recommendations.
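
The recency and ledger weighting above can be sketched as per-row sample weights fed to XGBoost. This is a minimal sketch under stated assumptions — the constant names and the `build_sample_weights` helper are illustrative, not the PR's actual code:

```python
import numpy as np

# Season multipliers and ledger bonus as described in the PR text.
SEASON_WEIGHTS = {2026: 4.0, 2025: 2.0, 2024: 1.5, 2023: 1.0, 2022: 1.0}
LEDGER_BONUS = 3.0  # graded bet_ledger legs count 3x vs Statcast rows

def build_sample_weights(seasons, from_ledger):
    """Per-row training weight = season recency multiplier x ledger bonus."""
    w = np.array([SEASON_WEIGHTS.get(s, 1.0) for s in seasons], dtype=float)
    bonus = np.where(np.asarray(from_ledger, dtype=bool), LEDGER_BONUS, 1.0)
    return w * bonus

weights = build_sample_weights([2026, 2022], [True, False])
# A 2026 ledger leg weighs 12.0 vs 1.0 for a 2022 Statcast row;
# weights are then passed via model.fit(X, y, sample_weight=weights).
```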

2. update_blend_weights.py — New file (repo root)

Reads models/model_metrics.json after training and automatically patches xgb_k_layer.py blend weights based on actual Brier scores.

Blend schedule:

  • Brier < 0.23 → 70/30 (strong edge)
  • 0.23 ≤ Brier < 0.25 → 80/20 (marginal edge — current default)
  • 0.25 ≤ Brier < 0.27 → 90/10 (at or worse than null — reduce XGB weight)
  • Brier ≥ 0.27 → 95/5 (actively hurting)

Usage: python update_blend_weights.py (preview) or python update_blend_weights.py --apply
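
A minimal sketch of the threshold logic the schedule implies — the function name and the (statistical, XGB) return order are assumptions:

```python
def blend_for_brier(brier: float) -> tuple[float, float]:
    """Map a test-set Brier score to (statistical, xgb) blend weights.

    Bands are checked worst-first so they stay disjoint; the null model
    (always predict 50%) scores Brier = 0.25.
    """
    if brier >= 0.27:    # actively hurting
        return (0.95, 0.05)
    if brier >= 0.25:    # at or worse than null — reduce XGB weight
        return (0.90, 0.10)
    if brier < 0.23:     # strong edge
        return (0.70, 0.30)
    return (0.80, 0.20)  # marginal edge, current default

print(blend_for_brier(0.24))  # (0.8, 0.2)
```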

3. marcel_layer.py — Replaced with production version

Replaces the stub (which was importing MarcelLayer, marcel_adjustment — classes that never existed — and returning 0.0).

New version implements Marcel regression-to-mean (Tango Tiger 2004):

  • get_marcel_k_rate(k_pct, season_bf) — pitcher K% regressed to league mean; 250 BF regression constant
  • get_marcel_hit_rate(avg, season_pa) — batter hit rate; 600 PA regression constant
  • get_marcel_xba(xba, season_pa) — xBA; 200 PA regression constant (stabilises faster)
  • enrich_prop_with_marcel(prop, hub) — top-level call that mutates sv_k_pct / sv_xba proportional to regression strength

Example: pitcher with 35% K-rate through 80 BF → Marcel regresses to ~27% (heavy). Same pitcher at 600 BF → ~27.5% (barely any change). In May (80-200 BF per starter), Marcel meaningfully corrects small-sample noise.
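
The regression step can be sketched as classic Marcel shrinkage: add the regression constant's worth of league-average batters faced to the observed sample. The league-mean constant here is an assumption (the real value lives in marcel_layer.py, so the exact figures in the example above will differ):

```python
LEAGUE_K_PCT = 22.5     # assumed league-average K%
K_REGRESSION_BF = 250   # regression constant from the PR description

def get_marcel_k_rate(k_pct: float, season_bf: int) -> float:
    """Shrink an observed K% toward the league mean, weighted by sample size."""
    regressed = (season_bf * k_pct + K_REGRESSION_BF * LEAGUE_K_PCT) / (
        season_bf + K_REGRESSION_BF
    )
    return max(5.0, min(40.0, regressed))  # clamp to a sane range, as in the PR

print(round(get_marcel_k_rate(35.0, 80), 1))   # 25.5 — small sample, heavy pull
print(round(get_marcel_k_rate(35.0, 600), 1))  # 31.3 — full season, light pull
```

The pull toward the mean scales with K_REGRESSION_BF / (season_bf + K_REGRESSION_BF), which is why early-season samples move far more than full-season ones.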

4. prop_enrichment_layer.py — Marcel call site updated

Old: _get_marcel_adj(player, prop_type, is_pitcher) → returned 0.0 (dead code)
New: enrich_prop_with_marcel(prop, hub) → mutates sv_k_pct/sv_xba directly

The adjusted sv_k_pct/sv_xba values flow into the XGBoost blend at inference time in tasklets.py.
_get_marcel_adj stub retained as no-op for backward compat; _MARCEL_LAYER global removed.


Summary by cubic

Upgrades K-training to XGBoost v2, adds Marcel regression, and auto-tunes blend weights so inference adapts to measured model quality. Improves early-season stability, aligns training/inference features, and persists models across deploys.

  • New Features
    • XGBoost v2 training: recent-season weights (2026×4, 2025×2, 2024×1.5), fixed feature alignment (sv_era, sv_k_pct, sv_bb_pct, sv_whiff_pct), new features (l3_ks, l3_ip, l5_ip, days_rest), ledger-first training with 3× bonus, opp_whiff added to hit model, DB persistence to xgb_model_store, and --status for Brier + blend tips.
    • Data‑driven blend weights: update_blend_weights.py reads models/model_metrics.json and patches xgb_k_layer.py using Brier thresholds (70/30, 80/20, 90/10, 95/5); run with --apply to write.
    • Marcel regression (production): implements regression-to-mean for K% (250 BF), hit rate (600 PA), and xBA (200 PA); enrich_prop_with_marcel(prop, hub) mutates sv_k_pct/sv_xba; call site updated in prop_enrichment_layer.py (old stub kept as no-op).

Written for commit 19cfa4d. Summary will update on new commits.

…arcel regression layer + data-driven blend weight updater
@coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@jaayslaughter-cpu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 58 seconds before requesting another review, then trigger one with the @coderabbitai review command as a PR comment, or push new commits to this PR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddfa33a1-5187-4b71-8ba9-d04e3258bcb1

📥 Commits

Reviewing files that changed from the base of the PR and between 76f309b and 19cfa4d.

📒 Files selected for processing (4)
  • marcel_layer.py
  • prop_enrichment_layer.py
  • scripts/xgb_k_training.py
  • update_blend_weights.py

@ecc-tools Bot (Contributor) commented May 15, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

@deepsource-io Bot commented May 15, 2026

DeepSource Code Review

We reviewed changes in 76f309b...19cfa4d on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card: Overall Grade, Security, Reliability, Complexity, Hygiene (grade badges not captured).

Code Review Summary

Analyzer     Updated (UTC)              Details
Docker       May 15, 2026, 2:46 a.m.   Review ↗
JavaScript   May 15, 2026, 2:46 a.m.   Review ↗
Python       May 15, 2026, 2:46 a.m.   Review ↗
SQL          May 15, 2026, 2:46 a.m.   Review ↗
Secrets      May 15, 2026, 2:46 a.m.   Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@jaayslaughter-cpu jaayslaughter-cpu merged commit 42b06e0 into main May 15, 2026
7 of 9 checks passed
@codacy-production

Not up to standards ⛔

🔴 Issues 2 critical · 9 high · 2 medium

Alerts:
⚠ 13 issues (≤ 0 issues of at least minor severity)

Results:
13 new issues

Category     Results
ErrorProne   9 high
Security     2 critical, 2 medium

View in Codacy

🟢 Metrics

Metric       Results
Complexity   59

View in Codacy


Comment thread marcel_layer.py

import json
import logging
import os

Unused import os


An object has been imported but is not used anywhere in the file.
It should either be used or the import should be removed.

Comment thread marcel_layer.py
from datetime import datetime, timezone

import requests
from functools import lru_cache

Unused lru_cache imported from functools


An object has been imported but is not used anywhere in the file.
It should either be used or the import should be removed.

Comment thread marcel_layer.py
return max(5.0, min(40.0, regressed))


def enrich_prop_with_marcel(prop: dict, hub: dict) -> dict:

Unused argument 'hub'


An unused argument can lead to confusions. It should be removed. If this variable is necessary, name the variable _ or start the name with unused or _unused.

Comment thread marcel_layer.py
# (label, current, sample_n, hist, expected_direction, func)
("K% elite early (35%, 80 BF)",
35.0, 80, None, "< 30",
lambda c, n, h: get_marcel_k_rate(c, n, h)),

Lambda may not be necessary


A lambda that calls a function without modifying any of its parameters is unnecessary. Python functions are first-class objects and can be passed around in the same way as the resulting lambda. It is recommended to remove the lambda and use the function directly.

Comment thread marcel_layer.py
lambda c, n, h: get_marcel_k_rate(c, n, h)),
("K% elite full season (28%, 600 BF)",
28.0, 600, None, "25-28",
lambda c, n, h: get_marcel_k_rate(c, n, h)),

Lambda may not be necessary


A lambda that calls a function without modifying any of its parameters is unnecessary. Python functions are first-class objects and can be passed around in the same way as the resulting lambda. It is recommended to remove the lambda and use the function directly.

Comment thread scripts/xgb_k_training.py
try:
import shap, pickle as _pkl
with open(model_path, "rb") as f:
model = _pkl.load(f)

Pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue.


The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

Comment thread scripts/xgb_k_training.py
print(f" {key:<10} {b_str:>8} {a_str:>8} {n_test:>8} {status}")

print(f"\n Null model Brier: {null_brier} (always predict 50%)")
print(f" Target Brier: <0.23 to justify current blend weights")

`f-string` used without any expression


It is wasteful to use f-string mechanism if there are no expressions to be extrapolated. It is recommended to use regular strings instead.

Comment thread scripts/xgb_k_training.py

print(f"\n Null model Brier: {null_brier} (always predict 50%)")
print(f" Target Brier: <0.23 to justify current blend weights")
print(f"\n Blend recommendations:")

`f-string` used without any expression


It is wasteful to use f-string mechanism if there are no expressions to be extrapolated. It is recommended to use regular strings instead.

Comment thread update_blend_weights.py
NULL_BRIER = 0.25 # null model always predicts 50%


def _get_blend_weight(brier: float | None, model_name: str) -> tuple[float, float, str]:

Unused argument 'model_name'


An unused argument can lead to confusions. It should be removed. If this variable is necessary, name the variable _ or start the name with unused or _unused.

Comment thread update_blend_weights.py
return False

content = XGB_LAYER.read_text()
original = content

Unused variable 'original'


An unused variable takes up space in the code, and can lead to confusion, and it should be removed. If this variable is necessary, name the variable _ to indicate that it will be unused, or start the name with unused or _unused.

@gemini-code-assist Bot left a comment

Code Review

This pull request overhauls the Marcel projection system, transitioning from a standalone probability adjustment layer to a regression-to-the-mean mechanism that directly mutates input features for the XGBoost models. Key changes include the introduction of a new marcel_layer.py with sample-size-dependent regression formulas, updated training scripts for XGBoost models incorporating recency weighting, and a new utility to automatically update model blend weights based on Brier scores. Feedback focuses on a critical units mismatch where absolute rates are assigned to probability delta fields, potentially causing extreme probability distortions. Additionally, the review identifies a "double regression" logic error that over-penalizes extreme stats, a discrepancy between the implementation and documentation of weighted historical averages, and risks associated with using truthy fallbacks for zero-count statistics.

Comment thread prop_enrichment_layer.py
prop = _emp(prop, hub)
except Exception:
pass
prop["_marcel_adj"] = prop.get("_marcel_k_pct") or prop.get("_marcel_hit_rate") or 0.0

critical

This assignment introduces a critical units mismatch. _marcel_adj was previously a probability delta (e.g., ±0.018), but it is now being set to an absolute projected rate (e.g., 25.0). The consumer in tasklets.py (_BaseAgent._model_prob) multiplies this value by 100.0 and adds it to the win probability, which will result in massive adjustments (e.g., +2500pp) and peg almost all probabilities to the 95% cap. Since Marcel influence is now primarily handled via feature mutation (sv_k_pct/sv_xba), this legacy nudge should either be removed or converted back to a small probability delta.

Suggested change
prop["_marcel_adj"] = prop.get("_marcel_k_pct") or prop.get("_marcel_hit_rate") or 0.0
prop["_marcel_adj"] = 0.0 # Marcel influence now handled via feature mutation

Comment thread marcel_layer.py
Comment on lines +110 to +120
def _weighted_hist(current: float, hist: Optional[float],
weights=(5, 4, 3)) -> float:
"""
Parse FanGraphs percentage field.
Handles both string format ("22.0 %") and decimal float (0.22 or 22.0).
Returns a decimal fraction (0.22, not 22).
Three-year weighted average (current season × 5, prev × 4, prev-prev × 3).
Uses available data — if hist not provided, current season dominates.
"""
if val is None:
return 0.0
if isinstance(val, (int, float)):
v = float(val)
return v / 100.0 if v > 1.0 else v
s = str(val).strip().rstrip("%").strip()
try:
v = float(s)
return v / 100.0 if v > 1.0 else v
except ValueError:
return 0.0


def _parse_float(val, default: float = 0.0) -> float:
"""Safe float parse from any type."""
if val is None:
return default
try:
return float(val)
except (ValueError, TypeError):
return default


# ---------------------------------------------------------------------------
# FanGraphs data fetcher
# ---------------------------------------------------------------------------

def _fetch_fg_data(stats: str, season_start: int, season_end: int) -> list[dict]:
"""
Fetch multi-year leaderboard from FanGraphs JSON API.
if hist is None:
return current
# hist is a single prior-season value (could represent 1 or 2 seasons)
total_w = weights[0] + weights[1]
return (weights[0] * current + weights[1] * hist) / total_w

medium

The _weighted_hist function implementation does not match its docstring. The docstring describes a "Three-year weighted average (current season × 5, prev × 4, prev-prev × 3)", but the code only uses the first two weights (weights[0] and weights[1]), effectively performing a 2-point weighted average. If a 3-year system is intended, the logic needs to handle a list of historical values or a pre-weighted historical aggregate.

Comment thread marcel_layer.py

# ── K props — Marcel pitcher K-rate ───────────────────────────────────────
if prop_type in ("strikeouts", "pitching_outs", "pitcher_strikeouts"):
raw_k_pct = float(prop.get("sv_k_pct") or prop.get("fg_kpct") or LEAGUE_AVG["k_pct"])

medium

Using the or operator for fallbacks here is risky because a valid 0.0 rate (e.g., a pitcher with zero strikeouts in a very small early-season sample) will be treated as falsy and overwritten by the league average. This unintentionally biases the regression toward the mean for players with zero-count stats.

Suggested change
raw_k_pct = float(prop.get("sv_k_pct") or prop.get("fg_kpct") or LEAGUE_AVG["k_pct"])
raw_k_pct = float(
    prop.get("sv_k_pct") if prop.get("sv_k_pct") is not None
    else prop.get("fg_kpct") if prop.get("fg_kpct") is not None
    else LEAGUE_AVG["k_pct"]
)

Comment thread marcel_layer.py
Comment on lines +245 to 253
regression_strength = min(1.0, max(0.0, 1.0 - season_bf / 250))
if regression_strength > 0.3 and abs(marcel_k - raw_k_pct) > 1.5:
# Blend raw and Marcel proportional to regression strength
blended_k = (1 - regression_strength) * raw_k_pct + regression_strength * marcel_k
prop["sv_k_pct"] = round(blended_k, 2)
logger.debug(
"[Marcel] K-rate: raw=%.1f%% Marcel=%.1f%% → blended=%.1f%% (BF=%d reg=%.0f%%)",
raw_k_pct, marcel_k, blended_k, season_bf, regression_strength * 100,
)

medium

This logic implements a form of "double regression" that may be mathematically unsound. marcel_k is already a regressed value (it has been pulled toward the league mean based on sample size inside get_marcel_k_rate). Blending it again with raw_k_pct using regression_strength (which is also sample-size dependent) applies the regression penalty twice, resulting in an overly conservative estimate that is pulled too hard toward the league average.

        # Use the Marcel projection directly as it already incorporates regression
        if abs(marcel_k - raw_k_pct) > 1.5:
            prop["sv_k_pct"] = round(marcel_k, 2)
            logger.debug(
                "[Marcel] K-rate: raw=%.1f%% Marcel=%.1f%% (BF=%d)",
                raw_k_pct, marcel_k, season_bf
            )

Comment thread marcel_layer.py
Comment on lines +255 to +258
raw_whiff = float(prop.get("sv_whiff_pct") or prop.get("sv_swstr_pct") or LEAGUE_AVG["whiff_pct"])
season_p = season_bf * 3 # rough pitch count from BF
marcel_whiff = get_marcel_whiff_pct(raw_whiff, season_p)
prop["_marcel_whiff_pct"] = marcel_whiff

medium

While sv_k_pct and sv_xba are mutated with their regressed values, sv_whiff_pct (or sv_swstr_pct) is not. Since whiff rate is a key feature in the XGBoost model and is highly susceptible to small-sample noise early in the season, it should also be regressed and mutated to ensure consistency across the feature vector.
