
PR #573: Live MLB game logs refresh (tnestico/mlb_scraper pattern) — daily 4:15 AM (#442)

Merged
jaayslaughter-cpu merged 1 commit into main from pr-573-game-logs-refresh on May 15, 2026

Conversation

Owner

jaayslaughter-cpu commented May 15, 2026

Changes

New file: game_logs_refresh.py

Daily refresh of mlb_batting_logs.csv and mlb_pitching_logs.csv using the MLB Stats API (same source as tnestico/mlb_scraper).

Architecture (inspired by mlb_scraper):

  1. Fetch completed game IDs from MLB Stats API schedule for last 7 days
  2. Parallel-fetch /api/v1/game/{id}/boxscore (8 threads)
  3. Parse batter rows: mlbam_id, player, date, starter, home_runs, h_1b, h_2b, h_3b, b_ab, b_pa, b_runs, b_rbi, b_k
  4. Parse pitcher rows: mlbam_id, player, date, starter, outs, strikeouts, earnedruns, walks, hits
  5. Upsert into new Postgres tables live_batting_logs + live_pitching_logs (auto-created)
  6. Rebuild CSVs from full Postgres history — keeps static files fresh

First-run behavior: Seeds Postgres from existing static CSVs (9,819 batting + 4,007 pitching rows) before upserts.

orchestrator.py

  • New job_game_logs_refresh at 4:15 AM PT daily (after savant_refresh at 4:00 AM)
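The scheduling itself lives in orchestrator.py, whose internals aren't shown in this PR; as a hypothetical sketch, computing the next 4:15 AM PT run with only the standard library could look like:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

PT = ZoneInfo("America/Los_Angeles")

def next_run(now: datetime, hour: int = 4, minute: int = 15) -> datetime:
    """Return the next `hour:minute` Pacific time strictly after `now`.

    `now` must be timezone-aware; the result is in Pacific time.
    """
    local = now.astimezone(PT)
    candidate = local.replace(hour=hour, minute=minute,
                              second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)  # already past today's slot
    return candidate
```

An orchestrator loop would sleep until `next_run(...)`, fire the job, and repeat.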

requirements_army.txt

  • git+https://github.com/tnestico/mlb_scraper.git — available for future pitch-level enrichment
  • polars>=0.20.0 — required by mlb_scraper

Why

Static CSVs were last updated manually. This keeps 2026 batting/pitching logs current through yesterday's games, ensuring team_form_layer.py and other consumers always see fresh data.


Summary by cubic

Automates a daily MLB batting and pitching game-log refresh at 4:15 AM PT using the MLB Stats API, keeping 2026 CSVs and Postgres tables up to date. This keeps team_form_layer.py and other consumers current through yesterday’s games.

  • New Features

    • Added game_logs_refresh.py: fetches last 7 days of completed games, parallel boxscore pulls, parses batter/pitcher stats, upserts to live_batting_logs/live_pitching_logs, and rebuilds data/stats/2026/mlb_batting_logs.csv and data/stats/2026/mlb_pitching_logs.csv.
    • Scheduled game_logs_refresh in orchestrator.py (daily 4:15 AM PT; runs after savant_refresh).
    • First run seeds Postgres from existing CSVs.
  • Dependencies

    • Added git+https://github.com/tnestico/mlb_scraper.git and polars>=0.20.0.

Written for commit c39dc1b. Summary will update on new commits.


coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@jaayslaughter-cpu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 44 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f1a571a-90d3-4dee-9e4c-3702534af514

📥 Commits

Reviewing files that changed from the base of the PR and between 951b5d4 and c39dc1b.

📒 Files selected for processing (3)
  • game_logs_refresh.py
  • orchestrator.py
  • requirements_army.txt



deepsource-io Bot commented May 15, 2026

DeepSource Code Review

We reviewed changes in 951b5d4...c39dc1b on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card: Overall Grade · Security · Reliability · Complexity · Hygiene (grade badges not captured in this export)

Code Review Summary

| Analyzer | Updated (UTC) | Details |
| --- | --- | --- |
| Docker | May 15, 2026 5:10 a.m. | Review ↗ |
| JavaScript | May 15, 2026 5:10 a.m. | Review ↗ |
| Python | May 15, 2026 5:10 a.m. | Review ↗ |
| SQL | May 15, 2026 5:10 a.m. | Review ↗ |
| Secrets | May 15, 2026 5:10 a.m. | Review ↗ |

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@codacy-production

Not up to standards ⛔

🔴 Issues: 1 critical · 3 high · 2 medium

Alerts: ⚠ 6 issues (gate: ≤ 0 issues of at least minor severity)

Results: 6 new issues

| Category | Results |
| --- | --- |
| ErrorProne | 3 high |
| Security | 1 critical · 2 medium |

View in Codacy

🟢 Metrics: Complexity 60

View in Codacy

TIP This summary will be updated as you push new changes.


gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new script, game_logs_refresh.py, and a corresponding scheduled job in orchestrator.py to automate the daily retrieval and storage of MLB batting and pitching logs for the 2026 season. The implementation includes parallel fetching from the MLB Stats API and synchronization between a Postgres database and local CSV files. Feedback highlights a bug in the inningsPitched parsing logic that fails on whole numbers and suggests optimizing database performance by replacing row-by-row insertions with bulk operations in the upsert and seeding functions.

Comment thread game_logs_refresh.py
Comment on lines +247 to +248
innings, thirds = ip_str.split(".")
outs = int(innings) * 3 + int(thirds)

high

The current logic for parsing inningsPitched raises a ValueError when the string has no decimal point (e.g., "5" instead of "5.0"); the except block then credits 0 outs for those full-inning outings.

Suggested change
innings, thirds = ip_str.split(".")
outs = int(innings) * 3 + int(thirds)
parts = ip_str.split(".")
innings = int(parts[0])
thirds = int(parts[1]) if len(parts) > 1 else 0
outs = innings * 3 + thirds
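Packaged as a standalone helper, the suggested fix might look like this (hypothetical function name; the PR's actual code is inline). Note the fractional digit in an innings-pitched string counts thirds of an inning, not tenths:

```python
def ip_to_outs(ip_str: str) -> int:
    """Convert an inningsPitched string ("5", "5.0", "5.2") to total outs.

    The digit after the decimal point counts outs (thirds of an inning),
    so "5.2" means 5 full innings plus 2 outs = 17 outs.
    """
    parts = ip_str.split(".")
    innings = int(parts[0])
    thirds = int(parts[1]) if len(parts) > 1 else 0
    return innings * 3 + thirds
```

This handles both the "5" and "5.0" forms that the original `split(".")` unpacking choked on.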

Comment thread game_logs_refresh.py
Comment on lines +112 to +132
    sql = """
    INSERT INTO live_batting_logs
        (mlbam_id, player, game_date, starter,
         home_runs, h_1b, h_2b, h_3b, b_ab, b_pa, b_runs, b_rbi, b_k)
    VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
    ON CONFLICT (mlbam_id, game_date) DO UPDATE SET
        player=EXCLUDED.player, starter=EXCLUDED.starter,
        home_runs=EXCLUDED.home_runs, h_1b=EXCLUDED.h_1b,
        h_2b=EXCLUDED.h_2b, h_3b=EXCLUDED.h_3b,
        b_ab=EXCLUDED.b_ab, b_pa=EXCLUDED.b_pa,
        b_runs=EXCLUDED.b_runs, b_rbi=EXCLUDED.b_rbi, b_k=EXCLUDED.b_k
    """
    with conn.cursor() as cur:
        for r in rows:
            cur.execute(sql, (
                r["mlbam_id"], r["player"], r["date"], r["starter"],
                r["home_runs"], r["h_1b"], r["h_2b"], r["h_3b"],
                r["b_ab"], r["b_pa"], r["b_runs"], r["b_rbi"], r["b_k"],
            ))
    conn.commit()
    return len(rows)

medium

Performing row-by-row inserts in a loop is inefficient. Using psycopg2.extras.execute_values allows for bulk upserts, which significantly reduces database round-trips and improves performance.

    from psycopg2.extras import execute_values
    sql = """
    INSERT INTO live_batting_logs
        (mlbam_id, player, game_date, starter,
         home_runs, h_1b, h_2b, h_3b, b_ab, b_pa, b_runs, b_rbi, b_k)
    VALUES %s
    ON CONFLICT (mlbam_id, game_date) DO UPDATE SET
        player=EXCLUDED.player, starter=EXCLUDED.starter,
        home_runs=EXCLUDED.home_runs, h_1b=EXCLUDED.h_1b,
        h_2b=EXCLUDED.h_2b, h_3b=EXCLUDED.h_3b,
        b_ab=EXCLUDED.b_ab, b_pa=EXCLUDED.b_pa,
        b_runs=EXCLUDED.b_runs, b_rbi=EXCLUDED.b_rbi, b_k=EXCLUDED.b_k
    """
    data = [
        (
            r["mlbam_id"], r["player"], r["date"], r["starter"],
            r["home_runs"], r["h_1b"], r["h_2b"], r["h_3b"],
            r["b_ab"], r["b_pa"], r["b_runs"], r["b_rbi"], r["b_k"]
        )
        for r in rows
    ]
    with conn.cursor() as cur:
        execute_values(cur, sql, data)
    conn.commit()
    return len(rows)

Comment thread game_logs_refresh.py
Comment on lines +138 to +157
    sql = """
    INSERT INTO live_pitching_logs
        (mlbam_id, player, game_date, starter,
         outs, strikeouts, earnedruns, walks, hits)
    VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)
    ON CONFLICT (mlbam_id, game_date) DO UPDATE SET
        player=EXCLUDED.player, starter=EXCLUDED.starter,
        outs=EXCLUDED.outs, strikeouts=EXCLUDED.strikeouts,
        earnedruns=EXCLUDED.earnedruns, walks=EXCLUDED.walks,
        hits=EXCLUDED.hits
    """
    with conn.cursor() as cur:
        for r in rows:
            cur.execute(sql, (
                r["mlbam_id"], r["player"], r["date"], r["starter"],
                r["outs"], r["strikeouts"], r["earnedruns"],
                r["walks"], r["hits"],
            ))
    conn.commit()
    return len(rows)

medium

Similar to the batting upsert, this pitching upsert should use bulk operations for better efficiency.

    from psycopg2.extras import execute_values
    sql = """
    INSERT INTO live_pitching_logs
        (mlbam_id, player, game_date, starter,
         outs, strikeouts, earnedruns, walks, hits)
    VALUES %s
    ON CONFLICT (mlbam_id, game_date) DO UPDATE SET
        player=EXCLUDED.player, starter=EXCLUDED.starter,
        outs=EXCLUDED.outs, strikeouts=EXCLUDED.strikeouts,
        earnedruns=EXCLUDED.earnedruns, walks=EXCLUDED.walks,
        hits=EXCLUDED.hits
    """
    data = [
        (
            r["mlbam_id"], r["player"], r["date"], r["starter"],
            r["outs"], r["strikeouts"], r["earnedruns"],
            r["walks"], r["hits"]
        )
        for r in rows
    ]
    with conn.cursor() as cur:
        execute_values(cur, sql, data)
    conn.commit()
    return len(rows)

Comment thread game_logs_refresh.py
Comment on lines +294 to +328
def _seed_from_csv(conn) -> None:
    """If Postgres tables are empty, seed from existing CSVs."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM live_batting_logs")
        count = cur.fetchone()[0]
    if count > 0:
        return

    logger.info("[GameLogs] Seeding Postgres from existing CSVs...")
    for table, path, cols, pk_col in [
        ("live_batting_logs", _BATTING_CSV, _BATTING_COLS, "game_date"),
        ("live_pitching_logs", _PITCHING_CSV, _PITCHING_COLS, "game_date"),
    ]:
        if not path.exists():
            continue
        rows_inserted = 0
        with path.open("r", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            with conn.cursor() as cur:
                for row in reader:
                    placeholders = ", ".join(["%s"] * len(cols))
                    col_names = ", ".join(
                        [c if c != "date" else "game_date" for c in cols]
                    )
                    values = [row.get(c, row.get("date" if c == "date" else c, None))
                              for c in cols]
                    cur.execute(
                        f"INSERT INTO {table} ({col_names}) VALUES ({placeholders}) "
                        f"ON CONFLICT DO NOTHING",
                        values,
                    )
                    rows_inserted += 1
        conn.commit()
        logger.info("[GameLogs] Seeded %d rows into %s", rows_inserted, table)


medium

Seeding nearly 10,000 rows from CSV using individual INSERT statements is extremely slow. Refactoring this to use bulk insertion with execute_values will drastically improve the first-run experience.

def _seed_from_csv(conn) -> None:
    """If Postgres tables are empty, seed from existing CSVs."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM live_batting_logs")
        count = cur.fetchone()[0]
    if count > 0:
        return

    logger.info("[GameLogs] Seeding Postgres from existing CSVs...")
    from psycopg2.extras import execute_values
    for table, path, cols in [
        ("live_batting_logs",  _BATTING_CSV,  _BATTING_COLS),
        ("live_pitching_logs", _PITCHING_CSV, _PITCHING_COLS),
    ]:
        if not path.exists():
            continue
        with path.open("r", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            rows = list(reader)
            if not rows:
                continue
            col_names = ", ".join([c if c != "date" else "game_date" for c in cols])
            sql = f"INSERT INTO {table} ({col_names}) VALUES %s ON CONFLICT DO NOTHING"
            data = [tuple(row.get(c) for c in cols) for row in rows]
            with conn.cursor() as cur:
                execute_values(cur, sql, data)
        conn.commit()
        logger.info("[GameLogs] Seeded %d rows into %s", len(rows), table)

@jaayslaughter-cpu jaayslaughter-cpu merged commit 52e9c24 into main May 15, 2026
7 of 9 checks passed
Contributor

ecc-tools Bot commented May 15, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

Comment thread game_logs_refresh.py
from __future__ import annotations

import csv
import io

Unused import io


An object has been imported but is not used anywhere in the file.
It should either be used or the import should be removed.

Comment thread game_logs_refresh.py
starter_batter_id = batting_order[0] if batting_order else None
starter_pitcher_id = pitching_order[0] if pitching_order else None

for pid_str, pdata in players.items():

Unused variable 'pid_str'


An unused variable takes up space in the code, and can lead to confusion, and it should be removed. If this variable is necessary, name the variable _ to indicate that it will be unused, or start the name with unused or _unused.

Comment thread game_logs_refresh.py

# ── Batting ───────────────────────────────────────────────────────
if pid in batters:
    b = stats.get("batting", {}).get("summary", None)

Unused variable 'b'


An unused variable takes up space in the code, and can lead to confusion, and it should be removed. If this variable is necessary, name the variable _ to indicate that it will be unused, or start the name with unused or _unused.

Comment thread game_logs_refresh.py
return

logger.info("[GameLogs] Seeding Postgres from existing CSVs...")
for table, path, cols, pk_col in [

Unused variable 'pk_col'


An unused variable takes up space in the code, and can lead to confusion, and it should be removed. If this variable is necessary, name the variable _ to indicate that it will be unused, or start the name with unused or _unused.
