Changes
New file: game_logs_refresh.py — daily refresh of mlb_batting_logs.csv and mlb_pitching_logs.csv using the MLB Stats API (same source as tnestico/mlb_scraper).
Architecture (inspired by mlb_scraper):
Fetch completed games, then pull /api/v1/game/{id}/boxscore for each (8 threads)
Batting columns: mlbam_id, player, date, starter, home_runs, h_1b, h_2b, h_3b, b_ab, b_pa, b_runs, b_rbi, b_k
Pitching columns: mlbam_id, player, date, starter, outs, strikeouts, earnedruns, walks, hits
Upsert into new Postgres tables live_batting_logs + live_pitching_logs (auto-created)
Rebuild CSVs from full Postgres history — keeps static files fresh
First-run behavior: seeds Postgres from existing static CSVs (9,819 batting + 4,007 pitching rows) before upserts.
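The daily fetch flow (completed games from the MLB Stats API, then per-game boxscore pulls on 8 threads) can be sketched roughly as follows. The endpoint paths come from this PR's description; the helper names, the `fetch` callable, and the schedule query parameters are illustrative assumptions, not the script's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

API = "https://statsapi.mlb.com/api/v1"

def schedule_url(start: str, end: str) -> str:
    # MLB Stats API schedule endpoint; sportId=1 selects MLB
    return f"{API}/schedule?sportId=1&startDate={start}&endDate={end}"

def boxscore_url(game_pk: int) -> str:
    # Per-game boxscore endpoint named in the PR description
    return f"{API}/game/{game_pk}/boxscore"

def fetch_boxscores(game_pks, fetch, workers=8):
    # Parallel pulls with 8 threads, as described; `fetch` is any
    # callable that takes a URL and returns the parsed response
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, (boxscore_url(pk) for pk in game_pks)))
```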
orchestrator.py
New job_game_logs_refresh at 4:15 AM PT daily (after savant_refresh at 4:00 AM)
requirements_army.txt
git+https://github.com/tnestico/mlb_scraper.git — available for future pitch-level enrichment
polars>=0.20.0 — required by mlb_scraper
Why
Static CSVs were last updated manually. This keeps 2026 batting/pitching logs current through yesterday's games, ensuring team_form_layer.py and other consumers always see fresh data.
Summary by cubic
Automates a daily MLB batting and pitching game-log refresh at 4:15 AM PT using the MLB Stats API, keeping 2026 CSVs and Postgres tables up to date. This keeps team_form_layer.py and other consumers current through yesterday’s games.
New Features
Added game_logs_refresh.py: fetches the last 7 days of completed games with parallel boxscore pulls, parses batter and pitcher stats, upserts to live_batting_logs/live_pitching_logs, and rebuilds data/stats/2026/mlb_batting_logs.csv and data/stats/2026/mlb_pitching_logs.csv.
Scheduled game_logs_refresh in orchestrator.py (daily 4:15 AM PT; runs after savant_refresh).
First run seeds Postgres from existing CSVs.
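The PR does not show how the orchestrator schedules the 4:15 AM PT slot, but computing the next run time needs nothing beyond the standard library. A minimal sketch, assuming the orchestrator wants the next wall-clock occurrence in America/Los_Angeles (the helper name is hypothetical):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

PT = ZoneInfo("America/Los_Angeles")

def next_run(now: datetime, hour: int = 4, minute: int = 15) -> datetime:
    """Next h:mm AM Pacific occurrence strictly after `now`."""
    local = now.astimezone(PT)
    target = local.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= local:
        target += timedelta(days=1)  # already past today's slot
    return target
```

Using the zone name rather than a fixed UTC offset keeps the job at 4:15 local time across PST/PDT transitions.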
Dependencies
Added git+https://github.com/tnestico/mlb_scraper.git and polars>=0.20.0.
Written for commit c39dc1b. Summary will update on new commits.
We reviewed changes in 951b5d4...c39dc1b on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.
Code Review
This pull request introduces a new script, game_logs_refresh.py, and a corresponding scheduled job in orchestrator.py to automate the daily retrieval and storage of MLB batting and pitching logs for the 2026 season. The implementation includes parallel fetching from the MLB Stats API and synchronization between a Postgres database and local CSV files. Feedback highlights a bug in the inningsPitched parsing logic that fails on whole numbers and suggests optimizing database performance by replacing row-by-row insertions with bulk operations in the upsert and seeding functions.
The current logic for parsing inningsPitched will raise a ValueError if the string does not contain a decimal point (e.g., "5" instead of "5.0"). This results in 0 outs being credited for full innings in the except block.
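A robust parser handles both forms without relying on exception handling. A sketch of the fix (the function name is illustrative, not from the PR):

```python
def innings_pitched_to_outs(ip: str) -> int:
    """Convert an MLB inningsPitched string to total outs.

    '5.2' means 5 full innings plus 2 outs (17 total); a whole
    number like '5' has no decimal point and must still yield 15.
    """
    whole, _, frac = ip.partition(".")
    return int(whole) * 3 + (int(frac) if frac else 0)
```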
Performing row-by-row inserts in a loop is inefficient. Using psycopg2.extras.execute_values allows for bulk upserts, which significantly reduces database round-trips and improves performance.
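One way to structure this: build the upsert statement once and hand all rows to execute_values in a single call. The conflict target below is an assumption — the real key columns are not visible in this excerpt and must match the table's unique index:

```python
def build_upsert_sql(table: str, cols: list, key_cols: list) -> str:
    """Build an execute_values-compatible bulk upsert statement.

    key_cols is the assumed conflict target; non-key columns are
    overwritten from the incoming row on conflict.
    """
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols if c not in key_cols)
    return (
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES %s "
        f"ON CONFLICT ({', '.join(key_cols)}) DO UPDATE SET {updates}"
    )

# Then, instead of one INSERT per row (hypothetical column names):
#   from psycopg2.extras import execute_values
#   sql = build_upsert_sql("live_pitching_logs", cols, ["mlbam_id", "game_date"])
#   with conn.cursor() as cur:
#       execute_values(cur, sql, [tuple(r[c] for c in cols) for r in rows])
```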
Seeding nearly 10,000 rows from CSV using individual INSERT statements is extremely slow. Refactoring this to use bulk insertion with execute_values will drastically improve the first-run experience.
def _seed_from_csv(conn) -> None:
    """If Postgres tables are empty, seed from existing CSVs."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM live_batting_logs")
        count = cur.fetchone()[0]
    if count > 0:
        return
    logger.info("[GameLogs] Seeding Postgres from existing CSVs...")
    from psycopg2.extras import execute_values
    for table, path, cols in [
        ("live_batting_logs", _BATTING_CSV, _BATTING_COLS),
        ("live_pitching_logs", _PITCHING_CSV, _PITCHING_COLS),
    ]:
        if not path.exists():
            continue
        with path.open("r", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            rows = list(reader)
        if not rows:
            continue
        col_names = ", ".join([c if c != "date" else "game_date" for c in cols])
        sql = f"INSERT INTO {table} ({col_names}) VALUES %s ON CONFLICT DO NOTHING"
        data = [tuple(row.get(c) for c in cols) for row in rows]
        with conn.cursor() as cur:
            execute_values(cur, sql, data)
        conn.commit()
        logger.info("[GameLogs] Seeded %d rows into %s", len(rows), table)
Unused variables 'pid_str', 'b', and 'pk_col'
An unused variable takes up space in the code and can lead to confusion; it should be removed. If a variable is necessary, name it _ to indicate that it will be unused, or start the name with unused or _unused.