PR #571: Fix bp2vec TensorFlow crash — graceful fallback + add TF to requirements#440
Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThe PR adds TensorFlow as a dependency and refactors the training pipeline to gracefully handle missing TensorFlow. Instead of terminating on import failure, ChangesTensorFlow graceful failure
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~4 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
Overall Grade |
Security Reliability Complexity Hygiene |
Code Review Summary
| Analyzer | Status | Updated (UTC) | Details |
|---|---|---|---|
| Docker | May 15, 2026 4:32a.m. | Review ↗ | |
| JavaScript | May 15, 2026 4:32a.m. | Review ↗ | |
| Python | May 15, 2026 4:32a.m. | Review ↗ | |
| SQL | May 15, 2026 4:32a.m. | Review ↗ | |
| Secrets | May 15, 2026 4:32a.m. | Review ↗ |
Important
AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.
Up to standards ✅🟢 Issues
|
Micro-Learning Topic: Cross-site scripting (Detected by phrase)Matched on "XSS"Cross-site scripting vulnerabilities occur when unescaped input is rendered into a page displayed to the user. When HTML or script is included in the input, it will be processed by a user's browser as HTML or script and can alter the appearance of the page or execute malicious scripts in their user context. Try a challenge in Secure Code WarriorHelpful references
Micro-Learning Topic: External entity injection (Detected by phrase)Matched on "xXE"An XML External Entity attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. This attack may lead to the disclosure of confidential data, denial of service, server-side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts. Try a challenge in Secure Code WarriorHelpful references
|
There was a problem hiding this comment.
Code Review
This pull request updates bp2vec_train.py to handle missing TensorFlow dependencies more gracefully by returning None instead of calling sys.exit(1), and adds tensorflow to the requirements file. Feedback suggests addressing a remaining sys.exit(1) call in load_statcast_seasons to ensure consistent behavior and recommended moving the dependency check to the beginning of the train function to avoid unnecessary data processing.
| log.error("TensorFlow not installed. Run: pip install tensorflow") | ||
| sys.exit(1) | ||
| log.error("TensorFlow not installed — bp2vec training skipped. Add tensorflow to requirements_army.txt") | ||
| return None |
There was a problem hiding this comment.
The transition from sys.exit(1) to return None correctly addresses the issue of disrupting the scheduler's event loop. However, please note that load_statcast_seasons (line 130) still contains a sys.exit(1) call on pybaseball import failure. To fully achieve the goal of a graceful fallback and prevent process termination in a shared environment, that instance should also be updated to return or raise an exception instead of exiting.
| if model is None: | ||
| log.warning("bp2vec training aborted — TensorFlow unavailable. Add tensorflow to requirements_army.txt") | ||
| return |
There was a problem hiding this comment.
This check correctly handles the case where TensorFlow is missing. However, it occurs after load_statcast_seasons and build_indices, which can be time-consuming (the docstring mentions 10-20 minutes). Consider performing a quick check for TensorFlow availability at the very beginning of the train function to "fail fast" and avoid unnecessary data processing when the dependency is unavailable.
Problem
job_bp2vec_retraincrashed every run withModuleNotFoundError: No module named 'tensorflow', then calledsys.exit(1)which disrupted the APScheduler event loop (caused oneCancelledErrorinjob_data_hub).Fixes
1.
bp2vec_train.py— graceful fallbackbuild_model():sys.exit(1)→return Nonein ImportError handlertrain(): guard added afterbuild_model()— if model is None, log warning and return cleanly instead of crashing onmodel.summary()2.
requirements_army.txt— add TensorFlowtensorflow>=2.15.0added — enables actual bp2vec embedding trainingAfter this deploy,
job_bp2vec_retrainwill complete successfully.Summary by cubic
Stops
job_bp2vec_retrainfrom crashing whentensorflowis missing and addstensorflowto requirements so bp2vec training can run without disrupting APScheduler.Bug Fixes
build_model(): on ImportError, log and returnNoneinstead ofsys.exit(1).train(): guard forNonemodel; log and exit early to avoid APScheduler cancellations.Dependencies
tensorflow>=2.15.0torequirements_army.txt.Written for commit bb35f4a. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Chores