Image Source: nfldraftdiamonds
Troy J., Adi B., Romith C.
Report Bug · Request Feature
NFL play-calling unfolds under extreme time pressure, where even a small informational advantage can swing the outcome. It's a real-time decision problem, driven by field position, risk tolerance, and the evolving game situation. While modern football analytics excels at post-game evaluation, few tools attempt to predict or guide decisions in a live-game environment. The purpose of this project is to explore the potential of tree-based ensemble methods, motivated by a simple question: can play-calling be predicted based solely on the pre-snap game situation?
SnapIQ frames play prediction as a multi-class classification problem, forecasting play type using pre-snap information from a canonical dataset spanning the 2014-2018 seasons. Key features include down, distance to first down, field position, score differential, time remaining, and timeouts. All features are pre-processed for consistency, with categorical variables indexed, numeric variables scaled, and missing or stale team codes standardized.
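The preprocessing steps above can be sketched in plain Python. The column names, team-code aliases, and scaling choice here are illustrative assumptions, not the project's actual schema:

```python
# Hedged sketch of the described preprocessing: categorical indexing,
# numeric scaling, and team-code standardization. The alias map below is
# an example (e.g. relocated franchises), not the project's actual table.
TEAM_ALIASES = {"STL": "LA", "SD": "LAC", "JAC": "JAX"}  # stale -> current codes

def index_categorical(values):
    """Map each distinct category to an integer index."""
    levels = {v: i for i, v in enumerate(sorted(set(values)))}
    return [levels[v] for v in values], levels

def min_max_scale(values):
    """Scale numeric values into [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(v - lo) / span for v in values]

def clean_team(code):
    """Replace a stale team code with its current equivalent."""
    return TEAM_ALIASES.get(code, code)
```

In the actual pipeline these transformations are applied once to the full feature table before the season-based split, so train and test share identical encodings.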
This project targets tree-based learning methods of increasing sophistication:
- Decision Tree: for a simple, interpretable baseline
- Gradient Boosting: for sequential learning from residual errors
- XGBoost: an optimized gradient-boosting framework with built-in regularization and parallelization for runtime efficiency
Each model is trained on a shared feature set, with hyperparameters tuned to balance predictive performance, interpretability, and speed, reflecting the constraints of a live-game decision-support system. The chart below illustrates the values tested and the selected configuration for the XGBoost model:
A season-based train/test split is used to ensure realistic evaluation and prevent information leakage. Rather than optimizing purely for accuracy, this project prioritizes practical deployability.
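A season-based split is simple to express: earlier seasons train the model, the held-out final season tests it, so no future information leaks backward. A minimal sketch (field name `season` is an assumption):

```python
# Hedged sketch of a season-based train/test split: train on seasons
# strictly before the test season, evaluate on the test season only.
def season_split(rows, test_season):
    """rows: list of dicts, each carrying a 'season' field."""
    train = [r for r in rows if r["season"] < test_season]
    test = [r for r in rows if r["season"] == test_season]
    return train, test
```

For this project's 2014-2018 data, that would mean training on 2014-2017 and evaluating on 2018.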
To mirror real-world usage, SnapIQ extends beyond batch modeling into a simulated streaming environment (on Databricks), where sequential game data is processed in order. A Random Forest model is applied in this phase to evaluate trade-offs in performance, latency, and operational feasibility. The result is a lightweight predictor capable of informing coaching preparation, defensive alignment, and game-planning workflows, while laying the groundwork for more advanced real-time applications in the future.
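The core of the streaming simulation is a loop that replays plays in chronological order and scores each one with a pre-trained model. The sketch below is a simplified stand-in for `simulated_stream_predictor.py`; the `model.predict` interface and the `game_id`/`play_id` ordering keys are assumptions:

```python
# Minimal sketch of the simulated streaming loop: plays arrive one at a
# time in game order and are scored by a pre-trained model.
import time

def stream_predict(plays, model, delay=0.0):
    """Yield (play, prediction) pairs in chronological order."""
    for play in sorted(plays, key=lambda p: (p["game_id"], p["play_id"])):
        yield play, model.predict(play)
        if delay:
            time.sleep(delay)  # optional throttle to mimic live arrival
```

Because inference happens one record at a time, this loop also surfaces the latency and feature-consistency concerns that batch evaluation hides.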
1. Tree-based ensembles substantially outperform single decision trees, capturing non-linear interactions between the situational variables.
The figure below shows the structure of a simple decision tree, followed by a sample portion of a single tree from the final iteration of a boosted model. The contrast illustrates the added complexity boosted models take on to capture nuanced interactions between features.
Simple decision tree model:
Sample portion of a single iteration of a boosted tree model:
2. Field position, down, and the distance to first down emerge as the most influential factors.
During feature-importance analysis, these pre-snap situational variables consistently drove model decisions, confirming that fundamental game context strongly shapes play selection. Further breakdown by down revealed how accuracy can be skewed in predictable situations, such as obvious punts on long 4th downs.
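The per-down breakdown mentioned above can be sketched with a small grouping routine. The tuple layout is an illustrative assumption, not the project's actual evaluation code:

```python
# Hedged sketch of the per-down accuracy breakdown: grouping predictions
# by down exposes easy situations (e.g. long 4th downs, where a punt is
# near-certain) that can inflate overall accuracy.
from collections import defaultdict

def accuracy_by_down(records):
    """records: iterable of (down, actual_play, predicted_play) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for down, actual, pred in records:
        totals[down] += 1
        hits[down] += int(actual == pred)
    return {down: hits[down] / totals[down] for down in totals}
```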
3. XGBoost offers the best accuracy–runtime tradeoff among the evaluated models, making it well-suited for real-time decision-support scenarios.
However, it is important to note that predictive accuracy plateaus quickly when restricted to pre-snap situational data, highlighting the inherent uncertainty in play-calling and the limits of prediction without personnel, formation, or coverage information. Even with advanced ensemble models, limiting inputs to pre-snap context caps performance, reflecting the strategic and stochastic nature of NFL play-calling. A more detailed, multi-class breakdown of accuracy, precision, and error rates was also explored to better understand the strengths and limitations of these models across each play type.
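A multi-class breakdown like the one referred to above amounts to per-class precision and recall computed from (actual, predicted) pairs. The sketch below shows the arithmetic, with no claim about the project's exact metric code:

```python
# Hedged sketch of per-class precision/recall for a multi-class play-type
# classifier, computed from raw (actual, predicted) label pairs.
from collections import Counter

def per_class_metrics(pairs):
    """pairs: iterable of (actual, predicted) play-type labels."""
    tp, pred_n, true_n = Counter(), Counter(), Counter()
    for actual, pred in pairs:
        true_n[actual] += 1
        pred_n[pred] += 1
        if actual == pred:
            tp[actual] += 1
    classes = set(true_n) | set(pred_n)
    return {c: {"precision": tp[c] / pred_n[c] if pred_n[c] else 0.0,
                "recall": tp[c] / true_n[c] if true_n[c] else 0.0}
            for c in classes}
```

Reporting metrics per class rather than in aggregate is what reveals imbalance effects, such as strong punt recall masking weak run/pass discrimination.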
4. Season-based evaluation confirms stable generalization, suggesting that play-calling tendencies are learnable across years, even as teams and personnel change.
5. Distributed pipelines in a simulated streaming environment can add operational value even when accuracy gains are marginal.
By enabling sequential inference on season-long data, these pipelines demonstrate the feasibility of near real-time decision support for coaching and game planning.
Ultimately, SnapIQ serves as a practical exploration of how predictive modeling, feature engineering, and scalable pipelines intersect in sports analytics, laying the groundwork for more advanced real-time strategy and forecasting systems. By framing play prediction as a decision-support problem rather than a pure modeling exercise, the project prioritizes clarity, speed, and deployability.
However, play prediction remains a challenging multi-class classification task, marked by class imbalance, strong contextual dependence, and non-linear feature interactions. Limiting the model to pre-snap situational data captures meaningful structure in play-calling behavior, but only to a certain extent. Future iterations could incorporate richer contextual signals such as formations, personnel groupings, and historical tendencies, and also explore sequential or reinforcement learning approaches to better model strategic decision-making over the course of a game.
- 4442_NFLClassification.R: Runs the full offline modeling pipeline: data preparation, feature engineering, model training, and evaluation for decision trees, gradient boosting, and XGBoost
- 4442_NFLClassification.Rmd: Annotated walkthrough of the modeling pipeline with visuals, intermediate results, and supporting discussion / technical reasoning (feature selection, model choice, evaluation strategy, etc.)
- simulated_stream_predictor.py: Simulates real-time streaming by sequentially ingesting season-ordered game data and applying a trained model, highlighting operational considerations such as latency, feature consistency, and inference stability



