This research examines the winning probabilities across various scoring scenarios in tennis and finds that the common-sense theoretical model makes accurate probability predictions at the macro level (set level) but inaccurate predictions at the micro level (game level). This suggests that our intuitive probability estimates of game outcomes are often wrong, yet these errors may make the game more exciting and engaging for spectators.
📄 paper/
Abstract
In sports games, the excitement and suspense felt by the spectators are essential to their entertainment experience. The level of excitement and suspense is linked to the spectators' reasoning about the probability of winning or losing. In tennis, as in many other sports, spectators' predictions of winning probabilities largely hinge on the score. Because tennis uses a hierarchical scoring system (points, games, sets), reasoning about its probabilities is multifaceted and complex. This research examines the winning probabilities across various scoring scenarios, using data from thousands of professional tennis matches and comparing them with theoretical models generally aligned with spectators' common beliefs. The analysis reveals that the theoretical model makes accurate probability predictions at the macro level but inaccurate predictions at the micro level, pointing to possible biases in micro-level probabilistic reasoning. A recent behavioral economic theory may help explain the causes of such biases. Biases are generally seen as undesirable errors, but this study argues instead that biases in micro-level probabilistic reasoning enhance the enjoyment of tennis matches by creating expectations, anxiety, and surprises.
Python scripts and notebooks for analysis and modeling.
- `Tennis_Analytics.ipynb`: Loads professional tennis match data from tennisabstract.com, cleans it, and calculates the win probability for every game-score and point-score combination. The goal is to provide empirical win rates for each score combination and compare them with the win rates generated by a theoretical model.
- `tennis_probabilities.py`: Calculates a player's game and set win probabilities for each game-score and point-score combination, based on the player's service and return point win probabilities. This method is commonly used to predict game and set winners in online sports betting. The win probabilities generated by this model are compared with the empirical win probabilities calculated from the professional match data, to check whether the model makes accurate predictions.
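The kind of theoretical model described for `tennis_probabilities.py` can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the function names are hypothetical, every point is assumed independent with a fixed probability of being won by the server, sets end in a 7-point tiebreak at 6-6, and player A is assumed to serve the first game of the set. Here `pa` is A's service point win probability, so A's return point win probability is `1 - pb`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def game_win_prob(p: float, a: int = 0, b: int = 0) -> float:
    """P(server wins the game) from point score (a, b); 0, 1, 2, 3 = 0, 15, 30, 40.

    Assumes the server wins each point independently with probability p (0 < p < 1).
    """
    q = 1.0 - p
    if a >= 4 and a - b >= 2:
        return 1.0
    if b >= 4 and b - a >= 2:
        return 0.0
    if a >= 3 and b >= 3:
        deuce = p * p / (p * p + q * q)  # closed form for winning from deuce
        if a == b:
            return deuce
        if a > b:  # advantage server
            return p + q * deuce
        return p * deuce  # advantage returner
    return p * game_win_prob(p, a + 1, b) + q * game_win_prob(p, a, b + 1)


@lru_cache(maxsize=None)
def tiebreak_win_prob(pa: float, pb: float, x: int = 0, y: int = 0) -> float:
    """P(A wins a 7-point tiebreak) from point score (x, y), with A serving the first point.

    pa / pb: probability that A / B wins a point on their own serve.
    """
    if x >= 7 and x - y >= 2:
        return 1.0
    if y >= 7 and y - x >= 2:
        return 0.0
    if x >= 6 and y >= 6 and x == y:
        # From a tie, each pair of points contains one A serve and one B serve.
        win_pair = pa * (1 - pb)
        lose_pair = (1 - pa) * pb
        return win_pair / (win_pair + lose_pair)
    k = x + y  # index of the next point; serve alternates after point 0, then every 2
    p = pa if ((k + 1) // 2) % 2 == 0 else 1 - pb  # P(A wins the next point)
    return (p * tiebreak_win_prob(pa, pb, x + 1, y)
            + (1 - p) * tiebreak_win_prob(pa, pb, x, y + 1))


@lru_cache(maxsize=None)
def set_win_prob(pa: float, pb: float, g1: int = 0, g2: int = 0) -> float:
    """P(A wins the set) from game score (g1, g2); A serves the first game of the set."""
    if (g1 >= 6 and g1 - g2 >= 2) or g1 == 7:
        return 1.0
    if (g2 >= 6 and g2 - g1 >= 2) or g2 == 7:
        return 0.0
    if g1 == 6 and g2 == 6:
        return tiebreak_win_prob(pa, pb)  # A is due to serve game 13, so A starts it
    a_serving = (g1 + g2) % 2 == 0
    h = game_win_prob(pa) if a_serving else 1 - game_win_prob(pb)  # P(A wins this game)
    return h * set_win_prob(pa, pb, g1 + 1, g2) + (1 - h) * set_win_prob(pa, pb, g1, g2 + 1)
```

For example, `game_win_prob(0.64, 2, 3)` gives the server's chance of holding from 30-40, and `set_win_prob(0.65, 0.62, 4, 3)` gives A's chance of winning the set from 4-3. Deuce and tiebreak ties are solved in closed form so the recursion always terminates.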
📂 data/
Cleaned and processed match datasets.
- Contains point-by-point and match-level data for both Men's (ATP) and Women's (WTA) tennis.
- See `data/README.md` for detailed file descriptions.
📊 results/
Output files and verification benchmarks.
- Contains CSV comparisons of win probabilities used to verify the Python model against theoretical baselines.
- See `results/README.md` for details.
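A comparison of empirical and model win rates per score might be produced roughly as in the sketch below. The input record layout `(server_points, returner_points, server_won_game)`, the function names, and the CSV columns are all hypothetical; the actual file formats are documented in `results/README.md`. Each game is assumed to contribute one record for every point score it passes through.

```python
import csv
import io
from collections import defaultdict


def empirical_win_rates(records):
    """Map each point score (a, b) to (share of games the server won, number of games).

    records: iterable of (server_points, returner_points, server_won_game) tuples.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, won in records:
        totals[(a, b)] += 1
        wins[(a, b)] += int(won)
    return {score: (wins[score] / totals[score], totals[score]) for score in totals}


def comparison_csv(empirical, model):
    """Return CSV text comparing empirical and model win rates for each shared score."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["score", "games", "empirical", "model", "difference"])
    for a, b in sorted(empirical):
        if (a, b) not in model:
            continue  # skip scores the model does not cover
        rate, n = empirical[(a, b)]
        m = model[(a, b)]
        writer.writerow([f"{a}-{b}", n, f"{rate:.4f}", f"{m:.4f}", f"{rate - m:.4f}"])
    return buf.getvalue()
```

Keeping the game count next to each rate makes it easy to discount scores with few observations, where the empirical rate is noisy.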