Methodology
Every module is built on the same expected-value framework. Every shortcut is stated plainly. A clear limitation beats a suspiciously clean black box.
The shared framework
Baseball Savant grades player ability: given a runner's tools, did he execute? We grade the decision layer: given what was knowable at the moment of the call, was the decision correct in expected-value terms?
These are distinct questions. A talented runner sent on a 50/50 opportunity produces the same decision grade regardless of whether he happens to be safe or out. This is not a player evaluation tool.
Every module follows the same three steps:
- Identify the decision. Isolate plays where a discrete, attributable in-game call was made (send/hold, steal/no-steal, etc.) and the outcome is observable.
- Compute the break-even. Using a 24-state RE24 run-expectancy table built from 2020–2024 Statcast data, find the success probability at which the risky action has the same expected run value as the conservative alternative:
P_be = (RE_hold − RE_out) / (RE_safe − RE_out)
- Estimate P(success) and grade. We use an empirical bin approach: divide historical plays into bins defined by observable pre-play factors and compute the observed success rate within each bin. This avoids the selection bias of training a model only on cases where the risky action was attempted. Good decision = P(success) ≥ P_be.
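The break-even and grading steps above can be sketched in a few lines. This is a minimal illustration rather than the graders' actual code: the function names, the bin keys (a speed label and an outs count), and every number are hypothetical. RE_hold, RE_safe, and RE_out are read here as the run expectancy after holding, after a successful attempt, and after a failed attempt.

```python
from collections import defaultdict


def break_even(re_hold, re_safe, re_out):
    """Success probability at which attempting equals holding in expected runs."""
    spread = re_safe - re_out
    if spread <= 0:
        raise ValueError("RE_safe must exceed RE_out for a break-even to exist")
    return (re_hold - re_out) / spread


def bin_success_rates(plays):
    """Observed success rate per bin from (bin_key, succeeded) records."""
    tally = defaultdict(lambda: [0, 0])  # bin_key -> [successes, attempts]
    for bin_key, succeeded in plays:
        tally[bin_key][0] += int(succeeded)
        tally[bin_key][1] += 1
    return {key: wins / n for key, (wins, n) in tally.items()}


def is_good_attempt(p_success, p_be):
    """Attempting grades as a good decision when P(success) >= P_be."""
    return p_success >= p_be


# Hypothetical inputs, not real RE24 values or real plays.
p_be = break_even(re_hold=1.0, re_safe=1.5, re_out=0.5)
history = [
    (("fast", 2), True),
    (("fast", 2), True),
    (("fast", 2), False),
    (("fast", 2), True),
    (("slow", 2), False),
]
rates = bin_success_rates(history)
good = is_good_attempt(rates[("fast", 2)], p_be)
```

With these made-up inputs, the "fast" bin's observed rate sits above the break-even, so an attempt from that bin would grade as a good decision.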
Run value
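Each graded decision carries a run value, where RE_success and RE_failure are the same quantities written as RE_safe and RE_out in the break-even formula: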
run_value = P(success) × RE_success + (1 − P(success)) × RE_failure − RE_hold
Positive run value = decision added expected runs relative to the alternative. Negative = runs left on the table (or needlessly risked). Leaderboards normalize this to run value per 100 decisions to account for differing opportunity counts.
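As a worked illustration of the run value formula and the per-100 normalization, here is a short sketch. The function names and all numbers are hypothetical, and "per 100 decisions" is assumed to mean the mean run value per decision multiplied by 100.

```python
def run_value(p_success, re_success, re_failure, re_hold):
    """Expected runs of the risky action minus the run expectancy of holding."""
    return p_success * re_success + (1 - p_success) * re_failure - re_hold


def run_value_per_100(run_values):
    """Mean run value per decision, scaled to 100 decisions."""
    if not run_values:
        raise ValueError("need at least one graded decision")
    return 100 * sum(run_values) / len(run_values)


# Hypothetical run expectancies, not taken from the real RE24 table.
at_break_even = run_value(0.5, re_success=1.5, re_failure=0.5, re_hold=1.0)
above_break_even = run_value(0.75, re_success=1.5, re_failure=0.5, re_hold=1.0)

# Hypothetical run values for four graded decisions by one decision-maker.
rate = run_value_per_100([0.25, -0.25, 0.5, 0.5])
```

At P(success) equal to the break-even, the run value is zero by construction, which makes a convenient sanity check on any implementation.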
Module-specific methodology
Each module applies the shared framework to a specific decision type with its own data sources, bin structure, and known limitations.
Send/Hold Grader
Was the third-base coach's send or hold call correct by expected value?
Steal Attempt Grader
Was each stolen base attempt above the RE24 break-even success rate?
IBB Decision Grader
Did the matchup gain from the intentional walk justify its run-expectancy cost?
Common data sources
- Play-by-play (batting events): MLB Statcast via pybaseball (2020–2024).
- Play-by-play (steal events): MLBAM Stats API (/api/v1/game/{game_pk}/playByPlay). Statcast does not surface between-pitch steal events.
- Runner sprint speed: Baseball Savant sprint speed leaderboard via pybaseball.
- Catcher pop time: Baseball Savant pop time leaderboard CSV (2020+ only).
- Outfielder arm strength: Baseball Savant arm strength leaderboard CSV (2020+ only; this is why v1 is bounded to 2020–2024).
- Run expectancy: 24-state RE24 table computed from 2020–2024 Statcast data.
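For readers unfamiliar with how a run-expectancy table is derived, the sketch below shows the generic calculation: for each base-out state, average the runs scored from that point to the end of the half-inning. It is not the project's pipeline. The input layout (one list of (state, runs_on_play) pairs per half-inning) and the state encoding (outs plus a three-character base string) are assumptions, the data is invented, and refinements such as excluding incomplete half-innings are not handled.

```python
from collections import defaultdict


def build_re_table(half_innings):
    """Average runs scored from each base-out state to the end of the half-inning.

    half_innings: list of half-innings, each a list of (state, runs_on_play)
    where state is the base-out state at the start of the play.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for events in half_innings:
        remaining = sum(runs for _, runs in events)  # runs still to come
        for state, runs_on_play in events:
            totals[state] += remaining
            counts[state] += 1
            remaining -= runs_on_play
    return {state: totals[state] / counts[state] for state in totals}


# Two invented half-innings; state = (outs, bases) with "000" meaning bases empty.
sample = [
    [((0, "000"), 0), ((1, "000"), 1), ((1, "000"), 0), ((2, "000"), 0)],
    [((0, "000"), 0), ((1, "000"), 0), ((2, "000"), 0)],
]
re_table = build_re_table(sample)
```

A full table has 24 entries (three out counts times eight base configurations); the toy data here populates only three of them.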