Steal Attempt Grader — Methodology

Detailed data sources, modeling choices, approximations, and known limitations for the stolen base decision module.

What we measure

Every stolen base attempt carries a break-even success rate: the success probability at which attempting produces the same expected run value as not going. We estimate each runner's true probability of success given observable pre-play factors, compare it to the break-even, and grade the attempt accordingly.

This grades the decision to attempt, not the execution. A fast runner who attempts against an elite catcher in a situation where the break-even is 80% and his estimated P(safe) is 70% receives a BAD_STEAL regardless of whether he happens to be safe.

Scope

Situations graded: Attempts to steal 2B and 3B only. Steal of home is excluded from v1 (rarer, more situationally complex, different decision-maker).
Includes: Successful steals, caught stealing, and pickoff-caught-stealing plays.
Years: 2020–2026 MLB regular season. 2026 is in progress; entries are flagged Live in the leaderboard.
Total graded (2020–2025): 16,526 attempts across 5 completed seasons.

Data source — why not pybaseball?

Steal events are not available in Baseball Savant's CSV export or pybaseball's statcast(). That feed is pitch-level: each row represents one pitch. A steal that happens on a pitch that is not put in play (the runner breaks with the pitch) does not get a row of its own. It shows up only as a change in base state between consecutive pitches, with no attributable play row.

We use the MLBAM Stats API (/api/v1/game/{game_pk}/playByPlay) instead. This endpoint returns a structured play-by-play where steal events appear inside each at-bat's runners array with a details.eventType field (e.g., stolen_base_2b, caught_stealing_3b). We fetch all ~2,400 games per season concurrently using 15 parallel workers (~2 min/season).
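The fetch-and-filter step might look roughly like the following minimal sketch. It assumes the API is served from statsapi.mlb.com, that each entry in allPlays exposes about.atBatIndex and a runners list, and that each runner entry carries details.eventType, details.runner.id and a credits list. The pickoff_caught_stealing_* codes and all function names are illustrative assumptions, not the module's code.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: the public host serving the path given above.
PBP_URL = "https://statsapi.mlb.com/api/v1/game/{game_pk}/playByPlay"

# Steals of 2B and 3B only; the pickoff_* codes are assumed names.
STEAL_EVENT_TYPES = {
    "stolen_base_2b", "stolen_base_3b",
    "caught_stealing_2b", "caught_stealing_3b",
    "pickoff_caught_stealing_2b", "pickoff_caught_stealing_3b",
}


def fetch_pbp(game_pk: int) -> dict:
    """Download one game's play-by-play JSON."""
    url = PBP_URL.format(game_pk=game_pk)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_steals(game_pk: int, pbp: dict) -> list[dict]:
    """Collect steal events from each at-bat's runners array."""
    steals = []
    for play in pbp.get("allPlays", []):
        at_bat_index = play["about"]["atBatIndex"]  # 0-based
        for runner in play.get("runners", []):
            details = runner.get("details", {})
            event_type = details.get("eventType")
            if event_type not in STEAL_EVENT_TYPES:
                continue
            steals.append({
                "game_pk": game_pk,
                "at_bat_index": at_bat_index,
                "event_type": event_type,
                "runner_id": details.get("runner", {}).get("id"),
                "credits": runner.get("credits", []),
            })
    return steals


def fetch_season(game_pks: list[int], workers: int = 15) -> list[dict]:
    """Fetch games concurrently and flatten their steal events."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pbps = list(pool.map(fetch_pbp, game_pks))  # results keep input order
    return [s for pk, pbp in zip(game_pks, pbps) for s in extract_steals(pk, pbp)]
```

Only extract_steals depends on the response layout, so it can be checked offline against a saved JSON response before running fetch_season over a full schedule.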

Merging steal events with Statcast context

The MLBAM API provides the steal event and runner identity. We join each steal event to the corresponding Statcast at-bat on game_pk and at_bat_number (MLBAM's 0-based atBatIndex + 1 = Statcast's 1-based at_bat_number) to obtain base-out state, catcher identity, and other context.
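A dictionary-based sketch of that join follows. It assumes each steal record carries game_pk and MLBAM's 0-based atBatIndex (stored here as at_bat_index), and that Statcast rows carry game_pk, at_bat_number and pitch_number, which are columns in Savant's CSV export. Taking the lowest pitch_number per at-bat gives the at-bat-start context; join_statcast is a hypothetical name.

```python
def join_statcast(steals: list[dict], statcast_rows: list[dict]) -> list[dict]:
    """Attach at-bat context from Statcast to each MLBAM steal event."""
    # Keep the first pitch of each at-bat, keyed by (game_pk, at_bat_number).
    first_pitch: dict[tuple, dict] = {}
    for row in statcast_rows:
        key = (row["game_pk"], row["at_bat_number"])
        if key not in first_pitch or row["pitch_number"] < first_pitch[key]["pitch_number"]:
            first_pitch[key] = row

    joined = []
    for steal in steals:
        # MLBAM atBatIndex is 0-based; Statcast at_bat_number is 1-based.
        key = (steal["game_pk"], steal["at_bat_index"] + 1)
        context = first_pitch.get(key)
        if context is None:
            continue  # unmatched at-bat: skipped here; a real pipeline might log it
        joined.append({**steal, **context})
    return joined
```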

Break-even probability — the double-steal problem

RE24 break-even requires knowing the base-out state before the steal. Statcast records base state at the start of the at-bat, not at the moment of the steal mid-at-bat. For most plays this is fine — but double steals create a problem.

When runners on 1B and 2B attempt to steal simultaneously, Statcast's at-bat-start state shows both bases occupied. The standard check "is the target base empty?" therefore sees on_2b = 1 for the steal of 2B and returns a null break-even; left uncorrected, this drops ~30% of steal-of-2B attempts.

Fix — definitional pre-steal state: We assert the state the steal type requires by rule, regardless of what Statcast records for the at-bat start:

Steal of 2B → on_1b = 1, on_2b = 0 (definitionally)
Steal of 3B → on_2b = 1, on_3b = 0 (definitionally)
Uninvolved runners (e.g., on_3b for a steal-of-2B) are taken from Statcast as-is.
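The definitional rule is small enough to state as code. This sketch assumes base occupancy has already been reduced to booleans (Statcast's on_1b, on_2b and on_3b hold runner IDs or are empty); pre_steal_state is a hypothetical helper.

```python
def pre_steal_state(target_base: int, on_1b: bool, on_2b: bool, on_3b: bool,
                    outs: int) -> tuple[bool, bool, bool, int]:
    """Base-out state the steal requires by rule; other runners pass through."""
    if target_base == 2:
        # Stealer starts on 1B and 2B must be open, whatever Statcast recorded.
        on_1b, on_2b = True, False
    elif target_base == 3:
        # Stealer starts on 2B and 3B must be open.
        on_2b, on_3b = True, False
    else:
        raise ValueError("only steals of 2B and 3B are graded in v1")
    return on_1b, on_2b, on_3b, outs
```

For a double steal from 1B and 2B with 0 outs, the steal-of-2B record becomes (True, False, False, 0) even though Statcast shows 2B occupied, while the steal-of-3B record keeps on_1b = True from Statcast.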

After this fix, break-even coverage is 100% (16,526 / 16,526 graded attempts).

P(safe) — empirical bin approach

We use a 27-bin empirical model: runner speed tier (fast/medium/slow) × catcher pop time tier (fast/medium/slow) × outs (0/1/2).

Runner speed tier: Tertile split of Baseball Savant sprint speed (ft/s) by season. Fast = top third, slow = bottom third.
Catcher pop time tier: Tertile split of Baseball Savant pop time (2B SBA time for steal-of-2B; 3B SBA time for steal-of-3B). Fast = lowest pop time (best catcher, hardest to steal on), slow = highest.
Outs: 0, 1, or 2.
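Per-season tertile tiers can be sketched with the standard library's statistics.quantiles, which with n=3 returns the two cut points. The tie rule at a cut point is an assumption, and in practice the cuts would be computed separately for each season (and, for catchers, separately for 2B and 3B pop times). The function names are hypothetical.

```python
from statistics import quantiles


def tertile_cuts(values: list[float]) -> tuple[float, float]:
    """Two cut points splitting one season's values into thirds."""
    low_cut, high_cut = quantiles(values, n=3)
    return low_cut, high_cut


def speed_tier(sprint_speed: float, cuts: tuple[float, float]) -> str:
    # Higher sprint speed (ft/s) favors the runner, so "fast" is the top third.
    low_cut, high_cut = cuts
    if sprint_speed >= high_cut:
        return "fast"
    if sprint_speed < low_cut:
        return "slow"
    return "medium"


def pop_tier(pop_time: float, cuts: tuple[float, float]) -> str:
    # Lower pop time (s) favors the catcher, so "fast" is the bottom third.
    low_cut, high_cut = cuts
    if pop_time < low_cut:
        return "fast"
    if pop_time >= high_cut:
        return "slow"
    return "medium"
```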

Bins with fewer than 30 observations fall back to the outs-level marginal P(safe) across all speed/pop tiers. No bin is dropped entirely.
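A sketch of the bin estimate with its fallback: it counts successes and attempts per (speed tier, pop tier, outs) bin and per outs value, then serves the bin rate when the bin has at least 30 attempts and the outs-level marginal rate otherwise. The tuple layout and the build_psafe name are assumptions for illustration.

```python
from collections import defaultdict

MIN_BIN_N = 30  # bins below this size fall back to the outs-level marginal rate


def build_psafe(attempts):
    """attempts: iterable of (speed_tier, pop_tier, outs, safe) tuples.

    Returns a lookup p_safe(speed_tier, pop_tier, outs) -> float.
    """
    bin_counts = defaultdict(lambda: [0, 0])   # (speed, pop, outs) -> [safe, n]
    outs_counts = defaultdict(lambda: [0, 0])  # outs -> [safe, n]
    for speed, pop, outs, safe in attempts:
        for counts in (bin_counts[(speed, pop, outs)], outs_counts[outs]):
            counts[0] += int(safe)
            counts[1] += 1
    bin_counts, outs_counts = dict(bin_counts), dict(outs_counts)

    def p_safe(speed, pop, outs):
        safe, n = bin_counts.get((speed, pop, outs), (0, 0))
        if n >= MIN_BIN_N:
            return safe / n
        # Thin or empty bin: use P(safe) for this out count across all tiers.
        safe, n = outs_counts[outs]
        return safe / n

    return p_safe
```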

Overall success rate across all 22,735 graded attempts: 78.7%, consistent with the MLB new-rule era. 67.3% of attempts were GOOD_STEAL (P(safe) ≥ P_be). Mean run value per attempt: +0.014.

Full 27-bin empirical P(safe) table, sorted by P(safe) (highest first):

Runner speed	Catcher pop	Outs	n	P(safe)
Fast	Slow	2	878	0.858
Fast	Slow	0	548	0.839
Fast	Slow	1	801	0.838
Medium	Slow	2	1,068	0.834
Fast	Medium	2	973	0.825
Fast	Medium	0	568	0.822
Fast	Fast	2	954	0.814
Medium	Medium	2	1,132	0.807
Medium	Slow	0	588	0.803
Slow	Slow	2	1,120	0.801
Medium	Fast	2	1,058	0.800
Slow	Medium	2	1,169	0.800
Fast	Medium	1	911	0.786
Slow	Fast	2	994	0.786
Medium	Slow	1	925	0.781
Medium	Medium	1	933	0.774
Slow	Slow	0	483	0.774
Fast	Fast	1	1,109	0.773
Medium	Fast	1	1,132	0.767
Fast	Fast	0	521	0.760
Medium	Medium	0	599	0.758
Slow	Slow	1	954	0.740
Slow	Medium	0	497	0.740
Medium	Fast	0	492	0.732
Slow	Fast	1	981	0.730
Slow	Medium	1	957	0.721
Slow	Fast	0	390	0.685

P(safe) ranges from 0.685 (slow runner vs. fast catcher, 0 outs) to 0.858 (fast runner vs. slow catcher, 2 outs). The 2-out elevation holds in every speed/pop combination: the 2-out bin has the highest P(safe) of the three out states. Runners and managers take more risks with 2 outs, and the observed success rate is also higher (catcher throw accuracy may decline when the out is harder to record). No bin had fewer than 30 observations (the smallest, slow runner vs. fast catcher with 0 outs, has 390), so the fallback to the marginal rate was never triggered.

Catcher pop time — data and imputation

Pop time data comes from the Baseball Savant pop time leaderboard CSV, which is available only from 2020 onward; this is a key reason the module starts with the 2020 season. Pop time is a season-level average per catcher, not a play-level measurement.

For catchers with fewer than 10 steal opportunities in a given season (insufficient to establish a reliable pop time), we impute with the league-average pop time for that season. These entries are not separately flagged in the leaderboard because the imputation affects P(safe) at the individual-play level, not the aggregate.
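A sketch of the imputation for one season follows. The field names are hypothetical, and the league average here is weighted by opportunities; the text above does not say how the module computes it, so treat the weighting as an assumption.

```python
def pop_time_lookup(season_rows: list[dict], min_opportunities: int = 10):
    """season_rows: one dict per catcher for ONE season, with hypothetical keys
    catcher_id, pop_time (seconds) and opportunities.

    Returns a function catcher_id -> pop time, imputing the league average
    for catchers below the opportunity threshold (or missing entirely).
    """
    total_opps = sum(r["opportunities"] for r in season_rows)
    # Assumption: league average weighted by steal opportunities.
    league_avg = sum(r["pop_time"] * r["opportunities"] for r in season_rows) / total_opps
    reliable = {
        r["catcher_id"]: r["pop_time"]
        for r in season_rows
        if r["opportunities"] >= min_opportunities
    }

    def pop_time(catcher_id) -> float:
        return reliable.get(catcher_id, league_avg)

    return pop_time
```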

For successful steals (no caught-stealing), catcher identity comes from the Statcast fielder_2 field. For caught-stealing plays, catcher identity comes from the MLBAM credits array (position code "2"). Credits take priority when both are available.

Break-even probability (RE24)

P_be = (RE_hold − RE_out) / (RE_safe − RE_out)

RE states reflect the definitional pre-steal base state described above. For a steal of 2B, RE_hold uses the state with the runner on 1B; RE_safe uses the runner now on 2B; RE_out uses the runner retired with an out added. Uninvolved runners (e.g., a runner on 3B) are held constant across all three states.
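The three RE lookups can be sketched as below, assuming the RE24 table is a dict keyed by (on_1b, on_2b, on_3b, outs). When the attempt starts with 2 outs, a caught stealing ends the half-inning, so RE_out is 0. The RE values used to check it are placeholders chosen for illustration, not the module's RE24 table.

```python
def break_even(re24: dict, target_base: int, on_1b: bool, on_2b: bool,
               on_3b: bool, outs: int) -> float:
    """P_be = (RE_hold - RE_out) / (RE_safe - RE_out).

    re24 maps (on_1b, on_2b, on_3b, outs) -> expected runs; the base flags
    passed in are the definitional pre-steal state.
    """
    def re(b1: bool, b2: bool, b3: bool, o: int) -> float:
        # Three outs end the half-inning, so no further runs are expected.
        return 0.0 if o >= 3 else re24[(b1, b2, b3, o)]

    re_hold = re(on_1b, on_2b, on_3b, outs)
    if target_base == 2:
        re_safe = re(False, True, on_3b, outs)        # stealer now on 2B
        re_out = re(False, False, on_3b, outs + 1)    # stealer removed, out added
    elif target_base == 3:
        re_safe = re(on_1b, False, True, outs)        # stealer now on 3B
        re_out = re(on_1b, False, False, outs + 1)    # stealer removed, out added
    else:
        raise ValueError("only steals of 2B and 3B are graded in v1")
    return (re_hold - re_out) / (re_safe - re_out)
```

With placeholder values RE_hold = 0.86 (runner on 1B, 0 outs), RE_safe = 1.10 (runner on 2B, 0 outs) and RE_out = 0.25 (bases empty, 1 out), break_even returns 0.61 / 0.85 ≈ 0.718.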

Grading logic

GOOD_STEAL: Empirical P(safe) ≥ P_be — the attempt was positive expected value.
BAD_STEAL: Empirical P(safe) < P_be — the attempt was negative expected value.

Run value = P(safe) × RE_safe + (1 − P(safe)) × RE_out − RE_hold. Positive = attempt added expected runs. Negative = attempt cost expected runs relative to not going.
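Grading and run value fit in one function. Because RE_safe > RE_out, the run value is non-negative exactly when P(safe) ≥ P_be, so the grade and the sign of the run value agree. The function name and return layout are illustrative.

```python
def grade_attempt(p_safe: float, re_hold: float, re_safe: float,
                  re_out: float) -> tuple[str, float, float]:
    """Return (grade, p_be, run_value) for one attempt."""
    p_be = (re_hold - re_out) / (re_safe - re_out)
    # Expected runs from going minus expected runs from holding.
    run_value = p_safe * re_safe + (1 - p_safe) * re_out - re_hold
    grade = "GOOD_STEAL" if p_safe >= p_be else "BAD_STEAL"
    return grade, p_be, run_value
```

With placeholder values RE_hold = 0.86, RE_safe = 1.10 and RE_out = 0.25 (P_be ≈ 0.718), an estimated P(safe) of 0.70 grades BAD_STEAL with run value −0.015, while 0.80 grades GOOD_STEAL with run value +0.07.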

65.9% of all 2020–2024 steal attempts were GOOD_STEAL. Mean run value per attempt: +0.014 (slightly positive overall; the new-rule era has made stealing more viable on average).

Aggregation and leaderboards

The team-year leaderboard (150 team-seasons) reports run value per 100 attempts as the primary metric. Entries with fewer than 50 attempts are flagged Low sample.

The runner career leaderboard (820 runners with ≥ 1 attempt, 2020–2024) aggregates across all seasons. Runners with fewer than 50 career attempts are flagged Low sample. The module attributes each decision to go to the runner, not to a coach or manager, because the runner initiates the steal.
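Both leaderboards reduce to the same aggregation with a different grouping key. In this sketch the team, season and runner_id fields are hypothetical, and the choice of which team an attempt is credited to is left to the caller's grouping function.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable

LOW_SAMPLE_ATTEMPTS = 50  # entries below this are flagged Low sample


def leaderboard(graded: list[dict], group_key: Callable[[dict], Hashable]) -> list[dict]:
    """Run value per 100 attempts for each group, best first."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of run value, attempts]
    for row in graded:
        acc = totals[group_key(row)]
        acc[0] += row["run_value"]
        acc[1] += 1
    board = [
        {"group": group, "attempts": n, "rv_per_100": 100 * total / n,
         "low_sample": n < LOW_SAMPLE_ATTEMPTS}
        for group, (total, n) in totals.items()
    ]
    return sorted(board, key=lambda r: r["rv_per_100"], reverse=True)
```

A team-year board would group on (team, season), and a career board on runner_id alone.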

Data sources

Steal events: MLBAM Stats API play-by-play, fetched for all ~10,600 regular-season games (2020–2024; 2020 was a 60-game season).
Base-out state context: MLB Statcast via pybaseball, first pitch per at-bat.
Runner sprint speed: Baseball Savant sprint speed leaderboard via pybaseball.
Catcher pop time: Baseball Savant pop time leaderboard CSV (2020+ only).
Run expectancy: 24-state RE24 table computed from 2020–2024 Statcast data.
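One common way to build an RE24 table from Statcast is to average, for each base-out state at the start of an at-bat, the runs the batting team scores from that point to the end of the half-inning. The sketch below does that with the bat_score and post_bat_score columns, assuming base occupancy has already been converted to booleans. It is not necessarily how the module computes its table, and common refinements (such as dropping half-innings that end early on a walk-off) are omitted.

```python
from collections import defaultdict


def re24_table(pitch_rows: list[dict]) -> dict:
    """Average runs scored from each base-out state to the end of the half-inning."""
    # 1) Batting team's score when each half-inning ends.
    inning_end = {}
    for r in pitch_rows:
        half = (r["game_pk"], r["inning"], r["inning_topbot"])
        inning_end[half] = max(inning_end.get(half, 0), r["post_bat_score"])

    # 2) One observation per at-bat, taken at its first pitch.
    first_pitch = {}
    for r in pitch_rows:
        key = (r["game_pk"], r["at_bat_number"])
        if key not in first_pitch or r["pitch_number"] < first_pitch[key]["pitch_number"]:
            first_pitch[key] = r

    # 3) Average the rest-of-inning runs for each (on_1b, on_2b, on_3b, outs).
    sums = defaultdict(lambda: [0.0, 0])
    for r in first_pitch.values():
        half = (r["game_pk"], r["inning"], r["inning_topbot"])
        state = (r["on_1b"], r["on_2b"], r["on_3b"], r["outs_when_up"])
        acc = sums[state]
        acc[0] += inning_end[half] - r["bat_score"]
        acc[1] += 1
    return {state: total / n for state, (total, n) in sums.items()}
```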

Known limitations

▲Runner speed and catcher pop time are season averages, not play-level measurements. In-season fatigue, injury, and handedness matchups are not captured.
▲Pitcher delivery time is not modeled. A slow-to-the-plate lefty gives the runner a meaningful head start. This data is not in the public MLBAM feed at the play level.
▲Steal of home is excluded. Home steals involve a fundamentally different decision-maker (the runner and/or manager) and situational complexity (pitcher windup, squeeze play). Excluded from v1.
▲Double-steal base state uses the definitional approach. When two runners steal simultaneously, we assert the base state each steal requires by rule, which is correct for the primary runner but slightly approximates the true simultaneous-movement state for the partner runner.
▲Pop time imputed for low-volume catchers. Catchers with fewer than 10 steal attempts in a season use league-average pop time. This moderates the grading of attempts against backup catchers with thin samples.
▲2020 entries carry higher uncertainty due to the 60-game shortened season.
▲Empirical P(safe) uses bin averages. Within-bin variation (e.g., a "fast" runner at the top of the tier vs. the middle) is not captured. The empirical approach trades granularity for freedom from selection bias.
