Boxscorus | Introducing BXS MLB v2

BXS meets expected runs: a smarter way to predict baseball

Greg Lamp · March 3, 2026

The original Boxscorus model asked a simple question: which team wins? The new model asks a better one: how many runs does each team score?

That shift changes everything. Instead of flipping a weighted coin to decide each simulated game, Model v2 generates actual run totals for both sides using a Negative Binomial distribution calibrated to real MLB scoring patterns. The result is a system that not only picks winners, but also projects final scores, run distributions, and over/under probabilities for every game on the schedule.

Here's what changed, why it matters, and how the pieces fit together.

The Original Model: BXS v1 and a Coin Flip

The v1 model was a straightforward Elo rating system. Each team started at 1500. Win a game, gain points. Lose, drop points. Beat a strong team, gain more. Lose to a weak one, drop harder. The system also accounted for home field advantage (24 BXS points), travel distance (a cube-root penalty based on coordinate distance), and rest days (up to 6.9 points for three days off).

To simulate a season, the model computed each team's win probability from the BXS gap, then flipped a weighted coin thousands of times across every remaining game. This approach works well for predicting playoff odds and win totals. FiveThirtyEight's MLB model used a similar methodology for years.
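As a rough illustration of the v1 approach, here is a minimal sketch of a rating-gap win probability followed by weighted coin flips. The 400-point logistic scale is the conventional Elo choice and is an assumption here, since the article does not state the scale BXS uses. The function names are hypothetical, and the travel and rest adjustments are left out.

```python
import random

HOME_FIELD_BXS = 24.0  # home-field advantage quoted in the article
ELO_SCALE = 400.0      # assumption: conventional Elo scale, not stated in the article


def win_probability(home_bxs: float, away_bxs: float) -> float:
    """Home team's win probability from the rating gap (standard Elo logistic)."""
    gap = (home_bxs + HOME_FIELD_BXS) - away_bxs
    return 1.0 / (1.0 + 10.0 ** (-gap / ELO_SCALE))


def simulate_wins(home_bxs: float, away_bxs: float, n_sims: int, seed: int = 0) -> int:
    """Flip the weighted coin n_sims times and count how often the home team wins."""
    rng = random.Random(seed)
    p_home = win_probability(home_bxs, away_bxs)
    return sum(rng.random() < p_home for _ in range(n_sims))
```

Calling simulate_wins(1510.0, 1500.0, 10_000) returns a count of home wins and nothing else, which is exactly the blind spot: no score ever comes out of it.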

But it has a blind spot. A coin flip tells you who wins. It says nothing about the score.

Why Runs Matter More Than Wins

If you know the Dodgers have a 58% chance of beating the Rockies tonight, that's useful. But if you know the model projects the Dodgers to score 5.2 runs against Colorado's pitching staff while the Rockies manage 3.8 runs at Coors Field, that's a different level of detail. You can start answering questions like:

What's the probability the total goes over 8.5 runs?
How often does the underdog win by 3+ runs?
Is a -1.5 run line a good bet on the favorite?

Model v2 answers all of these. Every simulated game produces two run totals, not just a winner.

The Negative Binomial: Baseball's Natural Distribution

The naive approach would be to model runs with a Poisson distribution. After all, runs are discrete counts, and Poisson is the textbook choice for "how many events in a fixed period."

The problem: baseball runs are overdispersed. In a Poisson distribution, the variance equals the mean. In real MLB games, the variance is about 2.37 times the mean. Teams don't just score 4 runs on average with normal Poisson scatter. They score 0 or 1 in some games and explode for 10+ in others, more often than Poisson predicts. Blowouts and shutouts happen more frequently than a Poisson model allows.

The Negative Binomial distribution handles this naturally. It has two parameters instead of one, letting it capture both the expected scoring rate and the extra variance. The model uses an overdispersion parameter of 2.37 (the OVERDISPERSION constant), derived from historical MLB run-scoring data. Given the expected runs for each team, the model converts to Negative Binomial parameters:

Parameter	Formula	Role
p	1 / 2.37 = 0.42	Success probability per "trial"
r	expected_runs * p / (1 - p)	Number of successes before stopping
Mean	expected_runs	Same as the input
Variance	expected_runs * 2.37	Wider than Poisson

This is the same distribution family that FanGraphs community research and Sean Dolinar's work have identified as the best fit for MLB run scoring. The extra variance means the model properly handles the fat tails of baseball: the games where a team puts up 12 runs or gets shut out.
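The parameter conversion in the table can be checked with a short sketch. It assumes the "failures before r successes" form of the Negative Binomial with a real-valued r, which is the form that reproduces the mean and variance rows above. The function names are illustrative, not the model's.

```python
import math

OVERDISPERSION = 2.37  # variance-to-mean ratio from the article


def nb_params(expected_runs: float) -> tuple:
    """Convert an expected run total into (r, p) as in the parameter table."""
    p = 1.0 / OVERDISPERSION
    r = expected_runs * p / (1.0 - p)
    return r, p


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(X = k) for the failures-before-r-successes form, with real-valued r."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
```

With expected_runs = 4.4, this parameterization puts roughly 6% of the mass on a shutout, against roughly 1% under a Poisson with the same mean.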

When a simulated game ends in a tie, the model breaks it with a BXS-weighted coin flip, reflecting team quality without artificially extending games into fake extra innings.
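One way to sketch a single simulated game using only the standard library is to draw each run total from a Gamma-Poisson mixture, which is mathematically equivalent to the Negative Binomial. How the model actually samples is not stated, so the sampler, the function names, and the home_win_prob argument (the BXS-based probability used only for tie-breaks) are all assumptions.

```python
import math
import random

OVERDISPERSION = 2.37  # variance-to-mean ratio from the article


def sample_runs(expected_runs: float, rng: random.Random) -> int:
    """Draw one run total from a Negative Binomial via a Gamma-Poisson mixture."""
    p = 1.0 / OVERDISPERSION
    r = expected_runs * p / (1.0 - p)
    lam = rng.gammavariate(r, (1.0 - p) / p)  # shape r, scale (1 - p) / p
    # Knuth's Poisson sampler: multiply uniforms until the product drops below e^-lam.
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_game(home_exp: float, away_exp: float, home_win_prob: float,
                  rng: random.Random) -> tuple:
    """Return (home_runs, away_runs, home_won); ties go to a weighted coin flip."""
    home_runs = sample_runs(home_exp, rng)
    away_runs = sample_runs(away_exp, rng)
    if home_runs == away_runs:
        home_won = rng.random() < home_win_prob
    else:
        home_won = home_runs > away_runs
    return home_runs, away_runs, home_won
```

Note that a tie-broken game keeps its tied score in this sketch. Whether the model adds a deciding run to the winner's total is not something the article specifies.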

Chart: Run Distribution, Poisson vs. Negative Binomial (mean = 4.4 runs). The Negative Binomial's fatter tails capture real baseball: more shutouts and more blowouts than Poisson predicts.

Computing Expected Runs: Four Ingredients

The expected runs for each team in a game come from four multiplicative factors, all anchored to the league average of 4.40 runs per game:

1. Team Offense Factor (OPS-based)

The model tracks every team's cumulative OBP and SLG across the season, then computes OPS relative to the league average. A team with a 1.05 offense factor is hitting 5% better than average. The key detail: this uses a Bayesian prior of 40 games at league-average performance. Early in the season, when a team has only played 10 games, the prior pulls their offensive rating toward the mean. By mid-season, the prior's influence fades and the team's actual performance dominates.
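The shrinkage described here can be sketched as a pseudo-count blend. This is an assumption about the mechanics: the article gives the prior size (40 games) but not the exact blending rule, and the model may well blend OBP and SLG separately rather than the final ratio as done below. The function name and arguments are hypothetical.

```python
OFFENSE_PRIOR_GAMES = 40  # prior weight quoted in the article


def offense_factor(team_obp: float, team_slg: float,
                   league_obp: float, league_slg: float,
                   games_played: int) -> float:
    """OPS relative to league, shrunk toward 1.0 by a 40-game league-average prior."""
    raw = (team_obp + team_slg) / (league_obp + league_slg)
    data_weight = games_played / (games_played + OFFENSE_PRIOR_GAMES)
    return data_weight * raw + (1.0 - data_weight) * 1.0
```

Under this rule a team with an .780 OPS in a .720 league shows a raw factor near 1.08, but after 10 games only a fifth of that edge counts, so the blended factor sits near 1.02.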

2. Opponent Pitching Factor (FIP-based, SP/RP split)

Rather than treating pitching as a monolith, the model tracks starting pitchers and relievers separately. It computes FIP (Fielding Independent Pitching) for each role using the standard formula: (13*HR + 3*(BB+HBP) - 2*K) / IP + 3.10. A team whose starters have a FIP of 3.50 when the league average is 4.00 gets a pitching factor of 4.00/3.50 = 1.14, meaning they rate about 14% better than average at suppressing runs.

The starter and bullpen factors are weighted 55/45 to reflect the modern game, where relievers throw nearly half the innings. A Bayesian prior of 30 games keeps early-season estimates stable.
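A minimal sketch of the FIP calculation and the 55/45 starter/bullpen blend follows. It takes the league-over-team ratio exactly as the example above does and omits the 30-game prior for brevity. The function names and the stat line in the usage note are hypothetical.

```python
FIP_CONSTANT = 3.10              # constant used in the article's formula
SP_WEIGHT, RP_WEIGHT = 0.55, 0.45  # starter/bullpen split from the article


def fip(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Fielding Independent Pitching from home runs, walks, HBP, strikeouts, innings."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + FIP_CONSTANT


def pitching_factor(sp_fip: float, rp_fip: float, league_fip: float) -> float:
    """Blend starter and reliever quality ratios (league FIP / role FIP)."""
    sp_factor = league_fip / sp_fip
    rp_factor = league_fip / rp_fip
    return SP_WEIGHT * sp_factor + RP_WEIGHT * rp_factor
```

For example, a made-up line of 20 HR, 50 BB, 5 HBP, and 180 K over 180 innings gives a FIP of about 3.46. Starters at 3.50 with a league-average bullpen in a 4.00 league blend to a factor of about 1.08.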

3. Park Factor

Coors Field is not Petco Park. The model computes park factors from three years of historical venue data, comparing each stadium's average total runs per game to the league average. A park factor of 1.12 means 12% more runs score there than average. This multiplier applies to both teams equally.
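The park factor calculation reduces to a ratio of averages. The sketch below pools three seasons of venue totals and divides by the league's total runs per game (both teams combined, so 8.80 if each side averages 4.40). Whether the model pools seasons or averages per-season ratios is not stated, so pooling is an assumption, and the function name is hypothetical.

```python
def park_factor(venue_runs_by_year: list, venue_games_by_year: list,
                league_total_runs_per_game: float) -> float:
    """Venue's pooled total runs per game divided by the league's total runs per game."""
    venue_avg = sum(venue_runs_by_year) / sum(venue_games_by_year)
    return venue_avg / league_total_runs_per_game
```

With made-up inputs such as park_factor([800, 790, 810], [81, 81, 81], 8.80), the result is a little above 1.12, which would read as about 12% more scoring than average at that venue.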

4. Elo Adjustment

The BXS (Elo) gap between teams provides a final scaling factor via the formula 10^(elo_diff / 4000). This is a gentle exponential curve. A 50-point BXS edge translates to roughly a 3% run-scoring boost. It's subtle but meaningful over 10,000 simulations.
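The scaling formula is simple enough to check directly. The function name below is illustrative; the formula itself is the one quoted above.

```python
def elo_adjustment(elo_diff: float) -> float:
    """Run-scoring multiplier from a rating gap: 10^(elo_diff / 4000)."""
    return 10.0 ** (elo_diff / 4000.0)
```

elo_adjustment(50) comes out near 1.029, matching the roughly 3% boost described, and elo_adjustment(-50) is its reciprocal, so the two teams' multipliers cancel when multiplied together.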

Home teams also receive a flat +0.17 run bonus (with away teams getting -0.17), reflecting the empirical home-field scoring advantage.

Chart: Park Factors, How Venues Shift Expected Runs. A park factor above 1.0 inflates scoring; below 1.0 suppresses it. Coors Field sees about 20% more runs than average.

The final formula:

expected_runs = 4.40 * offense_factor * opp_pitching * park_factor * elo_adj + home_adj
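Put together, the formula can be sketched as one function. One assumption needs flagging: the article does not say whether a pitching quality ratio such as 1.14 is inverted before it enters the product. This sketch therefore treats opp_pitching as an already-prepared run multiplier (below 1.0 for strong opposing pitching) and leaves that conversion to the caller. The argument names mirror the formula; everything else is hypothetical.

```python
LEAGUE_AVG_RUNS = 4.40  # league-average runs per team per game, from the article
HOME_ADJ = 0.17         # flat home/away run adjustment, from the article


def expected_runs(offense_factor: float, opp_pitching: float, park_factor: float,
                  elo_diff: float, is_home: bool) -> float:
    """Expected runs for one team: league average times four factors, plus home_adj."""
    elo_adj = 10.0 ** (elo_diff / 4000.0)
    home_adj = HOME_ADJ if is_home else -HOME_ADJ
    return (LEAGUE_AVG_RUNS * offense_factor * opp_pitching * park_factor * elo_adj
            + home_adj)
```

Two perfectly average teams in a neutral park come out at 4.57 expected runs for the home side and 4.23 for the visitors, so the home bonus is the only thing separating them.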

Pitcher Adjustments: Aces and Spot Starters

BXS treats every game as team-vs-team. But the Yankees with Gerrit Cole on the mound are a meaningfully different team than the Yankees with their fifth starter. The model adjusts for this.

Every starting pitcher gets a rolling quality score based on Bill James' Game Score, which boils each outing down to a single number. A score of 47 is league average. A dominant start lands around 70. A rough one sits below 30. The model tracks each pitcher's average across their starts, blended with a prior of 10 league-average appearances to keep early-season estimates stable.

The adjustment works by comparing a pitcher to their own team's average. If a starter's rolling Game Score is 55 and the team average is 47, that 8-point gap translates to roughly a 38-point BXS boost for the game, enough to shift win probability by about 5 percentage points. A below-average starter flips the sign: a Game Score of 40 against a team average of 47 costs the team about 33 BXS points for that matchup.
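The two worked examples are consistent with a roughly linear conversion of about 4.75 BXS points per Game Score point (38 / 8). That slope is inferred from the examples rather than stated in the article, so treat it and the function names below as assumptions. The sketch also shows the 10-start prior as a simple pseudo-count blend.

```python
LEAGUE_AVG_GAME_SCORE = 47.0     # league-average Game Score quoted in the article
PITCHER_PRIOR_STARTS = 10        # prior weight quoted in the article
BXS_PER_GAME_SCORE_POINT = 4.75  # assumption: slope inferred from 8 points -> ~38 BXS


def rolling_game_score(game_scores: list) -> float:
    """Average Game Score blended with 10 league-average pseudo-starts."""
    total = sum(game_scores) + PITCHER_PRIOR_STARTS * LEAGUE_AVG_GAME_SCORE
    return total / (len(game_scores) + PITCHER_PRIOR_STARTS)


def pitcher_bxs_adjustment(pitcher_score: float, team_avg_score: float) -> float:
    """BXS points added (or removed) for tonight's starter versus the team average."""
    return (pitcher_score - team_avg_score) * BXS_PER_GAME_SCORE_POINT
```

pitcher_bxs_adjustment(55, 47) returns 38.0 and pitcher_bxs_adjustment(40, 47) returns -33.25, in line with the figures above.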

When you see tomorrow's pitching matchup and one team is sending out their ace while the other rolls with a spot starter, the model already accounts for that gap. The projected run totals and win probabilities reflect who's actually on the mound, not just which jersey they're wearing.

Bayesian Tracking: Stable Early, Responsive Late

A recurring theme across the entire model is Bayesian shrinkage toward the league average. Every tracked quantity uses the same pattern: accumulate real observations, blend them with a prior of league-average performance, and let the data gradually take over.

Component	Prior Weight	What It Anchors
Pitcher game score	10 starts	Individual SP quality
Team pitching (FIP)	30 games	SP and RP effectiveness
Team offense (OPS)	40 games	Batting quality

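This design means that on Opening Day, every team looks similar. The model knows it doesn't have enough data yet. By the All-Star break (roughly 95 games in), a 30-game prior counted as pseudo-games carries about 24% of the weight and a 40-game prior about 30%. By September, the priors are a small correction rather than a driver.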

Between seasons, the model carries forward 50% of accumulated stats and regresses BXS ratings two-thirds toward 1500. A team that finished with a 1560 BXS enters the next season at 1520. Last year matters, but it doesn't define you.
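The offseason reset is a one-line calculation. In the sketch below the rating keeps one-third of its distance from 1500 and counting stats are halved. The function names and the dictionary layout for stats are hypothetical.

```python
BASELINE_BXS = 1500.0
STATS_CARRYOVER = 0.5  # share of accumulated stats carried into the new season


def regress_rating(final_bxs: float) -> float:
    """Regress a rating two-thirds of the way back toward 1500."""
    return BASELINE_BXS + (final_bxs - BASELINE_BXS) / 3.0


def carry_forward(stats: dict) -> dict:
    """Carry 50% of each accumulated counting stat into the next season."""
    return {name: value * STATS_CARRYOVER for name, value in stats.items()}
```

regress_rating(1560) gives 1520, the example above, and a 1440 team would start the next season at 1480.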

Game Distributions: Beyond Win Probability

For every scheduled game, Model v2 runs 10,000 Negative Binomial simulations and produces a full distribution of outcomes. The output includes:

Expected runs for each team
Win probability for each side
Run distribution (probability of scoring 0, 1, 2, ... 15+ runs)
Over/under probabilities for common totals (7.5 and 8.5)
Mean run differential
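The outputs listed above are all simple summaries of the simulated games. Here is a sketch that assumes each simulation is stored as a (home_runs, away_runs, home_won) tuple. That layout, the function name, and the dictionary keys are hypothetical, not the model's actual format.

```python
from collections import Counter


def summarize_games(games: list, totals: tuple = (7.5, 8.5), cap: int = 15) -> dict:
    """Summarize simulated games given as (home_runs, away_runs, home_won) tuples."""
    n = len(games)
    # Scores at or above `cap` are pooled into a single "15+" bucket.
    home_counts = Counter(min(h, cap) for h, _, _ in games)
    away_counts = Counter(min(a, cap) for _, a, _ in games)
    return {
        "home_expected": sum(h for h, _, _ in games) / n,
        "away_expected": sum(a for _, a, _ in games) / n,
        "home_win_prob": sum(1 for _, _, won in games if won) / n,
        "over_prob": {t: sum(1 for h, a, _ in games if h + a > t) / n for t in totals},
        "mean_run_diff": sum(h - a for h, a, _ in games) / n,
        "home_run_dist": {k: home_counts[k] / n for k in range(cap + 1)},
        "away_run_dist": {k: away_counts[k] / n for k in range(cap + 1)},
    }
```

Because the over/under lines sit on half-runs, every simulated total falls cleanly on one side, so the under probability is simply one minus the over.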

This is the kind of output that used to require a paid subscription to a sports analytics platform. The model generates it for every game, every day, updated after each result.

The 2026 Preseason Picture

With spring training underway, the model's preseason BXS ratings reflect last year's results regressed toward the mean. Here's where things stand:

Team	BXS	World Series %
Phillies	1514	11.4%
Dodgers	1515	10.5%
Yankees	1514	8.8%
Mariners	1508	5.8%
Red Sox	1508	5.4%
Padres	1508	4.5%
Brewers	1514	3.8%
Guardians	1508	3.4%
Cubs	1509	3.3%
Blue Jays	1512	2.8%

Source: Boxscorus 10,000-season Monte Carlo simulation using 2025 BXS ratings regressed two-thirds toward 1500.

Chart: 2026 Preseason World Series Odds. The Phillies and Dodgers are nearly even at the top. The field drops off sharply after the top three.

The top BXS ratings are tightly clustered (a 50-point spread across all 30 teams), which is exactly what you'd expect after two-thirds regression. Notice that the Phillies lead in World Series probability despite being slightly behind the Dodgers in raw BXS. That's the simulation engine at work: division strength, schedule, and opponent matchups all flow through the 10,000-season Monte Carlo.

At the bottom, the Rockies (1465 BXS, 0.0% World Series) and White Sox (1479, 0.1%) are the teams the model is most bearish on. Even with heavy regression, last season's results leave a mark.

What the Model Sees That BXS v1 Alone Misses

The most interesting outputs from v2 aren't the win probabilities. They're the places where the run-scoring model disagrees with a simple BXS rating.

Park factors reshape the same matchup. Take the Dodgers. In a mid-season game at Coors Field (park factor ~1.20) against the Rockies, the model projects roughly 6.4 runs for Los Angeles and 4.5 for Colorado, a combined 10.9-run game. Move that same Dodgers team to Oracle Park in San Francisco (park factor ~0.92), and the projected total drops to about 7.7 runs. That 3.2-run swing comes entirely from venue. The BXS coin-flip model gives you a win probability and nothing else. The v2 model tells you the total at Coors has a 65% chance of going over 8.5, while at Oracle Park it's under 7.5 more than half the time.

Low-scoring games are where v2 earns its keep. In a pitchers' duel at a suppressive park, the Negative Binomial distribution reveals something counterintuitive: tight games are simultaneously more likely and less predictable. In a game between two league-average teams at Oracle Park (expected scoring around 3.6 runs per side after park suppression), the model projects a shutout by one side in about 17% of simulations and a one-run game in 21%. The overdispersion parameter captures this. When expected scoring is low, the variance-to-mean ratio pushes the probability mass toward zero, making shutouts far more common than a Poisson model would suggest. A v1 coin flip says "Team A wins 55%." The v2 model says "there's a one-in-six chance someone throws a shutout, and one-in-five this game is decided by a single run."

Division context matters more than raw BXS. The preseason projections reveal a pattern that pure ratings miss. The Astros, who sit just outside the top 10 in BXS at 1506, illustrate this well. They enter 2026 two points lower than the Guardians at 1508. But the Astros' World Series probability (6.1%) nearly doubles Cleveland's (3.4%). Why? Division strength. The Guardians compete with the Twins and Royals in the AL Central, where multiple teams are clumped near 1490-1500 and can steal divisional games. The Astros' primary competition in the AL West is the Mariners, and a weaker remaining field gives Houston a 41% shot at the division title versus Cleveland's 29%.

The full picture, in one table:

Team	BXS	WS %	Div %	Division
Astros	1506	6.1%	41.0%	AL West
Guardians	1508	3.4%	28.6%	AL Central
Cubs	1509	3.3%	28.3%	NL Central
Mets	1501	4.8%	21.4%	NL East
Braves	1499	4.8%	22.5%	NL East

Source: Boxscorus 10,000-season Monte Carlo simulation.

The simulation captures these interactions because it plays out every remaining game with actual run totals, not isolated coin flips. Division rivals face each other 13 times. Those repeated matchups compound in ways that a single win-probability number can never reflect.

What to Watch For

As the 2026 season gets underway, keep an eye on how quickly the model's team strength estimates diverge from the preseason cluster. By late April (roughly 25 games in), the Bayesian priors start releasing their grip and real performance drives the projections.

The biggest early-season edge often comes from the pitcher tracker. While team offense and pitching factors need 30-40 games to stabilize, individual pitcher adjustments can meaningfully shift game probabilities after just 5-6 starts. If a team's ace is dealing or their number five starter is getting shelled, the model picks up on it before the season-level metrics catch up.

Check back at boxscorus.com for daily updated projections, game distributions, and the full standings simulation.