Boxscorus | How the Boxscorus World Cup Simulator Actually Works

Rate 48 teams, roll real scorelines, and play the whole tournament 100,000 times. Here is the machine, one layer at a time.

Greg Lamp · June 7, 2026

I built the World Cup simulator because I got tired of arguing about the wrong question. People ask "who's the best team?" as if that settles it. It doesn't. The best team loses all the time, and a 48-team tournament gives them a lot of chances to do it.

Think about what you are actually trying to forecast. Twelve groups, then a knockout bracket, then a champion. Every group match has three possible outcomes, and the bracket branches at every round. The 72 group matches alone allow 3^72 (roughly 2 × 10^34) win-draw-loss combinations, before a single knockout game is played. No single "who's best" guess can describe that. You cannot eyeball a tree with that many branches.

So the simulator does not guess. It plays the tournament 100,000 times and counts what happens. This is how it works, one layer at a time.

The Fix: Two Layers, a Rating and a Dice Roll

The engine has exactly two jobs. First, rate every team so it knows who is better and by how much. Second, use those ratings to roll the outcome of a match, then chain thousands of those rolls into a full tournament.

Layer one is slow and stable. Ratings are built from years of real results and barely move from one game to the next. Layer two is fast and random. Given two ratings, it produces a scoreline in microseconds, and it produces a different one every time you ask. Stack a full tournament's worth of matches (72 in the group stage, then a 32-team knockout bracket), run that tournament 100,000 times, and the randomness averages into probabilities.

Start with the rating.

Layer One: Rate Teams by How Much the Game Mattered

The rating is an international football ELO system. Win and your number goes up. Lose and it drops. The size of the move depends on how much the match mattered, controlled by a competition-weighted K-factor.

Match type	K-factor	Why
Friendly	15	Rotated squads, low stakes
Qualifier / Nations League	25	Real lineups, real points
Biennial regional (AFCON, Gold Cup)	15	Frequent, uneven fields
Continental (Euros, Copa, Asian Cup)	30	Best-on-best, every four years
World Cup	35	The biggest results count most

Beating Brazil in a World Cup knockout (K=35) swings your rating more than twice as hard as beating them in a March friendly (K=15), and that is correct. Teams experiment in friendlies and go all out in tournaments, so the model trusts tournament results more.
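The K-factor table can be turned into a small update rule. This is a minimal sketch, not the Boxscorus code: the K values come from the table, but the 400-point Elo curve, the function names, and the 1600 and 1800 ratings are assumptions for illustration, and the draw handling described later is ignored here.

```python
# K-factors copied from the table above.
K_FACTOR = {
    "friendly": 15,
    "qualifier": 25,
    "biennial_regional": 15,
    "continental": 30,
    "world_cup": 35,
}


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation. The 400-point scale is an assumption."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_change(rating: float, opponent: float, score: float, match_type: str) -> float:
    """Points gained or lost. score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return K_FACTOR[match_type] * (score - expected_score(rating, opponent))


# A hypothetical 1600-rated side beats a hypothetical 1800-rated side.
wc_gain = rating_change(1600, 1800, 1.0, "world_cup")
friendly_gain = rating_change(1600, 1800, 1.0, "friendly")
print(round(wc_gain, 1), round(friendly_gain, 1), round(wc_gain / friendly_gain, 2))
```

Whatever the two ratings are, the same upset moves the rating 35/15 (about 2.33) times as far in a World Cup match as in a friendly, which is the "more than twice as hard" in the text.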

Two other knobs finish the rating. Home advantage is worth 50 ELO points, lower than club football because so many international games are at neutral sites. And between tournament cycles, ratings regress 10% toward the mean of 1500, a roughly seven-year half-life. National rosters turn over slowly, so the model keeps ratings sticky instead of resetting them.
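The half-life figure follows from the 10% regression if the step is applied once a year. That cadence is an assumption made here to reproduce the number, and the function name is hypothetical. A minimal sketch:

```python
import math

MEAN_RATING = 1500.0
REGRESSION = 0.10  # share of the gap to the mean removed per step


def regress(rating: float, steps: int = 1) -> float:
    """Pull a rating toward the mean, one 10% step at a time."""
    return MEAN_RATING + (rating - MEAN_RATING) * (1.0 - REGRESSION) ** steps


# Number of steps until half of the original gap to 1500 is gone.
half_life_steps = math.log(0.5) / math.log(1.0 - REGRESSION)
print(round(half_life_steps, 2))   # 6.58
print(round(regress(1789.0), 1))   # 1760.1
```

About 6.6 steps to lose half the gap is where a "roughly seven-year" half-life would come from under the once-a-year assumption.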

Soccer Has a Third Outcome: The Draw

Before any match can be rolled, the model has to handle something baseball and basketball never deal with. Those sports cannot tie. Soccer ties constantly: close to a quarter of evenly matched games end level.

A plain ELO only knows win probability. It has no slot for the draw, so it quietly mismodels every tight match. The fix is Davidson's 1970 paired-comparison model, which carves out a third outcome with one tunable knob. The model sets that draw margin at 0.36, producing about a 26% draw rate between equal sides. Get it wrong and your group tables drift from reality. Get it right and a 1-1 nobody expected reshapes a group, exactly like the real thing.

Layer Two: Simulate the Score, Not Just the Winner

A coin flip tells you who wins. It says nothing about the score. That gap is fatal in the group stage, where standings break ties on goal difference first, then goals scored. You cannot rank a group correctly without actual scorelines.

So the model rolls goals with a Poisson process. Each team's expected goals scale off the rating gap:

Parameter	Value	What it controls
Mean goals per side	1.3	Baseline scoring rate
ELO scaling factor	600	How a rating gap bends expected goals
Goal multiplier	exp(elo_diff / 600)	The favorite's scoring boost
Home goal bump	+0.1	Small edge in non-neutral group games
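The parameter table translates into two small functions. This is a sketch under stated assumptions: the constants are from the table, but the function names are hypothetical, adding the home bump to the home side's mean is a guess at how it is applied, and the sampler is Knuth's textbook method rather than whatever the engine actually uses.

```python
import math
import random

MEAN_GOALS = 1.3
ELO_SCALE = 600.0
HOME_BUMP = 0.1


def expected_goals(elo_a: float, elo_b: float, a_is_home: bool = False):
    """Return (xg_a, xg_b). The additive home bump is an assumption."""
    multiplier = math.exp((elo_a - elo_b) / ELO_SCALE)
    xg_a = MEAN_GOALS * multiplier + (HOME_BUMP if a_is_home else 0.0)
    xg_b = MEAN_GOALS / multiplier
    return xg_a, xg_b


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method, adequate for small means like these."""
    limit = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


rng = random.Random(2026)
xg_a, xg_b = expected_goals(1700, 1600)  # hypothetical ratings
print(round(xg_a, 2), round(xg_b, 2))
print(poisson_sample(xg_a, rng), poisson_sample(xg_b, rng))
```

Two equal sides both get the 1.3 baseline, and each call to the sampler returns a fresh whole-number goal count.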

Numbers in the abstract are easy to nod along to and hard to actually picture. So let me run one match through the whole machine.

Follow One Match: Spain vs Switzerland

Take Spain (ELO 1789) against Switzerland (1612) at a neutral venue, so no home bump. Here is every step the engine takes.

1. Measure the gap. Spain is 177 rating points ahead.

2. Turn the gap into expected goals. The multiplier is exp(177 / 600) = 1.34. So Spain's expected goals are 1.3 × 1.34 = 1.75, and Switzerland's are 1.3 ÷ 1.34 = 0.97. The better team is favored to score more, but not by a blowout margin.

3. Roll the dice. The engine draws each team's goal count from a Poisson distribution around those expected values. With means this low the spread is wide, so the same matchup can come up 0-0, 2-1, or 1-3 on different rolls. In this particular run, say it comes up Spain 2, Switzerland 1.

4. Record what matters. That 2-1 gives Spain 3 points, a +1 goal difference, and 2 goals scored. Those three numbers feed straight into the group tiebreakers.

5. Remember it is one roll, not the truth. Run this same match thousands of times and Spain wins 55.7%, draws 23.5%, and loses 20.8%. Even a clear favorite at a neutral venue goes home unhappy one time in five. That one-in-five is the entire reason the trophy is hard to predict, and it only shows up because the model rolls the match instead of declaring a winner.
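The five steps above can be checked without rolling anything, by summing the two Poisson distributions over a grid of scorelines. This sketch assumes the two goal counts are independent and truncates at 15 goals per side; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals for a side with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(xg_a: float, xg_b: float, max_goals: int = 15):
    """Return (win, draw, loss) for side A over every scoreline up to max_goals."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, xg_a) * poisson_pmf(b, xg_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


multiplier = math.exp((1789 - 1612) / 600)  # Spain minus Switzerland
spain_xg = 1.3 * multiplier
swiss_xg = 1.3 / multiplier
win, draw, loss = outcome_probabilities(spain_xg, swiss_xg)
print(f"Spain {win:.3f}  draw {draw:.3f}  Switzerland {loss:.3f}")
```

The exact sums land within a fraction of a point of the 55.7%, 23.5%, and 20.8% quoted in step 5, which is what a large number of random rolls should converge to.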

Follow One Tournament: Group Stage to Final

One match is a single roll. A tournament chains them. Here is how the engine plays all the way through once.

Group stage. Each of the 12 groups has four teams playing a round robin, six matches per group. The engine rolls every match with the Poisson model above, then sorts each group by points, then goal difference, then goals scored. The top two in every group advance.
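One group can be sketched as a round robin followed by a three-key sort. Assumptions are flagged in the comments: every game is treated as neutral, "Team C" and "Team D" and their ratings are made up, and any tie left after goals scored simply keeps the input order, since the article does not say how the model breaks it.

```python
import itertools
import math
import random


def poisson_sample(mean, rng):
    """Knuth's method for a small-mean Poisson draw."""
    limit, goals, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def play_group(ratings, rng):
    """Play all six matches, then rank by points, goal difference, goals scored."""
    table = {team: {"pts": 0, "gd": 0, "gf": 0} for team in ratings}
    for a, b in itertools.combinations(ratings, 2):
        multiplier = math.exp((ratings[a] - ratings[b]) / 600.0)  # neutral venue assumed
        goals_a = poisson_sample(1.3 * multiplier, rng)
        goals_b = poisson_sample(1.3 / multiplier, rng)
        for team, scored, conceded in ((a, goals_a, goals_b), (b, goals_b, goals_a)):
            row = table[team]
            row["gf"] += scored
            row["gd"] += scored - conceded
            row["pts"] += 3 if scored > conceded else 1 if scored == conceded else 0
    order = sorted(
        table,
        key=lambda t: (table[t]["pts"], table[t]["gd"], table[t]["gf"]),
        reverse=True,
    )
    return order, table


# Spain and Switzerland use the article's ratings; the other two are hypothetical.
group = {"Spain": 1789, "Switzerland": 1612, "Team C": 1550, "Team D": 1500}
order, table = play_group(group, random.Random(1))
for team in order:
    print(team, table[team])
```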

The eight best thirds. This is the new 48-team wrinkle. The 12 third-place teams go into a pool, and the eight best of them also advance. That fills the bracket to 32 teams. There are C(12,8) = 495 possible sets of groups those eight teams can come from, and the model has to handle whichever one comes up in each run, because which third-place teams survive changes who plays whom.
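The third-place pool can be sketched as one more sort plus a lookup key. The records below are invented, the ranking reuses the group tiebreak order (points, goal difference, goals scored) as an assumption, and `bracket_key` is a hypothetical helper showing how a run could identify which of the 495 cases it landed in.

```python
import math
from itertools import combinations

GROUPS = "ABCDEFGHIJKL"


def best_thirds(thirds, slots=8):
    """thirds holds (group, pts, gd, gf) rows. Return the top `slots` rows."""
    ranked = sorted(thirds, key=lambda row: (row[1], row[2], row[3]), reverse=True)
    return ranked[:slots]


def bracket_key(qualifiers):
    """Sorted group letters of the surviving thirds, usable as a lookup key."""
    return "".join(sorted(row[0] for row in qualifiers))


# Hypothetical third-place records for one simulated run.
thirds = [
    ("A", 4, 0, 3), ("B", 3, -1, 2), ("C", 4, 1, 4), ("D", 2, -2, 1),
    ("E", 3, 0, 3), ("F", 4, 0, 2), ("G", 1, -3, 1), ("H", 3, -1, 3),
    ("I", 6, 1, 3), ("J", 3, -2, 2), ("K", 2, -1, 2), ("L", 4, -1, 4),
]
qualifiers = best_thirds(thirds)
print(bracket_key(qualifiers))            # ABCEFHIL

all_keys = {"".join(combo) for combo in combinations(GROUPS, 8)}
print(len(all_keys), math.comb(12, 8))    # 495 495
```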

The full Boxscorus 2026 World Cup forecast board with all 48 teams and 12 groups

All 48 teams and 12 groups on one board. Top two per group plus the eight best third-place teams make the Round of 32.

Knockouts. From the Round of 32 onward, the engine rolls the same Poisson match, with one extra rule: nobody advances on goal difference. If 90 minutes end level, a penalty-shootout model decides it: a weighted coin flip in which the better-rated team gets a slight edge. Somebody has to win. That is why even great teams bleed probability late: every knockout round is another flip they have to survive.
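A knockout tie adds one branch to the match roll. In this sketch the size of the shootout edge is not given in the article, so `SHOOTOUT_EDGE = 0.05` is a placeholder, and the function name is hypothetical.

```python
import math
import random

SHOOTOUT_EDGE = 0.05  # hypothetical size of the better-rated team's edge


def poisson_sample(mean, rng):
    """Knuth's method for a small-mean Poisson draw."""
    limit, goals, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def play_knockout(elo_a, elo_b, rng):
    """Return (winner, went_to_penalties), where winner is "a" or "b"."""
    multiplier = math.exp((elo_a - elo_b) / 600.0)
    goals_a = poisson_sample(1.3 * multiplier, rng)
    goals_b = poisson_sample(1.3 / multiplier, rng)
    if goals_a != goals_b:
        return ("a" if goals_a > goals_b else "b"), False
    # Level after 90 minutes: a weighted coin flip decides it.
    if elo_a > elo_b:
        p_a = 0.5 + SHOOTOUT_EDGE
    elif elo_a < elo_b:
        p_a = 0.5 - SHOOTOUT_EDGE
    else:
        p_a = 0.5
    return ("a" if rng.random() < p_a else "b"), True


rng = random.Random(32)
results = [play_knockout(1789, 1612, rng) for _ in range(20000)]
spain_share = sum(winner == "a" for winner, _ in results) / len(results)
shootout_share = sum(pens for _, pens in results) / len(results)
print(round(spain_share, 3), round(shootout_share, 3))
```

With the Spain and Switzerland ratings, the favorite's advance rate is its 90-minute win rate plus a bit more than half of the drawn games, so it sits well above 55.7% but far from certain.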

Ratings drift as the run goes. Most simulators freeze every rating at kickoff. This one does not, and it is the choice I would defend hardest. After each decisive knockout result, the winner's rating ticks up and the loser's down, using a small in-run K-factor of 12. Two evenly matched teams each have a win expectation of 0.5, so the winner gains 12 × (1 − 0.5) = 6 points and the loser drops 6. A team grinding through the bracket gets microscopically stronger as it goes.
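The in-run update is the same Elo arithmetic with a smaller K. The K of 12 and the 0.5 expectation for equal teams are from the text; the 400-point curve used for unequal teams and the function names are assumptions.

```python
IN_RUN_K = 12


def win_expectation(rating: float, opponent: float) -> float:
    """Elo win expectation. The 400-point scale is an assumption."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_after_decisive(winner: float, loser: float):
    """Move points from loser to winner after a result decided in 90 minutes."""
    gain = IN_RUN_K * (1.0 - win_expectation(winner, loser))
    return winner + gain, loser - gain


print(update_after_decisive(1700, 1700))   # (1706.0, 1694.0)

# An upset moves more than 6 points, because the winner's expectation was below 0.5.
new_underdog, new_favorite = update_after_decisive(1612, 1789)
print(round(new_underdog - 1612, 2))
```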

Why bother? Because real tournaments have streaks, and a frozen-rating model erases them. It treats Croatia reaching back-to-back semifinals in 2018 and 2022 as pure luck. The effect compounds, which is the whole point. An underdog that pulls off two upsets has clawed back more than 12 rating points (at least 6 per win, and more whenever its win expectation was below 0.5), so its next match tilts a little more its way, and the round after that more still. A favorite that scrapes through on penalties gets no bump at all, because a match that is level after 90 minutes is not a decisive result and moves nobody. Stack those small per-round swings across a five-round bracket and the math quietly makes long runs less rare than a frozen model allows. That is what fattens the tails of the distribution: more Cinderella runs, more favorite collapses, which is what actually happens at World Cups. I will be honest that this is a modeling choice, not a law of nature. Turn it off and the favorites look a touch safer and the long shots a touch deader.

Play that whole chain once and you get one champion. The single run tells you almost nothing. So the engine does it 100,000 times.

Run It 100,000 Times: What the Engine Outputs

After 100,000 tournaments, the model just counts. How often did each team escape its group, reach the quarterfinal, win the whole thing? Those counts are the forecast. Here is the top of the board.

Team	BXS ELO	Reach Final	Champion
Spain	1789	20.3%	13.5%
France	1751	16.0%	9.6%
England	1719	11.2%	6.3%
Argentina	1710	10.0%	5.7%
Portugal	1705	9.7%	5.5%
Netherlands	1688	9.0%	4.7%

Boxscorus model output, 100,000 simulations (src/data/worldcup-tournament.json). The live site rounds the champion column to whole percent.
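The counting step is the simplest part of the machine. The toy below is not the real bracket: it is a hypothetical four-team knockout using the top four ratings from the table, with a plain Elo win probability (400-point scale assumed) standing in for the Poisson match, just to show how repeated runs turn into a probability column.

```python
import random
from collections import Counter

RATINGS = {"Spain": 1789, "France": 1751, "England": 1719, "Argentina": 1710}


def win_probability(a: str, b: str) -> float:
    """Elo-style chance that a beats b. A stand-in for the full match model."""
    return 1.0 / (1.0 + 10 ** ((RATINGS[b] - RATINGS[a]) / 400.0))


def play(a: str, b: str, rng: random.Random) -> str:
    return a if rng.random() < win_probability(a, b) else b


def play_tournament(rng: random.Random) -> str:
    """Hypothetical pairings: two semifinals, then a final."""
    finalist_1 = play("Spain", "Argentina", rng)
    finalist_2 = play("France", "England", rng)
    return play(finalist_1, finalist_2, rng)


def forecast(runs: int, seed: int = 0) -> dict:
    """Play the toy tournament `runs` times and count the champions."""
    rng = random.Random(seed)
    champions = Counter(play_tournament(rng) for _ in range(runs))
    return {team: champions[team] / runs for team in RATINGS}


shares = forecast(20_000)
for team, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{team}: {share:.1%}")
```

The shares here are far higher than in the real table because only four teams and two rounds are involved. The mechanism is the same: play, count, divide.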

The real payoff is watching a single team's odds get built round by round. Spain is the model's favorite, and even Spain has to survive the whole gauntlet.

Round Spain reaches	Share of runs
Round of 32 (escapes the group)	93.7%
Round of 16	66.9%
Quarterfinal	46.9%
Semifinal	33.6%
Final	20.3%
Champion	13.5%

Spain's path to the trophy, round by round, from the same 100,000 runs.

Boxscorus 2026 World Cup forecast board showing each team's odds of reaching every round

The live board shows a probability for every round, from group survival to the trophy.

Read that path top to bottom and you can watch the gauntlet bite. Spain clears the group nearly every time. Their odds of even reaching the quarterfinal are already close to a coin flip (46.9%). The drop from reaching the final (20.3%) to winning it (13.5%) is the last opponent taking a third of Spain's would-be trophies. A 13.5% favorite in a 48-team field is not a weak forecast. It is a strong one, and it still means 86.5% of runs end with someone else lifting the cup.
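Dividing each row of Spain's path table by the row above it gives the chance of surviving each individual round, which makes the "gauntlet" concrete. The numbers are the ones in the table; only the variable names are new.

```python
# Spain's path table, in percent of the 100,000 runs.
path = [
    ("Round of 32", 93.7),
    ("Round of 16", 66.9),
    ("Quarterfinal", 46.9),
    ("Semifinal", 33.6),
    ("Final", 20.3),
    ("Champion", 13.5),
]

# Chance of reaching each stage, given that Spain reached the previous one.
conditional = {}
for (prev_name, prev_share), (name, share) in zip(path, path[1:]):
    conditional[name] = share / prev_share
    print(f"{prev_name} -> {name}: {conditional[name]:.1%}")

# Round of 32 -> Round of 16: 71.4%
# Round of 16 -> Quarterfinal: 70.1%
# Quarterfinal -> Semifinal: 71.6%
# Semifinal -> Final: 60.4%
# Final -> Champion: 66.5%
```

No single round is worse than about 60% for Spain, yet multiplying five of them together is what drags the favorite down to 13.5%.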

Now Change a Result

Everything above assumes no games have been played. The moment you give the engine a known result, it re-runs all 100,000 tournaments around it. That is the part worth playing with.
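One simple way to condition on a known result is to let the match function return the real scoreline whenever one exists and roll the dice otherwise. This is a guess at the mechanism, not a description of the Boxscorus code: the `locked` dictionary, its key format, and the 0-1 result are all hypothetical.

```python
import math
import random


def poisson_sample(mean, rng):
    """Knuth's method for a small-mean Poisson draw."""
    limit, goals, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def play_match(team_a, team_b, ratings, rng, locked=None):
    """Return (goals_a, goals_b), using a known scoreline if one is locked in."""
    if locked and (team_a, team_b) in locked:
        return locked[(team_a, team_b)]
    multiplier = math.exp((ratings[team_a] - ratings[team_b]) / 600.0)
    return poisson_sample(1.3 * multiplier, rng), poisson_sample(1.3 / multiplier, rng)


ratings = {"Spain": 1789, "Switzerland": 1612}
locked = {("Spain", "Switzerland"): (0, 1)}  # hypothetical known result

rng = random.Random(5)
print(play_match("Spain", "Switzerland", ratings, rng))          # a fresh roll
print(play_match("Spain", "Switzerland", ratings, rng, locked))  # (0, 1)
```

Every run then shares the locked scoreline and differs only in the matches still to be played, so the counts at the end are conditional on what has already happened.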

Go open the simulator and run your own scenarios. Lock Spain into a group-stage exit and watch their path table collapse. Hand Argentina a soft bracket. Knock out your rival and see every other team's odds shift in response. A static forecast tells you what the model thinks today. A simulator lets you ask what if, which is the only honest way to think about a tournament with this many branches.

Open the World Cup Simulator →