
Brier Score for Prediction Markets

What it measures, how to read a calibration plot, and how to use both to optimize a Polymarket bot.

The math

Brier score is the mean squared error between your bot's predicted probability for each YES outcome and the realized outcome:

Brier = (1/N) * Σ (predicted_prob_i - realized_i)²

Where predicted_prob_i is the price you paid for YES (Polymarket prices ∈ [0, 1] are direct probability quotes), realized_i is 1 if YES paid out and 0 otherwise, and N is the number of resolved buy trades.
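Here is a minimal Python sketch of the formula above. It assumes you have already exported each resolved buy trade as a (price paid, 0/1 outcome) pair. The function name and the validation rules are illustrative only and are not Bagtester's API.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], realized: Sequence[int]) -> float:
    """Mean squared error between YES prices paid and resolved outcomes.

    predicted[i]: price paid for YES on resolved buy trade i, in [0, 1].
    realized[i]:  1 if YES paid out, else 0.
    """
    if len(predicted) != len(realized):
        raise ValueError("predicted and realized must have the same length")
    if len(predicted) == 0:
        raise ValueError("need at least one resolved trade")
    total = 0.0
    for p, r in zip(predicted, realized):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"price {p} is outside [0, 1]")
        if r not in (0, 1):
            raise ValueError(f"outcome {r} must be 0 or 1")
        total += (p - r) ** 2  # squared error for this one trade
    return total / len(predicted)


# Three hypothetical resolved trades: YES bought at 0.62 (won), 0.35 (lost), 0.80 (won).
print(brier_score([0.62, 0.35, 0.80], [1, 0, 1]))  # ≈ 0.1023
```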

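Lower is better. A bot that always bets YES at 0.50 on coin-flip markets scores 0.25, the score of an uninformative forecast. The worst possible score is 1.0, which a bot reaches only by being fully confident and wrong on every trade. A bot that assigns probability 1 to every outcome that happens and 0 to every outcome that doesn't scores 0. Most well-tuned Polymarket strategies live in the 0.15-0.20 range on diverse universes.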
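These reference points follow directly from the formula. Here is a short worked check, writing p_i for predicted_prob_i and r_i for realized_i:

```latex
% p_i = predicted_prob_i \in [0, 1], \quad r_i = realized_i \in \{0, 1\}
\text{Constant } p_i = 0.5:\quad (0.5 - 1)^2 = (0.5 - 0)^2 = 0.25
  \;\Rightarrow\; \text{Brier} = \frac{1}{N}\sum_{i=1}^{N} 0.25 = 0.25

\text{Each term: } 0 \le (p_i - r_i)^2 \le 1 \;\Rightarrow\; 0 \le \text{Brier} \le 1

\text{Brier} = 0 \iff p_i = r_i \ \forall i, \qquad
\text{Brier} = 1 \iff p_i = 1 - r_i \ \forall i
```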

Why not just look at win rate?

Win rate is the percentage of trades that ended in the money. It tells you whether your bot was directionally right, but not whether the price you paid was reasonable. Two bots can both have a 65% win rate but very different Brier scores. The one with stronger probabilistic calibration (the lower Brier score) is the one you want.

Brier explicitly rewards bots that get the magnitude of their confidence right, not just the direction. Among winning YES trades, paying $0.45 scores better than paying $0.05: the squared errors are (0.45 - 1)² = 0.3025 versus (0.05 - 1)² = 0.9025, even though both trades end in the money.
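The sketch below makes the win-rate point concrete with two hypothetical bots. Their trades are invented purely for illustration. Both bots win 13 of 20 YES buys, but they paid very different prices on their winners and their losers.

```python
from typing import Sequence


def win_rate(realized: Sequence[int]) -> float:
    """Fraction of YES buys that resolved YES (ended in the money)."""
    return sum(realized) / len(realized)


def brier_score(predicted: Sequence[float], realized: Sequence[int]) -> float:
    """Mean squared error between the YES price paid and the 0/1 outcome."""
    return sum((p - r) ** 2 for p, r in zip(predicted, realized)) / len(predicted)


# Both hypothetical bots make 20 YES buys: 13 win and 7 lose (65% win rate).
outcomes = [1] * 13 + [0] * 7

# Bot A paid 0.80 on its winners and 0.30 on its losers.
prices_a = [0.80] * 13 + [0.30] * 7
# Bot B paid 0.30 on its winners and 0.80 on its losers.
prices_b = [0.30] * 13 + [0.80] * 7

for name, prices in (("A", prices_a), ("B", prices_b)):
    print(name, f"win rate={win_rate(outcomes):.2f}",
          f"Brier={brier_score(prices, outcomes):.4f}")
```

Both bots report a 0.65 win rate. Bot A's Brier score comes out around 0.058, while bot B's is around 0.54.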

The calibration plot

Brier collapses everything into one number: useful for ranking, useless for diagnosing. The companion visual is the calibration plot, with predicted-probability buckets on the x-axis and the actual frequency of YES on the y-axis. A perfectly calibrated bot's bubbles lie on the 45° diagonal.

Bagtester computes 10 buckets of width 0.1. Each bucket shows how often the YES outcome actually happened among trades the market priced within that probability range. Bubble size = number of trades in the bucket.
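The bucketing step can be sketched as follows. This is an assumption about how such a plot is built, not Bagtester's source code. It uses half-open buckets [lo, hi), puts a price of exactly 1.0 in the top bucket, and uses the mean price in each bucket as the x position. The names Bucket and calibration_buckets are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

N_BUCKETS = 10  # width 0.1, matching the 10 buckets described above


@dataclass
class Bucket:
    lo: float              # lower edge of the price range (inclusive)
    hi: float              # upper edge of the price range (exclusive)
    count: int             # resolved trades in the bucket (bubble size)
    mean_predicted: float  # average YES price paid (x position)
    observed_freq: float   # fraction that resolved YES (y position)


def calibration_buckets(predicted: Sequence[float], realized: Sequence[int],
                        n_buckets: int = N_BUCKETS) -> List[Bucket]:
    sums_p = [0.0] * n_buckets
    sums_r = [0] * n_buckets
    counts = [0] * n_buckets
    for p, r in zip(predicted, realized):
        # A price of exactly 1.0 would index past the end, so clamp it into the top bucket.
        i = min(int(p * n_buckets), n_buckets - 1)
        sums_p[i] += p
        sums_r[i] += r
        counts[i] += 1
    buckets = []
    for i in range(n_buckets):
        if counts[i] == 0:
            continue  # empty buckets get no bubble
        buckets.append(Bucket(
            lo=i / n_buckets,
            hi=(i + 1) / n_buckets,
            count=counts[i],
            mean_predicted=sums_p[i] / counts[i],
            observed_freq=sums_r[i] / counts[i],
        ))
    return buckets


# Seven hypothetical resolved trades.
prices = [0.12, 0.18, 0.15, 0.55, 0.58, 0.52, 0.51]
outcomes = [0, 1, 0, 1, 1, 0, 1]
for b in calibration_buckets(prices, outcomes):
    print(f"{b.lo:.1f}-{b.hi:.1f}  n={b.count}  "
          f"predicted={b.mean_predicted:.2f}  observed={b.observed_freq:.2f}")
```

For these trades the sketch returns two non-empty buckets. The 0.1-0.2 bucket holds three trades with a mean price of 0.15, one-third of which resolved YES. The 0.5-0.6 bucket holds four trades with a mean price of 0.54, 75% of which resolved YES.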

Reading the plot:

Bubbles above the diagonal: YES was under-priced in that bucket. These trades were winners more often than the price you paid implied.
Bubbles below the diagonal: YES was over-priced in that bucket. These trades were losers more often than the price you paid implied.
Tight clusters of bubbles near the diagonal: your bot is well-calibrated. The market is efficient at these prices, and you're not finding edge.
Small bubbles far from the diagonal: those buckets hold few trades, so don't over-interpret them.
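One way to turn these reading rules into a check is to compare each bubble's distance from the diagonal with its binomial standard error. That way, "far from the diagonal" is judged relative to sample size. The cutoffs MIN_TRADES and Z_CUTOFF, and the read_bucket helper, are hypothetical choices for illustration. The guide does not prescribe them.

```python
import math

# Hypothetical cutoffs for illustration; tune them to your own universe.
MIN_TRADES = 30  # below this, treat the bubble as too sparse to read
Z_CUTOFF = 2.0   # roughly a 95% two-sided band under a normal approximation


def read_bucket(mean_predicted: float, observed_freq: float, count: int) -> str:
    """Label one calibration bubble as above, below, on the diagonal, or too sparse."""
    if count < MIN_TRADES:
        return "sparse: don't over-interpret"
    # If prices in this bucket were calibrated, the observed YES frequency would
    # scatter around mean_predicted with standard error sqrt(p(1-p)/n).
    p = min(max(mean_predicted, 1e-6), 1 - 1e-6)  # avoid a zero standard error at p = 0 or 1
    se = math.sqrt(p * (1 - p) / count)
    z = (observed_freq - mean_predicted) / se
    if z > Z_CUTOFF:
        return "above diagonal: YES under-priced"
    if z < -Z_CUTOFF:
        return "below diagonal: YES over-priced"
    return "on diagonal: no edge at these prices"


# Hypothetical bubbles as (mean price, observed YES frequency, trade count).
for bubble in [(0.25, 0.27, 400), (0.65, 0.52, 150), (0.95, 0.60, 5)]:
    print(bubble, "->", read_bucket(*bubble))
```

A 2-point gap on 400 trades reads as noise. A 13-point gap on 150 trades is flagged as below the diagonal. Any bubble with fewer than MIN_TRADES trades is never interpreted at all.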

Using Brier to optimize a bot

Three productive ways to use Brier when iterating on a strategy:

Compare to the market baseline. Always run a "buy at price = 1 - implied prob" baseline (use the Sports Favorite template at threshold 0.5 with a flip). If your bot beats that on Brier, you have edge.
Slice by tag. Check the by_tag Brier breakdown in the result. If your bot has a Brier score of 0.18 on sports but 0.30 on geopolitics, focus on sports. Your strategy is finding edge there and not in geopolitics.
Use the calibration plot to filter trades. Suppose the 0.20-0.30 and 0.40-0.50 buckets are well-calibrated, but the 0.60-0.70 bucket is consistently below the diagonal. Then drop trades in that price range, because your bot is over-confident on near-favorites.
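You can reproduce both the tag slice and the price-range filter on a list of resolved trades. The Trade record, the helper names, and the sample trades below are hypothetical. The by_tag breakdown that Bagtester returns may be structured differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trade:
    tag: str       # market category, e.g. "sports"
    price: float   # YES price paid, in [0, 1]
    realized: int  # 1 if YES paid out, else 0


def brier(trades: List[Trade]) -> float:
    if not trades:
        raise ValueError("need at least one resolved trade")
    return sum((t.price - t.realized) ** 2 for t in trades) / len(trades)


def brier_by_tag(trades: List[Trade]) -> Dict[str, float]:
    """Group resolved trades by tag and score each group separately."""
    groups: Dict[str, List[Trade]] = defaultdict(list)
    for t in trades:
        groups[t.tag].append(t)
    return {tag: brier(group) for tag, group in groups.items()}


def drop_price_range(trades: List[Trade], lo: float, hi: float) -> List[Trade]:
    """Remove trades whose entry price falls in [lo, hi), e.g. a bucket below the diagonal."""
    return [t for t in trades if not lo <= t.price < hi]


trades = [
    Trade("sports", 0.70, 1), Trade("sports", 0.40, 0), Trade("sports", 0.65, 0),
    Trade("geopolitics", 0.30, 1), Trade("geopolitics", 0.60, 0),
]
print(brier_by_tag(trades))
kept = drop_price_range(trades, 0.60, 0.70)
print(len(trades), "->", len(kept), "trades; Brier",
      round(brier(trades), 4), "->", round(brier(kept), 4))
```

In this example, sports scores about 0.224 and geopolitics scores 0.425. Dropping the 0.60-0.70 trades moves the overall score from about 0.305 to about 0.247. A price range chosen by looking at the same trades will almost always look better in-sample, so re-test any such filter on a different period before trusting it.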

Log-loss as a sibling metric

Bagtester also returns summary.log_loss, defined as:

log_loss = -(1/N) * Σ [r·log(p) + (1-r)·log(1-p)]

It penalizes confident wrong predictions far more harshly than Brier. Each Brier term is capped at 1, but -log(p) grows without bound as p approaches 0. If your bot tends to bet at very small probabilities (say YES at $0.02), use log-loss alongside Brier. The unbounded penalty surfaces strategies that fail catastrophically on rare high-confidence misses.
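The following minimal sketch compares the two metrics on a single extreme-price trade. It assumes natural logarithms; the guide doesn't say which base Bagtester uses, so absolute values may differ by a constant factor. It also clips prices away from exactly 0 and 1, since log(0) is undefined. The EPS constant is an assumption, not something the guide specifies.

```python
import math
from typing import Sequence

EPS = 1e-15  # assumed clipping constant so log() never sees exactly 0 or 1


def log_loss(predicted: Sequence[float], realized: Sequence[int]) -> float:
    """Mean negative log-likelihood of 0/1 outcomes under the YES prices paid."""
    total = 0.0
    for p, r in zip(predicted, realized):
        p = min(max(p, EPS), 1 - EPS)
        total += r * math.log(p) + (1 - r) * math.log(1 - p)
    return -total / len(predicted)


def brier_score(predicted: Sequence[float], realized: Sequence[int]) -> float:
    """Mean squared error between the YES price paid and the 0/1 outcome."""
    return sum((p - r) ** 2 for p, r in zip(predicted, realized)) / len(predicted)


# One YES bought at $0.02 that resolves YES: a confident forecast that missed.
print(brier_score([0.02], [1]))  # bounded: (0.02 - 1)^2 ≈ 0.96
print(log_loss([0.02], [1]))     # -ln(0.02) = ln(50) ≈ 3.91
```

On this one trade, Brier caps the damage at about 0.96. Log-loss charges about 3.91, and that penalty keeps growing as the price approaches 0.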

Next steps

Start with one of the ten templates in polymarket_list_strategy_templates. Run it on a tightly scoped universe (one tag, ≥$100k volume, 2024 only). Read the resulting Brier score and calibration plot. Iterate on the threshold until the calibration plot tightens up. Then broaden the universe and re-test.
