Guide · Polymarket
Brier Score for Prediction Markets
What it measures, how to read a calibration plot, and how to use both to optimize a Polymarket bot.
The math
Brier score is the mean squared error between your bot's predicted probability for each YES outcome and the realized outcome:
Brier = (1/N) * Σ (predicted_prob_i - realized_i)²
Where predicted_prob_i is the price you paid for YES (Polymarket prices ∈ [0, 1] are direct probability quotes), realized_i is 1 if YES paid out and 0 otherwise, and N is the number of resolved buy trades.
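Lower is better. A bot that always buys YES at 0.50 scores exactly 0.25 however the markets resolve, which makes 0.25 the no-information baseline. A score of 0 requires every price paid to sit exactly on the realized outcome (1 for winners, 0 for losers), so it is a bound rather than a realistic target; the worst possible score is 1. Most well-tuned Polymarket strategies live in the 0.15-0.20 range on diverse universes.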
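As a minimal sketch of the formula above (not Bagtester's own implementation; the function name `brier_score` and the sample inputs are made up for illustration), the score can be computed from a list of YES prices paid and a list of 0/1 outcomes:

```python
from typing import Sequence


def brier_score(prices: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between YES prices paid and realized outcomes.

    prices[i]   : price paid for YES on trade i, in [0, 1]
    outcomes[i] : 1 if YES paid out, 0 otherwise
    """
    if len(prices) != len(outcomes) or len(prices) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - r) ** 2 for p, r in zip(prices, outcomes)) / len(prices)


# Always buying YES at 0.50 gives 0.25 regardless of how the markets resolve.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])

# Prices that sit exactly on the outcome give 0.
perfect = brier_score([1.0, 0.0, 1.0], [1, 0, 1])
```

Each trade contributes equally here, so the score says nothing about position size; weighting by stake would be a separate design choice.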
Why not just look at win rate?
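Win rate is the % of trades that ended in the money. It tells you whether your bot was directionally right but not whether the price you paid was reasonable. Two bots can both have a 65% win rate but very different Brier scores; the one with stronger probabilistic calibration is the one you want.
Brier explicitly rewards bots that get the magnitude of their confidence right. Under the definition above, a winning YES bought at $0.45 contributes (0.45 - 1)² = 0.3025 to the score, while a winning YES bought at $0.05 contributes (0.05 - 1)² = 0.9025, even though both end in the money. The $0.05 trade is far more profitable, but Brier treats its price as a forecast that turned out badly wrong: the score measures forecast accuracy, not profit.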
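To make the win-rate comparison concrete, here is a small sketch with two hypothetical bots. Both win 13 of 20 trades (65%), but one pays 0.65 per YES share and the other pays 0.90. The trade lists are invented for illustration only.

```python
def brier(prices, outcomes):
    # Mean squared gap between price paid and the 0/1 outcome.
    return sum((p - r) ** 2 for p, r in zip(prices, outcomes)) / len(prices)


def win_rate(outcomes):
    return sum(outcomes) / len(outcomes)


outcomes = [1] * 13 + [0] * 7           # 13 wins out of 20 for both bots

bot_a_prices = [0.65] * 20              # pays a price that matches its hit rate
bot_b_prices = [0.90] * 20              # pays as if it won 90% of the time

brier_a = brier(bot_a_prices, outcomes)  # about 0.2275
brier_b = brier(bot_b_prices, outcomes)  # about 0.29

print(f"win rate: {win_rate(outcomes):.2f} for both bots")
print(f"Brier A: {brier_a:.4f}, Brier B: {brier_b:.4f}")
```

Win rate cannot separate the two bots, while Brier flags the second one as paying prices its hit rate does not support.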
The calibration plot
Brier collapses everything into one number — useful for ranking, useless for diagnosing. The companion visual is the calibration plot: predicted-probability buckets on the x-axis, actual frequency on the y-axis. A perfectly calibrated bot lies on the 45° diagonal.
Bagtester computes 10 buckets of width 0.1. Each shows how often the YES outcome actually happened among trades priced in that probability range. Bubble size = sample count per bucket.
Reading the plot:
- Bubbles above the diagonal: your bot under-priced YES — these were actually winners more often than the market thought.
- Bubbles below the diagonal: your bot over-priced YES — these were losers more often than the market thought.
- Tight clusters of bubbles near the diagonal: your bot is well-calibrated; the market is efficient on these prices and you're not finding edge.
- Sparse bubbles far from the diagonal: small sample size in those buckets — don't over-interpret.
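The bucketing behind such a plot can be sketched as follows. This is an illustrative reconstruction, assuming equal-width buckets and a plain list of prices and outcomes; the function name `calibration_buckets` and the row layout are hypothetical and not Bagtester's output schema.

```python
from collections import defaultdict


def calibration_buckets(prices, outcomes, n_buckets=10):
    """Group trades into equal-width price buckets.

    Returns one row per non-empty bucket:
    (bucket_low, bucket_high, mean_price, realized_frequency, count)
    """
    grouped = defaultdict(list)
    for p, r in zip(prices, outcomes):
        # A price of exactly 1.0 is folded into the top bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        grouped[idx].append((p, r))

    rows = []
    for idx in sorted(grouped):
        points = grouped[idx]
        count = len(points)
        mean_price = sum(p for p, _ in points) / count
        realized = sum(r for _, r in points) / count
        rows.append((idx / n_buckets, (idx + 1) / n_buckets, mean_price, realized, count))
    return rows


# Invented sample: three trades near 0.25 and four trades near 0.65.
prices = [0.22, 0.25, 0.28, 0.62, 0.65, 0.68, 0.66]
outcomes = [1, 0, 0, 1, 0, 0, 1]
rows = calibration_buckets(prices, outcomes)

for low, high, mean_price, realized, count in rows:
    side = "above" if realized > mean_price else "below or on"
    print(f"[{low:.1f}, {high:.1f}) n={count} price={mean_price:.3f} "
          f"realized={realized:.3f} -> {side} the diagonal")
```

Plotting `mean_price` against `realized` with marker size proportional to `count` gives the bubble chart described above; with only three or four trades per bucket, as in this toy sample, the positions carry very little information.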
Using Brier to optimize a bot
Three productive ways to use Brier when iterating on a strategy:
- Compare to the market baseline. Always run a "buy at price = 1 - implied prob" baseline (use the Sports Favorite template at threshold 0.5 with a flip). If your bot beats that on Brier, you have edge.
- Slice by tag. Check the by_tag Brier breakdown in the result. If your bot has Brier 0.18 on sports but 0.30 on geopolitics, focus on sports: your strategy is finding edge there and not in the other category.
- Filter the calibration plot. If the 0.20-0.30 and 0.40-0.50 buckets are well-calibrated but 0.60-0.70 is consistently below the diagonal, drop trades in that price range (your bot is over-confident on near-favorites).
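The tag slice and the price-range filter can be prototyped offline before changing the strategy itself. The sketch below assumes each resolved trade is a dict with hypothetical keys `tag`, `price`, and `won`; these field names are placeholders for illustration and are not taken from Bagtester's result format.

```python
from collections import defaultdict


def brier(trades):
    return sum((t["price"] - t["won"]) ** 2 for t in trades) / len(trades)


def brier_by_tag(trades):
    """Brier score computed separately for each tag."""
    groups = defaultdict(list)
    for t in trades:
        groups[t["tag"]].append(t)
    return {tag: brier(group) for tag, group in groups.items()}


def drop_price_range(trades, low, high):
    """Remove trades whose YES price falls in [low, high)."""
    return [t for t in trades if not (low <= t["price"] < high)]


# Invented sample trades.
trades = [
    {"tag": "sports", "price": 0.25, "won": 0},
    {"tag": "sports", "price": 0.45, "won": 1},
    {"tag": "sports", "price": 0.65, "won": 0},
    {"tag": "geopolitics", "price": 0.65, "won": 0},
    {"tag": "geopolitics", "price": 0.68, "won": 0},
    {"tag": "geopolitics", "price": 0.30, "won": 1},
]

per_tag = brier_by_tag(trades)
filtered = drop_price_range(trades, 0.60, 0.70)

print(per_tag)
print(f"all trades: {brier(trades):.4f}, without 0.60-0.70: {brier(filtered):.4f}")
```

A filter chosen on the same trades it is evaluated on will always look good, so it is worth re-checking the filtered strategy on a period or tag that was not used to pick the range.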
Log-loss as a sibling metric
Bagtester also returns summary.log_loss, defined as:
log_loss = -(1/N) * Σ [r·log(p) + (1-r)·log(1-p)]
It penalizes confidently wrong predictions more harshly than Brier. If your bot tends to buy at very low prices (say YES at $0.02), use log-loss alongside Brier: a single miss costs at most 1 under Brier, while the log-loss penalty grows without bound as the price approaches the wrong extreme, which surfaces strategies that fail catastrophically on rare high-confidence misses.
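A short sketch of the formula, with one added assumption: prices are clipped to [eps, 1 - eps] so that log(0) never occurs. The clipping constant is an illustrative choice, and whether Bagtester clips in the same way is not stated here.

```python
import math


def log_loss(prices, outcomes, eps=1e-12):
    """Negative mean log-likelihood of the 0/1 outcomes under the prices."""
    total = 0.0
    for p, r in zip(prices, outcomes):
        p = min(max(p, eps), 1 - eps)  # assumption: clip to avoid log(0)
        total += r * math.log(p) + (1 - r) * math.log(1 - p)
    return -total / len(prices)


def brier(prices, outcomes):
    return sum((p - r) ** 2 for p, r in zip(prices, outcomes)) / len(prices)


# A market priced at 0.02 vs. one priced at 0.20, both resolving YES.
ll_extreme, ll_mild = log_loss([0.02], [1]), log_loss([0.20], [1])
br_extreme, br_mild = brier([0.02], [1]), brier([0.20], [1])

print(f"log-loss: {ll_extreme:.3f} vs {ll_mild:.3f} (ratio {ll_extreme / ll_mild:.2f})")
print(f"Brier:    {br_extreme:.3f} vs {br_mild:.3f} (ratio {br_extreme / br_mild:.2f})")
```

Moving the price from 0.20 to 0.02 raises the Brier contribution by about half but more than doubles the log-loss contribution, which is the asymmetry described above.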
Next steps
Start with one of the ten templates in polymarket_list_strategy_templates. Run it on a tightly-scoped universe (one tag, ≥$100k volume, 2024 only). Read the resulting Brier + calibration plot. Iterate on the threshold until the calibration plot tightens up. Then broaden the universe and re-test.