Polymarket bot backtesting · MCP-native

Backtest your Polymarket bot through our MCP.

4,600+ resolved markets with full historical price series. Brier-score calibration. Lookahead-safe by construction. Vectorized engine runs the full universe in ~23 seconds. Your agent writes Python, calls our MCP, gets a structured result back — real-money-bot-ready.

Honest about depth: resolved-market price-series backtests are live and validated across 4,600+ markets back to 2023. The tick-level L2 order-book archive is young: it has been captured live since 21 May 2026. Today's backtests use clean CLOB-derived price bars (fine for time-decay, mean-reversion, momentum, contrarian, sports-favorite and news-fade strategies); order-book fills will land as the archive matures.

4,600+ resolved markets

Every Polymarket CTF market since 2023 with cumulative volume ≥ $1k USDC. Politics (1,250), sports + NBA + soccer + NFL (~2,500), crypto + crypto-prices (~960), global elections + geopolitics (~575), plus weather, economics, pop-culture. Full historical price series from the same prices-history endpoint that powers the Polymarket UI. Resolutions confirmed against Polygon ConditionalTokens.

Brier + calibration

Result schema includes Brier score, log-loss, 10-bucket calibration curve, edge-captured (bps), by-tag PnL breakdown, equity curve, max drawdown, win rate, and 11 PM-specific quality flags. The signature visual: predicted vs. actual frequency, bucketed.
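
Two of the metrics listed above, log-loss and max drawdown, can be sketched in a few lines. This uses the standard textbook definitions; the function names are illustrative and the exact formulas Bagtester applies are not given on this page, so treat it as an assumption about how the numbers are derived.

```python
import math

def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of binary outcomes (0 or 1)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# An equity curve that peaks at 120 and dips to 90 has a 25% drawdown.
print(max_drawdown([100.0, 120.0, 90.0, 130.0, 117.0]))  # 0.25
```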

Lookahead-safe

Three independent enforcement vectors guarantee your bot can't peek at outcomes. Reading market.final_outcome before resolution raises. Calling ctx.buy after the cutoff raises. Asking for a future price raises. No override flag — this is correctness, not a config knob.

10 lines of Python.

Your agent writes a class. We run it across the filtered market universe and give back a structured result. The strategy below buys cheap YES on long-dated markets and holds to resolution.

from bagtester.polymarket import PolymarketStrategy

class TimeDecayCarry(PolymarketStrategy):
    """Buy YES at price < 0.20 on markets > 30 days out."""

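    def on_market_update(self, ctx):
        yes_idx = 0  # outcome index of the YES token
        # Hold at most one YES position per market.
        if ctx.portfolio.has_position(ctx.market.condition_id, yes_idx):
            return
        price = ctx.current_price(yes_idx)
        # Skip markets with no price yet, or where YES is not cheap.
        if price is None or price >= 0.20:
            return
        # Timestamps are in nanoseconds; 86_400e9 ns is one day.
        days_left = (ctx.market.effective_resolution_ts_ns - ctx.now_ns) / 86_400e9
        if days_left > 30:
            ctx.buy(yes_idx, size_usdc=20.0)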
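
The days_left line in the strategy converts a nanosecond difference into days. A standalone check of that arithmetic, using a hypothetical helper and made-up timestamps, might look like this:

```python
NS_PER_DAY = 86_400 * 1_000_000_000  # seconds per day times nanoseconds per second

def days_left(resolution_ts_ns: int, now_ns: int) -> float:
    """Days remaining until resolution, given two nanosecond timestamps."""
    return (resolution_ts_ns - now_ns) / NS_PER_DAY

# Made-up timestamps: a market that resolves exactly 45 days from "now".
now_ns = 1_700_000_000 * 1_000_000_000
resolution_ns = now_ns + 45 * NS_PER_DAY
print(days_left(resolution_ns, now_ns))  # 45.0
```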

Tell your agent.

Once Bagtester is connected to Claude Code / Codex / Cursor, run the prompt below and watch the agent backtest a Polymarket bot end-to-end.

backtest a mean-reversion bot on Polymarket
politics markets resolved in 2024 with at
least $100k volume

The agent will call polymarket_list_strategy_templates first, pick the best fit (vectorized news-spike fade), then submit via polymarket_submit_bot. You get a Brier score, calibration curve, by-tag breakdown, and a share URL.

Set up your agent →

Pricing

One paid plan. Free to confirm the data is real; Pro for the whole archive at full scale.

Free

$0

  • 12-hour price bars
  • ≤ 50 markets / backtest
  • Brier + calibration
  • midpoint fills

Pro

$59/mo

  • full resolved universe
  • tick + L2 order-book data
  • sweeps + walk-forward
  • order-book fills, concurrent

Full pricing details →

FAQ

What is Brier score?

Brier score is the mean squared error between your bot's predicted probability and the realized outcome (0 or 1). Lower is better; the range is [0, 1], and a constant 0.5 forecast scores 0.25. A perfectly calibrated bot scores around 0.2 on typical Polymarket markets; well-tuned ones cluster around 0.15.
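
As a minimal sketch of that definition (the function name is illustrative, not part of the Bagtester API):

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Always predicting 0.5 scores 0.25 regardless of what happens.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25

# Confident and mostly right forecasts score much lower (about 0.0375 here).
print(brier_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0]))
```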

How do you prevent lookahead bias?

Three independent enforcement vectors. Reading market.final_outcome before the effective resolution timestamp raises LookaheadError. Calling ctx.buy() or ctx.sell() after the effective resolution raises LookaheadError. Calling ctx.current_price(ts=future) raises LookaheadError. No override flag, no escape hatch — a strategy that triggers LookaheadError is a strategy bug, not a sandbox bug.
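
The three guards can be pictured with a toy context object. This is a sketch only: GuardedMarket and GuardedContext are hypothetical names, the single-outcome price series is a simplification, and the real engine's internals and exact boundary handling are not described on this page. Only LookaheadError and the three rules come from the text above.

```python
class LookaheadError(Exception):
    """Raised when a strategy asks for information from the simulated future."""


class GuardedMarket:
    def __init__(self, resolution_ts_ns, final_outcome, clock):
        self.effective_resolution_ts_ns = resolution_ts_ns
        self._final_outcome = final_outcome
        self._clock = clock  # zero-argument callable returning the simulated time

    @property
    def final_outcome(self):
        # Guard 1: the outcome is hidden until the market has resolved.
        if self._clock() < self.effective_resolution_ts_ns:
            raise LookaheadError("final_outcome read before resolution")
        return self._final_outcome


class GuardedContext:
    def __init__(self, resolution_ts_ns, final_outcome, bars):
        self.now_ns = 0
        self._bars = sorted(bars)  # (ts_ns, price) pairs for one outcome
        self.market = GuardedMarket(resolution_ts_ns, final_outcome, lambda: self.now_ns)
        self.orders = []

    def buy(self, outcome_index, size_usdc):
        # Guard 2: no orders once the market has resolved.
        if self.now_ns > self.market.effective_resolution_ts_ns:
            raise LookaheadError("order placed after resolution")
        self.orders.append((self.now_ns, outcome_index, size_usdc))

    def current_price(self, ts=None):
        # Guard 3: prices can only be read at or before the simulated clock.
        ts = self.now_ns if ts is None else ts
        if ts > self.now_ns:
            raise LookaheadError("future price requested")
        seen = [price for bar_ts, price in self._bars if bar_ts <= ts]
        return seen[-1] if seen else None
```

Because each guard lives at the point of access rather than in a shared flag, there is nothing for a strategy to switch off.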

Which markets do you have data for?

4,600+ resolved Polymarket markets since 2023-01-01 with cumulative volume >= $1,000 USDC. Coverage spans politics (1,250 markets), sports/NBA/soccer/NFL (~2,500), crypto + crypto prices (~960), global elections + geopolitics (~575), plus weather, economics, and pop-culture. Historical price series come from Polymarket's CLOB prices-history endpoint (the same data the Polymarket UI renders). Resolution outcomes come from the Polygon ConditionalTokens contract. The vectorized engine path runs a full-universe backtest (4,666 markets) in ~23 seconds.

How does the calibration curve work?

Your bot's buy trades are bucketed by predicted probability (the price you paid) into 10 buckets of width 0.1. Within each bucket, we compute how often YES actually resolved. A perfectly calibrated bot lies on the 45° line: predicted 0.30 → actual frequency 30%.
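
The bucketing can be sketched as follows. The input shape (a list of price and outcome pairs) and the function name are assumptions made for illustration; the actual result schema is not shown on this page.

```python
def calibration_curve(trades, n_buckets=10):
    """trades: iterable of (price_paid, resolved_yes), price_paid in [0, 1].

    Returns one (bucket_low, bucket_high, n_trades, yes_frequency) tuple
    per non-empty bucket.
    """
    counts = [0] * n_buckets
    wins = [0] * n_buckets
    for price, resolved_yes in trades:
        # A price of exactly 1.0 is placed in the top bucket.
        idx = min(int(price * n_buckets), n_buckets - 1)
        counts[idx] += 1
        wins[idx] += 1 if resolved_yes else 0
    return [
        (i / n_buckets, (i + 1) / n_buckets, counts[i], wins[i] / counts[i])
        for i in range(n_buckets)
        if counts[i]
    ]

trades = [(0.15, False), (0.12, False), (0.18, True), (0.17, False),
          (0.85, True), (0.82, True)]
for low, high, n, freq in calibration_curve(trades):
    print(f"[{low:.1f}, {high:.1f}) n={n} actual={freq:.2f}")
```

In this toy sample the 0.1 to 0.2 bucket resolves YES 25% of the time, above the prices paid there, which is the kind of gap the curve is meant to expose.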

Which strategy types work best?

Mean-reversion (news-spike fade), time-decay carry on long-dated YES, sports-favorite momentum, and tag-stratified plays. Vectorized templates run 20-50× faster than event-driven for the same logic. Call polymarket_list_strategy_templates to see all 10 canonical patterns.

What's the difference between event-driven and vectorized?

Event-driven (PolymarketStrategy) gets a callback per trade print, so it can maintain per-market state between prints. Vectorized (PolymarketVectorizedStrategy) returns a signal frame once, which the engine walks with polars window expressions; for bar-driven rules this is typically 20-50× faster.
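
The contrast between the two styles can be shown with the same rule written both ways. This sketch is standard-library Python only: it does not use polars or the real PolymarketStrategy and PolymarketVectorizedStrategy base classes, and every name in it is hypothetical. The rule is "signal when the price crosses below a threshold".

```python
class EventDrivenDipBuyer:
    """Called once per price print; carries its own state between calls."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.prev_price = None
        self.signals = []

    def on_print(self, ts, price):
        crossed = (
            self.prev_price is not None
            and self.prev_price >= self.threshold > price
        )
        if crossed:
            self.signals.append(ts)
        self.prev_price = price


def vectorized_dip_signals(timestamps, prices, threshold):
    """Derives every signal from the whole series in one expression."""
    return [
        ts
        for ts, prev, cur in zip(timestamps[1:], prices, prices[1:])
        if prev >= threshold > cur
    ]


timestamps = [1, 2, 3, 4, 5]
prices = [0.30, 0.25, 0.18, 0.22, 0.15]

bot = EventDrivenDipBuyer(threshold=0.20)
for ts, price in zip(timestamps, prices):
    bot.on_print(ts, price)

print(bot.signals)                                       # [3, 5]
print(vectorized_dip_signals(timestamps, prices, 0.20))  # [3, 5]
```

The event-driven form suits logic that depends on evolving state; the one-pass form gives an engine the whole series at once, which is what makes columnar execution possible.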

Can I trade on Polymarket through this?

No. Bagtester is a backtesting service only — we don't route orders to Polymarket. The output is a strategy you can then run live against the Polymarket CLOB API yourself, with the confidence that comes from a clean backtest.

How does the cross-market portfolio work?

Your strategy's `self` instance persists across every market in the backtest window. ctx.portfolio is one shared cash pool — buying YES in market A draws from the same balance as buying NO in market B. Position keys are (condition_id, outcome_index) so YES and NO on the same market are distinct positions.
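
A toy version of that shared pool, assuming a simple cash-for-shares model at a given price. The class is illustrative and only has_position mirrors a call shown elsewhere on this page; fees, selling, and settlement are left out.

```python
class Portfolio:
    """One cash balance shared by every market in the backtest."""

    def __init__(self, cash_usdc):
        self.cash_usdc = cash_usdc
        self.positions = {}  # (condition_id, outcome_index) -> shares held

    def has_position(self, condition_id, outcome_index):
        return self.positions.get((condition_id, outcome_index), 0.0) > 0.0

    def buy(self, condition_id, outcome_index, size_usdc, price):
        if size_usdc > self.cash_usdc:
            raise ValueError("insufficient cash")
        self.cash_usdc -= size_usdc
        key = (condition_id, outcome_index)
        self.positions[key] = self.positions.get(key, 0.0) + size_usdc / price


portfolio = Portfolio(cash_usdc=100.0)
portfolio.buy("0xA", 0, size_usdc=20.0, price=0.25)  # YES in market A
portfolio.buy("0xB", 1, size_usdc=30.0, price=0.50)  # NO in market B
print(portfolio.cash_usdc)                 # 50.0
print(portfolio.has_position("0xA", 0))    # True
print(portfolio.has_position("0xA", 1))    # False: NO on A is a separate key
```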

Are resolutions actually on-chain?

Yes. We subscribe to ConditionResolution events on the Polygon ConditionalTokens contract. Disputed UMA resolutions are flagged in the result so your strategy can choose to exclude them via market_filter.exclude_disputed_resolutions=true.

What happens if my strategy crashes mid-backtest?

Regular Python exceptions in user callbacks (on_market_open, on_market_update, on_resolution) are logged into the result's strategy_errors list (capped at 20 to avoid noise). The backtest continues — partial results are still useful. LookaheadError specifically is NOT caught — it's a hard correctness bug and kills the run so you see it immediately.
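
The error-handling policy described above can be sketched as a small loop. The runner function and its return shape are hypothetical; only the cap of 20, the strategy_errors name, and the LookaheadError behaviour come from the text.

```python
class LookaheadError(Exception):
    """Stand-in for the engine's lookahead exception."""


MAX_LOGGED_ERRORS = 20  # cap stated in the answer above


def run_callbacks(callback, events):
    """Invoke callback per event, logging ordinary errors and continuing."""
    strategy_errors = []
    processed = 0
    for event in events:
        try:
            callback(event)
        except LookaheadError:
            raise  # correctness bug: abort the whole run immediately
        except Exception as exc:
            if len(strategy_errors) < MAX_LOGGED_ERRORS:
                strategy_errors.append(f"{type(exc).__name__}: {exc}")
        processed += 1
    return processed, strategy_errors


def flaky(event):
    if event % 2:
        raise ValueError(f"bad event {event}")


processed, errors = run_callbacks(flaky, range(100))
print(processed, len(errors))  # 100 20
```

Catching LookaheadError before the generic Exception clause is what keeps it from being swallowed along with ordinary strategy bugs.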