Claude Code · Polymarket

Backtest Polymarket bots from Claude Code

Claude Code has native MCP support. Add Bagtester once and every session gets a seven-tool Polymarket surface, including submitting a bot, listing past runs, comparing results, and browsing strategy templates. Everything is driven by natural-language prompts and runs against on-chain trade data, and Brier and calibration results are returned synchronously.

Setup (2 minutes)

Get an API key

Add the MCP

claude mcp add bagtester \
  --transport http https://bagtester.com/api/mcp \
  --header "Authorization: Bearer bag_YOUR_KEY"

Verify Polymarket tools are exposed

In Claude Code, ask: "list polymarket tools". You should see seven polymarket_* tools. For any new bot idea, call polymarket_list_strategy_templates first: it lists ten ready-to-go templates (cheap-YES fade, news-spike fade, time-decay carry, sports favorite, vectorized momentum, etc.).
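If you want to run the same check programmatically, the sketch below filters a tools/list result for the polymarket_ prefix. The response envelope follows the general MCP tools/list shape, trimmed to the name field. Only the two polymarket_* names shown appear in this guide; other_tool is a made-up placeholder, and the real server should report seven Polymarket tools.

```python
import json

# Sample tools/list result, trimmed to the "name" field.
# ASSUMPTION: "other_tool" is a hypothetical non-Polymarket tool used only
# to show the filter working; the two polymarket_* names come from this guide.
response = json.loads("""
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {"name": "polymarket_list_strategy_templates"},
      {"name": "polymarket_submit_bot"},
      {"name": "other_tool"}
    ]
  }
}
""")


def polymarket_tools(tools_list_response):
    """Return the sorted names of tools whose name starts with 'polymarket_'."""
    tools = tools_list_response.get("result", {}).get("tools", [])
    return sorted(t["name"] for t in tools if t["name"].startswith("polymarket_"))


names = polymarket_tools(response)
print(names)
```

Against the live server, the length of this list is the number to compare with the expected seven.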

Ask for a backtest

> backtest a mean-reversion bot on Polymarket
  politics markets resolved in 2024 with at
  least 100k volume

Claude will call polymarket_list_strategy_templates, pick the news-spike-fade template, then submit via polymarket_submit_bot with market_filter.tags=["Politics"] and min_volume_usdc=100000. The result lands inline: Brier score, calibration curve, by-tag PnL breakdown, equity curve, and 11 Polymarket-specific quality flags.
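For readers curious what that submission might look like on the wire, here is a minimal sketch of an MCP tools/call request. The tool name, market_filter.tags, and min_volume_usdc come from the description above. The "template" key and the placement of min_volume_usdc inside market_filter are assumptions made for illustration, not the documented Bagtester schema.

```python
import json


def build_submit_bot_request(request_id, template, tags, min_volume_usdc):
    """Build a JSON-RPC 2.0 tools/call request for polymarket_submit_bot."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "polymarket_submit_bot",
            "arguments": {
                # ASSUMPTION: hypothetical key naming the chosen template.
                "template": template,
                "market_filter": {
                    "tags": tags,
                    # ASSUMPTION: this guide does not say whether
                    # min_volume_usdc nests under market_filter.
                    "min_volume_usdc": min_volume_usdc,
                },
            },
        },
    }


request = build_submit_bot_request(
    request_id=1,
    template="news-spike-fade",
    tags=["Politics"],
    min_volume_usdc=100_000,
)
print(json.dumps(request, indent=2))
```

In normal use Claude Code builds and sends this request for you; the sketch only makes the filter parameters concrete.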

Try these prompts

"Test a time-decay carry strategy on Polymarket sports markets, hold YES under $0.15 to resolution"
"Find Polymarket crypto markets above $10M volume that resolved in 2024, then backtest a vectorized momentum strategy on them"
"Compare three Polymarket strategies (cheap-yes fade, favorite momentum, news-spike fade) head-to-head on the 2024 politics universe"

What you get back

Every polymarket_submit_bot response follows the same pm_v1.0 schema: summary.brier_score, summary.calibration_curve (10 buckets), summary.by_tag (per-tag PnL and Brier score), summary.equity_curve_usdc, quality_flags_pm._summary (the 11 flags), plus next_steps with agent-friendly suggestions (refactor-to-vectorized, broaden-filter, focus-on-best-tag).
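To help interpret summary.brier_score and summary.calibration_curve, the sketch below computes both from a list of probability forecasts and binary outcomes using the standard textbook definitions. Whether Bagtester uses exactly this bucketing is an assumption; the function names, dictionary keys, and sample numbers are illustrative only.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always forecasting 0.5 scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_curve(forecasts, outcomes, n_buckets=10):
    """Group forecasts into equal-width buckets and compare the mean forecast
    with the observed resolution rate in each non-empty bucket.

    ASSUMPTION: equal-width buckets over [0, 1]; the service may bucket differently.
    """
    buckets = [[] for _ in range(n_buckets)]
    for p, o in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("forecast probabilities must lie in [0, 1]")
        index = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 joins the last bucket
        buckets[index].append((p, o))

    curve = []
    for index, items in enumerate(buckets):
        if not items:
            continue
        curve.append({
            "bucket": index,
            "mean_forecast": sum(p for p, _ in items) / len(items),
            "observed_rate": sum(o for _, o in items) / len(items),
            "count": len(items),
        })
    return curve


# Illustrative data: entry prices treated as forecasts, 1 = market resolved YES.
forecasts = [0.15, 0.95, 0.85, 0.35]
outcomes = [0, 1, 1, 0]

score = brier_score(forecasts, outcomes)
curve = calibration_curve(forecasts, outcomes)
print(score)
for point in curve:
    print(point)
```

A well-calibrated strategy shows observed_rate close to mean_forecast in every bucket, while the Brier score condenses overall forecast error into a single number.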

See full result schema and FAQ →