# Bagtester

> Backtesting-as-a-service for AI coding agents. Submit Python strategies via MCP, get standardized results back.

Bagtester runs an MCP (Model Context Protocol) server you can add to Claude Code, Codex, Cursor, or any MCP-compatible client. Your agent writes a Python strategy, calls our MCP, gets a structured backtest result. No data wrangling, no infrastructure, no deploy step.

## Endpoint

- MCP server: `https://bagtester.com/api/mcp`
- Transport: streamable HTTP / JSON-RPC over POST
- Auth: `Authorization: Bearer bag_…` (per-user API key)
- Generate keys: https://bagtester.com/dashboard

## Tools (14 total)

### Core backtest tools

- `submit_strategy` — Run a single backtest. Args: `code`, `mode` (minute|hybrid|tick), `symbol`, `from_date`, `to_date`, `initial_capital`, `params`, `fee_profile`, `commission_bps`, `slippage_bps`, `include_benchmark`, `wait` (auto|true|false). Returns schema v2.0: 107 metrics across 8 categories (returns, risk, risk_adjusted, trades, exposure, benchmark, distributions), 12 `quality_flags`, sparkline previews, share URL.
- `get_backtest_results` — Retrieve a job by `job_id`. Use after async submit.
- `list_my_jobs` — Recent jobs for this account, most recent first.
- `improve_strategy` — Re-run a previous job with one improvement: `extend_time_range`, `use_hybrid_mode`, `use_tick_mode`, or `parameter_sweep`.

### Data tools

- `list_available_data` — Asset catalog with tier metadata. Vendor-free capabilities only.
- `get_data_sample` — Last N OHLCV rows for a symbol. Use to verify schema before submitting strategy.

### Optimization (Quant tier)

- `optimize_strategy` — Parameter sweep ranked by `primary_metric` (sharpe_ratio | sortino_ratio | total_return_pct | cagr_pct). Trader tier capped at 50 combinations; Quant at 1,000.
- `walk_forward` — Rolling in/out-of-sample test. Args include `in_sample_days`, `out_sample_days`, and a `param_grid` to re-optimize per window.

### Discovery + multi-result (Phase B)

- `search_backtests` — Filter your historical backtests by symbol, tag, mode, status, date range, experiment, strategy.
- `list_experiments` — List experiment groups (param sweeps, walk-forwards, manual groupings).
- `get_experiment` — Fetch an experiment with all child runs + heatmap data for `param_sweep` type. Optional `top_n_by_metric` flags best runs.
- `compare_backtests` — N-way side-by-side (2-10 runs). Returns markdown table, JSON comparison, `winner_per_metric` map, and a stacked-equity PNG URL.
- `create_experiment` — Manually group existing jobs into an experiment for later retrieval.

### Subscription management

- `manage_subscription` — View current plan, credits remaining, recent invoices. Get a Stripe customer-portal link to upgrade, change plan, update payment method, or cancel — all without leaving your agent.

## Asset coverage (capabilities, not vendors)

- Crypto: top 50 USDT pairs, 1m bars from ~2019; tick (aggregate trades) for BTC, ETH, SOL from ~2020.
- FX: 16 major pairs with tick history from 2003+.
- US stocks: top ~200, 1m intraday + EOD daily. Intraday is single-venue (~2-3% of consolidated tape — fine for directional strategies on large-caps, not for execution-sensitive HFT).
- US ETFs: ~50 curated (broad index, sector, bond, commodity, vol, international, factor).
- Fundamentals: DOW 30 (Quant tier only) — earnings, ratios, statements.
- Crypto perp funding rates: BTC (Quant tier).

## Backtest modes

| Mode | Granularity | Cost multiplier | Best for |
|---|---|---|---|
| `minute` | 1-min OHLCV, vectorized | 1× | Fast iteration, parameter sweeps, longer time horizons |
| `hybrid` | 1-min signals + adverse-fill within bar | 3× | Realistic execution slippage without tick storage cost |
| `tick` | Full event-driven on tick stream | 10× | HFT-grade realism. BTC/ETH/SOL crypto + FX majors |

## Pricing

- **Free** — $0. 500 credits/month, rolling 30-day refresh. BTC/ETH/SOL, minute mode. 1 API key, 1 concurrent backtest. 7-day result retention.
- **Trader** — $19/month. 30,000 credits/month. All symbols (crypto/FX/stocks/ETFs), minute + hybrid mode. Up to 50-combo parameter sweeps. 3 API keys, 5 concurrent. Persistent strategy library. 90-day retention.
- **Quant** — $39/month. 300,000 credits/month. Everything in Trader + tick mode (BTC/ETH/SOL) + walk-forward + sweeps up to 1,000 combos + public sharing + comparison + fundamentals (DOW 30) + funding-rate aware perp backtests. 10 API keys, 10 concurrent. 1-year retention.

Credit top-ups (one-off, no expiry):
- $5 → 20,000 credits (all paying plans)
- $20 → 100,000 credits (all paying plans)
- $90 → 500,000 credits (Quant only)
- $300 → 2,000,000 credits (Quant only)

Cancel any time from your agent (`manage_subscription`) or at https://bagtester.com/dashboard.

## Strategy template

```python
from bagtester import Strategy

class SmaCross(Strategy):
    def __init__(self, fast: int = 20, slow: int = 50):
        self.fast = fast
        self.slow = slow
        self.closes: list[float] = []

    def on_bar(self, ctx, bar):
        self.closes.append(bar.close)
        if len(self.closes) < self.slow:
            return
        fast_avg = sum(self.closes[-self.fast:]) / self.fast
        slow_avg = sum(self.closes[-self.slow:]) / self.slow
        pos = ctx.position(bar.symbol)
        if fast_avg > slow_avg and not pos:
            ctx.buy(bar.symbol, ctx.equity * 0.95 / bar.close)
        elif fast_avg < slow_avg and pos:
            ctx.close(bar.symbol)
```

Pre-installed in the sandbox: numpy, pandas, polars, scipy, scikit-learn, statsmodels, ta-lib, pandas-ta, numba. `pip install` is not available at runtime; no network egress.

## Result shape (schema v2.0)

```jsonc
{
  "schema_version": "2.0",
  "job_id": "bt_…",
  "experiment_id": null,
  "status": "completed",
  "mode": "minute",
  "summary": {
    "headline": "+24.7% return, Sharpe 1.83, max DD -14.2% over 87 trades",
    "sparkline_equity": "▁▂▂▃▅▆▇▇█▇▇▆▅▆▆▇█▇▇█",
    "sparkline_drawdown": "▁▁▁▂▃▃▂▁▁▁▁▂▂▁▁▁▁▁▁▁",
    "total_return_pct": 24.7,
    "sharpe_ratio": 1.83,
    "max_drawdown_pct": -14.2
  },
  "context":          { /* symbol, dates, capital, fees, engine_version, code_hash */ },
  "returns":          { /* 13 return metrics */ },
  "risk":             { /* 14 risk metrics incl. top_drawdowns[] */ },
  "risk_adjusted":    { /* sharpe, sortino, calmar, omega, tail_ratio, … */ },
  "trades":           { /* 23 trade-level metrics */ },
  "exposure":         { /* time_in_market, turnover, cost_drag, … */ },
  "benchmark":        { /* alpha, beta, correlation, up/down capture, … */ },
  "distributions":    { /* monthly/yearly/rolling tables, histograms */ },
  "quality_flags":    { /* 12 booleans + reason + severity */ },
  "equity_curve":     [/* ≤500 points */],
  "drawdown_curve":   [/* ≤500 points */],
  "trade_ledger":     [/* ≤200 trades */],
  "visuals":          { "equity_url": "…", "chart_url": "…", "share_url": "…" },
  "billing":          { "credits_used": …, "balance_after": … },
  "next_steps":       [ /* agent-friendly recommendations */ ]
}
```

## Quality flags (12)

The agent reads these directly — no math required:

`small_sample`, `time_concentration`, `outlier_dependent`, `regime_dependent`, `high_tail_risk`, `low_trade_frequency`, `consecutive_loss_risk`, `low_win_rate_low_payoff`, `underperformed_benchmark`, `high_cost_drag`, `untested_in_drawdowns`, `long_recovery`.

Each is `{ triggered: bool, reason: string, severity: "low"|"medium"|"high" }`. Use `quality_flags._summary.high_severity_count` for quick triage.

## Differentiators

- **Agent-native UX.** All result fields are flat JSON the agent can act on directly. Sparklines for terminals (Claude Code CLI), PNG charts for chat clients (Claude Desktop, Cursor), and OG-card share URLs for any context.
- **Standardized output across modes.** Same metric names whether you ran minute, hybrid, or tick — your agent can compare apples to apples.
- **Subscription management from the agent.** `manage_subscription` returns a Stripe portal link so users never need the dashboard for billing changes.
- **Experiments as first-class objects.** `optimize_strategy` and `walk_forward` produce queryable experiments; the agent can later `compare_backtests` across them.

## Resources

- Quickstart: https://bagtester.com/docs/quickstart
- MCP API reference: https://bagtester.com/docs/api
- Modes: https://bagtester.com/docs/modes
- Available data: https://bagtester.com/docs/data
- Strategy templates: https://bagtester.com/docs/strategies
- Pricing: https://bagtester.com/pricing
- FAQ: https://bagtester.com/faq
- Full LLM-RAG-friendly reference: https://bagtester.com/llms-full.txt