Guide · 2026-05-12

Realistic backtests vs. vibes-trading: why your fill model matters

A strategy that prints +40% in a naïve minute-mode backtest can lose money live. The difference is the fill model. Here's how Bagtester's three execution modes (minute, hybrid, tick) trade off cost against realism — and when each one is the right call.

The naïve backtest is always optimistic

When you backtest at minute resolution and fill at next_bar.open with a constant slippage, you're telling the engine: "assume every order fills at the printed open price, give or take a fixed haircut." That assumption quietly hides three real costs:

Intra-bar drift. The next bar opens, your order goes in, the price moves before you're filled. Aggressive orders typically get worse fills, not better.
Spread. Buys pay the ask, sells receive the bid. Mid-price fills understate this by half the spread.
Adverse selection. If the price moves "in your favor" before fill, you don't get the fill — only the bad ones land. A subtle effect that systematically degrades execution.

Together, these can flip a marginal-Sharpe strategy from positive to negative. The wider the spreads in the asset class and the higher the turnover, the worse the gap.
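
To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch. The function name and every input value are illustrative assumptions, not Bagtester output; it charges a fixed cost in basis points on each fill, ignores compounding, and assumes every trade uses the full account notional.

```python
def net_return_pct(gross_return_pct: float, round_trips: int,
                   per_fill_cost_bps: float) -> float:
    """Subtract per-fill friction from a gross return (simple, non-compounded).

    Each round trip has two fills (entry and exit), and each fill pays
    per_fill_cost_bps of notional in drift, spread and slippage combined.
    """
    total_cost_bps = 2 * round_trips * per_fill_cost_bps
    return gross_return_pct - total_cost_bps / 100.0  # 100 bps = 1%


# Hypothetical strategy: +10% gross over 200 round trips.
low_friction = net_return_pct(10.0, 200, per_fill_cost_bps=0.75)
high_friction = net_return_pct(10.0, 200, per_fill_cost_bps=3.0)
print(f"{low_friction:.1f}% vs {high_friction:.1f}%")  # prints "7.0% vs -2.0%"
```

The same signal and the same trade count end up on opposite sides of zero; only the friction per fill changed.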

Bagtester's three modes

We expose three execution modes so you can iterate fast and validate slowly:

Minute (1× cost). Vectorized on 1-minute bars. Fills at next-bar open plus constant slippage_bps. Cheap, fast, optimistic. Use it for parameter exploration and quick iteration — but never for the final "is this strategy worth deploying?" check.
Hybrid (3× cost). Signals come from 1m bars; fills are modeled with adverse selection inside the OHLC range. A buy market order gets a fill biased toward the bar's high; a sell, toward the low. The bias is calibrated against tick fills. On FX bars, half the per-bar spread is also added to slippage, so buys land at the ask and sells at the bid. This should be your default mode for evaluations.
Tick (10× cost). Walks the actual trade stream tick-by-tick. Each order fills at the first opposing-side tick after submission. HFT-realistic. Currently available on BTC, ETH, SOL crypto and 16 FX majors. Use it for final validation or for strategies whose edge depends on execution timing.
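
The three fill rules described above can be sketched as follows. This is a toy illustration under stated assumptions, not Bagtester's implementation: the Bar and Tick types, the function names, the linear bias toward the bar extreme, and the 0.5 bias default are all hypothetical, and the real hybrid model is calibrated against tick fills in a way this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


@dataclass
class Tick:
    ts: float    # timestamp in seconds
    price: float
    side: str    # "buy" or "sell", as labeled in the feed (assumed field)


def minute_fill(next_bar: Bar, is_buy: bool, slippage_bps: float) -> float:
    """Next-bar open, moved against the order by a constant slippage."""
    sign = 1.0 if is_buy else -1.0
    return next_bar.open * (1.0 + sign * slippage_bps / 1e4)


def hybrid_fill(next_bar: Bar, is_buy: bool, bias: float = 0.5) -> float:
    """Open, pulled toward the bar's high (buys) or low (sells).

    bias=0 reproduces the open; bias=1 fills at the extreme.
    The 0.5 default is a placeholder, not a calibrated value.
    """
    extreme = next_bar.high if is_buy else next_bar.low
    return next_bar.open + bias * (extreme - next_bar.open)


def tick_fill(ticks: Sequence[Tick], submitted_ts: float,
              is_buy: bool) -> Optional[float]:
    """Price of the first opposing-side tick after submission, if any."""
    opposing = "sell" if is_buy else "buy"
    for t in ticks:
        if t.ts > submitted_ts and t.side == opposing:
            return t.price
    return None


bar = Bar(open=1.1000, high=1.1010, low=1.0995, close=1.1004)
ticks = [Tick(0.5, 1.1001, "buy"), Tick(1.2, 1.1003, "sell")]
print(minute_fill(bar, True, slippage_bps=1.0))
print(hybrid_fill(bar, True))
print(tick_fill(ticks, submitted_ts=1.0, is_buy=True))
```

Note how the minute rule never looks past the open, the hybrid rule uses only the bar's range, and the tick rule needs the full trade stream, which is roughly why the cost multiplier rises with each mode.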

A concrete example: SMA crossover on EURUSD

Take an SMA(20, 50) crossover on EURUSD, 2023-2024. In minute mode with a default fx_major fee profile, the strategy shows a flattering +12% return with Sharpe 0.94.

Re-run with hybrid mode (3× cost, but spread-aware) and the same strategy lands at +4% with Sharpe 0.31. The difference is entirely in execution: every fill now crosses the spread and takes an adverse-selection bias inside the bar, and that adds up over ~200 trades.

Re-run again with tick mode on EURUSD (we have the data) and you get +3% / Sharpe 0.27 — close to hybrid, slightly worse because the actual tick stream has occasional spread blowouts during news that the 1m-averaged hybrid model misses.

The naïve minute-mode return (+12%) was four times the tick-mode estimate (+3%), the most realistic number available. Without the hybrid → tick check, you would have shipped a strategy that doesn't pay rent.

Reading the quality flags

Bagtester returns 12 boolean signals per backtest, computed from the same metrics the backtest reports. Three are directly relevant to execution realism:

high_cost_drag — triggers when total commission + slippage exceed 20% of the gross return. A signal your strategy is paying too much friction.
low_trade_frequency — fewer than 1 trade per week. Implies each trade matters disproportionately; execution noise can dominate.
outlier_dependent — top 5 trades drive most of the PnL. If the agent is told to optimize for that strategy, it's likely overfitting to a few lucky bars.

The agent reads these directly; no math required. If quality_flags._summary.high_severity_count > 0, the response includes the reasons inline.
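
If you want to sanity-check these three flags against your own trade log, the definitions above translate into a few lines. This is a sketch under assumptions, not Bagtester's code: the function name and inputs are hypothetical, "most of the PnL" is read as more than 50%, trade PnLs are taken as net of costs, and the cost-drag check is skipped when the gross return is not positive.

```python
from datetime import timedelta
from typing import Dict, List


def execution_flags(trade_pnls: List[float], total_costs: float,
                    gross_return: float, span: timedelta,
                    outlier_share: float = 0.5) -> Dict[str, bool]:
    """Recompute three execution-related flags from a trade log.

    trade_pnls, total_costs and gross_return share one currency unit.
    outlier_share is an assumed threshold for "most of the PnL".
    """
    weeks = span.days / 7.0
    total_pnl = sum(trade_pnls)
    top5 = sum(sorted(trade_pnls, reverse=True)[:5])  # five largest trades
    return {
        # Commission + slippage above 20% of the gross return.
        "high_cost_drag": gross_return > 0 and total_costs > 0.20 * gross_return,
        # Fewer than one trade per week over the backtest span.
        "low_trade_frequency": weeks > 0 and len(trade_pnls) / weeks < 1.0,
        # Five best trades account for more than outlier_share of total PnL.
        "outlier_dependent": total_pnl > 0 and top5 > outlier_share * total_pnl,
    }


# Hypothetical log: 8 trades over 10 weeks, 100 in costs on 825 gross.
flags = execution_flags(
    [500, 300, -50, -40, 20, 10, -30, 15],
    total_costs=100, gross_return=825, span=timedelta(days=70),
)
print(flags)
```

Here costs are about 12% of gross, so high_cost_drag stays off, while the sparse trading and the two large winners switch on the other two flags.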

When optimism is actually fine

Minute mode isn't wrong — it's appropriate when:

You're iterating on signal logic and don't need execution accuracy yet.
Your strategy holds for hours-to-days and turnover is low.
You're running a large parameter sweep where what matters is the relative ranking, not absolute numbers.
Your instrument has tight spreads (BTCUSDT, EURUSD, SPY), so a modest minute-mode slippage approximates real fills.

The mistake isn't using minute mode. It's shipping a strategy whose only validation was minute-mode.

The pragmatic workflow

The flow we recommend:

Iterate in minute mode. Fast, cheap, lots of submissions.
Once a strategy looks alive, re-run the top candidate in hybrid mode. Watch what Sharpe survives.
For anything you'd actually deploy, run tick mode (where available) as the final check.
Use walk_forward in hybrid mode for the overfit check.

Costs scale 1× / 3× / 10× across the three modes. The 10× tick run is rare in early iteration; you save it for the "before I commit capital" decision.
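
A quick way to see why the staged workflow is worth it is to total the relative cost of a campaign. The multipliers come from the text; the run counts and the helper name are made up for illustration.

```python
# Relative cost per run, in units of one minute-mode run (from the text).
MODE_COST = {"minute": 1, "hybrid": 3, "tick": 10}


def workflow_cost(runs_per_mode: dict) -> int:
    """Total cost of a set of runs, in units of one minute-mode run."""
    return sum(MODE_COST[mode] * n for mode, n in runs_per_mode.items())


# Hypothetical campaign: 40 exploratory runs, 5 hybrid re-checks, 1 final tick run.
staged = workflow_cost({"minute": 40, "hybrid": 5, "tick": 1})
# The same 46 runs, all executed in tick mode.
all_tick = workflow_cost({"tick": 46})
print(staged, all_tick)  # prints "65 460"
```

With these made-up counts, staging costs about a seventh of running everything at tick resolution, and the decision that matters still gets the tick-mode check.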

One more detail: FX bid/ask

FX bars in Bagtester carry a per-bar spread_bps field: the mean spread observed over the bar's duration. The runner adds half of it to slippage, so buy fills model an ask hit and sell fills model a bid hit. On EURUSD that's typically 0.1-0.5 bps and you'll barely notice; on cross pairs like GBPNZD it can be 2-5 bps, which is enough to invert a marginal-Sharpe carry strategy.
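
The half-spread adjustment is simple enough to sketch. The function names and the 0.5 bps base slippage are assumptions for illustration; only the rule itself (slippage plus half of spread_bps, applied against the order) comes from the description above.

```python
def spread_aware_fill(ref_price: float, is_buy: bool,
                      slippage_bps: float, spread_bps: float) -> float:
    """Move a reference price against the order by slippage plus half the bar's spread."""
    effective_bps = slippage_bps + spread_bps / 2.0
    sign = 1.0 if is_buy else -1.0  # buys fill higher, sells fill lower
    return ref_price * (1.0 + sign * effective_bps / 1e4)


def round_trip_cost_bps(slippage_bps: float, spread_bps: float) -> float:
    """Friction for one entry plus one exit, in bps of notional."""
    return 2.0 * (slippage_bps + spread_bps / 2.0)


# Spreads picked from the ranges quoted above; the base slippage is assumed.
eurusd_like = round_trip_cost_bps(slippage_bps=0.5, spread_bps=0.3)
gbpnzd_like = round_trip_cost_bps(slippage_bps=0.5, spread_bps=4.0)
print(eurusd_like, gbpnzd_like)
```

Under these assumptions a round trip on the cross pair costs roughly four times what it costs on EURUSD, before the strategy has earned anything.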

Crypto bars don't carry a per-bar spread in the same way. Instead, we model crypto execution via the crypto_spot fee profile's commission and slippage_bps defaults, calibrated against BTCUSDT aggTrade fills. Override them with explicit commission_bps and slippage_bps args if your fee schedule is different.

Run the same strategy in all three modes

Free tier covers ~10 short backtests. That's enough to see how much realism eats your Sharpe.
