Backtesting · Strategy Development · Mar 10, 2026 · 7 min read

Why Most Backtests Lie
(And How ForgeAlpha Doesn't)

You ran the backtest. The equity curve looked beautiful — smooth, upward, +143% over 18 months. You deployed the strategy. Two weeks later, you're down 12%.

Sound familiar? You're not alone. And you're not stupid — you were betrayed by a backtest that told you what you wanted to hear instead of what was actually true.

The Four Ways Backtests Lie to You

1. They ignore spread

Spread is the gap between the bid and ask price — the broker's cut on every trade. On EUR/USD, it might be 1.5 pips. On Gold, it could be 25–40 cents. On Volatility 75 Index, it can be wider still.

Most naive backtests execute at the exact mid-price or even the close price. No spread. No friction. Your strategy "bought" at 1.0842 when in reality, the ask was 1.0844. Over 300 trades, that 2-pip discrepancy compounds into a gap between your backtest P&L and your live account that you'll never be able to explain to yourself.

The fix is simple in concept: apply bid/ask spread to every open and close. But most tools don't bother because it makes the results look worse — and traders don't want to buy software that makes their strategies look bad.
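The concept is easy to demonstrate. Here is a minimal sketch (a hypothetical helper, not ForgeAlpha's actual API) of the one rule that matters: buys fill at the ask, sells fill at the bid.

```python
def fill_price(mid: float, side: str, spread: float) -> float:
    """Apply bid/ask spread to a fill: buys pay the ask, sells receive the bid."""
    half = spread / 2.0
    return mid + half if side == "buy" else mid - half

# EUR/USD example: mid 1.0842, 1.5-pip spread (0.00015)
entry = fill_price(1.0842, "buy", 0.00015)   # fills above mid, at the ask
exit_ = fill_price(1.0860, "sell", 0.00015)  # fills below mid, at the bid

# Round-trip cost relative to mid prices, in pips (negative = cost)
cost_pips = ((1.0842 - entry) + (exit_ - 1.0860)) / 0.0001
```

A spread-free backtest would record entry at 1.0842 and exit at 1.0860; the spread-aware version pays 1.5 pips on every round trip, which is exactly the gap that compounds over hundreds of trades.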

2. They ignore commission

Your broker charges commission per lot on market orders. ECN brokers typically charge $3.50–$7 per lot per side. That's $7–$14 round-trip per standard lot.

If your strategy trades 0.10 lots on average and takes 400 trades a year, you're paying $280–$560 in commission annually on that single bot. For a scalping strategy taking micro-profits, this is often the difference between a profitable system and a money-losing one.

"But my broker doesn't charge commission" — then it's baked into a wider spread. Either way, there's friction on every trade. Any backtest that doesn't model it is fiction.
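The arithmetic above is worth sanity-checking against your own numbers. A minimal sketch (hypothetical function name, using the figures from this section):

```python
def annual_commission(trades_per_year: int, avg_lots: float,
                      commission_per_lot_per_side: float) -> float:
    """Round-trip commission: charged per lot on both the open and the close."""
    return trades_per_year * avg_lots * commission_per_lot_per_side * 2

# 400 trades/year at 0.10 lots, ECN commission $3.50-$7.00 per lot per side
low = annual_commission(400, 0.10, 3.50)   # ~$280/year
high = annual_commission(400, 0.10, 7.00)  # ~$560/year
```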

3. They use one tick per candle

This is the most underappreciated error in backtesting. When you backtest on 1-hour candles and evaluate rules only at the close, you're making a silent assumption: that nothing important happened between the open and close of that candle.

That assumption is wrong. A candle that closed at 1.0870 might have swept down to 1.0812 first, triggering your stop loss — then recovered. A single-tick backtest never sees that. It records a profitable trade. The live trader gets stopped out.

The correct approach is to simulate intra-candle price movement. ForgeAlpha replays each candle as 4 ticks: Open → Low (or High) → High (or Low) → Close. This simulates the worst-case intra-candle path and catches the SL/TP sweeps that single-tick backtests miss entirely.

ForgeAlpha 4-Tick Simulation

Tick   Price point   Example    Check
T1     Open          1.0842     Strategy evaluates entry
T2     Low           1.0812     SL check — would stop out?
T3     High          1.0881     TP check — would take profit?
T4     Close         1.0868     Exit rule evaluation
On a bearish candle, ForgeAlpha simulates Open → Low → High → Close. On a bullish candle: Open → High → Low → Close. SL/TP checks on every tick — not just at close.
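The replay logic fits in a few lines. This is a sketch of the technique, not ForgeAlpha's engine code; it follows the bullish/bearish tick ordering described above and checks a long position's stop loss and take profit at every tick:

```python
def candle_ticks(o: float, h: float, l: float, c: float) -> tuple:
    """Replay a candle as 4 ticks: bullish O->H->L->C, bearish O->L->H->C."""
    return (o, h, l, c) if c >= o else (o, l, h, c)

def check_long_exit(ticks: tuple, sl: float, tp: float):
    """Return ('SL'|'TP', level) at the first tick crossing a level, else None."""
    for price in ticks:
        if price <= sl:
            return ("SL", sl)
        if price >= tp:
            return ("TP", tp)
    return None

# The candle from the table: O 1.0842, H 1.0881, L 1.0812, C 1.0868
ticks = candle_ticks(1.0842, 1.0881, 1.0812, 1.0868)
result = check_long_exit(ticks, sl=1.0820, tp=1.0900)  # ('SL', 1.082)
```

A close-only backtest evaluating this candle at 1.0868 would record a position still open and in profit; the tick replay sees the sweep to 1.0812 and records the stop-out.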

4. They ignore slippage

Market orders rarely fill at the exact price you see. In fast markets, your "market order at 1.0842" fills at 1.0844 because the order book moved in the microseconds between your signal and execution. This is slippage.

On liquid pairs like EUR/USD during the London session, slippage might be 0.2–0.5 pips. On thinly traded assets, around news events, or at Asian session opens, it can be 2–5 pips or more. Strategies that work on narrow profit targets are particularly sensitive to this.

ForgeAlpha supports both fixed slippage (constant penalty per trade) and random slippage (sampled from a range). Random slippage mode is intentionally noisy — it prevents over-optimization where a strategy is curve-fitted to avoid the exact slippage scenarios that kill it in live trading.
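A minimal sketch of the two modes (hypothetical helper; the parameter values are illustrative placeholders, not ForgeAlpha defaults). Slippage always worsens the fill: buys fill higher, sells fill lower.

```python
import random

def apply_slippage(price: float, side: str, mode: str = "fixed",
                   fixed: float = 0.00003,
                   rand_range: tuple = (0.0, 0.00005)) -> float:
    """Worsen a market-order fill by a fixed or randomly sampled amount."""
    slip = fixed if mode == "fixed" else random.uniform(*rand_range)
    return price + slip if side == "buy" else price - slip

buy_fill = apply_slippage(1.0842, "buy")                   # fixed: +0.3 pips
sell_fill = apply_slippage(1.0842, "sell", mode="random")  # random: 0 to 0.5 pips worse
```

The random mode is what keeps an optimizer honest: because the penalty varies between runs, a strategy cannot be curve-fitted to a single, conveniently survivable slippage value.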

The Survivorship Bias Trap

Here's a thought experiment: you build a strategy, backtest it on EUR/USD from 2022–2025, and it works beautifully. You deploy it. Six months later it stops working.

Why? Because 2022–2025 included specific market regimes — a high-volatility trending phase, a low-volatility ranging phase — that suited your strategy. You didn't test it on 2017–2019 (low volatility, ranging), or 2020 (COVID crash), or 2015 (SNB flash crash). You only tested it on data that matched your strategy's personality.

This is survivorship bias in backtesting. The fix isn't complicated: test on longer date ranges, across different volatility regimes. The 3-way comparison in ForgeAlpha helps here — run original, AI-suggested, and your edited version across the same date range side-by-side. If AI's version only outperforms on the period it was trained on, you'll see it immediately.

The Cost Analysis Problem

Even if your backtest models spread, commission, and slippage correctly, there's a subtler question: does your strategy's profit per trade actually survive those costs?

Gross vs Net: Where Profits Disappear

Gross P&L (per trade avg)             +$4.20
Spread cost (1.5 pips × 0.10 lots)    -$1.50
Commission (ECN, round-trip)          -$0.70
Slippage estimate                     -$0.40
Net P&L                               +$1.60
A strategy that averages $4.20/trade gross generates only $1.60/trade net after friction. That's a 62% friction ratio — dangerously high.

ForgeAlpha's backtest engine includes a friction ratio warning: if your total friction costs (spread + commission + slippage) exceed 50% of your gross profit, you'll see a HIGH_FRICTION_RATIO warning in the results. This is an immediate red flag — strategies at this friction ratio have almost no margin of safety for live trading variance.
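The threshold check is simple to reproduce yourself. A sketch using the table's numbers (hypothetical function; the warning string mirrors the one described above):

```python
def friction_ratio(gross_pnl: float, spread_cost: float,
                   commission: float, slippage: float) -> float:
    """Share of gross profit consumed by trading costs."""
    return (spread_cost + commission + slippage) / gross_pnl

ratio = friction_ratio(gross_pnl=4.20, spread_cost=1.50,
                       commission=0.70, slippage=0.40)  # ~0.62
if ratio > 0.50:
    print("HIGH_FRICTION_RATIO")
```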

Additionally, ForgeAlpha's cost analysis system computes strategy-level cost metrics before you even run a backtest: minimum recommended capital, max safe grid levels, worst-case martingale exposure. If your strategy's basket profit is below the spread cost, you'll see a BASKET_BELOW_COST warning. These aren't nice-to-haves — they're the difference between a strategy that survives live trading and one that bleeds to death on spread alone.

What an Honest Backtest Looks Like

Let's be concrete. Here's what ForgeAlpha models on every backtest run:

  • Bid/ask spread — configurable per symbol to match your broker's actual spread
  • Commission per lot — set to your broker's exact round-trip rate
  • Slippage — fixed or random, applied on market order fills
  • 4-tick candle simulation — intra-candle SL/TP sweeps are modeled correctly
  • Margin checks — every trade open validated against a 50% stop-out margin level
  • Leverage-aware sizing — virtual margin computed from volume × contractSize × price / leverage
  • Adaptive warmup — indicator warmup period calculated from your strategy's longest period, not a hardcoded value
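The margin bullets above translate directly into arithmetic. A sketch under the stated formula (hypothetical helpers, not the engine's actual code):

```python
def virtual_margin(volume_lots: float, contract_size: float,
                   price: float, leverage: float) -> float:
    """Margin required to open a position: volume x contractSize x price / leverage."""
    return volume_lots * contract_size * price / leverage

def margin_level_pct(equity: float, used_margin: float) -> float:
    """Margin level in percent; below 50% the engine rejects opens / stops out."""
    return float("inf") if used_margin == 0 else equity / used_margin * 100

# 0.10 lots EURUSD (contract size 100,000) at 1.0842 with 1:100 leverage
m = virtual_margin(0.10, 100_000, 1.0842, 100)  # ~$108.42 of margin
level = margin_level_pct(equity=200.0, used_margin=m)
can_open = level >= 50.0
```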

That last point is worth dwelling on. If your strategy uses EMA(200), the backtest engine needs at least 200 candles of warmup data before it starts evaluating rules — otherwise, the first 200 trades are evaluated with incomplete indicator data and will produce misleading results. ForgeAlpha detects the maximum indicator period in your strategy's rule tree automatically and applies the correct warmup buffer.
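A sketch of how such detection might work on a nested rule tree. The tree shape here is invented for illustration; ForgeAlpha's actual rule schema may differ:

```python
def max_period(rule) -> int:
    """Recursively walk a nested rule tree and return the longest indicator period."""
    if isinstance(rule, dict):
        own = rule.get("period", 0)
        nested = max((max_period(v) for v in rule.values()), default=0)
        return max(own, nested)
    if isinstance(rule, list):
        return max((max_period(r) for r in rule), default=0)
    return 0  # leaf values (strings, numbers) carry no period

# Hypothetical rule tree: EMA(200) crossover filtered by RSI(14)
rules = {"all": [{"indicator": "EMA", "period": 200},
                 {"any": [{"indicator": "RSI", "period": 14}]}]}
warmup = max_period(rules)  # 200 candles skipped before rule evaluation begins
```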

The 3-Way Compare: Your Sanity Check

One of the most powerful features for honest backtesting is the 3-way compare. Here's how to use it:

  1. Upload your trade history → ForgeAlpha extracts your original strategy rules
  2. Generate AI suggestions → a candidate set of improved rules appears
  3. Run 3-way backtest → original, AI-suggested, and your current edits run simultaneously

The goal isn't to find the version with the highest return. It's to find the version that performs consistently across a full date range with acceptable drawdown. The AI-suggested version might show higher returns on one period by taking more risk — the 3-way compare makes that visible before you commit capital.

Example: 3-Way Backtest Results (6 months, EURUSD H1)

Metric           Original   AI Suggested   Your Version
Win Rate         52%        61%            58%
Profit Factor    1.31       1.87           1.64
Max Drawdown     18.4%      9.2%           12.1%
Friction Ratio   38%        22%            29%
Net Return       +18.3%     +43.2%         +31.7%
The AI version wins on every metric here. But the point is: you can see that — not just trust it. Run this on your own strategy.

Stop Curve-Fitting. Start Validating.

The goal of backtesting isn't to find a strategy that looks good in the past. It's to find evidence that a strategy has properties that are likely to persist in the future: positive expectancy after realistic costs, reasonable drawdown, consistent behavior across market regimes.

Most backtesting tools optimize for the first goal because it makes the product feel good to use. ForgeAlpha optimizes for the second goal because that's what actually matters when real money is on the line.

An honest backtest that returns +18% is worth more than a dishonest one that returns +143%. Only one of those will hurt you when you take it live.

Run an Honest Backtest

Upload your trade history or build a strategy from scratch. Backtest with real spread, slippage, and commission — and see the actual numbers.

Get Early Access