Bootstrapping avoids seductive backtest results

Nothing gets the adrenaline rushing quite like strong backtest results for your latest equity trading idea. Often, however, they are a mirage created by a subset of equities that performed particularly well or poorly, inflating the results beyond what can reasonably be expected going forward.

The investment community has come a long way in becoming more statistically sound, but it is still surprising how few research papers on cross-sectional equity factors mention bootstrapping. Without bootstrapping, researchers simply present results from the full sample, implicitly assuming that the same, potentially spectacular, returns will recur in the future and that the model will capture them again. Such full-sample results can be heavily skewed by outliers. In our latest post on equity factor models we mentioned bootstrapping but postponed any real discussion of the topic. In today’s post we return to bootstrapping and look specifically at how outliers influence the results of that factor model.

Outlier sample bias

In our equity factor model research, the backtest is based on 1,000 bootstrapped samples, each drawn from the constituents of the S&P Global 1200 index. Because bootstrapping samples with replacement, the same stock can appear two or more times in a given holding period. Subsampling, i.e. drawing without replacement, is an alternative that avoids such duplicates.
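A minimal sketch of the resampling step, contrasting bootstrap draws (with replacement) with subsampling (without replacement). It assumes each resampled universe is simply a set of positions into a hypothetical 1,200-stock constituent list and draws once per sample rather than per holding period; the function `draw_constituent_samples`, the sample sizes and the seed are illustrative choices, not the settings used in our backtest.

```python
import numpy as np

def draw_constituent_samples(n_constituents, sample_size, n_samples=1000,
                             replace=True, seed=0):
    """Draw constituent positions for each resampled backtest universe.

    replace=True  -> bootstrap: a stock can be drawn more than once
    replace=False -> subsampling: each stock appears at most once per sample
    """
    rng = np.random.default_rng(seed)
    return np.stack([
        rng.choice(n_constituents, size=sample_size, replace=replace)
        for _ in range(n_samples)
    ])

def has_duplicates(row):
    """True if any constituent position occurs more than once in the sample."""
    return len(np.unique(row)) < len(row)

# Hypothetical sizes: 1,000 universes drawn from a 1,200-stock list.
boot = draw_constituent_samples(1200, 1200, replace=True)
sub = draw_constituent_samples(1200, 600, replace=False)

print(boot.shape)                                   # (1000, 1200)
print(np.mean([has_duplicates(r) for r in boot]))   # share of bootstrap samples with repeats
print(any(has_duplicates(r) for r in sub))          # False: subsampling never repeats a stock
```

Each row would then be run through the same backtest, giving one annualized return per resampled universe.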

What is particularly interesting in our backtest is that the historical sample (running the backtest simply on the available historical data set) delivers an impressive annualized return of 11% (vertical orange line in the chart below). That is 2.2 percentage points higher than the bootstrapped mean of 8.8%.
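To make the comparison concrete, here is a small sketch of how an annualized (compound) return can be computed from a backtest's per-period returns, and of the percentage-point gap quoted above. Monthly periods (`periods_per_year=12`) and the helper name `annualized_return` are assumptions for illustration; the post does not state the rebalancing frequency.

```python
import numpy as np

def annualized_return(period_returns, periods_per_year=12):
    """Geometric (compound) annualized return from simple per-period returns."""
    r = np.asarray(period_returns, dtype=float)
    growth = np.prod(1.0 + r)               # total growth factor over the backtest
    years = len(r) / periods_per_year       # assumes equally spaced periods
    return growth ** (1.0 / years) - 1.0

# Toy check: a constant 1% monthly return for 10 years is 1.01**12 - 1 per year.
print(round(annualized_return([0.01] * 120), 4))      # 0.1268

# Gap between the historical sample and the bootstrapped mean, in percentage points.
historical, bootstrap_mean = 0.110, 0.088
print(round((historical - bootstrap_mean) * 100, 1))  # 2.2
```

Applied to every bootstrapped backtest, the same function yields the distribution whose mean is the 8.8% figure.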

[Chart: Bootstrapping vs historical sample]

Only 7.2% of the bootstrapped results have a higher annualized return than the historical sample, placing it firmly in the right tail of the distribution. Without bootstrapping, the historical backtest could therefore lead to inflated expectations driven by outliers. While the model may well capture such outliers in the future too, we prefer to err on the side of caution and use the bootstrapped mean, rather than the return achieved with the full list of historical constituents, as our expected annualized return.
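The 7.2% figure is the share of bootstrapped outcomes that beat the historical one. A minimal sketch of that calculation follows; `share_above` is a hypothetical helper, and the four-value input is a toy example rather than our bootstrap results.

```python
import numpy as np

def share_above(bootstrap_results, historical_result):
    """Fraction of bootstrapped outcomes strictly greater than the historical one.

    A small value means the historical result sits in the right tail of the
    bootstrap distribution (roughly a one-sided empirical p-value).
    """
    b = np.asarray(bootstrap_results, dtype=float)
    return float(np.mean(b > historical_result))

# Toy example: one of four bootstrapped annualized returns beats 10%.
print(share_above([0.05, 0.08, 0.09, 0.12], 0.10))  # 0.25
```

Equivalently, the historical result lies at roughly the `100 * (1 - share)` percentile of the bootstrap distribution.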

Another advantage of the resampling methodology is that it creates a confidence band around our expectation. We expect an 8.8% annualized return from this model, but if it delivers 6.5%, that would fit well within the distribution of annualized returns and hence not surprise us much. The chart below shows the confidence band (5th and 95th percentiles) for the cumulative return. The bootstrapped mean total return is 363%, compared with 540% for the historical sample, which is close to the upper band of 558%.
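A sketch of how such a band can be built: compound each bootstrapped return path into a cumulative total return, then take the 5th and 95th percentiles across paths at every date. The input here is a synthetic placeholder (1,000 normally distributed monthly return paths), not our backtest data, and `cumulative_band` is a hypothetical helper. Note that the band is pointwise: each date's percentiles are computed independently.

```python
import numpy as np

def cumulative_band(period_returns, lower=5, upper=95):
    """Pointwise percentile band of cumulative total return.

    period_returns: array of shape (n_samples, n_periods), one row of simple
    returns per bootstrapped backtest.
    Returns (lower_band, mean_path, upper_band), each of length n_periods.
    """
    r = np.asarray(period_returns, dtype=float)
    cum = np.cumprod(1.0 + r, axis=1) - 1.0   # cumulative total return per path
    lo = np.percentile(cum, lower, axis=0)    # 5th percentile at each date
    hi = np.percentile(cum, upper, axis=0)    # 95th percentile at each date
    return lo, cum.mean(axis=0), hi

# Synthetic placeholder: 1,000 paths of 240 monthly returns (not the post's data).
rng = np.random.default_rng(42)
paths = rng.normal(0.007, 0.04, size=(1000, 240))
lo, mean_path, hi = cumulative_band(paths)
print(f"final: 5th {lo[-1]:.0%}, mean {mean_path[-1]:.0%}, 95th {hi[-1]:.0%}")
```

Plotting `lo`, `mean_path` and `hi` against time, with the historical cumulative return overlaid, reproduces the kind of chart described above.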


It is our wish that the investment community step up its use of bootstrapping in cross-sectional equity research, or otherwise incorporate the uncertainty of results more explicitly, thereby painting a more realistic picture of a trading strategy’s expected performance.