When is a Backtest Too Good to be True? Part Two.

In the previous post, I went through a simple exercise which, to me, clearly demonstrates that a 60% out-of-sample guess rate (on a daily basis) for the S&P 500 generates ridiculous returns. Judging from the feedback, the example was somewhat unconvincing. Let’s dig a bit further then.

Let’s add the Sharpe ratio and maximum drawdown to the CAGR, and compute all three for each sample.

return.mc = function(rets, samples=1000, size=252) {
   require(PerformanceAnalytics)
   # The annualized return for each sample
   result = data.frame(cagr=rep(NA, samples), sharpe.ratio=NA, max.dd=NA)
   for(ii in 1:samples) {
      # Sample the indexes of the days guessed correctly
      aa = sample(1:NROW(rets), size=size)
      # Start by assuming every guess is wrong: lose the absolute return
      bb = -abs(rets)
      # On the sampled days, the guess was correct: gain the absolute return
      bb[aa] = abs(bb[aa])
      cc = as.numeric(bb)
      # Compute the statistics of interest for this sample
      result[ii,1] = Return.annualized(cc, scale=252)
      result[ii,2] = SharpeRatio.annualized(cc, scale=252)
      result[ii,3] = maxDrawdown(cc)
   }
   return(result)
}
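For readers outside R, the same resampling idea can be sketched in Python. This is a minimal sketch with numpy; the return series below is synthetic and merely stands in for the S&P 500 data:

```python
import numpy as np

def return_mc(rets, samples=1000, size=252, seed=42):
    """Sample which days are guessed correctly; return one CAGR per sample."""
    rng = np.random.default_rng(seed)
    n = len(rets)
    cagrs = np.empty(samples)
    for i in range(samples):
        correct = rng.choice(n, size=size, replace=False)
        daily = -np.abs(rets)                    # assume every guess is wrong
        daily[correct] = np.abs(rets[correct])   # flip the sampled days to wins
        cagrs[i] = np.prod(1.0 + daily) ** (252.0 / n) - 1.0
    return cagrs

# Made-up daily returns standing in for the actual index series
rets = np.random.default_rng(0).normal(0.0003, 0.01, 5000)
cagrs = return_mc(rets, samples=200, size=int(0.6 * len(rets)))
print(round(float(cagrs.mean()), 2))
```

With a 60% hit rate on roughly 1%-magnitude daily moves, the mean CAGR lands in the same neighborhood as the R results below.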

Let’s look at some summary statistics:

require(quantmod)
gspc = getSymbols("^GSPC", from="1900-01-01", auto.assign=F)
rets = ROC(Cl(gspc),type="discrete",na.pad=F)["1994/2013"]

df = return.mc(rets, size=as.integer(0.6*NROW(rets)))
summary(df,digits=2)

#       cagr       sharpe.ratio     max.dd    
#  Min.   :0.34   Min.   :1.8   Min.   :0.13  
#  1st Qu.:0.45   1st Qu.:2.3   1st Qu.:0.22  
#  Median :0.48   Median :2.5   Median :0.26  
#  Mean   :0.48   Mean   :2.5   Mean   :0.27  
#  3rd Qu.:0.51   3rd Qu.:2.7   3rd Qu.:0.31  
#  Max.   :0.67   Max.   :3.5   Max.   :0.63

The picture is clearer now. The lowest Sharpe ratio across a thousand samples is 1.8, and the mean is 2.5? Yeah, right.
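A back-of-envelope calculation makes the magnitude plausible. At a 60% hit rate the expected daily P&L is 0.6·E|r| − 0.4·E|r| = 0.2·E|r|, and the S&P 500’s mean absolute daily move over this period is roughly 0.75% (an assumed round figure, not computed from the data above):

```python
mean_abs_move = 0.0075   # assumed mean |daily return|, roughly 0.75%
edge = 0.6 - 0.4         # net fraction of days won at a 60% hit rate
daily_edge = edge * mean_abs_move
annualized = (1 + daily_edge) ** 252 - 1
print(round(daily_edge, 4), round(annualized, 2))
```

An edge of about 15 basis points a day compounds to roughly 46% a year, right in line with the mean CAGR in the table above.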

The results are similar for other asset classes as well – bonds, oil, etc. All in all, in financial markets, as in a casino, a small edge translates into massive wealth, and most practitioners understand that intuitively.

Comments

  1. Pat says:

    Thanks for sharing. I think you meant to pass bb to the performance functions, not cc (as it does not exist).
    Something kind of interesting: the actual realized bias was about 54% over the rets period, with a CAGR of about 7%. If you plug 54% into the size argument of the mc function, you find that 7% is at the very low end of the range of results! Imagine some oracle with a crystal ball who could call the daily direction at 54% over all those years. He might have expected something like a 16% mean CAGR, yet achieved only the lowest end of the mc CAGR results, along with the high end of the drawdown results – talk about bad luck on top of edge. Maybe that tells us something cautionary about optimizing solely around hit rates.

    df = return.mc(rets, samples=1000, size=as.integer(0.54*NROW(rets)))

    > summary(df,digits=2)
          cagr        sharpe.ratio     max.dd
     Min.   :0.058   Min.   :0.30   Min.   :0.21
     1st Qu.:0.134   1st Qu.:0.70   1st Qu.:0.33
     Median :0.158   Median :0.82   Median :0.38
     Mean   :0.160   Mean   :0.83   Mean   :0.40
     3rd Qu.:0.185   3rd Qu.:0.96   3rd Qu.:0.46
     Max.   :0.307   Max.   :1.60   Max.   :0.74

    > Return.annualized(rets,scale=252)
                      GSPC.Close
    Annualized Return  0.0713289
    > maxDrawdown(rets)
    [1] 0.5677539

    1. quintuitive says:

      Thanks for finding the “cc” bug – it was meant to be “cc = as.numeric(bb)”.

      About buy and hold – although it seems “lucky” in terms of guess rate (54%), it is, as you observed, quite “unlucky” in terms of returns. The asymmetry of the returns (returns in a bear vs returns in a bull market) is a plausible explanation. Likewise, at a 50% guess rate one loses money on average: the arithmetic edge is zero, but compounding drags the geometric return below zero.
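The 50% case is plain volatility drag: one right guess and one wrong guess of the same magnitude multiply wealth by (1 + r)(1 − r) = 1 − r², which is below one. A quick check with a made-up 1% daily move:

```python
r = 0.01                     # an assumed typical 1% daily move
pair = (1 + r) * (1 - r)     # one right guess, then one wrong guess
year = pair ** 126           # 126 win/loss pairs = 252 trading days
print(round(pair, 4), round(year, 3))
```

Each win/loss pair costs r² of wealth, so the coin-flip guesser ends the year down about 1.3% despite a zero arithmetic edge.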

      Last but not least, luck is a huge factor on the drawdown front as well. Imagine running into the 40% of losing days right off the bat – the drawdown would be nearly 100%. The losses are also massive if the 60% of days guessed correctly happen to be the days with the smallest returns.
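The ordering effect on the drawdown is easy to demonstrate: the same set of daily returns produces very different drawdowns depending on when the losses land. A sketch with synthetic returns; `max_drawdown` here is a hypothetical helper, not the PerformanceAnalytics function:

```python
import numpy as np

def max_drawdown(daily):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = np.cumprod(1 + daily)
    peaks = np.maximum.accumulate(equity)
    return float(np.max(1 - equity / peaks))

rng = np.random.default_rng(7)
mags = np.abs(rng.normal(0, 0.01, 252))                  # made-up daily magnitudes
signs = rng.choice([1.0, -1.0], size=252, p=[0.6, 0.4])  # 60% hit rate
daily = signs * mags

shuffled = daily.copy()
rng.shuffle(shuffled)                # wins and losses interleaved
losses_first = np.sort(daily)       # every losing day lands at the start

print(round(max_drawdown(shuffled), 3), round(max_drawdown(losses_first), 3))
```

Same returns, same hit rate, same CAGR – but front-loading the losing days produces a far deeper drawdown than a typical interleaving.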

  2. Thank you for sharing.
    I simulated different success rates over at my blog “Data Shenanigans”, mainly to introduce people to parallel computing, but also to show how a success rate of roughly 55% leads to an exploding account value.
    If you want to find out more, please visit: https://datashenanigan.wordpress.com/2015/09/23/simulating-backtests-of-stock-returns-using-monte-carlo-and-snowfall-in-parallel/
    David
