Markets are very good at absorbing and reflecting information. If you think otherwise, try making money by trading. If you are new to it, make sure you don’t bet the house.

In other words, markets are efficient. At least most of the time. So then why do people trade? The general belief is that there are windows during which the prices of certain assets are inefficient, and thus opportunities to make money. Is the presence of autocorrelation one such opportunity? Let’s find out.

To keep things simple, we will use the standard R *acf* function to compute the autocorrelation.

```r
require(quantmod)

spy = getSymbols("SPY", from="1900-01-01", auto.assign=F)

# Note: We use the adjusted close - it's unrealistic to expect anything from
# the actual close in the presence of dividends
spy.rets = ROC(Ad(spy), na.pad=F)

aa = acf(tail(spy.rets, 500), main="ACF computed over the last 500 days")
head(aa$acf)
# [1]  1.000000000  0.051116484 -0.037469705 -0.010014871 -0.126667484 -0.004113005
```

This will produce the following chart:

It shows the autocorrelation coefficients at different lags. The first value is the correlation of the series with itself (lag 0), which is always 1. The second value (0.051116484) is the correlation of the series with the series lagged by one day.
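To make the lag-1 value concrete, here is a small sketch (using simulated data rather than SPY) verifying that it roughly matches the correlation of the series with its one-step-shifted self:

```r
# Illustrative sketch with simulated data (not SPY): the acf value at lag 1
# is (up to a slightly different divisor) the correlation of the series
# with itself shifted by one observation.
set.seed(42)
x = rnorm(500)

aa = acf(x, plot=FALSE)
acf.lag1 = aa$acf[2]   # index 1 is lag 0 (always 1), index 2 is lag 1

# Manual version: correlate x[t] with x[t-1]
manual.lag1 = cor(x[-1], x[-length(x)])

print(acf.lag1)
print(manual.lag1)   # close to acf.lag1
```

The two numbers differ slightly because `acf` normalizes by n rather than n-k, but for a 500-observation series the difference is negligible.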

The two dashed lines are the confidence intervals for the lags. They are of special interest since we are going to use them to decide when there is significant autocorrelation. How are they computed? To find the answer, I had to look at acf’s source code:

```r
# xx is the series, conf.level is the confidence level - think 0.95 for instance
conf = qnorm((1 + conf.level)/2)/sqrt(sum(!is.na(xx)))
```

The above is nothing else but computing a confidence interval for a standard normal distribution, scaled by the square root of the sample size. It still puzzles me (I couldn’t find an answer quickly when I thought about it) why the correlation coefficients at the different lags are normally distributed, but that’s irrelevant here.
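A quick simulation sketch makes the confidence band plausible: for i.i.d. noise of length n, the lag-1 sample autocorrelation should land inside ±qnorm(0.975)/sqrt(n) about 95% of the time, which is exactly the band acf draws.

```r
# Simulation sketch: for i.i.d. noise of length n, the lag-1 sample
# autocorrelation is approximately N(0, 1/n), so about 95% of draws
# should fall inside +/- qnorm(0.975)/sqrt(n).
set.seed(1)
n = 500
trials = 2000

rho1 = replicate(trials, {
  x = rnorm(n)
  acf(x, lag.max=1, plot=FALSE)$acf[2]
})

conf = qnorm(0.975)/sqrt(n)
coverage = mean(abs(rho1) < conf)
print(coverage)   # close to 0.95
```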

The last question is how to trade extreme autocorrelation – do we bet that the autocorrelation persists, or do we bet that it fades? There are two variables here: the sign of the correlation and the sign of the last day’s return. A table is helpful.

| Correlation Sign | Return Sign | Persisting Signal | Fading Signal |
|---|---|---|---|
| 1 | 1 | 1 | -1 |
| 1 | -1 | -1 | 1 |
| -1 | 1 | -1 | 1 |
| -1 | -1 | 1 | -1 |
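The table collapses to a one-liner: the persisting signal is just the product of the two signs, and the fading signal is its negation. A small illustrative helper (not part of the original code):

```r
# Illustrative helper (an assumption, not from the backtest code below):
# persisting signal = sign(correlation) * sign(return); fading negates it.
trade.signal = function(corr.sign, ret.sign, fade=FALSE) {
  s = sign(corr.sign) * sign(ret.sign)
  if (fade) -s else s
}

# Reproduces the table rows:
trade.signal( 1,  1)              # persisting:  1
trade.signal( 1, -1)              # persisting: -1
trade.signal(-1, -1)              # persisting:  1
trade.signal( 1,  1, fade=TRUE)   # fading:     -1
```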

My gut tells me to go with the fading – markets are efficient, especially this one. So, let’s run a quick backtest, putting all this mumbo-jumbo into code.

```r
high.acf = function(xx, conf.level=0.95, lag=1) {
   aa = acf(xx, plot=F)
   conf = qnorm((1 + conf.level)/2)/sqrt(sum(!is.na(xx)))
   if(abs(aa$acf[lag+1,1,1]) > conf) sign(aa$acf[lag+1,1,1]) else 0
}

backtest.acf = function(rets, n=21, conf.level=0.95, lag=1, fade=F, dates="2004/2013") {
   # Rolling test for significant autocorrelation over a window of n+lag returns
   aa = na.trim(rollapplyr(rets, width=n+lag, FUN=high.acf, conf.level=conf.level, lag=lag))
   bb = merge(rets, aa, all=F)

   # Position: product of the last return's sign and the correlation's sign;
   # flip it when fading
   ind = sign(bb[,1]*bb[,2])
   if(fade) ind = -ind

   cc = merge(rets, lag.xts(ind), all=F)
   dd = cc[,1]*cc[,2]
   strat = dd[dates]

   n.win = NROW(which(as.numeric(strat) > 0, arr.ind=T))
   n.trades = NROW(which(as.numeric(strat) != 0, arr.ind=T))
   str = paste(round(n.win/n.trades*100, 2), "% [", n.win, "/", n.trades, "]", sep="")
   print(str)
}

# About 3 (trading) months of history
backtest.acf(spy.rets, dates="2004/2013", fade=T, n=63)
# [1] "54.88% [45/82]"
```

The percentage looks ok, but the sample is small (only 82 opportunities over 10 years). Is there some opportunity here? It’s too early to tell, but certainly worth looking into further, IMO.

Hi: For an i.i.d. sequence, the sample autocorrelations are asymptotically N(0, 1/n). I forget who proved this, but it’s a useful result because all the tests for non-i.i.d. behavior (portmanteau, Ljung-Box) were developed from it.

Last thing: Page 13 of the link below gives the statement and the reference.

http://www.statistica.unimib.it/utenti/p_matteo/lessons/SSE/stationarity.pdf

Thanks a lot. Now I can figure out how easy the proof that escaped me is. 🙂

If I understand what you are doing (not sure that I do), the idea is to wait until significant autocorrelation is detected, and then trade on it. Here’s my concern: the more data we look at, the higher the chance that something will appear significant even when it isn’t. Let’s say I draw a sample of size 100 from a standard normal distribution and calculate the mean. The standard error is 1/sqrt(100), which is 1/10. So there’s approximately a 5% chance of my sample having a mean of 0.2 or higher (one-tailed test). My point is that if we draw 20 samples, our chance of having one (or more) with a mean of 0.2 is much higher than 5%. The conf value in the routine takes the square root of the sample size, but I think this may only be accounting for one sample.
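The commenter’s multiple-testing concern can be illustrated with a small simulation sketch (pure noise, parameters chosen arbitrarily for illustration): test 20 independent windows for a “significant” lag-1 autocorrelation and count how often at least one fires, even though there is nothing to find.

```r
# Sketch of the multiple-testing concern: with 20 independent noise windows,
# each tested at the 5% level, the chance that at least one window looks
# "significant" is roughly 1 - 0.95^20, i.e. far above 5%.
set.seed(7)
n.windows = 20
window = 100
conf = qnorm(0.975)/sqrt(window)

hits = replicate(1000, {
  any(replicate(n.windows, {
    x = rnorm(window)
    abs(acf(x, lag.max=1, plot=FALSE)$acf[2]) > conf
  }))
})

print(mean(hits))   # far above 0.05
```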

A good point. Let’s turn the question around a bit, though. If the event occurred X number of times, is there a number Y &lt; X which should make us strongly suspicious that there is a real pattern? (After checking the code for bugs, that is.) I think one can make the case that such a number exists.