The previous post in this series showed a way to identify trading opportunities. The approach I implemented used daily time series data to identify good entry points in terms of risk-reward. The natural next step is to try to exploit these opportunities using machine learning.

To refresh: the output of the previous post was a time series with three possible states: short, none, long. Each state indicates whether an opportunity for a sizeable return exists over the next few days. The series was generated using data from the future, so it is only available in hindsight. That’s the dependent variable.

What data shall we use for the independent variables, the predictors? No idea. 🙂 To get something going, I decided to try a few features which might have some influence:

- The returns of each of the last five days.
- The returns over the last 252 (a year), 63 (a quarter), 21 (a month) and 5 (a week) days. This is the momentum, or the rate of change.
- The channel position (as a fraction) over the last 252 (a year), 63 (a quarter), 21 (a month) and 5 (a week) days. In other words, take the min and the max over the period and compute where within the channel the last close is. For instance, if the last close is a new high, the value of this column is 1. A new low is 0.
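To make the channel position concrete, here is a minimal base R sketch for a single window. The helper name `channel.position` and the closing prices are made up for illustration; the actual feature code uses rolling `runMin`/`runMax` over the whole series.

```r
# Hypothetical helper: where the last close sits inside the min/max channel
# of the trailing n closes (0 = new low, 1 = new high).
channel.position = function(close, n) {
   stopifnot(length(close) >= n)
   window = tail(close, n)
   lo = min(window)
   hi = max(window)
   # Note: a completely flat window gives 0/0, i.e. NaN
   (tail(close, 1) - lo) / (hi - lo)
}

# Made-up closing prices
channel.position(c(10, 12, 11, 9, 13), 5)   # last close is the 5-day high: 1
channel.position(c(10, 12, 11, 13, 9), 5)   # last close is the 5-day low: 0
channel.position(c(10, 12, 8, 11), 4)       # (11 - 8) / (12 - 8) = 0.75
```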

All returns were normalized to daily and scaled by volatility, estimated as the square root of an exponential moving average of the squared returns. For those who find code easier to read, here is pretty much the final version of the function that builds the features:

    library(quantmod)  # has.Cl, Cl; also loads TTR (ROC, EMA, runMin, runMax) and xts

    build.features = function(xx, days = 1:5, vola.len = 35) {
       stopifnot(has.Cl(xx))

       close = Cl(xx)

       rets = ROC(close, n = days[1], type = "discrete")
       # Scale by volatility: square root of the EMA of squared returns
       arets = rets / sqrt(EMA(rets * rets, n = vola.len))

       # Subtract one because we want to align at the end of the day
       res = na.trim(lag.xts(arets, k = days[1] - 1))
       colNames = c(paste(sep = "", days[1], "D_LAG"))
       for (nn in tail(days, -1)) {
          res = merge(res, na.trim(lag.xts(arets, k = nn - 1)), all = FALSE)
          colNames = append(colNames, paste(sep = "", nn, "D_LAG"))
       }

       # Annual, quarterly, monthly and weekly:
       #    1. returns (normalized to daily)
       #    2. min/max channel position
       cc = c(252, 63, 21, 5)
       for (nn in cc) {
          # Compute the return
          rr = ROC(close, n = nn, type = "discrete")
          # Normalize to daily return
          rr = (1 + rr)^(1/nn) - 1
          # Normalize by volatility
          rr = rr / sqrt(EMA(rets * rets, n = vola.len))

          res = merge(res, na.trim(rr), all = FALSE)
          colNames = append(colNames, paste0(nn, "D_RET"))

          # Position of the close within the min/max channel of the last nn days
          lo = runMin(close, n = nn)
          hi = runMax(close, n = nn)
          res = merge(res, na.trim((close - lo) / (hi - lo)), all = FALSE)
          colNames = append(colNames, paste0(nn, "D_CHANNEL"))
       }

       res = reclass(res, xx)
       colnames(res) = colNames

       return(res)
    }
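The two normalizations in `build.features` can be checked by hand on tiny inputs. The sketch below uses base R only; `ema` and `daily.ret` are hypothetical helpers written for this illustration, and the hand-rolled `ema` (seeded with the mean of the first `n` values, smoothing factor `2/(n+1)`) is an assumption that may differ in detail from `TTR::EMA`.

```r
# Hypothetical helper: simple EMA, seeded with the mean of the first n values
ema = function(x, n) {
   stopifnot(length(x) >= n, !anyNA(x))
   alpha = 2 / (n + 1)
   out = rep(NA_real_, length(x))
   out[n] = mean(x[1:n])
   if (length(x) > n) {
      for (ii in (n + 1):length(x)) {
         out[ii] = alpha * x[ii] + (1 - alpha) * out[ii - 1]
      }
   }
   out
}

# Hypothetical helper: convert an nn-day return to an equivalent daily return
daily.ret = function(rr, nn) (1 + rr)^(1/nn) - 1

# Made-up returns of constant size: the volatility estimate equals that size,
# so the scaled returns come out as +1 / -1 once the EMA has warmed up
rets = c(0.01, -0.01, 0.01, -0.01)
vola = sqrt(ema(rets * rets, n = 3))
arets = rets / vola
print(arets)              # NA NA 1 -1

# A 21% return over two days is 10% per day, since 1.1^2 = 1.21
print(daily.ret(0.21, 2))
```

Dividing by the volatility estimate puts quiet and turbulent periods on a comparable scale, which is presumably why the post applies it to every return feature.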

Next, using my parallel framework based on the caret package, I ran a few machine learning methods: *lda*, *qda*, *svm*, *xgb trees* and a few others. Despite the parallelism, pretty much all algorithms took more than a day to finish on about six years of data for a single time series (I used heating oil back-adjusted futures).

Unfortunately, the time spent didn’t translate into meaningful results. The only algorithm which gave somewhat interesting forecasts was *qda*. Everything else produced a pretty much flat (out of the market) forecast for all points. The reason could be that the opportunity series was generated using a conservative approach: only about 9% of the days were identified as opportunities.
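The imbalance argument is easy to see with numbers. The sketch below assumes a made-up sample of 1500 days (roughly six years of trading days) with 9% opportunities; the long/short split is invented purely for illustration.

```r
# Hypothetical label counts: 1500 days, 9% opportunities, made-up long/short split
labels = factor(c(rep("none", 1365), rep("long", 68), rep("short", 67)),
                levels = c("short", "none", "long"))

# A "classifier" that always stays out of the market
always.flat = factor(rep("none", length(labels)), levels = levels(labels))

accuracy = mean(always.flat == labels)
found = sum(always.flat != "none")

print(accuracy)   # 0.91
print(found)      # 0 opportunities flagged
```

A model tuned for plain accuracy can score 91% here without ever leaving the market, so flat forecasts are not surprising with labels this sparse.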

The *qda* generated about 121 non-flat forecasts, a bit fewer than the actual number of opportunities. Out of these 121, 108 were wrong and 13 were spot-on. So why do I say that the method looks promising? The opportunities are defined so that a certain profit target is reached without first hitting a smaller stop loss (for more details, refer to the previous post in this series). If neither the target nor the stop is hit, we exit the market after a specific number of days. Thus, we may still end up making money on a forecast counted as wrong. That’s why it looks promising to me.
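As a quick check on the counts quoted above, the hit rate of the *qda* forecasts works out as follows (variable names are mine):

```r
# Counts quoted in the post
forecasts = 121
correct = 13
wrong = 108
stopifnot(correct + wrong == forecasts)

hit.rate = correct / forecasts
print(round(100 * hit.rate, 1))   # 10.7 (percent)
```

On its own, a hit rate of about 10.7% says nothing about profitability. That depends on the sizes of the profit target and the stop loss, and on what happens in the trades that exit on the time limit.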

So what’s next? Not sure yet, but there are a few directions I can think of:

- Improving the predictors
- A more sophisticated machine learning approach
- Using history from multiple financial instruments for learning

Will you be giving a post on backtesting? Or have any resources for backtesting a simulated portfolio in R?

Not sure about that, but there are some resources/posts on this blog.