Walk-Forward Strategy Performance

The previous post introduced forecasting using multiple series and also suggested another possible improvement – filtering out low-probability forecasts. Can either of these approaches improve the forecasts, and if so, by how much?

Let’s start with the most basic scenario. Using the heating oil continuous contract and the forecasted and actual “opportunities” (for more details see here, here and here), we can build the confusion matrix:

                 Forecasted
Actual    Short   Out   Long
Short       107   277     90
Out         100   544    112
Long         73   253     82
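
For reference, a matrix like this can be built directly with scikit-learn’s confusion_matrix. The sketch below uses made-up placeholder labels rather than the actual data:

from sklearn.metrics import confusion_matrix

labels = ["Short", "Out", "Long"]

# Hypothetical stand-ins for the realized and forecasted opportunities.
actual = ["Short", "Out", "Long", "Out", "Short"]
forecasted = ["Short", "Out", "Short", "Out", "Long"]

# Rows correspond to the actual labels, columns to the forecasted ones.
print(confusion_matrix(actual, forecasted, labels=labels))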

Besides the confusion matrix, there are many other metrics to choose from in the scikit-learn module. However, all of them are general purpose and fail to fully capture the domain-specific knowledge of this problem. What is special here is that we can ignore any error which leaves us out of the market when there is an opportunity. The reason is simple – we can’t lose money by staying out. One simple way to quantify this is to compare good vs bad forecasts at the interesting cells of the matrix:

# The input is the confusion matrix. The order for the rows,
# and the columns, is "Short", "Out", "Long" (rows are actual,
# columns are forecasted).
def metric1(aa):
    # Correct directional calls.
    good = aa[0, 0] + aa[2, 2]
    # Positions taken in the wrong direction, or when there was no
    # opportunity. Errors that forecast "Out" are ignored – staying
    # out of the market can't lose money.
    bad = aa[0, 2] + aa[1, 0] + aa[1, 2] + aa[2, 0]
    return good / bad
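
As a quick sanity check, applying the metric to the matrix above reproduces the roughly 50% reported for the single-series run below:

import numpy as np

# The single-series confusion matrix from the table above.
cm = np.array([[107, 277,  90],
               [100, 544, 112],
               [ 73, 253,  82]])

print(metric1(cm))  # (107 + 82) / (90 + 100 + 112 + 73) ~= 0.504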

Based on this metric, we get the following results:

Strategy                Good vs Bad
Single Series           50%
Multiple Series         58%
Multiple with Pruning   54%

The devil is usually in the details. Nothing really conclusive, but a few observations:

  • For the multiple series run, I simply added the oil contract, so there were two series in total. Heating oil and oil are correlated most of the time, but not extremely so. Adding more series brought diminishing returns, mostly because the “Out” forecasts became more and more predominant.
  • For the probability pruning, I tried different thresholds (a sketch of the pruning step follows this list). Nothing really worked, which is consistent with my experience in other similar situations.
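
For illustration, here is one way such pruning could look. This is a minimal sketch, assuming the forecasts come from a classifier exposing predict_proba (as most scikit-learn classifiers do); the function name and the threshold value are mine, not from the original code:

import numpy as np

def prune_forecasts(proba, classes, threshold=0.45):
    # proba:   (n_samples, n_classes) array, e.g. clf.predict_proba(X)
    # classes: label of each proba column, e.g. clf.classes_
    classes = np.asarray(classes)
    preds = classes[np.argmax(proba, axis=1)]
    # If the winning class is not convincing enough, stay out.
    preds[np.max(proba, axis=1) < threshold] = "Out"
    return preds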

We’ve got some results now, but without an actual back-test, how can we say whether they are worth it? It’s hard to tell, but these results seem quite promising to me. We need to return to the opportunity definition. In that definition, we assume some sort of a stop loss, a profit target and a time exit. There is a vague relationship between the stop loss and the profit target: the profit target is generally larger than the stop loss, but not by a factor of two most of the time. Since we aren’t getting more than 60% on the Good vs Bad metric (and break-even at a 1:1 profit target to stop loss would require 1:1, i.e. 100%), it is hard to argue from the metric alone that the strategies are making money (a rough break-even sketch follows the list below). Still, my gut feeling tells me that these strategies are already likely to be profitable, for two reasons:

  • The profit target, when hit, is big – an observation from scanning through a few opportunity data frames.
  • Not all wrong positions will hit the stop loss – some of them will simply exit via the time exit after 15 days.
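
To make the break-even argument concrete, here is a back-of-the-envelope sketch (my arithmetic, not a measured result): with a Good vs Bad ratio r, the fraction of winners among the decided trades is r/(1+r), so with a profit target of T stop-loss units the expectancy per trade turns positive once T > 1/r. Time exits are ignored, which makes this pessimistic:

def break_even_target(good_vs_bad):
    # Profit target (in stop-loss units) needed to break even:
    #   p = r / (1 + r) winners, expectancy = p*T - (1 - p),
    #   which is zero when T = 1/r.
    return 1.0 / good_vs_bad

# With the 58% ratio of the multiple-series run, the profit target
# needs to be about 1.7x the stop loss just to break even.
print(break_even_target(0.58))  # ~1.72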

For a more serious discussion, we need a real back-test. Another option is to tweak the profit target in the opportunity labeling in such a way that it guarantees a certain ratio against the stop loss.
