Walk-Forward Analysis for Multiple Series

The examples in the previous posts in this series (see here, here, and here) all use a single series: the 10-year daily history of the back-adjusted heating oil contract. A further extension is to try to learn patterns from multiple series.

At each step of the walk-forward loop, we consider all previous data points as history, i.e. information we can use to forecast the current point. Also, each sample (a data frame row containing features) is independent of the others. Thus, we can simply interleave the various time series, sorting them by the time component. We only need to add a “symbol” column, so that we know which series each sample came from. Here is some sample code:

import pandas as pd

# all_data: a dictionary with a few series per instrument
# series: a list of strings, contains the symbols of the series to stack
def stack_series(all_data, series):
    res = None
    for ss in series:
        # First concatenate the "response" and the "features"
        tt = pd.concat([all_data[ss]['full']['entry'], all_data[ss]['features']], axis=1).dropna()
        # Insert the symbol (repeated) as the first column
        tt.insert(0, 'symbol', pd.Series(ss, index=tt.index))
        # stack the results
        if res is None:
            res = tt
        else:
            # DataFrame.append was removed in pandas 2.0, so use pd.concat
            res = pd.concat([res, tt])
    # Sort the stacked data frame by the time index
    res = res.sort_index()
    return res

# An example of how to use the above function
aa = stack_series(all_data, ['HO2','CL2','RB2'])

The above code produces the following data frame:

This data frame can be used as input to the walk-forward framework developed in the previous posts. There is one detail to take care of, though. Until now, all the example code has gone through the series point by point. We could do the same for the stacked data frame, but that would be wrong. Why?

Data snooping, i.e. looking at future data. Going point by point, the heating oil (HO2) row for 2016-09-23 would already be part of the history when we predict the direction for crude oil (CL2) for the same date. In other words, we must exclude the HO2 reading for 2016-09-23 from that forecast. This is one example where the ForecastLocations abstraction developed earlier comes in handy. Simply set each period to start on the first occurrence of a given date and to end on the last occurrence. All forecasts for a given date will then use only points with earlier dates as history. Clean and beautiful!
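To make the idea concrete, here is a minimal sketch of how the per-date periods could be computed for a stacked, date-sorted data frame. The interface of ForecastLocations is not shown in this post, so the helper below, `date_periods`, is a hypothetical stand-in rather than the framework's own code, and the toy data frame is made up purely for illustration.

```python
import pandas as pd

def date_periods(index):
    """For a sorted datetime index, return one row per distinct date with
    the first ('start') and last ('end', inclusive) row position of that date.
    Hypothetical helper, not part of the walk-forward framework itself."""
    dates = pd.DatetimeIndex(index).normalize()
    positions = pd.Series(range(len(dates)))
    grouped = positions.groupby(dates.values)
    return pd.DataFrame({'start': grouped.min(), 'end': grouped.max()})

# Toy stacked frame: three symbols on each of two dates (made-up values)
idx = pd.to_datetime(['2016-09-22'] * 3 + ['2016-09-23'] * 3)
toy = pd.DataFrame({'symbol': ['CL2', 'HO2', 'RB2'] * 2,
                    'entry': [1, -1, 1, -1, 1, 1]}, index=idx)

periods = date_periods(toy.index)

history_sizes = []
for p in periods.itertuples():
    # History: every row strictly before the first row of this date
    history = toy.iloc[:p.start]
    # Rows to forecast: all symbols observed on this date
    current = toy.iloc[p.start:p.end + 1]
    history_sizes.append(len(history))
    print(p.Index.date(), 'history rows:', len(history), 'forecast rows:', len(current))
```

With periods defined this way, no row from 2016-09-23 ever appears in the history used for another 2016-09-23 forecast, regardless of how the symbols happen to be ordered within the date.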
