I plan to use Python for the rest of this series. The main reasons are algorithm-related and irrelevant for the time being. In the meantime, I decided to rewrite some of the code I posted recently, and I found the experience rather surprising.

The experience was quite positive, but let me explain: I have always liked Python for scripting, but this time I enjoyed working on a more data-science type of project. Over the years I have tried using Python for what I use R for in daily life, but this was the first time that using Python didn’t feel like using a crutch. Not only that, there are certain tasks where I really appreciated Python’s approach. Combine that with the fact that Python is a general-purpose language with a really nice syntax (something I can’t say about R) and I think I’ve got a new winner.

The full code is on GitHub. Here is the labeling function:

    import pandas as pd
    import numpy as np
    import instrumentdb as idb
    import math
    import pickle

    def good_entries(ohlcv, min_days = 3, days_out = 15, vola_len = 35, days_pos = 0.6, stop_loss = 1.5):
        if days_out <= min_days:
            raise RuntimeError('days_out must be greater than min_days.')

        hi = ohlcv['high']
        lo = ohlcv['low']
        cl = ohlcv['close']

        # Compute returns
        rets = cl.pct_change()
        erets = rets.pow(2).ewm(span=vola_len).mean().pow(1/2)
        # erets = rets.ewm(span=vola_len).mean()

        res = np.zeros(len(erets))
        days = np.zeros(len(erets))

        for ii in range(min_days, days_out):
            hh = hi.rolling(window = ii).max().shift(-ii)
            ll = lo.rolling(window = ii).min().shift(-ii)
            hi_ratio = (hh/cl - 1)/erets
            lo_ratio = (ll/cl - 1)/erets

            dd = math.ceil(days_pos * ii)

            longs = (hi_ratio > dd) & (-lo_ratio < stop_loss)
            longs = np.where(longs.notnull() & (longs != 0), 1, 0)

            shorts = (-lo_ratio > dd) & (hi_ratio < stop_loss)
            shorts = np.where(shorts.notnull() & (shorts != 0), -1, 0)

            both = np.where(longs == 1, 1, shorts)

            new_days = ii*((res == 0) & (both != 0)).astype(int)
            res = np.where(res != 0, res, both)
            days = np.where(days != 0, days, new_days)

        full_df = pd.DataFrame({'entry' : res, 'days' : days, 'erets' : erets}, index = ohlcv.index)
        oppo_df = full_df[full_df['entry'] != 0]

        return {'full' : full_df, 'oppo' : oppo_df}

The code is succinct and easy to read. It was easy, intuitive, straightforward and fun to re-write. What else can one ask for?
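
The least obvious part of the function is probably the look-ahead window. A rolling maximum is trailing by construction, and shifting it back by the window length turns it into the maximum over the days that follow. Here is a minimal sketch on a made-up series (the prices and the window length are purely illustrative):

```python
import pandas as pd

# Toy "high" prices; the values are made up for illustration.
hi = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 13.0])

ii = 2  # look-ahead window in days

# rolling(...).max() looks back over the last ii rows, including the current one.
# shift(-ii) moves each result ii rows earlier, so row t ends up holding
# max(hi[t+1], ..., hi[t+ii]), i.e. the highest high over the next ii days.
forward_max = hi.rolling(window=ii).max().shift(-ii)

print(forward_max.tolist())
```

The last `ii` rows come out as NaN because there is no future data for them. Comparisons against NaN evaluate to False, so those rows never get labeled as entries.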

There are differences, of course. One thing to watch: despite its name, pandas’ *pct_change* returns fractions, not percentages (0.012 rather than 1.2), so the returns in the code above are used as-is, with no rescaling. What I especially like is pandas’ new-ish syntax around the exponential and rolling functionality. At the same time, I pity the souls who used the old syntax and now have to migrate to the new. 🙂
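
For readers who have not seen the method-style window syntax, here is a minimal sketch on a toy series. Each call first returns an intermediate window object, and the aggregation is chained onto it. This style replaced the older module-level helpers such as `pd.rolling_mean` and `pd.ewma`. The series and the parameter values below are illustrative only.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Trailing 3-row window; the first two rows are NaN because the window is not full yet.
roll_mean = s.rolling(window=3).mean()

# Exponentially weighted mean; span=3 corresponds to alpha = 2 / (span + 1) = 0.5.
ewm_mean = s.ewm(span=3).mean()

print(roll_mean.tolist())
print(ewm_mean.round(4).tolist())
```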

erets are calculated differently; the corresponding function would be EMA from talib. Also, for ii in range(min_days, days_out) goes up to 14 only; to match the R code one should use days_out+1. So the results are very different. Is that intended, because you’ve changed your assumptions?
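
The off-by-one raised here comes from Python's `range` excluding its upper bound. A quick check, using the default values from `good_entries`:

```python
min_days, days_out = 3, 15  # defaults from good_entries

# range() stops before its second argument, so the largest window is days_out - 1.
windows = list(range(min_days, days_out))
print(windows[-1])

# Adding 1 to the upper bound makes days_out itself the largest window.
windows_inclusive = list(range(min_days, days_out + 1))
print(windows_inclusive[-1])
```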

Calculating erets was not the purpose of this post. The approach in this post is what I have been using lately. It’s pretty much what Robert Carver has in his book “Systematic Trading”, but in general, it’s based on the standard deviation formula assuming zero mean.

The min_days and days_out parameters are not related to the EMA; they are simply how many days I want to give opportunities to develop. Unfortunately, with this approach, pretty much all opportunities are on the shortest limit.
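
To make the "standard deviation assuming zero mean" idea concrete: the volatility estimate is the square root of an exponentially weighted average of squared returns, with no mean subtracted. The sketch below checks the pandas one-liner from the function against a by-hand calculation for the last observation. The price series and the short span are made up, and the by-hand weights assume pandas' default `adjust=True` behaviour.

```python
import numpy as np
import pandas as pd

# Toy closing prices; the values are made up for illustration.
cl = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0, 101.0])
vola_len = 3  # short span so the example stays readable

# Drop the leading NaN so the by-hand weights line up with the data.
rets = cl.pct_change().dropna()

# Same expression as in good_entries: sqrt of the EW mean of squared returns.
erets = rets.pow(2).ewm(span=vola_len).mean().pow(1/2)

# By hand: the newest return gets weight 1, and each older one is
# discounted by a further factor of (1 - alpha).
alpha = 2.0 / (vola_len + 1)
n = len(rets)
weights = (1 - alpha) ** np.arange(n - 1, -1, -1)
by_hand = np.sqrt(np.sum(weights * rets.to_numpy() ** 2) / np.sum(weights))

print(erets.iloc[-1], by_hand)
```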

Yes, I understand that. I just wanted to clarify that the R code from http://www.quintuitive.com/2016/08/13/labeling-opportunities-price-series/ and this Python version produce very different results. Thanks a lot for the ForecastLocations class. I use it extensively now, with some mods. 🙂