A Tensorflow Exercise

A previous post in this series implemented the Walk Forward Loop on top of Microsoft’s CNTK. There was interest in a Google Tensorflow implementation, and since Tensorflow seems to be the more popular framework in this domain, I decided to port what I had already done to Tensorflow.

The full source code is here. It will not work without modifications – it needs data and some of my modules – but these are pretty easy to fix.

tensorflow_fit_predict is where the work is done. The first point of interest is how we use Tensorflow’s computation graph:

    with tf.Graph().as_default():
        input = tf.placeholder(tf.float32, [None, nfeatures])
        label = tf.placeholder(tf.float32, [None, nlabels])

The above code creates a new graph. This is a slight deviation from how Tensorflow is usually used (take a look at pretty much any example), but there is a good reason for it: tensorflow_fit_predict is called in a loop, and if we reuse the default graph on each iteration, the old nodes are not deleted – they stay around – and we run out of memory fairly quickly. I found this out the hard way; the code still contains some of the debug logging I used to investigate the issue.
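
To illustrate the shape of this (a minimal sketch under my own assumptions – the function signature and the session placement below are not copied from the repository), both the graph and the session live and die within a single call:

    import tensorflow as tf

    def fit_predict_once(nfeatures, nlabels):
        # A brand new graph for this call only: nodes from previous
        # walk-forward iterations are not kept alive in the default graph.
        with tf.Graph().as_default():
            input = tf.placeholder(tf.float32, [None, nfeatures])
            label = tf.placeholder(tf.float32, [None, nlabels])
            # ... the network, cost and optimizer are built here ...
            with tf.Session() as sess:
                sess.run(tf.global_variables_initializer())
                # ... training and the out-of-sample prediction run here ...
        # Leaving the 'with' block allows the whole graph to be released.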

Next comes the deep neural network:

    # Inputs: raw features and one-hot labels (as defined above).
    input = tf.placeholder(tf.float32, [None, nfeatures])
    label = tf.placeholder(tf.float32, [None, nlabels])

    # Convolution layer: 32 filters of width 3 over the feature row,
    # followed by max pooling, which halves the width.
    nconv1 = 32
    cw1 = tf.Variable(tf.random_normal([1, 3, 1, nconv1]))
    cb1 = tf.Variable(tf.random_normal([nconv1]))
    conv_input = tf.reshape(input, shape=[-1, 1, nfeatures, 1])
    cl1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv_input, cw1, strides=[1, 1, 1, 1], padding='SAME'), cb1))
    mp1 = tf.nn.max_pool(cl1, ksize=[1, 1, 2, 1], strides=[1, 1, 2, 1], padding='SAME')

    # First fully-connected layer. The flattened size is 378*32:
    # 756 features halved by the pooling, times the 32 convolution channels.
    nhidden1 = 128
    w1 = tf.Variable(tf.random_normal([378*32, nhidden1]))
    b1 = tf.Variable(tf.random_normal([nhidden1]))
    fc_input = tf.reshape(mp1, [-1, w1.get_shape().as_list()[0]])
    l1 = tf.nn.relu(tf.add(tf.matmul(fc_input, w1), b1))

    # Second fully-connected layer.
    nhidden2 = 128
    w2 = tf.Variable(tf.random_normal([nhidden1, nhidden2]))
    b2 = tf.Variable(tf.random_normal([nhidden2]))
    l2 = tf.nn.relu(tf.add(tf.matmul(l1, w2), b2))

    # Output layer: raw logits, one per label.
    w3 = tf.Variable(tf.random_normal([nhidden2, nlabels]))
    b3 = tf.Variable(tf.random_normal([nlabels]))
    model = tf.add(tf.matmul(l2, w3), b3)

    # Softmax cross-entropy cost, minimized with the Adam optimizer.
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=label))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Fraction of correct sign forecasts.
    correct_fores = tf.equal(tf.argmax(model, 1), tf.argmax(label, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_fores, tf.float32))

The network is similar to what I used before, except that this time I also threw in a convolution layer.
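
The listing stops at the graph definition. Below is a hedged sketch of how the ops above could be driven – the epoch count, batch size and the xs_train/ys_train/xs_test names are my assumptions, not the exact training code from the repository:

    # Assumed settings; the actual values are not part of the listing above.
    training_epochs, batch_size = 20, 128

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(training_epochs):
            # Mini-batch passes over the training window.
            for start in range(0, len(xs_train), batch_size):
                xs_batch = xs_train[start:start + batch_size]
                ys_batch = ys_train[start:start + batch_size]
                sess.run(optimizer, feed_dict={input: xs_batch, label: ys_batch})
        # In-sample accuracy via the 'accuracy' op defined above.
        train_acc = sess.run(accuracy, feed_dict={input: xs_train, label: ys_train})
        # Out-of-sample forecast: the predicted class for the test rows.
        forecasts = sess.run(tf.argmax(model, 1), feed_dict={input: xs_test})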

As for the experiments, the first thing I tried was to feed the network raw data – no engineered features, in other words: the volatility-adjusted returns of the last three years (756 features per row), with the sign of the following day’s return as the label. The first run was quite promising – the correct sign guesses were more than 52%! Keep in mind, HO is a coin toss. The second run was disappointing (yeah, you gotta do that stuff in real life) – the correct guesses dropped to 50.71%. Furthermore, the overlap between the first and the second run was only 52%. That’s a lot of randomness, and it’s unacceptable – likely the next thing to focus on. Either that, or, as they say: “garbage in, garbage out” (meaning feature engineering is paramount).
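
For reference, here is a rough sketch of how such a dataset could be assembled – the build_dataset helper, the naive full-sample volatility scaling and the two-class one-hot encoding are my own illustrative assumptions, not the exact preprocessing behind the results above:

    import numpy as np

    def build_dataset(returns, window=756):
        # Illustrative volatility adjustment: simple full-sample scaling
        # stands in for whatever adjustment is actually used.
        adj = np.asarray(returns, dtype=np.float32) / np.std(returns)
        xs, ys = [], []
        for t in range(window - 1, len(adj) - 1):
            xs.append(adj[t - window + 1:t + 1])  # the last 756 daily returns, through day t
            up = adj[t + 1] > 0                   # sign of the following day's return
            ys.append([1.0, 0.0] if up else [0.0, 1.0])
        return np.array(xs, dtype=np.float32), np.array(ys, dtype=np.float32)

With a sign label like this, nlabels in the network above would be 2.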
