Trading with Support Vector Machines (SVM)

by ivannp on November 30, 2012

Finally all the stars have aligned and I can confidently devote some time to back-testing new trading systems. Support Vector Machines (SVM) are the new “toy” which is going to keep me busy for a while.

SVMs are a well-known tool from the area of supervised Machine Learning, and they are used both for classification and regression. For more details refer to the literature.

It seems to me that the most intuitive application for trading is regression, so let’s start by building an SVM regression model.

Following our experience with ARMA+GARCH models, we will start by trying to forecast returns instead of prices. Likewise, in our first tests, we will use only the returns of the previous 5 days as the features determining the return of a particular day. We will start with a history of 500 days as the training set.

In more mathematical terms, the training set consists of M samples, each described by N features, together with the M corresponding responses.

\begin{pmatrix} F^{1}_{1} & F^{2}_{1} & F^{3}_{1} & ... & F^{N}_{1} \\ F^{1}_{2} & F^{2}_{2} & F^{3}_{2} & ... & F^{N}_{2} \\ F^{1}_{3} & F^{2}_{3} & F^{3}_{3} & ... & F^{N}_{3} \\ \cdots \\ F^{1}_{M} & F^{2}_{M} & F^{3}_{M} & ... & F^{N}_{M} \end{pmatrix} \Rightarrow \begin{pmatrix} R_{1} \\ R_{2} \\ R_{3} \\ \cdots \\ R_{M} \end{pmatrix}

Given a row of feature values from the left matrix, the SVM is trained to produce the corresponding response value. In our specific example, we have five columns (features), each column corresponding to the returns with a different lag (from 1 to 5). We have 500 samples and the corresponding responses.

\begin{pmatrix} F^{1}_{1} & F^{2}_{1} & F^{3}_{1} & F^{4}_{1} & F^{5}_{1} \\ F^{1}_{2} & F^{2}_{2} & F^{3}_{2} & F^{4}_{2} & F^{5}_{2} \\ F^{1}_{3} & F^{2}_{3} & F^{3}_{3} & F^{4}_{3} & F^{5}_{3} \\ \cdots \\ F^{1}_{500} & F^{2}_{500} & F^{3}_{500} & F^{4}_{500} & F^{5}_{500} \end{pmatrix} \Rightarrow \begin{pmatrix} R_{1} \\ R_{2} \\ R_{3} \\ \cdots \\ R_{500} \end{pmatrix}

Once the SVM is trained on this set, we can start feeding it with sets of five features, corresponding to the returns for the five previous days, and the SVM will provide us with the response, which is the forecasted return. For example, after training the SVM on the previous 500 days, we will use the returns for days 500, 499, 498, 497 and 496 (these are our F^{1}_{501}, F^{2}_{501}, F^{3}_{501}, F^{4}_{501} and F^{5}_{501}) as the input to obtain the forecasted return for day 501.
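As a sketch of how such a training set can be assembled (illustrative code, not taken from the accompanying source), base R’s embed does exactly this windowing:

```r
# Build the feature matrix of lagged returns: row t holds the returns
# of the 5 previous days; the response is the return of day t itself.
buildLagged = function(rets, lags=5) {
   emb = embed(as.numeric(rets), lags + 1)
   list(response = emb[, 1],              # return of day t
        features = emb[, 2:(lags + 1)])   # returns of days t-1 ... t-lags
}
```

With a 505-day return series and lags=5 this yields exactly the 500-by-5 matrix and 500-element response vector from the formula above.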

From all the packages available in R, I decided to choose the e1071 package. A close second choice was the kernlab package, which I am still planning to try in the future.
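For reference, a minimal e1071 regression fit/predict cycle looks like this (a toy sketch with random data, not the code from the post):

```r
library(e1071)

set.seed(42)
x = matrix(rnorm(500 * 5), ncol=5)   # 500 samples, 5 lagged-return features
y = rnorm(500)                       # the 500 responses

# Train an epsilon-regression SVM with a radial kernel (the defaults)
fit = svm(x, y, type="eps-regression", kernel="radial")

# Feed a new row of five features to obtain the forecast for "day 501"
pred = predict(fit, matrix(rnorm(5), ncol=5))
```

In practice one would tune the kernel parameters (e.g. via e1071’s tune function) rather than rely on the defaults.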

Then I tried a few strategies. First I tried something very similar to the ARMA+GARCH approach – the lagged returns from the five previous days. I was quite surprised to see this strategy performing better than the ARMA+GARCH (this is the home ground of the ARMA+GARCH, and I would have been quite happy just with comparable performance)!

Next, I tried the same five features, but attempting to select the best subset. The selection was done using a greedy approach: starting with zero features, and iteratively adding the feature which reduces the error most. This approach improved things further.
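The greedy forward selection can be sketched along these lines (illustrative; `cvError` stands for an assumed helper that returns the cross-validation error of a model fit on the given feature columns):

```r
# Greedy forward feature selection: keep adding the single feature that
# lowers the error most; stop when no remaining feature improves it.
greedySelect = function(data, response, cvError) {
   selected = c()
   remaining = seq_len(ncol(data))
   best = Inf
   repeat {
      errs = sapply(remaining, function(f)
                    cvError(data[, c(selected, f), drop=FALSE], response))
      if(min(errs) >= best) break      # no improvement - stop
      best = min(errs)
      picked = remaining[which.min(errs)]
      selected = c(selected, picked)
      remaining = setdiff(remaining, picked)
   }
   selected
}
```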

Finally, I tried a different approach with about a dozen features. The features included returns over different periods of time (1-day, 2-day, 5-day, etc), some statistics (mean, median, sd, etc) and volume. I used the same greedy approach to select features. This final system also showed very good performance, but it took a hell of a long time to run.
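The spirit of that larger feature set can be sketched with TTR functions (illustrative, not the exact features from the source; everything is lagged by one day so that row t only contains information available before t):

```r
library(quantmod)   # pulls in TTR and xts

# A dozen-ish candidate features: multi-period returns, rolling stats, volume
makeFeatures = function(ohlcv) {
   close = Cl(ohlcv)
   rets = ROC(close, type="discrete")
   merge(lag(rets, 1),                              # 1-day return
         lag(ROC(close, n=2, type="discrete"), 1),  # 2-day return
         lag(ROC(close, n=5, type="discrete"), 1),  # 5-day return
         lag(runMean(rets, 21), 1),                 # rolling mean of returns
         lag(runSD(rets, 21), 1),                   # rolling volatility
         lag(Vo(ohlcv), 1))                         # volume
}
```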

Time to end this post, the back-testing results have to wait. Until then you can play with the full source code yourself. Here is an example of using it:

require(e1071)
require(quantmod)
require(parallel)

source("e1071.R")

tt = get( getSymbols( "^GSPC", from="1900-01-01" ) )

rets = na.trim( ROC( Cl( tt ), type="discrete" ) )

# only the first two features so that we may see some results in reasonable time
data = svmFeatures( tt )[,c(1,2)]

rets = rets[index(data)]
data = data[index(rets)]

stopifnot( NROW( rets ) == NROW( data ) )

fore = svmComputeForecasts(
               data=data,
               history=500,
               response=rets,
               cores=8,
               trace=TRUE,
               modelPeriod="days",
               startDate="1959-12-28",
               endDate="1959-12-31",
               featureSelection="all" )


Regis December 2, 2012 at 04:51

Hello, is it possible to have an example of application of your function?
Regards
Regards

Reply

ivannp December 4, 2012 at 15:22

Updated the post.

Reply

Miguel December 5, 2012 at 11:50

Great post about SVM’s. Thanks for sharing. I’m an R newbie, could you please tell me what is the difference between doing this

xtsData = data[index(data)[(startIndex-history):(startIndex-1)]]
xtsResponse = response[index(response)[(startIndex-history):(startIndex-1)]]

and doing this?

xtsData = data[(startIndex-history):(startIndex-1)]
xtsResponse = response[(startIndex-history):(startIndex-1)]

Reply

ivannp December 5, 2012 at 14:07

Probably none.:)

Reply

Mike December 14, 2012 at 06:31

Hi!
On Windows it doesn’t work because of a multicore problem.
One more thing that I don’t understand is reflected in these two rows of the code
rets = rets[index(data)]
data = data[index(rets)]

In my opinion it’s more effective to merge the series,
something like

mydtret <- na.exclude( merge( rets, data ) )

and to have only one argument/object in the function call instead of 2

Interesting work, thanks
Mike

Reply

ivannp December 14, 2012 at 11:04

Argh, Windows – I seldom use it lately. Still, I’m quite surprised, since the parallel package is part of the base R distribution now. Hopefully it will be addressed soon.

Meanwhile, how about not using parallel execution? Also there are other packages providing parallel execution, but that would be more work.

You are right about the merge – I still wonder why I did it this way this time.:)

Reply

Mike December 14, 2012 at 13:16

I’m receiving errors.
Now the error is
> data = svmFeatures( tt )[,c(1,2)]
Error in match.fun(FUN) : object ‘skewness’ not found

But when I make the data object manually, I receive an error in the prediction
function svmComputeOneForecast, related to dimensions and
sampling="cross"

It's difficult for me to debug

Cheers,
Mike

Reply

ivannp December 14, 2012 at 13:39

skewness comes from the PerformanceAnalytics package, which you need to install from CRAN. Adding require(PerformanceAnalytics) as the first line of svmFeatures should address the first problem.

Reply

Mike December 15, 2012 at 04:24

now error is
Error in merge.xts(res, xts(na.trim(lag(rollmean(rets, k = 21, align = “right”), :
length of ‘dimnames’ [2] not equal to array extent
it seems that on Windows the code needs a lot of changes

Reply

ivannp December 15, 2012 at 10:44

Mike, I never meant the code to be used directly (until now I was providing only snippets), but I am surprised that R on Windows is so ugly. Not sure what your goal is, but to analyze the strategies’ performance, you can use the indicator series which are already computed.

Reply

Mike December 15, 2012 at 11:48

It’s just pure academic interest in SVM. I used to work with clusters and PCA, and I am curious how SVM does the same work.
On Windows a lot of errors are related to objects with dates, such as xts objects or data frames.
UNIX is better, but all brokers give APIs for Windows. Some of them are in Java, and only those we may use from UNIX.
I don’t like the Windows architecture, but it’s a habit already and I don’t have time to change OS.

Cheers,
Mike

Reply

ivannp December 15, 2012 at 12:40

I just tried it on Windows 7, 64 bit, R version 2.15.2. I get a warning from svmFeatures, which I know how to fix (calling sd on an xts/zoo object does an interesting conversion to a matrix), but no problems. Running:

source("c:/e1071.R")
tt = get( getSymbols( "^GSPC", from="1900-01-01" ) )
rets = na.trim( ROC( Cl( tt ), type="discrete" ) )
data = svmFeatures( tt )[,c(1,2)]
# There were 50 or more warnings (use warnings() to see the first 50)
require(parallel)
rets = rets[index(data)]
data = data[index(rets)]
fore = svmComputeForecasts(
             data=data,
             response=rets,
             history=500,
             cores=1,
             trace=TRUE,
             modelPeriod="days",
             startDate="1959-12-28",
             endDate="1959-12-31",
             featureSelection="all")

worked for me.

Reply

Mike December 15, 2012 at 13:47

Thanks!
I’ll try.
One question if you don’t mind
Why are you using get with the function getSymbols from the quantmod package?
I use the call version instead.
Example
SPY <- getSymbols('SPY', auto.assign = FALSE)
You have a lot to compute, and get consumes memory and takes time to obtain the object’s name
as a string var

Reply

Mike December 15, 2012 at 14:00

The same error
I’m using R 2.15.1
But I’m surprised by this result before the call
> head(data)

1 function (…, list = character(), package = NULL, lib.loc = NULL,
2 verbose = getOption(“verbose”), envir = .GlobalEnv)
3 {
4 fileExt <- function(x) {
5 db <- grepl("\\\\.[^.]+\\\\.(gz|bz2|xz)$", x)
6 ans <- sub(".*\\\\.", "", x)

It seems that data is a reserved word
And now I don't know what is going into the features function

Reply

ivannp December 29, 2012 at 18:25

I am using R 2.15.2 on 64-bit linux. On my system data is also a function, but I don’t think this is the problem.

Reply

ivannp December 15, 2012 at 13:51

That’s probably a better way. I’ve seen it before, but didn’t pay attention till now to realize that it does exactly that. Thanks.

Reply

cosmos December 29, 2012 at 16:08

Hello,

What do you mean when you speak about the lagged returns? Is it the values 500, 499, 498, 497, 496 for the prediction of the return 501?

Thx

Reply

ivannp December 29, 2012 at 18:23

Yes, these returns are used to forecast the return of 501, which is used as the position (long/short depending on the sign) for 501. The model is trained on the previous data only, i.e. the last “row” in the training set is 499,498,497,496,495.

Reply

cosmos December 30, 2012 at 04:44

Ok, that sounds good. And for the return, do you use this formula:

return_t = log( Price_{t+1} / Price_t )

And after, do you normalize (center and scale) the returns?

Regards

Reply

ivannp December 30, 2012 at 17:26

You can find these details in the accompanied source code. I use the discrete returns:

rets = na.trim(ROC(close, type="discrete"))

I don’t center the data, nor do I scale it. However, both packages which I use (caret+kernlab and e1071) seem to do that by default.

Reply

cosmos December 30, 2012 at 04:50

Another thing, sorry, but you build your model on the previous 499 values, this is your training data set.
And after, to forecast the 501 return, you use the last row as test values?
Why build a model on the 500 values and use the last row, i.e. 500, 499, etc, another time to test or to train?

Regards

Reply

cosmos December 30, 2012 at 05:02

in your tutorial you said: For example, after training the SVM on the previous 500 days, we will use the returns for days 500, 499, 498, 497 and 496 (these are ours as the input to obtain the forecasted return for day 501.

And in your response you said:Yes, these returns are used to forecast the return of 501, which is used as the position (long/short depending on the sign) for 501. The model is trained on the previous data only, i.e. the last “row” in the training set is 499,498,497,496,495

Are you ok if I claim: I build the model on the 500 values, and to forecast the 501 I use the model built previously and give it to “eat” the last row of the training dataset, i.e. 500, 499, etc, as input to obtain the forecasted 501’s value?
Regards and sorry for the multiple posts, but I have got trouble with my network

Reply

ivannp December 30, 2012 at 17:32

In general, I think it should be ok to use the approach you are suggesting. Notice that this approach will be much more demanding to implement in real-life trading, at least the way I do it.
If one is to compute all reasonable closing prices in advance, the new approach will need to do one fit and one predict for each individual close. Compare to the other approach – the model is fit based on the previous 500 days, then only the new data (different for each close) is fed in. In other words, here we have one fit, many predicts.

Reply

Louis January 18, 2013 at 03:12

Hi there,

Really admire your work mate. I’m new to SVM with R. I’m trying to use ten technical indicators to forecast a stock index’s movement tomorrow (up or down as 1 or -1). When you train the model, what would you use as the response column? If you are at 07/07/2012, do you use that day’s movement or 08/07’s movement for 07/07’s technical indicators? If you use 07/07’s input, how do you forecast tomorrow’s results, as you don’t know tomorrow’s inputs? I hope that makes sense…

any help would be appreciated

thanks mate

Louis

Reply

ivannp January 18, 2013 at 11:40

Hi Louis, if the target is only long/short, you may want to use classification instead of regression. Most packages silently switch to classification if the response vector contains factors rather than numerics. Check http://quantumfinancier.wordpress.com/2010/06/26/support-vector-machine-rsi-system/ for an example.

Before starting the step-forward process, I align the desired response with the data that predicts it. In general, this is a lag(1) of the predictors (data). This aligns the desired value of 08/07 (the return of 08/07) with its predictors (the indicators as of the 07/07 close). Then I fit the model using the data by excluding the last date. In other words, let’s say I am about to predict 08/07. The last date I look at when fitting is 07/07. Since the data is already lagged, the last date for the indicators is in fact 06/07. Finally, to forecast one day ahead, I feed the data for 08/07, which corresponds to the indicators on 07/07. Now I can compare the predicted with the real value. This can relatively easily be applied in practice, as long as the indicators are based on closing prices only. Anything using high/low/volume/etc (stochastics for instance) needs to either be lagged by an extra day, or the trading cannot happen right at the close … see my other post: http://www.quintuitive.com/2012/08/23/trading-at-the-close-the-mechanics/.
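In code, the alignment amounts to something like this (a sketch; `indicators` and `rets` are assumed xts series, and the dates are the ones from the example above):

```r
library(e1071)
library(xts)

# Shift the predictors one day forward, so the row stamped 08/07
# actually holds the indicators computed at the 07/07 close
data = lag(indicators, 1)

# Fit using everything up to and including 07/07; because of the lag,
# the most recent indicators the model sees are those of 06/07
fit = svm(coredata(data["/2012-07-07"]), coredata(rets["/2012-07-07"]))

# Forecast 08/07 by feeding its row - i.e. the 07/07 indicators
pred = predict(fit, coredata(data["2012-07-08"]))
```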

Hope this helps

Reply

Louis January 20, 2013 at 21:21

Thats really helpful, thanks a lot mate…

most papers about SVM online are quite pointless…I guess not everyone likes to share their secrets haha

Best regards

Louis

Reply

Louis January 21, 2013 at 01:46

sorry, one more question: for the model with the best performance, what inputs did you use to forecast tomorrow’s return?

AS quote”Finally, I tried a different approach with about a dozen features. The features included returns over different period of time (1-day, 2-day, 5-day, etc), some statistics (mean, median, sd, etc) and volume. I used the same greedy approach to select features. This final system showed a very good performance as well, but it took a hell of a time to run.”

thanks

Reply

ivannp January 21, 2013 at 09:38

The model with the best performance, by just a bit, was using lagged (by 1,2,3,4 and 5 days) daily returns. The only twist was that it was using a greedy method to select the “best” combination of features.

Reply

Louis January 21, 2013 at 03:32

I just finished my work. I adopted a classification approach to predict tomorrow’s movement…

the accuracy is almost 86% on 140 days of test data… I wonder if this is a little bit too high…

Nevertheless…Thank you so much

Louis

Reply

ivannp January 21, 2013 at 09:40

Sounds too high to me. I’d certainly review it a few more times before taking it seriously – that’s like having a crystal ball.;)

Reply

Louis January 29, 2013 at 02:56

Hi mate,

Here is an idea: when you select the training set, try to divide your training set into two equal parts, one with the response returns above the average, the other with the returns below the average… this way, your forecast accuracy might not change much, but you can forecast the BIG movements much more accurately…

the performance should be improved… I’ve tried this and the results are quite nice… please let me know if this works for you… cheers

Best regards

Louis

Reply

Krzysztof April 15, 2013 at 18:03

Hello,

My name is Krzysztof and I’m the author of this thread

http://www.trade2win.com/boards/metatrader/105880-3rd-generation-nn-deep-learning-deep-belief-nets-restricted-boltzmann-machines.html

In this project I was using different machine learning algorithms (including SVM), trying to predict whether a certain trade with fixed TP/SL will be successful or not – so classification.

Some comments/questions to your code

1)Using cross validation (tune.control, cross=cross). Are you sure that using cross validation is correct in case of time series ?? In such case training will be done on past data and by definition of time series it introduce future leak and inflated results. Perhaps fix option should be used instead.

2)Kernel constants. The most important parameters for SVM are kernel constants. Why those fixed values are used and not another ??

3) Sample size/Choice of SVM. The training time of this SVM is exponentially correlated to the training sample size, as far as I remember. But there are other SVMs specialized for use with bigger training sample sizes – for example Pegasos, LIBOCAS or SVMLight; e.g. I was training my system on up to 150,000 bars within a few hours. In such a case it is possible, for instance, to use hourly data and a bigger sample size, which will give much better accuracy.

4) Why not use the RWeka package and have access to perhaps more than a hundred ML algos
used in WEKA?

Krzysztof

Reply

ivannp April 15, 2013 at 19:30

Hi Krzysztof,

Quite an interesting thread, certainly something to take a look at – thanks for bringing it up. What do you mean by “TP/SL”?

1) The code is mostly for illustration purposes, and I didn’t find it very useful overall. I agree, it is dubious to consider each row in the training set as independent from the rest in the case of time series with their correlations, but it is an approach. According to http://robjhyndman.com/hyndsight/crossvalidation/, a better approach might be a one-step cross-validation using the past data, which should be relatively easy to add.
2) Nothing special about these kernel constants – just a set to iterate through. I think I used values found in examples using the package(s). The optimization chooses the best.
3) Good to know. Currently I don’t trade intraday, so I never had the need to deal with such big amounts of data. For daily data, increasing the window improves the performance, but only up to a point in my experience. It’s a bit counterintuitive to expect that a longer time window will improve the forecasts indefinitely – at some point the data becomes too much (as a ratio vs the number of parameters optimized).
4) Haven’t really looked into RWeka. Originally I started with e1071 because it’s fast and very stable. Then I switched to caret, which provides unified interface to many other packages, including some algorithms from RWeka. In my experience, I haven’t found an ML approach that really stands out for prediction of time series data. Have you?

Thanks for sharing,
Ivan

Reply

Krzysztof April 16, 2013 at 02:43

small correction

1)Using cross validation (tune.control, cross=cross). Are you sure that using cross validation is correct in case of time series ?? In such case training will be done on past data and by definition of time series it introduce future leak and inflated results. Perhaps fix option should be used instead.

should be ‘In such case training will be done on future data’

Reply

ivannp April 16, 2013 at 07:23

My implementation doesn’t (at least by design;)) use future data. If the history is 500 points, the cross validation is used within these 500 to determine the best fit.
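Schematically, the walk-forward loop looks like this (a sketch with an assumed `features` matrix and `rets` vector; any tuning happens strictly inside each 500-point window):

```r
library(e1071)

# Walk forward: for each day t, fit only on the 500 points before t,
# then produce a single out-of-sample forecast for t
forecasts = rep(NA, length(rets))
for(t in 501:length(rets)) {
   win = (t - 500):(t - 1)
   fit = svm(features[win, ], rets[win], type="eps-regression")
   forecasts[t] = predict(fit, features[t, , drop=FALSE])
}
```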

Reply

Krzysztof April 28, 2013 at 19:39

TP/SL – take profit / stop loss. The system was trying to trade on every bar (buy or sell), then the label
was determined (trade successful/unsuccessful) depending on whether TP or SL was hit first.

“If the history is 500 points, the cross validation is used within these 500 to determine the best fit”

Well, in such a case the in-sample results will be inflated and have no meaning. For example, if you divide the training period into 10 sub-periods and train on periods 1-8 and 10, then evaluate on period 9, a future leak occurs, as period 10 is after period 9, so in the future.

This is how cross validation works, I think, and it is applicable only to static patterns. Only walk-forward has a meaning in the case of time series.

Krzysztof

Reply

Régis September 15, 2013 at 12:39

Hello,

When I try the code, I find that the ROC 2 for the row 29/03/195 is the difference of the value of 28/03 and the value of 24/03, divided by the value of 24/03. Why is it not the value of 27/03 minus the value of 24/03, divided by the value of 24/03?

Best regards

Reply

ivannp September 16, 2013 at 12:46

Hi,

The code is just for illustration. You can modify it anyway you want and use whatever features you think may improve the predictions. The ROC function comes from the TTR package.

Regards

Reply
