Are R^2s Useful In Finance? Hypothesis-Driven Development In Reverse

This post will shed light on the values of R^2s behind two rather simplistic strategies — the simple 10 month SMA, and its relative, the 10 month momentum (which is simply a difference of SMAs, as Alpha Architect showed in their book DIY Financial Advisor.

Not too long ago, a friend of mine named Josh asked me a question regarding R^2s in finance. He’s finishing up his PhD in statistics at Stanford, so when people like that ask me questions, I’d like to answer them. His assertion is that in some instances, models that have less than perfect predictive power (EG R^2s of .4, for instance), can still deliver very promising predictions, and that if someone were to have a financial model that was able to explain 40% of the variance of returns, they could happily retire with that model making them very wealthy. Indeed, .4 is a very optimistic outlook (to put it lightly), as this post will show.

In order to illustrate this example, I took two “staple” strategies — buy SPY when its closing monthly price is above its ten month simple moving average, and when its ten month momentum (basically the difference of a ten month moving average and its lag) is positive. While these models are simplistic, they are ubiquitously talked about, and many momentum strategies are an improvement upon these baseline, “out-of-the-box” strategies.

Here’s the code to do that:

require(xts)
require(quantmod)
require(PerformanceAnalytics)
require(TTR)

getSymbols('SPY', from = '1990-01-01', src = 'yahoo')
adjustedPrices <- Ad(SPY)
monthlyAdj <- to.monthly(adjustedPrices, OHLC=TRUE)

spySMA <- SMA(Cl(monthlyAdj), 10)
spyROC <- ROC(Cl(monthlyAdj), 10)
spyRets <- Return.calculate(Cl(monthlyAdj))

smaRatio <- Cl(monthlyAdj)/spySMA - 1
smaSig <- smaRatio > 0
rocSig <- spyROC > 0

smaRets <- lag(smaSig) * spyRets
rocRets <- lag(rocSig) * spyRets

And here are the results:

strats <- na.omit(cbind(smaRets, rocRets, spyRets))
colnames(strats) <- c("SMA10", "MOM10", "BuyHold")
charts.PerformanceSummary(strats, main = "strategies")
rbind(table.AnnualizedReturns(strats), maxDrawdown(strats), CalmarRatio(strats))

                              SMA10     MOM10   BuyHold
Annualized Return         0.0975000 0.1039000 0.0893000
Annualized Std Dev        0.1043000 0.1080000 0.1479000
Annualized Sharpe (Rf=0%) 0.9346000 0.9616000 0.6035000
Worst Drawdown            0.1663487 0.1656176 0.5078482
Calmar Ratio              0.5860332 0.6270657 0.1757849

In short, the SMA10 and the 10-month momentum (aka ROC 10 aka MOM10) both handily outperform the buy and hold, not only in absolute returns, but especially in risk-adjusted returns (Sharpe and Calmar ratios). Again, simplistic analysis, and many models get much more sophisticated than this, but once again, simple, illustrative example using two strategies that outperform a benchmark (over the long term, anyway).

Now, the question is, what was the R^2 of these models? To answer this, I took a rolling five-year window that essentially asked: how well did these quantities (the ratio between the closing price and the moving average – 1, or the ten month momentum) predict the next month’s returns? That is, what proportion of the variance is explained through the monthly returns regressed against the previous month’s signals in numerical form (perhaps not the best framing, as the signal is binary as opposed to continuous which is what is being regressed, but let’s set that aside, again, for the sake of illustration).

Here’s the code to generate the answer.

predictorsAndPredicted <- na.omit(cbind(lag(smaRatio), lag(spyROC), spyRets))
R2s <- list()
for(i in 1:(nrow(predictorsAndPredicted)-59))  { #rolling five-year regression
  subset <- predictorsAndPredicted[i:(i+59),]
  smaLM <- lm(subset[,3]~subset[,1])
  smaR2 <- summary(smaLM)$r.squared
  rocLM <- lm(subset[,3]~subset[,2])
  rocR2 <- summary(rocLM)$r.squared
  R2row <- xts(cbind(smaR2, rocR2), order.by=last(index(subset)))
  R2s[[i]] <- R2row
}
R2s <- do.call(rbind, R2s)
par(mfrow=c(1,1))
colnames(R2s) <- c("SMA", "Momentum")
chart.TimeSeries(R2s, main = "R2s", legend.loc = 'topleft')

And the answer, in pictorial form:

In short, even in the best case scenarios, namely, crises which provide momentum/trend-following/call it what you will its raison d’etre, that is, its risk management appeal, the proportion of variance explained by the actual signal quantities was very small. In the best of times, around 20%. But then again, think about what the R^2 value actually is–it’s the percentage of variance explained by a predictor. If a small set of signals (let alone one) was able to explain the majority of the change in the returns of the S&P 500, or even a not-insignificant portion, such a person would stand to become very wealthy. More to the point, given that two strategies that handily outperform the market have R^2s that are exceptionally low for extended periods of time, it goes to show that holding the R^2 up as some form of statistical holy grail certainly is incorrect in the general sense, and anyone who does so either is painting with too broad a brush, is creating disingenuous arguments, or should simply attempt to understand another field which may not work the way their intuition tells them.

Thanks for reading.

A Book Review of ReSolve Asset Management’s Adaptive Asset Allocation

This review will review the “Adaptive Asset Allocation: Dynamic Global Portfolios to Profit in Good Times – and Bad” book by the people at ReSolve Asset Management. Overall, this book is a definite must-read for those who have never been exposed to the ideas within it. However, when it comes to a solution that can be fully replicated, this book is lacking.

Okay, it’s been a while since I reviewed my last book, DIY Financial Advisor, from the awesome people at Alpha Architect. This book in my opinion, is set up in a similar sort of format.

This is the structure of the book, and my reviews along with it:

Part 1: Why in the heck you actually need to have a diversified portfolio, and why a diversified portfolio is a good thing. In a world in which there is so much emphasis put on single-security performance, this is certainly something that absolutely must be stated for those not familiar with portfolio theory. It highlights the example of two people–one from Abbott Labs, and one from Enron, who had so much of their savings concentrated in their company’s stock. Mr. Abbott got hit hard and changed his outlook on how to save for retirement, and Mr. Enron was never heard from again. Long story short: a diversified portfolio is good, and a properly diversified portfolio can offset one asset’s zigs with another asset’s zags. This is the key to establishing a stream of returns that will help meet financial goals. Basically, this is your common sense story (humans love being told stories) so as to motivate you to read the rest of the book. It does its job, though for someone like me, it’s more akin to a big “wait for it, wait for it…and there’s the reason why we should read on, as expected”.

Part 2: Something not often brought up in many corners of the quant world (because it’s real life boring stuff) is the importance not only of average returns, but *when* those returns are achieved. Namely, imagine your everyday saver. At the beginning of their careers, they’re taking home less salary and have less money in their retirement portfolio (or speculation portfolio, but the book uses retirement portfolio). As they get into middle age and closer to retirement, they have a lot more money in said retirement portfolio. Thus, strong returns are most vital when there is more cash available *to* the portfolio, and the difference between mediocre returns at the beginning and strong returns at the end of one’s working life as opposed to vice versa is *astronomical* and cannot be understated. Furthermore, once *in* retirement, strong returns in the early years matter far more than returns in the later years once money has been withdrawn out of the portfolio (though I’d hope that a portfolio’s returns can be so strong that one can simply “live off the interest”). Or, put more intuitively: when you have $10,000 in your portfolio, a 20% drawdown doesn’t exactly hurt because you can make more money and put more into your retirement account. But when you’re 62 and have $500,000 and suddenly lose 30% of everything, well, that’s massive. How much an investor wants to avoid such a scenario cannot be understated. Warren Buffett once said that if you can’t bear to lose 50% of everything, you shouldn’t be in stocks. I really like this part of the book because it shows just how dangerous the ideas of “a 50% drawdown is unavoidable” and other “stay invested for the long haul” refrains are. Essentially, this part of the book makes a resounding statement that any financial adviser keeping his or her clients invested in equities when they’re near retirement age is doing something not very advisable, to put it lightly. In my opinion, those who advise pension funds should especially keep this section of the book in mind, since for some people, the long-term may be coming to an end, and what matters is not only steady returns, but to make sure the strategy doesn’t fall off a cliff and destroy decades of hard-earned savings.

Part 3: This part is also one that is a very important read. First off, it lays out in clear terms that the long-term forward-looking valuations for equities are at rock bottom. That is, the expected forward 15-year returns are very low, using approximately 75 years of evidence. Currently, according to the book, equity valuations imply a *negative* 15-year forward return. However, one thing I *will* take issue with is that while forward-looking long-term returns for equities may be very low, if one believed this chart and only invested in the stock market when forecast 15-year returns were above the long term average, one would have missed out on both the 2003-2007 bull runs, *and* the one since 2009 that’s just about over. So, while the book makes a strong case for caution, readers should also take the chart with a grain of salt in my opinion. However, another aspect of portfolio construction that this book covers is how to construct a robust (assets for any economic environment) and coherent (asset classes balanced in number) universe for implementation with any asset allocation algorithm. I think this bears repeating: universe selection is an extremely important topic in the discussion of asset allocation, yet there is very little discussion about it. Most research/topics simply take some “conventional universe”, such as “all stocks on the NYSE”, or “all the stocks in the S&P 500”, or “the entire set of the 50-60 most liquid futures” without consideration for robustness and coherence. This book is the first source I’ve seen that actually puts this topic under a magnifying glass besides “finger in the air pick and choose”.

Part 4: and here’s where I level my main criticism at this book. For those that have read “Adaptive Asset Allocation: A Primer”, this section of the book is basically one giant copy and paste. It’s all one large buildup to “momentum rank + min-variance optimization”. All well and good, until there’s very little detail beyond the basics as to how the minimum variance portfolio was constructed. Namely, what exactly is the minimum variance algorithm in use? Is it one of the poor variants susceptible to numerical instability inherent in inverting sample covariance matrices? Or is it a heuristic like David Varadi’s minimum variance and minimum correlation algorithm? The one feeling I absolutely could not shake was that this book had a perfect opportunity to present a robust approach to minimum variance, and instead, it’s long on concept, short on details. While the theory of “maximize return for unit risk” is all well and good, the actual algorithm to implement that theory into practice is not trivial, with the solutions taught to undergrads and master’s students having some well-known weaknesses. On top of this, one thing that got hammered into my head in the past was that ranking *also* had a weakness at the inclusion/exclusion point. E.G. if, out of ten assets, the fifth asset had a momentum of say, 10.9%, and the sixth asset had a momentum of 10.8%, how are we so sure the fifth is so much better? And while I realize that this book was ultimately meant to be a primer, in my opinion, it would have been a no-objections five-star if there were an appendix that actually went into some detail on how to go from the simple concepts and included a small numerical example of some algorithms that may address the well-known weaknesses. This doesn’t mean Greek/mathematical jargon. Just an appendix that acknowledged that not every reader is someone only picking up his first or second book about systematic investing, and that some of us are familiar with the “whys” and are more interested in the “hows”. Furthermore, I’d really love to know where the authors of this book got their data to back-date some of these ETFs into the 90s.

Part 5: some more formal research on topics already covered in the rest of the book–namely a section about how many independent bets one can take as the number of assets grow, if I remember it correctly. Long story short? You *easily* get the most bang for your buck among disparate asset classes, such as treasuries of various duration, commodities, developed vs. emerging equities, and so on, as opposed to trying to pick among stocks in the same asset class (though there’s some potential for alpha there…just…a lot less than you imagine). So in case the idea of asset class selection, not stock selection wasn’t beaten into the reader’s head before this point, this part should do the trick. The other research paper is something I briefly skimmed over which went into more depth about volatility and retirement portfolios, though I felt that the book covered this topic earlier on to a sufficient degree by building up the intuition using very understandable scenarios.

So that’s the review of the book. Overall, it’s a very solid piece of writing, and as far as establishing the *why*, it does an absolutely superb job. For those that aren’t familiar with the concepts in this book, this is definitely a must-read, and ASAP.

However, for those familiar with most of the concepts and looking for a detailed “how” procedure, this book does not deliver as much as I would have liked. And I realize that while it’s a bad idea to publish secret sauce, I bought this book in the hope of being exposed to a new algorithm presented in the understandable and intuitive language that the rest of the book was written in, and was left wanting.

Still, that by no means diminishes the impact of the rest of the book. For those who are more likely to be its target audience, it’s a 5/5. For those that wanted some specifics, it still has its gem on universe construction.

Overall, I rate it a 4/5.

Thanks for reading.

On The Relationship Between the SMA and Momentum

Happy new year. This post will be a quick one covering the relationship between the simple moving average and time series momentum. The implication is that one can potentially derive better time series momentum indicators than the classical one applied in so many papers.

Okay, so the main idea for this post is quite simple:

I’m sure we’re all familiar with classical momentum. That is, the price now compared to the price however long ago (3 months, 10 months, 12 months, etc.). E.G. P(now) – P(10)
And I’m sure everyone is familiar with the simple moving average indicator, as well. E.G. SMA(10).

Well, as it turns out, these two quantities are actually related.

It turns out, if instead of expressing momentum as the difference of two numbers, it is expressed as the sum of returns, it can be written (for a 10 month momentum) as:

MOM_10 = return of this month + return of last month + return of 2 months ago + … + return of 9 months ago, for a total of 10 months in our little example.

This can be written as MOM_10 = (P(0) – P(1)) + (P(1) – P(2)) + … + (P(9) – P(10)). (Each difference within parentheses denotes one month’s worth of returns.)

Which can then be rewritten by associative arithmetic as: (P(0) + P(1) + … + P(9)) – (P(1) + P(2) + … + P(10)).

In other words, momentum — aka the difference between two prices, can be rewritten as the difference between two cumulative sums of prices. And what is a simple moving average? Simply a cumulative sum of prices divided by however many prices summed over.

Here’s some R code to demonstrate.

require(quantmod)
require(TTR)
require(PerformanceAnalytics)

getSymbols('SPY', from = '1990-01-01')
monthlySPY <- Ad(SPY)[endpoints(SPY, on = 'months')]
monthlySPYrets <- Return.calculate(monthlySPY)
#dividing by 10 since that's the moving average period for comparison
signalTSMOM <- (monthlySPY - lag(monthlySPY, 10))/10 
signalDiffMA <- diff(SMA(monthlySPY, 10))

# rounding just 
sum(round(signalTSMOM, 3)==round(signalDiffMA, 3), na.rm=TRUE)

With the resulting number of times these two signals are equal:

[1] 267

In short, every time.

Now, what exactly is the punchline of this little example? Here’s the punchline:

The simple moving average is…fairly simplistic as far as filters go. It works as a pedagogical example, but it has some well known weaknesses regarding lag, windowing effects, and so on.

Here’s a toy example how one can get a different momentum signal by changing the filter.

toyStrat <- monthlySPYrets * lag(signalTSMOM > 0)

emaSignal <- diff(EMA(monthlySPY, 10))
emaStrat <- monthlySPYrets * lag(emaSignal > 0)

comparison <- cbind(toyStrat, emaStrat)
colnames(comparison) <- c("DiffSMA10", "DiffEMA10")
charts.PerformanceSummary(comparison)
table.AnnualizedReturns(comparison)

With the following results:

                          DiffSMA10 DiffEMA10
Annualized Return            0.1051    0.0937
Annualized Std Dev           0.1086    0.1076
Annualized Sharpe (Rf=0%)    0.9680    0.8706

While the difference of EMA10 strategy didn’t do better than the difference of SMA10 (aka standard 10-month momentum), that’s not the point. The point is that the momentum signal is derived from a simple moving average filter, and that by using a different filter, one can still use a momentum type of strategy.

Or, put differently, the main/general takeaway here is that momentum is the slope of a filter, and one can compute momentum in an infinite number of ways depending on the filter used, and can come up with a myriad of different momentum strategies.

Thanks for reading.

NOTE: I am currently contracting in Chicago, and am always open to networking. Contact me at my email at [email protected] or find me on my LinkedIn here.

A First Attempt At Applying Ensemble Filters

This post will outline a first failed attempt at applying the ensemble filter methodology to try and come up with a weighting process on SPY that should theoretically be a gradual process to shift from conviction between a bull market, a bear market, and anywhere in between. This is a follow-up post to this blog post.

So, my thinking went like this: in a bull market, as one transitions from responsiveness to smoothness, responsive filters should be higher than smooth filters, and vice versa, as there’s generally a trade-off between the two. In fact, in my particular formulation, the quantity of the square root of the EMA of squared returns punishes any deviation from a flat line altogether (although inspired by Basel’s measure of volatility, which is the square root of the 18-day EMA of squared returns), while the responsiveness quantity punishes any deviation from the time series of the realized prices. Whether these are the two best measures of smoothness and responsiveness is a topic I’d certainly appreciate feedback on.

In any case, an idea I had on the top of my head was that in addition to having a way of weighing multiple filters by their responsiveness (deviation from price action) and smoothness (deviation from a flat line), that by taking the sums of the sign of the difference between one filter and its neighbor on the responsiveness to smoothness spectrum, provided enough ensemble filters (say, 101, so there are 100 differences), one would obtain a way to move from full conviction of a bull market, to a bear market, to anything in between, and have this be a smooth process that doesn’t have schizophrenic swings of conviction.

Here’s the code to do this on SPY from inception to 2003:

require(TTR)
require(quantmod)
require(PerformanceAnalytics)

getSymbols('SPY', from = '1990-01-01')

smas <- list()
for(i in 2:250) {
  smas[[i]] <- SMA(Ad(SPY), n = i)
}
smas <- do.call(cbind, smas)

xtsApply <- function(x, FUN, n, ...) {
  out <- xts(apply(x, 2, FUN, n = n, ...), order.by=index(x))
  return(out)
}

sumIsNa <- function(x){
  return(sum(is.na(x)))
}

ensembleFilter <- function(data, filters, n = 20, conviction = 1, emphasisSmooth = .51) {
  
  # smoothness error
  filtRets <- Return.calculate(filters)
  sqFiltRets <- filtRets * filtRets * 100 #multiply by 100 to prevent instability
  smoothnessError <- sqrt(xtsApply(sqFiltRets, EMA, n = n))
  
  # responsiveness error
  repX <- xts(matrix(data, nrow = nrow(filters), ncol=ncol(filters)), 
              order.by = index(filters))
  dataFilterReturns <- repX/filters - 1
  sqDataFilterQuotient <- dataFilterReturns * dataFilterReturns * 100 #multiply by 100 to prevent instability
  responseError <- sqrt(xtsApply(sqDataFilterQuotient, EMA, n = n))
  
  # place smoothness and responsiveness errors on same notional quantities
  meanSmoothError <- rowMeans(smoothnessError)
  meanResponseError <- rowMeans(responseError)
  ratio <- meanSmoothError/meanResponseError
  ratio <- xts(matrix(ratio, nrow=nrow(filters), ncol=ncol(filters)),
               order.by=index(filters))
  responseError <- responseError * ratio
  
  # for each term in emphasisSmooth, create a separate filter
  ensembleFilters <- list()
  for(term in emphasisSmooth) {
    
    # compute total errors, raise them to a conviction power, find the normalized inverse
    totalError <- smoothnessError * term + responseError * (1-term)
    totalError <- totalError ^ conviction
    invTotalError <- 1/totalError
    normInvError <- invTotalError/rowSums(invTotalError)
    
    # ensemble filter is the sum of candidate filters in proportion
    # to the inverse of their total error
    tmp <- xts(rowSums(filters * normInvError), order.by=index(data))
    
    #NA out time in which one or more filters were NA
    initialNAs <- apply(filters, 1, sumIsNa) 
    tmp[initialNAs > 0] <- NA
    tmpName <- paste("emphasisSmooth", term, sep="_")
    colnames(tmp) <- tmpName
    ensembleFilters[[tmpName]] <- tmp
  }
  
  # compile the filters
  out <- do.call(cbind, ensembleFilters)
  return(out)
}

t1 <- Sys.time()
filts <- ensembleFilter(Ad(SPY), smas, n = 20, conviction = 2, emphasisSmooth = seq(0, 1, by=.01))
t2 <- Sys.time()

par(mfrow=c(3,1))
filtDiffs <- sign(filts[,1:100] - filts[,2:101])
sumDiffs <- xts(rowSums(filtDiffs), order.by=index(filtDiffs))

plot(Ad(SPY)["::2003"])
plot(sumDiffs["::2003"])
plot(diff(sumDiffs["::2003"]))

And here’s the very underwhelming result:

Essentially, while I expected to see changes in conviction of maybe 20 at most, instead, my indicator of sum of sign differences did exactly as I had hoped it wouldn’t, which is to be a very binary sort of mechanic. My intuition was that between an “obvious bull market” and an “obvious bear market” that some differences would be positive, some negative, and that they’d net each other out, and the conviction would be zero. Furthermore, that while any individual crossover is binary, all one hundred signs being either positive or negative would be a more gradual process. Apparently, this was not the case. To continue this train of thought later, one thing to try would be an all-pairs sign difference. Certainly, I don’t feel like giving up on this idea at this point, and, as usual, feedback would always be appreciated.

Thanks for reading.

NOTE: I am currently consulting in an analytics capacity in downtown Chicago. However, I am also looking for collaborators that wish to pursue interesting trading ideas. If you feel my skills may be of help to you, let’s talk. You can email me at [email protected], or find me on my LinkedIn here.

Review: Invoance’s TRAIDE application

This review will be about Inovance Tech’s TRAIDE system. It is an application geared towards letting retail investors apply proprietary machine learning algorithms to assist them in creating systematic trading strategies. Currently, my one-line review is that while I hope the company founders mean well, the application is still in an early stage, and so, should be checked out by potential users/venture capitalists as something with proof of potential, rather than a finished product ready for mass market. While this acts as a review, it’s also my thoughts as to how Inovance Tech can improve its product.

A bit of background: I have spoken several times to some of the company’s founders, who sound like individuals at about my age level (so, fellow millennials). Ultimately, the selling point is this:

Systematic trading is cool.
Machine learning is cool.
Therefore, applying machine learning to systematic trading is awesome! (And a surefire way to make profits, as Renaissance Technologies has shown.)

While this may sound a bit snarky, it’s also, in some ways, true. Machine learning has become the talk of the town, from IBM’s Watson (RenTec itself hired a bunch of speech recognition experts from IBM a couple of decades back), to Stanford’s self-driving car (invented by Sebastian Thrun, who now heads Udacity), to the Netflix prize, to god knows what Andrew Ng is doing with deep learning at Baidu. Considering how well machine learning has done at much more complex tasks than “create a half-decent systematic trading algorithm”, it shouldn’t be too much to ask this powerful field at the intersection of computer science and statistics to help the retail investor glued to watching charts generate a lot more return on his or her investments than through discretionary chart-watching and noise trading. To my understanding from conversations with Inovance Tech’s founders, this is explicitly their mission.

(Note: Dr. Wes Gray and Alpha Architect, in their book DIY Financial Advisor, have already established that listening to pundits, and trying to succeed at discretionary trading, is on a whole, a loser’s game)

However, I am not sure that Inovance’s TRAIDE application actually accomplishes this mission in its current state.

Here’s how it works:

Users select one asset at a time, and select a date range (data going back to Dec. 31, 2009). Assets are currently limited to highly liquid currency pairs, and can take the following settings: 1 hour, 2 hour, 4 hour, 6 hour, or daily bar time frames.

Users then select from a variety of indicators, ranging from technical (moving averages, oscillators, volume calculations, etc. Mostly an assortment of 20th century indicators, though the occasional adaptive moving average has managed to sneak in–namely KAMA–see my DSTrading package, and MAMA–aka the Mesa Adaptive Moving Average, from John Ehlers) to more esoteric ones such as some sentiment indicators. Here’s where things start to head south for me, however. Namely, that while it’s easy to add as many indicators as a user would like, there is basically no documentation on any of them, with no links to reference, etc., so users will have to bear the onus of actually understanding what each and every one of the indicators they select actually does, and whether or not those indicators are useful. The TRAIDE application makes zero effort (thus far) to actually get users acquainted with the purpose of these indicators, what their theoretical objective is (measure conviction in a trend, detect a trend, oscillator type indicator, etc.)

Furthermore, regarding indicator selections, users also specify one parameter setting for each indicator per strategy. E.G. if I had an EMA crossover, I’d have to create a new strategy for a 20/100 crossover, a 21/100 crossover, rather than specifying something like this:

short EMA: 20-60
long EMA: 80-200

Quantstrat itself has this functionality, and while I don’t recall covering parameter robustness checks/optimization (in other words, testing multiple parameter sets–whether one uses them for optimization or robustness is up to the user, not the functionality) in quantstrat on this blog specifically, this information very much exists in what I deem “the official quantstrat manual”, found here. In my opinion, the option of covering a range of values is mandatory so as to demonstrate that any given parameter setting is not a random fluke. Outside of quantstrat, I have demonstrated this methodology in my Hypothesis Driven Development posts, and in coming up for parameter selection for volatility trading.

Where TRAIDE may do something interesting, however, is that after the user specifies his indicators and parameters, its “proprietary machine learning” algorithms (WARNING: COMPLETELY BLACK BOX) determine for what range of values of the indicators in question generated the best results within the backtest, and assign them bullishness and bearishness scores. In other words, “looking backwards, these were the indicator values that did best over the course of the sample”. While there is definite value to exploring the relationships between indicators and future returns, I think that TRAIDE needs to do more in this area, such as reporting P-values, conviction, and so on.

For instance, if you combine enough indicators, your “rule” is a market order that’s simply the intersection of all of the ranges of your indicators. For instance, TRAIDE may tell a user that the strongest bullish signal when the difference of the moving averages is between 1 and 2, the ADX is between 20 and 25, the ATR is between 0.5 and 1, and so on. Each setting the user selects further narrows down the number of trades the simulation makes. In my opinion, there are more ways to explore the interplay of indicators than simply one giant AND statement, such as an “OR” statement, of some sort. (E.G. select all values, put on a trade when 3 out of 5 indicators fall into the selected bullish range in order to place more trades). While it may be wise to filter down trades to very rare instances if trading a massive amount of instruments, such that of several thousand possible instruments, only several are trading at any given time, with TRAIDE, a user selects only *one* asset class (currently, one currency pair) at a time, so I’m hoping to see TRAIDE create more functionality in terms of what constitutes a trading rule.

After the user selects both a long and a short rule (by simply filtering on indicator ranges that TRAIDE’s machine learning algorithms have said are good), TRAIDE turns that into a backtest with a long equity curve, short equity curve, total equity curve, and trade statistics for aggregate, long, and short trades. For instance, in quantstrat, one only receives aggregate trade statistics. Whether long or short, all that matters to quantstrat is whether or not the trade made or lost money. For sophisticated users, it’s trivial enough to turn one set of rules on or off, but TRAIDE does more to hold the user’s hand in that regard.

Lastly, TRAIDE then generates MetaTrader4 code for a user to download.

And that’s the process.

In my opinion, while what Inovance Tech has set out to do with TRAIDE is interesting, I wouldn’t recommend it in its current state. For sophisticated individuals that know how to go through a proper research process, TRAIDE is too stringent in terms of parameter settings (one at a time), pre-coded indicators (its target audience probably can’t program too well), and asset classes (again, one at a time). However, for retail investors, my issue with TRAIDE is this:

There is a whole assortment of undocumented indicators, which then move to black-box machine learning algorithms. The result is that the user has very little understanding of what the underlying algorithms actually do, and why the logic he or she is presented with is the output. While TRAIDE makes it trivially easy to generate any one given trading system, as multiple individuals have stated in slightly different ways before, writing a strategy is the easy part. Doing the work to understand if that strategy actually has an edge is much harder. Namely, checking its robustness, its predictive power, its sensitivity to various regimes, and so on. Given TRAIDE’s rather short data history (2010 onwards), and coupled with the opaqueness that the user operates under, my analogy would be this:

It’s like giving an inexperienced driver the keys to a sports car in a thick fog on a winding road. Nobody disputes that a sports car is awesome. However, the true burden of the work lies in making sure that the user doesn’t wind up smashing into a tree.

Overall, I like the TRAIDE application’s mission, and I think it may have potential as something for the retail investors that don’t intend to learn the ins-and-outs of coding a trading system in R (despite me demonstrating many times over how to put such systems together). I just think that there needs to be more work put into making sure that the results a user sees are indicative of an edge, rather than open the possibility of highly-flexible machine learning algorithms chasing ghosts in one of the noisiest and most dynamic data sets one can possibly find.

My recommendations are these:

1) Multiple asset classes
2) Allow parameter ranges, and cap the number of trials at any given point (E.G. 4 indicators with ten settings each = 10,000 possible trading systems = blow up the servers). To narrow down the number of trial runs, use techniques from experimental design to arrive at decent combinations. (I wish I remembered my response surface methodology techniques from my master’s degree about now!)
3) Allow modifications of order sizing (E.G. volatility targeting, stop losses), such as I wrote about in my hypothesis-driven development posts.
4) Provide *some* sort of documentation for the indicators, even if it’s as simple as a link to investopedia (preferably a lot more).
5) Far more output is necessary, especially for users who don’t program. Namely, to distinguish whether or not there is a legitimate edge, or if there are too few observations to reject the null hypothesis of random noise.
6) Far longer data histories. 2010 onwards just seems too short of a time-frame to be sure of a strategy’s efficacy, at least on daily data (may not be true for hourly).
7) Factor in transaction costs. Trading on an hourly time frame will mean far less P&L per trade than on a daily resolution. If MT4 charges a fixed ticket price, users need to know how this factors into their strategy.
8) Lastly, dogfooding. When I spoke last time with Inovance Tech’s founders, they claimed they were using their own algorithms to create a forex strategy, which was doing well in live trading. By the time more of these suggestions are implemented, it’d be interesting to see if they have a track record as a fund, in addition to as a software provider.

If all of these things are accounted for and automated, the product will hopefully accomplish its mission of bringing systematic trading and machine learning to more people. I think TRAIDE has potential, and I’m hoping that its staff will realize that potential.

Thanks for reading.

NOTE: I am currently contracting in downtown Chicago, and am always interested in networking with professionals in the systematic trading and systematic asset management/allocation spaces. Find my LinkedIn here.

EDIT: Today in my email (Dec. 3, 2015), I received a notice that Inovance was making TRAIDE completely free. Perhaps they want a bunch more feedback on it?

A Filter Selection Method Inspired From Statistics

This post will demonstrate a method to create an ensemble filter based on a trade-off between smoothness and responsiveness, two properties looked for in a filter. An ideal filter would both be responsive to price action so as to not hold incorrect positions, while also be smooth, so as to not incur false signals and unnecessary transaction costs.

So, ever since my volatility trading strategy, using three very naive filters (all SMAs) completely missed a 27% month in XIV, I’ve decided to try and improve ways to create better indicators in trend following. Now, under the realization that there can potentially be tons of complex filters in existence, I decided instead to focus on a way to create ensemble filters, by using an analogy from statistics/machine learning.

In static data analysis, for a regression or classification task, there is a trade-off between bias and variance. In a nutshell, variance is bad because of the possibility of overfitting on a few irregular observations, and bias is bad because of the possibility of underfitting legitimate data. Similarly, with filtering time series, there are similar concerns, except bias is called lag, and variance can be thought of as a “whipsawing” indicator. Essentially, an ideal indicator would move quickly with the data, while at the same time, not possess a myriad of small bumps-and-reverses along the way, which may send false signals to a trading strategy.

So, here’s how my simple algorithm works:

The inputs to the function are the following:

A) The time series of the data you’re trying to filter
B) A collection of candidate filters
C) A period over which to measure smoothness and responsiveness, defined as the square root of the n-day EMA (2/(n+1) convention) of the following:
a) Responsiveness: the squared quantity of price/filter – 1
b) Smoothness: the squared quantity of filter(t)/filter(t-1) – 1 (aka R’s return.calculate) function
D) A conviction factor, to which power the errors will be raised. This should probably be between .5 and 3
E) A vector that defines the emphasis on smoothness (vs. emphasis on responsiveness), which should range from 0 to 1.

Here’s the code:

require(TTR)
require(quantmod)

getSymbols('SPY', from = '1990-01-01')

smas <- list()
for(i in 2:250) {
  smas[[i]] <- SMA(Ad(SPY), n = i)
}
smas <- do.call(cbind, smas)

xtsApply <- function(x, FUN, n, ...) {
  out <- xts(apply(x, 2, FUN, n = n, ...), order.by=index(x))
  return(out)
}

sumIsNa <- function(x){
  return(sum(is.na(x)))
}

This gets SPY data, and creates two utility functions–xtsApply, which is simply a column-based apply that replaces the original index that using a column-wise apply discards, and sumIsNa, which I use later for counting the numbers of NAs in a given row. It also creates my candidate filters, which, to keep things simple, are just SMAs 2-250.

Here’s the actual code of the function, with comments in the code itself to better explain the process from a technical level (for those still unfamiliar with R, look for the hashtags):

ensembleFilter <- function(data, filters, n = 20, conviction = 1, emphasisSmooth = .51) {
  
  # smoothness error
  filtRets <- Return.calculate(filters)
  sqFiltRets <- filtRets * filtRets * 100 #multiply by 100 to prevent instability
  smoothnessError <- sqrt(xtsApply(sqFiltRets, EMA, n = n))
  
  # responsiveness error
  repX <- xts(matrix(data, nrow = nrow(filters), ncol=ncol(filters)), 
              order.by = index(filters))
  dataFilterReturns <- repX/filters - 1
  sqDataFilterQuotient <- dataFilterReturns * dataFilterReturns * 100 #multiply by 100 to prevent instability
  responseError <- sqrt(xtsApply(sqDataFilterQuotient, EMA, n = n))
  
  # place smoothness and responsiveness errors on same notional quantities
  meanSmoothError <- rowMeans(smoothnessError)
  meanResponseError <- rowMeans(responseError)
  ratio <- meanSmoothError/meanResponseError
  ratio <- xts(matrix(ratio, nrow=nrow(filters), ncol=ncol(filters)),
               order.by=index(filters))
  responseError <- responseError * ratio
  
  # for each term in emphasisSmooth, create a separate filter
  ensembleFilters <- list()
  for(term in emphasisSmooth) {
    
    # compute total errors, raise them to a conviction power, find the normalized inverse
    totalError <- smoothnessError * term + responseError * (1-term)
    totalError <- totalError ^ conviction
    invTotalError <- 1/totalError
    normInvError <- invTotalError/rowSums(invTotalError)
    
    # ensemble filter is the sum of candidate filters in proportion
    # to the inverse of their total error
    tmp <- xts(rowSums(filters * normInvError), order.by=index(data))
    
    #NA out time in which one or more filters were NA
    initialNAs <- apply(filters, 1, sumIsNa) 
    tmp[initialNAs > 0] <- NA
    tmpName <- paste("emphasisSmooth", term, sep="_")
    colnames(tmp) <- tmpName
    ensembleFilters[[tmpName]] <- tmp
  }
  
  # compile the filters
  out <- do.call(cbind, ensembleFilters)
  return(out)
}

The vast majority of the computational time takes place in the two xtsApply calls. On 249 different simple moving averages, the process takes about 30 seconds.

Here’s the output, using a conviction factor of 2:

t1 <- Sys.time()
filts <- ensembleFilter(Ad(SPY), smas, n = 20, conviction = 2, emphasisSmooth = c(0, .05, .25, .5, .75, .95, 1))
t2 <- Sys.time()
print(t2-t1)


plot(Ad(SPY)['2007::2011'])
lines(filts[,1], col='blue', lwd=2)
lines(filts[,2], col='green', lwd = 2)
lines(filts[,3], col='orange', lwd = 2)
lines(filts[,4], col='brown', lwd = 2)
lines(filts[,5], col='maroon', lwd = 2)
lines(filts[,6], col='purple', lwd = 2)
lines(filts[,7], col='red', lwd = 2)

And here is an example, looking at SPY from 2007 through 2011.

In this case, I chose to go from blue to green, orange, brown, maroon, purple, and finally red for smoothness emphasis of 0, 5%, 25%, 50%, 75%, 95%, and 1, respectively.

Notice that the blue line is very wiggly, while the red line sometimes barely moves, such as during the 2011 drop-off.

One thing that I noticed in the course of putting this process together is something that eluded me earlier–namely, that naive trend-following strategies which are either fully long or fully short based on a crossover signal can lose money quickly in sideways markets.

However, theoretically, by finely varying the jumps between 0% to 100% emphasis on smoothness, whether in steps of 1% or finer, one can have a sort of “continuous” conviction, by simply adding up the signs of differences between various ensemble filters. In an “uptrend”, the difference as one moves from the most responsive to most smooth filter should constantly be positive, and vice versa.

In the interest of brevity, this post doesn’t even have a trading strategy attached to it. However, an implied trading strategy can be to be long or short the SPY depending on the sum of signs of the differences in filters as you move from responsiveness to smoothness. Of course, as the candidate filters are all SMAs, it probably wouldn’t be particularly spectacular. However, for those out there who use more complex filters, this may be a way to create ensembles out of various candidate filters, and create even better filters. Furthermore, I hope that given enough candidate filters and an objective way of selecting them, it would be possible to reduce the chances of creating an overfit trading system. However, anything with parameters can potentially be overfit, so that may be wishful thinking.

All in all, this is still a new idea for me. For instance, the filter to compute the error terms can probably be improved. The inspiration for an EMA 20 essentially came from how Basel computes volatility (if I recall, correctly, it uses the square root of an 18 day EMA of squared returns), and the very fact that I use an EMA can itself be improved upon (why an EMA instead of some other, more complex filter). In fact, I’m always open to how I can improve this concept (and others) from readers.

Thanks for reading.

NOTE: I am currently contracting in Chicago in an analytics capacity. If anyone would like to meet up, let me know. You can email me at [email protected], or contact me through my LinkedIn here.

How well can you scale your strategy?

This post will deal with a quick, finger in the air way of seeing how well a strategy scales–namely, how sensitive it is to latency between signal and execution, using a simple volatility trading strategy as an example. The signal will be the VIX/VXV ratio trading VXX and XIV, an idea I got from Volatility Made Simple’s amazing blog, particularly this post. The three signals compared will be the “magical thinking” signal (observe the close, buy the close, named from the ruleOrderProc setting in quantstrat), buy on next-day open, and buy on next-day close.

Let’s get started.

require(downloader)
require(PerformanceAnalytics)
require(IKTrading)
require(TTR)

download("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxvdailyprices.csv", 
         destfile="vxvData.csv")
download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT",
         destfile="longXIV.txt")
download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", 
         destfile="longVXX.txt") #requires downloader package
getSymbols('^VIX', from = '1990-01-01')


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
vixVxv <- Cl(VIX)/Cl(vxv)


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

vxxCloseRets <- Return.calculate(Cl(vxx))
vxxOpenRets <- Return.calculate(Op(vxx))
xivCloseRets <- Return.calculate(Cl(xiv))
xivOpenRets <- Return.calculate(Op(xiv))

vxxSig <- vixVxv > 1
xivSig <- 1-vxxSig

magicThinking <- vxxCloseRets * lag(vxxSig) + xivCloseRets * lag(xivSig)
nextOpen <- vxxOpenRets * lag(vxxSig, 2) + xivOpenRets * lag(xivSig, 2)
nextClose <- vxxCloseRets * lag(vxxSig, 2) + xivCloseRets * lag(xivSig, 2)
tradeWholeDay <- (nextOpen + nextClose)/2

compare <- na.omit(cbind(magicThinking, nextOpen, nextClose, tradeWholeDay))
colnames(compare) <- c("Magic Thinking", "Next Open", 
                       "Next Close", "Execute Through Next Day")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), 
      maxDrawdown(compare), CalmarRatio(compare))

par(mfrow=c(1,1))
chart.TimeSeries(log(cumprod(1+compare), base = 10), legend.loc='topleft', ylab='log base 10 of additional equity',
                 main = 'VIX vx. VXV different execution times')

So here’s the run-through. In addition to the magical thinking strategy (observe the close, buy that same close), I tested three other variants–a variant which transacts the next open, a variant which transacts the next close, and the average of those two. Effectively, I feel these three could give a sense of a strategy’s performance under more realistic conditions–that is, how well does the strategy perform if transacted throughout the day, assuming you’re managing a sum of money too large to just plow into the market in the closing minutes (and if you hope to get rich off of trading, you will have a larger sum of money than the amount you can apply magical thinking to). Ideally, I’d use VWAP pricing, but as that’s not available for free anywhere I know of, that means that readers can’t replicate it even if I had such data.

In any case, here are the results.

Equity curves:

Log scale (for Mr. Tony Cooper and others):

Stats:

                          Magic Thinking Next Open Next Close Execute Through Next Day
Annualized Return               0.814100 0.8922000  0.5932000                 0.821900
Annualized Std Dev              0.622800 0.6533000  0.6226000                 0.558100
Annualized Sharpe (Rf=0%)       1.307100 1.3656000  0.9529000                 1.472600
Worst Drawdown                  0.566122 0.5635336  0.6442294                 0.601014
Calmar Ratio                    1.437989 1.5831686  0.9208586                 1.367510

My reaction? The execute on next day’s close performance being vastly lower than the other configurations (and that deterioration occurring in the most recent years) essentially means that the fills will have to come pretty quickly at the beginning of the day. While the strategy seems somewhat scalable through the lens of this finger-in-the-air technique, in my opinion, if the first full day of possible execution after signal reception will tank a strategy from a 1.44 Calmar to a .92, that’s a massive drop-off, after holding everything else constant. In my opinion, I think this is quite a valid question to ask anyone who simply sells signals, as opposed to manages assets. Namely, how sensitive are the signals to execution on the next day? After all, unless those signals come at 3:55 PM, one is most likely going to be getting filled the next day.

Now, while this strategy is a bit of a tomato can in terms of how good volatility trading strategies can get (they can get a *lot* better in my opinion), I think it made for a simple little demonstration of this technique. Again, a huge thank you to Mr. Helmuth Vollmeier for so kindly keeping up his dropbox all this time for the volatility data!

Thanks for reading.

NOTE: I am currently contracting in a data science capacity in Chicago. You can email me at [email protected], or find me on my LinkedIn here. I’m always open to beers after work if you’re in the Chicago area.

NOTE 2: Today, on October 21, 2015, if you’re in Chicago, there’s a Chicago R Users Group conference at Jaks Tap at 6:00 PM. Free pizza, networking, and R, hosted by Paul Teetor, who’s a finance guy. Hope to see you there.

Volatility Stat-Arb Shenanigans

This post deals with an impossible-to-implement statistical arbitrage strategy using VXX and XIV. The strategy is simple: if the average daily return of VXX and XIV was positive, short both of them at the close. This strategy makes two assumptions of varying dubiousness: that one can “observe the close and act on the close”, and that one can short VXX and XIV.

So, recently, I decided to play around with everyone’s two favorite instruments on this blog–VXX and XIV, with the idea that “hey, these two instruments are diametrically opposed, so shouldn’t there be a stat-arb trade here?”

So, in order to do a lick-finger-in-the-air visualization, I implemented Mike Harris’s momersion indicator.

momersion <- function(R, n, returnLag = 1) {
  momentum <- sign(R * lag(R, returnLag))
  momentum[momentum < 0] <- 0
  momersion <- runSum(momentum, n = n)/n * 100
  colnames(momersion) <- "momersion"
  return(momersion)
}

And then I ran the spread through it.


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

volSpread <- xivRets + vxxRets
volSpreadMomersion <- momersion(volSpread, n = 252)
plot(volSpreadMomersion)

In other words, this spread is certainly mean-reverting at just about all times.

And here is the code for the results from 2011 onward, from when the XIV and VXX actually started trading.

#both sides
sig <- -lag(sign(volSpread))
longShort <- sig * volSpread
charts.PerformanceSummary(longShort['2011::'], main = 'long and short spread')

#long spread only
sig <- -lag(sign(volSpread))
sig[sig < 0] <- 0
longOnly <- sig * volSpread
charts.PerformanceSummary(longOnly['2011::'], main = 'long spread only')


#short spread only
sig <- -lag(sign(volSpread))
sig[sig > 0] <- 0
shortOnly <- sig * volSpread
charts.PerformanceSummary(shortOnly['2011::'], main = 'short spread only')

threeStrats <- na.omit(cbind(longShort, longOnly, shortOnly))["2011::"]
colnames(threeStrats) <- c("LongShort", "Long", "Short")
rbind(table.AnnualizedReturns(threeStrats), CalmarRatio(threeStrats))

Here are the equity curves:

Long-short:

Long-only:

Short-only:

With the following statistics:

                          LongShort      Long    Short
Annualized Return          0.115400 0.0015000 0.113600
Annualized Std Dev         0.049800 0.0412000 0.027900
Annualized Sharpe (Rf=0%)  2.317400 0.0374000 4.072100
Calmar Ratio               1.700522 0.0166862 7.430481

In other words, the short side is absolutely amazing as a trade–except for the one small fact of having it be impossible to actually execute, or at least as far as I’m aware. Anyhow, this was simply a for-fun post, but hopefully it served some purpose.

Thanks for reading.

NOTE: I am currently contracting and am looking to network in the Chicago area. You can find my LinkedIn here.

A Review of DIY Financial Advisor, by Gray, Vogel, and Foulke

This post will review the DIY Financial Advisor book, which I thought was a very solid read, and especially pertinent to those who are more beginners at investing (especially systematic investing). While it isn’t exactly perfect, it’s about as excellent a primer on investing as one will find out there that is accessible to the lay-person, in my opinion.

Okay, so, official announcement: I am starting a new section of posts called “Reviews”, which I received from being asked to review this book. Essentially, I believe that anyone that’s trying to create a good product that will help my readers deserves a spotlight, and I myself would like to know what cool and innovative financial services/products are coming about. For those who’d like exposure on this site, if you’re offering an affordable and innovative product or service that can be of use to an audience like mine, reach out to me.

Anyway, this past weekend, while relocating to Chicago, I had the pleasure of reading Alpha Architect’s (Gray, Vogel, Foulke) book “DIY financial advisor”, essentially making a case as to why a retail investor should be able to outperform the expert financial advisers that charge several percentage points a year to manage one’s wealth.
The book starts off by citing various famous studies showing how many subtle subconscious biases and fallacies human beings are susceptible to (there are plenty), such as falling for complexity, overconfidence, and so on—none of which emotionless computerized systems and models suffer from. Furthermore, it also goes on to provide several anecdotal examples of experts gone bust, such as Victor Niederhoffer, who blew up not once, but twice (and rumor has it he blew up a third time), and studies showing that systematic data analysis has shown to beat expert recommendations time and again—including when experts were armed with the outputs of the models themselves. Throw in some quotes from Jim Simons (CEO of the best hedge fund in the world, Renaissance Technologies), and the first part of the book can be summed up like this:

1) Your rank and file human beings are susceptible to many subconscious biases.

2) Don’t trust the recommendations of experts. Even simpler models have systematically outperformed said “experts”. Some experts have even blown up, multiple times even (E.G. Victor Niederhoffer).

3) Building an emotionless system will keep these human fallacies from wrecking your investment portfolio.

4) Sticking to a well thought-out system is a good idea, even when it’s uncomfortable—such as when a marine has to wear a Kevlar helmet, hold extra ammo, and extra water in a 126 degree Iraq desert (just ask Dr./Captain Gray!).

This is all well and good—essentially making a very strong case for why you should build a system, and let the system do the investment allocation heavy lifting for you.
Next, the book goes into the FACTS acronym of different manager selection—fees, access, complexity, taxes, and search. Fees: how much does it cost to have someone manage your investments? Pretty self-explanatory here. Access: how often can you pull your capital (EG a hedge fund that locks you up for a year especially when it loses money should be run from, and fast). Complexity: do you understand how the investments are managed? Taxes: long-term capital gains, or shorter-term? Generally, very few decent systems will be holding for a year or more, so in my opinion, expect to pay short-term taxes. Search: that is, how hard is it to find a good candidate? Given the sea of hedge funds (especially those with short-term track records, or only track records managing tiny amounts of money), how hard is it to find a manager who’ll beat the benchmark after fees? Answer: very difficult. In short, all the glitzy sophisticated managers you hear about? Far from a terrific deal, assuming you can even find one.

Continuing, the book goes into two separate anomalies that should form the foundation for any equity investment strategy – value, and momentum. The value system essentially goes long the top decile of the EBIT/TEV metric for the top 60% of market-cap companies traded on the NYSE every year. In my opinion, this is a system that is difficult to implement for the average investor in terms of managing the data process for this system, along with having the proper capital to allocate to all the various companies. That is, if you have a small amount of capital to invest, you might not be able to get that equal weight allocation across a hundred separate companies. However, I believe that with the QVAL and IVAL etfs (from Alpha Architect, and full disclosure, I have some of my IRA invested there), I think that the systematic value component can be readily accessed through these two funds.

The momentum strategy, however, is much simpler. There’s a momentum component, and a moving average component. There’s some math that shows that these two signals are related (a momentum signal is actually proportional to a difference of a moving average and its last value), and the ROBUST system that this book proposes is a combination of a momentum signal and an SMA signal. This is how it works. Assume you have $100,000 and 5 assets to invest in, for the sake of math. Divide the portfolio into a $50,000 momentum component and a $50,000 moving average component. Every month, allocate $10,000 to each of the five assets with a positive 12-month momentum, or stay in cash for that asset. Next, allocate another $10,000 to each of the five assets with a price above a 12-month simple moving average. It’s that simple, and given the recommended ETFs (commodities, bonds, foreign stocks, domestic stocks, real estate), it’s a system that most investors can rather easily implement, especially if they’ve been following my blog.

For those interested in more market anomalies (especially value anomalies), there’s a chapter which contains a large selection of academic papers that go back and forth on the efficacies of various anomalies and how well they can predict returns. So, for those interested, it’s there.

The book concludes with some potential pitfalls which a DIY investor should be aware of when running his or her own investments, which is essentially another psychology chapter.
Overall, in my opinion, this book is fairly solid in terms of reasons why a retail investor should take the plunge and manage his or her own investments in a systematic fashion. Namely, that flesh and blood advisers are prone to human error, and on top of that, usually charge an unjustifiably high fee, and then deliver lackluster performance. The book recommends a couple of simple systems, one of which I think anyone who can copy and paste some rudimentary R can follow (the ROBUST momentum system), and another which I think most stay-at-home investors should shy away from (the value system, simply because of the difficulty of dealing with all that data), and defer to either or both of Alpha Architect’s 2 ETFs.
In terms of momentum, there are the ALFA, GMOM, and MTUM tickers (do your homework, I’m long ALFA) for various differing exposures to this anomaly, for those that don’t want to pay the constant transaction costs/incur short-term taxes of running their own momentum strategy.

In terms of where this book comes up short, here are my two cents:
Tested over nearly a century, the risk-reward tradeoffs of these systems can still be frightening at times. That is, for a system that delivers a CAGR of around 15%, are you still willing to risk a 50% drawdown? Losing half of everything on the cusp of retirement sounds very scary, no matter the long-term upside.

Furthermore, this book keeps things simple, with an intended audience of mom and pop investors (who have historically underperformed the S&P 500!). While I think it accomplishes this, I think there could have been value added, even for such individuals, by outlining some ETFs or simple ETF/ETN trading systems that can diversify a portfolio. For instance, while volatility trading sounds very scary, in the context of providing diversification, it may be worth looking into. For instance, 2008 was a banner year for most volatility trading strategies that managed to go long and stay long volatility through the crisis. I myself still have very little knowledge of all of the various exotic ETFs that are popping up left and right, and I would look very favorably on a reputable source that can provide a tour of some that can provide respectable return diversification to a basic equities/fixed-income/real asset ETF-based portfolio, as outlined in one of the chapters in this book, and other books, such as Meb Faber’s Global Asset Allocation (a very cheap ebook).

One last thing that I’d like to touch on—this book is written in a very accessible style, and even the math (yes, math!) is understandable for someone that’s passed basic algebra. It’s something that can be read in one or two sittings, and I’d recommend it to anyone that’s a beginner in investing or systematic investing.

Overall, I give this book a solid 4/5 stars. It’s simple, easily understood, and brings systematic investing to the masses in a way that many people can replicate at home. However, I would have liked to see some beyond-the-basics content as well given the plethora of different ETFs.

Thanks for reading.

Hypothesis-Driven Development Part V: Stop-Loss, Deflating Sharpes, and Out-of-Sample

This post will demonstrate a stop-loss rule inspired by Andrew Lo’s paper “when do stop-loss rules stop losses”? Furthermore, it will demonstrate how to deflate a Sharpe ratio to account for the total number of trials conducted, which is presented in a paper written by David H. Bailey and Marcos Lopez De Prado. Lastly, the strategy will be tested on the out-of-sample ETFs, rather than the mutual funds that have been used up until now (which actually cannot be traded more than once every two months, but have been used simply for the purpose of demonstration).

First, however, I’d like to fix some code from the last post and append some results.

A reader asked about displaying the max drawdown for each of the previous rule-testing variants based off of volatility control, and Brian Peterson also recommended displaying max leverage, which this post will provide.

Here’s the updated rule backtest code:

ruleBacktest <- function(returns, nMonths, dailyReturns,
nSD=126, volTarget = .1) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- na.omit(xts(nMonthAverage, order.by = index(returns)))
nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(nMonthAverage))
selection <- (nMonthAvgRank==5) * 1 #select highest average performance
dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
constantVolComponents <- cbind(dailyBacktest, 0)
out <- Return.portfolio(R = constantVolComponents, weights = wts)
out <- apply.monthly(out, Return.cumulative)
maxLeverage <- max(monthlyLeverage, na.rm = TRUE)
return(list(out, maxLeverage))
}

t1 <- Sys.time()
allPermutations <- list()
allDDs <- list()
leverages <- list()
for(i in seq(21, 252, by = 21)) {
monthVariants <- list()
ddVariants <- list()
leverageVariants <- list()
for(j in 1:12) {
trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
sharpe <- table.AnnualizedReturns(trial[[1]])[3,]
dd <- maxDrawdown(trial[[1]])
monthVariants[[j]] <- sharpe
ddVariants[[j]] <- dd
leverageVariants[[j]] <- trial[[2]]
}
allPermutations[[i]] <- do.call(c, monthVariants)
allDDs[[i]] <- do.call(c, ddVariants)
leverages[[i]] <- do.call(c, leverageVariants)
}
allPermutations <- do.call(rbind, allPermutations)
allDDs <- do.call(rbind, allDDs)
leverages <- do.call(rbind, leverages)
t2 <- Sys.time()
print(t2-t1)

Drawdowns:

Leverage:

Here are the results presented as a hypothesis test–a linear regression of drawdowns and leverage against momentum formation period and volatility calculation period:

ddLM <- lm(meltedDDs$MaxDD~meltedDDs$volFormation + meltedDDs$momentumFormation)
summary(ddLM)

Call:
lm(formula = meltedDDs$MaxDD ~ meltedDDs$volFormation + meltedDDs$momentumFormation)

Residuals:
Min 1Q Median 3Q Max
-0.08022 -0.03434 -0.00135 0.02911 0.20077

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.240146 0.010922 21.99 < 2e-16 ***
meltedDDs$volFormation -0.000484 0.000053 -9.13 6.5e-16 ***
meltedDDs$momentumFormation 0.001533 0.001112 1.38 0.17
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0461 on 141 degrees of freedom
Multiple R-squared: 0.377, Adjusted R-squared: 0.368
F-statistic: 42.6 on 2 and 141 DF, p-value: 3.32e-15

levLM <- lm(meltedLeverage$MaxLeverage~meltedLeverage$volFormation + meltedDDs$momentumFormation)
summary(levLM)

Call:
lm(formula = meltedLeverage$MaxLeverage ~ meltedLeverage$volFormation +
meltedDDs$momentumFormation)

Residuals:
Min 1Q Median 3Q Max
-0.9592 -0.5179 -0.0908 0.3679 3.1022

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.076870 0.164243 24.82 <2e-16 ***
meltedLeverage$volFormation -0.009916 0.000797 -12.45 <2e-16 ***
meltedDDs$momentumFormation 0.009869 0.016727 0.59 0.56
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.693 on 141 degrees of freedom
Multiple R-squared: 0.524, Adjusted R-squared: 0.517
F-statistic: 77.7 on 2 and 141 DF, p-value: <2e-16

Easy interpretation here–the shorter-term volatility estimates are unstable due to the one-asset rotation nature of the system. Particularly silly is using the one-month volatility estimate. Imagine the system just switched from the lowest-volatility instrument to the highest. It would then take excessive leverage and get blown up that month for no particularly good reason. A longer-term volatility estimate seems to do much better for this system. So, while the Sharpe is generally improved, the results become far more palatable when using a more stable calculation for volatility, which sets maximum leverage to about 2 when targeting an annualized volatility of 10%. Also, to note, the period to compute volatility matters far more than the momentum formation period when addressing volatility targeting, which lends credence (at least in this case) to so many people that say “the individual signal rules matter far less than the position-sizing rules!”. According to some, position sizing is often a way for people to mask only marginally effective (read: bad) strategies with a separate layer to create a better result. I’m not sure which side of the debate (even assuming there is one) I fall upon, but for what it’s worth, there it is.

Moving on, I want to test out one more rule, which is inspired by Andrew Lo’s stop-loss rule. Essentially, the way it works is this (to my interpretation): it evaluates a running standard deviation, and if the drawdown exceeds some threshold of the running standard deviation, to sit out for some fixed period of time, and then re-enter. According to Andrew Lo, stop-losses help momentum strategies, so it seems as good a rule to test as any.

However, rather than test different permutations of the stop rule on all 144 prior combinations of volatility-adjusted configurations, I’m going to take an ensemble strategy, inspired by a conversation I had with Adam Butler, the CEO of ReSolve Asset Management, who stated that “we know momentum exists, but we don’t know the perfect way to measure it”, from the section I just finished up and use an equal weight of all 12 of the momentum formation periods with a 252-day rolling annualized volatility calculation, and equal weight them every month.

Here are the base case results from that trial (bringing our total to 169).

strat <- list()
for(i in 1:12) {
strat[[i]] <- ruleBacktest(returns = monthRets, nMonths = i, dailyReturns = sample, nSD = 252)[[1]]
}
strat <- do.call(cbind, strat)
strat <- Return.portfolio(R = na.omit(strat), rebalance_on="months")

rbind(table.AnnualizedReturns(strat), maxDrawdown(strat), CalmarRatio(strat))

With the following result:

portfolio.returns
Annualized Return 0.12230
Annualized Std Dev 0.10420
Annualized Sharpe (Rf=0%) 1.17340
Worst Drawdown 0.09616
Calmar Ratio 1.27167

Of course, also worth nothing is that the annualized standard deviation is indeed very close to 10%, even with the ensemble. And it’s nice that there is a Sharpe past 1. Of course, given that these are mutual funds being backtested, these results are optimistic due to the unrealistic execution assumptions (can’t trade sooner than once every *two* months).

Anyway, let’s introduce our stop-loss rule, inspired by Andrew Lo’s paper.

loStopLoss <- function(returns, sdPeriod = 12, sdScaling = 1, sdThresh = 1.5, cooldown = 3) {
stratRets <- list()
count <- 1
stratComplete <- FALSE
originalRets <- returns
ddThresh <- -runSD(returns, n = sdPeriod) * sdThresh * sdScaling
while(!stratComplete) {
retDD <- PerformanceAnalytics:::Drawdowns(returns)
DDbreakthrough <- retDD < ddThresh & lag(retDD) > ddThresh
firstBreak <- which.max(DDbreakthrough) #first threshold breakthrough, if 1, we have no breakthrough
#the above line is unintuitive since this is a boolean vector, so it returns the first value of TRUE
if(firstBreak > 1) { #we have a drawdown breakthrough if this is true
stratRets[[count]] <- returns[1:firstBreak,] #subset returns through our threshold breakthrough
nextPoint <- firstBreak + cooldown + 1 #next point of re-entry is the point after the cooldown period
if(nextPoint <= (nrow(returns)-1)) { #if we can re-enter, subset the returns and return to top of loop
returns <- returns[nextPoint:nrow(returns),]
ddThresh <- ddThresh[nextPoint:nrow(ddThresh),]
count <- count+1
} else { #re-entry point is after data exhausted, end strategy
stratComplete <- TRUE
}
} else { #there are no more critical drawdown breakthroughs, end strategy
stratRets[[count]] <- returns
stratComplete <- TRUE
}
}
stratRets <- do.call(rbind, stratRets) #combine returns
expandRets <- cbind(stratRets, originalRets) #account for all the days we missed
expandRets[is.na(expandRets[,1]), 1] <- 0 #cash positions will be zero
rets <- expandRets[,1]
colnames(rets) <- paste(cooldown, sdThresh, sep="_")
return(rets)
}

Essentially, the way it works is like this: the function computes all the drawdowns for a return series, along with its running standard deviation (non-annualized–if you want to annualize it, change the sdScaling parameter to something like sqrt(12) for monthly or sqrt(252) for daily data). Next, it looks for when the drawdown crossed a critical threshold, then cuts off that portion of returns and standard deviation history, and moves ahead in history by the cooldown period specified, and repeats. Most of the code is simply dealing with corner cases (is there even a time to use the stop rule? What about iterating when there isn’t enough data left?), and then putting the results back together again.

In any case, for the sake of simplicity, this function doesn’t use two different time scales (IE compute volatility using daily data, make decisions monthly), so I’m sticking with using a 12-month rolling volatility, as opposed to 252 day rolling volatility multiplied by the square root of 21.

Finally, here are another 54 runs to see if Andrew Lo’s stop-loss rule works here. Essentially, the intuition behind this is that if the strategy breaks down, it’ll continue to break down, so it would be prudent to just turn it off for a little while.

Here are the trial runs:


threshVec <- seq(0, 2, by=.25)
cooldownVec <- c(1:6)
sharpes <- list()
params <- expand.grid(threshVec, cooldownVec)
for(i in 1:nrow(params)) {
configuration <- loStopLoss(returns = strat, sdThresh = params[i,1],
cooldown = params[i, 2])
sharpes[[i]] <- table.AnnualizedReturns(configuration)[3,]
}
sharpes <- do.call(c, sharpes)

loStoplossFrame <- cbind(params, sharpes)
loStoplossFrame$improvement <- loStoplossFrame[,3] - table.AnnualizedReturns(strat)[3,]

colnames(loStoplossFrame) <- c("Threshold", "Cooldown", "Sharpe", "Improvement")

And a plot of the results.

ggplot(loStoplossFrame, aes(x = Threshold, y = Cooldown, fill=Improvement)) +
geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = 0)

Result:

Result: at this level, and at this frequency (retaining the monthly decision-making process), the stop-loss rule basically does nothing in order to improve the risk-reward trade-off in the best case scenarios, and in most scenarios, simply hurts. 54 trials down the drain, bringing us up to 223 trials. So, what does the final result look like?

charts.PerformanceSummary(strat)

Here’s the final in-sample equity curve–and the first one featured in this entire series. This is, of course, a *feature* of hypothesis-driven development. Playing whack-a-mole with equity curve bumps is what is a textbook case of overfitting. So, without further ado:

And now we can see why stop-loss rules generally didn’t add any value to this strategy. Simply, it had very few periods of sustained losses at the monthly frequency, and thus, very little opportunity for a stop-loss rule to add value. Sure, the occasional negative month crept in, but there was no period of sustained losses. Furthermore, Yahoo Finance may not have perfect fidelity on dividends on mutual funds from the late 90s to early 2000s, so the initial flat performance may also be a rather conservative estimate on the strategy’s performance (then again, as I stated before, using mutual funds themselves is optimistic given the unrealistic execution assumptions, so maybe it cancels out). Now, if this equity curve were to be presented without any context, one may easily question whether or not it was curve-fit. To an extent, one can argue that the volatility computation period may be optimized, though I’d hardly call a 252-day (one-year) rolling volatility estimate a curve-fit.

Next, I’d like to introduce another concept on this blog that I’ve seen colloquially addressed in other parts of the quantitative blogging space, particularly by Mike Harris of Price Action Lab, namely that of multiple hypothesis testing, and about the need to correct for that.

Luckily for that, Drs. David H. Bailey and Marcos Lopez De Prado wrote a paper to address just that. Also, I’d like to note one very cool thing about this paper: it actually has a worked-out numerical example! In my opinion, there are very few things as helpful as showing a simple result that transforms a collection of mathematical symbols into a result to demonstrate what those symbols actually mean in the span of one page. Oh, and it also includes *code* in the appendix (albeit Python — even though, you know, R is far more developed. If someone can get Marcos Lopez De Prado to switch to R–aka the better research language, that’d be a godsend!).

In any case, here’s the formula for the deflated Sharpe ratio, implemented straight from the paper.

deflatedSharpe <- function(sharpe, nTrials, varTrials, skew, kurt, numPeriods, periodsInYear) {
emc <- .5772
sr0_term1 <- (1 - emc) * qnorm(1 - 1/nTrials)
sr0_term2 <- emc * qnorm(1 - 1/nTrials * exp(-1))
sr0 <- sqrt(varTrials * 1/periodsInYear) * (sr0_term1 + sr0_term2)

numerator <- (sharpe/sqrt(periodsInYear) - sr0)*sqrt(numPeriods - 1)

skewnessTerm <- 1 - skew * sharpe/sqrt(periodsInYear)
kurtosisTerm <- (kurt-1)/4*(sharpe/sqrt(periodsInYear))^2

denominator <- sqrt(skewnessTerm + kurtosisTerm)

result <- pnorm(numerator/denominator)
pval <- 1-result
return(pval)
}

The inputs are the strategy’s Sharpe ratio, the number of backtest runs, the variance of the sharpe ratios of those backtest runs, the skewness of the candidate strategy, its non-excess kurtosis, the number of periods in the backtest, and the number of periods in a year. Unlike the De Prado paper, I choose to return the p-value (EG 1-.

Let’s collect all our Sharpe ratios now.

allSharpes <- c(as.numeric(table.AnnualizedReturns(sigBoxplots)[3,]),
meltedSharpes$Sharpe,
as.numeric(table.AnnualizedReturns(strat)[3,]),
loStoplossFrame$Sharpe)

And now, let’s plug and chug!

stratSignificant <- deflatedSharpe(sharpe = as.numeric(table.AnnualizedReturns(strat)[3,]),
nTrials = length(allSharpes), varTrials = var(allSharpes),
skew = as.numeric(skewness(strat)), kurt = as.numeric(kurtosis(strat)) + 3,
numPeriods = nrow(strat), periodsInYear = 12)

And the result!

> stratSignificant
[1] 0.01311

Success! At least at the 5% level…and a rejection at the 1% level, and any level beyond that.

So, one last thing! Out-of-sample testing on ETFs (and mutual funds during the ETF burn-in period)!

symbols2 <- c("CWB", "JNK", "TLT", "SHY", "PCY")
getSymbols(symbols2, from='1900-01-01')
prices2 <- list()
for(tmp in symbols2) {
prices2[[tmp]] <- Ad(get(tmp))
}
prices2 <- do.call(cbind, prices2)
colnames(prices2) <- substr(colnames(prices2), 1, 3)
returns2 <- na.omit(Return.calculate(prices2))

monthRets2 <- apply.monthly(returns2, Return.cumulative)

oosStrat <- list()
for(i in 1:12) {
oosStrat[[i]] <- ruleBacktest(returns = monthRets2, nMonths = i, dailyReturns = returns2, nSD = 252)[[1]]
}
oosStrat <- do.call(cbind, oosStrat)
oosStrat <- Return.portfolio(R = na.omit(oosStrat), rebalance_on="months")

symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
prices[[symbol]] <- Ad(get(symbol))
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
oosMFreturns <- na.omit(Return.calculate(prices))
oosMFmonths <- apply.monthly(oosMFreturns, Return.cumulative)

oosMF <- list()
for(i in 1:12) {
oosMF[[i]] <- ruleBacktest(returns = oosMFmonths, nMonths = i, dailyReturns = oosMFreturns, nSD=252)[[1]]
}
oosMF <- do.call(cbind, oosMF)
oosMF <- Return.portfolio(R = na.omit(oosMF), rebalance_on="months")
oosMF <- oosMF["2009-04/2011-03"]

fullOOS <- rbind(oosMF, oosStrat)

rbind(table.AnnualizedReturns(fullOOS), maxDrawdown(fullOOS), CalmarRatio(fullOOS))
charts.PerformanceSummary(fullOOS)

And the results:

portfolio.returns
Annualized Return 0.1273
Annualized Std Dev 0.0901
Annualized Sharpe (Rf=0%) 1.4119
Worst Drawdown 0.1061
Calmar Ratio 1.1996

And one more equity curve (only the second!).

In other words, the out-of-sample statistics compare to the in-sample statistics. The Sharpe ratio is higher, the Calmar slightly lower. But on a whole, the performance has kept up. Unfortunately, the strategy is currently in a drawdown, but that’s the breaks.

So, whew. That concludes my first go at hypothesis-driven development, and has hopefully at least demonstrated the process to a satisfactory degree. What started off as a toy strategy instead turned from a rejection to a not rejection to demonstrating ideas from three separate papers, and having out-of-sample statistics that largely matched if not outperformed the in-sample statistics. For those thinking about investing in this strategy (again, here is the strategy: take 12 different portfolios, each selecting the asset with the highest momentum over months 1-12, target an annualized volatility of 10%, with volatility defined as the rolling annualized 252-day standard deviation, and equal-weight them every month), what I didn’t cover was turnover and taxes (this is a bond ETF strategy, so dividends will play a large role).

Now, one other request–many of the ideas for this blog come from my readers. I am especially interested in things to think about from readers with line-management responsibilities, as I think many of the questions from those individuals are likely the most universally interesting ones. If you’re one such individual, I’d appreciate an introduction, and knowing who more of the individuals in my reader base are.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up, consulting arrangements, and job discussions. Contact me through my email at [email protected], or through my LinkedIn, found here.