Following our two latest posts on multivariate stochastic volatility (**part 1** and **part 2**) we bring an intermezzo, not related to trading, on how to make beautiful plots in R.

The gold standard in the R community is to create plots based on the extensive **ggplot2 package** built on the principles of the grammar of graphics in R. It has many settings and can create practically any plot you desire. However, as other R users may have noticed it often gets cumbersome to rewrite code for your next ggplot because of all the code needed to create the customization you want. Or you keep forgetting the functions for creating a specific plot.

After many painful hours in this trap we finally liberated ourselves by creating a package called **ggplotHelper**, hosted on GitHub, which is essentially a series of overlay functions for the ggplot function. It reduces the time spent on creating plots significantly but it comes with certain constrains. We have restricted the colour palette to grey and a few easy-to-read colours. However, these can easily be modified in the colour functions. In addition our package does not have an overlay for all plot functions in ggplot.

If you like our design template you can now easily create similar charts yourselves. Before you enter the land of ggplotHelper, we have to warn that the package has not yet entered a stable version 1.0.0, so we will make many commits as the package evolves. The package does not come with unit tests at this stage so sometimes new code may create bugs elsewhere in the code.

Our ggplotHelper package is published as open source and hosted on GitHub. With the **devtools package** it is very easy to install ggplotHelper using the install_github function.

library(devtools) install_github("pgarnry/ggplotHelper")

Now ggplotHelper is installed. Today we will only discuss the three functions bar_chart, line_chart and save_png . For the full package explanation you will have to wait for the vignette or the function documentation as we do try to make documentation self-explanatory.

If you experience issues with our functions you are welcome to raise tickets in our GitHub repository. You can also contribute code to make it better.

Colours are controlled through two functions grey_theme and chart_colours. The grey_theme function allows customization of plot margin and legend position (see ggplot documentation for available options). These feautures can be set through the ellipsis which will be clear very soon.

Below is a small snippet of the grey_theme function showing the only available options.

grey_theme <- function(legend.position = "bottom", plot.margin = c(0.7, 1.2, 0.5, 0.5)) {

If you want to change the colour scheme you simply clone the repository and change the colours in the grey_theme function.

The bar plot is a toolbox stable for any researcher. With ggplotHelper it only takes one line of code to create a bar plot.

data(mtcars) mtcars$name <- rownames(mtcars) # not a pretty bar plot because of overlapping x-axis names bar_chart(mtcars, y = "mpg", x = "name")

This bar plot would require around 50 lines of code to create the design, colours and data handling with the ggplot2 functions. To get it to one line of code, we have simplified and made choices for the user.

The x-axis names are overlapping because of the large number of bars. This problem is easily fixed by flipping the data.

# the flip option makes the plot prettier bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE)

Flipping the data looks easy but behind the lines we are handling the dirtiness. Next we add a title.

# add a title bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE, title = "Miles per gallon across different car models")

Again we make choices for the user by adding a line break in the title to get the proper distance to the plot window. Next we change the y-axis scale.

# change the y scale bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE, title = "Miles per gallon across different car models", scale.y = c(0, 40, 5))

Unordered data can be difficult to interpret so we change the order of our data from highest to lowest values (decreasing).

# now we want to order the values decreasing bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE, title = "Miles per gallon across different car models", scale.y = c(0, 40, 5), decreasing = TRUE)

Finally, we can highlight a specific data point (bar) using the the *bar.colour.name* variable.

# finally we highlight a data point bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE, title = "Miles per gallon across different car models", scale.y = c(0, 40, 5), decreasing = TRUE, bar.colour.name = "Merc 280C")

We hope this have given you an idea of the powerful functions available in ggplotHelper.

The most tricky part of ggplot is to make time series plots, but with ggplotHelper we aimed to make it very easy through the line_chart function. Line plots can either be time-series (often the case) or not. The line_chart function provides options for both and for time-series plots the function supports four different date/time classes. Those are **POSIXct**, **Date**, **yearmon** and **yearqtr**.

Let us start with creating two random processes.

set.seed(5) # create two random processes rand.ts <- data.frame(rp = c(cumprod(rnorm(500, 0.0004, 0.0016) + 1), cumprod(rnorm(500, 0.0002, 0.0016) + 1)), date = rep(seq.Date(Sys.Date() - 499, Sys.Date(), by = "days"), 2), name = rep(c("rp1", "rp2"), each = 500)) # quick line plot line_chart(rand.ts, y = "rp", x = "date", group = "name")

In order to create a line plot with multiple lines the grouping variable has to be specified. However, the method for multiple lines will likely change in the future. As you can see in the data.frame object the date column contain duplicates. Ideally we want to get rid of this input design and allow for only unique date classes and multiple y columns.

The current version grouping names will be used as default legend names. The function automatically detects that the x variable is a **Date class **and enables certain options for manipulating dates (date format and scaling).

Now we will change the y-axis scale and change the legend names. We will also add a title.

# add title, changing the y-axis scale and add custome legend names line_chart(rand.ts, y = "rp", x = "date", group = "name", title = "Random processes", legend.names = c("Random process 1", "Random process 2"), y.min = .95, y.max = 1.25)

As one quickly realizes, the two legend names stand next to each other. Can we add some space and make it prettier?

# add some extra space between legend names line_chart(rand.ts, y = "rp", x = "date", group = "name", title = "Random processes", legend.names = c("Random process 1 ", "Random process 2"), y.min = .95, y.max = 1.25)

Extra space added. Sometimes you want a different date format than the ISO standard. This can easily be changed through the date.format variable and the x.interval setting sets the increment of the sequence in x values. See example below.

# change the date format and interval line_chart(rand.ts, y = "rp", x = "date", group = "name", title = "Random processes", legend.names = c("Random process 1 ", "Random process 2"), y.min = .95, y.max = 1.25, date.format = "%Y %b %d", x.interval = 60)

The date.format takes all the available conversion rules in the **strptime** function. Finally we want to show how to send arguments to the **grey_theme** function through the **line_chart** function.

# remove legend line_chart(rand.ts, y = "rp", x = "date", group = "name", title = "Random processes", legend.names = c("Random process 1 ", "Random process 2"), y.min = .95, y.max = 1.25, date.format = "%Y %b %d", x.interval = 60, legend.position = "none")

The legend.position variable is set in the **grey_theme **function but is passed on through the ellipsis. The only other available variable is **plot.margin **which provides the functionality to change the plot margins in centimeters.

The line_chart function also supports ribbon, vertical and horizontal lines (used to highlight a critical level in the data). Especially the ribbon is a more advanced function and currently works only with one time series.

Often we want our plots saved outside our R environment. The ggplotHelper has a quick save_png function that takes the ggplot object names (plot names) as input and stores the plot objects as .png files with a height of 480 pixels and width scaled by the **golden ratio **to 777. A future feature will allow the user to specify the height and width.

line1 <- line_chart(rand.ts, y = "rp", x = "date", group = "name") line3 <- line_chart(rand.ts, y = "rp", x = "date", group = "name", title = "Random processes", legend.names = c("Random process 1 ", "Random process 2"), y.min = .95, y.max = 1.25) # save the two ggplots as .png files save_png(line4, line5)

It is that easy to save ggplots as .png files.

The ggplotHelper package also supports **density**, **box** and **scatter** plots and more will likely be implemented in the future. Density plot is very powerful and is one of our favourite plots for showing results of bootstrapped trading strategies.

]]>

In part 1 we estimated a sequential multivariate stochastic volatility model on a portfolio of five FX pairs (EURAUD, EURCAD, EURGBP, EURJPY, and EURUSD) and constructed a mean-variance optimized portfolio. Using non-optimized settings the portfolio produced an annualized return of 5.4% compared with 4.1% for the benchmark, but this outperformance was due to higher volatility, and the annualized Sharpe Ratio (SR) of 0.49 failed to match the benchmark’s SR of 0.68.

Our non-optimized model in part 1 did not include autoregressive lags (that is, *d = 0*) and used discount factors close to one (*beta = delta = 0.99)*, resulting in slow evolution of the intercepts and covariance matrix. Today, we will adjust these three parameters in an attempt to improve on the (risk-adjusted) return. Concretely, we let *beta = {0.80, 0.85, 0.90, 0.95, 0.99}*, *delta = {0.95, 0.96, …, 0.99}*, and *d = {0, 1, 3, 5}* resulting in 100 model combinations. This is where sequential estimation shines as we do not need to re-estimate 100 models in batch for every time period in our evaluation period (574 weeks). Our use of weekly data makes batch estimation (much) less of a problem in terms of speed than if we have used higher-frequency data, but imagine if we used 1-minute prices instead (>7,000 observations per week).

Out of the 100 model combinations, in terms of total return the best model delivers 81% (see chart above). This model, which uses *d = 0*,* beta = 0.99 *and* delta = 0.98*, is very close to our original when it comes to both total return performance and the settings used, suggesting that from a total return point of view the parameters we used in part 1 were close to optimal.

However, if we sort by the Sharpe Ratio instead (see chart below), the best model is the one with *beta = 0.80*, *delta = 0.98*, and *d = 3*, which delivers a SR of 0.80, higher than the benchmark’s SR of 0.68. This model includes three autoregressive terms for each FX pair and allows for a very fast evolution of the covariance matrix (i.e. *beta = 0.80*) resulting in a degrees of freedom of just 4. Therefore this model allows much wider tails than a normal distribution – a reasonable feature, we would expect a priori, given that we are dealing with price return data.

Interestingly, the chart also reveals that those models with autoregressive terms generally perform worse than those without (at least with respect to SR). The first (left-most) 25 combinations, which generally have annualized Sharpe Ratios close to 0.6, are those combinations where we are simply fitting a multivariate stochastic volatility mode (i.e. *d = 0*) for combinations of *beta* and *delta*. However, there are some combinations with higher SR which do include autoregressive terms, including the ‘best model’ mentioned in the previous paragraph.

The performance of the 100 combinations are of course only known in hindsight and hence would not have been of much help at the start of any given week (*n = 1, 2, …, 574*) where we would have had to decide on which model – and hence which expected return vector and predictive covariance matrix – to feed into our mean-variance optimizer.

We could calculate a performance statistic such as annualized SR or total return on a rolling basis and choose our model each week accordingly. Instead, each week we calculate the mean squared standardized 1-step forecast errors (MSSE) over a lookback window of 52 weeks (see Triantafyllopoulos [2008], page 7, for details) – similar to the lookback window used in the calculations for the benchmark portfolio in part 1. We then average the MSSE across the five FX pairs to arrive at a measure of goodness. If the model performs well across both lookback window and FX pairs then the measure should be close to one. If it is below one then the model generally predicts too large variances and it is above one then the model generally predicts too small variances.

The portfolio delivers a total return of 43.7% which translates to an annualized return of around 3.3%. This is well below the annualized return of 5.4% achieved with the ‘default settings’, but that was produced via high leverage and high volatility. The drawdowns – as should be clear from the chart above – are much smaller in this portfolio with the largest drawdown at 4.4% and the annualized SR is 0.86 – even higher than the ‘best SR model’ found above.

Because our portfolio is allowed to switch between combinations according to the goodness measure (average MSSE) it can switch between models with high or low predicted covariances and between models with and without autoregressive lags depending on the best average MSSE each week.

The selected model includes five autoregressive lags per FX pair in 287 of the 574 weeks (50%), three lags in 131 weeks, one lag in 101 weeks and zero lags (i.e. a multivariate stochastic volatility model) in 55 weeks. The VAR parameters are allowed to evolve at a moderate pace (*delta = 0.95*) in 271 weeks while the covariance matrix mostly evolves at a moderate-to-slow pace with 522 weeks spent in combinations where *beta >= 0.95*. However, in 52 weeks the selected model has a *beta* of less than or equal to 0.9.

To demonstrate the robustness of the portfolio’s performance to specific model combinations we randomly exclude 10% of the combinations and re-run the optimization. We do this 1,000 times and find that the portfolio continues to perform well with 90% of the annualized Sharpe Ratios in the interval from 0.76 to 0.97 and 95% in the interval from 0.73 to 1.00. We need to exclude well north of 50% of the combinations before we see a material effect on the SR.

The portfolio changes its leverage based on the predicted covariance, the predicted return vector and the required return (10% at an annualized rate). No less than 448 weeks (78%) are spent with a leverage of less than 1 while the portfolio is leveraged at least two times (gross exposure of at least 200%) in 73 weeks. This helps explain the relatively smooth cumulative return chart above.

Leverage reached a high of 4.38 on August 23, 2013 and net exposure (to EUR) climbed to 1.49, meaning that the portfolio had nearly 150% of net long exposure to the Euro during the following week. Similarly, net exposure reached a low of -1.33 on June 21, 2013, indicating that the portfolio was heavily biased against the Euro during the following week.

These figures compare with an average leverage of 0.71 and an average net exposure of 0.00. The fact that net exposure is (very close to) zero is positive given that we have chosen an unconstrained mean-variance optimization. Had we constrained the portfolio weights to sum to one as is typically done, we would in effect have placed a long bet on the Euro relative to the five other currencies (AUD, CAD, JPY, GBP, and USD), thereby assuming – or at least betting on – a positive drift in the five FX pairs during our test period (2005-2015).

In this part 2 we have shown how to (estimate and) select a model sequentially based on a measure of goodness (average MSSE). Based on this measure we can choose the best model each week and use the model’s output (prediction return vector and prediction covariance matrix) for determining the mean-variance optimal weights. One can test many more combinations than the 100 used above and also use another (or multiple) measure(s) of goodness. Furthermore, one can add complexity to the model by incorporating an autoregressive process for the volatility precision covariance matrix (see Triantafyllopoulos, 2013) or move over to particle filters (and perhaps include regime switching, see e.g. Bao et al., 2012).

##################################################################### ### Time-varying Vector Autoregression with Stochastic Volatility ### ##################################################################### TVVARSV <- function(x, AR = 0, beta = 0.99, delta = 0.99, lookback = ceiling(NROW(x)/10)) { x <- as.matrix(x) N <- NROW(x) # The number of observations per time series J <- NCOL(x) # The number of time series # Constants q <- 1/(1 - beta) k <- (beta * (1 - J) + J)/(beta * (2 - J) + J - 1) # Prior specification m <- matrix(0, J * AR + 1, J) P <- diag(0.01, J * AR + 1, J * AR + 1) S <- diag(0.02, J, J) dinv <- diag(1/sqrt(delta), J * AR + 1, J * AR + 1) jnam <- paste("J", 1:J, sep = "") cnam <- "Int" if (AR > 0) for (a in 1:AR) cnam <- c(cnam, paste("AR", a, "_", jnam, sep = "")) # Storage out <- list(means = array(NA, dim = c(N, J, 4), dimnames = list(NULL, jnam, c("actual", "pred", "error", "scaled"))), cov = array(NA, dim = c(N, J, J), dimnames = list(NULL, jnam, jnam)), stats = array(NA, dim = c(N, J, 4), dimnames = list(NULL, jnam, c("ME", "MAE", "RMSE", "MSSE"))), df = q * beta, beta = beta, delta = delta, AR = AR) for (n in (AR + 1):N) { xn <- matrix(x[n, ], J, 1) fn <- matrix(1) if (AR > 0) fn <- rbind(fn, matrix(x[(n - 1):(n - AR), ], ncol = 1, byrow = TRUE)) # Prediction R <- dinv %*% P %*% dinv Q <- as.numeric(t(fn) %*% R %*% fn) + 1 ycov <- Q * (1 - beta)/(3 * beta * k - 2 * k) * S pred <- t(m) %*% fn # If xn contains missing values then use the predictions xn <- ifelse(is.na(xn), pred, xn) err <- xn - pred zrr <- solve(t(chol(ycov))) %*% err out$means[n, , "actual"] <- as.numeric(xn) out$means[n, , "pred"] <- as.numeric(pred) out$means[n, , "error"] <- as.numeric(err) out$means[n, , "scaled"] <- as.numeric(zrr) out$cov[n, , ] <- ycov # Update K <- R %*% fn / Q m <- m + K %*% t(err) P <- R - tcrossprod(K) * Q S <- S/k + tcrossprod(err)/Q } out$stats[, , "ME"] <- apply(out$means[, , "error", drop = F], 2, TTR::runMean, n = lookback) out$stats[, , "MAE"] <- apply(abs(out$means[, , "error", drop = F]), 2, TTR::runMean, n = lookback) out$stats[, , "RMSE"] <- apply(out$means[, , "error", drop = F]^2, 2, TTR::runMean, n = lookback) out$stats[, , "MSSE"] <- apply(out$means[, , "scaled", drop = F]^2, 2 ,TTR::runMean, n = lookback) return(out) } ######################## ### Unconstrained MV ### ######################## SeqMeanVar <- function(exp.returns, req.return = 0, covar) { icovar <- solve(covar) weights <- req.return * as.numeric(icovar %*% exp.returns) / (t(exp.returns) %*% icovar %*% exp.returns) return(weights) } #################### ### MAIN PROGRAM ### #################### #### Input: A matrix of FX prices (N x J) with dates as rownames freq <- 52 lookback <- freq * 1 start.date <- "2004-12-31" return.req <- 1.1^(1/52) - 1 N <- NROW(prices) J <- NCOL(prices) start.obs <- which(start.date == rownames(prices)) returns.log <- log(prices[-1, ]/prices[-N, ]) returns.ari <- prices[-1, ]/prices[-N, ] - 1 N <- N - 1 dates <- rownames(returns.log) # Run models comb <- expand.grid(beta = c(seq(0.80, 0.95, 0.05), 0.99), delta = seq(0.95, 0.99, 0.01), AR = c(0, 1, 3, 5)) # No. of combinations cnum <- NROW(comb) # Combination names comb.names <- c(paste("Comb_", 1:cnum, sep = ""), "BM", "Best") # Store all combinations (plus the best and the benchmark) stats.arr <- array(NA, dim = c(N, NROW(comb) + 2, 6), dimnames = list(dates, comb.names, c("Ret", "Exc", "AnnVol", "Lev", "Net", "Stat"))) # A list of models model.list <- vector("list", cnum) # Loop across combinations (cnum + 1 is the benchmark) for (i in 1:(cnum + 1)) { cat("Combination", i, "of", NROW(comb), "\n") if (i <= cnum) { model <- TVVARSV(x = returns.log, beta = comb[i, "beta"], delta = comb[i, "delta"], lookback = lookback, AR = comb[i, "AR"]) stats.arr[, i, "Stat"] <- apply(model$stats[, , "MSSE", drop = F], 1, mean, na.rm = T) model.list[[i]] <- model } for (n in (start.obs + 1):N) { if (i == cnum + 1) { # Historical mean and covariance model.mean <- exp(apply(log(1 + returns.ari[(n - lookback):(n - 1), ]), 2, mean)) - 1 model.cov <- exp(cov(log(1 + returns.ari[(n - lookback):(n - 1), ]))) - 1 } else { # Model mean and covariance model.mean <- exp(model$means[n, , "pred"]) - 1 model.cov <- exp(model$cov[n, , ]) - 1 } # Mean-variance optimization weights.MV <- SeqMeanVar(exp.returns = model.mean, req.return = return.req, covar = model.cov) # Predicted portfolio volatility at annual frequency stats.arr[n, i, "AnnVol"] <- log(sqrt(t(weights.MV) %*% model.cov %*% weights.MV) + 1) * sqrt(freq) # Gross and net exposure stats.arr[n, i, "Lev"] <- sum(abs(weights.MV)) stats.arr[n, i, "Net"] <- sum(weights.MV) # Return stats.arr[n, i, "Ret"] <- sum(weights.MV * returns.ari[n, ]) } } # Excess return stats.arr[, 1:(cnum + 1), "Exc"] <- stats.arr[, 1:(cnum + 1), "Ret"] - matrix(stats.arr[, "BM", "Ret"], N, cnum + 1) for (n in start.obs:N) { # Find the best model using MSSE (lagged one week) best <- which.min((1 - stats.arr[n - 1, 1:cnum, "Stat"])^2) stats.arr[n, "Best", ] <- stats.arr[n, best, ] } # Log returns log.ret <- matrix(log(1 + stats.arr[start.obs:N, , "Ret"]), ncol = cnum + 2) # Annualized returns ann.ret <- exp(apply(log.ret, 2, mean) * freq) - 1 # Annualized (log) Sharpe ratios (risk-free rate = 0%) ann.lSR <- apply(log.ret, 2, mean)/apply(log.ret, 2, sd) * sqrt(freq) # Cumulative returns cum.ret <- rbind(1, apply(1 + stats.arr[start.obs:N, , "Ret", drop = F], 2, cumprod)) tot.ret <- tail(cum.ret, 1) # Drawdowns drawdowns <- cum.ret/apply(cum.ret, 2, cummax) - 1

]]>

Following an equity-centric start to our blog’s life with random portfolios (**part 1** and **part 2**) and a **monthly factor model**, we now move on to foreign exchange (FX). We will investigate how a (sequential) multivariate stochastic volatility model can be used to potentially improve a FX portfolio by modelling not only the variances of the FX pairs but also the covariances. These can then for example be used to find mean-variance or minimum-variance portfolios. In this blog post we stick to a simple mean-variance approach, but we aim to add complexity later in an attempt to improve on the results.

The generalized autoregressive conditional heteroscedasticity (**GARCH**) and its many evolutions (EGARCH, NGARCH, GARCH-M, multivariate GARCH etc.) remains a staple of time series modelling, but we will dive into the stochastic volatility camp and more specifically we will look at a sequential time-varying vector autoregressive model with stochastic volatility, following closely the approach in **Triantafyllopoulos (2008)**. This allows us to caste the model in state-space form providing us with sequential covariance predictions rather than having to re-estimate in batch constantly using a rolling or expanding window of observations. Furthermore, this model provides both the mean and covariance predictions and it is fast (though this is not important in this work as we use weekly data).

**FX portfolio**

We will investigate the performance of a portfolio of five FX pairs (EURAUD, EURCAD, EURGBP, EURJPY, and EURUSD) on a weekly basis during the eleven-year period 2005-2015 and to keep things simple in the following we assume *d = 0, beta = 0.99, *and *delta = 0.99* (see Triantafyllopoulos for details). This implies that we are estimating a multivariate stochastic volatility model with time-varying intercepts, but without any autoregressive terms (since *d = 0*), where the parameters are updated only gradually as the discount factors *beta* and *delta* are both close to one. The fact that *beta* is close to one implies that we are modelling the log-returns as normally distributed as the degrees of freedom is 99.

Every trading week the model is updated based on that week’s errors and a set of predictions for means and covariance is given. Based on these predictions we can update the weights of our mean-variance portfolio where we require an annualized return of 10% (similar for the benchmark). The data set used for portfolio evaluation is a (575 x 5) matrix of FX close prices (carry-adjusted) for the five FX pairs. As benchmark we run mean-variance optimization on historical data using a lookback window of 52 weeks. We take this approach to allow a comparison between the model’s ability to model portfolio volatility and the benchmark.

**A closer look at EURUSD**

Before we get to the performance evaluation we will take a look at the predictions of one of the five FX pairs, EURUSD. Despite the discount factor *delta* being close to one, the estimated mean return shows some quite large deviations from zero when momentum is high in either direction (see the chart below). During the autumn of 2008 EURUSD plunged from roughly 1.60 to 1.25 and then again in mid-2014 from close to 1.40 to less than 1.05. The annualized expected return (i.e. the annualized mean prediction for EURUSD) declined to as low as -10.8% to reflect this weakness.

Moving on to the expected volatility, we unsurprisingly see a large spike in late 2008 and well into the following year. Annualized expected volatility is above 10% in 416 of 574 weeks and it spends 284 consecutive weeks above this level starting in October, 2008. The highest predicted volatility of 13.3% is reached in May, 2009.

**Performance**

Our portfolio of five FX pairs returns 79% during the period from 2005 to 2015 compared to 55.2% for the benchmark portfolio while our portfolio has 322 winning weeks (56.1%) compared with the benchmark’s 321 winning weeks (55.9%). However, portfolio volatility is so much higher than benchmark volatility that the risk-adjusted return is lower with annualized Sharpe Ratios (assuming a zero risk-free rate) of 0.49 and 0.68, respectively.

In other words, the model delivers an excess return of 23.8% over eleven years, but this is a result of higher leverage and higher volatility (the annualized Information Ratio is 0.20). As a result, the largest drawdown – which unsurprisingly takes place in the midst of the global financial crisis – is 20.2% for our portfolio compared with 10.5% for the benchmark.

Turning to the expected annualized portfolio volatility it generally agrees with volatility as measured on a rolling 52-week basis reasonably well except during the financial crisis when it undershoots by a wide margin. While actual volatility peaks at 23.3% predicted volatility only reaches 19.6%. The median volatilities are 8.5% and 7.9%, respectively, and predicted volatility is above actual volatility just 37.1% of the time (194 out of 574 – 51 = 523 weeks).

While the model does produce a higher annualized return (5.4% vs. 4.1%) the annualized volality of our model portfolio during the eleven years 2005-2015 is 8.8% compared with the benchmark portfolio’s 4.5%. Furthermore, portfolio volatility is above 10% in 130 weeks compared with just three for the benchmark.

We have demonstrated a method to *both* estimate a multivariate stochastic volatility model *and *construct a mean-variance optimized portfolio sequentially rather than in batch mode. However, our non-optimized model fails to beat the benchmark on a risk-adjusted basis as leverage and volatility are both much higher in our model portfolio.

In part 2 we will investigate whether we can improve the results, in particular the model’s ability to accurately predict the covariance matrix of the five FX pairs, by tweaking the settings. One remedy could be to make changes to the discount factors (*beta* and *delta*) while allowing autoregressive terms could be another. We will explore this in the next blog post, so stay tuned.

*Note: the R code will be available in part 2.*

]]>

We start our experiment with some key assumptions. We have 20 signals with annualized log returns of 8% and annualized Sharpe Ratios of 0.6 – not exactly stellar signals. The signals make daily bets. The strategies in this experiment run for 10 years (on a daily basis), but we will later show how the statistics change when the number of observations decline. The experiment is repeated 500 times to get a sense of the distributions of relevant statistics, such as Sharpe Ratios and annualized returns.

An important input variable in trading is the correlation between signals and our experiment is carried out across a sequence of cross-signal correlations from zero to 0.9. Disregarding trading costs (because we are simply interested in relative performances) and using daily rebalancing the distributions of annualized portfolio returns across correlations looks more or less identical. Clearly, having more than one strategy does not improve the annualized return – regardless of the correlations between the strategies.

Blending multiple signals with lower correlation does not enhance returns, but the chart above does hint at the benefit to having more strategies – and especially if those strategies are relatively uncorrelated. The left-most distributions are much narrower and none of the 500 trials have returns below zero (up to and including strategies with correlations of 0.4).

The result becomes clearer when we move to risk-adjusted returns measured by the Sharpe Ratio. Here 20 strategies with zero correlation and low individual annualized Sharpe Ratios of 0.6 turn into a portfolio with an annualized Sharpe Ratio of 3 compared to 0.64 for a portfolio with average correlations of 0.9 between the trading strategies – this translates into a 370% improvement.

What is also fascinating about the chart above is how fast the improvement in Sharpe Ratio declines as the signals become more correlated. Increasing the correlations to 0.2 from 0 results in a decline of 56% in the Sharpe Ratio.

Despite a high Sharpe Ratio and around 50,000 bets across the signals (i.e. trading strategies) the variance of the Sharpe Ratios for the zero cross-signal correlation portfolios is still staggering. One investor might get lucky and produce a Sharpe Ratio of 3.5 (probably turning the person into a multi-billionaire) while another investor with the same types of strategies may be less fortunate, resulting in a Sharpe Ratio of 2.5. Luck does play a role in trading even for high Sharpe Ratio portfolios.

Obviously, an edge becomes clearer with more observations. What happens if our investor only has one year of observations rather than ten years? The chart below shows the explosion in the variances of the Sharpe Ratios across correlations. Despite 5,000 trades most portfolios cannot be separated from random luck. It is clear why data-driven hedge funds prefer higher frequencies (intraday trading). It validates the signals faster.

If we simulate 10,000 time-series with the above properties what is the percentage of them with a p-value lower than 5%? The answer is close to 48%, which could lead most researchers to discard such daily strategies (with an annualized Sharpe Ratio of 0.6). However, blending such weak signals can result in magic – if the correlations are low enough – whereby a portfolio’s combined return stream becomes highly significant. Among the zero correlation portfolios all of them have a p-value of less than 5%.

A daily strategy with an annualized Sharpe Ratio of 0.6 would likely be discarded on its own by a researcher as insufficient to produce anything attractive in trading. But with the right (i.e. low) correlations to existing signals, it could well add value to the portfolio.

This post does not break new ground as the effects of diversification are well-known in the investment community, but it does serve as a reminder that instead of discarding that 0.6 Sharpe Ratio strategy of yours perhaps you can add it to your existing portfolio of strategies, thereby lowering your portfolio volatility and hence allowing for more leverage to be used enhancing the total return.

*Note: This post will be the final post (for a while) on random returns and fundamental concepts. Upcoming blog posts will instead focus on specific trading strategies in various asset classes.*

# define variables N <- 252 * 10 # number of observations M <- 500 # number of trials cors <- seq(0, .9, .1) # sequence of cross-signal correlations strats <- 20 # number of strategies er <- 8 / 252 # expected daily log return in percent esr <- 0.6 / sqrt(252) # expected daily sharpe ratio # pre-allocate arrays to store portfolio returns and p-values port.rets <- array(NA, dim = c(N, M, length(cors))) port.pv <- array(NA, dim = c(M, length(cors))) for (i in 1:length(cors)) { # correlation structure R <- matrix(1, strats, strats) R[lower.tri(R)] <- cors[i] R[upper.tri(R)] <- cors[i] # transpose Choleski decomposition W <- t(chol(R)) for (m in 1:M) { # generate 5 random stratey return streams with zero mean and expected weekly log returns rets <- matrix(rnorm(N*strats, 0, er / esr / 100), ncol = strats) # multiply the Cholesky decomposition matrix with our random returns and add drift rets <- t(W %*% t(rets)) + er / 100 # calculate equal-weighted portfolio return port.rets[, m, i] <- log(apply(exp(rets) - 1, 1, mean) + 1) # insert p-values for portfolios port.pv[m, i] <- t.test(port.rets[, m, i])$p.value } cat(paste("Calculations done for correlation", cors[i]), "\n") } # calculate annualized Sharpe ratio and returns ann.sr <- apply(port.rets, c(2, 3), mean) / apply(port.rets, c(2, 3), sd) * sqrt(252) ann.ret <- apply(port.rets, c(2, 3), mean) * 252 # calculate percent of portfolios with p-value below 5% across correlations pv <- apply(port.pv < 0.05, 2, mean) # calculate theoretical percentage of time-series with p-value below 5% for the specified strategy properties # daily strategy with annualized Sharpe Ratio of 0.6 t.pv <- sum(replicate(10000, ifelse(t.test(rnorm(252 * 10, 8 / 252 / 100, (8/252/100) / (0.6/sqrt(252))))[["p.value"]] < 0.05, 1, 0))) / 10000 # create data.frame for ggplot2 charts of p-values pv.df <- data.frame(P.value = c(t.pv, sort(pv)) * 100, Name = c("Single signal p-value", paste("Correlation (", sort(cors, decreasing = TRUE), ")", sep = ""))) # create data.frame for ggplot2 charts of annualized return and Sharpe Ratio stat.df <- data.frame(Sharpe = c(ann.sr), Ann.Returns = c(ann.ret), Correlation = rep(cors,each = M))

]]>

Our debut blog post **Towards a better benchmark: random portfolios** resulted in a lot of feedback (thank you) and also triggered some thoughts about the diversification aspect of random portfolios. Inspired by the approach in our **four-factor equity model** we want to investigate whether it is possible to lower the variance of the portfolios by taking account of the correlation clustering present in the data. On the other hand, we do not want to stray too far from the random portfolios idea due to its simplicity.

We will investigate two well-known clustering techniques today, **k-means clustering** and **hierarchical clustering** (HCA), and see how they stack up against the results from random portfolios. We proceed by calculating the sample correlation matrix over five years (60 monthly observations) across the S&P 500 index’s constituents every month in the period 2000-2015, i.e. 192 monthly observations. We then generate 50 clusters using either k-means (100 restarts) or HCA on the correlation matrix and assign all equities to a cluster. From each of these 50 clusters we randomly pick one stock and include it in our portfolio for the following month. This implies that our portfolios will have 50 holdings every month. (None of the parameters used have been optimized.)

*The vertical dark-grey line represents the annualized return of a buy-and-hold strategy in the S&P 500 index.*

The mean random portfolio returns 9.1% on an annualized basis compared to the benchmark S&P 500 index, which returns 4.1%. The k-means algorithm improves a tad on random portfolios with 9.2% while HCA delivers 9.5%. Not a single annualized return from the 1,000 portfolios is below the buy-and-hold return no matter whether we look at random portfolios, k-means or HCA.

We have added a long-only minimum variance portfolio (MinVar) to the mix. The 1,000 MinVar portfolios use the exact same draws as the random portfolios, but then proceed to assign weights so they minimize the portfolios’ variances using the 1,000 covariance matrices rather than simply use equal weights.

Unsurprisingly, MinVar delivers a mean return well below the others at 8.1%. The weights in the MinVar portfolios are only constrained to 0-100% implying that in any given MinVar portfolio a few equities may dominate in terms of weights while the rest have weights very close to 0%. As a result, the distribution of the annualized mean returns of the MinVar portfolios is much wider and 1.2% of the portfolio (12) are below the S&P 500 index’s annualized mean.

*The vertical dark-grey line represents the annualized Sharpe ratio of a buy-and-hold strategy in the S&P 500 index.*

Turning to the annualized Sharpe ratio using for simplicity a risk-free rate of 0%, we find that a buy-and hold strategy yields a Sharpe ratio of 0.25. Random portfolios, meanwhile, yields 0.46 while k-means and HCA yield 0.49 and 0.58, respectively.

The k-means and HCA algorithms result in lower volatility compared to standard random portfoliios (see table below). While the random portfolios have a mean annualized standard deviation of 19.8%, k-means and HCA have mean standard deviations of 18.7% and 16.3% respectively (in addition to the slightly higher mean annualized returns). Put differently, it seems from the table below and charts above that k-means and in particular HCA add value relative to random portfolios, but do they do so significantly?

Using a t-test on the 1,000 portfolios’ annualized means we find p-values of 0.7% and ~0% respectively, supporting the view that significant improvements can be made by using k-means and HCA. The p-values fluctuate a bit for the k-means algorithm with every run of the code, but the HCA’s outperformance (in terms of annualized mean return) is always highly significant. Testing the Sharpe ratios rather than mean returns yield highly significant p-values for both k-means and HCA.

We have looked at two ways to potentially improve random portfolios by diversifying across correlation clusters. The choice of k-means and HCA as clustering techniques was made to keep it as simple as possible, but this choice does come with some assumptions. **Variance Explained** – for example – details why using k-means clustering is not a free lunch. In addition, what distance function to use in the HCA was not considered in this post, we simply opted for the simple choice of: *1 – correlation matrix*. We leave it to the reader to test other, more advanced clustering methods, and/or change the settings used in this blog post.

**Portfolio size**

So far we have only looked at a portfolio size of 50, but in the original post on **random portfolios** we also included portfolios of size 10 and 20. For completeness we provide the results below noting simply that the pattern is the same across portfolio sizes. In other words, HCA is the best followed by k-means and random portfolios – though the outperformance decreases with the sample size.

**Excess returns**

The approach above uses the historical correlation matrix of the returns, but there is a case to be made for using excess returns instead. We have therefore replicated the analysis above using excess returns instead of returns, where excess returns are defined as the residuals from regressing the available constituent equity returns on the market returns (i.e. S&P 500 index returns) using the same lookback window of five years (60 observations).

Using the matrix of residuals we construct a correlation matrix and repeat the analysis above; we follow the same steps and use the same parameter settings. Disappointingly, the results do not improve. In fact, the results deteriorate to such a degree that neither k-means nor HCA outperform regardless of whether we look at the mean return or Sharpe ratio.

Postscript: while we use k-means clustering on the correlation matrix we have seen it used in a variety of ways in the finance blogosphere, from **Intelligent Trading Tech** and **Robot Wealth**, both of which look at candlestick patterns, over **Turing Finance**’s focus on countries’ GDP growth to MKTSTK’s **trading volume prediction**, to mention a few.

########################################################### # # # INPUT: # # prices: (months x equities) matrix of close prices # # prices.index: (months x 1) matrix of index close prices # # tickers: (months x equities) binary matrix indicating # # whether a stock was present in the index in that month # # # ########################################################### library(TTR) library(PerformanceAnalytics) library(quadprog) library(xtable) # Function for finding the minimum-variance portfolio min_var <- function(cov.mat, short = FALSE) { if (short) { aMat <- matrix(1, size.port, 1) res <- solve.QP(cov.mat, rep(0, size.port), aMat, bvec = 1, meq = 1) } else { aMat <- cbind(matrix(1, size.port, 1), diag(size.port)) b0 <- as.matrix(c(1, rep(0, size.port))) res <- solve.QP(cov.mat, rep(0, size.port), aMat, bvec = b0, meq = 1) } return(res) } N <- NROW(prices) # Number of months J <- NCOL(prices) # Number of constituents in the S&P 500 index prices <- as.matrix(prices) prices.index <- as.matrix(prices.index) port.names <- c("Random", "K-means", "HCA", "MinVar") # Array that stores performance statistics perf <- array(NA, dim = c(3, draws, length(port.names)), dimnames = list(c("Ret (ann)", "SD (ann)", "SR (ann)"), NULL, port.names)) # Storage array ARR <- array(NA, dim = c(N, draws, size.port, length(port.names)), dimnames = list(rownames(prices), NULL, NULL, port.names)) rows <- which(start.port == rownames(prices)):N # Loop across time (months) for (n in rows) { cat(rownames(prices)[n], "\n") # Which equities are available? cols <- which(tickers[n, ] == 1) # Forward and backward return for available equities fwd.returns <- prices[n, cols]/prices[n - 1, cols] - 1 bwd.returns <- log(prices[(n - lookback):(n - 1), cols]/ prices[(n - lookback - 1):(n - 2), cols]) # Are these equities also available at n + 1? cols <- which(is.na(fwd.returns) == FALSE & apply(is.na(bwd.returns) == FALSE, 2, all)) # Returns for available equities fwd.returns <- fwd.returns[cols] bwd.returns <- bwd.returns[, cols] bwd.returns.index <- log(prices.index[(n - lookback):(n - 1), 1]/ prices.index[(n - lookback - 1):(n - 2), 1]) # Covariance and correlation matrices cov.mat <- cov(bwd.returns) cor.mat <- cor(bwd.returns) # K-means on covariance matrix km <- kmeans(x = scale(cor.mat), centers = size.port, iter.max = 100, nstart = 100) hc <- hclust(d = as.dist(1 - cor.mat), method = "average") for (d in 1:draws) { samp <- sample(x = 1:length(cols), size = size.port, replace = FALSE) opt <- min_var(cov.mat[samp, samp], short = FALSE) ARR[n, d, , "Random"] <- fwd.returns[samp] ARR[n, d, , "MinVar"] <- opt$solution * fwd.returns[samp] * size.port } hc.cut <- cutree(hc, k = size.port) for (q in 1:size.port) { ARR[n, , q, "K-means"] <- sample(x = fwd.returns[which(km$cluster == q)], size = draws, replace = TRUE) ARR[n, , q, "HCA"] <- sample(x = fwd.returns[which(hc.cut == q)], size = draws, replace = TRUE) } } ARR <- ARR[rows, , , ] # Performance calculations returns.mean <- apply(ARR, c(1, 2, 4), mean) - cost.port/100 ; N2 <- NROW(ARR) returns.cum <- apply(returns.mean + 1, c(2, 3), cumprod) returns.ann <- returns.cum[N2, , ]^(12/N2) - 1 std.ann <- exp(apply(log(1 + returns.mean), c(2, 3), StdDev.annualized, scale = 12)) - 1 sr.ann <- returns.ann / std.ann returns.index <- c(0, prices.index[-1, 1]/prices.index[-N, 1] - 1) returns.index <- returns.index[rows] returns.cum.index <- c(1, cumprod(returns.index + 1)) returns.ann.index <- tail(returns.cum.index, 1)^(12/N2) - 1 std.ann.index <- exp(StdDev.annualized(log(1 + returns.index[-1]), scale = 12)) - 1 sr.ann.index <- returns.ann.index / std.ann.index perf["Ret (ann)", , ] <- returns.ann * 100 perf["SD (ann)", , ] <- std.ann * 100 perf["SR (ann)", , ] <- sr.ann results.mean <- as.data.frame(apply(perf, c(1, 3), mean)) results.mean$Benchmark <- c(returns.ann.index * 100, std.ann.index * 100, sr.ann.index) print(round(results.mean, 2)) t.test(returns.ann[,2], returns.ann[, 1], alternative = "greater") t.test(returns.ann[,3], returns.ann[, 1], alternative = "greater")

]]>

The investment community has come a long way in terms of becoming more statistically sound, but it is still surprising how few research papers on cross-sectional equity factors mention bootstrapping. Without bootstrapping researchers are simply presenting results from the full sample, implying that the same type of – potentially spectacular – returns will happen again in the future and will be captured satisfactorily by the model. In other words, the backtesting results may be heavily skewed by outliers. In our latest post on **equity factor models** we mentioned bootstrapping, but we postponed any real discussion of the topic. In today’s blog post we return to the topic of bootstrapping and specifically how outliers influence the results of the aforementioned factor model.

**Outlier sample bias**

In our equity factor model research our backtest is based on 1,000 bootstrapped samples where each sample is a subset of the constituents in the S&P 1200 Global index. Due to our use of bootstrapped samples the same stock can appear twice or more in any given holding period and it is possible to use subsampling instead.

What is particularly interesting in our backtest is that the historical sample (running the backtest simply on the available historical data set) delivers impressive results with an annualized return of 11% (vertical orange line in the chart below). This is 2.2%-points better than the bootstrapped mean of 8.8%

Only 7.2% of the bootstrapped results have a higher annualized return than the historical sample putting it firmly in the right tail of the distribution of annualized returns. Without bootstrapping the historical sample backtest could lead to inflated expectations due to outliers. While it is possible that the model will be able to capture such outliers in the future as well, we prefer to err on the side of caution and therefore prefer to use the boostrapped mean as our expected annualized return rather than that achieved with the full list of historical constituents.

Another advantage of the resampling methodology is that is creates a confidence band around our expectation. We expect an 8.8% annualized return from this model, but if it delivers 6.5% it would fit well within the distribution of annualized returns and hence not surprise us too much. Below the confidence band (5% and 95% percentiles) is shown for the cumulative return with a mean total return of 363% compared with 540% for the historical sample which is close to the upper band of 558%

It is our wish that the investment community steps up its use of bootstrapping on cross-sectional equity research or in other ways incorporate the uncertainty of the results more explicitly, thereby painting a more appropriate picture of the expected performance of a trading strategy.

]]>

Equity factor models are used by many successful hedge funds and asset management firms. Their ability to create rather consistent alpha has been the driving force behind their widespread adoption. As we will show the magic seems to be under pressure.

Our four-factor model is based on the well-researched equity factors value, quality, momentum and low volatility with **S&P 1200 Global** as the universe. Value is defined by 12-month trailing EV/EBITDA and price-to-book (if the former is not available, which is often the case for financials). Quality is defined as return on invested capital (ROIC) and return on equity (ROE, if the former is not available, which is the case for financials). Momentum is defined as the three-month change in price (the window has not been optimized). The stocks’ betas are estimated by linear regression against the market return based on 18 months of monthly observations (the estimation window has not been optimized). All factors have been lagged properly to avoid look-ahead bias.

To avoid concentration risk, as certain industries often dominate a factor at a given period, the factor model spreads its exposure systematically across the equity market. A natural choice is to use industry classifications such as GICS, but from a risk management perspective we are more interested in correlation clusters. We find these clusters by applying hierarchical clustering on excess returns (the stocks’ returns minus the market return) looking back 24 months (again, this window has not been optimized). We require at least 50 stocks in each cluster otherwise the algorithm stops.

The median number of clusters across 1,000 bootstraps is robust around 6 over time compared to the 10 GICS sectors that represent the highest level of segmentation, even – surprisingly – during the financial crisis of 2008 despite a dramatic increase in cross-sectional correlation over that period. It is only the period from June 2000 to June 2003 that sees a big change in the market structure with fewer distinct clusters.

For each cluster at any given point in time in the backtesting period we rescale the raw equity factors to the range [0, 1]. Another approach would be to normalize the factors but that approach would be more sensitive to outliers which are quite profound in our equity factors. The average of rescaled factors are then ranked and those stocks with the highest combined rank (high earnings yield, high return on capital, strong price momentum and low beta) are chosen.

This is probably the simplest method in factor investing, but why increase complexitity if the increase in risk-adjusted returns are not significant? Another approach is use factors in regressions and compute the expected returns based on the stocks’ factors.

Our data goes back to 1996 but with 24 months reserved for the estimation window for calculating clusters the actual results start in 1998 and end in November 2015. We have specified the portfolio size to 40 stocks based on research showing that after point the **marginal reduction in risk is not meaningful** (in a long-only framework). Increasing the portfolio size reduces monthly turnover and hence cost, but we have not optimized this parameter. We have set trading costs (one-way average bid-ask spread and commission) to 0.15% which is probably on the high side in today’s market but on the low side in 1998, but on average likely a fair average trade cost for investors investing in global equities with no access to prime brokerage services.

The backtest results are based on 1,000 bootstraps in order to get more robust estimates of performance measures such as annualized return. In a follow-up post we will explain the importance of bootstrapping when testing cross-sectional equity strategies. One may object to the replacement requirement because it creates situations where the portfolio will select the same stock more than once, which increases concentration risk and departs from an equal-weight framework. However, sampling without replacement would reduce the universe from 1,200 securities.

Despite high trading costs and high turnover our four-factor model delivers alpha against random portfolios with both Kolmogorov-Smirnov (comparing the two distributions) and t-test (on excess annualized returns across bootstraps) showing it to be highly significant. The Kolmogorov-Smirnov statistic (D) is 0.82 and the t-test statistics on excess returns is 60. The four-factor model delivers on average 8.8% annualized return over the 1,000 bootstraps compared to 5.8% annualized return for S&P 1200 Global (orange vertical line) and an average 5.3% annualized return for random portfolios.

In our previous post we showed that **random portfolios beat buy-and-hold for the S&P 500 index**, but in this case they do not. The reason is the small portfolio size being roughly three percent of the universe leading to excessive turnover and hence costs.

The Sharpe ratio is also decent at 0.56 on average across the 1,000 bootstraps compared to 0.28 for random portfolios – hence a very significant improvement. A natural extension would be to explore ways to improve the risk-adjusted return further, for example by shorting the stocks with the worst total score thereby creating a market-neutral portfolio. is one possible solution.

However, our research shows that a market-neutral version does not deliver consistent alpha. In general we find that our equity models do not capture alpha equally well on the long and short side, indicating that drivers of equity returns are not symmetric. So far, we have found market-neutral equity strategies to have more merit on shorter time-frames (intraday or daily), but encourage readers to share any findings on longer time-frames (weekly or monthly frequency).

Interestingly the cumulative excess performance chart shows exactly what Cliff Asness, co-founder of AQR Capital Management, has explained at multiple occasions about the firm’s early start. Namely persistent underperformance as the momentum effect dominated the late 1990s and catapulted the U.S. equity market into a historical bubble. Value investing together with Warren Buffett was ridiculed. Basically stocks that were richly valued, had low or negative return on capital, high beta and high momentum performed very well.

However, starting in 2000 our four-factor model enjoys a long streak of outperformance similar to the fortunes of AQR and it continued until summer 2009. Since then excess return against S&P 1200 Global (we choose this as benchmark here because it beats random portfolios) has been more or less flat. In other words, it performs similarly to the market, but does not generate alpha for our active approach.

Why are traditional equity factor models not producing alpha to the same degree as the period 2000-2009? Two possible explanations come to mind. Competition in financial markets has gone up and with cheap access to computers, widespread adoption of open-source code and equity factors well-researched, the alpha has been competed away. Alternatively, the standard way of creating a four-factor model has run out of juice and factors can still work, but have to be applied in different ways. Maybe the factors should not be blended into a combined score, but instead the best stocks from each factor should be selected. There are endless ways to construct an equity factor model.

### risk.factors is an array with dimensions 239, 2439 and 7 (months, unique stocks in S&amp;amp;amp;P 1200 Global over the whole period, equity factors). ### variables such as cluster.window etc. are specified in our data handling script ### hist.tickers is a matrix with dimensions 239, 2439 (months, unique stocks in S&amp;amp;amp;P 1200 Global over the whole period) - basically a matrix with ones or NAs indicating whether a ticker was part of the index or not at a given point in time ### tr is a matrix containing historical total returns (including reinvesting of dividends and adjustments for corporate actions) with same dimensions as hist.tickers # variables no.pos <- 40 # number of positions in portfolio strategy <- "long" # long or long-short? min.stock.cluster <- 50 # minimum stocks per cluster B <- 1000 # number of bootstraps tc <- 0.15 # one-way trade cost in % (including bid-ask and commission) # pre-allocate list with length of dates to contain portfolio info over dates strat <- vector("list", length(dates)) names(strat) <- dates # pre-allocate xts object for portfolio returns port.ret <- xts(matrix(NA, N, B), order.by = dates) # pre-allocate xts object for random portfolio retuns rand.port.ret <- xts(matrix(NA, N, B), order.by = dates) # number of clusters over time no.clusters <- xts(matrix(NA, N, B), order.by = dates) # initialise text progress bar pb <- progress::progress_bar$new(format = "calculating [:bar] :percent eta: :eta", total = B, clear = FALSE, width = 60) # loop of bootstraps for(b in 1:B) { # loop over dates for(n in (cluster.window+2):N) { # rolling dates window dw <- (n - cluster.window):(n - 1) # index members at n time indx.memb <- which(hist.tickers[n - 1, ] == 1) # if number of bootstraps is above one if(b > 1) { # sample with replacement of index members at period n indx.memb <- sample(indx.memb, length(indx.memb), replace = T) } complete.obs <- which(apply(is.na(tr[dw, indx.memb]), 2, sum) == 0) # update index members at n time by complete observations for correlation indx.memb <- indx.memb[complete.obs] # temporary total returns temp.tr <- tr[dw, indx.memb] # normalised returns norm.ret <- scale(temp.tr) # fit PCA on normalised returns pca.fit <- prcomp(norm.ret) # estimate market returns from first PCA component x <- (norm.ret %*% pca.fit$rotation)[, 1] # estimate beta betas <- as.numeric(solve(t(x) %*% x) %*% t(x) %*% norm.ret) # estimate residuals (normalised return minus market) res <- norm.ret - tcrossprod(x, betas) # correlation matrix cm <- cor(res) # distance matrix dm <- as.dist((1 - cm) / 2) # fit a hierarchical agglomerative clustering fit <- hclust(dm, method = "average") for(i in 2:20) { # assign tickers into clusters groups <- cutree(fit, k = i) # minimum number of tickers in a cluster group.min <- min(table(groups)) # if smallest cluster has less than minimum required number of stocks break loop if(group.min < min.stock.cluster) { groups <- cutree(fit, k = i - 1) break } } # number of clusters G <- length(unique(groups)) # insert number of clusters no.clusters[n, b] <- G # stocks per cluster cluster.size <- table(groups) # number of positions per cluster risk.allocation <- round(table(groups) / sum(table(groups)) * no.pos) # pre-allocate list for containing all trade info on each cluster cluster.info <- vector("list", G) # loop over clusters for(g in 1:G) { # find the ticker positions in the specific cluster cluster.pos <- indx.memb[which(groups == g)] # which tickers have total returns for period n has.ret <- which(!is.na(tr[n, cluster.pos])) # adjust stock's position for g cluster based on available forward return cluster.pos <- cluster.pos[has.ret] # rescale quality risk factor quality.1 <- risk.factors[n - 1, cluster.pos, "quality.1"] quality.2 <- risk.factors[n - 1, cluster.pos, "quality.2"] quality.1.rank <- (quality.1 - min(quality.1, na.rm = T)) / (max(quality.1, na.rm = T) - min(quality.1, na.rm = T)) quality.2.rank <- (quality.2 - min(quality.2, na.rm = T)) / (max(quality.2, na.rm = T) - min(quality.2, na.rm = T)) quality.rank <- ifelse(!is.na(quality.2.rank), quality.2.rank, quality.1.rank) # rescale value risk factor value.1 <- risk.factors[n - 1, cluster.pos, "value.1"] value.2 <- risk.factors[n - 1, cluster.pos, "value.2"] value.1.rank <- (value.1 - min(value.1, na.rm = T)) / (max(value.1, na.rm = T) - min(value.1, na.rm = T)) value.2.rank <- (value.2 - min(value.2, na.rm = T)) / (max(value.2, na.rm = T) - min(value.2, na.rm = T)) value.rank <- ifelse(!is.na(value.2.rank), value.2.rank, value.1.rank) # rescale momentum risk factor mom <- risk.factors[n - 1, cluster.pos, "mom"] mom.rank <- (mom - min(mom, na.rm = T)) / (max(mom, na.rm = T) - min(mom, na.rm = T)) # rescale beta risk factor beta <- risk.factors[n - 1, cluster.pos, "beta"] * -1 beta.rank <- (beta - min(beta, na.rm = T)) / (max(beta, na.rm = T) - min(beta, na.rm = T)) # rescale reversal risk factor reversal <- risk.factors[n - 1, cluster.pos, "reversal"] * -1 reversal.rank <- (reversal - min(reversal, na.rm = T)) / (max(reversal, na.rm = T) - min(reversal, na.rm = T)) # combine all normalised risk factor ranks into one matrix ranks <- cbind(quality.rank, value.rank, mom.rank, beta.rank)#, reversal.rank) if(sum(complete.cases(ranks)) < risk.allocation[g]) { col.obs <- apply(!is.na(ranks), 2, sum) col.comp <- which(col.obs > (cluster.size[g] / 2)) comb.rank <- rank(apply(ranks[, col.comp], 1, mean), na.last = "keep") } else { comb.rank <- rank(apply(ranks, 1, mean), na.last = "keep") } if(strategy == "long") { long.pos <- cluster.pos[which(comb.rank > max(comb.rank, na.rm = T) - risk.allocation[g])] cluster.info[[g]] <- data.frame(Ticker = tickers[long.pos], Ret = as.numeric(tr[n, long.pos]), stringsAsFactors = FALSE) } if(strategy == "long-short") { long.pos <- cluster.pos[which(comb.rank > max(comb.rank, na.rm = T) - risk.allocation[g])] short.pos <- cluster.pos[which(comb.rank < risk.allocation[g] + 1)] long.data <- data.frame(Ticker = tickers[long.pos], Sign = rep("Long", risk.allocation[g]), Ret = as.numeric(tr[n, long.pos]), stringsAsFactors = FALSE) short.data <- data.frame(Ticker = tickers[short.pos], Sign = rep("Short", risk.allocation[g]), Ret = as.numeric(tr[n, short.pos]) * -1, stringsAsFactors = FALSE) cluster.info[[g]] <- rbind(long.data, short.data) } } # rbind data.frames across clusters and insert into strat list strat[[n]] <- do.call("rbind", cluster.info) # insert portfolio return if(n == cluster.window + 2) { port.ret[n, b] <- mean(strat[[n]][,"Ret"]) - tc * 2 / 100 } else { # turnover in % (only selling) strat.turnover <- 1 - sum(!is.na(match(strat[[n]][, "Ticker"], strat[[n-1]][, "Ticker"]))) / length(strat[[n-1]][, "Ticker"]) port.ret[n, b] <- mean(strat[[n]][,"Ret"]) - tc * strat.turnover * 2 / 100 } # insert random portfolio return rand.pos <- sample(indx.memb[which(!is.na(tr[n, indx.memb]))], size = no.pos, replace = T) if(n == cluster.window + 2) { rand.port.ret[n, b] <- mean(tr[n, rand.pos]) - tc * 2 / 100 prev.rand.pos <- rand.pos } else { rand.turnover <- 1 - sum(!is.na(match(prev.rand.pos, rand.pos))) / length(prev.rand.pos) rand.port.ret[n, b] <- mean(tr[n, rand.pos]) - tc * rand.turnover * 2 / 100 } } # update progress bar pb$tick() Sys.sleep(1 / 100) }

]]>

The evaluation of quantitative equity portfolios typically involves a comparison with a relevant benchmark, routinely a broad index such as the S&P 500 index. This is an easy and straightforward approach, but also – we believe – sets the bar too low resulting in too many favorable research outcomes. However, we want to hold ourselves to a higher standard and as a bare minimum that must include being able to beat a random portfolio – the classic dart-throwing monkey portfolio.

A market capitalization-weighted index, such as the S&P 500 index, is inherently a size-based portfolio where those equities which have done well *in the long run* are given the highest weight. It is, in other words, dominated by equities with long-run momentum. Given that an index is nothing more than a size-based trading strategy, we should ask ourselves whether there exist other simple strategies that perform better. The answer to this is in the affirmative and one such strategy is to generate a portfolio purely from chance. Such a strategy will choose, say, 20 equities among the constituents of an index each month and invest an equal amount in each position. After a month the process is repeated, 20 new equities are randomly picked, and so forth.

The fact that random portfolios outperform their corresponding indices is nothing new. David Winton Harding, the founder of the hedge fund Winton Capital, even went on CNBC last year to explain the concept, but he is far from only in highlighting the out-performance of random portfolios (e.g. here and here).

To demonstrate the idea we have carried out the research ourselves. We use the S&P 500 index – without survival bias and accounting for corporate actions such as dividends – and focus on the period from January 2000 to November 2015. We limit ourselves in this post to this time span as it mostly covers the digitization period, but results from January 1990 show similar results. Starting on 31 December 1999 we randomly select *X *equities from the list of S&P 500 constituents that particular month, assign equal weights, and hold this portfolio in January 2000. We then repeat the process on the final day of January and hold in February and every subsequent month until November 2015.

In the chart above the annualized returns for 1000 portfolios containing 10, 20, and 50 equities are depicted (the orange line represents the S&P 500 index). A few things are readily apparent:

- The mean of the annualized returns is practically unchanged across portfolio sizes
- The standard deviation of the annualized returns narrows as the portfolio size increases
- All three portfolio sizes beat the S&P 500 index
- 6.5% of the 1000 random portfolios of size 10 have an annualized return which is lower than that of the S&P 500 index (4.2%). Of the portfolios with sizes 20 and 50 the percentages are 0.9% and 0%, respectively. In other words, not a single of the 1000 random portfolios of size 50 delivers a annualized return below 4.2%.

So far we have talked one or many random portfolios without being too specific, but for random portfolios to work we need a large number of samples (i.e. portfolios) so that performance statistics, such as the annualized return, tend toward stable values. The chart below shows the cumulative mean over the number of random portfolios (size = 10), which stabilizes as the number of random portfolios increases (the orange line represents the mean across all 1000 portfolios).

All three portfolios beat the S&P 500 index in terms of annualized return, but we must keep in mind that these portfolios’ turnovers are high and hence we need to allow for trading costs. The analysis is thus repeated below for 1000 random portfolios with 50 positions with trade costs of 0%, 0.1% and 0.2% (round trip).

Unsurprisingly the mean annualized return declines as trading costs increase. Whereas not a single of the 1000 random portfolios of size 50 delivered an annualized return below 4.2% without trade costs , 2 and 40 portfolios have lower returns when trade costs of 0.1% and 0.2%, respectively, are added. Put differently, even with trade costs of 0.2% (round trip) every month a portfolio of 50 random stocks outperformed the S&P 500 index in terms of annualized return in 960 of 1000 instances.

Weighing by size is simple, and we like simple. But when it comes to equity portfolios we demand more. Put differently, if our upcoming quantitative equity portfolios cannot beat a randomly-generated portfolio what is the point? Therefore, going forward, we will refrain from using the S&P 500 index or any other appropriate index and instead compare our equity models to the results presented above. We want to beat not only the index, we want to beat random. We want to beat the dart-throwing monkey!

ADDENDUM: A couple of comments noted that we must use an an equal weight index to be consistent with the random portfolio approach. This can, for example, be achieved by investing in the Guggenheim S&P 500 Equal Weight ETF, which has yielded 9.6% annualized since inception in April, 2003 (with an expense ratio of 0.4%). The ETF has delivered a higher annualized return than that of the random portfolios (mean) when trading costs are added.

In other words, if you want to invest invest equally in the S&P 500 constituents, there is an easy way to do it. We will continue to use random portfolios as a benchmark as it is a simple approach, which our models must beat and we can choose the starting date as we please.

########################################################### # # # INPUT: # # prices: (months x equities) matrix of close prices # # prices_index: (months x 1) matrix of index close prices # # tickers: (months x equities) binary matrix indicating # # whether a stock was present in the index in that month # # # ########################################################### draws <- 1000 start_time <- "1999-12-31" freq_cal <- "MONTHLY" N <- NROW(prices) # Number of months J <- NCOL(prices) # Number of constituents in the S&P 500 index prices <- as.matrix(prices) prices_index <- as.matrix(prices_index) prices_ETF <- as.matrix(prices_ETF) # Narrow the window #prices <- prices[-1:-40, , drop = FALSE] #prices_index <- prices_index[-1:-40, , drop = FALSE] #prices_ETF <- prices_ETF[-1:-40, , drop = FALSE] #N <- NROW(prices) # Combinations sizes <- c(10, 20, 50) ; nsizes <- length(sizes) # Portfolio sizes costs <- c(0, 0.1, 0.2) ; ncosts <- length(costs) # Trading costs (round trip) # Array that stores performance statistics perf <- array(NA, dim = c(nsizes, ncosts, 3, draws), dimnames = list(paste("Size", sizes, sep = " "), paste("Cost", costs, sep = " "), c("Ret(ann)", "SD(ann)", "SR(ann)"), NULL)) # Loop across portfolio sizes for(m in 1:nsizes) { # Storage array ARR <- array(NA, dim = c(N, sizes[m], draws)) # Loop across time (months) for(n in 1:(N - 1)) { # Which equities are available? cols <- which(tickers[n, ] == 1) # Forward return for available equities fwd_returns <- prices[n + 1, cols]/prices[n, cols] - 1 # Are these equities also available at n + 1? cols <- which(is.na(fwd_returns) == FALSE) # Forward return for available equities fwd_returns <- fwd_returns[cols] # Loop across portfolios for(i in 1:draws) { # Sample a portfolio of size 'sizes[m]' samp <- sample(x = cols, size = sizes[m], replace = F) # Store a vector of forward returns in ARR ARR[n, , i] <- fwd_returns[samp] } # End of i loop } # End of n loop ARR[is.na(ARR)] <- 0 # Loop across trading costs for(m2 in 1:ncosts) { # Performance calculations returns_mean <- apply(ARR, c(1, 3), mean) - costs[m2]/100 returns_cum <- apply(returns_mean + 1, 2, cumprod) returns_ann <- tail(returns_cum, 1)^(percent_exponent/N) - 1 std_ann <- exp(apply(log(1 + returns_mean), 2, StdDev.annualized, scale = percent_exponent)) - 1 sr_ann <- returns_ann / std_ann perf[m, m2, "Ret(ann)", ] <- returns_ann * 100 perf[m, m2, "SD(ann)", ] <- std_ann * 100 perf[m, m2, "SR(ann)", ] <- sr_ann } # End of m2 loop } # End of m loop # Index and ETF returns returns_index <- prices_index[-1, 1]/prices_index[-N, 1] - 1 returns_ava_index <- sum(!is.na(returns_index)) returns_index[is.na(returns_index)] <- 0 returns_cum_index <- c(1, cumprod(1 + returns_index)) returns_ann_index <- tail(returns_cum_index, 1)^(percent_exponent/returns_ava_index) - 1 returns_ETF <- prices_ETF[-1, 1]/prices_ETF[-N, 1] - 1 returns_ava_ETF <- sum(!is.na(returns_ETF)) returns_ETF[is.na(returns_ETF)] <- 0 returns_cum_ETF <- c(1, cumprod(1 + returns_ETF)) returns_ann_ETF <- tail(returns_cum_ETF, 1)^(percent_exponent/returns_ava_ETF) - 1 # Print medians to screen STAT_MED <- apply(perf, c(1, 2, 3), median, na.rm = TRUE) rownames(STAT_MED) <- paste("Size ", sizes, sep = "") colnames(STAT_MED) <- paste("Cost ", costs, sep = "") print(round(STAT_MED, 2))

]]>

The Predictive Alpha blog favors complete transparency so all relevant R code is freely available. This allows visitors of the blog to recreate our results and hopefully also promotes discussion and feedback.

]]>