FX: multivariate stochastic volatility – part 2

In part 2 our mean-variance optimal FX portfolio is allowed to choose from multiple models each week based on a measure of goodness (MSSE). The risk-adjusted return improves as a result with the annualized Sharpe Ratio rising to 0.86 from 0.49.

In part 1 we estimated a sequential multivariate stochastic volatility model on a portfolio of five FX pairs (EURAUD, EURCAD, EURGBP, EURJPY, and EURUSD) and constructed a mean-variance optimized portfolio. Using non-optimized settings the portfolio produced an annualized return of 5.4% compared with 4.1% for the benchmark, but this outperformance was due to higher volatility, and the annualized Sharpe Ratio (SR) of 0.49 failed to match the benchmark’s SR of 0.68.

Our non-optimized model in part 1 did not include autoregressive lags (that is, d = 0) and used discount factors close to one (beta = delta = 0.99), resulting in slow evolution of the intercepts and covariance matrix. Today, we will adjust these three parameters in an attempt to improve on the (risk-adjusted) return. Concretely, we let beta = {0.80, 0.85, 0.90, 0.95, 0.99}delta = {0.95, 0.96, …, 0.99}, and d = {0, 1, 3, 5} resulting in 100 model combinations. This is where sequential estimation shines as we do not need to re-estimate 100 models in batch for every time period in our evaluation period (574 weeks). Our use of weekly data makes batch estimation (much) less of a problem in terms of speed than if we have used higher-frequency data, but imagine if we used 1-minute prices instead (>7,000 observations per week).


Out of the 100 model combinations, in terms of total return the best model delivers 81% (see chart above). This model, which uses d = 0, beta = 0.99 and delta = 0.98, is very close to our original when it comes to both total return performance and the settings used, suggesting that from a total return point of view the parameters we used in part 1 were close to optimal.

However, if we sort by the Sharpe Ratio instead (see chart below), the best model is the one with beta = 0.80delta = 0.98, and d = 3, which  delivers a SR of 0.80, higher than the benchmark’s SR of 0.68. This model includes three autoregressive terms for each FX pair and allows for a very fast evolution of the covariance matrix (i.e. beta = 0.80) resulting in a degrees of freedom of just 4. Therefore this model allows much wider tails than a normal distribution – a reasonable feature, we would expect a priori, given that we are dealing with price return data.

Interestingly, the chart also reveals that those models with autoregressive terms generally perform worse than those without (at least with respect to SR). The first (left-most) 25 combinations, which generally have annualized Sharpe Ratios close to 0.6, are those combinations where we are simply fitting a multivariate stochastic volatility mode (i.e. d = 0) for combinations of beta and delta. However, there are some combinations with higher SR which do include autoregressive terms, including the ‘best model’ mentioned in the previous paragraph.


The performance of the 100 combinations are of course only known in hindsight and hence would not have been of much help at the start of any given week (n = 1, 2, …, 574) where we would have had to decide on which model – and hence which expected return vector and predictive covariance matrix – to feed into our mean-variance optimizer.

We could calculate a performance statistic such as annualized SR or total return on a rolling basis and choose our model each week accordingly. Instead, each week we calculate the mean squared standardized 1-step forecast errors (MSSE) over a lookback window of 52 weeks (see Triantafyllopoulos [2008], page 7, for details) – similar to the lookback window used in the calculations for the benchmark portfolio in part 1. We then average the MSSE across the five FX pairs to arrive at a measure of goodness. If the model performs well across both lookback window and FX pairs then the measure should be close to one. If it is below one then the model generally predicts too large variances and it is above one then the model generally predicts too small variances.


The portfolio delivers a total return of 43.7% which translates to an annualized return of around 3.3%. This is well below the annualized return of 5.4% achieved with the ‘default settings’, but that was produced via high leverage and high volatility. The drawdowns – as should be clear from the chart above – are much smaller in this portfolio with the largest drawdown at 4.4% and the annualized SR is 0.86 – even higher than the ‘best SR model’ found above.

Because our portfolio is allowed to switch between combinations according to the goodness measure (average MSSE) it can switch between models with high or low predicted covariances and between models with and without autoregressive lags depending on the best average MSSE each week.

The selected model includes five autoregressive lags per FX pair in 287 of the 574 weeks (50%), three lags in 131 weeks, one lag in 101 weeks and zero lags (i.e. a multivariate stochastic volatility model) in 55 weeks. The VAR parameters are allowed to evolve at a moderate pace (delta = 0.95) in 271 weeks while the covariance matrix mostly evolves at a moderate-to-slow pace with 522 weeks spent in combinations where beta >= 0.95. However, in 52 weeks the selected model has a beta of less than or equal to 0.9.


To demonstrate the robustness of the portfolio’s performance to specific model combinations we randomly exclude 10% of the combinations and re-run the optimization. We do this 1,000 times and find that the portfolio continues to perform well with 90% of the annualized Sharpe Ratios in the interval from 0.76 to 0.97 and 95% in the interval from 0.73 to 1.00. We need to exclude well north of 50% of the combinations before we see a material effect on the SR.


The portfolio changes its leverage based on the predicted covariance, the predicted return vector and the required return (10% at an annualized rate). No less than 448 weeks (78%) are spent with a leverage of less than 1 while the portfolio is leveraged at least two times (gross exposure of at least 200%) in 73 weeks. This helps explain the relatively smooth cumulative return chart above.


Leverage reached a high of 4.38 on August 23, 2013 and net exposure (to EUR) climbed to 1.49, meaning that the portfolio had nearly 150% of net long exposure to the Euro during the following week. Similarly, net exposure reached a low of -1.33 on June 21, 2013, indicating that the portfolio was heavily biased against the Euro during the following week.

These figures compare with an average leverage of 0.71 and an average net exposure of 0.00. The fact that net exposure is (very close to) zero is positive given that we have chosen an unconstrained mean-variance optimization. Had we constrained the portfolio weights to sum to one as is typically done, we would in effect have placed a long bet on the Euro relative to the five other currencies (AUD, CAD, JPY, GBP, and USD), thereby assuming – or at least betting on – a positive drift in the five FX pairs during our test period (2005-2015).

In this part 2 we have shown how to (estimate and) select a model sequentially based on a measure of goodness (average MSSE). Based on this measure we can choose the best model each week and use the model’s output (prediction return vector and prediction covariance matrix) for determining the mean-variance optimal weights. One can test many more combinations than the 100 used above and also use another (or multiple) measure(s) of goodness. Furthermore, one can add complexity to the model by incorporating an autoregressive process for the volatility precision covariance matrix (see Triantafyllopoulos, 2013) or move over to particle filters (and perhaps include regime switching, see e.g. Bao et al., 2012).

### Time-varying Vector Autoregression with Stochastic Volatility ###

TVVARSV <- function(x, AR = 0, beta = 0.99, delta = 0.99, lookback = ceiling(NROW(x)/10)) {

x <- as.matrix(x)
N <- NROW(x) # The number of observations per time series
J <- NCOL(x) # The number of time series

# Constants
q <- 1/(1 - beta)
k <- (beta * (1 - J) + J)/(beta * (2 - J) + J - 1)

# Prior specification
m <- matrix(0, J * AR + 1, J)
P <- diag(0.01, J * AR + 1, J * AR + 1)
S <- diag(0.02, J, J)

dinv <- diag(1/sqrt(delta), J * AR + 1, J * AR + 1)

jnam <- paste("J", 1:J, sep = "")
cnam <- "Int"
if (AR > 0) for (a in 1:AR) cnam <- c(cnam, paste("AR", a, "_", jnam, sep = ""))

# Storage
out <- list(means = array(NA, dim = c(N, J, 4), dimnames = list(NULL, jnam, c("actual", "pred", "error", "scaled"))),
cov = array(NA, dim = c(N, J, J), dimnames = list(NULL, jnam, jnam)),
stats = array(NA, dim = c(N, J, 4), dimnames = list(NULL, jnam, c("ME", "MAE", "RMSE", "MSSE"))),
df = q * beta, beta = beta, delta = delta, AR = AR)

for (n in (AR + 1):N) {

xn <- matrix(x[n, ], J, 1)
fn <- matrix(1)
if (AR > 0) fn <- rbind(fn, matrix(x[(n - 1):(n - AR), ], ncol = 1, byrow = TRUE))

# Prediction
R <- dinv %*% P %*% dinv
Q <- as.numeric(t(fn) %*% R %*% fn) + 1

ycov <- Q * (1 - beta)/(3 * beta * k - 2 * k) * S

pred <- t(m) %*% fn

# If xn contains missing values then use the predictions
xn <- ifelse(is.na(xn), pred, xn)

err <- xn - pred
zrr <- solve(t(chol(ycov))) %*% err

out$means[n, , "actual"] <- as.numeric(xn)
out$means[n, , "pred"] <- as.numeric(pred)
out$means[n, , "error"] <- as.numeric(err)
out$means[n, , "scaled"] <- as.numeric(zrr)

out$cov[n, , ] <- ycov

# Update
K <- R %*% fn / Q
m <- m + K %*% t(err)
P <- R - tcrossprod(K) * Q
S <- S/k + tcrossprod(err)/Q


out$stats[, , "ME"] <- apply(out$means[, , "error", drop = F], 2, TTR::runMean, n = lookback)
out$stats[, , "MAE"] <- apply(abs(out$means[, , "error", drop = F]), 2, TTR::runMean, n = lookback)
out$stats[, , "RMSE"] <- apply(out$means[, , "error", drop = F]^2, 2, TTR::runMean, n = lookback)
out$stats[, , "MSSE"] <- apply(out$means[, , "scaled", drop = F]^2, 2 ,TTR::runMean, n = lookback)



### Unconstrained MV ###

SeqMeanVar <- function(exp.returns, req.return = 0, covar) {

icovar <- solve(covar)
weights <- req.return * as.numeric(icovar %*% exp.returns) / (t(exp.returns) %*% icovar %*% exp.returns)




#### Input: A matrix of FX prices (N x J) with dates as rownames

freq <- 52
lookback <- freq * 1
start.date <- "2004-12-31"
return.req <- 1.1^(1/52) - 1

N <- NROW(prices)
J <- NCOL(prices)

start.obs <- which(start.date == rownames(prices))

returns.log <- log(prices[-1, ]/prices[-N, ])
returns.ari <- prices[-1, ]/prices[-N, ] - 1

N <- N - 1

dates <- rownames(returns.log)

# Run models
comb <- expand.grid(beta = c(seq(0.80, 0.95, 0.05), 0.99),
delta = seq(0.95, 0.99, 0.01),
AR = c(0, 1, 3, 5))

# No. of combinations
cnum <- NROW(comb)

# Combination names
comb.names <- c(paste("Comb_", 1:cnum, sep = ""), "BM", "Best")

# Store all combinations (plus the best and the benchmark)
stats.arr <- array(NA, dim = c(N, NROW(comb) + 2, 6),
dimnames = list(dates, comb.names, c("Ret", "Exc", "AnnVol", "Lev", "Net", "Stat")))

# A list of models
model.list <- vector("list", cnum)

# Loop across combinations (cnum + 1 is the benchmark)
for (i in 1:(cnum + 1)) {

cat("Combination", i, "of", NROW(comb), "\n")

if (i <= cnum) {

model <- TVVARSV(x = returns.log, beta = comb[i, "beta"], delta = comb[i, "delta"],
lookback = lookback, AR = comb[i, "AR"])

stats.arr[, i, "Stat"] <- apply(model$stats[, , "MSSE", drop = F], 1, mean, na.rm = T)

model.list[[i]] <- model


for (n in (start.obs + 1):N) {

if (i == cnum + 1) {

# Historical mean and covariance
model.mean <- exp(apply(log(1 + returns.ari[(n - lookback):(n - 1), ]), 2, mean)) - 1
model.cov <- exp(cov(log(1 + returns.ari[(n - lookback):(n - 1), ]))) - 1

} else {

# Model mean and covariance
model.mean <- exp(model$means[n, , "pred"]) - 1
model.cov <- exp(model$cov[n, , ]) - 1


# Mean-variance optimization
weights.MV <- SeqMeanVar(exp.returns = model.mean, req.return = return.req, covar = model.cov)

# Predicted portfolio volatility at annual frequency
stats.arr[n, i, "AnnVol"] <- log(sqrt(t(weights.MV) %*% model.cov %*% weights.MV) + 1) * sqrt(freq)

# Gross and net exposure
stats.arr[n, i, "Lev"] <- sum(abs(weights.MV))
stats.arr[n, i, "Net"] <- sum(weights.MV)

# Return
stats.arr[n, i, "Ret"] <- sum(weights.MV * returns.ari[n, ])



# Excess return
stats.arr[, 1:(cnum + 1), "Exc"] <- stats.arr[, 1:(cnum + 1), "Ret"] - matrix(stats.arr[, "BM", "Ret"], N, cnum + 1)

for (n in start.obs:N) {

# Find the best model using MSSE (lagged one week)
best <- which.min((1 - stats.arr[n - 1, 1:cnum, "Stat"])^2)

stats.arr[n, "Best", ] <- stats.arr[n, best, ]


# Log returns
log.ret <- matrix(log(1 + stats.arr[start.obs:N, , "Ret"]), ncol = cnum + 2)

# Annualized returns
ann.ret <- exp(apply(log.ret, 2, mean) * freq) - 1

# Annualized (log) Sharpe ratios (risk-free rate = 0%)
ann.lSR <- apply(log.ret, 2, mean)/apply(log.ret, 2, sd) * sqrt(freq)

# Cumulative returns
cum.ret <- rbind(1, apply(1 + stats.arr[start.obs:N, , "Ret", drop = F], 2, cumprod))
tot.ret <- tail(cum.ret, 1)

# Drawdowns
drawdowns <- cum.ret/apply(cum.ret, 2, cummax) - 1

Trading strategies: No need for the holy grail

We demonstrate that weak trading signals, which do not offer high risk-adjusted returns on their own, can be combined into a powerful portfolio. In other words, no need for holy grails when researching signals.

We start our experiment with some key assumptions. We have 20 signals with annualized log returns of 8% and annualized Sharpe Ratios of 0.6 – not exactly stellar signals. The signals make daily bets. The strategies in this experiment run for 10 years (on a daily basis), but we will later show how the statistics change when the number of observations decline. The experiment is repeated 500 times to get a sense of the distributions of relevant statistics, such as Sharpe Ratios and annualized returns.

An important input variable in trading is the correlation between signals and our experiment is carried out across a sequence of cross-signal correlations from zero to 0.9. Disregarding trading costs (because we are simply interested in relative performances) and using daily rebalancing the distributions of annualized portfolio returns across correlations looks more or less identical. Clearly, having more than one strategy does not improve the annualized return – regardless of the correlations between the strategies.


Blending multiple signals with lower correlation does not enhance returns, but the chart above does hint at the benefit to having more strategies – and especially if those strategies are relatively uncorrelated. The left-most distributions are much narrower and none of the 500 trials have returns below zero (up to and including strategies with correlations of 0.4).

The result becomes clearer when we move to risk-adjusted returns measured by the Sharpe Ratio. Here 20 strategies with zero correlation and low individual annualized Sharpe Ratios of 0.6 turn into a portfolio with an annualized Sharpe Ratio of 3 compared to 0.64 for a portfolio with average correlations of 0.9 between the trading strategies – this translates into a 370% improvement.


What is also fascinating about the chart above is how fast the improvement in Sharpe Ratio declines as the signals become more correlated. Increasing the correlations to 0.2 from 0 results in a decline of 56% in the Sharpe Ratio.

Despite a high Sharpe Ratio and around 50,000 bets across the signals (i.e. trading strategies) the variance of the Sharpe Ratios for the zero cross-signal correlation portfolios is still staggering. One investor might get lucky and produce a Sharpe Ratio of 3.5 (probably turning the person into a multi-billionaire) while another investor with the same types of strategies may be less fortunate, resulting in a Sharpe Ratio of 2.5. Luck does play a role in trading even for high Sharpe Ratio portfolios.

Obviously, an edge becomes clearer with more observations. What happens if our investor only has one year of observations rather than ten years? The chart below shows the explosion in the variances of the Sharpe Ratios across correlations. Despite 5,000 trades most portfolios cannot be separated from random luck. It is clear why data-driven hedge funds prefer higher frequencies (intraday trading). It validates the signals faster.


If we simulate 10,000 time-series with the above properties what is the percentage of them with a p-value lower than 5%? The answer is close to 48%, which could lead most researchers to discard such daily strategies (with an annualized Sharpe Ratio of 0.6). However, blending such weak signals can result in magic – if the correlations are low enough – whereby a portfolio’s combined return stream becomes highly significant. Among the zero correlation portfolios all of them have a p-value of less than 5%.


A daily strategy with an annualized Sharpe Ratio of 0.6 would likely be discarded on its own by a researcher as insufficient to produce anything attractive in trading. But with the right (i.e. low) correlations to existing signals, it could well add value to the portfolio.

This post does not break new ground as the effects of diversification are well-known in the investment community, but it does serve as a reminder that instead of discarding that 0.6 Sharpe Ratio strategy of yours perhaps you can add it to your existing portfolio of strategies, thereby lowering your portfolio volatility and hence allowing for more leverage to be used enhancing the total return.

Note: This post will be the final post (for a while) on random returns and fundamental concepts. Upcoming blog posts will instead focus on specific trading strategies in various asset classes.

# define variables
N <- 252 * 10 # number of observations
M <- 500 # number of trials
cors <- seq(0, .9, .1) # sequence of cross-signal correlations
strats <- 20 # number of strategies
er <- 8 / 252 # expected daily log return in percent
esr <- 0.6 / sqrt(252) # expected daily sharpe ratio

# pre-allocate arrays to store portfolio returns and p-values
port.rets <- array(NA, dim = c(N, M, length(cors)))
port.pv <- array(NA, dim = c(M, length(cors)))

for (i in 1:length(cors)) {

# correlation structure
R <- matrix(1, strats, strats)
R[lower.tri(R)] <- cors[i]
R[upper.tri(R)] <- cors[i]

# transpose Choleski decomposition
W <- t(chol(R))

for (m in 1:M) {

# generate 5 random stratey return streams with zero mean and expected weekly log returns
rets <- matrix(rnorm(N*strats, 0, er / esr / 100), ncol = strats)

# multiply the Cholesky decomposition matrix with our random returns and add drift
rets <- t(W %*% t(rets)) + er / 100

# calculate equal-weighted portfolio return
port.rets[, m, i] <- log(apply(exp(rets) - 1, 1, mean) + 1)

# insert p-values for portfolios
port.pv[m, i] <- t.test(port.rets[, m, i])$p.value


cat(paste("Calculations done for correlation", cors[i]), "\n")


# calculate annualized Sharpe ratio and returns
ann.sr <- apply(port.rets, c(2, 3), mean) / apply(port.rets, c(2, 3), sd) * sqrt(252)
ann.ret <- apply(port.rets, c(2, 3), mean) * 252

# calculate percent of portfolios with p-value below 5% across correlations
pv <- apply(port.pv < 0.05, 2, mean)

# calculate theoretical percentage of time-series with p-value below 5% for the specified strategy properties
# daily strategy with annualized Sharpe Ratio of 0.6
t.pv <- sum(replicate(10000, ifelse(t.test(rnorm(252 * 10, 8 / 252 / 100, (8/252/100) / (0.6/sqrt(252))))[["p.value"]] < 0.05, 1, 0))) / 10000

# create data.frame for ggplot2 charts of p-values
pv.df <- data.frame(P.value = c(t.pv, sort(pv)) * 100,
Name = c("Single signal p-value", paste("Correlation (", sort(cors, decreasing = TRUE), ")", sep = "")))

# create data.frame for ggplot2 charts of annualized return and Sharpe Ratio
stat.df <- data.frame(Sharpe = c(ann.sr),
Ann.Returns = c(ann.ret),
Correlation = rep(cors,each = M))