16 Parametric portfolio policies

In this chapter, we apply different portfolio performance measures to evaluate and compare portfolio allocation strategies. For this purpose, we introduce a direct way to estimate optimal portfolio weights for large-scale cross-sectional applications. More precisely, the approach of Brandt, Santa-Clara, and Valkanov (2009) proposes to parametrize the optimal portfolio weights as a function of stock characteristics instead of estimating the stock’s expected return, variance, and covariances with other stocks in a prior step. We choose weights as a function of the characteristics, which maximize the expected utility of the investor. This approach is feasible for large portfolio dimensions (such as the entire CRSP universe) and has been proposed by Brandt, Santa-Clara, and Valkanov (2009). See the review paper Brandt (2010) for an excellent treatment of related portfolio choice methods.

The current chapter relies on the following set of packages:

16.1 Data preparation

To get started, we load the monthly CRSP file, which forms our investment universe. We load the data from our SQLite-database introduced in Chapters 2-4.

tidy_finance <- dbConnect(
  SQLite(), "data/tidy_finance.sqlite",
  extended_types = TRUE
)

crsp_monthly <- tbl(tidy_finance, "crsp_monthly") |>
  collect()

To evaluate the performance of portfolios, we further use monthly market returns as a benchmark to compute CAPM alphas.

factors_ff_monthly <- tbl(tidy_finance, "factors_ff_monthly") |>
  collect()

Next, we retrieve some stock characteristics that have been shown to have an effect on the expected returns or expected variances (or even higher moments) of the return distribution. In particular, we record the lagged one-year return momentum (momentum_lag), defined as the compounded return between months \(t-12\) and \(t-2\) for each firm. In finance, momentum is the empirically observed tendency for rising asset prices to rise further, and falling prices to keep falling (Jegadeesh and Titman 1993). The second characteristic is the firm’s market equity (size_lag), defined as the log of the price per share times the number of shares outstanding (Banz 1981). To construct the correct lagged values, we use the approach introduced in Chapter 3.

crsp_monthly_lags <- crsp_monthly |>
  transmute(permno,
    month_12 = month %m+% months(12),
    mktcap
  )

crsp_monthly <- crsp_monthly |>
  inner_join(crsp_monthly_lags,
    by = c("permno", "month" = "month_12"),
    suffix = c("", "_12")
  )

data_portfolios <- crsp_monthly |>
  mutate(
    momentum_lag = mktcap_lag / mktcap_12,
    size_lag = log(mktcap_lag)
  ) |>
  drop_na(contains("lag"))

16.2 Parametric portfolio policies

The basic idea of parametric portfolio weights is as follows. Suppose that at each date \(t\) we have \(N_t\) stocks in the investment universe, where each stock \(i\) has a return of \(r_{i, t+1}\) and is associated with a vector of firm characteristics \(x_{i, t}\) such as time-series momentum or the market capitalization. The investor’s problem is to choose portfolio weights \(w_{i,t}\) to maximize the expected utility of the portfolio return: \[\begin{aligned} \max_{w} E_t\left(u(r_{p, t+1})\right) = E_t\left[u\left(\sum\limits_{i=1}^{N_t}w_{i,t}r_{i,t+1}\right)\right] \end{aligned}\] where \(u(\cdot)\) denotes the utility function.

Where do the stock characteristics show up? We parameterize the optimal portfolio weights as a function of the stock characteristic \(x_{i,t}\) with the following linear specification for the portfolio weights: \[w_{i,t} = \bar{w}_{i,t} + \frac{1}{N_t}\theta'\hat{x}_{i,t},\] where \(\bar{w}_{i,t}\) is a stock’s weight in a benchmark portfolio (we use the value-weighted or naive portfolio in the application below), \(\theta\) is a vector of coefficients which we are going to estimate, and \(\hat{x}_{i,t}\) are the characteristics of stock \(i\), cross-sectionally standardized to have zero mean and unit standard deviation.

Intuitively, the portfolio strategy is a form of active portfolio management relative to a performance benchmark. Deviations from the benchmark portfolio are derived from the individual stock characteristics. Note that by construction the weights sum up to one as \(\sum_{i=1}^{N_t}\hat{x}_{i,t} = 0\) due to the standardization. Moreover, the coefficients are constant across assets and over time. The implicit assumption is that the characteristics fully capture all aspects of the joint distribution of returns that are relevant for forming optimal portfolios.

We first implement cross-sectional standardization for the entire CRSP universe. We also keep track of (lagged) relative market capitalization relative_mktcap, which will represent the value-weighted benchmark portfolio, while n denotes the number of traded assets \(N_t\), which we use to construct the naive portfolio benchmark.

data_portfolios <- data_portfolios |>
  group_by(month) |>
  mutate(
    n = n(),
    relative_mktcap = mktcap_lag / sum(mktcap_lag),
    across(contains("lag"), ~ (. - mean(.)) / sd(.)),
  ) |>
  ungroup() |>
  select(-mktcap_lag, -altprc)

16.3 Computing portfolio weights

Next, we move on to identify optimal choices of \(\theta\). We rewrite the optimization problem together with the weight parametrization and can then estimate \(\theta\) to maximize the objective function based on our sample \[\begin{aligned} E_t\left(u(r_{p, t+1})\right) = \frac{1}{T}\sum\limits_{t=0}^{T-1}u\left(\sum\limits_{i=1}^{N_t}\left(\bar{w}_{i,t} + \frac{1}{N_t}\theta'\hat{x}_{i,t}\right)r_{i,t+1}\right). \end{aligned}\] The allocation strategy is straightforward because the number of parameters to estimate is small. Instead of a tedious specification of the \(N_t\) dimensional vector of expected returns and the \(N_t(N_t+1)/2\) free elements of the covariance matrix, all we need to focus on in our application is the vector \(\theta\). \(\theta\) contains only two elements in our application - the relative deviation from the benchmark due to size and momentum.

To get a feeling for the performance of such an allocation strategy, we start with an arbitrary initial vector \(\theta_0\). The next step is to choose \(\theta\) optimally to maximize the objective function. We automatically detect the number of parameters by counting the number of columns with lagged values.

n_parameters <- sum(str_detect(
  colnames(data_portfolios), "lag"
))

theta <- rep(1.5, n_parameters)

names(theta) <- colnames(data_portfolios)[str_detect(
  colnames(data_portfolios), "lag"
)]

The function compute_portfolio_weights() below computes the portfolio weights \(\bar{w}_{i,t} + \frac{1}{N_t}\theta'\hat{x}_{i,t}\) according to our parametrization for a given value \(\theta_0\). Everything happens within a single pipeline. Hence, we provide a short walk-through.

We first compute characteristic_tilt, the tilting values \(\frac{1}{N_t}\theta'\hat{x}_{i, t}\) which resemble the deviation from the benchmark portfolio. Next, we compute the benchmark portfolio weight_benchmark, which can be any reasonable set of portfolio weights. In our case, we choose either the value or equal-weighted allocation. weight_tilt completes the picture and contains the final portfolio weights weight_tilt = weight_benchmark + characteristic_tilt which deviate from the benchmark portfolio depending on the stock characteristics.

The final few lines go a bit further and implement a simple version of a no-short sale constraint. While it is generally not straightforward to ensure portfolio weight constraints via parameterization, we simply normalize the portfolio weights such that they are enforced to be positive. Finally, we make sure that the normalized weights sum up to one again: \[w_{i,t}^+ = \frac{\max(0, w_{i,t})}{\sum_{j=1}^{N_t}\max(0, w_{i,t})}.\]

The following function computes the optimal portfolio weights in the way just described.

compute_portfolio_weights <- function(theta,
                                      data,
                                      value_weighting = TRUE,
                                      allow_short_selling = TRUE) {
  data |>
    group_by(month) |>
    bind_cols(
      characteristic_tilt = data |>
        transmute(across(contains("lag"), ~ . / n)) |>
        as.matrix() %*% theta |> as.numeric()
    ) |>
    mutate(
      # Definition of benchmark weight
      weight_benchmark = case_when(
        value_weighting == TRUE ~ relative_mktcap,
        value_weighting == FALSE ~ 1 / n
      ),
      # Parametric portfolio weights
      weight_tilt = weight_benchmark + characteristic_tilt,
      # Short-sell constraint
      weight_tilt = case_when(
        allow_short_selling == TRUE ~ weight_tilt,
        allow_short_selling == FALSE ~ pmax(0, weight_tilt)
      ),
      # Weights sum up to 1
      weight_tilt = weight_tilt / sum(weight_tilt)
    ) |>
    ungroup()
}

In the next step, we compute the portfolio weights for the arbitrary vector \(\theta_0\). In the example below, we use the value-weighted portfolio as a benchmark and allow negative portfolio weights.

weights_crsp <- compute_portfolio_weights(theta,
  data_portfolios,
  value_weighting = TRUE,
  allow_short_selling = TRUE
)

16.4 Portfolio performance

Are the computed weights optimal in any way? Most likely not, as we picked \(\theta_0\) arbitrarily. To evaluate the performance of an allocation strategy, one can think of many different approaches. In their original paper, Brandt, Santa-Clara, and Valkanov (2009) focus on a simple evaluation of the hypothetical utility of an agent equipped with a power utility function \(u_\gamma(r) = \frac{(1 + r)^\gamma}{1-\gamma}\), where \(\gamma\) is the risk aversion factor.

power_utility <- function(r, gamma = 5) {
  (1 + r)^(1 - gamma) / (1 - gamma)
}

We want to note that Gehrig, Sögner, and Westerkamp (2020) warn that, in the leading case of constant relative risk aversion (CRRA), strong assumptions on the properties of the returns, the variables used to implement the parametric portfolio policy, and the parameter space are necessary to obtain a well-defined optimization problem.

No doubt, there are many other ways to evaluate a portfolio. The function below provides a summary of all kinds of interesting measures that can be considered relevant. Do we need all these evaluation measures? It depends: the original paper Brandt, Santa-Clara, and Valkanov (2009) only cares about the expected utility to choose \(\theta\). However, if you want to choose optimal values that achieve the highest performance while putting some constraints on your portfolio weights, it is helpful to have everything in one function.

evaluate_portfolio <- function(weights_crsp,
                               full_evaluation = TRUE) {
  evaluation <- weights_crsp |>
    group_by(month) |>
    summarize(
      return_tilt = weighted.mean(ret_excess, weight_tilt),
      return_benchmark = weighted.mean(ret_excess, weight_benchmark)
    ) |>
    pivot_longer(-month,
      values_to = "portfolio_return",
      names_to = "model"
    ) |>
    group_by(model) |>
    left_join(factors_ff_monthly, by = "month") |>
    summarize(tibble(
      "Expected utility" = mean(power_utility(portfolio_return)),
      "Average return" = 100 * mean(12 * portfolio_return),
      "SD return" = 100 * sqrt(12) * sd(portfolio_return),
      "Sharpe ratio" = mean(portfolio_return) / sd(portfolio_return),
      "CAPM alpha" = coefficients(lm(portfolio_return ~ mkt_excess))[1],
      "Market beta" = coefficients(lm(portfolio_return ~ mkt_excess))[2]
    )) |>
    mutate(model = str_remove(model, "return_")) |>
    pivot_longer(-model, names_to = "measure") |>
    pivot_wider(names_from = model, values_from = value)

  if (full_evaluation) {
    weight_evaluation <- weights_crsp |>
      select(month, contains("weight")) |>
      pivot_longer(-month, values_to = "weight", names_to = "model") |>
      group_by(model, month) |>
      transmute(tibble(
        "Absolute weight" = abs(weight),
        "Max. weight" = max(weight),
        "Min. weight" = min(weight),
        "Avg. sum of negative weights" = -sum(weight[weight < 0]),
        "Avg. fraction of negative weights" = sum(weight < 0) / n()
      )) |>
      group_by(model) |>
      summarize(across(-month, ~ 100 * mean(.))) |>
      mutate(model = str_remove(model, "weight_")) |>
      pivot_longer(-model, names_to = "measure") |>
      pivot_wider(names_from = model, values_from = value)
    evaluation <- bind_rows(evaluation, weight_evaluation)
  }
  return(evaluation)
}

Let us take a look at the different portfolio strategies and evaluation measures.

evaluate_portfolio(weights_crsp) |>
  print(n = Inf)
# A tibble: 11 × 3
   measure                            benchmark     tilt
   <chr>                                  <dbl>    <dbl>
 1 Expected utility                  -0.249     -0.262  
 2 Average return                     7.12      -0.445  
 3 SD return                         15.3       21.0    
 4 Sharpe ratio                       0.135     -0.00613
 5 CAPM alpha                         0.000123  -0.00582
 6 Market beta                        0.993      0.930  
 7 Absolute weight                    0.0247     0.0632 
 8 Max. weight                        3.54       3.67   
 9 Min. weight                        0.0000277 -0.145  
10 Avg. sum of negative weights       0         77.9    
11 Avg. fraction of negative weights  0         49.4    

The value-weighted portfolio delivers an annualized return of more than 6 percent and clearly outperforms the tilted portfolio, irrespective of whether we evaluate expected utility, the Sharpe ratio, or the CAPM alpha. We can conclude the market beta is close to one for both strategies (naturally almost identically 1 for the value-weighted benchmark portfolio). When it comes to the distribution of the portfolio weights, we see that the benchmark portfolio weight takes less extreme positions (lower average absolute weights and lower maximum weight). By definition, the value-weighted benchmark does not take any negative positions, while the tilted portfolio also takes short positions.

16.5 Optimal parameter choice

Next, we move to a choice of \(\theta\) that actually aims to improve some (or all) of the performance measures. We first define a helper function compute_objective_function(), which we then pass to an optimizer.

compute_objective_function <- function(theta,
                                       data,
                                       objective_measure = "Expected utility",
                                       value_weighting = TRUE,
                                       allow_short_selling = TRUE) {
  processed_data <- compute_portfolio_weights(
    theta,
    data,
    value_weighting,
    allow_short_selling
  )

  objective_function <- evaluate_portfolio(processed_data,
    full_evaluation = FALSE
  ) |>
    filter(measure == objective_measure) |>
    pull(tilt)

  return(-objective_function)
}

You may wonder why we return the negative value of the objective function. This is simply due to the common convention for optimization procedures to search for minima as a default. By minimizing the negative value of the objective function, we get the maximum value as a result. In its most basic form, R optimization relies on the function optim(). As main inputs, the function requires an initial guess of the parameters and the objective function to minimize. Now, we are fully equipped to compute the optimal values of \(\hat\theta\), which maximize the hypothetical expected utility of the investor.

optimal_theta <- optim(
  par = theta,
  compute_objective_function,
  objective_measure = "Expected utility",
  data = data_portfolios,
  value_weighting = TRUE,
  allow_short_selling = TRUE
)

optimal_theta$par
momentum_lag     size_lag 
      0.0713      -1.9487 

The resulting values of \(\hat\theta\) are easy to interpret: intuitively, expected utility increases by tilting weights from the value-weighted portfolio towards smaller stocks (negative coefficient for size) and towards past winners (positive value for momentum). Both findings are in line with the well-documented size effect (Banz 1981) and the momentum anomaly (Jegadeesh and Titman 1993).

16.6 More model specifications

How does the portfolio perform for different model specifications? For this purpose, we compute the performance of a number of different modeling choices based on the entire CRSP sample. The next code chunk performs all the heavy lifting.

full_model_grid <- expand_grid(
  value_weighting = c(TRUE, FALSE),
  allow_short_selling = c(TRUE, FALSE),
  data = list(data_portfolios)
) |>
  mutate(optimal_theta = pmap(
    .l = list(
      data,
      value_weighting,
      allow_short_selling
    ),
    .f = ~ optim(
      par = theta,
      compute_objective_function,
      data = ..1,
      objective_measure = "Expected utility",
      value_weighting = ..2,
      allow_short_selling = ..3
    )$par
  ))

Finally, we can compare the results. The table below shows summary statistics for all possible combinations: equal- or value-weighted benchmark portfolio, with or without short-selling constraints, and tilted towards maximizing expected utility.

performance_table <- full_model_grid |>
  mutate(
    processed_data = pmap(
      .l = list(
        optimal_theta,
        data,
        value_weighting,
        allow_short_selling
      ),
      .f = ~ compute_portfolio_weights(..1, ..2, ..3, ..4)
    ),
    portfolio_evaluation = map(processed_data,
      evaluate_portfolio,
      full_evaluation = TRUE
    )
  ) |>
  select(
    value_weighting,
    allow_short_selling,
    portfolio_evaluation
  ) |>
  unnest(portfolio_evaluation)

performance_table |>
  rename(
    " " = benchmark,
    Optimal = tilt
  ) |>
  mutate(
    value_weighting = case_when(
      value_weighting == TRUE ~ "VW",
      value_weighting == FALSE ~ "EW"
    ),
    allow_short_selling = case_when(
      allow_short_selling == TRUE ~ "",
      allow_short_selling == FALSE ~ "(no s.)"
    )
  ) |>
  pivot_wider(
    names_from = value_weighting:allow_short_selling,
    values_from = " ":Optimal,
    names_glue = "{value_weighting} {allow_short_selling} {.value} "
  ) |>
  select(
    measure,
    `EW    `,
    `VW    `,
    sort(contains("Optimal"))
  ) |>
  print(n = 11)
# A tibble: 11 × 7
   measure      `EW    ` `VW    ` VW  Op…¹ VW (no…² EW  Op…³ EW (no…⁴
   <chr>           <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
 1 Expected ut… -0.250   -2.49e-1 -0.246   -0.247   -0.250   -2.50e-1
 2 Average ret… 10.7      7.12e+0 14.9     13.6     13.1      8.14e+0
 3 SD return    20.3      1.53e+1 20.5     19.5     22.6      1.70e+1
 4 Sharpe ratio  0.152    1.35e-1  0.209    0.202    0.168    1.38e-1
 5 CAPM alpha    0.00226  1.23e-4  0.00646  0.00526  0.00428  4.69e-4
 6 Market beta   1.13     9.93e-1  1.01     1.04     1.14     1.08e+0
 7 Absolute we…  0.0247   2.47e-2  0.0373   0.0247   0.0255   2.47e-2
 8 Max. weight   0.0247   3.54e+0  3.37     2.69     0.0672   2.20e-1
 9 Min. weight   0.0247   2.77e-5 -0.0283   0       -0.0305   0      
10 Avg. sum of…  0        0       26.4      0        1.73     0      
11 Avg. fracti…  0        0       38.7      0        6.27     0      
# … with abbreviated variable names ¹​`VW  Optimal `,
#   ²​`VW (no s.) Optimal `, ³​`EW  Optimal `, ⁴​`EW (no s.) Optimal `

The results indicate that the average annualized Sharpe ratio of the equal-weighted portfolio exceeds the Sharpe ratio of the value-weighted benchmark portfolio. Nevertheless, starting with the weighted value portfolio as a benchmark and tilting optimally with respect to momentum and small stocks yields the highest Sharpe ratio across all specifications. Finally, imposing no short-sale constraints does not improve the performance of the portfolios in our application.

16.7 Exercises

  1. How do the estimated parameters \(\hat\theta\) and the portfolio performance change if your objective is to maximize the Sharpe ratio instead of the hypothetical expected utility?
  2. The code above is very flexible in the sense that you can easily add new firm characteristics. Construct a new characteristic of your choice and evaluate the corresponding coefficient \(\hat\theta_i\).
  3. Tweak the function optimal_theta() such that you can impose additional performance constraints in order to determine \(\hat\theta\), which maximizes expected utility under the constraint that the market beta is below 1.
  4. Does the portfolio performance resemble a realistic out-of-sample backtesting procedure? Verify the robustness of the results by first estimating \(\hat\theta\) based on past data only. Then, use more recent periods to evaluate the actual portfolio performance.
  5. By formulating the portfolio problem as a statistical estimation problem, you can easily obtain standard errors for the coefficients of the weight function. Brandt, Santa-Clara, and Valkanov (2009) provide the relevant derivations in their paper in Equation (10). Implement a small function that computes standard errors for \(\hat\theta\).