Univariate Portfolio Sorts


You are reading Tidy Finance with R. An equivalent chapter is available in the sibling Tidy Finance with Python.

In this chapter, we dive into portfolio sorts, one of the most widely used statistical methodologies in empirical asset pricing (e.g., Bali, Engle, and Murray 2016). The key application of portfolio sorts is to examine whether one or more variables can predict future excess returns. In general, the idea is to sort individual stocks into portfolios, where the stocks within each portfolio are similar with respect to a sorting variable, such as firm size. The different portfolios then represent well-diversified investments that differ in the level of the sorting variable. You can then attribute the differences in the return distribution to the impact of the sorting variable. We start by introducing univariate portfolio sorts (which sort based on only one characteristic) and tackle bivariate sorting in Value and Bivariate Sorts.

A univariate portfolio sort considers only one sorting variable \(x_{t-1,i}\). Here, \(i\) denotes the stock and \(t-1\) indicates that the characteristic is observable by investors at time \(t\).
The objective is to assess the cross-sectional relation between \(x_{t-1,i}\) and, typically, stock excess returns \(r_{t,i}\) at time \(t\) as the outcome variable. To illustrate how portfolio sorts work, we use estimates for market betas from the previous chapter as our sorting variable.

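The current chapter relies on the following set of R packages.

library(tidyverse)
library(RSQLite)
library(scales)
library(lmtest)
library(broom)
library(sandwich)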

Compared to previous chapters, we introduce lmtest (Zeileis and Hothorn 2002) for inference on estimated coefficients, the broom package (Robinson, Hayes, and Couch 2022) to tidy the estimation output of many estimated linear models, and sandwich (Zeileis 2006) for different covariance matrix estimators.

Data Preparation

We start with loading the required data from our SQLite-database introduced in Accessing and Managing Financial Data and WRDS, CRSP, and Compustat. In particular, we use the monthly CRSP sample as our asset universe. Once we form our portfolios, we use the Fama-French market factor returns to compute the risk-adjusted performance (i.e., alpha). beta is the tibble with market betas computed in the previous chapter.

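tidy_finance <- dbConnect(
  SQLite(),
  "data/tidy_finance_r.sqlite", # assumed path; adjust to your database location
  extended_types = TRUE
)

# Monthly CRSP sample: excess returns and lagged market capitalization
crsp_monthly <- tbl(tidy_finance, "crsp_monthly") |>
  select(permno, month, ret_excess, mktcap_lag) |>
  collect()

# Fama-French market excess returns
factors_ff3_monthly <- tbl(tidy_finance, "factors_ff3_monthly") |>
  select(month, mkt_excess) |>
  collect()

# Market betas estimated in the previous chapter
beta <- tbl(tidy_finance, "beta") |>
  select(permno, month, beta_monthly) |>
  collect()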

Sorting by Market Beta

Next, we merge our sorting variable with the return data. We use the one-month lagged betas as a sorting variable to ensure that the sorts rely only on information available when we create the portfolios. To lag stock beta by one month, we add one month to the current date and join the resulting information with our return data. This procedure ensures that month \(t\) information is available in month \(t+1\). You may be tempted to simply use a call such as crsp_monthly |> group_by(permno) |> mutate(beta_lag = lag(beta)) instead. This procedure, however, does not work correctly if there are implicit missing values in the time series, i.e., months that are absent from the data rather than recorded as NA.

beta_lag <- beta |>
  mutate(month = month %m+% months(1)) |>
  select(permno, month, beta_lag = beta_monthly) |>
  drop_na()

data_for_sorts <- crsp_monthly |>
  inner_join(beta_lag, join_by(permno, month))
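To see why the date-based join matters, the following base-R sketch compares a positional lag with a date-shifted join on a toy series in which February is absent. The data frame `toy` and the column names `beta_lag_naive` and `beta_lag_join` are made up for illustration. The sketch assumes that dates are stored as the first day of each month and uses a left join to make the gap visible, whereas the chapter's inner join would simply drop the unmatched month.

```r
# Toy data: one stock, monthly observations, February 2020 is absent
toy <- data.frame(
  permno = 1,
  month = as.Date(c("2020-01-01", "2020-03-01", "2020-04-01")),
  beta = c(1.0, 1.2, 1.3)
)

# Naive positional lag: takes the previous row, whatever its date
toy$beta_lag_naive <- c(NA, head(toy$beta, -1))

# Date-based lag: shift each beta forward by one month, then join on the date.
# Assumption: dates are first-of-month, so adding 31 days always lands in the
# following month, which we then truncate back to its first day.
shifted <- data.frame(
  permno = toy$permno,
  month = as.Date(format(toy$month + 31, "%Y-%m-01")),
  beta_lag_join = toy$beta
)

# Left join so that months without a valid lag show up as NA
comparison <- merge(toy, shifted, by = c("permno", "month"), all.x = TRUE)
print(comparison)
```

In March, the positional lag silently assigns the January beta, while the date-based join correctly reports that no February beta is available.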

The first step to conduct portfolio sorts is to calculate periodic breakpoints that you can use to group the stocks into portfolios. For simplicity, we start with the median lagged market beta as the single breakpoint. We then compute the value-weighted returns for each of the two resulting portfolios, which means that the lagged market capitalization determines the weight in weighted.mean().

beta_portfolios <- data_for_sorts |>
  group_by(month) |>
  mutate(
    # Monthly breakpoint: cross-sectional median of lagged beta
    breakpoint = median(beta_lag),
    portfolio = case_when(
      beta_lag <= breakpoint ~ "low",
      beta_lag > breakpoint ~ "high"
    )
  ) |>
  group_by(month, portfolio) |>
  summarize(ret = weighted.mean(ret_excess, mktcap_lag), 
            .groups = "drop")
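As a small illustration of the median split and the value weighting, consider a single month with four hypothetical stocks. All numbers below are invented, and the sketch uses base R instead of the dplyr pipeline above.

```r
# One hypothetical cross-section: four stocks in a single month
toy <- data.frame(
  beta_lag   = c(0.5, 0.8, 1.2, 1.6),
  ret_excess = c(0.01, 0.02, 0.03, -0.01),
  mktcap_lag = c(100, 300, 200, 200)
)

# Single breakpoint: the median of the sorting variable
breakpoint <- median(toy$beta_lag)
toy$portfolio <- ifelse(toy$beta_lag <= breakpoint, "low", "high")

# Value-weighted return per portfolio: weights are lagged market caps
vw_ret <- sapply(
  split(toy, toy$portfolio),
  function(d) weighted.mean(d$ret_excess, d$mktcap_lag)
)

# Equal-weighted return for comparison
ew_ret <- sapply(split(toy, toy$portfolio), function(d) mean(d$ret_excess))

print(rbind(vw_ret, ew_ret))
```

In the low-beta portfolio, the larger stock (market cap 300) dominates, so the value-weighted return exceeds the equal-weighted one.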

Performance Evaluation

We can construct a long-short strategy based on the two portfolios: buy the high-beta portfolio and, at the same time, short the low-beta portfolio. Thereby, the overall position in the market is net-zero, i.e., you do not need to invest money to realize this strategy in the absence of frictions.

beta_longshort <- beta_portfolios |>
  pivot_wider(id_cols = month, names_from = portfolio, values_from = ret) |>
  mutate(long_short = high - low)

We compute the average return and the corresponding standard error to test whether the long-short portfolio yields on average positive or negative excess returns. In the asset pricing literature, one typically adjusts for autocorrelation by using Newey and West (1987) \(t\)-statistics to test the null hypothesis that average portfolio excess returns are equal to zero. One necessary input for Newey-West standard errors is a chosen bandwidth based on the number of lags employed for the estimation. While researchers often seem to default to a pre-specified lag length of 6 months, we instead recommend a data-driven approach. This automatic selection is advocated by Newey and West (1994) and available in the sandwich package. To implement this test, we compute the average return via lm() and then employ the coeftest() function. If you want to implement the typical 6-lag default setting, you can enforce it by passing the arguments lag = 6, prewhite = FALSE to the coeftest() function in the code below, which passes them on to NeweyWest().

model_fit <- lm(long_short ~ 1, data = beta_longshort)
coeftest(model_fit, vcov = NeweyWest)

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.39e-05   1.30e-03   -0.06     0.95
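To make the role of the lag length more concrete, here is a minimal base-R sketch of a Newey-West standard error for a sample mean with a fixed number of lags and Bartlett weights. The helper `nw_se_mean()` is hypothetical and not part of sandwich or lmtest. It is meant to mirror the fixed-lag setting without prewhitening and without any finite-sample adjustment, and it is not a substitute for NeweyWest().

```r
# Hypothetical helper: Newey-West (Bartlett kernel) standard error of a mean.
# Assumes a fixed lag length, no prewhitening, no finite-sample adjustment.
nw_se_mean <- function(x, lag) {
  n <- length(x)
  e <- x - mean(x)                      # residuals of an intercept-only model
  long_run_var <- sum(e^2) / n          # lag-0 autocovariance
  for (l in seq_len(lag)) {
    gamma_l <- sum(e[(l + 1):n] * e[1:(n - l)]) / n  # lag-l autocovariance
    weight <- 1 - l / (lag + 1)                       # Bartlett weight
    long_run_var <- long_run_var + 2 * weight * gamma_l
  }
  sqrt(long_run_var / n)
}

# Invented return series, for illustration only
x <- c(0.02, 0.01, -0.01, 0.03, 0.00, -0.02, 0.01, 0.02)
se_lag0 <- nw_se_mean(x, lag = 0)  # no autocorrelation correction
se_lag3 <- nw_se_mean(x, lag = 3)  # includes three weighted autocovariances
t_stat <- mean(x) / se_lag3
```

With lag = 0 the estimator collapses to the usual standard error based on the (biased) sample variance. Each additional lag adds a down-weighted autocovariance term.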

The results indicate that we cannot reject the null hypothesis of average returns being equal to zero. Our portfolio strategy using the median as a breakpoint hence does not yield any abnormal returns. Is this finding surprising if you reconsider the CAPM? It certainly is. The CAPM predicts that high-beta stocks should earn higher expected returns. Our portfolio sort implicitly mimics an investment strategy that finances high-beta stocks by shorting low-beta stocks. Therefore, one should expect the average return of the long-short portfolio to be positive.

Functional Programming for Portfolio Sorts

Now we take portfolio sorts to the next level. We want to be able to sort stocks into an arbitrary number of portfolios. For this case, functional programming is very handy: we employ the curly-curly-operator to give us flexibility concerning which variable to use for the sorting, denoted by sorting_variable. We use quantile() to compute breakpoints for n_portfolios. Then, we assign portfolios to stocks using the findInterval() function. The output of the following function is a new column that contains the number of the portfolio to which a stock belongs.

In some applications, the variable used for the sorting might be clustered (e.g., at a lower bound of 0). Then, multiple breakpoints may be identical, leading to empty portfolios. Similarly, some portfolios might have a very small number of stocks at the beginning of the sample. Cases where the number of portfolio constituents differs substantially due to the distribution of the characteristics require careful consideration and, depending on the application, might require customized sorting approaches.

assign_portfolio <- function(data, 
                             sorting_variable, 
                             n_portfolios) {
  # Compute breakpoints
  breakpoints <- data |>
    pull({{ sorting_variable }}) |>
    quantile(
      probs = seq(0, 1, length.out = n_portfolios + 1),
      na.rm = TRUE,
      names = FALSE
    )

  # Assign portfolios
  assigned_portfolios <- data |>
    mutate(portfolio = findInterval(
      pick(everything()) |>
        pull({{ sorting_variable }}),
      breakpoints,
      all.inside = TRUE
    )) |>
    pull(portfolio)

  # Output
  return(assigned_portfolios)
}
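The two building blocks of this function, quantile() and findInterval(), can be tried out on plain vectors. The following base-R sketch uses invented data and hypothetical object names to show a well-behaved sort into five portfolios and a clustered sorting variable for which several breakpoints coincide.

```r
n_portfolios <- 5
probs <- seq(0, 1, length.out = n_portfolios + 1)

# Case 1: evenly spread sorting variable
x_spread <- 1:10
bp_spread <- quantile(x_spread, probs = probs, names = FALSE)
pf_spread <- findInterval(x_spread, bp_spread, all.inside = TRUE)

# Case 2: sorting variable clustered at a lower bound of 0
x_clustered <- c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4)
bp_clustered <- quantile(x_clustered, probs = probs, names = FALSE)
pf_clustered <- findInterval(x_clustered, bp_clustered, all.inside = TRUE)

# Number of stocks per portfolio, including empty portfolios
count_spread <- table(factor(pf_spread, levels = 1:n_portfolios))
count_clustered <- table(factor(pf_clustered, levels = 1:n_portfolios))
print(count_spread)
print(count_clustered)
```

In the first case, each portfolio receives two stocks, and all.inside = TRUE keeps the maximum value in the top portfolio instead of opening a sixth one. In the clustered case, the first three breakpoints are all zero, so portfolios 1 and 2 stay empty and the six zero-valued stocks end up in portfolio 3.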

We can use the above function to sort stocks into ten portfolios each month using lagged betas and compute value-weighted returns for each portfolio. Note that we transform the portfolio column to a factor variable because it provides more convenience for the figure construction below.

beta_portfolios <- data_for_sorts |>
  group_by(month) |>
  mutate(
    portfolio = assign_portfolio(
      data = pick(everything()),
      sorting_variable = beta_lag,
      n_portfolios = 10
    ),
    portfolio = as.factor(portfolio)
  ) |>
  group_by(portfolio, month) |>
  summarize(
    ret_excess = weighted.mean(ret_excess, mktcap_lag),
    .groups = "drop"
  ) |>
  left_join(factors_ff3_monthly, join_by(month))

More Performance Evaluation

In the next step, we compute summary statistics for each beta portfolio. Namely, we compute CAPM-adjusted alphas, the beta of each beta portfolio, and average returns.

beta_portfolios_summary <- beta_portfolios |>
  nest(data = c(month, ret_excess, mkt_excess)) |>
  mutate(estimates = map(
    data, ~ tidy(lm(ret_excess ~ 1 + mkt_excess, data = .x))
  )) |>
  unnest(estimates) |> 
  select(portfolio, term, estimate) |> 
  pivot_wider(names_from = term, values_from = estimate) |> 
  rename(alpha = `(Intercept)`, beta = mkt_excess) |> 
  left_join(
    beta_portfolios |> 
      group_by(portfolio) |> 
      summarize(ret_excess = mean(ret_excess),
                .groups = "drop"), join_by(portfolio)
  )

Figure 1 illustrates the CAPM alphas of beta-sorted portfolios. It shows that low beta portfolios tend to exhibit positive alphas, while high beta portfolios exhibit negative alphas.

beta_portfolios_summary |>
  ggplot(aes(x = portfolio, y = alpha, fill = portfolio)) +
  geom_bar(stat = "identity") +
  labs(
    title = "CAPM alphas of beta-sorted portfolios",
    x = "Portfolio",
    y = "CAPM alpha",
    fill = "Portfolio"
  ) +
  scale_y_continuous(labels = percent) +
  theme(legend.position = "None")
Figure 1: CAPM alphas of beta-sorted portfolios. Stocks are sorted into decile portfolios each month based on their estimated CAPM beta. The bars indicate the CAPM alpha of the resulting portfolio returns during the entire CRSP period.

These results suggest a negative relation between beta and future stock returns, which contradicts the predictions of the CAPM. According to the CAPM, returns should increase with beta across the portfolios and risk-adjusted returns should be statistically indistinguishable from zero.
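In equation form, the regression behind these estimates and the corresponding CAPM prediction can be written as follows. The notation \(r_{t,p}\) for the excess return of portfolio \(p\) and \(r_{t,m}\) for the market excess return (mkt_excess in the code) is introduced here for illustration and does not appear elsewhere in the chapter.

\[
r_{t,p} = \alpha_p + \beta_p \, r_{t,m} + \varepsilon_{t,p},
\qquad
E[r_{t,p}] = \alpha_p + \beta_p \, E[r_{t,m}]
\]

Under the CAPM, \(\alpha_p = 0\) for every portfolio, so that \(E[r_{t,p}] = \beta_p \, E[r_{t,m}]\): expected excess returns are proportional to beta, and any non-zero \(\alpha_p\) is a deviation from the model.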

The Security Market Line and Beta Portfolios

The CAPM predicts that our portfolios should lie on the security market line (SML). The slope of the SML is equal to the market risk premium and reflects the risk-return trade-off at any given time. Figure 2 illustrates the security market line: We see that (not surprisingly) the high beta portfolio returns have a high correlation with the market returns. However, it seems like the average excess returns for high beta stocks are lower than what the security market line implies would be an “appropriate” compensation for the high market risk.

sml_capm <- lm(ret_excess ~ 1 + beta, data = beta_portfolios_summary)$coefficients

beta_portfolios_summary |>
  ggplot(aes(
    x = beta, 
    y = ret_excess, 
    color = portfolio
  )) +
  geom_point() +
  # Security market line implied by the CAPM
  geom_abline(
    intercept = 0,
    slope = mean(factors_ff3_monthly$mkt_excess),
    linetype = "solid"
  ) +
  # Fitted line from regressing average excess returns on portfolio betas
  geom_abline(
    intercept = sml_capm[1],
    slope = sml_capm[2],
    linetype = "dashed"
  ) +
  scale_y_continuous(
    labels = percent,
    limit = c(0, mean(factors_ff3_monthly$mkt_excess) * 2)
  ) +
  scale_x_continuous(limits = c(0, 2)) +
  labs(
    x = "Beta", y = "Excess return", color = "Portfolio",
    title = "Average portfolio excess returns and average beta estimates"
  )
Figure 2: Average portfolio excess returns and average beta estimates. The horizontal axis indicates the CAPM beta of each beta-sorted portfolio's return time series, and the vertical axis indicates its average excess return. The solid line indicates the security market line. The dashed line, which has a lower slope, indicates the fitted line from a linear regression of average excess returns on portfolio betas.

To provide more evidence against the CAPM predictions, we again form a long-short strategy that buys the high-beta portfolio and shorts the low-beta portfolio.

beta_longshort <- beta_portfolios |>
  mutate(portfolio = case_when(
    portfolio == max(as.numeric(portfolio)) ~ "high",
    portfolio == min(as.numeric(portfolio)) ~ "low"
  )) |>
  filter(portfolio %in% c("low", "high")) |>
  pivot_wider(id_cols = month, 
              names_from = portfolio, 
              values_from = ret_excess) |>
  mutate(long_short = high - low) |>
  left_join(factors_ff3_monthly, join_by(month))

Again, the resulting long-short strategy does not exhibit statistically significant returns.

coeftest(lm(long_short ~ 1, data = beta_longshort),
  vcov = NeweyWest
)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.00156    0.00325    0.48     0.63

However, the long-short portfolio yields a negative CAPM-adjusted alpha that is statistically significant only at the 10% level (see the regression output below), although, controlling for the effect of beta, the average excess stock returns should be zero according to the CAPM. The results thus provide no evidence in support of the CAPM. The negative value has been documented as the so-called betting-against-beta factor (Frazzini and Pedersen 2014). Betting against beta corresponds to a strategy that shorts high-beta stocks and takes a (levered) long position in low-beta stocks. If borrowing constraints prevent investors from taking positions on the SML, they are instead incentivized to buy high-beta stocks, which leads to a relatively higher price (and therefore lower expected returns than implied by the CAPM) for such high-beta stocks. As a result, the betting-against-beta strategy earns a premium for providing liquidity to capital-constrained investors with lower risk aversion.

coeftest(lm(long_short ~ 1 + mkt_excess, data = beta_longshort),
  vcov = NeweyWest
)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.00470    0.00245   -1.92    0.055 .  
mkt_excess   1.15395    0.08893   12.98   <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 3 shows the annual returns of the extreme beta portfolios we are mainly interested in. The figure illustrates no consistent striking patterns over the last years; each portfolio exhibits periods with positive and negative annual returns.

beta_longshort |>
  group_by(year = year(month)) |>
  summarize(
    low = prod(1 + low),
    high = prod(1 + high),
    long_short = prod(1 + long_short)
  ) |>
  pivot_longer(cols = -year) |>
  # Compounded gross return minus one gives the annual net return
  ggplot(aes(x = year, y = value - 1, fill = name)) +
  geom_col(position = "dodge") +
  facet_wrap(~name, ncol = 1) +
  theme(legend.position = "none") +
  scale_y_continuous(labels = percent) +
  labs(
    title = "Annual returns of beta portfolios",
    x = NULL, y = NULL
  )
Figure 3: Annual returns of beta portfolios. We construct portfolios by sorting stocks into high and low based on their estimated CAPM beta. Long short indicates a strategy that goes long high-beta stocks and short low-beta stocks. Each portfolio is plotted in its own facet, with years on the horizontal axis and returns on the vertical axis.
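Annual returns are obtained by compounding monthly returns: the product of the monthly gross returns, minus one. A short base-R sketch with invented monthly returns shows why summing monthly returns is not a substitute.

```r
# Twelve hypothetical monthly returns of 1% each
monthly <- rep(0.01, 12)
annual_compounded <- prod(1 + monthly) - 1  # compounded annual net return
annual_summed <- sum(monthly)               # simple sum, ignores compounding

# A 50% loss followed by a 100% gain leaves the investor exactly where they started
round_trip <- prod(1 + c(-0.5, 1.0)) - 1

print(c(compounded = annual_compounded, summed = annual_summed))
print(round_trip)
```

The compounded annual return is slightly above the 12% obtained by summing, and the round trip nets out to zero although the two monthly returns sum to 50%.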

Overall, this chapter shows how functional programming can be leveraged to form an arbitrary number of portfolios using any sorting variable and how to evaluate the performance of the resulting portfolios. In the next chapter, we dive deeper into the many degrees of freedom that arise in the context of portfolio analysis.


Exercises

  1. Take the two long-short beta strategies based on different numbers of portfolios and compare the returns. Is there a significant difference in returns? How do the Sharpe ratios compare between the strategies? Find one additional portfolio evaluation statistic and compute it.
  2. We plotted the alphas of the ten beta portfolios above. Write a function that tests these estimates for significance. Which portfolios have significant alphas?
  3. The analysis here is based on betas from monthly returns. However, we also computed betas from daily returns. Re-run the analysis and point out differences in the results.
  4. Given the results in this chapter, can you define a long-short strategy that yields positive abnormal returns (i.e., alphas)? Plot the cumulative excess return of your strategy and the market excess return for comparison.


References

Bali, Turan G., Robert F. Engle, and Scott Murray. 2016. Empirical asset pricing: The cross section of stock returns. John Wiley & Sons. https://doi.org/10.1002/9781118445112.stat07954.
Frazzini, Andrea, and Lasse Heje Pedersen. 2014. “Betting against beta.” Journal of Financial Economics 111 (1): 1–25. https://doi.org/10.1016/j.jfineco.2013.10.005.
Newey, Whitney K., and Kenneth D. West. 1987. “A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix.” Econometrica 55 (3): 703–8. http://www.jstor.org/stable/1913610.
Newey, Whitney K., and Kenneth D. West. 1994. “Automatic lag selection in covariance matrix estimation.” The Review of Economic Studies 61 (4): 631–53. https://www.jstor.org/stable/2297912.
Robinson, David, Alex Hayes, and Simon Couch. 2022. broom: Convert statistical objects into tidy tibbles. https://CRAN.R-project.org/package=broom.
Zeileis, Achim. 2006. “Object-oriented computation of sandwich estimators.” Journal of Statistical Software 16 (9): 1–16. http://dx.doi.org/10.18637/jss.v016.i09.
Zeileis, Achim, and Torsten Hothorn. 2002. “Diagnostic checking in regression relationships.” R News 2 (3): 7–10. https://CRAN.R-project.org/doc/Rnews/.