4 TRACE and FISD

In this chapter, we dive into the US corporate bond market. Bond markets are far more diverse than stock markets, as most issuers have multiple bonds outstanding simultaneously with potentially very different indentures. This market segment is exciting due to its size (roughly 10 trillion USD outstanding), heterogeneity of issuers (as opposed to government bonds), market structure (mostly over-the-counter trades), and data availability. We introduce how to use bond characteristics from FISD and trade reports from TRACE and provide code to download and clean TRACE in R.

Many researchers study liquidity in the US corporate bond market O’Hara and Zhou (2021). We do not cover bond returns here, but you can compute them from TRACE data. Instead, we refer to studies on the topic such as Bessembinder et al. (2008), Bai, Bali, and Wen (2019), and Kelly, Palhares, and Pruitt (2021) and a survey by Huang and Shi (2021). Moreover, WRDS includes bond returns computed from TRACE data at a monthly frequency.

The current chapter relies on this set of packages.

Compared to previous chapters, we load the devtools package (Wickham, Hester, et al. 2022) to source code that we provided to the public via gist.

4.1 Bond data from WRDS

Both bond databases we need are available on WRDS to which we establish the RPostgres connection described in the previous chapter. Additionally, we connect to our local SQLite-database to store the data we download.

wrds <- dbConnect(
  Postgres(),
  host = "wrds-pgdata.wharton.upenn.edu",
  dbname = "wrds",
  port = 9737,
  sslmode = "require",
  user = Sys.getenv("user"),
  password = Sys.getenv("password")
)

tidy_finance <- dbConnect(
  SQLite(),
  "data/tidy_finance.sqlite",
  extended_types = TRUE
)

4.2 Mergent FISD

For research on US corporate bonds, the Mergent Fixed Income Securities Database (FISD) is the primary resource for bond characteristics. There is a detailed manual on WRDS, so we only cover the necessary subjects here. FISD data comes in two main variants, namely, centered on issuers or issues. In either case, the most useful identifiers are CUSIPs. 9-digit CUSIPs identify securities issued by issuers. The issuers can be identified from the first six digits of a security CUSIP, which is also called 6-digit CUSIP. Both stocks and bonds have CUSIPs. This connection would, in principle, allow matching them easily, but due to changing issuer details, this approach only yields small coverage.

We use the issue-centered version of FISD to identify the subset of US corporate bonds that meet the standard criteria (Bessembinder, Maxwell, and Venkataraman 2006). The WRDS table fisd_mergedissue contains most of the information we need on a 9-digit CUSIP level. Due to the diversity of corporate bonds, details in the indenture vary significantly. We focus on common bonds that make up the majority of trading volume in this market without diverging too much in indentures.

The following chunk connects to the data and selects the bond sample to remove certain bond types that are less commonly (see, e.g., Dick-Nielsen, Feldhütter, and Lando 2012; O’Hara and Zhou 2021, among many others).

mergent <- tbl(
  wrds,
  in_schema("fisd", "fisd_mergedissue")
) |>
  filter(
    security_level == "SEN", # senior bonds
    slob == "N", # secured lease obligation
    is.na(security_pledge), # unsecured bonds
    asset_backed == "N", # not asset backed
    defeased == "N", # not defeased
    bond_type %in% c(
      "CDEB", # US Corporate Debentures
      "CMTN", # US Corporate MTN (Medium Term Note)
      "CMTZ", # US Corporate MTN Zero
      "CZ", # US Corporate Zero,
      "USBN" # US Corporate Bank Note
    ), 
    pay_in_kind != "Y", # not payable in kind
    yankee == "N", # no foreign issuer
    canadian == "N", # not Canadian
    foreign_currency == "N", # USD
    coupon_type %in% c(
      "F", # fixed coupon
      "Z" # zero coupon
    ), 
    is.na(fix_frequency),
    coupon_change_indicator == "N",
    interest_frequency %in% c(
      "0", # per year
      "1",
      "2",
      "4",
      "12"
    ),
    rule_144a == "N", # publicly traded
    private_placement == "N",
    defaulted == "N", # not defaulted
    is.na(filing_date),
    is.na(settlement),
    convertible == "N", # not convertible
    is.na(exchange),
    putable == "N", # not putable
    unit_deal == "N", # not issued with another security
    exchangeable == "N", # not exchangeable
    perpetual == "N", # not perpetual
    preferred_security == "N" # not preferred
  ) |> 
  select(
    complete_cusip, maturity,
    offering_amt, offering_date,
    dated_date, 
    interest_frequency, coupon,
    last_interest_date, 
    issue_id, issuer_id
  ) |>
  collect()

We also pull issuer information from fisd_mergedissuer regarding the industry and country of the firm that issued a particular bond. Then, we filter to include only US-domiciled firms’ bonds. We match the data by issuer_id.

mergent_issuer <- tbl(wrds, in_schema("fisd", "fisd_mergedissuer")) |>
  select(issuer_id, sic_code, country_domicile) |>
  collect()

mergent <- mergent |>
  inner_join(mergent_issuer, by = "issuer_id") |>
  filter(country_domicile == "USA") |>
  select(-country_domicile)

Finally, we save the bond characteristics to our local database. This selection of bonds also constitutes the sample for which we will collect trade reports from TRACE below.

  dbWriteTable(
    conn = tidy_finance,
    name = "mergent",
    value = mergent,
    overwrite = TRUE
  )

The FISD database also contains other data. The issue-based file contains information on covenants, i.e., restrictions included in bond indentures to limit specific actions by firms (e.g., Handler, Jankowitsch, and Weiss 2021). Moreover, FISD also provides information on bond ratings. We do not need either here.

4.3 TRACE

The Financial Industry Regulatory Authority (FINRA) provides the Trade Reporting and Compliance Engine (TRACE). In TRACE, dealers that trade corporate bonds must report such trades individually. Hence, we observe trade messages in TRACE that contain information on the bond traded, the trade time, price, and volume. TRACE comes in two variants; standard and enhanced TRACE. We show how to download and clean enhanced TRACE as it contains uncapped volume, a crucial quantity missing in the standard distribution. Moreover, enhanced TRACE also provides information on the respective parties’ roles and the direction of the trade report. These items become essential in cleaning the messages.

Why do we repeatedly talk about cleaning TRACE? Trade messages are submitted within a short time window after a trade is executed (less than 15 minutes). These messages can contain errors, and the reporters subsequently correct them or they cancel a trade altogether. The cleaning needs are described by Dick-Nielsen (2009) in detail, and Dick-Nielsen (2014) shows how to clean the enhanced TRACE data using SAS. We do not go into the cleaning steps here, since the code is lengthy and serves no educational purpose. However, downloading and cleaning enhanced TRACE data is straightforward with our setup.

We store code for cleaning enhanced TRACE with R on the following Github gist. as a function. The appendix also contains the code for reference. We only need to source the code from the gist, which we can do with source_gist(). Alternatively, you can also go to the gist, download it, and source() the respective R-file. The clean_enhanced_trace() function takes a vector of CUSIPs, a connection to WRDS explained in Chapter 3, and a start and end date, respectively.

source_gist("3a05b3ab281563b2e94858451c2eb3a4")

The TRACE database is considerably large. Therefore, we only download subsets of data at once. Specifying too many CUSIPs over a long time horizon will result in very long download times and a potential failure due to the size of the request to WRDS. The size limit depends on many parameters, and we cannot give you a guideline here. If we were working with the complete TRACE data for all CUSIPs above, splitting the data into 100 parts takes roughly two hours using our setup. For the applications in this book, we need data around the Paris Agreement in December 2015 and download the data in ten sets, which we define below.

mergent_cusips <- mergent |>
  pull(complete_cusip)

mergent_parts <- split(
  mergent_cusips,
  rep(1:10, 
      length.out = length(mergent_cusips))
)

Finally, we run a loop in the same style as in Chapter 3 where we download daily returns from CRSP. For each of the CUSIP sets defined above, we call the cleaning function and save the resulting output. We add new data to the existing table for batch two and all following batches.

for (j in 1:length(mergent_parts)) {
  trace_enhanced <- clean_enhanced_trace(
    cusips = mergent_parts[[j]],
    connection = wrds,
    start_date = ymd("2014-01-01"),
    end_date = ymd("2016-11-30")
  )

  dbWriteTable(
      conn = tidy_finance,
      name = "trace_enhanced",
      value = trace_enhanced,
      overwrite = ifelse(j == 1, TRUE, FALSE),
      append = ifelse(j != 1, TRUE, FALSE)
    )

}

4.4 Insights into corporate bonds

While many news outlets readily provide information on stocks and the underlying firms, corporate bonds are not covered frequently. Additionally, the TRACE database contains trade-level information, potentially new to students. Therefore, we provide you with some insights by showing some summary statistics.

We start by looking into the number of bonds outstanding over time and compare it to the number of bonds traded in our sample. First, we compute the number of bonds outstanding for each quarter around the Paris Agreement from 2014 to 2016.

bonds_outstanding <- expand_grid("date" = seq(ymd("2014-01-01"),
                                              ymd("2016-11-30"), 
                                              by = "quarter"), 
                                 "complete_cusip" = mergent$complete_cusip) |> 
  left_join(mergent |> select(complete_cusip, 
                              offering_date,
                              maturity), 
            by = "complete_cusip") |> 
  mutate(offering_date = floor_date(offering_date),
         maturity = floor_date(maturity)) |> 
  filter(date >= offering_date & date <= maturity) |> 
  count(date) |> 
  mutate(type = "outstanding")

Next, we look at the bonds traded each quarter in the same period. Notice that we load the complete trace table from our database, as we only have a single part of it in the environment from the download loop from above.

trace_enhanced <- tbl(tidy_finance, "trace_enhanced") |>
  collect()

bonds_traded <- trace_enhanced |> 
  mutate(date = floor_date(trd_exctn_dt, "quarters")) |> 
  group_by(date) |> 
  summarize(n = length(unique(cusip_id)),
            type = "traded",
            .groups = "drop") 

Finally, we plot the two time series in the figure below.

bonds_outstanding |> 
  bind_rows(bonds_traded) |> 
  ggplot(aes(x = date, y = n, color = type, linetype = type)) +
  geom_line() +
  labs(
    x = NULL, y = NULL, color = NULL, linetype = NULL,
    title = "Number of bonds outstanding and traded each quarter"
  )
Title: Number of bonds outstanding and traded each quarter. The figure shows a time series of outstanding bonds and bonds traded. The amount outstanding increases monotonically between 2014 and 2016. The number of bonds traded represents only a fraction of roughly 60%, which peaks around the third quarter of 2016.

Figure 4.1: The number of corporate bonds outstanding each quarter as reported by Mergent FISD and the number of traded bonds from enhanced TRACE between 2014 and end of 2016.

We see that the number of bonds outstanding increases steadily between 2014 and 2016. During our sample period of trade data, we see that the fraction of bonds trading each quarter is roughly 60%. The relatively small number of traded bonds means that many bonds do not trade through an entire quarter. This lack of trading activity illustrates the generally low level of liquidity in the corporate bond market, where it can be hard to trade specific bonds. Does this lack of liquidity mean that corporate bond markets are irrelevant in terms of their size? With over 7,500 traded bonds each quarter, it is hard to say that the market is small. However, let us also investigate the characteristics of issued corporate bonds. In particular, we consider maturity (in years), coupon, and offering amount (in million USD).

mergent |>
  mutate(maturity = as.numeric(maturity - offering_date) / 365,
         offering_amt = offering_amt / 10^3) |> 
  pivot_longer(cols = c(maturity, coupon, offering_amt),
               names_to = "measure") |>
  drop_na() |> 
  group_by(measure) |>
  summarize(
    mean = mean(value),
    sd = sd(value),
    min = min(value),
    q05 = quantile(value, 0.05),
    q50 = quantile(value, 0.50),
    q95 = quantile(value, 0.95),
    max = max(value)
  )
# A tibble: 3 × 8
  measure        mean     sd   min   q05    q50     q95    max
  <chr>         <dbl>  <dbl> <dbl> <dbl>  <dbl>   <dbl>  <dbl>
1 coupon         6.21   2.43 0     2.25    6.38    9.86    39 
2 maturity      10.1    9.13 0.252 1.26    8.02   30.0    100.
3 offering_amt 357.   545.   0.001 0.875 200    1250    15000 

We see that the average bond in our sample period has an offering amount of over 357 million USD with a median of 200 million USD, which both cannot be considered small. The average bond has a maturity of 10 years and pays around 6% in coupons.

Finally, let us compute some summary statistics for the trades in this market. To this end, we show a summary based on aggregate information daily. In particular, we consider the trade size (in million USD) and the number of trades.

trace_enhanced |> 
  group_by(trd_exctn_dt) |> 
  summarize(trade_size = sum(entrd_vol_qt * rptd_pr / 100) / 10^6,
            trade_number = n(),
            .groups = "drop") |> 
  pivot_longer(cols = c(trade_size, trade_number),
               names_to = "measure") |> 
  group_by(measure) |>
  summarize(
    mean = mean(value),
    sd = sd(value),
    min = min(value),
    q05 = quantile(value, 0.05),
    q50 = quantile(value, 0.50),
    q95 = quantile(value, 0.95),
    max = max(value)
  )
# A tibble: 2 × 8
  measure        mean    sd   min    q05    q50    q95    max
  <chr>         <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 trade_number 25921. 5460. 438   17851. 26025  34458. 40889 
2 trade_size   12968. 3574.  17.2  6138. 13408. 17851. 20905.

On average, nearly 26 billion USD of corporate bonds are traded daily in nearly 13,000 transactions. We can hence conclude that the corporate bond market is indeed significant in terms of trading volume and activity.

4.5 Exercises

  1. Summarize the amount outstanding of all bonds over time and describe the resulting graph.
  2. Compute the number of days each bond is traded (accounting for the bonds maturity and issuance). Start by looking at the number of bonds traded each day in a graph similar to the one above. How many bonds trade on more than 75% of trading days?
  3. WRDS provides more information from Mergent FISD. In particular, they also provide rating information in fisd_ratings. Download the ratings for the bond sample. Then, plot the distribution of ratings in a histogram.
  4. Download the TRACE data until the end of September 2021. Hint: you want to download the data completely new, rather than adding the new information. Can you find a reason why?