You are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.

In this chapter, we dive into the US corporate bond market. Bond markets are far more diverse than stock markets, as most issuers have multiple bonds outstanding simultaneously with potentially very different indentures. This market segment is exciting due to its size (roughly 10 trillion USD outstanding), heterogeneity of issuers (as opposed to government bonds), market structure (mostly over-the-counter trades), and data availability. We introduce how to use bond characteristics from FISD and trade reports from TRACE and provide code to download and clean TRACE in R.

Many researchers study liquidity in the US corporate bond market O’Hara and Zhou (2021). We do not cover bond returns here, but you can compute them from TRACE data. Instead, we refer to studies on the topic such as Bessembinder et al. (2008), Bai, Bali, and Wen (2019), and Kelly, Palhares, and Pruitt (2021) and a survey by Huang and Shi (2021). Moreover, WRDS includes bond returns computed from TRACE data at a monthly frequency.

The current chapter relies on this set of R packages.


Compared to previous chapters, we load the devtools package (Wickham et al. 2022) to source code that we provided to the public via gist.

Bond Data from WRDS

Both bond databases we need are available on WRDS to which we establish the RPostgres connection described in WRDS, CRSP, and Compustat. Additionally, we connect to our local SQLite-database to store the data we download.

wrds <- dbConnect(
  host = "",
  dbname = "wrds",
  port = 9737,
  sslmode = "require",
  user = Sys.getenv("WRDS_USER"),
  password = Sys.getenv("WRDS_PASSWORD")

tidy_finance <- dbConnect(
  extended_types = TRUE

Mergent FISD

For research on US corporate bonds, the Mergent Fixed Income Securities Database (FISD) is the primary resource for bond characteristics. There is a detailed manual on WRDS, so we only cover the necessary subjects here. FISD data comes in two main variants, namely, centered on issuers or issues. In either case, the most useful identifiers are CUSIPs. 9-digit CUSIPs identify securities issued by issuers. The issuers can be identified from the first six digits of a security CUSIP, which is also called 6-digit CUSIP. Both stocks and bonds have CUSIPs. This connection would, in principle, allow matching them easily, but due to changing issuer details, this approach only yields small coverage.

We use the issue-centered version of FISD to identify the subset of US corporate bonds that meet the standard criteria (Bessembinder, Maxwell, and Venkataraman 2006). The WRDS table fisd_mergedissue contains most of the information we need on a 9-digit CUSIP level. Due to the diversity of corporate bonds, details in the indenture vary significantly. We focus on common bonds that make up the majority of trading volume in this market without diverging too much in indentures.

The following chunk connects to the data and selects the bond sample to remove certain bond types that are less commonly used (see, e.g., Dick-Nielsen, Feldhütter, and Lando 2012; O’Hara and Zhou 2021, among many others). In particular, we use the filters listed below. Note that we also treat missing values in these flags.

  1. Keep only senior bonds (security_level = 'SEN').
  2. Exclude bonds which are secured lease obligations (slob = 'N' OR slob IS NULL).
  3. Exclude secured bonds (security_pledge IS NULL).
  4. Exclude asset-backed bonds (asset_backed = 'N' OR asset_backed IS NULL).
  5. Exclude defeased bonds ((defeased = 'N' OR defeased IS NULL) AND defeased_date IS NULL).
  6. Keep only the bond types US Corporate Debentures ('CDEB'), US Corporate Medium Term Notes ('CMTN'), US Corporate Zero Coupon Notes and Bonds ('CMTZ', 'CZ'), and US Corporate Bank Note ('USBN').
  7. Exclude bonds that are payable in kind ((pay_in_kind != 'Y' OR pay_in_kind IS NULL) AND pay_in_kind_exp_date IS NULL).
  8. Exclude foreign (yankee == "N" OR and Canadian issuers (canadian = 'N' OR canadian IS NULL).
  9. Exclude bonds denominated in foreign currency (foreign_currency = 'N').
  10. Keep only fixed (F) and zero (Z) coupon bonds with additional requirements of fix_frequency IS NULL, coupon_change_indicator = 'N' and annual, semi-annual, quarterly, or monthly interest frequencies.
  11. Exclude bonds that were issued under SEC Rule 144A (rule_144a = 'N').
  12. Exlcude privately placed bonds (private_placement = 'N' OR private_placement IS NULL).
  13. Exclude defaulted bonds (defaulted = 'N' AND filing_date IS NULL AND settlement IS NULL).
  14. Exclude convertible (convertible = 'N'), putable (putable = 'N' OR putable IS NULL), exchangeable (exchangeable = 'N' OR exchangeable IS NULL), perpetual (perpetual = 'N'), or preferred bonds (preferred_security = 'N' OR preferred_security IS NULL).
  15. Exclude unit deal bonds ((unit_deal = 'N' OR unit_deal IS NULL)).
fisd_mergedissue_db <- tbl(wrds, I("fisd.fisd_mergedissue"))

fisd <- fisd_mergedissue_db |>
    security_level == "SEN",
    slob == "N" |,,
    asset_backed == "N" |,
    defeased == "N" |,,
    bond_type %in% c(
    pay_in_kind != "Y" |,,
    yankee == "N" |,
    canadian == "N" |,
    foreign_currency == "N",
    coupon_type %in% c(
    coupon_change_indicator == "N",
    interest_frequency %in% c(
    rule_144a == "N",
    private_placement == "N" |,
    defaulted == "N",,,
    convertible == "N",,
    putable == "N" |,
    unit_deal == "N" |,
    exchangeable == "N" |,
    perpetual == "N",
    preferred_security == "N" |
  ) |> 
    complete_cusip, maturity,
    offering_amt, offering_date,
    interest_frequency, coupon,
    issue_id, issuer_id
  ) |>

We also pull issuer information from fisd_mergedissuer regarding the industry and country of the firm that issued a particular bond. Then, we filter to include only US-domiciled firms’ bonds. We match the data by issuer_id.

fisd_mergedissuer_db <- tbl(wrds, I("fisd.fisd_mergedissuer")) 

fisd_issuer <- fisd_mergedissuer_db |>
  select(issuer_id, sic_code, country_domicile) |>

fisd <- fisd |>
  inner_join(fisd_issuer, join_by(issuer_id)) |>
  filter(country_domicile == "USA") |>

Finally, we save the bond characteristics to our local database. This selection of bonds also constitutes the sample for which we will collect trade reports from TRACE below.

  conn = tidy_finance,
  name = "fisd",
  value = fisd,
  overwrite = TRUE

The FISD database also contains other data. The issue-based file contains information on covenants, i.e., restrictions included in bond indentures to limit specific actions by firms (e.g., Handler, Jankowitsch, and Weiss 2021). Moreover, FISD also provides information on bond ratings. We do not need either here.


The Financial Industry Regulatory Authority (FINRA) provides the Trade Reporting and Compliance Engine (TRACE). In TRACE, dealers that trade corporate bonds must report such trades individually. Hence, we observe trade messages in TRACE that contain information on the bond traded, the trade time, price, and volume. TRACE comes in two variants: standard and enhanced TRACE. We show how to download and clean enhanced TRACE as it contains uncapped volume, a crucial quantity missing in the standard distribution. Moreover, enhanced TRACE also provides information on the respective parties’ roles and the direction of the trade report. These items become essential in cleaning the messages.

Why do we repeatedly talk about cleaning TRACE? Trade messages are submitted within a short time window after a trade is executed (less than 15 minutes). These messages can contain errors, and the reporters subsequently correct them or they cancel a trade altogether. The cleaning needs are described by Dick-Nielsen (2009) in detail, and Dick-Nielsen (2014) shows how to clean the enhanced TRACE data using SAS. We do not go into the cleaning steps here, since the code is lengthy and serves no educational purpose. However, downloading and cleaning enhanced TRACE data is straightforward with our setup.

We store code for cleaning enhanced TRACE with R on the following GitHub gist. as a function. The appendix also contains the code for reference. We only need to source the code from the gist, which we can do with source_gist(). Alternatively, you can also go to the gist, download it, and source() the respective R-file. The clean_enhanced_trace() function takes a vector of CUSIPs, a connection to WRDS explained in WRDS, CRSP, and Compustat, and a start and end date, respectively.


The TRACE database is considerably large. Therefore, we only download subsets of data at once. Specifying too many CUSIPs over a long time horizon will result in very long download times and a potential failure due to the size of the request to WRDS. The size limit depends on many parameters, and we cannot give you a guideline here. If we were working with the complete TRACE data for all CUSIPs above, splitting the data into 100 parts takes roughly two hours using our setup. For the applications in this book, we need data around the Paris Agreement in December 2015 and download the data in ten sets, which we define below.

fisd_cusips <- fisd |>

fisd_parts <- split(
      length.out = length(fisd_cusips))

Finally, we run a loop in the same style as in WRDS, CRSP, and Compustat where we download daily returns from CRSP. For each of the CUSIP sets defined above, we call the cleaning function and save the resulting output. We add new data to the existing table for batch two and all following batches.

batches <- length(fisd_parts)

for (j in 1:batches) {
  trace_enhanced <- clean_enhanced_trace(
    cusips = fisd_parts[[j]],
    connection = wrds,
    start_date = ymd("2014-01-01"),
    end_date = ymd("2016-11-30")

    conn = tidy_finance,
    name = "trace_enhanced",
    value = trace_enhanced,
    overwrite = ifelse(j == 1, TRUE, FALSE),
    append = ifelse(j != 1, TRUE, FALSE)
  cat("Batch", j, "out of", batches, "done (", 
      round(j / batches, 2) * 100, "%)\n")

Insights into Corporate Bonds

While many news outlets readily provide information on stocks and the underlying firms, corporate bonds are not covered frequently. Additionally, the TRACE database contains trade-level information, potentially new to students. Therefore, we provide you with some insights by showing some summary statistics.

We start by looking into the number of bonds outstanding over time and compare it to the number of bonds traded in our sample. First, we compute the number of bonds outstanding for each quarter around the Paris Agreement from 2014 to 2016.

bonds_outstanding <- expand_grid("date" = seq(ymd("2014-01-01"),
                                              by = "quarter"), 
                                 "complete_cusip" = fisd$complete_cusip) |> 
  left_join(fisd |> select(complete_cusip, 
            join_by(complete_cusip)) |> 
  mutate(offering_date = floor_date(offering_date),
         maturity = floor_date(maturity)) |> 
  filter(date >= offering_date & date <= maturity) |> 
  count(date) |> 
  mutate(type = "Outstanding")

Next, we look at the bonds traded each quarter in the same period. Notice that we load the complete trace table from our database, as we only have a single part of it in the environment from the download loop above.

trace_enhanced <- tbl(tidy_finance, "trace_enhanced") |>

bonds_traded <- trace_enhanced |> 
  mutate(date = floor_date(trd_exctn_dt, "quarters")) |> 
  group_by(date) |> 
  summarize(n = length(unique(cusip_id)),
            type = "Traded",
            .groups = "drop") 

Finally, we plot the two time series in Figure 1.

bonds_outstanding |> 
  bind_rows(bonds_traded) |> 
    x = date, 
    y = n, 
    color = type, 
    linetype = type
  )) +
  geom_line() +
    x = NULL, y = NULL, color = NULL, linetype = NULL,
    title = "Number of bonds outstanding and traded each quarter"
Title: Number of bonds outstanding and traded each quarter. The figure shows a time series of outstanding bonds and bonds traded. The amount outstanding increases monotonically between 2014 and 2016. The number of bonds traded represents only a fraction of roughly 60 percent, which peaks around the third quarter of 2016.
Figure 1: The number of corporate bonds outstanding each quarter as reported by Mergent FISD and the number of traded bonds from enhanced TRACE between 2014 and end of 2016.

We see that the number of bonds outstanding increases steadily between 2014 and 2016. During our sample period of trade data, we see that the fraction of bonds trading each quarter is roughly 60 percent. The relatively small number of traded bonds means that many bonds do not trade through an entire quarter. This lack of trading activity illustrates the generally low level of liquidity in the corporate bond market, where it can be hard to trade specific bonds. Does this lack of liquidity mean that corporate bond markets are irrelevant in terms of their size? With over 7,500 traded bonds each quarter, it is hard to say that the market is small. However, let us also investigate the characteristics of issued corporate bonds. In particular, we consider maturity (in years), coupon, and offering amount (in million USD).

fisd |>
  mutate(maturity = as.numeric(maturity - offering_date) / 365,
         offering_amt = offering_amt / 10^3) |> 
  pivot_longer(cols = c(maturity, coupon, offering_amt),
               names_to = "measure") |>
  drop_na() |> 
  group_by(measure) |>
    mean = mean(value),
    sd = sd(value),
    min = min(value),
    q05 = quantile(value, 0.05),
    q50 = quantile(value, 0.50),
    q95 = quantile(value, 0.95),
    max = max(value)
# A tibble: 3 × 8
  measure        mean     sd    min   q05    q50    q95    max
  <chr>         <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 coupon         5.94   2.67  0     1.90    6.06   10      39 
2 maturity       9.74   9.21 -6.24  1.03    7.18   30.0   101.
3 offering_amt 379.   570.    0.001 0.669 200    1400   15000 

We see that the average bond in our sample period has an offering amount of over 357 million USD with a median of 200 million USD, which both cannot be considered small. The average bond has a maturity of 10 years and pays around 6 percent in coupons.

Finally, let us compute some summary statistics for the trades in this market. To this end, we show a summary based on aggregate information daily. In particular, we consider the trade size (in million USD) and the number of trades.

trace_enhanced |> 
  group_by(trd_exctn_dt) |> 
  summarize(trade_size = sum(entrd_vol_qt * rptd_pr / 100) / 10^6,
            trade_number = n(),
            .groups = "drop") |> 
  pivot_longer(cols = c(trade_size, trade_number),
               names_to = "measure") |> 
  group_by(measure) |>
    mean = mean(value),
    sd = sd(value),
    min = min(value),
    q05 = quantile(value, 0.05),
    q50 = quantile(value, 0.50),
    q95 = quantile(value, 0.95),
    max = max(value)
# A tibble: 2 × 8
  measure        mean    sd   min    q05    q50    q95    max
  <chr>         <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 trade_number 25914. 5444. 439   17844. 26051  34383. 40839 
2 trade_size   12968. 3577.  17.2  6128. 13421. 17850. 21312.

On average, nearly 26 billion USD of corporate bonds are traded daily in nearly 13,000 transactions. We can hence conclude that the corporate bond market is indeed significant in terms of trading volume and activity.


  1. Compute the amount outstanding across all bonds over time. Make sure to subtract all matured bonds. How would you describe the resulting plot?
  2. Compute the number of days each bond is traded (accounting for the bonds’ maturities and issuances). Start by looking at the number of bonds traded each day in a graph similar to the one above. How many bonds trade on more than 75 percent of trading days?
  3. WRDS provides more information from Mergent FISD such as ratings in the table fisd_ratings. Download the ratings table and plot the distribution of ratings for the different rating providers. How would you map the different providers to a common numeric rating scale?


Bai, Jennie, Turan G Bali, and Quan Wen. 2019. Common risk factors in the cross-section of corporate bond returns.” Journal of Financial Economics 131 (3): 619–42.
Bessembinder, Hendrik, Kathleen M Kahle, William F Maxwell, and Danielle Xu. 2008. Measuring abnormal bond performance.” Review of Financial Studies 22 (10): 4219–58.
Bessembinder, Hendrik, William Maxwell, and Kumar Venkataraman. 2006. Market transparency, liquidity externalities, and institutional trading costs in corporate bonds.” Journal of Financial Economics 82 (2): 251–88.
Dick-Nielsen, Jens. 2009. Liquidity biases in TRACE.” The Journal of Fixed Income 19 (2): 43–55.
———. 2014. How to clean enhanced TRACE data.” Working Paper.
Dick-Nielsen, Jens, Peter Feldhütter, and David Lando. 2012. Corporate bond liquidity before and after the onset of the subprime crisis.” Journal of Financial Economics 103 (3): 471–92.
Edwards, Amy K., Lawrence E. Harris, and Michael S. Piwowar. 2007. Corporate bond market transaction costs and transparency.” The Journal of Finance 62 (3): 1421–51.
Handler, Lukas, Rainer Jankowitsch, and Patrick Weiss. 2021. Covenant prices of US corporate bonds.” Working Paper.
Huang, Jing-Zhi, and Zhan Shi. 2021. What do we know about corporate bond returns? Annual Review of Financial Economics 13 (1): 363–99.
Kelly, Bryan T., Diogo Palhares, and Seth Pruitt. 2021. Modeling corporate bond returns.” Working Paper.
O’Hara, Maureen, and Xing Alex Zhou. 2021. Anatomy of a liquidity crisis: Corporate bonds in the COVID-19 crisis.” Journal of Financial Economics 142 (1): 46–68.
Wickham, Hadley, Jim Hester, Winston Chang, and Jennifer Bryan. 2022. devtools: Tools to make developing R packages easier.