Difference in Differences

Note

You are reading the work-in-progress edition of Tidy Finance with Python. Code chunks and text might change over the next couple of months. We are always looking for feedback via contact@tidy-finance.org. Meanwhile, you can find the complete R version here.

In this chapter, we illustrate the concept of difference in differences (DD) estimators by evaluating the effects of climate change regulation on the pricing of bonds across firms. DD estimators are typically used to recover the treatment effects of natural or quasi-natural experiments that trigger sharp changes in the environment of a specific group. Instead of looking at differences in just one group (e.g., the effect in the treated group), DD investigates the treatment effects by looking at the difference between differences in two groups. Such experiments are usually exploited to address endogeneity concerns (e.g., Roberts and Whited 2013). The identifying assumption is that the outcome variable would change equally in both groups without the treatment. This assumption is also often referred to as the assumption of parallel trends. Moreover, we would ideally also want a random assignment to the treatment and control groups. Due to lobbying or other activities, this randomness is often violated in (financial) economics.
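As a minimal sketch of the idea, the DD estimate in a two-group, two-period setting is the change in the treated group minus the change in the control group. The yields below are hypothetical numbers chosen for illustration, not values from our data, and the function name is our own.

```python
# Hypothetical average yields (in percent) for a 2x2 design; not taken from the data.
mean_yield = {
    ("control", "pre"): 3.0,
    ("control", "post"): 3.2,
    ("treated", "pre"): 3.5,
    ("treated", "post"): 4.4,
}


def difference_in_differences(means):
    """Return (change in treated group) minus (change in control group)."""
    change_treated = means[("treated", "post")] - means[("treated", "pre")]
    change_control = means[("control", "post")] - means[("control", "pre")]
    return change_treated - change_control


dd_estimate = difference_in_differences(mean_yield)
print(round(dd_estimate, 2))
```

Under parallel trends, the change in the control group serves as the counterfactual change for the treated group, so only the excess change is attributed to the treatment.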

In our setting, we investigate the impact of the Paris Agreement (PA), adopted on December 12, 2015, on the bond yields of polluting firms. We first estimate the treatment effect of the agreement using panel regression techniques that we discuss in Chapter 12. We then present two methods to illustrate the treatment effect over time graphically. Although we demonstrate that the treatment effect of the agreement is anticipated by bond market participants well in advance, the techniques we present below can also be applied to many other settings.

The approach we use here partly replicates the results of Seltzer, Starks, and Zhu (2022). Specifically, we borrow their industry definitions for grouping firms into green and brown types. Overall, the literature on ESG effects in corporate bond markets is already large and continues to grow (for recent examples, see Halling, Yu, and Zechner (2021), Handler, Jankowitsch, and Pasler (2022), and Huynh and Xia (2021), among many others).

The current chapter relies on this set of packages.

import pandas as pd
import numpy as np
import sqlite3

import linearmodels as lm
import statsmodels.formula.api as smf
from scipy.stats import norm

from plotnine import *

Data Preparation

We use TRACE and Mergent FISD as data sources from our SQLite database introduced in Chapters 2-4.

tidy_finance = sqlite3.connect("data/tidy_finance.sqlite")

mergent = (pd.read_sql_query(
  sql="SELECT complete_cusip, maturity, offering_amt, sic_code FROM mergent",
  con=tidy_finance,
  parse_dates={"maturity": {"unit": "D", "origin": "unix"}}
  )
  .dropna()
)

trace_enhanced = (pd.read_sql_query(
  sql="SELECT cusip_id, trd_exctn_dt, rptd_pr, entrd_vol_qt, yld_pt FROM trace_enhanced",
  con=tidy_finance,
  parse_dates={"trd_exctn_dt": {"unit": "D", "origin": "unix"}}
  )
  .dropna()
)

We start our analysis by preparing the sample of bonds. We only consider bonds with at least one year to maturity as of the treatment date (December 12, 2015), so that we have sufficient data to analyze the yield behavior after the treatment date. This restriction also excludes all bonds issued after the agreement. To identify the polluting industries, we use only the first two digits of the SIC industry code (in line with Seltzer, Starks, and Zhu 2022).

treatment_date = pd.to_datetime("2015-12-12")
polluting_industries = [
  49, 13, 45, 29, 28, 33, 40, 20,
  26, 42, 10, 53, 32, 99, 37
]

bonds = (mergent
  .query("offering_amt > 0")
  .assign(
    time_to_maturity = lambda x: (x["maturity"]-treatment_date).dt.days / 365,
    sic_code = lambda x: x["sic_code"].astype(str).str[:2].astype(int),
    log_offering_amt = lambda x: np.log(x["offering_amt"])
  )
  .query("time_to_maturity >= 1")
  .rename(columns={"complete_cusip": "cusip_id"})
  .get(["cusip_id", "time_to_maturity", "log_offering_amt", "sic_code"])
  .assign(
    polluter = lambda x: x["sic_code"].isin(polluting_industries)
  )
  .reset_index(drop=True)
)
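The two-digit extraction above takes the first two characters of the code's string representation. The following sketch illustrates a caveat under the assumption that SIC codes are stored as integers, so that a leading zero is lost (the text does not say how sic_code is stored). The codes are hypothetical examples. If the column is stored as text with leading zeros, both approaches agree.

```python
import pandas as pd

# Hypothetical SIC codes; the integer 130 stands for the four-digit code "0130".
sic_raw = pd.Series([4911, 1311, 130, 7372])

# Naive approach: the first two characters of the unpadded string.
naive_two_digit = sic_raw.astype(str).str[:2].astype(int)

# Padded approach: restore leading zeros before taking the first two digits.
padded_two_digit = sic_raw.astype(str).str.zfill(4).str[:2].astype(int)

sic_comparison = pd.DataFrame({
    "sic_raw": sic_raw,
    "naive": naive_two_digit,
    "padded": padded_two_digit,
})
print(sic_comparison)
```

In this example, the naive approach maps the code 0130 to the two-digit group 13, while the padded approach maps it to group 1.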

Next, we aggregate the individual transactions as reported in TRACE to a monthly panel of bond yields. We consider bond yields for a bond’s last trading day in a month. Therefore, we first aggregate bond data to daily frequency and apply common restrictions from the literature (see, e.g., Bessembinder et al. 2008). We weight each transaction by its dollar volume (entrd_vol_qt times rptd_pr) to reflect a trade’s relative importance and avoid emphasizing small trades. Moreover, we only consider transactions with reported prices rptd_pr larger than 25 (to exclude bonds that are close to default) and only bond-day observations with at least five trades on the corresponding day (to exclude prices based on too few, potentially non-representative transactions).

trace_aggregated = (trace_enhanced
  # Drop transactions of bonds that trade close to default
  .query("rptd_pr > 25")
  .groupby(["cusip_id", "trd_exctn_dt"])
  .aggregate(
    # Daily yield, weighted by dollar volume (quantity times price)
    avg_yield = ("yld_pt", lambda x: np.average(x, weights=trace_enhanced
                                     .loc[x.index, "entrd_vol_qt"]
                                     * trace_enhanced
                                     .loc[x.index, "rptd_pr"])),
    trades = ("rptd_pr", "count")
  )
  .reset_index()
  .dropna(subset=["avg_yield"])
  # Keep bond-days with at least five trades
  .query("trades >= 5")
  .assign(
    trd_exctn_dt = lambda x: pd.to_datetime(x["trd_exctn_dt"])
  )
  .assign(
    month = lambda x: x["trd_exctn_dt"] - pd.tseries.offsets.MonthBegin()
  )
  # Keep the last trading day of each bond-month
  .groupby(["cusip_id", "month"])
  .apply(lambda x: x[x["trd_exctn_dt"] == x["trd_exctn_dt"].max()])
  .reset_index(drop=True)
  .get(["cusip_id", "month", "avg_yield"])
)
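The aggregation logic can be checked on a small example. The sketch below uses hypothetical trades for a single bond and computes the dollar-volume-weighted daily yield with plain sums instead of a lambda inside the aggregation, then keeps the last trading day per month. It omits the price and trade-count filters, and it derives the month start with to_period("M"), which is a choice of this sketch rather than of the chapter's code.

```python
import numpy as np
import pandas as pd

# Hypothetical trades for one bond; values are made up for illustration.
trades = pd.DataFrame({
    "cusip_id": ["A"] * 5,
    "trd_exctn_dt": pd.to_datetime(
        ["2015-11-27", "2015-11-27", "2015-11-30", "2015-11-30", "2015-12-01"]
    ),
    "rptd_pr": [100.0, 101.0, 99.0, 100.0, 100.5],
    "entrd_vol_qt": [1_000_000, 10_000, 500_000, 500_000, 250_000],
    "yld_pt": [4.00, 3.80, 4.20, 4.10, 4.05],
})

# Daily yield: sum of (weight * yield) divided by the sum of weights.
daily = (trades
  .assign(
    weight=lambda x: x["entrd_vol_qt"] * x["rptd_pr"],
    weighted_yield=lambda x: x["weight"] * x["yld_pt"],
  )
  .groupby(["cusip_id", "trd_exctn_dt"], as_index=False)
  .agg(
    weight=("weight", "sum"),
    weighted_yield=("weighted_yield", "sum"),
    trades=("rptd_pr", "count"),
  )
  .assign(avg_yield=lambda x: x["weighted_yield"] / x["weight"])
)

# Keep the last trading day per bond and calendar month.
monthly = (daily
  .assign(month=lambda x: x["trd_exctn_dt"].dt.to_period("M").dt.to_timestamp())
  .sort_values("trd_exctn_dt")
  .groupby(["cusip_id", "month"])
  .tail(1)
  .get(["cusip_id", "month", "trd_exctn_dt", "avg_yield"])
  .reset_index(drop=True)
)
print(monthly)
```

Pre-computing the weights as columns avoids indexing back into the original data frame within each group, which tends to be faster on large transaction tables.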

By combining the bond-specific information from Mergent FISD for our bond sample with the aggregated TRACE data, we arrive at the main sample for our analysis.

bonds_panel = (bonds
  .merge(trace_aggregated, how="inner", on="cusip_id")
  .dropna()
)

Before we can run the first regression, we need to define the treated indicator, which is the product of the post_period indicator (i.e., all months from December 2015 onward, the month in which the PA was adopted) and the polluter indicator defined above.

bonds_panel = (bonds_panel
  .assign(post_period = lambda x: x["month"] >= (treatment_date - pd.tseries.offsets.MonthBegin()))
  .assign(treated = lambda x: x["polluter"] & x["post_period"])
  .assign(month_cat = lambda x: pd.Categorical(x["month"], ordered=True))
)
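Both the month column and the post_period cutoff rely on subtracting a MonthBegin offset. The following sketch, using three illustrative dates, shows how this subtraction behaves in pandas and contrasts it with a period-based conversion. For the treatment date of December 12, 2015, the offset yields December 1, 2015. A date that already falls on the first day of a month is moved back by one full month.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2015-12-01", "2015-12-12", "2015-12-31"]))

# Subtracting MonthBegin() rolls back to the previous month start; a date that
# already is a month start moves back one full month.
via_offset = dates - pd.tseries.offsets.MonthBegin()

# Converting to a monthly period and back always yields the same month's start.
via_period = dates.dt.to_period("M").dt.to_timestamp()

month_comparison = pd.DataFrame({
    "date": dates,
    "via_offset": via_offset,
    "via_period": via_period,
})
print(month_comparison)
```

Whether the difference matters depends on how many bond-days fall on the first calendar day of a month. The period-based variant is an alternative we sketch here, not the approach used in the chapter's code.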

As usual, we tabulate summary statistics of the variables that enter the regression to check the validity of our variable definitions.

bonds_panel_summary = (bonds_panel
  .melt(var_name="measure",
        value_vars=["avg_yield", "time_to_maturity", "log_offering_amt"]
  )
  .groupby("measure")
  .describe(percentiles=[0.05, 0.5, 0.95])
)
bonds_panel_summary

                     value
                     count       mean       std       min         5%        50%        95%         max
measure
avg_yield         127452.0   4.079181  4.213431  0.059488   1.268195   3.374052   8.081803  127.968831
log_offering_amt  127452.0  13.274360  0.823694  4.644391  12.206073  13.217674  14.508658   16.523561
time_to_maturity  127452.0   8.543981  8.410348  1.005479   1.501370   5.808219  27.405479  100.704110

Panel Regressions

The PA is a legally binding international treaty on climate change. It was adopted by 196 Parties at COP 21 in Paris on 12 December 2015 and entered into force on 4 November 2016. The PA obliges developed countries to support efforts to build clean, climate-resilient futures. One may thus hypothesize that adopting climate-related policies may affect financial markets. To measure the magnitude of this effect, we first run an OLS regression without fixed effects where we include the treated, post_period, and polluter dummies, as well as the bond-specific characteristics log_offering_amt and time_to_maturity. This simple model assumes that there are essentially two periods (before and after the PA) and two groups (polluters and non-polluters). Nonetheless, it should indicate whether polluters have higher yields following the PA compared to non-polluters.
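To see why the coefficient on treated in such a regression is a DD estimate, the following sketch simulates a two-group, two-period design and compares the OLS coefficient on the interaction with the difference of group-mean differences. All numbers are simulated and hypothetical, and the bond-specific controls are omitted, so the exact equality shown here applies only to a model with the three dummies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2x2 design with a true treatment effect of 0.5 (all values hypothetical).
n = 4_000
polluter = rng.integers(0, 2, size=n)
post_period = rng.integers(0, 2, size=n)
treated = polluter * post_period
avg_yield = (
    3.0 + 0.4 * polluter - 0.2 * post_period + 0.5 * treated
    + rng.normal(scale=1.0, size=n)
)

# OLS of the outcome on an intercept, the interaction, and both dummies.
X = np.column_stack([np.ones(n), treated, post_period, polluter])
coefficients, *_ = np.linalg.lstsq(X, avg_yield, rcond=None)


# The same quantity computed from the four group means.
def group_mean(is_polluter, is_post):
    mask = (polluter == is_polluter) & (post_period == is_post)
    return avg_yield[mask].mean()


dd_from_means = (
    (group_mean(1, 1) - group_mean(1, 0))
    - (group_mean(0, 1) - group_mean(0, 0))
)

print(coefficients[1], dd_from_means)
```

Because the three dummies and the intercept fit the four group means exactly, the interaction coefficient coincides with the DD of means up to numerical precision. Adding controls such as log_offering_amt breaks this exact equality but keeps the interpretation.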

The second model follows the typical DD regression approach by including individual (cusip_id) and time (month) fixed effects. In this model, we do not include any other variables from the simple model because the fixed effects subsume them, and we observe the coefficient of our main variable of interest: treated.
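In equation form, the two specifications can be written as follows, where i indexes bonds and t indexes months. The symbols are our own notation and do not appear in the code: the treated indicator equals polluter times post_period, the alpha with subscript i denotes bond fixed effects, and delta with subscript t denotes month fixed effects.

```latex
\begin{aligned}
\text{avg\_yield}_{i,t} &= \alpha + \beta\,\text{treated}_{i,t}
  + \gamma_1\,\text{post\_period}_{t} + \gamma_2\,\text{polluter}_{i} \\
&\quad + \gamma_3\,\text{log\_offering\_amt}_{i}
  + \gamma_4\,\text{time\_to\_maturity}_{i} + \varepsilon_{i,t}, \\
\text{avg\_yield}_{i,t} &= \beta\,\text{treated}_{i,t}
  + \alpha_i + \delta_t + \varepsilon_{i,t}.
\end{aligned}
```

The subscripts show why the controls drop out of the second model. The variables polluter, log_offering_amt, and time_to_maturity (measured relative to the treatment date) vary only across bonds and are absorbed by the bond fixed effects, while post_period varies only over time and is absorbed by the month fixed effects.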

model_without_fe = lm.PanelOLS.from_formula(
    formula=("avg_yield ~ treated + post_period + polluter "
             "+ log_offering_amt + time_to_maturity + 1"),
    data=bonds_panel.set_index(["cusip_id", "month"]),
).fit()

model_with_fe = lm.PanelOLS.from_formula(
    formula="avg_yield ~ treated + EntityEffects + TimeEffects",
    data=bonds_panel.set_index(["cusip_id", "month"]),
).fit()

comparison = lm.panel.results.compare([model_without_fe, model_with_fe])
comparison
                 Model Comparison
==================================================
                             Model 0       Model 1
--------------------------------------------------
Dep. Variable              avg_yield     avg_yield
Estimator                   PanelOLS      PanelOLS
No. Observations              127452        127452
Cov. Est.                 Unadjusted    Unadjusted
R-squared                     0.0315        0.0070
R-Squared (Within)            0.0044        0.0117
R-Squared (Between)       -1.364e-05        0.0474
R-Squared (Overall)           0.0315        0.0339
F-statistic                   829.03        851.76
P-value (F-stat)              0.0000        0.0000
--------------------------------------------------
treated                       0.4521        0.9720
                            (9.1035)      (29.185)
post_period                  -0.1780
                           (-6.0541)
polluter                      0.4798
                            (15.215)
log_offering_amt             -0.5448
                           (-38.564)
time_to_maturity              0.0572
                            (41.257)
Intercept                     10.662
                            (56.636)
--------------------------------------------------
Effects                                     Entity
                                              Time
==================================================

T-stats reported in parentheses

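Both models indicate that polluters have significantly higher yields after the PA than non-polluting firms: the treated coefficient is positive in both specifications, with t-statistics of about 9.1 and 29.2, respectively. Note that these t-statistics are based on unadjusted standard errors and that the magnitude of the treated coefficient varies considerably across models, roughly doubling from 0.45 without fixed effects to 0.97 with bond and month fixed effects.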

Exercises

  1. The 46th President of the US rejoined the Paris Agreement in February 2021. Repeat the difference in differences analysis for the day of his election victory. Note that you will also have to download new TRACE data. How did polluters’ yields react to this action?
  2. Based on the exercise on ratings in Chapter 4, include ratings as a control variable in the analysis above. Do the results change?

References

Bessembinder, Hendrik, Kathleen M. Kahle, William F. Maxwell, and Danielle Xu. 2008. “Measuring Abnormal Bond Performance.” Review of Financial Studies 22 (10): 4219–58. https://doi.org/10.1093/rfs/hhn105.
Halling, Michael, Jin Yu, and Josef Zechner. 2021. “Primary Corporate Bond Markets and Social Responsibility.” Working Paper. https://dx.doi.org/10.2139/ssrn.3681666.
Handler, Lukas, Rainer Jankowitsch, and Alexander Pasler. 2022. “The Effects of ESG Performance and Preferences on US Corporate Bond Prices.” Working Paper. https://dx.doi.org/10.2139/ssrn.4099566.
Huynh, Thanh D., and Ying Xia. 2021. “Climate Change News Risk and Corporate Bond Returns.” Journal of Financial and Quantitative Analysis 56 (6): 1985–2009. https://doi.org/10.1017/S0022109020000757.
Roberts, Michael R., and Toni M. Whited. 2013. “Endogeneity in Empirical Corporate Finance.” In Handbook of the Economics of Finance, 2:493–572. Elsevier. https://EconPapers.repec.org/RePEc:eee:finchp:2-a-493-572.
Seltzer, Lee H., Laura Starks, and Qifei Zhu. 2022. “Climate Regulatory Risk and Corporate Bonds.” Working Paper. https://www.nber.org/papers/w29994.