5 Other data providers

In the previous chapters, we introduced many ways to get financial data that researchers regularly use. We showed how to load data into R from Yahoo!Finance and commonly used file types, such as comma-separated or Excel files. Then, we introduced remotely connecting to WRDS and downloading data from there. However, this is only a subset of the vast amounts of data available these days.

In this short chapter, we aim to provide an overview of common alternative data providers for which direct access via R packages exists. Such a list requires constant adjustments because both data providers and access methods change. However, we want to emphasize two main insights: First, the number of R packages that provide access to (financial) data is large, in fact too large to survey exhaustively here. Instead, we can only cover the tip of the iceberg. Second, R provides the functionality to access basically any form of files or data available online. Thus, even if a desired data source does not come with a well-established R package, chances are high that the data can be retrieved by establishing your own API connection or by scraping the content.
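To illustrate the second point, the following base R sketch shows the two basic steps of a hand-rolled API connection: building a request URL from query parameters and parsing the returned text into a data frame. The endpoint `https://example.com/api/series`, its parameters, and the response body are made up for illustration, and the helper names `build_request_url()` and `parse_csv_body()` are our own. Treat it as a minimal sketch under these assumptions rather than a recipe for any particular provider.

```r
# Build a request URL from a base endpoint and a named list of query parameters.
build_request_url <- function(base_url, params) {
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  paste0(base_url, "?", query)
}

# Parse a CSV response body (a single string) into a data frame.
parse_csv_body <- function(body) {
  utils::read.csv(text = body, stringsAsFactors = FALSE)
}

# Hypothetical endpoint and parameters, for illustration only.
request_url <- build_request_url(
  "https://example.com/api/series",
  list(id = "GDP", start = "2020-01-01")
)

# Made-up response body standing in for what such an endpoint might return.
example_body <- "date,value\n2020-01-01,100.5\n2020-04-01,98.2"
series <- parse_csv_body(example_body)
series$date <- as.Date(series$date)

# With a real endpoint, the body could be fetched along these lines:
# body <- paste(readLines(request_url, warn = FALSE), collapse = "\n")
```

In practice, dedicated packages such as httr2 (for HTTP requests) and rvest (for scraping HTML pages) take care of authentication, error handling, and parsing, so a base R approach like this is mainly useful for simple endpoints that return CSV files.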

In our non-exhaustive list below, we restrict ourselves to data sources that can be accessed through easy-to-use R packages. For further inspiration on potential data sources, we recommend the CRAN Task View on Empirical Finance. Further inspiration (on more general social sciences) can be found here.

If you feel that we are missing a fantastic financial data source, please get in touch with us via contact@tidy-finance.org. Thank you very much for your support!

| Source | Description | R packages |
|---|---|---|
| Macroeconomic variables | | |
| FED | The Federal Reserve Bank of St. Louis provides more than 818,000 US and international time series from 109 sources via the API FRED. The data is freely available and can be browsed online on the FRED homepage. | fredr (Boysel and Vaughan 2021) and alfred (Kleen 2021) |
| ECB | The European Central Bank’s Statistical Data Warehouse provides data on Euro area monetary policy, financial stability, and other topics relevant to the activities of the ECB and the European System of Central Banks (ESCB). | ecb (Persson 2021) |
| Financial data | | |
| Bloomberg | Bloomberg’s fundamental coverage includes current and normalized historical data for the balance sheet, income statement, cash flow statement, and financial ratios. Additionally, it provides industry-specific data for communications, consumer, energy, health care, and many more. A paid subscription is needed to retrieve Bloomberg data. | Rblpapi (Armstrong, Eddelbuettel, and Laing 2022) |
| Refinitiv Eikon | Eikon provides access to real-time market data, news, fundamental data, analytics, trading, and messaging tools. Refinitiv’s Eikon is a paid service. Apart from the CRAN package DatastreamDSWS2R, the eikonapir package is available on GitHub at https://github.com/philaris/eikonapir. | DatastreamDSWS2R (Cara 2021) and eikonapir |
| Nasdaq Data Link (Quandl) | Quandl is a publisher of alternative data. It publishes free data scraped from many different sources on the web. However, some of the data requires specific subscriptions on the Quandl platform. | Quandl (McTaggart, Daroczi, and Leung 2021) |
| Global factor data | The data repository of Jensen, Kelly, and Pedersen (2022). They provide return data for characteristic-managed portfolios from around the world. The database includes factors for 153 characteristics in 13 themes, using data from 93 countries. | Download the data here. |
| Open Source Asset Pricing | The data repository of A. Y. Chen and Zimmermann (2022). They provide return data for over 200 trading strategies with different time periods and specifications. The authors also provide signals and explanations of the factor construction. | Download the data here. |
| Simfin | Simfin makes fundamental financial data freely available to private investors, researchers, and students. The data provider applies automated data collection processes to collect a large set of publicly available information from firms’ financial statements. | simfinapi (Gomolka 2021) |
| High-frequency data | | |
| IEX | The IEX Group operates the Investors Exchange (IEX), a stock exchange for US equities. IEX offers US reference and market data, including end-of-day and intraday pricing data. IEX offers an API which is freely available. | Riex (Ibrahim 2021) |
| TAQ | TAQ data provides subscribed users access to all trades and quotes for all issues traded on NYSE, Nasdaq, and the regional exchanges. TAQ data can be accessed from WRDS via Postgres. The highfrequency package delivers useful workflows to clean TAQ data. | highfrequency (Boudt et al. 2022) |
| Other (free) data | | |
| Crypto data | The data provider coinmarketcap offers cryptocurrency information and historical prices as well as information on the exchanges the currencies are listed on. | crypto2 (Stoeckl 2022) |
| Twitter | Twitter provides (limited) access for academic research to extract and analyze Tweets. | rtweet (Kearney 2019) |
| SEC company filings | The EDGAR database provides free public access to corporate information, allowing you to research a public company’s financial information and operations by reviewing the filings the company makes with the SEC. You can also research information provided by mutual funds (including money market funds), exchange-traded funds (ETFs), and variable annuities. | edgarWebR (Waldstein 2021) |
| Google trends | Google offers public access to global search volumes from its search engine through the Google Trends portal. | globaltrends (Puhr and Müllner 2021) and gtrendsR (Massicotte and Eddelbuettel 2022) |
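As one illustration of the package-based access listed in the table, the following sketch downloads a single time series from FRED with the fredr package. It assumes that fredr is installed and that you have stored a personal FRED API key in the environment variable `FRED_API_KEY`. The choice of the series `GDP` and of the start date is ours. If either requirement is missing, the code only prints a short message.

```r
# Assumption: the fredr package is installed and a personal FRED API key
# is available in the environment variable FRED_API_KEY.
has_fredr <- requireNamespace("fredr", quietly = TRUE)
has_key <- nzchar(Sys.getenv("FRED_API_KEY"))

if (has_fredr && has_key) {
  # Download US gross domestic product (series ID "GDP") from 2000 onwards.
  gdp <- fredr::fredr(
    series_id = "GDP",
    observation_start = as.Date("2000-01-01")
  )
  # Inspect the first rows of the returned data.
  print(head(gdp))
} else {
  message("Install fredr and set FRED_API_KEY to run this example.")
}
```

Most of the other packages in the table follow a similar pattern: you register with the provider, supply your credentials once per session, and then request data by identifier and date range. The exact function names and arguments differ by package, so the respective documentation is the place to start.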

5.1 Exercises

  1. Select one of the data sources in the table above and retrieve some data: Browse the homepage of the data provider or the package documentation to find inspiration on which type of data is available to you and how to download the data into your R session.
  2. Generate summary statistics of the data you retrieved and provide some useful visualization. The possibilities are endless: Maybe there is some interesting economic event you want to analyse such as stock market responses to Twitter activity?
  3. Simfin provides excellent data coverage. Use their API to find out if the information Simfin provides overlaps with the CRSP/Compustat dataset in the tidy_finance.sqlite database introduced in Chapters 2-4.