ratios

Retrieve and process data from WRDS Financial Ratios Suite.

From the manual

Data Source:

All accounting related data are obtained from Compustat Quarterly and Annual file. Pricing related data, such as Market Capitalization and Price, are obtained from both CRSP and Compustat, and we rely on CRSP as the primary data source for pricing data. Earnings related data are from IBES database.

Data Frequency:

The final outputs for both individual firm and industry-level aggregated value are at monthly frequency. In order to populate the data to monthly frequency, we carry forward the most recent quarterly or annual data item, whichever is most recently available at a given time stamp, to the subsequent months before the next filing data becomes available. In addition, in order to make sure that all data is publicly available at the monthly time stamp, we lag all observations by two months to avoid any look ahead bias.4

Outlier Control:

As ratio metrics often produce unintended extreme outliers, we impose two layers of outliers control before aggregating at the industry level. First, for all the monthly frequency firm level individual ratio results, we impose a winsorization at 1% level for extreme values, and truncate the outliers in the top and bottom percentile to be missing. Secondly, to arrive at the final ratio output, we enforce a 12 month moving average on the monthly frequency financial ratios. The second step serves two purpose: to further smooth the final output, and to fill in the truncated extreme months (from step 1) with firm-specific moving average. Note that the outlier controls are only applied to the ratios fed to the industry-level aggregation. Outputs for firm-level financial ratios are raw ratios without any truncation or smoothing. Hence researchers are advised to censor/smooth the raw ratios to get rid of the extreme outliers before conducting further analysis.

public_date: date on which the information was available to the public

adate: fiscal year to which the information pertains

qdate: fiscal quarter to which the information pertains

NOTE:

This dataset has three variables that are in levels (apart from the ID variables): be, mktcap, and price (i.e. book equity, market cap, and stock price).
Excludes financials.
ID variables are: permno, gvkey, ticker, cusip, public_date, adate, qdate, gsector, gicdesc, and all variables starting with “ffi”

from __future__ import annotations
from typing import List

import pandas as pd

import pandasmore as pdm
from finsets.wrds import wrds_api

PROVIDER = 'Wharton Research Data Services (WRDS)'
URL = 'https://wrds-www.wharton.upenn.edu/pages/get-data/financial-ratios-suite-wrds/financial-ratios-with-ibes-subscription/financial-ratios-firm-level-ibes/'
LIBRARY = 'wrdsapps_finratio_ibes'
TABLE = 'firm_ratio_ibes'
FREQ = 'M'
MIN_YEAR = 1970
MAX_YEAR = None
ENTITY_ID_IN_RAW_DSET = 'permno'
ENTITY_ID_IN_CLEAN_DSET = 'permno'
TIME_VAR_IN_RAW_DSET = 'public_date'
TIME_VAR_IN_CLEAN_DSET = f'{FREQ}date'

source

list_all_vars

 list_all_vars ()

Collects names of all available variables from WRDS f{LIBRARY}.{TABLE}

all_vars = list_all_vars()

all_vars.name.count()

source

get_raw_data

 get_raw_data (vars:List[str]=None, nrows:int=None, start_date:str=None,
               end_date:str=None)

Downloads vars from start_date to end_date from WRDS {LIBRARY}.{TABLE} library

	Type	Default	Details
vars	List[str]	None	If None or ’*’, downloads all variables
nrows	int	None	Number of rows to download. If None, full dataset will be downloaded
start_date	str	None	Start date in MM/DD/YYYY format
end_date	str	None	End date in MM/DD/YYYY format
Returns	pd.DataFrame

raw = get_raw_data(start_date='01/01/2021', nrows=1000)

raw.head(0)

	gvkey	permno	adate	qdate	public_date	capei	be	bm	evm	pe_op_basic	...	ffi30_desc	ffi30	ffi38_desc	ffi38	ffi48_desc	ffi48	ffi49_desc	ffi49	ticker	cusip

0 rows × 100 columns

source

process_raw_data

 process_raw_data (df:pandas.core.frame.DataFrame=None,
                   clean_kwargs:dict={})

Converts some variables to categorical and applies pandasmore.setup_panel to df

	Type	Default	Details
df	pd.DataFrame	None	Must contain `permno` and `datadate` columns
clean_kwargs	dict	{}	Params to pass to `pdm.setup_panel` other than `panel_ids`, `time_var`, and `freq`
Returns	pd.DataFrame

df_clean = process_raw_data(raw)

df_clean.head(1)

		public_date	dtdate	gvkey	adate	qdate	capei	be	bm	evm	pe_op_basic	...	ffi30_desc	ffi30	ffi38_desc	ffi38	ffi48_desc	ffi48	ffi49_desc	ffi49	ticker	cusip
permno	Mdate
10145	2021-01	2021-01-31	2021-01-31	001300	2019-12-31	2020-09-30	28.329023	19548.0	0.16924	17.302839	24.574843	...	NaN	NaN	GOVT	37	NaN	NaN	NaN	NaN	HON	43851610

1 rows × 100 columns

source

keep_only_ratios

 keep_only_ratios (df:pandas.core.frame.DataFrame)

keep_only_ratios(df_clean).head(1)

		capei	bm	evm	pe_op_basic	pe_op_dil	pe_exi	pe_inc	ps	pcf	dpr	...	rd_sale	adv_sale	staff_sale	accrual	ret_crsp	ptb	peg_trailing	divyield	peg_1yrforward	peg_ltgforward
permno	Mdate
10145	2021-01	28.329023	0.16924	17.302839	24.574843	24.856234	28.030129	28.030129	4.125068	22.696753	0.514853	...	0.046821	0.0	0.0	-0.017443	-0.081476	7.012911	0.958785	0.019041	-2.024123	10.657844

1 rows × 72 columns