from __future__ import annotations
from typing import List
import pandas as pd
import pandasmore as pdm
from finsets.wrds import wrds_api
ratios
From the manual
Data Source:
All accounting related data are obtained from Compustat Quarterly and Annual file. Pricing related data, such as Market Capitalization and Price, are obtained from both CRSP and Compustat, and we rely on CRSP as the primary data source for pricing data. Earnings related data are from IBES database.
Data Frequency:
The final outputs for both individual firm and industry-level aggregated value are at monthly frequency. In order to populate the data to monthly frequency, we carry forward the most recent quarterly or annual data item, whichever is most recently available at a given time stamp, to the subsequent months before the next filing data becomes available. In addition, in order to make sure that all data is publicly available at the monthly time stamp, we lag all observations by two months to avoid any look ahead bias.4
Outlier Control:
As ratio metrics often produce unintended extreme outliers, we impose two layers of outliers control before aggregating at the industry level. First, for all the monthly frequency firm level individual ratio results, we impose a winsorization at 1% level for extreme values, and truncate the outliers in the top and bottom percentile to be missing. Secondly, to arrive at the final ratio output, we enforce a 12 month moving average on the monthly frequency financial ratios. The second step serves two purpose: to further smooth the final output, and to fill in the truncated extreme months (from step 1) with firm-specific moving average. Note that the outlier controls are only applied to the ratios fed to the industry-level aggregation. Outputs for firm-level financial ratios are raw ratios without any truncation or smoothing. Hence researchers are advised to censor/smooth the raw ratios to get rid of the extreme outliers before conducting further analysis.
public_date
: date on which the information was available to the public
adate
: fiscal year to which the information pertains
qdate
: fiscal quarter to which the information pertains
NOTE:
This dataset has three variables that are in levels (apart from the ID variables):
be
,mktcap
, andprice
(i.e. book equity, market cap, and stock price).Excludes financials.
ID variables are: permno, gvkey, ticker, cusip, public_date, adate, qdate, gsector, gicdesc, and all variables starting with “ffi”
= 'Wharton Research Data Services (WRDS)'
PROVIDER = 'https://wrds-www.wharton.upenn.edu/pages/get-data/financial-ratios-suite-wrds/financial-ratios-with-ibes-subscription/financial-ratios-firm-level-ibes/'
URL = 'wrdsapps_finratio_ibes'
LIBRARY = 'firm_ratio_ibes'
TABLE = 'M'
FREQ = 1970
MIN_YEAR = None
MAX_YEAR = 'permno'
ENTITY_ID_IN_RAW_DSET = 'permno'
ENTITY_ID_IN_CLEAN_DSET = 'public_date'
TIME_VAR_IN_RAW_DSET = f'{FREQ}date' TIME_VAR_IN_CLEAN_DSET
list_all_vars
list_all_vars ()
Collects names of all available variables from WRDS f{LIBRARY}.{TABLE}
= list_all_vars() all_vars
all_vars.name.count()
100
get_raw_data
get_raw_data (vars:List[str]=None, nrows:int=None, start_date:str=None, end_date:str=None)
Downloads vars
from start_date
to end_date
from WRDS {LIBRARY}.{TABLE}
library
Type | Default | Details | |
---|---|---|---|
vars | List[str] | None | If None or ’*’, downloads all variables |
nrows | int | None | Number of rows to download. If None, full dataset will be downloaded |
start_date | str | None | Start date in MM/DD/YYYY format |
end_date | str | None | End date in MM/DD/YYYY format |
Returns | pd.DataFrame |
= get_raw_data(start_date='01/01/2021', nrows=1000) raw
0) raw.head(
gvkey | permno | adate | qdate | public_date | capei | be | bm | evm | pe_op_basic | ... | ffi30_desc | ffi30 | ffi38_desc | ffi38 | ffi48_desc | ffi48 | ffi49_desc | ffi49 | ticker | cusip |
---|
0 rows × 100 columns
process_raw_data
process_raw_data (df:pandas.core.frame.DataFrame=None, clean_kwargs:dict={})
Converts some variables to categorical and applies pandasmore.setup_panel
to df
Type | Default | Details | |
---|---|---|---|
df | pd.DataFrame | None | Must contain permno and datadate columns |
clean_kwargs | dict | {} | Params to pass to pdm.setup_panel other than panel_ids , time_var , and freq |
Returns | pd.DataFrame |
= process_raw_data(raw) df_clean
1) df_clean.head(
public_date | dtdate | gvkey | adate | qdate | capei | be | bm | evm | pe_op_basic | ... | ffi30_desc | ffi30 | ffi38_desc | ffi38 | ffi48_desc | ffi48 | ffi49_desc | ffi49 | ticker | cusip | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
permno | Mdate | |||||||||||||||||||||
10145 | 2021-01 | 2021-01-31 | 2021-01-31 | 001300 | 2019-12-31 | 2020-09-30 | 28.329023 | 19548.0 | 0.16924 | 17.302839 | 24.574843 | ... | NaN | NaN | GOVT | 37 | NaN | NaN | NaN | NaN | HON | 43851610 |
1 rows × 100 columns
keep_only_ratios
keep_only_ratios (df:pandas.core.frame.DataFrame)
1) keep_only_ratios(df_clean).head(
capei | bm | evm | pe_op_basic | pe_op_dil | pe_exi | pe_inc | ps | pcf | dpr | ... | rd_sale | adv_sale | staff_sale | accrual | ret_crsp | ptb | peg_trailing | divyield | peg_1yrforward | peg_ltgforward | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
permno | Mdate | |||||||||||||||||||||
10145 | 2021-01 | 28.329023 | 0.16924 | 17.302839 | 24.574843 | 24.856234 | 28.030129 | 28.030129 | 4.125068 | 22.696753 | 0.514853 | ... | 0.046821 | 0.0 | 0.0 | -0.017443 | -0.081476 | 7.012911 | 0.958785 | 0.019041 | -2.024123 | 10.657844 |
1 rows × 72 columns