PROVIDER = 'Gerard Hoberg and Gordon Phillips, 2010, 2016'
URL = 'https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip'
TXT_FILE = 'tnic3_data.txt'
HOST_WEBSITE = 'https://hobergphillips.tuck.dartmouth.edu/industryclass.htm'
FREQ = 'A'
MIN_YEAR = 1989
MAX_YEAR = 2021
ENTITY_ID_IN_RAW_DSET = 'gvkey'
ENTITY_ID_IN_CLEAN_DSET = 'gvkey'
TIME_VAR_IN_RAW_DSET = 'date'
TIME_VAR_IN_CLEAN_DSET = f'{FREQ}date'
Hoberg, Phillips (2010, 2016)
10-K Text-based Network Industry Classifications (TNIC) data
This module downloads and processes data developed by:
- Text-Based Network Industries and Endogenous Product Differentiation. Gerard Hoberg and Gordon Phillips, 2016, Journal of Political Economy 124 (5), 1423-1465.
- Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis. Gerard Hoberg and Gordon Phillips, 2010, Review of Financial Studies 23 (10), 3773-3811.
See the authors’ dedicated website for more information on this dataset: https://hobergphillips.tuck.dartmouth.edu/industryclass.htm
get_raw_data
get_raw_data (url:str='https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip', txt_file:str='tnic3_data.txt')
Download raw data from url
| | Type | Default | Details |
|---|---|---|---|
| url | str | https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip | |
| txt_file | str | tnic3_data.txt | Name of the data txt file inside the zip file found at url |
| Returns | pd.DataFrame | | |
raw = get_raw_data()
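The body of get_raw_data is not shown on this page. As a rough illustration of the "read the txt file named by txt_file from the zip found at url" step, here is a minimal sketch that uses only the standard library instead of pandas. The helper name `read_txt_from_zip` is hypothetical, the tab delimiter is an assumption about the raw file, and the column names in the stand-in archive are borrowed from the clean output on this page and may differ in the real raw file.

```python
import csv
import io
import zipfile


def read_txt_from_zip(zip_bytes, txt_file, delimiter="\t"):
    """Parse one delimited text file stored inside an in-memory zip archive.

    Hypothetical helper: returns a list of dicts (one per data row),
    with every value left as a string.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(txt_file) as raw:
            # ZipFile.open yields bytes, so wrap it to decode text for csv.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter=delimiter))


# Build a tiny stand-in archive so the sketch runs without network access.
# The header row is an assumption, not the documented raw layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("tnic3_data.txt", "gvkey1\tgvkey2\tscore\n1011\t3226\t0.1508\n")

rows = read_txt_from_zip(buffer.getvalue(), "tnic3_data.txt")
print(rows)
```

For a real download, the bytes could come from something like `urllib.request.urlopen(url).read()`, after which the same helper applies unchanged.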
process_raw_data
process_raw_data (df:pandas.core.frame.DataFrame=None, gvkey_to_permno:Union[bool,pandas.core.frame.DataFrame]=True)
Cleans up dates and optionally adds CRSP permnos
| | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame | None | |
| gvkey_to_permno | Union[bool, pd.DataFrame] | True | Whether to download permno-gvkey link. If DataFrame, must contain ‘gvkey’ |
| Returns | pd.DataFrame | | |
clean = process_raw_data(raw)
clean
| | gvkey1 | gvkey2 | score | Adate | permno1 | permno2 |
|---|---|---|---|---|---|---|
| 0 | 1011 | 3226 | 0.1508 | 1988 | 10082 | 25022 |
| 1 | 1011 | 6282 | 0.0851 | 1988 | 10082 | 46747 |
| 2 | 1011 | 6734 | 0.0258 | 1988 | 10082 | 49606 |
| 3 | 1011 | 7609 | 0.0097 | 1988 | 10082 | 12058 |
| 4 | 1011 | 9526 | 0.0369 | 1988 | 10082 | 69519 |
| ... | ... | ... | ... | ... | ... | ... |
| 25479601 | 349972 | 322154 | 0.0444 | 2021 | 15642 | 22523 |
| 25479602 | 349972 | 331856 | 0.0169 | 2021 | 15642 | 14615 |
| 25479603 | 349972 | 332115 | 0.0214 | 2021 | 15642 | 80577 |
| 25479604 | 349972 | 345556 | 0.0781 | 2021 | 15642 | 16069 |
| 25479605 | 349972 | 347007 | 0.0711 | 2021 | 15642 | 15533 |
25479606 rows × 6 columns
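
How process_raw_data attaches permno1 and permno2 is not shown on this page. One plausible reading is that a gvkey-to-permno link is applied to both sides of each firm pair. The sketch below illustrates that idea with a plain dictionary lookup rather than a pandas merge. The helper name `add_permnos` is hypothetical, dropping pairs with an unmapped gvkey is an assumption, and the sketch ignores any date ranges a real link table may carry.

```python
def add_permnos(pairs, gvkey_to_permno):
    """Attach permno1/permno2 to each pair of firms.

    Hypothetical helper. Pairs where either gvkey has no permno are
    dropped, which is an assumption and not documented behavior.
    """
    out = []
    for pair in pairs:
        p1 = gvkey_to_permno.get(pair["gvkey1"])
        p2 = gvkey_to_permno.get(pair["gvkey2"])
        if p1 is None or p2 is None:
            continue
        out.append({**pair, "permno1": p1, "permno2": p2})
    return out


# Link values copied from the first rows of the clean output above.
link = {1011: 10082, 3226: 25022, 6282: 46747}

pairs = [
    {"gvkey1": 1011, "gvkey2": 3226, "score": 0.1508},
    {"gvkey1": 1011, "gvkey2": 6282, "score": 0.0851},
    {"gvkey1": 1011, "gvkey2": 9999, "score": 0.5},  # made-up unmapped gvkey
]

linked = add_permnos(pairs, link)
for row in linked:
    print(row)
```

With pandas, the same effect would typically come from merging the link table twice, once on gvkey1 and once on gvkey2.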