PROVIDER = 'Gerard Hoberg and Gordon Phillips, 2010, 2016'
URL = 'https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip'
TXT_FILE = 'tnic3_data.txt'
HOST_WEBSITE = 'https://hobergphillips.tuck.dartmouth.edu/industryclass.htm'
FREQ = 'A'
MIN_YEAR = 1989
MAX_YEAR = 2021
ENTITY_ID_IN_RAW_DSET = 'gvkey'
ENTITY_ID_IN_CLEAN_DSET = 'gvkey'
TIME_VAR_IN_RAW_DSET = 'date'
TIME_VAR_IN_CLEAN_DSET = f'{FREQ}date'
Hoberg, Phillips (2010, 2016)
10-K Text-based Network Industry Classifications (TNIC) data
This module downloads and processes data developed by:
- Text-Based Network Industries and Endogenous Product Differentiation. Gerard Hoberg and Gordon Phillips, 2016, Journal of Political Economy 124 (5), 1423-1465.
- Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis. Gerard Hoberg and Gordon Phillips, 2010, Review of Financial Studies 23 (10), 3773-3811.
See the authors’ dedicated website for more information on this dataset: https://hobergphillips.tuck.dartmouth.edu/industryclass.htm
get_raw_data
get_raw_data (url:str='https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip', txt_file:str='tnic3_data.txt')
Download raw data from url
| | Type | Default | Details |
|---|---|---|---|
| url | str | https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip | |
| txt_file | str | tnic3_data.txt | Name of the data txt file inside the zip file found at url |
| Returns | pd.DataFrame | | |
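The download step described above amounts to opening a zip archive and reading one text file out of it. The following is a minimal sketch of that idea, not the module's actual implementation: `read_txt_from_zip` is a hypothetical helper, the tab delimiter and the raw column names (`year`, `gvkey1`, `gvkey2`, `score`) are assumptions, and the archive is built in memory so the example runs offline.

```python
import io
import zipfile

import pandas as pd


def read_txt_from_zip(zip_bytes: bytes, txt_file: str, sep: str = "\t") -> pd.DataFrame:
    """Read one delimited text file out of an in-memory zip archive.

    Hypothetical helper; the tab separator is an assumption about the file layout.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(txt_file) as handle:
            return pd.read_csv(handle, sep=sep)


# Build a tiny stand-in archive so the sketch does not need network access.
# The column names here are assumed, not taken from the real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "tnic3_data.txt",
        "year\tgvkey1\tgvkey2\tscore\n1988\t1011\t3226\t0.1508\n",
    )

demo_raw = read_txt_from_zip(buffer.getvalue(), "tnic3_data.txt")
print(demo_raw)
```

With a real download, the bytes fetched from `url` would take the place of `buffer.getvalue()`.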
raw = get_raw_data()
process_raw_data
process_raw_data (df:pandas.core.frame.DataFrame=None, gvkey_to_permno:Union[bool,pandas.core.frame.DataFrame]=True)
Cleans up dates and optionally adds CRSP permnos
| | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame | None | |
| gvkey_to_permno | bool \| pd.DataFrame | True | Whether to download permno-gvkey link. If DataFrame, must contain ‘gvkey’ |
| Returns | pd.DataFrame | | |
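The two steps named above (cleaning the date and attaching permnos to both sides of each firm pair) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the function's real body: `clean_pairs` is a hypothetical name, the raw time column is assumed to be called `year`, and the link table is simplified to one `permno` per `gvkey`, whereas a real link table may vary over time.

```python
import pandas as pd


def clean_pairs(pairs: pd.DataFrame, link: pd.DataFrame) -> pd.DataFrame:
    """Convert the year to an annual period and add permno1/permno2.

    Hypothetical sketch: assumes `pairs` has year, gvkey1, gvkey2, score
    and `link` has exactly one permno per gvkey.
    """
    out = pairs.copy()
    # Turn the integer year into an annual pandas Period stored in 'Adate'.
    out["Adate"] = pd.to_datetime(out["year"].astype(str), format="%Y").dt.to_period("Y")
    out = out.drop(columns="year")
    # Merge the link table once per side of the pair, renaming its columns to match.
    for side in ("1", "2"):
        side_link = link.rename(
            columns={"gvkey": f"gvkey{side}", "permno": f"permno{side}"}
        )
        out = out.merge(side_link, on=f"gvkey{side}", how="left")
    return out


# Two pairs and their permnos, copied from the first rows of the output table.
demo_pairs = pd.DataFrame(
    {
        "year": [1988, 1988],
        "gvkey1": [1011, 1011],
        "gvkey2": [3226, 6282],
        "score": [0.1508, 0.0851],
    }
)
demo_link = pd.DataFrame(
    {"gvkey": [1011, 3226, 6282], "permno": [10082, 25022, 46747]}
)
demo_clean = clean_pairs(demo_pairs, demo_link)
print(demo_clean)
```

A left merge keeps every pair even when a gvkey has no matching permno; those rows would simply carry a missing value.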
clean = process_raw_data(raw)
clean
| | gvkey1 | gvkey2 | score | Adate | permno1 | permno2 |
|---|---|---|---|---|---|---|
| 0 | 1011 | 3226 | 0.1508 | 1988 | 10082 | 25022 |
| 1 | 1011 | 6282 | 0.0851 | 1988 | 10082 | 46747 |
| 2 | 1011 | 6734 | 0.0258 | 1988 | 10082 | 49606 |
| 3 | 1011 | 7609 | 0.0097 | 1988 | 10082 | 12058 |
| 4 | 1011 | 9526 | 0.0369 | 1988 | 10082 | 69519 |
| ... | ... | ... | ... | ... | ... | ... |
| 25479601 | 349972 | 322154 | 0.0444 | 2021 | 15642 | 22523 |
| 25479602 | 349972 | 331856 | 0.0169 | 2021 | 15642 | 14615 |
| 25479603 | 349972 | 332115 | 0.0214 | 2021 | 15642 | 80577 |
| 25479604 | 349972 | 345556 | 0.0781 | 2021 | 15642 | 16069 |
| 25479605 | 349972 | 347007 | 0.0711 | 2021 | 15642 | 15533 |
25479606 rows × 6 columns
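Because each row is one firm pair in one year, a common first step with a table shaped like this is to count peers per firm-year, optionally above a similarity threshold. The sketch below is only an illustration: `peers_per_firm_year` and the `min_score` cutoff are hypothetical, and the demo frame reuses the first five rows shown above with `Adate` kept as a plain integer for simplicity.

```python
import pandas as pd


def peers_per_firm_year(clean: pd.DataFrame, min_score: float = 0.0) -> pd.DataFrame:
    """Count, for each (gvkey1, Adate), the pairs whose score is at least min_score."""
    kept = clean[clean["score"] >= min_score]
    return (
        kept.groupby(["gvkey1", "Adate"])
        .size()
        .rename("n_peers")
        .reset_index()
    )


# First five rows of the table above (permno columns omitted).
demo = pd.DataFrame(
    {
        "gvkey1": [1011] * 5,
        "gvkey2": [3226, 6282, 6734, 7609, 9526],
        "score": [0.1508, 0.0851, 0.0258, 0.0097, 0.0369],
        "Adate": [1988] * 5,
    }
)
all_peers = peers_per_firm_year(demo)
close_peers = peers_per_firm_year(demo, min_score=0.03)
print(all_peers)
print(close_peers)
```

On the full dataset the same call would operate on roughly 25 million rows, so filtering by year before grouping may be worthwhile.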