Hoberg, Phillips (2010, 2016)

10-K Text-based Network Industry Classifications (TNIC) data

This module downloads and processes data developed by:

  • Text-Based Network Industries and Endogenous Product Differentiation. Gerard Hoberg and Gordon Phillips, 2016, Journal of Political Economy 124 (5), 1423-1465.
  • Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis. Gerard Hoberg and Gordon Phillips, 2010, Review of Financial Studies 23 (10), 3773-3811.

See the authors’ dedicated website for more information on this dataset: https://hobergphillips.tuck.dartmouth.edu/industryclass.htm

PROVIDER = 'Gerard Hoberg and Gordon Phillips, 2010, 2016'
URL = 'https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip' 
TXT_FILE = 'tnic3_data.txt'
HOST_WEBSITE = 'https://hobergphillips.tuck.dartmouth.edu/industryclass.htm'
FREQ = 'A'
MIN_YEAR = 1989
MAX_YEAR = 2021
ENTITY_ID_IN_RAW_DSET = 'gvkey' 
ENTITY_ID_IN_CLEAN_DSET = 'gvkey' 
TIME_VAR_IN_RAW_DSET = 'date'
TIME_VAR_IN_CLEAN_DSET = f'{FREQ}date'


get_raw_data

 get_raw_data (url:str='https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip',
               txt_file:str='tnic3_data.txt')

Download raw data from url

Parameter  Type          Default                                                          Details
url        str           https://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip
txt_file   str           tnic3_data.txt                                                   Name of the data txt file inside the zip file found at url
Returns    pd.DataFrame
raw = get_raw_data()
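
As a rough illustration of what reading a text file out of a downloaded zip archive can look like, here is a minimal sketch. It is not the module's actual implementation: the helper name `read_txt_from_zip` is hypothetical, the tab delimiter is an assumption about the file layout, and the demonstration uses a tiny synthetic archive with made-up rows so that it runs offline.

```python
import io
import zipfile

import pandas as pd


def read_txt_from_zip(zip_bytes: bytes, txt_file: str = "tnic3_data.txt") -> pd.DataFrame:
    """Hypothetical helper: read one delimited text file from an in-memory zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(txt_file) as handle:
            # Assumption: the file is tab-delimited with a header row.
            return pd.read_csv(handle, sep="\t")


# Offline demonstration: build a tiny synthetic archive (made-up rows).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "tnic3_data.txt",
        "gvkey1\tgvkey2\tscore\n1\t2\t0.5\n1\t3\t0.25\n",
    )

demo = read_txt_from_zip(buffer.getvalue())
print(demo)
```

With network access, the archive bytes could instead be fetched with the standard library, for example `urllib.request.urlopen(URL).read()`, and passed to the same helper.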


process_raw_data

 process_raw_data (df:pandas.core.frame.DataFrame=None,
                   gvkey_to_permno:Union[bool,pandas.core.frame.DataFrame]=True)

Cleans up dates and optionally adds CRSP permnos

Parameter        Type                 Default  Details
df               pd.DataFrame         None
gvkey_to_permno  bool | pd.DataFrame  True     Whether to download permno-gvkey link. If DataFrame, must contain ‘gvkey’
Returns          pd.DataFrame
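
To make the "adds CRSP permnos" step more concrete, the sketch below shows one way a gvkey-to-permno link table could be attached to both sides of a firm-pair table. It is an illustration under stated assumptions, not the module's code: the function name `add_permnos` is hypothetical, the link table is assumed to have exactly one `permno` per `gvkey` (real CRSP-Compustat links vary over time, which this ignores), and all values are made up.

```python
import pandas as pd


def add_permnos(pairs: pd.DataFrame, link: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add permno1/permno2 to a gvkey1/gvkey2 pair table."""
    # Assumption: a static link with at most one permno per gvkey.
    link = link[["gvkey", "permno"]].drop_duplicates("gvkey")
    out = pairs
    for side in ("1", "2"):
        side_link = link.rename(
            columns={"gvkey": f"gvkey{side}", "permno": f"permno{side}"}
        )
        # Left merge keeps every pair, even when no permno is found.
        out = out.merge(side_link, on=f"gvkey{side}", how="left")
    return out


# Made-up example data.
pairs = pd.DataFrame(
    {"gvkey1": [1, 1], "gvkey2": [2, 3], "score": [0.5, 0.25], "Adate": [2000, 2000]}
)
link = pd.DataFrame({"gvkey": [1, 2, 3], "permno": [101, 102, 103]})

linked = add_permnos(pairs, link)
print(linked)
```

A left merge is used here so that pairs without a matching permno are kept with missing values rather than dropped; whether the module does the same is not stated on this page.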
clean = process_raw_data(raw)
clean
          gvkey1  gvkey2   score  Adate  permno1  permno2
0           1011    3226  0.1508   1988    10082    25022
1           1011    6282  0.0851   1988    10082    46747
2           1011    6734  0.0258   1988    10082    49606
3           1011    7609  0.0097   1988    10082    12058
4           1011    9526  0.0369   1988    10082    69519
...          ...     ...     ...    ...      ...      ...
25479601  349972  322154  0.0444   2021    15642    22523
25479602  349972  331856  0.0169   2021    15642    14615
25479603  349972  332115  0.0214   2021    15642    80577
25479604  349972  345556  0.0781   2021    15642    16069
25479605  349972  347007  0.0711   2021    15642    15533

25479606 rows × 6 columns
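
One common use of a pair table like this is to look up a firm's closest peers in a given year. The sketch below is a hedged example of that lookup: the function name `top_peers` is hypothetical, the sample rows are the first five rows of the output printed above, and it assumes that `Adate` values compare equal to a plain integer year (if `Adate` is stored as a pandas Period, the comparison would need adjusting).

```python
import pandas as pd


def top_peers(clean: pd.DataFrame, gvkey: int, year: int, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: the n highest-score peers of `gvkey` in `year`."""
    # Assumption: Adate is comparable to an integer year.
    mask = (clean["gvkey1"] == gvkey) & (clean["Adate"] == year)
    return clean.loc[mask].nlargest(n, "score")[["gvkey2", "score"]]


# First five rows of the output shown above.
sample = pd.DataFrame(
    {
        "gvkey1": [1011, 1011, 1011, 1011, 1011],
        "gvkey2": [3226, 6282, 6734, 7609, 9526],
        "score": [0.1508, 0.0851, 0.0258, 0.0097, 0.0369],
        "Adate": [1988, 1988, 1988, 1988, 1988],
    }
)

peers = top_peers(sample, gvkey=1011, year=1988, n=2)
print(peers)
```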
