# Keep postive total assets, sort and drop duplicatescomp = comp[comp['at']>0].copy()comp = comp.sort_values(['permno','year'])comp = comp.drop_duplicates(['permno','year'], keep='last', ignore_index=True)comp.head()
permno
datadate
at
revt
cogs
dltt
dlc
ib
dtdate
year
0
10000
1986-10-31
2.115
1.026
0.511
0.058
0.968
-0.730
1986-10-31
1986
1
10001
1986-06-30
12.242
21.460
19.565
2.946
0.343
0.669
1986-06-30
1986
2
10001
1987-06-30
11.771
16.621
15.538
2.750
0.377
0.312
1987-06-30
1987
3
10001
1988-06-30
11.735
16.978
15.556
2.555
0.325
0.542
1988-06-30
1988
4
10001
1989-06-30
18.565
22.910
19.856
7.370
0.185
1.208
1989-06-30
1989
comp.shape
(236316, 10)
New variables
# Calculate some of the variables in Table 1comp['at_lag1'] = comp.groupby('permno')['at'].shift(1)comp['AG'] = comp['at'] / comp['at_lag1'] -1comp['L2AG'] = comp.groupby('permno')['AG'].shift(1)comp['Leverage'] = (comp['dltt'] + comp['dlc']) / comp['at_lag1'] comp['ROA'] = comp['ib'] / comp['at']
# Keep only the variables we need, and observations with non-missing AGcomp = comp.loc[comp['AG'].notnull() ,:].copy()comp_cgs = comp[['permno','year','dtdate','AG','L2AG', 'at','Leverage', 'ROA']].copy()comp_cgs.head()
permno
year
dtdate
AG
L2AG
at
Leverage
ROA
2
10001
1987
1987-06-30
-0.038474
NaN
11.771
0.255432
0.026506
3
10001
1988
1988-06-30
-0.003058
-0.038474
11.735
0.244669
0.046187
4
10001
1989
1989-06-30
0.582020
-0.003058
18.565
0.643801
0.065069
5
10001
1990
1990-06-30
0.017021
0.582020
18.881
0.396984
0.059901
6
10001
1991
1991-06-30
0.038028
0.017021
19.599
0.380012
0.054748
comp_cgs.shape
(212504, 8)
# Save this for later usecomp_cgs.to_pickle('../data/comp_cgs.zip')
# Keep only the variables we needcrsp_cgs = crsp[['permno','mdate','dtdate','ret','MV','mktcap_lag1']].copy()crsp_cgs.head(2)
permno
mdate
dtdate
ret
MV
mktcap_lag1
0
10000
1986-02
1986-02-28
-0.257143
11.96
NaN
1
10000
1986-03
1986-03-31
0.365385
16.33
11.96
# Save for later usecrsp_cgs.to_pickle('../data/crsp_cgs.zip')
Calculate CRSP variables for Table 1
# Calculate cummulative returns over the past 6 months (BHRET6)crsp['BHRET6'] =1for i inrange(0,6): crsp['BHRET6'] = crsp['BHRET6'] * (1+ crsp.groupby('permno')['ret'].shift(i)) crsp['BHRET6'] = crsp['BHRET6'] -1crsp['BHRET6'].describe()
count 1.957695e+06
mean 6.840459e-02
std 5.437771e-01
min -9.999818e-01
25% -1.941748e-01
50% 1.254532e-02
75% 2.265447e-01
max 6.691429e+01
Name: BHRET6, dtype: float64
Challenge:
Write a function called compound that calculates cumulative returns over a sequence of months and adds them as another column in the crsp dataframe (i.e. exactly what we did above, with BHRET6 but for arbitrary number of lags, not just 6).
# Keep only the variables we need we need for Table 1crsp_table1 = crsp[['permno','mdate','MV','BHRET6','BHRET36','FBHRET12']].copy()crsp_table1.head(7)
permno
mdate
MV
BHRET6
BHRET36
FBHRET12
0
10000
1986-02
11.960000
NaN
NaN
-0.875000
1
10000
1986-03
16.330000
NaN
NaN
-0.943662
2
10000
1986-04
15.172000
NaN
NaN
-0.941406
3
10000
1986-05
11.793859
NaN
NaN
-0.929648
4
10000
1986-06
11.734594
NaN
NaN
NaN
5
10000
1986-07
10.786344
-0.350000
NaN
NaN
6
10000
1986-08
4.148594
-0.663462
NaN
NaN
# Save for later usecrsp_table1.to_pickle('../data/crsp_cgs_table1.zip')