pandasmore

Extends pandas with common functions used in finance and economics research

The full documentation site is here, and the GitHub page is here.

Here is a short description of some of the main functions (more details below and in the documentation):

setup_tseries: cleans up dates and sets them as the index
setup_panel: cleans up dates and panel id’s and sets them as the index (panel id, period date)
lag: robust lagging that accounts for panel structure, unsorted or duplicate dates, or gaps in the time-series

Install

pip install pandasmore

How to use

First, we set up an example dataset to showcase the functions in this module.

import pandas as pd
import numpy as np
import pandasmore as pdm

raw = pd.DataFrame(np.random.rand(15,2), 
                    columns=list('AB'), 
                    index=pd.MultiIndex.from_product(
                        [[1,2, np.nan],[np.nan,'2010-01','2010-02','2010-02','2010-04']],
                        names = ['firm_id','date'])
                      ).reset_index()
raw

	firm_id	date	A	B
0	1.0	NaN	0.249370	0.926335
1	1.0	2010-01	0.282501	0.513859
2	1.0	2010-02	0.804278	0.307171
3	1.0	2010-02	0.828895	0.746789
4	1.0	2010-04	0.569099	0.331814
5	2.0	NaN	0.533977	0.823457
6	2.0	2010-01	0.207558	0.401378
7	2.0	2010-02	0.086001	0.959371
8	2.0	2010-02	0.054230	0.993980
9	2.0	2010-04	0.062525	0.200272
10	NaN	NaN	0.091012	0.635409
11	NaN	2010-01	0.866369	0.972394
12	NaN	2010-02	0.432087	0.837597
13	NaN	2010-02	0.878219	0.148009
14	NaN	2010-04	0.820386	0.834821

df = pdm.setup_tseries(raw.query('firm_id==1'),
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df

	date	dtdate	firm_id	A	B
Mdate
2010-01	2010-01	2010-01-01	1.0	0.282501	0.513859
2010-02	2010-02	2010-02-01	1.0	0.828895	0.746789
2010-04	2010-04	2010-04-01	1.0	0.569099	0.331814

df = pdm.setup_panel(raw,
                        panel_ids='firm_id',
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df

		date	dtdate	A	B
firm_id	Mdate
1	2010-01	2010-01	2010-01-01	0.282501	0.513859
	2010-02	2010-02	2010-02-01	0.828895	0.746789
	2010-04	2010-04	2010-04-01	0.569099	0.331814
2	2010-01	2010-01	2010-01-01	0.207558	0.401378
	2010-02	2010-02	2010-02-01	0.054230	0.993980
	2010-04	2010-04	2010-04-01	0.062525	0.200272

pdm.lag(df['A'])

firm_id  Mdate  
1        2010-01         NaN
         2010-02    0.282501
         2010-04         NaN
2        2010-01         NaN
         2010-02    0.207558
         2010-04         NaN
Name: A_lag1, dtype: float64