Pandas: how to fill in missing dates in a MultiIndex
source link: https://www.codesd.com/item/pandas-pandas-how-the-filling-date-varies-in-a-multiindex.html
Suppose I am organizing sales data for a membership business.
I only have the start and end dates. Ideally, sales between the start and end dates would appear as 1 instead of missing.
I can't get the 'date' index level to be filled with the in-between dates. That is, I want a continuous set of months instead of gaps. I also need to fill the missing data in the columns with ffill.
I have tried different approaches such as stack/unstack and reindex, but each raises a different error. I'm guessing there's a clean way to do this. What's the best practice?
Suppose this MultiIndexed data structure:
                     variable  sales
vendor date
a      2014-01-01  start date      1
       2014-03-01    end date      1
b      2014-03-01  start date      1
       2014-07-01    end date      1
And the desired result:
                     variable  sales
vendor date
a      2014-01-01  start date      1
       2014-02-01         NaN      1
       2014-03-01    end date      1
b      2014-03-01  start date      1
       2014-04-01         NaN      1
       2014-05-01         NaN      1
       2014-06-01         NaN      1
       2014-07-01    end date      1
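You can resample each vendor's rows to a monthly frequency:
>>> # take the first row in each monthly bin; months without data become NaN
>>> f = lambda g: g.resample('M').first()
>>> # move 'vendor' out of the index so that 'date' is the only index level, then resample per vendor
>>> df.reset_index(level=0).groupby('vendor')[['variable', 'sales']].apply(f)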
                     variable  sales
vendor date
a      2014-01-31  start date      1
       2014-02-28         NaN    NaN
       2014-03-31    end date      1
b      2014-03-31  start date      1
       2014-04-30         NaN    NaN
       2014-05-31         NaN    NaN
       2014-06-30         NaN    NaN
       2014-07-31    end date      1
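Then call .fillna(1) on the sales column if needed. Note that the 'M' rule labels each row with the month end; use 'MS' instead to keep month-start dates as in the desired result.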
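The same fill can also be done with reindex, which the question mentions trying. The following is a minimal sketch, not the answerer's method: it rebuilds the example frame from the values shown in the question, and fill_months is a hypothetical helper name. It assumes every date falls on the first of a month and that each (vendor, date) pair is unique.

```python
import pandas as pd

# Rebuild the question's frame; the values are copied from the example above.
index = pd.MultiIndex.from_arrays(
    [
        ["a", "a", "b", "b"],
        pd.to_datetime(["2014-01-01", "2014-03-01", "2014-03-01", "2014-07-01"]),
    ],
    names=["vendor", "date"],
)
df = pd.DataFrame(
    {
        "variable": ["start date", "end date", "start date", "end date"],
        "sales": [1, 1, 1, 1],
    },
    index=index,
)


def fill_months(frame):
    """Hypothetical helper: one row per month between each vendor's first and last date."""
    tuples = []
    for vendor, group in frame.groupby(level="vendor"):
        dates = group.index.get_level_values("date")
        # "MS" is the month-start frequency, matching the dates in the question.
        for month in pd.date_range(dates.min(), dates.max(), freq="MS"):
            tuples.append((vendor, month))
    full_index = pd.MultiIndex.from_tuples(tuples, names=["vendor", "date"])
    out = frame.reindex(full_index)  # newly added rows are NaN in every column
    # Assumption: every in-between month counts as one sale, as the question asks.
    out["sales"] = out["sales"].fillna(1).astype(int)
    return out


result = fill_months(df)
print(result)
```

Building the target index explicitly keeps the month-start dates from the question and leaves 'variable' as NaN for the in-between months. To forward-fill other columns within each vendor instead, something like result.groupby(level="vendor").ffill() could be applied afterwards.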