7
To solve the problem about pivot() of Pandas
source link: http://www.donghao.org/2021/01/08/to-solve-the-problem-about-pivot-of-pandas/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
To solve the problem about pivot() of Pandas
Below is an example from pandas official document for pivot():
import pandas as pd df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], 'baz': [1, 2, 3, 4, 5, 6], 'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) result = df.pivot(index='foo', columns='bar', values='baz') print(result)
Python
xxxxxxxxxx
import pandas as pd
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
result = df.pivot(index='foo', columns='bar', values='baz')
print(result)
the result will be
bar A B C foo one 1 2 3 two 4 5 6
Python
xxxxxxxxxx
bar A B C
foo
one 1 2 3
two 4 5 6
But if there are duplicate rows in the Dataframe, it will report error:
ValueError: Index contains duplicate entries, cannot reshape
Python
xxxxxxxxxx
ValueError: Index contains duplicate entries, cannot reshape
To fix this, we can just drop_duplicates()
first and then pivot()
:
result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')
Python
xxxxxxxxxx
result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')
As matter of fact, there are situations that drop_duplicates()
couldn’t fix:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'A', 'C', 'A', 'B', 'C'], 'baz': [1, 2, 3, 4, 5, 6], 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
Python
xxxxxxxxxx
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
'bar': ['A', 'A', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
Now we will need to use groupby()
and unstack()
to replace pivot()
:
result = (df.groupby(["foo", "bar"]) .baz .first() .unstack() )
Python
xxxxxxxxxx
result = (df.groupby(["foo", "bar"])
.baz
.first()
.unstack()
)
And the result is
bar A B C foo one 1.0 NaN 3.0 two 4.0 5.0 6.0
Python
xxxxxxxxxx
bar A B C
foo
one 1.0 NaN 3.0
two 4.0 5.0 6.0
Like this:
Loading...
pandas
Leave a comment
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK