7

To solve the problem about pivot() of Pandas

 3 years ago
source link: http://www.donghao.org/2021/01/08/to-solve-the-problem-about-pivot-of-pandas/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

To solve the problem about pivot() of Pandas

Below is an example from pandas official document for pivot():

import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result = df.pivot(index='foo', columns='bar', values='baz')
print(result)
Python
xxxxxxxxxx
import pandas as pd
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
result = df.pivot(index='foo', columns='bar', values='baz')
print(result)

the result will be

bar  A  B  C
foo         
one  1  2  3
two  4  5  6
Python
xxxxxxxxxx
bar  A  B  C
foo         
one  1  2  3
two  4  5  6

But if there are duplicate rows in the Dataframe, it will report error:

ValueError: Index contains duplicate entries, cannot reshape
Python
xxxxxxxxxx
ValueError: Index contains duplicate entries, cannot reshape

To fix this, we can just drop_duplicates() first and then pivot():

result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')
Python
xxxxxxxxxx
result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')

As matter of fact, there are situations that drop_duplicates() couldn’t fix:

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'A', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
Python
xxxxxxxxxx
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'A', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

Now we will need to use groupby() and unstack() to replace pivot():

result = (df.groupby(["foo", "bar"])
    .baz
    .first()
    .unstack()
)
Python
xxxxxxxxxx
result = (df.groupby(["foo", "bar"])
    .baz
    .first()
    .unstack()
)

And the result is

bar    A    B    C
foo               
one  1.0  NaN  3.0
two  4.0  5.0  6.0
Python
xxxxxxxxxx
bar    A    B    C
foo               
one  1.0  NaN  3.0
two  4.0  5.0  6.0

Like this:

Loading...
12:58 am ROBIN DONG bigdata
pandas
Leave a comment

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK