To solve the problem about pivot() of Pandas

Below is an example from pandas official document for pivot():

import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result = df.pivot(index='foo', columns='bar', values='baz')
print(result)

Python

xxxxxxxxxx

import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],

                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],

                   'baz': [1, 2, 3, 4, 5, 6],

                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result = df.pivot(index='foo', columns='bar', values='baz')

print(result)

the result will be

bar  A  B  C
foo         
one  1  2  3
two  4  5  6

Python

xxxxxxxxxx

bar  A  B  C

foo

one  1  2  3

two  4  5  6

But if there are duplicate rows in the Dataframe, it will report error:

ValueError: Index contains duplicate entries, cannot reshape

Python

xxxxxxxxxx

ValueError: Index contains duplicate entries, cannot reshape

To fix this, we can just drop_duplicates() first and then pivot():

result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')

Python

xxxxxxxxxx

result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')

As matter of fact, there are situations that drop_duplicates() couldn’t fix:

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'A', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

Python

xxxxxxxxxx

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],

                   'bar': ['A', 'A', 'C', 'A', 'B', 'C'],

                   'baz': [1, 2, 3, 4, 5, 6],

                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

Now we will need to use groupby() and unstack() to replace pivot():

result = (df.groupby(["foo", "bar"])
    .baz
    .first()
    .unstack()
)

Python

xxxxxxxxxx

result = (df.groupby(["foo", "bar"])

    .baz

    .first()

    .unstack()

And the result is

bar    A    B    C
foo               
one  1.0  NaN  3.0
two  4.0  5.0  6.0

Python

xxxxxxxxxx

bar    A    B    C

foo

one  1.0  NaN  3.0

two  4.0  5.0  6.0

Like this:

12:58 am ROBIN DONG bigdata
pandas
Leave a comment

To solve the problem about pivot() of Pandas

To solve the problem about pivot() of Pandas

Like this:

Recommend

Building a Chatbot to Order a Pizza

Stack and Queue in JavaScript

Xamarin Essentials: Features, Advantages and Benefits

XLS: The King of Future Past

A First Look at Vue Slots

The Best Way to Protect Sensitive Data | Voice of the DBA

Speaking at C# Corner’s Virtual Conference | Voice of the DBA

Daily Coping 8 Jan 2021 | Voice of the DBA

Goal Progress–Wrapping up 2020 | Voice of the DBA

Court stays Lynn Wright probate case until after Kleiman v Wright

About Joyk