Replace column values based on conditions in Pandas

 1 year ago
source link: https://thispointer.com/replace-column-values-based-on-conditions-in-pandas/

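In this article, we will discuss various methods to replace column values based on conditions in a pandas DataFrame.

Table of Contents

Preparing DataSet
Method 1: Using .loc property of DataFrame
Method 2: Using numpy.where() method
Method 3: Using DataFrame.where() method
Method 4: Using mask() function from pandas
Summary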

Preparing DataSet

To get started quickly, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample employee data.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'India', 'Tech',   5, 4),
            ('Riti', 'India', 'Design' ,   7, 7),
            ('Shanky', 'India', 'PMO' ,   2, 2),
            ('Shreya', 'India', 'Design' ,   2, 0),
            ('Aadi', 'US', 'PMO', 11, 5),
            ('Sim', 'US', 'Tech', 4, 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience', 'RelevantExperience'],
                  index = ['A', 'B', 'C', 'D', 'E', 'F'])
print(df)

The contents of the created DataFrame are:

      Name Location    Team  Experience  RelevantExperience
A  Shubham    India    Tech           5                   4
B     Riti    India  Design           7                   7
C   Shanky    India     PMO           2                   2
D   Shreya    India  Design           2                   0
E     Aadi       US     PMO          11                   5
F      Sim       US    Tech           4                   4

Method 1: Using .loc property of DataFrame

The loc property comes in handy whenever we want to filter a DataFrame based on certain conditions. Here, we will use loc to select the rows that satisfy a condition and assign a new value to them. As an example, suppose we need to replace the “Tech” value in the “Team” column with “Tech & Data”.

# replace Tech with "Tech & Data" using loc
df.loc[df['Team'] == 'Tech', 'Team'] = 'Tech & Data'

print (df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

As observed, we first filtered all the rows satisfying the condition and then replaced the value by assigning the new value to the filtered rows.
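The same loc pattern extends to more than one condition. The snippet below is a minimal sketch on a small hypothetical DataFrame (df_demo and is_us_tech are illustrative names, not part of the original example); it combines two boolean conditions with & before assigning.

```python
import pandas as pd

# Small hypothetical DataFrame mirroring the columns used in this article
df_demo = pd.DataFrame(
    {
        "Name": ["Shubham", "Aadi", "Sim"],
        "Location": ["India", "US", "US"],
        "Team": ["Tech", "PMO", "Tech"],
        "Experience": [5, 11, 4],
    },
    index=["A", "E", "F"],
)

# Combine conditions with & (and) or | (or);
# each condition needs its own parentheses
is_us_tech = (df_demo["Location"] == "US") & (df_demo["Team"] == "Tech")

# Only the rows where both conditions hold are updated
df_demo.loc[is_us_tech, "Team"] = "Tech & Data"

print(df_demo)
```

Here only row F changes, because row A is in India and row E is not in the Tech team.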

Method 2: Using numpy.where() method

Another method is to use the numpy.where() function to replace values based on the condition. Let’s look at the function syntax and implement it in the above example.

np.where(condition, value if condition is True, value if condition is False)

# replace Tech with "Tech & Data" using np.where
df['Team'] = np.where(df['Team'] == 'Tech', 'Tech & Data', df['Team'])

print (df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4
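The two value arguments of np.where() do not have to be constants; they can also be whole columns. As a rough sketch on a hypothetical DataFrame (df_demo is an illustrative name), the snippet below fills RelevantExperience from Experience wherever it is 0 and keeps the existing value otherwise.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame mirroring the columns used in this article
df_demo = pd.DataFrame(
    {
        "Name": ["Riti", "Shreya", "Aadi"],
        "Experience": [7, 2, 11],
        "RelevantExperience": [7, 0, 5],
    },
    index=["B", "D", "E"],
)

# Where RelevantExperience is 0, take the value from Experience;
# otherwise keep the existing RelevantExperience value
df_demo["RelevantExperience"] = np.where(
    df_demo["RelevantExperience"] == 0,
    df_demo["Experience"],
    df_demo["RelevantExperience"],
)

print(df_demo)
```

np.where() returns a NumPy array, so the result is assigned back to the column to store it in the DataFrame.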

Method 3: Using DataFrame.where() method

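pandas Series and DataFrame objects also have a where() method. It keeps the values where the condition is True and replaces the rest, so to replace the matching rows we pass the negated condition. Let’s look at the syntax and implementation here.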

df['column_name'] = df['column_name'].where(~(condition), other=new_value)

Let’s implement it on the same example discussed above.

# replace Tech with "Tech & Data" using where()
df['Team'] = df['Team'].where(~(df['Team'] == 'Tech'), other='Tech & Data')

print(df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

Assigning the result back to df['Team'] stores the change in the DataFrame. Calling where() with inplace=True on a selected column, as in df['Team'].where(..., inplace=True), is a chained assignment and may not update the original DataFrame in recent pandas versions.
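Because where() keeps the values for which the condition is True, it is easy to get the direction of the condition wrong. The following minimal sketch uses a standalone hypothetical Series (experience and capped are illustrative names) to cap values above 10 while leaving the others untouched.

```python
import pandas as pd

# Hypothetical Series with the Experience values from the sample data
experience = pd.Series(
    [5, 7, 2, 2, 11, 4],
    index=list("ABCDEF"),
    name="Experience",
)

# where() keeps values where the condition is True
# and replaces the rest with `other`
capped = experience.where(experience <= 10, other=10)

print(capped)
```

Without inplace=True, where() returns a new Series, so the original experience Series is left unchanged.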

Method 4: Using mask() function from pandas

The final method is to use the mask() method from pandas, which replaces the values where a condition is True, making it the inverse of where(). Let’s implement it using the above example.

# replace Tech with "Tech & Data" using mask()
# assign the result back instead of using inplace=True on a selected column,
# which may not update the original DataFrame in recent pandas versions
df['Team'] = df['Team'].mask(lambda col: col == 'Tech', 'Tech & Data')

print(df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

As observed, the output is the same as with the previous methods.
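To see how mask() relates to where(), the sketch below applies both to a standalone hypothetical Series (team, via_mask and via_where are illustrative names) and checks that the results match. It assumes nothing beyond the documented behaviour that mask() replaces values where the condition is True and where() replaces them where it is False.

```python
import pandas as pd

# Hypothetical Series with the Team values from the sample data
team = pd.Series(
    ["Tech", "Design", "PMO", "Design", "PMO", "Tech"],
    index=list("ABCDEF"),
    name="Team",
)

# mask() replaces values where the condition is True;
# the condition can be a callable that receives the Series
via_mask = team.mask(lambda col: col == "Tech", "Tech & Data")

# where() replaces values where the condition is False,
# so the condition is inverted
via_where = team.where(team != "Tech", "Tech & Data")

print(via_mask.equals(via_where))
```

Choosing between the two is mostly a matter of readability: mask() lets the condition describe the rows to replace, while where() describes the rows to keep.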

Summary

In this article, we discussed how to replace column values based on conditions in a pandas DataFrame using loc, numpy.where(), where(), and mask(). Thanks.
