Replace column values based on conditions in Pandas
source link: https://thispointer.com/replace-column-values-based-on-conditions-in-pandas/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
In this article, we will discuss various methods to replace the column values based on conditions in a pandas DataFrame. Let’s look at the table of contents describing the list of methods.
Table of Contents
Preparing DataSet
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'India', 'Tech', 5, 4), ('Riti', 'India', 'Design' , 7, 7), ('Shanky', 'India', 'PMO' , 2, 2), ('Shreya', 'India', 'Design' , 2, 0), ('Aadi', 'US', 'PMO', 11, 5), ('Sim', 'US', 'Tech', 4, 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience', 'RelevantExperience'], index = ['A', 'B', 'C', 'D', 'E', 'F']) print(df)
Contents of the created dataframe are,
Name Location Team Experience RelevantExperience A Shubham India Tech 5 4 B Riti India Design 7 7 C Shanky India PMO 2 2 D Shreya India Design 2 0 E Aadi US PMO 11 5 F Sim US Tech 4 4
Method 1: Using .loc property of DataFrame
The loc
property comes in handy whenever we want to filter the DataFrame based on certain conditions. Here, we will use the loc
to filter the DataFrame and assign a new value to the filtered rows. Let’s understand with an example, say, we need to replace the “Tech” value in the “Team” column with “Tech & Data”.
# replace Tech with "Tech & Data" using loc df.loc[df['Team'] == 'Tech', 'Team'] = 'Tech & Data' print (df)
Output
Name Location Team Experience RelevantExperience A Shubham India Tech & Data 5 4 B Riti India Design 7 7 C Shanky India PMO 2 2 D Shreya India Design 2 0 E Aadi US PMO 11 5 F Sim US Tech & Data 4 4
As observed, we first filtered all the rows satisfying the condition and then replaced the value by assigning the new value to the filtered rows.
Method 2: Using numpy.where() method
Another method is to use the numpy.where() function to replace values based on the condition. Let’s look at the function syntax and implement it in the above example.
np.where(condition, value if condition is TRUE, value if condition is False)
# replace Tech with "Tech & Data" using np.where df['Team'] = np.where(df['Team'] == 'Tech', 'Tech & Data', df['Team']) print (df)
Output
Name Location Team Experience RelevantExperience A Shubham India Tech & Data 5 4 B Riti India Design 7 7 C Shanky India PMO 2 2 D Shreya India Design 2 0 E Aadi US PMO 11 5 F Sim US Tech & Data 4 4
Method 3: Using DataFrame.where() method
There is a where() method in pandas DataFrame as well. Let’s look at the syntax and implementation here.
pd.DataFrame['column_name'].where(~(condition), other=value if condition is True, inplace=True)
Let’s implement it on the same example discussed above.
# replace Tech with "Tech & Data" using DataFrame.where df['Team'].where(~(df.Team == 'Tech'), other='Tech & Data', inplace=True) print(df)
Output
Name Location Team Experience RelevantExperience A Shubham India Tech & Data 5 4 B Riti India Design 7 7 C Shanky India PMO 2 2 D Shreya India Design 2 0 E Aadi US PMO 11 5 F Sim US Tech & Data 4 4
The inplace=True helps us store the changes directly in the same DataFrame.
Method 4: Using mask() function from pandas
The final method is to use the masking function from pandas which are generally used for replacing the values of any row/column based on certain conditions. Let’s implement using the above example.
# replace Tech with "Tech & Data" using masking df['Team'].mask(lambda col: col == 'Tech', 'Tech & Data', inplace=True) print (df)
Output
Name Location Team Experience RelevantExperience A Shubham India Tech & Data 5 4 B Riti India Design 7 7 C Shanky India PMO 2 2 D Shreya India Design 2 0 E Aadi US PMO 11 5 F Sim US Tech & Data 4 4
As observed, we have got a similar output as the above methods.
Summary
In this article, we have discussed how to replace column values based on conditions in pandas DataFrame. Thanks.
Pandas Tutorials -Learn Data Analysis with Python
Are you looking to make a career in Data Science with Python?
Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.
Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.
Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.
Join a LinkedIn Community of Python Developers
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK