
Remap values in Pandas Column with Dictionary

 1 year ago
source link: https://thispointer.com/remap-values-in-pandas-column-with-dictionary/


In Pandas, a DataFrame is a two-dimensional labeled data structure. While working with a pandas DataFrame, we often need to remap the values of a specific column using a dictionary while preserving NaNs. In this article, we will learn how to do that.


To do this, we look up each value of a DataFrame column as a key in the dictionary and store the corresponding dictionary value, either in a new column or in place of the original value.

There are different methods to remap values in a pandas DataFrame column with a dictionary and preserve NaNs. Let’s discuss each method one by one.

Remap values in a Column with Dictionary using Series.map()

We can create a new column by mapping the values of an existing DataFrame column to the keys of a dictionary using the Series.map() function. Selecting a single column, such as df['Course'], returns a Series, so map() is called on that Series. We pass a dictionary as the argument to map(). The keys of this dictionary are matched against the values of the existing column, and the corresponding dictionary values are used to create the new column.


Example of remapping column values with a dict using Series.map()

A script to create a new column Course_code by mapping the values of the Course column to course codes using Series.map() and a dictionary.

import pandas as pd
import numpy as np

student = {'Rollno':[1,2,3,4,5],
            'Name' :["Reema","Rekha","Jaya","Susma","Meena"],
            'Duration':['120days','150days','130days', None,np.nan],
            'Course':["BCA","BSc","MCA","MSc","BBA"] }

df = pd.DataFrame(student)
print(df)

# Define Dict with the key-value pair to remap.
dict_course_code = {"BCA" : 'BC',
                    "BSc" : 'BS',
                    "MCA": 'MC',
                    "MSc" : 'MS',
                    "BBA": 'BB'}

# Create a new column by mapping values of an existing column
df['Course_code'] = df['Course'].map(dict_course_code)

print(df)

Output

   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA

   Rollno   Name Duration Course Course_code
0       1  Reema  120days    BCA          BC
1       2  Rekha  150days    BSc          BS
2       3   Jaya  130days    MCA          MC
3       4  Susma     None    MSc          MS
4       5  Meena      NaN    BBA          BB

In the above script, Series.map() looks up each value of the Course column in the dictionary, and the results are stored in a new column, Course_code, which contains the remapped code of each course.
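When a column value has no matching key in the dictionary, map() returns NaN for that row. The following is a minimal sketch of that behavior; the names courses and partial_codes are illustrative and do not come from the script above.

```python
import pandas as pd

# Illustrative data: "MSc" has no entry in the dictionary below.
courses = pd.Series(["BCA", "BSc", "MSc"])
partial_codes = {"BCA": "BC", "BSc": "BS"}

# Values found in the dictionary are remapped; the rest become NaN.
mapped = courses.map(partial_codes)
print(mapped)

# isna() marks the rows that had no mapping.
print(mapped.isna().tolist())
```

This is why a dictionary that covers only some of the values needs an extra step if the unmapped values should be kept.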

Example of remapping column values while preserving unmapped values

A script that keeps the original value of a record when its value has no mapping in the dictionary.

import pandas as pd
import numpy as np
student= {  'Rollno':[1,2,3,4,5],
            'Name' :["Reema","Rekha","Jaya","Susma","Meena"],
            'Duration':['120days','150days','130days', None, np.nan],
            'Course':["BCA","BSc","MCA","MSc","BBA"] }

df = pd.DataFrame(student)
print(df)

# Define Dict with the key-value pair to remap.
dict_course_code = {"BCA" : 'BC',
                    "BSc" : 'BS',
                    "MCA": 'MC'}

# Create a new column by mapping values of an existing column
# Fill values that have no dictionary entry with the original course name
df['Course_code'] = df['Course'].map(dict_course_code).fillna(df['Course'])

print(df)

Output

   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA

   Rollno   Name Duration Course Course_code
0       1  Reema  120days    BCA          BC
1       2  Rekha  150days    BSc          BS
2       3   Jaya  130days    MCA          MC
3       4  Susma     None    MSc         MSc
4       5  Meena      NaN    BBA         BBA

In the above script, we created a DataFrame with four columns and then a dictionary that maps values of the Course column to course codes. The dictionary has no entries for the courses MSc and BBA, so map() returns NaN for those rows. fillna(df['Course']) then fills those NaN values with the original course names, which is why MSc and BBA appear unchanged in Course_code.
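The map() plus fillna() pattern can be wrapped in a small function. This is only a sketch: remap_keep_unmapped is a hypothetical helper name, not a pandas function, and the sample data is made up for illustration.

```python
import pandas as pd

def remap_keep_unmapped(column, mapping):
    """Hypothetical helper: remap values, keeping those without a dictionary entry."""
    # map() gives NaN for unmapped values; fillna() restores the originals.
    return column.map(mapping).fillna(column)

# Illustrative data: one mapped value, one unmapped value, one missing value.
courses = pd.Series(["BCA", "MSc", None])
result = remap_keep_unmapped(courses, {"BCA": "BC"})
print(result)
```

The missing entry stays missing, because fillna() fills it from the original column, where it is also missing. Mapped values are replaced and unmapped values are kept as they were.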

Remap values in a Column with Dictionary using DataFrame.replace()

The DataFrame.replace() method accepts several kinds of arguments. We can use the form that takes a dictionary (dict) to remap the column values. To restrict the replacement to one column, we pass a nested dictionary of the form {column_name: {old_value: new_value}}, where each inner key is an existing value in the column and the inner value is its replacement.
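Unlike map(), replace() leaves values that have no dictionary entry unchanged. The sketch below illustrates this with made-up data, and also shows Series.replace() on a single column as an alternative to the nested dictionary; the names codes, nested and single are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Course": ["BCA", "BSc", "MSc"]})
codes = {"BCA": "BC", "BSc": "BS"}  # "MSc" is intentionally left out

# Nested dictionary: outer key is the column name, inner dict is the mapping.
nested = df.replace({"Course": codes})

# Same mapping applied to the selected column with Series.replace().
single = df["Course"].replace(codes)

print(nested["Course"].tolist())
print(single.tolist())
```

In both cases "MSc" is kept as is, so no fillna() step is needed to preserve unmapped values.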

Example of Remap Column Values with a Dict Using Pandas DataFrame.replace()

A script to remap course name with the code using DataFrame.replace().

import pandas as pd
import numpy as np
student= {  'Rollno':[1,2,3,4,5],
            'Name' :["Reema","Rekha","Jaya","Susma","Meena"],
            'Duration':['120days','150days','130days', None, np.nan],
            'Course':["BCA","BSc","MCA","MSc","BBA"] }

df = pd.DataFrame(student)
print(df)

# Define Dict with the key-value pair to remap.
dictObj = { "BCA" : 'BC',
            "BSc" : 'BS',
            "MCA": 'MC',
            "MSc" : 'MS',
            "BBA": 'BB'}

df = df.replace({"Course": dictObj})

print(df)

Output

   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA

   Rollno   Name Duration Course
0       1  Reema  120days     BC
1       2  Rekha  150days     BS
2       3   Jaya  130days     MC
3       4  Susma     None     MS
4       5  Meena      NaN     BB

In the above script, we first created a DataFrame with four columns: Rollno, Name, Duration and Course. Then we defined a dictionary of key-value pairs and used DataFrame.replace() to remap the course names to their codes.

Example of remapping None or NaN column values

A script to remap the None and NaN values of the Duration column to 150, along with the other durations, using the DataFrame.replace() function.

import pandas as pd
import numpy as np

students = {'Rollno':[1,2,3,4,5],
            'Name' :["Reema","Rekha","Jaya","Susma","Meena"],
            'Duration':['120days','150days','130days', None, np.nan],
            'Course':["BCA","BSc","MCA","MSc","BBA"] }

df = pd.DataFrame(students)

print(df)

# Define Dict with the key-value pairs to remap
dict_duration = {"120days" : '120',
                 "150days" : '150',
                 "130days": '130',
                 np.nan:'150'}

# Remap all values in 'Duration' column with a dictionary
df.replace( {"Duration": dict_duration}, inplace=True)

print(df)

Output

   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA

   Rollno   Name Duration Course
0       1  Reema      120    BCA
1       2  Rekha      150    BSc
2       3   Jaya      130    MCA
3       4  Susma      150    MSc
4       5  Meena      150    BBA

In the above script, we first created a DataFrame with four columns: Rollno, Name, Duration and Course. Then we created a dictionary of key-value pairs that maps the values of the Duration column. The dictionary also maps np.nan to '150', and as the output shows, both the None and the NaN entries were replaced with 150. Finally, we used DataFrame.replace() to remap the values of ‘Duration’ with the dictionary.
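If the missing values should be handled separately from the remapping, one alternative is to fill them first and then map. This is a sketch of a different technique from the script above, with illustrative names (duration, filled, codes) and a shortened column.

```python
import pandas as pd
import numpy as np

# Illustrative column with one regular value, one None and one NaN.
duration = pd.Series(["120days", None, np.nan])

# fillna() treats both None and NaN as missing and fills them.
filled = duration.fillna("150days")

# The dictionary now needs entries for regular values only.
codes = filled.map({"120days": "120", "150days": "150"})
print(codes.tolist())
```

Keeping the default for missing values out of the dictionary makes it explicit which rows were originally missing and which were remapped.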

Remap Multiple Column Values in a single DataFrame.replace() call

A script to remap two columns, Course and Duration, each with its own dictionary.

import pandas as pd
import numpy as np

student= {  'Rollno':[1,2,3,4,5],
            'Name' :["Reema","Rekha","Jaya","Susma","Meena"],
            'Duration':['120days','150days','130days', None,np.nan],
            'Course':["BCA","BSc","MCA","MSc","BBA"] }

df = pd.DataFrame(student)

print(df)

# Define Dictionaries with the key-value pair to remap.
dict_obj = {"BCA" : 'BC',
        "BSc" : 'BS',
        "MCA": 'MC',
        "MSc" : 'MS',
        "BBA": 'BB'}

dict_duration = {"120days" : '120',
                 "150days" : '150',
                 "130days" : '130',
                 np.nan    :'150'}

# Map column Course with first dictionary
# Map column Duration with second dictionary
df.replace({"Course": dict_obj,
            "Duration": dict_duration},
            inplace=True)

print(df)

Output

   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA

   Rollno   Name Duration Course
0       1  Reema      120     BC
1       2  Rekha      150     BS
2       3   Jaya      130     MC
3       4  Susma      150     MS
4       5  Meena      150     BB 

Summary

In this article, we learned how to remap values in a pandas DataFrame column with a dictionary and preserve NaNs, using Series.map() combined with fillna() and using DataFrame.replace(). Happy Learning.
