1

Pandas: Apply Function to Column

 1 year ago
source link: https://thispointer.com/pandas-apply-a-function-to-single-or-selected-columns-or-rows-in-dataframe/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Pandas: Apply Function to Column

In this article, we will discuss different ways to apply a given function to selected columns or rows of a Pandas DataFrame.

Table Of Contents

Suppose we have a dataframe object i.e.

import pandas as pd
# List of Tuples
matrix = [(22, 34, 23),
(33, 31, 11),
(44, 16, 21),
(55, 32, 22),
(66, 33, 27),
(77, 35, 11)]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print(dfObj)
import pandas as pd

# List of Tuples
matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print(dfObj)

Contents of the dataframe object dfObj are,

a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Now if we want to call or apply a function on some of the elements of DataFrame. Like on a single or multiple columns or rows of DataFrame? For example,

Advertisements

  • Apply a function on a column, that should multiply all the values in column ‘x’ by 2
  • Apply a function on a row, that should multiply all the values in row ‘c’ by 10
  • Apply a function on a two columns, that should add 10 in all the values in column ‘y’ & ‘z’

Let’s see how to do that using different techniques,

Apply a function to a single column in Dataframe

Suppose we want to square all the values in column ‘z’ for above created DataFrame object dfObj. We can do that using different methods i.e.

Method 1 : Using Dataframe.apply()

Apply a lambda function to all the columns in dataframe using Dataframe.apply() and inside this lambda function check if column name is ‘z’ then square all the values in it i.e.

import pandas as pd
import numpy as np
# List of Tuples
matrix = [(22, 34, 23),
(33, 31, 11),
(44, 16, 21),
(55, 32, 22),
(66, 33, 27),
(77, 35, 11)]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print(dfObj)
# Apply function numpy.square() to square the value one column only i.e. with column name 'z'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
import pandas as pd
import numpy as np

# List of Tuples
matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print(dfObj)

# Apply function numpy.square() to square the value one column only i.e. with column name 'z'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)

print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')

Output

Output:

a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
Modified Dataframe : Squared the values in column 'z'
a 22 34 529
b 33 31 121
c 44 16 441
d 55 32 484
e 66 33 729
f 77 35 121
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Modified Dataframe : Squared the values in column 'z'
    x   y    z
a  22  34  529
b  33  31  121
c  44  16  441
d  55  32  484
e  66  33  729
f  77  35  121

There are 2 other ways to achieve the same effect i.e.

Method 2 : Using [] Operator

Select the column from dataframe as series using [] operator and apply numpy.square() method on it. Then assign it back to column i.e.

# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = dfObj['z'].apply(np.square)
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = dfObj['z'].apply(np.square)

It will basically square all the values in column ‘z’

Method 3 : Using numpy.square()

# Method 3:
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = np.square(dfObj['z'])
# Method 3:
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = np.square(dfObj['z'])

It will also square all the values in column ‘z’

Apply a function to a single row in Dataframe

Suppose we want to square all the values in row ‘b’ for above created dataframe object dfObj. We can do that using different methods i.e.

Method 1 : Using Dataframe.apply()

Apply a lambda function to all the rows in dataframe using Dataframe.apply() and inside this lambda function check if row index label is ‘b’ then square all the values in it i.e.

import pandas as pd
import numpy as np
# List of Tuples
matrix = [(22, 34, 23),
(33, 31, 11),
(44, 16, 21),
(55, 32, 22),
(66, 33, 27),
(77, 35, 11)]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print(dfObj)
# Apply function numpy.square() to square the values of one row only i.e. row with index name 'b'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')
import pandas as pd
import numpy as np

# List of Tuples
matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print(dfObj)

# Apply function numpy.square() to square the values of one row only i.e. row with index name 'b'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)

print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')

Output:

a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
Modified Dataframe : Squared the values in row 'b'
a 22 34 23
b 1089 961 121
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Modified Dataframe : Squared the values in row 'b'
      x    y    z
a    22   34   23
b  1089  961  121
c    44   16   21
d    55   32   22
e    66   33   27
f    77   35   11

There are 2 other ways to achieve the same effect i.e.

Method 2 : Using [] Operator

Select the row from dataframe as series using dataframe.loc[] operator and apply numpy.square() method on it. Then assign it back to row i.e.

# Apply a function to one row and assign it back to the row in dataframe
dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)
# Apply a function to one row and assign it back to the row in dataframe
dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)

It will basically square all the values in row ‘b’

Method 3 : Using numpy.square()

# Apply a function to one row and assign it back to the column in dataframe
dfObj.loc['b'] = np.square(dfObj.loc['b'])
# Apply a function to one row and assign it back to the column in dataframe
dfObj.loc['b'] = np.square(dfObj.loc['b'])

It will also square all the values in row ‘b’.

Apply a function to a certain columns in Dataframe

We can apply a given function to only specified columns too. For example square the values in column ‘x’ & ‘y’ i.e.

import pandas as pd
import numpy as np
# List of Tuples
matrix = [(22, 34, 23),
(33, 31, 11),
(44, 16, 21),
(55, 32, 22),
(66, 33, 27),
(77, 35, 11)]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print(dfObj)
# Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
import pandas as pd
import numpy as np

# List of Tuples
matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print(dfObj)

# Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)

print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')

Output:

a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
Modified Dataframe : Squared the values in column x & y :
a 484 1156 23
b 1089 961 11
c 1936 256 21
d 3025 1024 22
e 4356 1089 27
f 5929 1225 11
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Modified Dataframe : Squared the values in column x & y :
      x     y   z
a   484  1156  23
b  1089   961  11
c  1936   256  21
d  3025  1024  22
e  4356  1089  27
f  5929  1225  11

Basically we just modified the if condition in lambda function and squared the values in columns with name x & y.

Apply a function to a certain rows in Dataframe

We can apply a given function to only specified rows too. For example square the values in column ‘b’ & ‘c’ i.e.

import pandas as pd
import numpy as np
# List of Tuples
matrix = [(22, 34, 23),
(33, 31, 11),
(44, 16, 21),
(55, 32, 22),
(66, 33, 27),
(77, 35, 11)]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print(dfObj)
# Apply function numpy.square() to square the values of 2 rows
# only i.e. with row index name 'b' and 'c' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)
print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')
import pandas as pd
import numpy as np

# List of Tuples
matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print(dfObj)

# Apply function numpy.square() to square the values of 2 rows
# only i.e. with row index name 'b' and 'c' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)

print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')

Output:

a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
Modified Dataframe : Squared the values in row b & c :
a 22 34 23
b 1089 961 121
c 1936 256 441
d 55 32 22
e 66 33 27
f 77 35 11
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Modified Dataframe : Squared the values in row b & c :
      x    y    z
a    22   34   23
b  1089  961  121
c  1936  256  441
d    55   32   22
e    66   33   27
f    77   35   11

Basically we just modified the if condition in lambda function and squared the values in rows with name b & c.

Complete example is as follows :

import pandas as pd
import numpy as np
# List of Tuples
matrix = [(22, 34, 23),
(33, 31, 11),
(44, 16, 21),
(55, 32, 22),
(66, 33, 27),
(77, 35, 11)
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print("Original Dataframe", dfObj, sep='\n')
print('********* Apply a function to a single row or column in DataFrame ********')
print('*** Apply a function to a single column *** ')
# Method 1:
# Apply function numpy.square() to square the value one column only i.e. with column name 'z'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
# Method 2:
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = dfObj['z'].apply(np.square)
# Method 3:
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = np.square(dfObj['z'])
print('*** Apply a function to a single row *** ')
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
# Method 1:
# Apply function numpy.square() to square the values of one row only i.e. row with index name 'b'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')
# Method 2:
# Apply a function to one row and assign it back to the row in dataframe
dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)
# Method 3:
# Apply a function to one row and assign it back to the column in dataframe
dfObj.loc['b'] = np.square(dfObj.loc['b'])
print('********* Apply a function to certains row or column in DataFrame ********')
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
print('Apply a function to certain columns only')
# Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
print('Apply a function to certain rows only')
# Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)
print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')
import pandas as pd
import numpy as np

# List of Tuples
matrix = [(22, 34, 23),
            (33, 31, 11),
            (44, 16, 21),
            (55, 32, 22),
            (66, 33, 27),
            (77, 35, 11)
            ]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print("Original Dataframe", dfObj, sep='\n')

print('********* Apply a function to a single row or column in DataFrame ********')

print('*** Apply a function to a single column *** ')

# Method 1:
# Apply function numpy.square() to square the value one column only i.e. with column name 'z'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)

print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')

# Method 2:
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = dfObj['z'].apply(np.square)

# Method 3:
# Apply a function to one column and assign it back to the column in dataframe
dfObj['z'] = np.square(dfObj['z'])


print('*** Apply a function to a single row *** ')

dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

# Method 1:
# Apply function numpy.square() to square the values of one row only i.e. row with index name 'b'
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)

print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')


# Method 2:
# Apply a function to one row and assign it back to the row in dataframe
dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)

# Method 3:
# Apply a function to one row and assign it back to the column in dataframe
dfObj.loc['b'] = np.square(dfObj.loc['b'])



print('********* Apply a function to certains row or column in DataFrame ********')

dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

print('Apply a function to certain columns only')

# Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)

print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')

print('Apply a function to certain rows only')

# Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)

print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')

Output:

Original Dataframe
a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
********* Apply a function to a single row or column in DataFrame ********
*** Apply a function to a single column ***
Modified Dataframe : Squared the values in column 'z'
a 22 34 529
b 33 31 121
c 44 16 441
d 55 32 484
e 66 33 729
f 77 35 121
*** Apply a function to a single row ***
Modified Dataframe : Squared the values in row 'b'
a 22 34 23
b 1089 961 121
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
********* Apply a function to certains row or column in DataFrame ********
Apply a function to certain columns only
Modified Dataframe : Squared the values in column x & y :
a 484 1156 23
b 1089 961 11
c 1936 256 21
d 3025 1024 22
e 4356 1089 27
f 5929 1225 11
Apply a function to certain rows only
Modified Dataframe : Squared the values in row b & c :
a 22 34 23
b 1089 961 121
c 1936 256 441
d 55 32 22
e 66 33 27
f 77 35 11
Original Dataframe
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11
********* Apply a function to a single row or column in DataFrame ********
*** Apply a function to a single column *** 
Modified Dataframe : Squared the values in column 'z'
    x   y    z
a  22  34  529
b  33  31  121
c  44  16  441
d  55  32  484
e  66  33  729
f  77  35  121
*** Apply a function to a single row *** 
Modified Dataframe : Squared the values in row 'b'
      x    y    z
a    22   34   23
b  1089  961  121
c    44   16   21
d    55   32   22
e    66   33   27
f    77   35   11
********* Apply a function to certains row or column in DataFrame ********
Apply a function to certain columns only
Modified Dataframe : Squared the values in column x & y :
      x     y   z
a   484  1156  23
b  1089   961  11
c  1936   256  21
d  3025  1024  22
e  4356  1089  27
f  5929  1225  11
Apply a function to certain rows only
Modified Dataframe : Squared the values in row b & c :
      x    y    z
a    22   34   23
b  1089  961  121
c  1936  256  441
d    55   32   22
e    66   33   27
f    77   35   11

Summary

We learned about different ways to apply a function to DataFrame columns or rows in Pandas.

Pandas Tutorials -Learn Data Analysis with Python

 

 

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK