Iterate over Rows of DataFrame in Pandas

 1 year ago
source link: https://thispointer.com/pandas-6-different-ways-to-iterate-over-rows-in-a-dataframe-update-while-iterating-row-by-row/

This article discusses six different techniques for iterating over a dataframe row by row. It then shows how to update the contents of a dataframe while iterating over it row by row.

Suppose we have a dataframe, i.e.

import pandas as pd

# List of Tuples
employees = [('jack', 34, 'Sydney',   5),
             ('Riti', 31, 'Delhi' ,   7),
             ('Aadi', 16, 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(  employees,
                    columns=['Name', 'Age', 'City', 'Experience'],
                    index=['a', 'b', 'c'])

print(df)

Contents of the created dataframe are,

   Name  Age      City  Experience
a  jack   34    Sydney           5
b  Riti   31     Delhi           7
c  Aadi   16  New York          11

Let’s see different ways to iterate over the rows of this dataframe,

Loop over Rows of Pandas Dataframe using iterrows()

Dataframe class provides a member function iterrows() i.e.

DataFrame.iterrows()

It returns an iterator that can be used to iterate over all the rows of a dataframe. For each row it yields a tuple containing the index label and the row contents as a Series.

Let’s iterate over all the rows of the dataframe created above using iterrows(), i.e.

# Loop through all rows of Dataframe along with index label
for (index_label, row_series) in df.iterrows():
    print('Row Index label : ', index_label)
    print('Row Content as Series : ', row_series.values)

Output:

Row Index label :  a
Row Content as Series :  ['jack' 34 'Sydney' 5]
Row Index label :  b
Row Content as Series :  ['Riti' 31 'Delhi' 7]
Row Index label :  c
Row Content as Series :  ['Aadi' 16 'New York' 11]

Important points about Dataframe.iterrows()

  • Does not preserve the data types:
    • iterrows() returns each row's contents as a Series, and a Series has a single dtype, so the dtypes of the values are not preserved across the row. For example, an integer value in a row that also holds floats comes back as a float.
  • Does not modify the original dataframe:
    • We cannot rely on modifying something while iterating over the rows using iterrows(). Depending on the data types, the iterator returns a copy rather than a view, so modifying the returned row contents has no effect on the actual dataframe.
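
Both points can be checked with a small sketch. The frame below is hypothetical (it is not the employee data used above) and mixes an integer column with a float column, which is one case where the change of type is visible:

```python
import pandas as pd

# Hypothetical frame: one integer column and one float column
df = pd.DataFrame({'count': [1, 2], 'price': [1.5, 2.5]}, index=['x', 'y'])

seen_counts = []
for index_label, row_series in df.iterrows():
    # The whole row is packed into one Series, so the integer arrives as a float
    seen_counts.append(row_series['count'])
    print(index_label, row_series['count'], type(row_series['count']))
    # This assignment changes only the temporary row Series, not df
    row_series['price'] = 0.0

# The 'price' column still holds its original values
print(df)
```

If the original types matter, itertuples() (described next) is usually the safer choice, because it does not pack the row into a single Series.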

Loop over Rows of Pandas Dataframe using itertuples()

Dataframe class provides a member function itertuples() i.e.

DataFrame.itertuples()

For each row it yields a named tuple containing all the column names and their values for that row. Let’s use it to iterate over all the rows of the dataframe created above, i.e.

# Iterate over the Dataframe rows as named tuples
for namedTuple in df.itertuples():
    print(namedTuple)

Output:

Pandas(Index='a', Name='jack', Age=34, City='Sydney', Experience=5)
Pandas(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7)
Pandas(Index='c', Name='Aadi', Age=16, City='New York', Experience=11)

For every row in the dataframe a named tuple is returned. You can access the individual values of a named tuple by position. The statements below run after the loop has finished, so namedTuple still refers to the last row.
To access the 1st value, i.e. the value of the ‘Index’ field, use,

print(namedTuple[0])

Output:

c

To access the 2nd value, i.e. the value of the ‘Name’ field, use,

print(namedTuple[1])

Output:

Aadi
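
Named tuple fields can also be read by attribute name, which is often easier to follow than positional indexing. A minimal sketch, assuming the same employee data as above (the names_by_label dictionary is only there for illustration):

```python
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'New York', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c'])

names_by_label = {}
for row in df.itertuples():
    # row.Index holds the index label; the other fields use the column names
    names_by_label[row.Index] = row.Name
    print(row.Index, row.Name, row.City)
```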

Named Tuples without index 

If we don’t want the index to be included in these named tuples, we can pass the argument index=False, i.e.

# Iterate over the Dataframe rows as named tuples without index
for namedTuple in df.itertuples(index=False):
    print(namedTuple)

Output:

Pandas(Name='jack', Age=34, City='Sydney', Experience=5)
Pandas(Name='Riti', Age=31, City='Delhi', Experience=7)
Pandas(Name='Aadi', Age=16, City='New York', Experience=11)

Named Tuples with custom names

By default the returned named tuple is named Pandas. We can provide a custom name through the name argument, i.e.

# Give Custom Name to the tuple while Iterating over the Dataframe rows
for row in df.itertuples(name='Employee'):
    print(row)

Output:

Employee(Index='a', Name='jack', Age=34, City='Sydney', Experience=5)
Employee(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7)
Employee(Index='c', Name='Aadi', Age=16, City='New York', Experience=11)
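
One detail worth knowing when relying on field names: a column label that is not a valid Python identifier is renamed to a positional name in the named tuple. The sketch below uses a hypothetical frame (people) with a space in one column label to show this:

```python
import pandas as pd

# Hypothetical frame whose first column label contains a space
people = pd.DataFrame({'first name': ['jack', 'Riti'], 'Age': [34, 31]})

first_row = next(iter(people.itertuples(name='Person')))

# 'first name' cannot be a field name, so it appears as a positional name
print(first_row._fields)
# The value is still reachable by position
print(first_row[1])
```

In such cases positional access, or renaming the columns before iterating, avoids depending on the generated field names.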

Pandas – Iterate over Rows as dictionary

We can also iterate over the rows of a dataframe and convert each row to a dictionary, so that values can be accessed by column label, using the same itertuples(), i.e.

# itertuples() yields an iterator of named tuples
for row in df.itertuples(name='Employee'):
    # Convert named tuple to dictionary
    dictRow = row._asdict()
    # Print dictionary
    print(dictRow)
    # Access elements from dict i.e. row contents
    print(dictRow['Name'] , ' is from ' , dictRow['City'])

Output:

OrderedDict([('Index', 'a'), ('Name', 'jack'), ('Age', 34), ('City', 'Sydney'), ('Experience', 5)])
jack  is from  Sydney
OrderedDict([('Index', 'b'), ('Name', 'Riti'), ('Age', 31), ('City', 'Delhi'), ('Experience', 7)])
Riti  is from  Delhi
OrderedDict([('Index', 'c'), ('Name', 'Aadi'), ('Age', 16), ('City', 'New York'), ('Experience', 11)])
Aadi  is from  New York

Note: on Python 3.8 and later, _asdict() returns a regular dict, so the first row prints as {'Index': 'a', 'Name': 'jack', 'Age': 34, 'City': 'Sydney', 'Experience': 5} instead of an OrderedDict.
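
If the dictionaries should contain only the column values, index=False can be combined with _asdict(). A minimal sketch, assuming the same employee data as above; wrapping the result in dict() is an optional step that gives a plain dictionary on any Python version:

```python
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'New York', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c'])

rows_as_dicts = []
for row in df.itertuples(index=False, name='Employee'):
    # No 'Index' key is present because index=False was passed
    rows_as_dicts.append(dict(row._asdict()))

print(rows_as_dicts[0])
```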

Iterate over Rows of Pandas Dataframe using index position and iloc

We can get the number of rows in a dataframe from df.shape[0], then loop from position 0 to the last row and access each row by its position using iloc[], i.e.

# Loop through rows of dataframe by index i.e.
# from 0 to number of rows
for i in range(0, df.shape[0]):
    # get row contents as series using iloc[]
    # and index position of row
    rowSeries = df.iloc[i]
    # print row contents
    print(rowSeries.values)

Output:

['jack' 34 'Sydney' 5]
['Riti' 31 'Delhi' 7]
['Aadi' 16 'New York' 11]

Iterate over rows in Dataframe in reverse using index position and iloc

Get the number of rows in the dataframe, then loop from the last position down to position 0 and access each row by its position using iloc[], i.e.

# Loop through rows of dataframe by index in reverse
# i.e. from last row to row at 0th index.
for i in range(df.shape[0] - 1, -1, -1):
    # get row contents as series using iloc[] & index pos of row
    rowSeries = df.iloc[i]
    # print row contents
    print(rowSeries.values)

Output:

['Aadi' 16 'New York' 11]
['Riti' 31 'Delhi' 7]
['jack' 34 'Sydney' 5]
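
As an alternative to counting positions down by hand, the dataframe can be reversed with a slice and then iterated as usual. This is only a sketch of one possible variation, assuming the same employee data as above:

```python
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'New York', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c'])

reversed_names = []
# iloc[::-1] selects all rows in reverse order; df itself is left unchanged
for row in df.iloc[::-1].itertuples():
    reversed_names.append(row.Name)
    print(row.Index, row.Name)
```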

Iterate over rows in dataframe using index labels and loc[]

Dataframe.index returns a sequence of index labels, so we can iterate over those labels and access each row by its label using loc[], i.e.

# loop through all the names in index
# label sequence of dataframe
for index in df.index:
    # For each index label,
    # access the row contents as series
    rowSeries = df.loc[index]
    # print row contents
    print(rowSeries.values)

Output:

['jack' 34 'Sydney' 5]
['Riti' 31 'Delhi' 7]
['Aadi' 16 'New York' 11]
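
The difference between the two accessors is that iloc[] takes an integer position while loc[] takes an index label. The following sketch, assuming the same employee data as above, walks both at once to show that they reach the same rows (the pairs list is only there for illustration):

```python
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'New York', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c'])

pairs = []
same_row = []
for position, label in enumerate(df.index):
    by_position = df.iloc[position]   # row selected by integer position
    by_label = df.loc[label]          # row selected by index label
    pairs.append((position, label))
    same_row.append(by_position.equals(by_label))
    print(position, label, by_label['Name'])
```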

Pandas : Iterate over rows and update

What if we want to change values while iterating over the rows of a Pandas Dataframe?

Dataframe.iterrows() returns a copy of each row’s contents, so updating the returned row has no effect on the actual dataframe. To update the contents of the dataframe, we can iterate over its rows using iterrows() and write each new value back with the at[] accessor, using the row’s index label and the column name.

Let’s see an example,

Suppose we have a dataframe, i.e.

# List of Tuples
salaries = [(11, 5, 70000, 1000) ,
           (12, 7, 72200, 1100) ,
           (13, 11, 84999, 1000)
           ]

# Create a DataFrame object
df = pd.DataFrame(  salaries,
                    columns=['ID', 'Experience' , 'Salary', 'Bonus'])

print(df)

Contents of the created dataframe df are,

   ID  Experience  Salary  Bonus
0  11           5   70000   1000
1  12           7   72200   1100
2  13          11   84999   1000

Let’s update each value in the column ‘Bonus’ by multiplying it by 2 while iterating over the dataframe row by row, i.e.

# iterate over the dataframe row by row
for index_label, row_series in df.iterrows():
    # For each row update the 'Bonus' value to its double
    df.at[index_label , 'Bonus'] = row_series['Bonus'] * 2

print(df)

Output:

   ID  Experience  Salary  Bonus
0  11           5   70000   2000
1  12           7   72200   2200
2  13          11   84999   2000

The dataframe was updated, i.e. we changed the values while iterating over its rows. The Bonus value in each row was doubled.
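
For a simple update like this one, the same result can usually be obtained without a row loop by operating on the whole column. This is offered as an alternative sketch, not as part of the row-by-row technique, and assumes the same salary data as above:

```python
import pandas as pd

salaries = [(11, 5, 70000, 1000),
            (12, 7, 72200, 1100),
            (13, 11, 84999, 1000)]
df = pd.DataFrame(salaries,
                  columns=['ID', 'Experience', 'Salary', 'Bonus'])

# Multiply the whole 'Bonus' column by 2 in one step
df['Bonus'] = df['Bonus'] * 2

print(df)
```

Row-by-row iteration remains useful when each update depends on logic that is awkward to express column-wise.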

The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('jack', 34, 'Sydney',   5),
             ('Riti', 31, 'Delhi' ,   7),
             ('Aadi', 16, 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(  employees,
                    columns=['Name', 'Age', 'City', 'Experience'],
                    index=['a', 'b', 'c'])

print(df)

print('**** Example 1 *********')

# Loop through all rows of Dataframe along with index label
for (index_label, row_series) in df.iterrows():
    print('Row Index label : ', index_label)
    print('Row Content as Series : ', row_series.values)


print('**** Example 2 *********')

# Iterate over the Dataframe rows as named tuples
for namedTuple in df.itertuples():
    print(namedTuple)

print(namedTuple[0] )

print(namedTuple[1] )

print('**** Example 3 *********')

# Iterate over the Dataframe rows as named tuples without index
for namedTuple in df.itertuples(index=False):
    print(namedTuple)

print('**** Example 4 *********')

# Give Custom Name to the tuple while Iterating over the Dataframe rows
for row in df.itertuples(name='Employee'):
    print(row)


print('**** Example 5 *********')

# itertuples() yields an iterator of named tuples
for row in df.itertuples(name='Employee'):
    # Convert named tuple to dictionary
    dictRow = row._asdict()
    # Print dictionary
    print(dictRow)
    # Access elements from dict i.e. row contents
    print(dictRow['Name'] , ' is from ' , dictRow['City'])



print('**** Example 6 *********')

# Loop through rows of dataframe by index i.e.
# from 0 to number of rows
for i in range(0, df.shape[0]):
    # get row contents as series using iloc[]
    # and index position of row
    rowSeries = df.iloc[i]
    # print row contents
    print(rowSeries.values)

print('**** Example 7 *********')


# Loop through rows of dataframe by index in reverse
# i.e. from last row to row at 0th index.
for i in range(df.shape[0] - 1, -1, -1):
    # get row contents as series using iloc[] & index pos of row
    rowSeries = df.iloc[i]
    # print row contents
    print(rowSeries.values)

print('**** Example 8 *********')

# loop through all the names in index
# label sequence of dataframe
for index in df.index:
    # For each index label,
    # access the row contents as series
    rowSeries = df.loc[index]
    # print row contents
    print(rowSeries.values)

print('**** Example 9 *********')

# List of Tuples
salaries = [(11, 5, 70000, 1000) ,
           (12, 7, 72200, 1100) ,
           (13, 11, 84999, 1000)
           ]

# Create a DataFrame object
df = pd.DataFrame(  salaries,
                    columns=['ID', 'Experience' , 'Salary', 'Bonus'])

print(df)


# iterate over the dataframe row by row
for index_label, row_series in df.iterrows():
    # For each row update the 'Bonus' value to its double
    df.at[index_label , 'Bonus'] = row_series['Bonus'] * 2

print(df)

Output:

   Name  Age      City  Experience
a  jack   34    Sydney           5
b  Riti   31     Delhi           7
c  Aadi   16  New York          11
**** Example 1 *********
Row Index label :  a
Row Content as Series :  ['jack' 34 'Sydney' 5]
Row Index label :  b
Row Content as Series :  ['Riti' 31 'Delhi' 7]
Row Index label :  c
Row Content as Series :  ['Aadi' 16 'New York' 11]
**** Example 2 *********
Pandas(Index='a', Name='jack', Age=34, City='Sydney', Experience=5)
Pandas(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7)
Pandas(Index='c', Name='Aadi', Age=16, City='New York', Experience=11)
c
Aadi
**** Example 3 *********
Pandas(Name='jack', Age=34, City='Sydney', Experience=5)
Pandas(Name='Riti', Age=31, City='Delhi', Experience=7)
Pandas(Name='Aadi', Age=16, City='New York', Experience=11)
**** Example 4 *********
Employee(Index='a', Name='jack', Age=34, City='Sydney', Experience=5)
Employee(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7)
Employee(Index='c', Name='Aadi', Age=16, City='New York', Experience=11)
**** Example 5 *********
OrderedDict([('Index', 'a'), ('Name', 'jack'), ('Age', 34), ('City', 'Sydney'), ('Experience', 5)])
jack  is from  Sydney
OrderedDict([('Index', 'b'), ('Name', 'Riti'), ('Age', 31), ('City', 'Delhi'), ('Experience', 7)])
Riti  is from  Delhi
OrderedDict([('Index', 'c'), ('Name', 'Aadi'), ('Age', 16), ('City', 'New York'), ('Experience', 11)])
Aadi  is from  New York
**** Example 6 *********
['jack' 34 'Sydney' 5]
['Riti' 31 'Delhi' 7]
['Aadi' 16 'New York' 11]
**** Example 7 *********
['Aadi' 16 'New York' 11]
['Riti' 31 'Delhi' 7]
['jack' 34 'Sydney' 5]
**** Example 8 *********
['jack' 34 'Sydney' 5]
['Riti' 31 'Delhi' 7]
['Aadi' 16 'New York' 11]
**** Example 9 *********
   ID  Experience  Salary  Bonus
0  11           5   70000   1000
1  12           7   72200   1100
2  13          11   84999   1000
   ID  Experience  Salary  Bonus
0  11           5   70000   2000
1  12           7   72200   2200
2  13          11   84999   2000

Summary

We learned different ways to iterate over all the rows of a dataframe and how to change values while iterating.
