Pandas Tutorial Part #9 – Filter DataFrame Rows

 3 years ago
source link: https://thispointer.com/pandas-tutorial-part-9-filter-dataframe-rows/

This tutorial will explain how to select rows from a DataFrame based on conditions.

Select DataFrame rows based on conditions

We can select only those rows from a DataFrame that satisfy a condition. For example, suppose we have a DataFrame like this,

    Name Product  Sale
0   Mark  Apples    44
1   Aadi  Mangos    31
2  Shaun  Grapes    30
3   Simi  Apples    32
4   Luka  Mangos    43
5   Mike  Apples    45
6   Arun  Mangos    35
7   Riti  Grapes    37

Now we want to select only those rows in this DataFrame, where column ‘Product’ has value ‘Apples’, like this,

   Name Product  Sale
0  Mark  Apples    44
3  Simi  Apples    32
5  Mike  Apples    45

Let’s see how to do that. First of all, we will create a DataFrame,

import pandas as pd

# List of Tuples
students = [('Mark',  'Apples', 44),
            ('Aadi',  'Mangos', 31),
            ('Shaun', 'Grapes', 30),
            ('Simi',  'Apples', 32),
            ('Luka',  'Mangos', 43),
            ('Mike',  'Apples', 45),
            ('Arun',  'Mangos', 35),
            ('Riti',  'Grapes', 37),]

# Create a DataFrame object
df = pd.DataFrame(  students,
                    columns = ['Name' , 'Product', 'Sale']) 

# Display the DataFrame
print(df)

Output

    Name Product  Sale
0   Mark  Apples    44
1   Aadi  Mangos    31
2  Shaun  Grapes    30
3   Simi  Apples    32
4   Luka  Mangos    43
5   Mike  Apples    45
6   Arun  Mangos    35
7   Riti  Grapes    37

Now select the column ‘Product’ from this DataFrame and compare it with the value ‘Apples’,

boolSeries = df['Product'] == 'Apples'

# Boolean Series
print(boolSeries)

Output

0     True
1    False
2    False
3     True
4    False
5     True
6    False
7    False
Name: Product, dtype: bool

The comparison returns a boolean Series in which each True value marks a row whose ‘Product’ column holds the value ‘Apples’. In other words, the Series is True exactly for the rows where our condition holds. If we pass this boolean Series to the subscript operator [] of the DataFrame, it selects only those rows for which the value in the boolean Series is True. For example,

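# Select only those rows where
# column 'Product' has value 'Apples'.
# The result goes into a new variable so that
# the original DataFrame df stays intact
apples_df = df[df['Product'] == 'Apples']

# Display the selected rows
print(apples_df)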

Output

   Name Product  Sale
0  Mark  Apples    44
3  Simi  Apples    32
5  Mike  Apples    45

It selected only those rows from the DataFrame where the condition is satisfied i.e. only those rows where column ‘Product’ contains the value ‘Apples’.
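
As a small aside, the same boolean Series can also be passed to df.loc[], and summing it counts the matching rows. The sketch below rebuilds the sample data so that it runs on its own; the names sales, mask and apples are illustrative and do not appear in the original tutorial.

```python
import pandas as pd

# Rebuild the sample DataFrame used in this tutorial
sales = pd.DataFrame(
    [('Mark', 'Apples', 44), ('Aadi', 'Mangos', 31),
     ('Shaun', 'Grapes', 30), ('Simi', 'Apples', 32),
     ('Luka', 'Mangos', 43), ('Mike', 'Apples', 45),
     ('Arun', 'Mangos', 35), ('Riti', 'Grapes', 37)],
    columns=['Name', 'Product', 'Sale'])

# Boolean Series: True where 'Product' equals 'Apples'
mask = sales['Product'] == 'Apples'

# True counts as 1, so the sum is the number of matching rows
print(mask.sum())

# .loc[] accepts the same boolean Series and returns the same rows as sales[mask]
apples = sales.loc[mask]
print(apples)

# The selected rows keep their original labels (0, 3, 5);
# reset_index(drop=True) renumbers them from 0 if that is preferred
print(apples.reset_index(drop=True))
```

Note that filtering keeps the original index labels, which is why the output above shows 0, 3 and 5 rather than 0, 1 and 2.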

Select DataFrame rows based on multiple conditions

Just like in the above solution, we can also apply multiple conditions to filter the contents of the DataFrame. For example, let’s see how to select only those rows from the DataFrame where the sale value is greater than 30 but less than 40,

# Select only those rows where the sale
# value is greater than 30 and less than 40
sales_df = df[(df['Sale'] > 30) & (df['Sale'] < 40)]

# Display the selected rows
print(sales_df)

Output

   Name Product  Sale
1  Aadi  Mangos    31
3  Simi  Apples    32
6  Arun  Mangos    35
7  Riti  Grapes    37

It returned only those rows from the DataFrame where the sale value is greater than 30 and less than 40. The row with sale value 30 is not included, because both comparisons are strict.

How did it work?

  • df[‘Sale’] > 30 gave a boolean Series that contains True only where the value is greater than 30.
  • df[‘Sale’] < 40 gave a boolean Series that contains True only where the value is less than 40.

Then we applied the & operator to these two boolean Series. It performs an element-wise AND, so the result is True only at those indices where both conditions are True. Each condition must be wrapped in parentheses, because & has higher precedence than the comparison operators. Finally, we passed the resulting boolean Series to the [] operator of the DataFrame, which returned only those rows for which the value in the final boolean Series was True.
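
The steps above can be sketched one at a time, as follows. This is a minimal illustration that rebuilds the sample data so it runs on its own; the variable names (sales, above_30, below_40 and so on) are assumptions made for this example. The | and ~ operators and DataFrame.query() are not covered in the tutorial itself and are shown only as related options.

```python
import pandas as pd

# Rebuild the sample DataFrame used in this tutorial
sales = pd.DataFrame(
    [('Mark', 'Apples', 44), ('Aadi', 'Mangos', 31),
     ('Shaun', 'Grapes', 30), ('Simi', 'Apples', 32),
     ('Luka', 'Mangos', 43), ('Mike', 'Apples', 45),
     ('Arun', 'Mangos', 35), ('Riti', 'Grapes', 37)],
    columns=['Name', 'Product', 'Sale'])

# Step 1: build one boolean Series per condition
above_30 = sales['Sale'] > 30
below_40 = sales['Sale'] < 40

# Step 2: element-wise AND, True only where both conditions hold
both = above_30 & below_40

# Step 3: pass the combined boolean Series to the [] operator
in_range = sales[both]
print(in_range)

# Related operators: ~ negates a boolean Series, | is element-wise OR.
# Inline conditions need parentheses because of operator precedence.
outside = sales[~both]
either = sales[(sales['Sale'] <= 30) | (sales['Sale'] >= 40)]
print(outside.equals(either))

# The same filter written as a query string
via_query = sales.query('Sale > 30 and Sale < 40')
print(via_query.equals(in_range))
```

Naming each intermediate boolean Series makes the filter easier to debug, since every condition can be printed and checked separately before the conditions are combined.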

Summary

We learned how to select rows from a DataFrame based on a single condition and on multiple conditions.
