Pandas Dataframe.loc[] – thisPointer

In this article, we will discuss how to use the loc property of the Dataframe with examples.

In Pandas, the Dataframe provides a property loc[] to select a subset of the Dataframe based on row and column names/labels. We can choose single or multiple rows and columns using it. Let’s learn more about it.

Syntax:

Dataframe.loc[row_segment , column_segment]
Dataframe.loc[row_segment]

The column_segment argument is optional. Therefore, if column_segment is not provided, loc[] will select the subset of the Dataframe based on the row_segment argument only.

Arguments:

row_segment:
- It contains information about the rows to be selected. Its value can be,
  - A single label like ‘A’ or 7 etc.
    - In this case, it selects the single row with given label name.
    - For example, if ‘B’ only is given, then only the row with label ‘B’ is selected from Dataframe.
  - A list/array of label names like, [‘B’, ‘E’, ‘H’]
    - In this case, multiple rows will be selected based on row labels given in the list.
    - For example, if [‘B’, ‘E’, ‘H’] is given as argument in row segment, then the rows with label name ‘B’, ‘E’ and ‘H’ will be selected.
  - A slice object with labels like -> a:e .
    - This case will select multiple rows i.e. from the row with label a up to and including the row with label e. Unlike regular Python slices, both the start and the end label are included.
    - For example, if ‘B’:’E’ is provided in the row segment of loc[], it will select a range of rows from label ‘B’ to label ‘E’, with both included.
    - For selecting all rows, provide the value ( : )
  - A boolean sequence of same size as number of rows.
    - In this case, it will select only those rows for which the corresponding value in boolean array/list is True.
  - A callable function :
    - It can be a lambda function or general function, which accepts the calling dataframe as an argument and returns valid label names in any one of the formats mentioned above.

column_segment:
- It is optional.
- It contains information about the columns to be selected. Its value can be,
  - A single label like ‘A’ or 7 etc.
    - In this case, it selects the single column with given label name.
    - For example, if ‘Age’ only is given, then only the column with label ‘Age’ is selected from Dataframe.
  - A list/array of label names like, [‘Name’, ‘Age’, ‘City’]
    - In this case, multiple columns will be selected based on column labels given in the list.
    - For example, if [‘Name’, ‘Age’, ‘City’] is given as argument in column segment, then the columns with label names ‘Name’, ‘Age’, and ‘City’ will be selected.
  - A slice object with labels like -> a:e .
    - This case will select multiple columns i.e. from the column with label a up to and including the column with label e. Both the start and the end label are included.
    - For example, if ‘Name’:’City’ is provided in the column segment of loc[], it will select a range of columns from label ‘Name’ to label ‘City’, with both included.
    - For selecting all columns, provide the value ( : )
  - A boolean sequence of same size as number of columns.
    - In this case, it will select only those columns for which the corresponding value in boolean array/list is True.
  - A callable function :
    - It can be a lambda function or general function that accepts the calling dataframe as an argument and returns valid label names in any one of the formats mentioned above.
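
Label slices in loc[] behave differently from position slices in iloc[], which is easy to overlook. The following is a minimal sketch on a made-up frame (the name demo and its values are assumptions for illustration only) that contrasts the two:

```python
import pandas as pd

# Hypothetical frame for illustration; labels and values are made up.
demo = pd.DataFrame({'Name': ['p', 'q', 'r', 's'],
                     'Age': [10, 20, 30, 40],
                     'City': ['w', 'x', 'y', 'z']},
                    index=['a', 'b', 'c', 'd'])

# Label slice with loc[]: both end labels ('d' and 'Age') are included.
by_label = demo.loc['b':'d', 'Name':'Age']

# Position slice with iloc[]: the stop positions (3 and 2) are excluded.
by_position = demo.iloc[1:3, 0:2]

print(by_label.index.tolist())     # ['b', 'c', 'd']
print(by_position.index.tolist())  # ['b', 'c']
```

Both selections contain the columns ‘Name’ and ‘Age’, but only the loc[] version includes the row at the end of the slice.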

Returns :

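It returns the selected subset of the dataframe based on the provided row and column labels. Depending on the arguments, the result is a single value (one row label and one column label), a Series (a single row or a single column), or a Dataframe (multiple rows and columns).
Also, if column_segment is not provided, it returns the subset of the Dataframe containing only selected rows based on the row_segment argument.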
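
The type of the returned object depends on what is passed to loc[]. As a rough sketch, using a small made-up frame (demo and its values are assumptions, not part of the examples below):

```python
import pandas as pd

# Hypothetical frame for illustration only.
demo = pd.DataFrame({'Name': ['p', 'q', 'r'], 'Age': [10, 20, 30]},
                    index=['a', 'b', 'c'])

one_row = demo.loc['b']        # single row label -> Series
rows = demo.loc[['b']]         # list of labels -> DataFrame, even with one label
cell = demo.loc['b', 'Age']    # row label and column label -> single value

print(type(one_row).__name__)  # Series
print(type(rows).__name__)     # DataFrame
print(cell)                    # 20
```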

Error scenarios:

Dataframe.loc[row_segment, column_segment] will raise a KeyError if a label name provided is not present in the Dataframe.
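
One possible way to deal with a missing label is to catch the KeyError. The helper safe_row below is a hypothetical name introduced only for this sketch, and the frame is made up:

```python
import pandas as pd

# Hypothetical frame for illustration only.
demo = pd.DataFrame({'Age': [10, 20]}, index=['a', 'b'])

def safe_row(frame, label):
    """Return the row for `label`, or None if the label is not in the index."""
    try:
        return frame.loc[label]
    except KeyError:
        return None

found = safe_row(demo, 'a')      # label exists, so a Series is returned
missing = safe_row(demo, 'z')    # label does not exist, so None is returned

print(missing)  # None
```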

Let’s understand more about it with some examples,

Pandas Dataframe.loc[] – Examples

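We have divided the examples into three parts, i.e.

- Select a few rows from Dataframe
- Select a few Columns from Dataframe
- Select a subset of Dataframe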

Let’s look at these examples one by one. But before that we will create a Dataframe from list of tuples,

import pandas as pd

# List of Tuples
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',   'US'),
            ('Mike',  17, 'las vegas',  'US')]

# Create a DataFrame from list of tuples
df = pd.DataFrame( students,
                   columns=['Name', 'Age', 'City', 'Country'],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Select a few rows from Dataframe

Here we will provide only row segment argument to the Dataframe.loc[]. Therefore it will select rows based on given names and all columns.

Select a single row of Dataframe

To select a row from the dataframe, pass the row name to the loc[]. For example,

# Select the row with label name 'c'
row = df.loc['c']

print(row)

Output:

Name        Vikas
Age            31
City       Mumbai
Country     India
Name: c, dtype: object

It returned the row with label name ‘c’ from the Dataframe, as a Series object.

Select multiple rows from Dataframe based on list of names

Pass a list of row label names to the row_segment of loc[]. It will return a subset of the Dataframe containing only mentioned rows. For example,

# Select multiple rows from Dataframe by label names
subsetDf = df.loc[ ['c', 'f', 'a'] ]

print(subsetDf)

Output:

    Name  Age       City    Country
c  Vikas   31     Mumbai      India
f   Mike   17  las vegas         US
a   jack   34     Sydeny  Australia

It returned a subset of the Dataframe containing only three rows with labels ‘c’, ‘f’ and ‘a’.

Select multiple rows from Dataframe based on name range

Pass a name range -> start:end in the row segment of loc[]. It will return a subset of the Dataframe containing only the rows from label start to label end of the original dataframe, with both start and end included. For example,

# Select rows of Dataframe based on row label range
subsetDf = df.loc[ 'b' : 'f' ]

print(subsetDf)

Output:

    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India
e   John   16   New York      US
f   Mike   17  las vegas      US

It returned a subset of the Dataframe containing only five rows from the original dataframe i.e. rows from label ‘b’ to label ‘f’.

Select rows of Dataframe based on bool array

Pass a boolean array/list in the row segment of loc[]. It will return a subset of the Dataframe containing only the rows for which the corresponding value in the boolean array/list is True. For example,

# Select rows of Dataframe based on bool array
subsetDf = df.loc[ [True, False, True, False, True, False] ]

print(subsetDf)

Output:

    Name  Age      City    Country
a   jack   34    Sydeny  Australia
c  Vikas   31    Mumbai      India
e   John   16  New York         US
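
In practice, the boolean sequence is usually computed from the data instead of being typed by hand. The sketch below assumes a shortened, made-up version of the students data and an arbitrary age limit of 18:

```python
import pandas as pd

# Shortened, hypothetical version of the students data.
df = pd.DataFrame({'Name': ['jack', 'Riti', 'John', 'Mike'],
                   'Age': [34, 30, 16, 17],
                   'Country': ['Australia', 'India', 'US', 'US']},
                  index=['a', 'b', 'e', 'f'])

# Comparing a column with a value gives a boolean Series, one value per row.
is_adult = df['Age'] >= 18

# Only the rows where the boolean Series is True are selected.
adults = df.loc[is_adult]

print(adults.index.tolist())  # ['a', 'b']
```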

Select rows of Dataframe based on Callable function

Create a lambda function that accepts a dataframe as an argument, applies a condition on a column, and returns a bool list. This bool list will contain True only for those rows where the condition is True. Pass that lambda function to loc[], and it will select only those rows for which the list contains True.

For example, select only those rows where column ‘Age’ has a value of more than 25,

# Select rows of Dataframe based on callable function
subsetDf = df.loc[ lambda x : (x['Age'] > 25).tolist() ]

print(subsetDf)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
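
A callable is mostly useful when loc[] is applied to an intermediate result that has no variable name yet, for example in a chain of method calls. The following sketch assumes a shortened, made-up frame and a hypothetical derived column AgeNextYear; the callable here returns a boolean Series directly, which loc[] also accepts:

```python
import pandas as pd

# Shortened, hypothetical version of the students data.
df = pd.DataFrame({'Name': ['jack', 'Riti', 'John', 'Mike'],
                   'Age': [34, 30, 16, 17]},
                  index=['a', 'b', 'e', 'f'])

# assign() returns a new frame; the lambda passed to loc[] receives that
# new frame, so it can filter on the column that was just added.
result = (df.assign(AgeNextYear=lambda x: x['Age'] + 1)
            .loc[lambda x: x['AgeNextYear'] >= 18])

print(result.index.tolist())  # ['a', 'b', 'f']
```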

Select a few Columns from Dataframe

Here we will provide the (:) in the row segment argument of the Dataframe.loc[]. Therefore it will select all rows, but only a few columns based on the names provided in column_segment.

Select a single column of Dataframe

To select a column from the dataframe, pass ( : ) in the row segment and the column name in the column segment of loc[]. For example,

# Select single column from Dataframe by column name
column = df.loc[:, 'Age']

print(column)

Output:

a    34
b    30
c    31
d    32
e    16
f    17
Name: Age, dtype: int64

It returned the column ‘Age’ from Dataframe, as a Series object.

Select multiple columns from Dataframe based on list of names

Pass a list of column names to the column_segment of loc[]. It will return a subset of the Dataframe containing only mentioned columns. For example,

# Select multiple columns from Dataframe based on list of names
subsetDf = df.loc[:, ['Age', 'City', 'Name']]

print(subsetDf)

Output:

   Age       City   Name
a   34     Sydeny   jack
b   30      Delhi   Riti
c   31     Mumbai  Vikas
d   32  Bangalore  Neelu
e   16   New York   John
f   17  las vegas   Mike

It returned a subset of the Dataframe containing only three columns.

Select multiple columns from Dataframe based on name range

Pass a name range -> start:end in the column segment of loc[]. It will return a subset of the Dataframe containing only the columns from label start to label end of the original dataframe, with both start and end included. For example,

# Select multiple columns from Dataframe by name range
subsetDf = df.loc[:, 'Name' : 'City']

print(subsetDf)

Output:

    Name  Age       City
a   jack   34     Sydeny
b   Riti   30      Delhi
c  Vikas   31     Mumbai
d  Neelu   32  Bangalore
e   John   16   New York
f   Mike   17  las vegas

It returned a subset of the Dataframe containing only three columns, i.e., ‘Name’ to ‘City’.

Select columns of Dataframe based on bool array

Pass a boolean array/list in the column segment of loc[]. It will return a subset of the Dataframe containing only the columns for which the corresponding value in the boolean array/list is True. For example,

# Select columns of Dataframe based on bool array
subsetDf = df.loc[:, [True, True, False, False]]

print(subsetDf)

Output:

    Name  Age
a   jack   34
b   Riti   30
c  Vikas   31
d  Neelu   32
e   John   16
f   Mike   17
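
The boolean list for the columns can also be derived from the dataframe itself. As a sketch, assuming a small made-up frame, the following builds one boolean per column that is True for numeric columns and passes it to the column segment:

```python
import pandas as pd

# Small hypothetical frame for illustration.
df = pd.DataFrame({'Name': ['jack', 'Riti'],
                   'Age': [34, 30],
                   'City': ['Sydeny', 'Delhi']},
                  index=['a', 'b'])

# One boolean per column: True where the column's dtype is numeric.
is_numeric = [pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes]

# Select all rows, but only the columns flagged True.
numeric_part = df.loc[:, is_numeric]

print(numeric_part.columns.tolist())  # ['Age']
```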

Select a subset of Dataframe

Here we will provide the row and column segment arguments of the Dataframe.loc[]. It will return a subset of Dataframe based on the row and column names provided in row and column segments of loc[].

Select a Cell value from Dataframe

To select a single cell value from the dataframe, just pass the row and column name in the row and column segment of loc[]. For example,

# Select a Cell value from Dataframe by row and column name
cellValue = df.loc['c','Name']

print(cellValue)

Output:

Vikas

It returned the cell value at (‘c’,’Name’).

Select subset of Dataframe based on row/column names in list

Select a subset of the dataframe. This subset should include the following rows and columns,

Rows with names ‘b’, ‘d’ and ‘f’
Columns with name ‘Name’ and ‘City’

# Select subset of Dataframe based on row/column names in list
subsetDf = df.loc[['b', 'd', 'f'],['Name', 'City']]

print(subsetDf)

Output:

    Name       City
b   Riti      Delhi
d  Neelu  Bangalore
f   Mike  las vegas

It returned a subset from the calling dataframe object.

Select subset of Dataframe based on row/column name range

Select a subset of the dataframe. This subset should include the following rows and columns,

Rows from name ‘b’ to ‘e’
Columns from name ‘Name’ to ‘City’

# Select subset of Dataframe based on row and column label name range.
subsetDf = df.loc['b':'e', 'Name':'City']

print(subsetDf)

Output:

    Name  Age       City
b   Riti   30      Delhi
c  Vikas   31     Mumbai
d  Neelu   32  Bangalore
e   John   16   New York

It returned a subset from the calling dataframe object.
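
The different argument types can be mixed between the two segments. The sketch below, on a shortened and made-up version of the students data, uses a boolean condition in the row segment together with a list of names in the column segment:

```python
import pandas as pd

# Shortened, hypothetical version of the students data.
df = pd.DataFrame({'Name': ['Riti', 'Vikas', 'John'],
                   'City': ['Delhi', 'Mumbai', 'New York'],
                   'Country': ['India', 'India', 'US']},
                  index=['b', 'c', 'e'])

# Rows: where 'Country' equals 'India'. Columns: 'Name' and 'City' only.
india = df.loc[df['Country'] == 'India', ['Name', 'City']]

print(india.index.tolist())    # ['b', 'c']
print(india.columns.tolist())  # ['Name', 'City']
```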

Pro Tip: Changing the values of Dataframe using loc[]

loc[] can also be used on the left-hand side of an assignment, in which case the selected part of the original Dataframe is modified in place. For example, let’s select the row with label ‘c’ from the dataframe using loc[] and change its content,

print(df)

# Change the contents of row 'c' to 0
df.loc['c'] = 0

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US


    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c      0    0          0          0
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Values assigned through loc[] are written directly into the original dataframe.
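
The same assignment form also works with a boolean condition in the row segment and a column name in the column segment. The following is a small sketch on a made-up frame (the data and the replacement value 18 are assumptions for illustration), in which only the matching cells are overwritten:

```python
import pandas as pd

# Small hypothetical frame for illustration.
df = pd.DataFrame({'Name': ['jack', 'John', 'Mike'],
                   'Age': [34, 16, 17]},
                  index=['a', 'e', 'f'])

# Overwrite 'Age' only in the rows where it is below 18.
df.loc[df['Age'] < 18, 'Age'] = 18

print(df['Age'].tolist())  # [34, 18, 18]
```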

Summary:

We learned about how to use the Dataframe.loc[] with several examples.
