Remove rows with NaN values from Numpy Array – Python

 2 years ago
source link: https://thispointer.com/remove-rows-with-nan-values-from-numpy-array-python/

In this article, we will learn how to remove rows with NaN values from a NumPy Array.

NaN stands for Not a Number. It is a special floating-point value that represents a result that is undefined or unrepresentable. NaN values are commonly used to mark missing data in a DataFrame or a NumPy array. Given a 2D NumPy array, we need to delete the rows that contain NaN values, i.e. every row that has at least one NaN value.
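A short sketch of how NaN behaves in NumPy may help before we start deleting rows. The array and variable names here are made up for illustration only.

```python
import numpy as np

# NaN is a floating-point value, so an array that holds it gets a float dtype
values = np.array([1, np.nan, 3])
print(values.dtype)

# NaN never compares equal to anything, not even to itself,
# so `values == np.nan` cannot be used to find it
print(np.nan == np.nan)

# np.isnan() is the element-wise test that works
nan_mask = np.isnan(values)
print(nan_mask)
```

This is why all the approaches in this article start by calling isnan() on the array.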

Example:             

Given array:

[[ 1  2  3  4  5]
 [ 5  nan  3  2  1]
 [ 1  2  nan  1  5]
 [ 3  4  3  2  1]]

After removing rows with any NaN value:

[[ 1  2  3  4  5]
 [ 3  4  3  2  1]]

There are multiple ways to remove rows with NaN values from a NumPy array. Let's discuss these methods one by one, each with an explanation and a working code example.

Use delete() method and boolean index to delete rows containing at least one NaN value

delete() is a built-in function of the NumPy library. It is used to delete elements from a given array. It takes the array and an index (or an array of indices) as parameters, and it returns a new array with the elements at the given indices removed.

Syntax of delete()

numpy.delete(arr, obj, axis=None)

Parameters:

    arr          = The input array.
    obj          = Index (or array of indices) of the rows to be deleted.
    axis         = The axis along which to delete. Use axis=0 to delete rows. If omitted, the array is flattened first.

Returns:

    Returns a new array with the rows removed.
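As a small illustration of the syntax above, the following sketch deletes rows by integer index and checks that the input array is left untouched. The array and variable names are invented for this example.

```python
import numpy as np

original = np.array([[1, 2],
                     [3, 4],
                     [5, 6]])

# Delete the row at index 1; axis=0 means "along the rows"
one_removed = np.delete(original, 1, axis=0)

# Delete several rows at once by passing a list of indices
two_removed = np.delete(original, [0, 2], axis=0)

# Without axis, the array is flattened before the element is deleted
flat = np.delete(original, 1)

print(one_removed)
print(two_removed)
print(flat)

# delete() returns a new array; the input keeps its shape
print(original.shape)
```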

To delete the rows containing at least one NaN value, we need to use the any() and isnan() functions. First we pass the given array to isnan(), which returns a 2D array of the same shape filled with boolean values. This bool array contains True where the original value is NaN and False everywhere else. Then we iterate over the rows of this 2D array, call any() on each row, and store the results in a list.

This list has one element per row. For a row that has any NaN value, the corresponding element in the list is True. Pass this boolean index list to delete() along with the given array and axis=0. It returns a new array without the rows that contain a NaN value.

For example

import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])


# Get a boolean index list that is True for every row
# that has at least one NaN value
indexList = [np.any(i) for i in np.isnan(arr)]

# delete all the rows with any NaN value
arr = np.delete(arr, indexList, axis=0)

print(arr)

Output

[[1. 2. 3. 4. 5.]
 [3. 4. 3. 2. 1.]]

It deleted all the rows of the NumPy array that had any NaN value.
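The same idea can be sketched without a Python loop, as a variant of the example above rather than a replacement for it. Here any(axis=1) checks every row in one call, and flatnonzero() turns the boolean result into integer row indices. Passing integers is a cautious choice: according to the NumPy documentation, delete() treats boolean indices as a mask only since version 1.19, and older versions cast them to the integers 0 and 1. The names row_has_nan and rows_to_delete are chosen for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# True for every row that contains at least one NaN
row_has_nan = np.isnan(arr).any(axis=1)

# Convert the boolean flags to integer row indices
rows_to_delete = np.flatnonzero(row_has_nan)

# Delete those rows; the result is a new array
cleaned = np.delete(arr, rows_to_delete, axis=0)

print(rows_to_delete)
print(cleaned)
```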

Use delete() method and boolean index to delete rows if entire row has NaN values

This is very similar to the above approach, except that we use all() instead of any(). To delete a row only if the entire row consists of NaN values, we need to use the all() and isnan() functions.

First we pass the given array to isnan(), which returns a 2D array of the same shape filled with boolean values. This 2D bool array contains True where the original value is NaN and False everywhere else. Then we iterate over the rows of this 2D array, call all() on each row, and store the results in a list.

This list has one element per row. For a row in which every value is NaN, the corresponding element in the list is True. Pass this boolean index list to delete() along with the given array and axis=0. It returns a 2D NumPy array without the rows that consist entirely of NaN values.

For Example

import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan,np.nan, np.nan,np.nan, np.nan],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])


# Get a boolean index list that is True for every row
# in which all values are NaN
indexList = [np.all(i) for i in np.isnan(arr)]

# delete all the rows with all NaN value
arr = np.delete(arr, indexList, axis=0)

print(arr)

Output:

[[ 1.  2.  3.  4.  5.]
 [nan  2.  4.  1.  5.]
 [ 3.  4.  3.  2.  1.]]

Use boolean index to delete rows if the row has any NaN value

This is very similar to the above, but instead of calling delete() we pass the boolean index directly to the array. The rows of a NumPy array can be selected by passing a boolean array as the index: the rows whose entry is True are kept.

Example:
        arr = np.array([[1, 2, 3, 4, 5],
                        [5, 4, 3, 2, 1],
                        [8, 2, 4, 1, 5],
                        [3, 4, 3, 2, 1],
                        [7, 6, 3, 4, 5]])

        boolArray = [True, True, False, False, False]

        arr[boolArray]  ===> this will give [[1 2 3 4 5]
                                             [5 4 3 2 1]]

This approach is similar to the first one, but instead of using the delete() function we use the [] operator of the NumPy array to select only those rows that do not have a NaN value.

First we pass the given array to isnan(), which returns a 2D array of the same shape filled with boolean values. This 2D bool array contains True where the original value is NaN and False everywhere else. Then we iterate over the rows of this 2D array, call any() on each row, and negate the result using the not operator. Then we store the values in a list.

This list has one element per row. For a row that does not have any NaN value, the corresponding element in the list is True. Pass this boolean index list to the [] operator of the given array. It returns a 2D NumPy array that contains only the rows without NaN values.

For example

import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Delete all rows with any NaN value
booleanIndex = [not np.any(i) for i in np.isnan(arr)]
arr = arr[booleanIndex]

print(arr)

Output:

[[1. 2. 3. 4. 5.]
 [3. 4. 3. 2. 1.]]
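The list comprehension can also be replaced by a single vectorized mask. The sketch below is one possible way to write it: any(axis=1) flags the rows with a NaN, and the ~ operator inverts the flags so that True means "keep this row". The name keep is chosen for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# True for rows without any NaN value
keep = ~np.isnan(arr).any(axis=1)

# Boolean indexing selects the rows where keep is True
cleaned = arr[keep]

print(keep)
print(cleaned)
```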

Use boolean index to delete rows if entire row has nan values

This is very much similar to the previous approach. But instead of the any() method we will use the all() method.

For example

import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, np.nan, np.nan, np.nan, np.nan],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Delete all rows with all NaN value
booleanIndex = [not np.all(i) for i in np.isnan(arr)]
arr = arr[booleanIndex]

print(arr)

Output:

[[ 1.  2.  3.  4.  5.]
 [nan  2.  4.  1.  5.]
 [ 3.  4.  3.  2.  1.]]

Summary

Great, you made it! We have discussed several methods to delete rows with NaN values from a NumPy array. Happy learning.
