

Remove Columns with NaN values from a NumPy Array
source link: https://thispointer.com/remove-columns-with-nan-values-from-a-numpy-array/

In this article, we will learn how to remove columns from a NumPy Array which contain NaN values.
What is a NaN value?
NaN stands for Not a Number. It is a special floating-point value that represents a result that is undefined or unrepresentable. NaN values are commonly used to mark missing data in a DataFrame or a NumPy array.
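As a small illustration of how NaN behaves, the sketch below shows that np.nan is an ordinary float that never compares equal to itself, which is why np.isnan() is used to find it. The variable names are only illustrative.

```python
import numpy as np

value = np.nan

# NaN is a float, and it never compares equal to anything, itself included
is_float = isinstance(value, float)
equals_itself = value == value

# np.isnan tests for NaN element by element
mask = np.isnan(np.array([1.0, np.nan, 3.0]))

print(is_float)        # True
print(equals_itself)   # False
print(mask)            # [False  True False]
```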
Given a 2D NumPy array, we need to remove the columns that contain NaN values.
Example:
Given array :
[[1, 2, 3, 4, 5],
 [nan, 4, nan, 2, 1],
 [nan, 2, 4, 1, 5],
 [3, 4, 3, 2, 1]]

After removing columns with nan values :
[[2. 4. 5.]
 [4. 2. 1.]
 [2. 1. 5.]
 [4. 2. 1.]]
There are multiple ways to remove columns with NaN values from a NumPy array. Let's discuss them one by one, each with a working code example.
Delete columns containing at least one NaN value using delete(), isnan() and any()
The delete() function is part of the NumPy library. It removes elements from a given array. It takes an array and an index (or array of indices) as parameters, and returns a copy of the array without the elements at the given positions.
Syntax of delete()
numpy.delete(arr, obj, axis=None)
- Parameters:
- arr = The array from which we need to delete the elements.
- obj = index, array of indices, or boolean mask of the columns to be deleted. NumPy 1.19 and later treat a boolean array as a mask.
- axis = Axis along which elements need to be deleted. For columns, axis = 1.
- Returns:
- Returns a copy of array with the columns removed.
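A minimal sketch of delete() on its own, using a small made-up array and integer column positions, may help before NaN detection is added:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]])

# Remove the columns at positions 0 and 2 (axis=1 means columns)
trimmed = np.delete(arr, [0, 2], axis=1)

print(trimmed)
# [[2 4]
#  [6 8]]

# np.delete returns a new array; the original is left unchanged
print(arr.shape)   # (2, 4)
```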
To delete the columns containing at least one NaN value, we use the isnan() and any() functions. First, pass the given 2D NumPy array to isnan(). It returns a 2D array of the same shape filled with boolean values. Each True value in this boolean array indicates that the corresponding value in the original array is NaN.
Then call any() on this boolean array with axis=0. It returns another boolean array whose length equals the number of columns in the original array. Each True value in it indicates that the corresponding column in the original array has at least one NaN value. Finally, pass this boolean array to delete() along with the given array and axis=1. Every column whose value in the boolean array is True is deleted.
Source Code
import numpy as np

# Create a NumPy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Build a boolean mask that is True for columns with any NaN value
index = np.isnan(arr).any(axis=0)

# Delete columns with any NaN value from the 2D NumPy array
arr = np.delete(arr, index, axis=1)

print(arr)
Output:
[[2. 4. 5.]
 [4. 2. 1.]
 [2. 1. 5.]
 [4. 2. 1.]]
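To see what each step produces, the sketch below prints the intermediate results. It also converts the boolean flags to integer column positions with np.where() before calling delete(). That conversion is an optional variation, not part of the approach above. It may be useful on NumPy versions older than 1.19, where delete() did not treat boolean arrays as a mask.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Step 1: element-wise NaN test, same shape as arr
nan_mask = np.isnan(arr)

# Step 2: collapse the rows (axis=0) to get one flag per column
col_has_nan = nan_mask.any(axis=0)
print(col_has_nan)     # [ True False  True False False]

# Step 3: convert the flags to integer column positions
col_positions = np.where(col_has_nan)[0]
print(col_positions)   # [0 2]

# Step 4: delete those columns
result = np.delete(arr, col_positions, axis=1)
print(result)
```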
Delete columns containing all NaN values using delete(), isnan() and all()
This is very similar to the approach above, except that we use all() instead of any().
To delete the columns in which every value is NaN, we use the isnan() and all() functions. First, pass the given 2D NumPy array to isnan(). It returns a 2D NumPy array of the same shape containing only boolean values. Each True value indicates that the corresponding value in the original array is NaN.
Then call all() on this boolean array with axis=0. It returns another boolean array with one element per column of the original array. Each True value in it indicates that the corresponding column consists entirely of NaN values. Finally, pass this boolean array to delete() along with the given array and axis=1. Every column whose value in the boolean array is True is deleted.
Source Code
import numpy as np

# Create a NumPy array
arr = np.array([[np.nan, 2, 3, 4, 5],
                [np.nan, 4, 3, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [np.nan, 4, 3, 2, 1]])

# Build a boolean mask that is True for columns in which all values are NaN
index = np.isnan(arr).all(axis=0)

# Delete columns with all NaN values from the 2D NumPy array
arr = np.delete(arr, index, axis=1)

print(arr)
Output:
[[2. 3. 4. 5.]
 [4. 3. 2. 1.]
 [2. 4. 1. 5.]
 [4. 3. 2. 1.]]
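The difference between any() and all() is easiest to see on a single array that has both kinds of column. The array below is a made-up example for that purpose:

```python
import numpy as np

# Column 0 is entirely NaN; column 2 has a single NaN
arr = np.array([[np.nan, 2, 3, 4],
                [np.nan, 4, np.nan, 2],
                [np.nan, 2, 4, 1]])

nan_mask = np.isnan(arr)

any_flags = nan_mask.any(axis=0)   # at least one NaN in the column
all_flags = nan_mask.all(axis=0)   # every value in the column is NaN

print(any_flags)   # [ True False  True False]
print(all_flags)   # [ True False False False]
```

Column 2 is flagged by any() but not by all(), so it would survive the all() based deletion.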
Using boolean index to delete columns with any NaN value
This approach is very similar to the previous one. Instead of calling delete(), we use the boolean array directly as an index into the array. Columns of a NumPy array can be selected by passing a boolean array as the column index.
Example
Given array :
[[1, 2, 3, 4, 5],
 [5, 4, 3, 2, 1],
 [1, 2, 4, 1, 5],
 [3, 4, 3, 2, 1]]

boolArray = [False, True, False, True, True]

arr[:, boolArray] will be:
[[2 4 5]
 [4 2 1]
 [2 1 5]
 [4 2 1]]
It selected all the columns for which index had True values.
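The same selection as runnable code, a minimal sketch using the values from the example (the variable names are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [5, 4, 3, 2, 1],
                [1, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# One flag per column: True keeps the column, False drops it
bool_array = np.array([False, True, False, True, True])

selected = arr[:, bool_array]
print(selected)
# [[2 4 5]
#  [4 2 1]
#  [2 1 5]
#  [4 2 1]]
```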
Steps to remove columns with any NaN value:
- Import numpy library and create numpy array.
- Create a boolean array using isnan() and any(), then negate it. A True value indicates that the corresponding column has no NaN value.
- Pass the boolean array as index to the array.
- This will return the array with the columns having NaN values deleted.
- Print the Array.
Source Code
import numpy as np

# Create a NumPy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Build a boolean mask that is True for columns with no NaN value
booleanIndex = ~np.isnan(arr).any(axis=0)

# Select the columns that have no NaN value
arr = arr[:, booleanIndex]

print(arr)
Output:
[[2. 4. 5.]
 [4. 2. 1.]
 [2. 1. 5.]
 [4. 2. 1.]]
Using boolean index to delete columns with all nan values
This is very similar to approach 3, but we use all() instead of any(). Columns of a NumPy array can be selected by passing a boolean array as the column index.
Example:
Given array :
[[1, 2, 3, 4, 5],
 [5, 4, 3, 2, 1],
 [1, 2, 4, 1, 5],
 [3, 4, 3, 2, 1]]

boolArray = [False, True, False, True, True]

arr[:, boolArray] :
[[2 4 5]
 [4 2 1]
 [2 1 5]
 [4 2 1]]
It selected all the columns for which index had True values.
Steps to remove columns with all NaN values:
- Import numpy library and create numpy array.
- Create a boolean array using isnan() and all(), then negate it. A False value indicates that the corresponding column contains only NaN values.
- Pass the boolean array as index to the array.
- This will return the array with the columns with all NaN values deleted.
- Print the Array.
Source Code
import numpy as np

# Create a NumPy array
arr = np.array([[np.nan, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [np.nan, 4, 3, 2, 1]])

# Build a boolean mask that is True for columns that are not entirely NaN
booleanIndex = ~np.isnan(arr).all(axis=0)

# Select the columns that are not entirely NaN
arr = arr[:, booleanIndex]

print(arr)
Output:
[[ 2.  3.  4.  5.]
 [ 4. nan  2.  1.]
 [ 2.  4.  1.  5.]
 [ 4.  3.  2.  1.]]
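The boolean-index variants can be wrapped in one small helper. This is only a sketch: the function name drop_nan_columns and its how parameter are hypothetical, not part of NumPy.

```python
import numpy as np

def drop_nan_columns(arr, how="any"):
    """Return a copy of a 2D array without its NaN columns.

    how="any" drops columns with at least one NaN;
    how="all" drops only columns that are entirely NaN.
    (Hypothetical helper, not a NumPy function.)
    """
    nan_mask = np.isnan(arr)
    if how == "any":
        drop = nan_mask.any(axis=0)
    elif how == "all":
        drop = nan_mask.all(axis=0)
    else:
        raise ValueError("how must be 'any' or 'all'")
    # Keep the columns that are not flagged for removal
    return arr[:, ~drop]

arr = np.array([[np.nan, 2, 3],
                [np.nan, 4, np.nan],
                [np.nan, 2, 4]])

print(drop_nan_columns(arr, how="any"))   # only the middle column remains
print(drop_nan_columns(arr, how="all"))   # only the first column is dropped
```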
Summary
Great, you made it! We have discussed several ways to remove columns with NaN values from a NumPy array. Happy learning.