

Don’t Miss out on Missingno for Identifying Missing Data!
source link: https://mc.ai/dont-miss-out-on-missingno-for-identifying-missing-data/

The library you should be using to visualize missing data
Missingno from Pokemon was a glitch in the game that let you max out on items. Looking at it, though, you can kind of think of it as a long data set with missing values. Data in the wild is not always complete. You need to pick up on that immediately, investigate why data is missing, and then either find values to fill the gaps with or discard the incomplete records. Whether you fill or discard depends on the task. But again, you can’t do any of that without first identifying the missing values. That is where the Missingno library comes in, and here you will learn how to use it.
Missingno, the Python version
Missingno is a Python library created by Aleksey Bilogur to visualize the missing values in your data. With it, you can get a quick sense of what is missing from your data set. It is one of the first things I run on any data I get, before performing any major tasks with it.
You could, of course, get the same information by reading the non-null value counts for each column. If your data has at most 8 to 10 columns, a quick glance will do. But what if you have data with 20 columns? 30 columns? 40 columns? Are you still going to read through the non-null counts in each column faster than you could look at a visualization of them?
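For comparison, here is a minimal sketch of the text-based approach described above, using only pandas. The DataFrame and its column names are made up for illustration and are not the data set used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real data set.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "age": [34, np.nan, 29, np.nan, 41],
    "city": ["Oslo", "Lima", None, "Pune", "Kyiv"],
})

# Number of missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Non-null counts and dtypes for every column, printed as text.
df.info()
```

With three columns this is easy to read. The point is that it stops being easy once you are scanning 30 or 40 rows of counts.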
In a Missingno chart, your eye immediately moves to the bars that have streaks of white. This is how quickly you can pick up which columns are missing data. This was also on a random data set with 23 columns, by the way. Now, let’s get you using this library! Type pip install missingno into your terminal and let’s go!
Setting Up
We of course have to import some libraries alongside Missingno. These are the default libraries I import for any data task. We will go ahead and load our data also. The data is from this site after a quick search on Google’s Dataset Search engine for “missing values”.
Pro tip: use sns.set(). It gives your charts a seaborn background by default. This will be really useful when we do the bar chart. But first, we must view the matrix.