Pywedge For Automated EDA in Python

Pywedge is a Python package that helps with data visualization, pre-processing, and creating baseline models. It works more or less like an automated ML pipeline library. You can further fine-tune a baseline model to find the best fit for your use case. In this article, we will discuss how you can make use of Pywedge in your data-related work.

Also read: EDA – Exploratory Data Analysis: Using Python Functions

What is Pywedge?

Pywedge is an open-source Python library that helps with the data modeling and visualization process.
It creates interactive visualizations for your EDA work.
On top of that, Pywedge pre-processes data based on the methods the user prefers.
It creates baseline models and displays their performance, so you can select the best-performing model.
The library provides 8 visualization types to explore your data before modeling.

Installing Pywedge

Run the command below to install the library with pip, then import it in Python.

#Install the library (run in a terminal, or prefix with ! in a notebook)

pip install pywedge

#Load the library

import pywedge as pw

That’s perfect! Now, we are good to go further.

Data Visualization

As mentioned above, this library helps you visualize your data by offering 8 chart types. I will use the Titanic dataset to demonstrate the Pywedge charts.

Import the Data

#Load the data

import pandas as pd

df = pd.read_csv('titanic.csv')

You can see the Titanic dataset in the above picture.
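Before setting up the charts, it can help to check which columns exist, what types they have, and where values are missing, since that informs which column to pass as the target. The snippet below is a minimal sketch using plain pandas; the small DataFrame is a hand-made stand-in for the Titanic data (its values are illustrative assumptions), so swap in your own df loaded from the CSV.

```python
import pandas as pd

# Hand-made stand-in for the Titanic data (illustrative values only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# First rows and column types give a quick overview of the data.
print(df.head())
print(df.dtypes)

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# Non-numeric columns are natural candidates for grouping or coloring charts.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(categorical_cols)
```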

As a first step, we have to set up the Pywedge charts with data and target attributes as shown in the code below. Then, you can call the make_charts() function, which in turn opens a dialog box for your data visualization. Let’s see how it works.

#Data visualization

viz = pw.Pywedge_Charts(df, c = None, y = 'Sex')

My_viz = viz.make_charts()

You can see the Pywedge dialog box in the above picture. It offers 8 different plots as shown. You can select the attributes for the X and Y axis with color and you are good to go.

I have added all 8 visualizations here for your reference. In my experience, this saves time, and the interface is easy to use and offers customizable options. So, don’t be shy about trying this library.

Scatter Plot in Python

Pie Chart in Python

Bar Plot in Python

Violin Plot in Python

Box Plot in Python

Dist Plot in Python

Histograms in Python

Correlation Plot in Python

Data Preprocessing in Python

This library also offers data pre-processing using the methods the user prefers.

I am using the train and test datasets of the Titanic data. You can download them here. Run the code below to begin data pre-processing.

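#Load the train and test data (file names are an assumption; use the names of the files you downloaded)

train = pd.read_csv('train.csv')

test = pd.read_csv('test.csv')

#Preprocess the data for baseline model

blm = pw.baseline_model(train, test, c = None, y='Survived')

blm.classification_summary()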

Here, I have selected the minmax scaler.
I have set the test data size to 20%.
I have set the categorical conversion to cat_codes. You can also go for get_dummies.
After that, click on the Run Baseline Model option.
You can then see the data preprocessing report in the Pywedge dashboard.
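To make these dashboard choices concrete, here is a rough sketch of what each option corresponds to when done by hand in pandas. This is not Pywedge's internal code, only an illustration of the ideas; the sample DataFrame and all variable names are assumptions made up for the example.

```python
import pandas as pd

# Hand-made stand-in for the Titanic training data (illustrative values only).
train = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "Sex": ["male", "female", "female", "female", "male"],
    "Survived": [0, 1, 1, 1, 0],
})

# Min-max scaling: map each numeric column onto the range [0, 1].
numeric_cols = ["Age", "Fare"]
scaled = train.copy()
scaled[numeric_cols] = (train[numeric_cols] - train[numeric_cols].min()) / (
    train[numeric_cols].max() - train[numeric_cols].min()
)

# Option 1: category codes replace each label with an integer.
sex_codes = train["Sex"].astype("category").cat.codes

# Option 2: get_dummies creates one indicator column per label.
sex_dummies = pd.get_dummies(train["Sex"], prefix="Sex")

# Hold out 20% of the rows for testing.
test_part = scaled.sample(frac=0.2, random_state=0)
train_part = scaled.drop(test_part.index)

print(scaled[numeric_cols])
print(sex_codes.tolist())
print(sex_dummies.columns.tolist())
```

Category codes keep a single column per feature, while get_dummies avoids implying an order between labels at the cost of extra columns.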

Predict Baseline Model

You have visualized the data and plotted different graphs to understand it better. On top of that, you have pre-processed the data and seen the importance of each feature in it.

With that, you have scaled the data for the baseline models. Now, your model should be ready.

Pywedge runs different models on your data and reports the accuracy and other performance metrics for each algorithm, as shown below. You can choose the best one to predict the values.

You can see the performance of many different algorithms for our test data.

In the Pywedge dashboard, click on the Predict Baseline Model option.
Select the best-performing algorithm.
Run the command 'blm.predictions_baseline' to see the values predicted by your chosen algorithm.

For illustration purposes, I have selected Random Forest, and here are the predicted values with ~84% accuracy.
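The idea of a baseline comparison can also be sketched by hand with scikit-learn. The example below is not how Pywedge is implemented; it is a minimal illustration under stated assumptions: the data is synthetic (generated with make_classification rather than taken from the Titanic files), and the three models and their settings are arbitrary choices for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the pre-processed dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A few untuned models to compare as baselines.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Pick the model with the highest accuracy and predict with it.
best_name = max(scores, key=scores.get)
predictions = models[best_name].predict(X_test)
print("Best baseline:", best_name)
```

Accuracy on a single split is only a rough guide; the selected baseline is a starting point for further tuning, not a final model.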

Wrapping Up

In conclusion, Pywedge is one of the most useful Python libraries I have come across. It offers many functions, including data visualization, pre-processing, creating baseline models, and predicting values. You should definitely give it a try, and I am sure you will enjoy it.

That’s all for now. Happy Python 🙂

Further reading: Pywedge documentation

