
Pywedge For Automated EDA in Python

 2 years ago
source link: https://www.journaldev.com/53063/pywedge-automated-eda-python

Pywedge is a Python package that helps with data visualization, pre-processing, and creating baseline models. It works more or less like an automated ML pipeline library in Python. You can then fine-tune the baseline model to find the best fit for your use case. In this article, we will discuss how you can use Pywedge in your data-related work.



What is Pywedge?

  • Pywedge is an open-source Python library that helps with the data modeling and visualization process.
  • It creates interactive visualizations for your EDA work.
  • On top of that, Pywedge helps with data preprocessing based on user-preferred methods.
  • It creates baseline models and displays their performance, so you can select the best-performing model.
  • This library provides 8 visualization types to explore your data before modeling.

Installing Pywedge

Run the code below to install the library with pip and then load it into Python.

#Install the required library (run in a terminal; in a Jupyter notebook use %pip install pywedge)
pip install pywedge
#Load the library
import pywedge as pw

That’s perfect! Now, we are good to go further.
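
Since the install step runs in a shell and the import runs in Python, it can help to confirm that the package is visible to the interpreter you are using. Here is a minimal sketch that uses only the standard library. The helper name is_installed is made up for this illustration and is not part of Pywedge.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Check for pywedge without actually importing it
if is_installed("pywedge"):
    print("pywedge is available")
else:
    print("pywedge not found; run `pip install pywedge` first")
```

If the check fails right after a successful install, pip most likely installed into a different Python environment than the one running your notebook.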


Data Visualization

As mentioned above, this library helps you visualize your data by offering 8 chart types. I will use the Titanic dataset to demonstrate the Pywedge charts.

Import the Data

#Load the data
import pandas as pd
df = pd.read_csv('titanic.csv')
Titanic Dataset

You can see the Titanic dataset in the above picture.

As a first step, set up the Pywedge charts with the data and the target attribute, as shown in the code below. Then call the make_charts() method, which opens a dialog box for your data visualization. Let’s see how it works.

#Data visualization
viz = pw.Pywedge_Charts(df, c = None, y = 'Sex')
My_viz = viz.make_charts()
Pywedge Dialog Box

You can see the Pywedge dialog box in the picture above. It offers 8 different plots, as shown. Select the attributes for the X and Y axes along with the color, and you are good to go.

I have added all 8 visualizations here for your reference. In my experience, this saves time, and the interface is easy to use with customizable options. So, don’t be shy about trying this library.

Scatter Plot in Python

Pywedge Scatter Plot

Pie Chart in Python

Pywedge Pie Chart

Bar Plot in Python

Pywedge Bar Plot

Violin Plot in Python

Pywedge Violin Plot

Box Plot in Python

Pywedge Box Plot

Dist Plot in Python

Pywedge Dist Plot

Histograms in Python

Histogram

Correlation Plot in Python

Correlation Plot
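
To make the correlation plot easier to read, here is a minimal sketch of the number such a plot usually encodes for each pair of numeric columns. It assumes the Pearson coefficient, which is the common default but is an assumption about Pywedge here. The sample values are made up and are not taken from the Titanic data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample values for illustration only
fare = [7.0, 8.0, 30.0, 50.0, 80.0]
pclass = [3, 3, 2, 1, 1]

r = pearson(fare, pclass)
print(f"correlation between fare and pclass: {r:.2f}")
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one. In this made-up sample the coefficient is negative, because higher fares go with lower class numbers.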

Data Preprocessing in Python

This library also lets you pre-process your data using your preferred methods.

I am using the train and test datasets of the Titanic data. You can download them here. Run the code below to begin data pre-processing.

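#Load the train and test sets (file names are assumptions; use your own paths)
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
#Preprocess the data for baseline model
blm = pw.baseline_model(train,test, c = None, y='Survived')
blm.classification_summary()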
Blm
  • Here, I have selected the MinMax scaler.
  • I have set the test data size to 20%.
  • Categorical conversion is set to cat_codes. You can also go for get_dummies.
  • After that, click on the Run Baseline Model option.
  • You can see the data preprocessing report in the Pywedge dashboard.
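
To show what the options above mean, here is a minimal sketch in plain Python of min-max scaling, integer category codes versus one-hot (dummy) columns, and a 20% hold-out split. It illustrates the general techniques only and is not Pywedge's internal code. The function names and sample values are made up for this example.

```python
import random


def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def category_codes(values):
    """Replace each category with an integer code (categories sorted alphabetically)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def one_hot(values):
    """Replace each category with one 0/1 column per category (dummy variables)."""
    cats = sorted(set(values))
    return [[1 if v == cat else 0 for cat in cats] for v in values]


def split_rows(rows, test_size=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up sample values for illustration only
ages = [22, 38, 26, 35, 54]
sexes = ["male", "female", "female", "female", "male"]

scaled_ages = min_max_scale(ages)      # every value now lies between 0 and 1
sex_codes = category_codes(sexes)      # one integer column
sex_dummies = one_hot(sexes)           # one 0/1 column per category
train_rows, holdout_rows = split_rows(list(range(10)), test_size=0.2)
```

Category codes keep a single column but impose an arbitrary order on the categories, while dummy columns avoid that order at the cost of extra columns. Which one works better depends on the model.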

Predict Baseline Model

You have visualized the data and plotted different graphs to understand it better. On top of that, you have pre-processed the data and seen the importance of each feature.

With that, you have scaled the data for the baseline models. Now, your model should be ready.

Pywedge runs different models on your data and reports the accuracy and other performance metrics of all the algorithms, as shown below. You can choose the best one to predict the values.

Baseline Model

You can see the performance of many different algorithms for our test data.
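
Choosing the best model from such a report comes down to comparing a metric across models. Here is a minimal sketch, assuming accuracy is the metric you care about. The labels, predictions, and model names are hypothetical and are not real Pywedge output.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical true labels and predictions from three hypothetical models
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
predictions = {
    "model_a": [0, 1, 0, 0, 1, 0, 1, 1],
    "model_b": [0, 1, 1, 0, 1, 0, 0, 0],
    "model_c": [1, 1, 0, 0, 0, 0, 0, 1],
}

# Score every model, then pick the one with the highest accuracy
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(best_model, scores[best_model])
```

Accuracy alone can mislead on imbalanced data, which is one reason to look at the other metrics in the report before settling on a model.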

  • In the Pywedge dashboard, click on the Predict Baseline Model option.
  • Select the best-performing algorithm.
  • Run the command 'blm.predictions_baseline' to see the values predicted by your chosen algorithm.
Blm Predict
  • For illustration purposes, I have selected Random Forest, and here are the predicted values with ~84% accuracy.

Wrapping Up

In conclusion, Pywedge is one of the most useful Python libraries that I have come across. It offers many functions, including data visualization, pre-processing, creating baseline models, and predicting values. You should definitely give it a try, and I am sure you will enjoy it.

That’s all for now. Happy Python 🙂

Further reading: Pywedge documentation

