Pywedge For Automated EDA in Python
source link: https://www.journaldev.com/53063/pywedge-automated-eda-python
Pywedge is a Python package that helps with data visualization, pre-processing, and building baseline models. It works more or less like an automated ML pipeline library for Python. You can then fine-tune the baseline model to find the best fit for your case. In this article, we will discuss how you can use Pywedge in your data-related work.
Also read: EDA – Exploratory Data Analysis: Using Python Functions
What is Pywedge?
- Pywedge is an open-source Python library that helps with the data modeling and visualization process.
- It creates interactive visualizations for your EDA work.
- On top of that, Pywedge helps with data preprocessing based on user-preferred methods.
- It creates baseline models and displays their performance, so you can select the best-performing model.
- This library provides 8 visualization types to explore your data before modeling.
Installing Pywedge
Run the code below to install the library using pip and then load it into Python.
#Install and load the required library
pip install pywedge
import pywedge as pw
That’s perfect! Now, we are good to go further.
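If you want to confirm that the install worked before moving on, a small check like the one below can help. This is only a sketch using the standard library; the helper name is_installed is my own and is not part of Pywedge.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found on the current import path."""
    # find_spec returns None when a top-level package cannot be located
    return importlib.util.find_spec(package_name) is not None


# Prints True only if pip installed pywedge into this Python environment
print("pywedge installed:", is_installed("pywedge"))
```

If this prints False, the package most likely went into a different Python environment than the one your notebook is running.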
Data Visualization
As mentioned above, this library helps you visualize your data by offering 8 chart types. I will use the Titanic dataset to try out the Pywedge charts.
Import the Data
#Load the data
import pandas as pd
df = pd.read_csv('titanic.csv')
You can see the Titanic dataset in the above picture.
As a first step, we have to set up the Pywedge charts with data and target attributes as shown in the code below. Then, you can call the make_charts() function, which in turn opens a dialog box for your data visualization. Let’s see how it works.
#Data visualization
viz = pw.Pywedge_Charts(df, c=None, y='Sex')
My_viz = viz.make_charts()
You can see the Pywedge dialog box in the above picture. It offers 8 different plots as shown. You can select the attributes for the X and Y axis with color and you are good to go.
I have added all 8 visualizations here for your reference. In my view, this saves time, and the interface is easy to use with customizable options. So, don’t be shy about trying this library soon.
Scatter Plot in Python
Pie Chart in Python
Bar Plot in Python
Violin Plot in Python
Box Plot in Python
Dist Plot in Python
Histograms in Python
Correlation Plot in Python
Data Preprocessing in Python
This library also lets you pre-process the data using your preferred methods, which is awesome.
I am using the train and test datasets of the Titanic data. You can download them here. Run the code below to begin data pre-processing. The code assumes both datasets are already loaded as DataFrames named train and test.
#Preprocess the data for baseline model
blm = pw.baseline_model(train, test, c=None, y='Survived')
blm.classification_summary()
- Here, I have selected the minmax scaler.
- I have set the test data size to 20%.
- For categorical conversion, I chose cat_codes. You can also go for get_dummies.
- After that, click on the Run Baseline Model option.
- You can see the data preprocessing report in the Pywedge dashboard.
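To get a feel for what the options selected above mean, here is a rough sketch in plain Python. It is not Pywedge's own code. The function names and the toy values are made up for illustration, and I am assuming the categorical option behaves like integer category codes.

```python
import random


def min_max_scale(values):
    """Rescale numbers to the [0, 1] range; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_category_codes(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]


def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle the rows and hold out a test_size fraction for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up toy values, not taken from the real Titanic files
ages = [22, 38, 26, 35]
sexes = ["male", "female", "female", "male"]

print(min_max_scale(ages))        # smallest value becomes 0.0, largest 1.0
print(to_category_codes(sexes))   # each label replaced by an integer code
train_rows, test_rows = holdout_split(range(10))
print(len(train_rows), len(test_rows))  # 80% / 20% of the 10 rows
```

The get_dummies option mentioned above would instead create one 0/1 column per category, which avoids implying an order between categories at the cost of more columns.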
Predict Baseline Model
You have done the data visualization and plotted different graphs to understand your data better. On top of that, you have pre-processed the data and understood the feature importance of each feature in the data.
With that, you have standardized the data for the baseline models. Now, your model should be ready.
Pywedge runs different models with your data and gives the accuracy and other performance parameters of all the algorithms as shown below. You can choose the best one to predict the values.
You can see the performance of many different algorithms for our test data.
- In the Pywedge dashboard, click on the Predict Baseline Model option.
- Select the best-performing algorithm.
- Run the command 'blm.predictions_baseline' to see the values predicted by your chosen algorithm.
- For illustration purposes, I have selected Random forest, and here are the predicted values with ~84% accuracy.
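If you are wondering what an accuracy figure like the one above means, it is simply the share of predictions that match the true labels. Here is a minimal sketch with made-up labels. The accuracy helper is my own, not a Pywedge function.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: 1 = survived, 0 = did not survive
y_true = [1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 1]

print(accuracy(y_true, y_pred))  # 4 of 5 predictions match
```

Accuracy alone can be misleading when one class is much more common than the other, so it is worth looking at the other performance parameters in the report as well.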
Wrapping Up
In conclusion, Pywedge is one of the most amazing Python libraries that I have come across. Above all, it offers many functions, including data visualization, pre-processing, creating baseline models, and predicting values. Therefore, you should definitely give it a try, and I am sure you will enjoy it.
That’s all for now. Happy Python 🙂
More read: Pywedge documentation