Visualizing Support Vector Machine Decision Boundary
source link: https://www.tuicool.com/articles/R3Q3q2Y
Decision Boundary (Picture: Author’s Own Work, Saitama, Japan)
In a previous post I described principal component analysis (PCA) in detail, and in another, the mathematics behind the support vector machine (SVM) algorithm. Here, I will combine SVM, PCA, and grid-search cross-validation to create a pipeline that finds the best parameters for binary classification, and eventually plot a decision boundary to show how well our algorithm has performed. What you can expect to learn/review in this post —
- Joint-plots and representing data in a meaningful way with the Seaborn library.
- When PCA yields more than 2 components, how to choose and present the 2 components that are more relevant than the others.
- Creating a pipeline with PCA and SVM to find the best-fit parameters through grid-search cross-validation.
- Finally, choosing the 2 principal components to represent the SVM decision boundary in a 3D/2D plot, drawn using Matplotlib.
1. Know the Data-Set Better: Joint-plots and Seaborn
Here, I have used the scikit-learn breast cancer data-set, a relatively easy data-set for studying binary classification, with the 2 classes being Malignant and Benign. Let's look at the first few rows of the data-frame.
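A minimal sketch of loading the data-set into a pandas data-frame, assuming the standard scikit-learn loader (the exact loading code is not shown in the original):

```python
# Load the scikit-learn breast cancer data-set into a pandas DataFrame
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target  # 0 = malignant, 1 = benign

print(df.shape)   # (569, 31) -> 569 samples, 30 features + target column
print(df.head())  # first few rows of the data-frame
```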
As we can see, there are a total of 569 samples and 30 features in the data-set, and our task is to classify malignant samples from benign samples. After checking that there are no missing data, we check the feature names and the correlation plots of the mean features.
Below is the correlation plot of the mean features, plotted using the seaborn library. As expected, 'area', 'perimeter', and 'radius' are highly correlated.
Fig. 1: Correlation plot of mean features.
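A sketch of how such a correlation plot can be produced with seaborn's heatmap; the figure styling here is an assumption, not the author's exact code:

```python
# Correlation heatmap of the ten "mean" features (column names follow
# scikit-learn's naming convention, e.g. "mean radius", "mean texture")
import pandas as pd
import seaborn as sns
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)

# Select the "mean ..." columns and compute their pairwise correlations
mean_cols = [c for c in df.columns if c.startswith("mean")]
corr = df[mean_cols].corr()

plt.figure(figsize=(9, 7))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation of mean features")
plt.tight_layout()
```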
We can use seaborn's 'jointplot' to understand the relationship between individual features. Let's see 2 examples below where, as an alternative to scatter plots, I have opted for 2D density plots. On the right panel, I used the 'hex' setting, where, along with histograms, we can see how many points are concentrated in each small hexagonal area. The darker the hexagon, the more points (observations) fall in that region, and this intuition can also be checked against the histograms plotted along the margins for the 2 features.
Fig. 2: Joint-plots can carry more info than simple scatter plots.
On the left, apart from the histograms of the individual features plotted along the margins, the contours represent a 2D kernel density estimate (KDE). Instead of discrete histograms, KDEs are often useful, and you can find one fantastic explanation here.
We can also draw some pair plots to study which features are 'more relevant' for separating malignant from benign samples. Let's see one example below —
Fig. 3: Pair plots of few features in Cancer data-set. Code can be found in my GitHub.
Once we have played with the data-set enough to explore and understand what we have in hand, let's move on to the main classification task.
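The classification task rests on the pipeline described at the start: scale the features, reduce with PCA, classify with an SVM, and tune the hyper-parameters by grid-search cross-validation. A minimal sketch under those assumptions (the parameter grid values here are illustrative, not the author's exact choices):

```python
# Pipeline: standardize -> PCA -> SVM, tuned with grid-search CV
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),  # PCA and SVMs are scale-sensitive
    ("pca", PCA()),
    ("svm", SVC(kernel="rbf")),
])

# Illustrative grid; step names prefix the parameter names
param_grid = {
    "pca__n_components": [2, 4, 6],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01, 0.001],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:  %.3f" % search.score(X_test, y_test))
```

Keeping the scaler and PCA inside the pipeline matters: each cross-validation fold then fits them on its own training split, avoiding leakage from the held-out fold.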