Dimensionality Reduction with Python


All you need to know about feature engineering

Image source: https://unsplash.com/photos/Kp9z6zcUfGw

When building a machine learning model, most likely you will not use all the variables available in your training dataset. In fact, training datasets with hundreds or thousands of features are not uncommon, creating a problem for the beginner data scientist who might struggle to decide what variables to include in her model.

There are two main groups of techniques that we can use in dimensionality reduction: feature selection and feature extraction.

For example, a 256 × 256 pixel color image can be transformed into 196,608 features (256 × 256 pixels × 3 color channels). Furthermore, because each of these values can take one of 256 possible intensities, there end up being 256^196,608 different configurations our observation can take.

This is problematic because we will practically never be able to collect enough observations to cover even a small fraction of those configurations and our learning algorithms do not have enough data to operate correctly.

Fortunately, not all features are created equal. The goal of feature extraction for dimensionality reduction is to transform our set of features so that we end up with a smaller number of features while still keeping much of the underlying information.

OK, so let’s see some hands-on Python examples starting with feature extraction techniques.

Feature Extraction

These methods work by creating new features with fewer dimensions than the original ones and similar predictive power.

Principal Components Analysis: PCA is a popular linear dimensionality reduction technique. PCA projects observations onto the (hopefully fewer) principal components of the feature matrix that retain the most variance. PCA is an unsupervised technique, meaning that it does not use the information from the target vector and instead only considers the feature matrix.

Let’s see an example of PCA with the digits dataset, which you can load from the Scikit Learn library. We can reduce dimensions just like this:

Creating a PCA that keeps 99% of the original variance
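
The original code embed is not shown here, but a minimal sketch of this step might look like the following, using scikit-learn's digits dataset; passing a float between 0 and 1 to n_components tells PCA to keep that fraction of the variance (the standardization step is my own addition):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the digits feature matrix and standardize it
digits = load_digits()
features = StandardScaler().fit_transform(digits.data)

# Create a PCA that keeps 99% of the original variance
pca = PCA(n_components=0.99, whiten=True)
features_pca = pca.fit_transform(features)

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_pca.shape[1])
```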

Now, in case your data is not linearly separable, you can use an extension of principal component analysis that uses kernels to allow for nonlinear dimensionality reduction. Let’s practice this with the make_circles dataset from Scikit Learn:

We apply Kernel PCA with a radial basis function (RBF) kernel
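
A sketch of this step, assuming the make_circles data mentioned above (gamma=15 is an illustrative choice):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Create linearly inseparable data: two concentric circles, 2 features
features, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# Apply kernel PCA with a radial basis function (RBF) kernel
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
features_kpca = kpca.fit_transform(features)

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_kpca.shape[1])
```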

This results in the number of features being reduced from 2 to 1.

Non-Negative Matrix Factorization: NMF is an unsupervised technique for linear dimensionality reduction that factorizes the feature matrix (i.e., breaks it up into multiple matrices whose product approximates the original matrix) into matrices representing the latent relationship between observations and their features.

Reducing dimensions of digits dataset with NMF
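
A minimal sketch of this step; n_components=10 is an illustrative choice, and note that NMF requires a non-negative feature matrix, which the digits data satisfies:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

# Load the digits feature matrix (all pixel values are non-negative)
digits = load_digits()
features = digits.data

# Factorize the feature matrix into 10 latent features
nmf = NMF(n_components=10, random_state=1)
features_nmf = nmf.fit_transform(features)

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_nmf.shape[1])
```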

Feature Selection

These methods work by selecting high-quality, informative features and dropping the less useful ones.

Highly Correlated Features: we can use a correlation matrix to check for highly correlated features. If highly correlated features exist, we drop one of the correlated features, just like in this example:

We create correlated features and then drop one of each pair whose correlation is higher than 0.95
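
A sketch of this idea with a small, made-up feature matrix; the 0.95 threshold matches the caption, everything else is illustrative:

```python
import numpy as np
import pandas as pd

# A small feature matrix where the first two columns are highly correlated
features = np.array([[1, 1, 1],
                     [2, 2, 0],
                     [3, 3, 1],
                     [4, 4, 0],
                     [5, 5, 1],
                     [6, 6, 0],
                     [7, 7, 1],
                     [8, 7, 0],
                     [9, 7, 1]])
df = pd.DataFrame(features)

# Absolute correlation matrix, keeping only the upper triangle
corr_matrix = df.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

# Drop one column from every pair with correlation higher than 0.95
to_drop = [column for column in upper.columns if any(upper[column] > 0.95)]
df_reduced = df.drop(columns=to_drop)

print(df_reduced.head(3))
```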

Irrelevant Features in Classification: if the features are categorical, we can compute the chi-square (χ²) statistic between each feature and the target vector. We can do this with the popular Iris dataset just like so:

We use Chi2 to select the 2 best features
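
A minimal sketch of chi-square-based selection on the Iris data; casting the features to integers is my own simplification to make them categorical-like:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Load the Iris data; chi-square expects non-negative, count-like features
iris = load_iris()
features = iris.data.astype(int)
target = iris.target

# Select the two features with the highest chi-square statistics
chi2_selector = SelectKBest(chi2, k=2)
features_kbest = chi2_selector.fit_transform(features, target)

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_kbest.shape[1])
```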

Recursive Feature Elimination: Scikit-learn also has a class called RFECV to conduct recursive feature elimination (RFE) with cross-validation (CV). That is, it repeatedly trains a model, removing a feature each time, until model performance (e.g., accuracy) starts to degrade. The remaining features are the best ones, like in this example:

Using mean squared error to select the best features out of 100 different variables
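
A sketch of this step, generating 100 synthetic features with make_regression and scoring candidates by (negative) mean squared error; the exact data and estimator are assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Generate a regression problem with 100 features, only a few of them informative
features, target = make_regression(n_samples=10000,
                                   n_features=100,
                                   n_informative=2,
                                   random_state=1)

# Recursively eliminate features, using cross-validated (negative) MSE
# to decide how many features to keep
ols = LinearRegression()
rfecv = RFECV(estimator=ols, step=1, scoring="neg_mean_squared_error")
rfecv.fit(features, target)

print("Optimal number of features:", rfecv.n_features_)
```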

Conclusion

Well, as we have seen, both feature extraction and feature selection can help us reduce dimensionality. Feature selection (supervised or unsupervised) is for filtering irrelevant or redundant features from your dataset. The key difference between feature selection and extraction is that feature selection keeps a subset of the original features while feature extraction creates brand new ones.

Of course, some supervised algorithms already have built-in feature selection; for example, you can use Regularized Regression or Random Forests and not worry as much about the number of dimensions.
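
As an illustration of that built-in selection, here is a small sketch using Lasso (an L1-regularized regression), where uninformative features end up with coefficients of exactly zero; the data and alpha value are made up for the example:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 features, only 5 of them actually informative
features, target = make_regression(n_samples=1000,
                                   n_features=100,
                                   n_informative=5,
                                   random_state=1)

# L1 regularization drives the coefficients of useless features to zero,
# effectively performing feature selection during training
lasso = Lasso(alpha=1.0)
lasso.fit(features, target)

print("Features with non-zero coefficients:", np.sum(lasso.coef_ != 0))
```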

However, if these algorithms don’t suit your particular problem and the number of features is large, you will want to use some of the techniques described in this post to reduce dimensionality. You can even combine multiple methods if needed.

Alright, so that was it for today, hope you enjoyed this post.

Happy coding!
