
Reunderstanding The Mathematics Behind Principal Component Analysis

Source: https://xijunlee.github.io/2019/03/10/reunderstanding-of-PCA/

As we all know, Principal Component Analysis (PCA) is a dimensionality reduction algorithm that can be used to significantly speed up your unsupervised feature learning algorithm. Most of us only know the procedure of PCA. In other words, we know how to use the algorithm but not where it comes from. In this post, I summarize the procedure and the mathematics of PCA based on the reference materials listed at the end.

Procedure of PCA

Suppose we have a dataset $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ with $n$-dimensional inputs. We want to reduce the data from $n$ dimensions to $k$ dimensions ($k \ll n$) using PCA. The procedure of PCA is shown below (owing to a mathematical formula rendering problem, I use a picture to display the procedure; a code sketch of the same steps follows the figure):

[Figure: Procedure of PCA]
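Since the figure may not render everywhere, here is a minimal NumPy sketch of the same standard steps (my own illustration, not the author's code): mean normalization, computing the covariance matrix, eigendecomposition, and projection onto the top-$k$ eigenvectors.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch: X is an (m, n) data matrix, k the target dimension."""
    # 1. Mean-normalize each feature.
    X = X - X.mean(axis=0)
    m = X.shape[0]
    # 2. Covariance matrix Sigma = (1/m) * sum_i x_i x_i^T, shape (n, n).
    Sigma = (X.T @ X) / m
    # 3. Eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    # 4. New basis U: the k eigenvectors with the largest eigenvalues.
    U = eigvecs[:, ::-1][:, :k]          # shape (n, k)
    # 5. Project the data onto the new basis.
    Z = X @ U                            # shape (m, k)
    return Z, U

# Example: reduce 5-dimensional data to 2 dimensions.
X = np.random.randn(100, 5)
Z, U = pca(X, k=2)
print(Z.shape)   # (100, 2)
```

Using `np.linalg.eigh` rather than `np.linalg.eig` exploits the fact that the covariance matrix is symmetric, so the eigenvalues are real and come back sorted in ascending order.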

Mathematics behind PCA

In the previous section, I presented the procedure of PCA. It looks like a simple algorithm. However, do we actually know why we compute the eigenvectors of the covariance matrix to get the basis of the new subspace?

The motivation of PCA is to retain as much information of the raw data as possible while projecting it into a lower-dimensional subspace. In other words, we hope to preserve as much of the raw data's variance as possible in the new space.

In formula (5), we see the projection of the raw data. The length of the projection of $x$ onto a unit vector $u$ is given by $x^{T}u$; i.e., if $x^{(i)}$ is a point in the dataset, then its projection onto $u$ lies at distance $x^{(i)T}u$ from the origin. Hence, to maximize the variance of the projected data, we can formulate the following optimization problem:

$$\max_{u}\ \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)T}u\right)^{2} \tag{6}$$
$$\text{s.t.}\quad \|u\|=1 \tag{7}$$
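As a quick numeric illustration of this objective (an example of my own, with made-up data): for a unit vector $u$, $x^{T}u$ is the signed projection length of $x$ onto $u$, and formula (6) is just the empirical mean of the squared projection lengths of the mean-normalized data.

```python
import numpy as np

x = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])               # a unit vector, ||u|| = 1
print(x @ u)                           # 3.0: projection length of x onto u

# Mean-normalized toy dataset, stretched along the first axis.
X = np.random.randn(1000, 2) * np.array([3.0, 0.5])
X = X - X.mean(axis=0)
objective = np.mean((X @ u) ** 2)      # (1/m) * sum_i (x_i^T u)^2, formula (6)
print(objective)                       # variance of the data along direction u (about 9)
```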

This is a constrained maximization problem. Since $(x^{(i)T}u)^{2} = u^{T}x^{(i)}x^{(i)T}u$, it can be reformulated as:

$$\max_{u}\ u^{T}\left(\frac{1}{m}\sum_{i=1}^{m}x^{(i)}x^{(i)T}\right)u = u^{T}\Sigma u \tag{8}$$
$$\text{s.t.}\quad u^{T}u=1 \tag{9}$$
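A short numeric check of this reformulation (a sketch of my own, assuming mean-normalized data stored as an $(m, n)$ NumPy array):

```python
import numpy as np

m, n = 500, 3
X = np.random.randn(m, n)
X = X - X.mean(axis=0)                 # mean-normalized data, shape (m, n)

u = np.random.randn(n)
u = u / np.linalg.norm(u)              # random unit vector, so u^T u = 1

Sigma = (X.T @ X) / m                  # (1/m) * sum_i x_i x_i^T
lhs = np.mean((X @ u) ** 2)            # left-hand side: formula (6)
rhs = u @ Sigma @ u                    # right-hand side: u^T Sigma u, formula (8)
print(np.isclose(lhs, rhs))            # True
```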

To get the (local) optimum of the constrained problem, we use the method of Lagrange multipliers:

$$\mathcal{L}(u,\lambda) = u^{T}\Sigma u - \lambda\,(u^{T}u - 1) \tag{10}$$

Note that $\lambda$ here is the Lagrange multiplier; it is not assumed in advance to be an eigenvalue of the covariance matrix. Now let us take the partial derivatives of formula (10) with respect to $u$ and $\lambda$ and set them to zero:

$$\frac{\partial \mathcal{L}}{\partial u} = 2\Sigma u - 2\lambda u = 0 \quad\Longrightarrow\quad \Sigma u = \lambda u \tag{11}$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = -(u^{T}u - 1) = 0 \quad\Longrightarrow\quad u^{T}u = 1 \tag{12}$$

We can see that formula (11) says $u$ must be an eigenvector of the covariance matrix $\Sigma$, which is exactly formula (3) in the procedure. Moreover, left-multiplying $\Sigma u = \lambda u$ by $u^{T}$ gives $u^{T}\Sigma u = \lambda$, so the variance captured along an eigenvector equals its eigenvalue. Thus choosing the new basis of the lower-dimensional space is equivalent to taking the top-$k$ eigenvectors (those with the largest eigenvalues) of the covariance matrix of the raw data.

See! The dimensionality reduction problem is intrinsically a constrained maximization problem, and the method of Lagrange multipliers solves it.
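To close the loop, here is an illustrative check (my own, not part of the original post) that the eigenvector of $\Sigma$ with the largest eigenvalue attains the maximum of $u^{T}\Sigma u$ over unit vectors, and that the maximum value equals that eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # correlated toy data
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / X.shape[0]

eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
u_top = eigvecs[:, -1]                     # eigenvector with the largest eigenvalue

# Variance along the top eigenvector equals the largest eigenvalue.
print(u_top @ Sigma @ u_top, eigvals[-1])

# No random unit direction captures more variance than the top eigenvector.
dirs = rng.normal(size=(1000, 4))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
variances = ((dirs @ Sigma) * dirs).sum(axis=1)
print(variances.max() <= eigvals[-1] + 1e-12)              # True
```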

Reference

http://59.80.44.100/cs229.stanford.edu/notes/cs229-notes10.pdf

http://ufldl.stanford.edu/wiki/index.php/PCA

