
What Is the Singular Value Decomposition?

source link: https://nhigham.com/2020/10/13/what-is-the-singular-value-decomposition/


A singular value decomposition (SVD) of a matrix A\in\mathbb{R}^{m\times n} is a factorization

\notag     A = U\Sigma V^T,

where U\in\mathbb{R}^{m\times m} and V\in\mathbb{R}^{n\times n} are orthogonal, \Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}, where p = \min(m,n), and \sigma_1\ge \sigma_2\ge \cdots \ge \sigma_p \ge 0.

Partition U =[ u_1,\dots,u_m] and V = [v_1,\dots, v_n]. The \sigma_i are called the singular values of A and the u_i and v_i are the left and right singular vectors. We have Av_i = \sigma_i u_i, i = 1 \colon p. The matrix \Sigma is unique but U and V are not. The form of \Sigma is

\notag  \Sigma = \left[\begin{array}{ccc} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ \hline & \text{\Large 0} & \end{array}\right] ~\mathrm{for}~ m \ge n, \quad \Sigma = \left[\begin{array}{ccc|c} \sigma_1 & & & \\ & \ddots & & \text{\Large 0} \\ & & \sigma_m & \end{array}\right] ~\mathrm{for}~ m \le n.

Here is an example, in which the entries of A have been specially chosen to give simple forms for the elements of the factors:

\notag A = \left[\begin{array}{rr} 0 & \frac{4}{3}\\[\smallskipamount]                        -1 & -\frac{5}{3}\\[\smallskipamount]                        -2 & -\frac{2}{3} \end{array}\right] = \underbrace{ \displaystyle\frac{1}{3} \left[\begin{array}{rrr} 1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1 \end{array}\right] }_U \mskip5mu \underbrace{ \left[\begin{array}{cc} 2\,\sqrt{2} & 0\\ 0 & \sqrt{2}\\ 0 & 0 \end{array}\right] }_{\Sigma} \mskip5mu \underbrace{ \displaystyle\frac{1}{\sqrt{2}} \left[\begin{array}{cc} 1 & 1\\ 1 & -1  \end{array}\right] }_{V^T}.
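As a quick numerical check of this factorization, here is a minimal sketch using NumPy (a tool choice not made in the original post). numpy.linalg.svd returns U, the singular values, and V^T; since U and V are not unique, the computed singular vectors may differ from those above by signs.

```python
import numpy as np

A = np.array([[ 0.0,  4/3],
              [-1.0, -5/3],
              [-2.0, -2/3]])

U, s, Vt = np.linalg.svd(A)            # full SVD: U is 3x3, s holds the singular values, Vt is 2x2
print(s)                               # [2.82842712 1.41421356] = [2*sqrt(2), sqrt(2)]

Sigma = np.zeros_like(A)               # embed the singular values in a 3x2 matrix
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))  # True: the factors reproduce A
```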

The power of the SVD is that it reveals a great deal of useful information about norms, rank, and subspaces of a matrix and it enables many problems to be reduced to a trivial form.

Since U and V are nonsingular, \mathrm{rank}(A) = \mathrm{rank}(\Sigma) = r, where r \le p is the number of nonzero singular values. Since the 2-norm and Frobenius norm are invariant under orthogonal transformations, \|A\| = \|\Sigma\| for both norms, giving

\notag   \|A\|_2 = \sigma_1, \quad   \|A\|_F = \Bigl(\displaystyle\sum_{i=1}^r \sigma_i^2\Bigr)^{1/2},

and hence \|A\|_2 \le \|A\|_F \le r^{1/2} \|A\|_2. The range space and null space of A are given in terms of the columns of U and V by

\notag \begin{aligned} \mathrm{null}(A)  &= \mathrm{span} \{ v_{r+1}, \dots,v_n \},\\ \mathrm{range}(A) &= \mathrm{span} \{u_1,u_2,\dots, u_r\}. \end{aligned}
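These facts translate directly into computation. The following NumPy sketch (an illustration with an arbitrarily chosen rank-2 test matrix) reads the numerical rank, the 2- and Frobenius norms, and orthonormal bases for the range and null space off the SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # a 5x3 matrix of rank 2
U, s, Vt = np.linalg.svd(A)

tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]   # a common tolerance for numerical rank
r = int(np.sum(s > tol))                            # r = 2 here

norm2 = s[0]                                        # ||A||_2 = sigma_1
normF = np.sqrt(np.sum(s**2))                       # ||A||_F = (sum of sigma_i^2)^(1/2)

range_basis = U[:, :r]                              # orthonormal basis for range(A)
null_basis  = Vt[r:, :].T                           # orthonormal basis for null(A)

print(r,
      np.isclose(norm2, np.linalg.norm(A, 2)),
      np.isclose(normF, np.linalg.norm(A, 'fro')),
      np.allclose(A @ null_basis, 0))
```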

We can write the SVD as

\notag \qquad\qquad     A = \begin{bmatrix} u_1, u_2, \dots, u_r \end{bmatrix}         \mathrm{diag}(\sigma_1,\dots, \sigma_r)        \begin{bmatrix} v_1^T\\ v_2^T\\ \vdots\\ v_r^T \end{bmatrix}     = \displaystyle\sum_{i=1}^{r} \sigma_i u_i v_i^T,  \qquad\qquad(*)

which expresses A as a sum of r rank-1 matrices, the ith of which has 2-norm \sigma_i. The famous Eckart–Young theorem (1936) says that

\notag  \min_{\mathrm{rank}(B) = k} \|A-B\|_q =  \begin{cases}      \sigma_{k+1},                                & q = 2,  \\      \Bigl(\sum_{i=k+1}^r \sigma_i^2\Bigr)^{1/2}, & q = F,   \end{cases}

and that the minimum is attained at

\notag    A_k = U D_k V^T, \quad    D_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0).

In other words, truncating the sum (*) after k < r terms gives the best rank-k approximation to A in both the 2-norm and the Frobenius norm. In particular, this result implies that when A has full rank the distance from A to the nearest rank-deficient matrix is \sigma_r.
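A minimal NumPy sketch of this result (with an arbitrary random test matrix) truncates the SVD and checks the error formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)    # thin SVD

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]         # truncate the sum (*) after k terms

# Errors match the Eckart-Young formulas: sigma_{k+1} in the 2-norm and
# (sigma_{k+1}^2 + ... + sigma_r^2)^(1/2) in the Frobenius norm.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))
print(np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2))))
```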

Relations with Symmetric Eigenvalue Problem

The SVD is not directly related to the eigenvalues and eigenvectors of A. However, for m\ge n, A = U \Sigma V^T implies

\notag  A^T\!A = V \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2) V^T, \quad  AA^T = U \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2,\underbrace{0,\dots,0}_{m-n}) U^T,

so the singular values of A are the square roots of the eigenvalues of the symmetric positive semidefinite matrices A^T\!A and AA^T (modulo m-n zeros in the latter case), and the singular vectors are eigenvectors. Moreover, the eigenvalues of the (m+n)\times (m+n) matrix

\notag     C = \begin{bmatrix}               0 & A \\               A^T & 0         \end{bmatrix}

are plus and minus the singular values of A, together with |m-n| additional zeros if m \ne n, and the eigenvectors of C and the singular vectors of A are also related.

Consequently, by applying results or algorithms for the eigensystem of a symmetric matrix to A^T\!A, AA^T, or C one obtains results or algorithms for the singular value decomposition of A.
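Both relations are easy to verify numerically. This NumPy sketch (with an arbitrary random A) compares the eigenvalues of A^T A and of C with the singular values of A:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)          # singular values, in decreasing order

evals = np.linalg.eigvalsh(A.T @ A)[::-1]       # eigenvalues of A^T A, in decreasing order
print(np.allclose(np.sqrt(evals), s))           # sigma_i = sqrt(lambda_i(A^T A))

C = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])
evals_C  = np.sort(np.linalg.eigvalsh(C))
expected = np.sort(np.concatenate([s, -s, np.zeros(m - n)]))
print(np.allclose(evals_C, expected))           # +/- sigma_i plus |m - n| zeros
```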

Connections with Other Problems

The pseudoinverse of a matrix A\in\mathbb{R}^{m\times n} can be expressed in terms of the SVD as

\notag    A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^T.

The least squares problem \min_x \|b - Ax\|_2, where A\in\mathbb{R}^{m\times n} with m \ge n, is solved by x = A^+b, and when A is rank-deficient this is the solution of minimum 2-norm. For m < n the system is underdetermined and x = A^+b again gives the minimum 2-norm solution.
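As an illustration (a sketch assuming NumPy, with a full-rank random A so that r = n), the pseudoinverse built from the SVD reproduces the least squares solution:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))                  # full rank, m > n
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_plus = Vt.T @ np.diag(1 / s) @ U.T             # A^+ = V diag(sigma_i^{-1}) U^T (here r = n)

x = A_plus @ b                                   # least squares solution
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref), np.allclose(A_plus, np.linalg.pinv(A)))
```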

For square A we can write A = U\Sigma V^T = UV^T \cdot V \Sigma V^T \equiv PQ, where P is orthogonal and Q is symmetric positive semidefinite. This decomposition A = PQ is the polar decomposition and Q = (A^T\!A)^{1/2} is unique. This connection between the SVD and the polar decomposition is useful both theoretically and computationally.
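A short NumPy sketch (for a random square A) forms the polar factors from the SVD and checks their properties:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(A)

P = U @ Vt                      # orthogonal polar factor
Q = Vt.T @ np.diag(s) @ Vt      # symmetric positive semidefinite factor

print(np.allclose(P @ Q, A),
      np.allclose(P.T @ P, np.eye(4)),
      np.allclose(Q, Q.T))
```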

Applications

The SVD is used in a very wide variety of applications—too many and varied to attempt to summarize here. We just mention two.

The SVD can be used to help identify to which letters vowels and consonants have been mapped in a substitution cipher (Moler and Morrison, 1983).

An inverse use of the SVD is to construct test matrices by forming a diagonal matrix of singular values from some distribution then pre- and post-multiplying by random orthogonal matrices. The result is matrices with known singular values and 2-norm condition number that are nevertheless random. Such “randsvd” matrices are widely used to test algorithms in numerical linear algebra.
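Here is a minimal sketch of the idea in NumPy (the helper random_orthogonal is illustrative, not a library routine): prescribe singular values with a chosen condition number, then pre- and post-multiply by random orthogonal matrices obtained from QR factorizations of Gaussian matrices.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))               # sign fix so Q is Haar distributed

rng = np.random.default_rng(5)
n, kappa = 6, 1e6
sigma = np.logspace(0, -np.log10(kappa), n)      # singular values from 1 down to 1/kappa

A = random_orthogonal(n, rng) @ np.diag(sigma) @ random_orthogonal(n, rng).T

print(np.allclose(np.linalg.svd(A, compute_uv=False), sigma))  # prescribed singular values
print(np.isclose(np.linalg.cond(A), kappa))                    # known 2-norm condition number
```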

History and Computation

The SVD was introduced independently by Beltrami in 1873 and Jordan in 1874. Golub popularized the SVD as an essential computational tool and developed the first reliable algorithms for computing it. The Golub–Reinsch algorithm, dating from the late 1960s and based on bidiagonalization and the QR algorithm, is the standard way to compute the SVD. Various alternatives are available; see the references.

References

This is a minimal set of references, which contain further useful references within.

