Comprehensive Guide on Singular Value Decomposition

Last updated: Aug 11, 2023
Tags: Linear Algebra

Singular value decomposition, or SVD for short, is widely regarded as one of the highlights of linear algebra. In this guide, we will go over the theory (including a rigorous proof 😺) behind this remarkable technique. In the next guide, we will apply SVD to perform data compression - trust me, the elegance of SVD will blow your mind 🤯.

Before we dive into the theory of SVD, let's first prove the following theorem that will be useful later.

Theorem.

Eigenvalues of the product between A transpose and A are real and non-negative

If $\boldsymbol{A}$ is an $m\times{n}$ matrix, then the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ are real and non-negative.

Proof. By theoremlink, we know that $\boldsymbol{A}^T\boldsymbol{A}$ is symmetric. This means that the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ are all real by propertylink. Great, we're half done!

Let's now show that the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ are non-negative. Suppose $\lambda$ is an eigenvalue of $\boldsymbol{A}^T\boldsymbol{A}$ with an eigenvector of $\boldsymbol{v}$. By definitionlink of eigenvalues and eigenvectors, the following equation holds:

$$\begin{equation}\label{eq:WgjFuYwXGMjRhEdgy66} (\boldsymbol{A}^T\boldsymbol{A})\boldsymbol{v}= \lambda\boldsymbol{v} \end{equation}$$

Now, the squared magnitude of $\boldsymbol{Av}$ can be expressed as:

$$\begin{equation}\label{eq:IoKMQwz3N3snBKWh1lF} \begin{aligned}[b] \Vert\boldsymbol{Av}\Vert^2 &=(\boldsymbol{Av})^T(\boldsymbol{Av})\\ &=\boldsymbol{v}^T\boldsymbol{A}^T\boldsymbol{Av}\\ &=\boldsymbol{v}^T(\boldsymbol{A}^T\boldsymbol{Av})\\ &=\boldsymbol{v}^T(\lambda\boldsymbol{v})\\ &=\lambda\boldsymbol{v}^T\boldsymbol{v}\\ &=\lambda\Vert\boldsymbol{v}\Vert^2 \end{aligned} \end{equation}$$

Here:

  • the first equality holds because the squared magnitude of a vector is the dot product of the vector with itself, that is, $\Vert\boldsymbol{x}\Vert^2=\boldsymbol{x}^T\boldsymbol{x}$.

  • the second equality holds because $(\boldsymbol{AB})^T=\boldsymbol{B}^T\boldsymbol{A}^T$.

  • the fourth equality holds by the eigenvalue equation \eqref{eq:WgjFuYwXGMjRhEdgy66}.

Now, rearranging \eqref{eq:IoKMQwz3N3snBKWh1lF} to make $\lambda$ the subject gives:

$$\lambda= \frac{\Vert\boldsymbol{Av}\Vert^2} {\Vert\boldsymbol{v}\Vert^2}$$

Note the following:

  • the numerator is non-negative because it is a squared magnitude. Note that $\boldsymbol{Av}$ could equal the zero vector, in which case the numerator is zero and thus $\lambda=0$.

  • the denominator cannot be zero because $\boldsymbol{v}\ne\boldsymbol{0}$ by definitionlink of eigenvector.

Therefore, we conclude that $\lambda\ge0$. This completes the proof.
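As a quick numerical sanity check of this theorem, here is a short NumPy sketch (the matrix shape and random seed are arbitrary choices for illustration; the tiny tolerance only guards against floating-point rounding):

import numpy as np

# Any rectangular matrix will do for this check.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# A^T A is symmetric, so eigvalsh applies and returns real eigenvalues.
eigenvalues = np.linalg.eigvalsh(A.T @ A)

print(eigenvalues)                    # all real
print(np.all(eigenvalues >= -1e-12))  # True, i.e. non-negative up to rounding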

Unlike other matrix decomposition techniques such as eigen-decompositionlink, QR and LU factorization, SVD works for matrices of any shape - not just for square matrices. The downside is that SVD is quite complicated and introduces new concepts such as singular values and singular matrices. Don't worry too much about them for now - we will go over them in the proof and also go through an example of actually computing them later!

Theorem.

Singular value decomposition

If $\boldsymbol{A}$ is an $m\times{n}$ matrix with ranklink $r$, then $\boldsymbol{A}$ can be expressed as:

$$\begin{equation}\label{eq:NkPUWCvElLUeZGmecmX} \boldsymbol{A}= \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T \end{equation}$$

Where:

  • $\boldsymbol{U}$ is an $m\times{m}$ orthogonal matrix.

  • $\boldsymbol{\Sigma}$ is an $m\times{n}$ matrix called the singular matrix of $\boldsymbol{A}$.

  • $\boldsymbol{V}$ is an $n\times{n}$ orthogonal matrix.

We call \eqref{eq:NkPUWCvElLUeZGmecmX} the singular value decomposition (SVD) of $\boldsymbol{A}$.

More specifically, $\boldsymbol{V}$ is an $n\times{n}$ orthogonal matrix that orthogonally diagonalizeslink $\boldsymbol{A}^T\boldsymbol{A}$. Let the column vectors of $\boldsymbol{V}$ be $\boldsymbol{v}_1$, $\boldsymbol{v}_2$, $\cdots$, $\boldsymbol{v}_n$ and let the column vectors of $\boldsymbol{U}$ be $\boldsymbol{u}_1$, $\boldsymbol{u}_2$, $\cdots$, $\boldsymbol{u}_m$. The column vectors $\boldsymbol{u}_i$ for $i=1,2,\cdots,r$ are computed by:

$$\boldsymbol{u}_i=\frac{\boldsymbol{Av}_i}{\Vert \boldsymbol{Av}_i\Vert}$$

While the rest of the column vectors $\boldsymbol{u}_{r+1}$, $\boldsymbol{u}_{r+2}$, $\cdots$, $\boldsymbol{u}_m$ are chosen such that:

$$\{\boldsymbol{u}_1, \boldsymbol{u}_2,\cdots, \boldsymbol{u}_r,\boldsymbol{u}_{r+1}, \boldsymbol{u}_{r+2}, \cdots,\boldsymbol{u}_m\}\;\; \text{is an orthonormal basis for }\;\mathbb{R}^m$$

Next, $\boldsymbol{\Sigma}$ is called a singular matrix of $\boldsymbol{A}$ and has the following block formlink:

$$\boldsymbol{\Sigma}=\begin{pmatrix} \boldsymbol{D}_\boldsymbol{A}&\boldsymbol{0}_{r\times(n-r)}\\ \boldsymbol{0}_{(m-r)\times{r}}&\boldsymbol{0}_{(m-r)\times(n-r)} \end{pmatrix}$$

Where:

  • $\boldsymbol{D}_\boldsymbol{A}$ is an $r\times{r}$ diagonal matrixlink whose diagonal entries are the first $r$ singular values of $\boldsymbol{A}$ in descending order. A singular value is defined as the square root of an eigenvalue of $\boldsymbol{A}^T\boldsymbol{A}$.

  • $\boldsymbol{0}$ is a zero matrix whose shape is stated in the subscript.

Note that the convention of our guides is typically to denote orthogonal matrices as $\boldsymbol{Q}$, but most textbooks use the letters $\boldsymbol{U}$, $\boldsymbol{\Sigma}$ and $\boldsymbol{V}$ for SVD so we shall respect that.

Proof. Let $\boldsymbol{A}$ be an $m\times{n}$ matrix. By theoremlink, $\boldsymbol{A}^T\boldsymbol{A}$ is symmetric, which means that by the principal axis theoremlink, $\boldsymbol{A}^T\boldsymbol{A}$ can be orthogonally diagonalized as:

$$\boldsymbol{A}^T\boldsymbol{A}= \boldsymbol{V} \boldsymbol{D} \boldsymbol{V}^T$$

Where:

  • $\boldsymbol{V}$ is an $n\times{n}$ orthogonal matrix whose column vectors are the eigenvectors $\boldsymbol{v}_1$, $\boldsymbol{v}_2$, $\cdots$, $\boldsymbol{v}_n$ of $\boldsymbol{A}^T\boldsymbol{A}$.

  • $\boldsymbol{D}$ is an $n\times{n}$ diagonal matrix whose diagonal entries are the eigenvalues $\lambda_1$, $\lambda_2$, $\cdots$, $\lambda_n$ of $\boldsymbol{A}^T\boldsymbol{A}$. These eigenvalues are real and non-negative by propertylink.

Now, let's assign the variable names such that:

$$\lambda_1\ge\lambda_2\ge \cdots\ge\lambda_r\gt0$$

While the rest of the eigenvalues are zero:

$$\begin{equation}\label{eq:RFQFFJxhYVO0VbmOC1k} \lambda_{r+1}=\lambda_{r+2}=\cdots=\lambda_{n}=0 \end{equation}$$

For instance, suppose $\boldsymbol{A}^T\boldsymbol{A}$ has three eigenvalues $5$, $0$ and $8$. In this case, we will assign $\lambda_1=8$, $\lambda_2=5$ and $\lambda_3=0$, while the value of $r$ would be $2$.
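As a small sketch of this relabelling step, assuming NumPy and the three eigenvalues from the example above:

import numpy as np

eigenvalues = np.array([5.0, 0.0, 8.0])

# Relabel so that lambda_1 >= lambda_2 >= ... >= lambda_n.
sorted_eigenvalues = np.sort(eigenvalues)[::-1]   # array([8., 5., 0.])

# r counts the strictly positive eigenvalues.
r = np.count_nonzero(sorted_eigenvalues > 0)      # 2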

Relationship between rank of matrix and eigenvalues

We will now show that the value of $r$ is equal to the ranklink of $\boldsymbol{A}$. Suppose $\lambda_i$ and $\lambda_j$ are eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ with associated eigenvectors $\boldsymbol{v}_i$ and $\boldsymbol{v}_j$ respectively. By definitionlink of eigenvalues and eigenvectors, we have that:

$$\begin{equation}\label{eq:KSjlGyJitkTnUxpHltU} (\boldsymbol{A}^T\boldsymbol{A}) \boldsymbol{v}_j= \lambda_j \boldsymbol{v}_j \end{equation}$$

Now, the dot product of $\boldsymbol{Av}_i$ and $\boldsymbol{Av}_j$ is:

$$\begin{equation}\label{eq:WnMilyxYTWTJFNMJXbk} \begin{aligned}[b] (\boldsymbol{Av}_i)\cdot (\boldsymbol{Av}_j)&= (\boldsymbol{Av}_i)^T (\boldsymbol{Av}_j)\\&= (\boldsymbol{v}_i^T\boldsymbol{A}^T) (\boldsymbol{Av}_j)\\&= \boldsymbol{v}_i^T\big((\boldsymbol{A}^T\boldsymbol{A}) \boldsymbol{v}_j\big)\\&= \boldsymbol{v}_i^T(\lambda_j\boldsymbol{v}_j) \\&=\lambda_j(\boldsymbol{v}_i^T\boldsymbol{v}_j) \\&=\lambda_j(\boldsymbol{v}_i\cdot\boldsymbol{v}_j) \end{aligned} \end{equation}$$

Here:

  • the first equality holds because the dot product $\boldsymbol{x}\cdot\boldsymbol{y}$ can be written as the matrix product $\boldsymbol{x}^T\boldsymbol{y}$.

  • the second equality holds because $(\boldsymbol{AB})^T=\boldsymbol{B}^T\boldsymbol{A}^T$.

  • the fourth equality holds by the eigenvalue equation \eqref{eq:KSjlGyJitkTnUxpHltU}.

We now consider two cases - when $i=j$ and $i\ne{j}$. When $i=j$, \eqref{eq:WnMilyxYTWTJFNMJXbk} becomes:

$$\begin{equation}\label{eq:i2OnmX79bDTac17OpXD} \begin{aligned}[b] (\boldsymbol{Av}_i)\cdot (\boldsymbol{Av}_i)&= \lambda_i(\boldsymbol{v}_i\cdot\boldsymbol{v}_i)\\ \Vert\boldsymbol{Av}_i\Vert^2&= \lambda_i\Vert\boldsymbol{v}_i\Vert^2\\ \Vert\boldsymbol{Av}_i\Vert^2&=\lambda_i(1)^2\\ \Vert\boldsymbol{Av}_i\Vert^2&=\lambda_i, \;\;\;\;\; \mathrm{for }\;i=1,2,\cdots,n \end{aligned} \end{equation}$$

Here:

  • the 2nd equality holds by theoremlink.

  • the 3rd equality holds because $\boldsymbol{v}_i$ is a unit vector.

Since eigenvalues $\lambda_i=0$ for $i\gt{r}$ by \eqref{eq:RFQFFJxhYVO0VbmOC1k}, we have that \eqref{eq:i2OnmX79bDTac17OpXD} becomes:

$$\begin{equation}\label{eq:FNbXxNbL1tZ4IR2PotZ} \begin{aligned}[b] \Vert\boldsymbol{Av}_i\Vert^2&=0\\ \Vert\boldsymbol{Av}_i\Vert&=0\\ \boldsymbol{Av}_i&=\boldsymbol{0}, \;\;\;\;\;\;\text{for }\;i\gt{r} \\ \end{aligned} \end{equation}$$

To summarize, in the case when $i=j$, we have:

$$\begin{equation}\label{eq:P7Eg1JUr76KH2L9RVwJ} \begin{cases} \;\Vert\boldsymbol{Av}_i\Vert^2=\lambda_i,&\text{for }\;i=1,2,\cdots,r\\ \;\boldsymbol{Av}_i=\boldsymbol{0},&\text{for }\;i>r\\ \end{cases} \end{equation}$$

We now consider the case when $i\ne{j}$. Because $\{\boldsymbol{v}_1, \boldsymbol{v}_2,\cdots,\boldsymbol{v}_n\}$ is an orthonormal set, \eqref{eq:WnMilyxYTWTJFNMJXbk} becomes:

$$\begin{equation}\label{eq:KthnQldWF5qxzDqc8Rt} \begin{aligned} (\boldsymbol{Av}_i)\cdot (\boldsymbol{Av}_j)&= \lambda_j(\boldsymbol{v}_i\cdot\boldsymbol{v}_j)\\ &=\lambda_j(0)\\ &=0 \end{aligned} \end{equation}$$

Because every pair of $\boldsymbol{Av}_i$ and $\boldsymbol{Av}_j$ where $i\ne{j}$ is perpendicular, we have the following:

$$\begin{equation}\label{eq:Zb2WQDDVVHYSvKHRLXB} \{\boldsymbol{Av}_1,\boldsymbol{Av}_2,\cdots,\boldsymbol{Av}_r\} \;\text{ is an orthogonal set} \end{equation}$$

Recall that $\{\boldsymbol{v}_1,\boldsymbol{v}_2, \cdots,\boldsymbol{v}_n\}$ is an orthonormal set of $n$ vectors in $\mathbb{R}^n$ and is therefore a basislink for $\mathbb{R}^n$. This means that we can express any vector $\boldsymbol{x}$ in $\mathbb{R}^n$ using a linear combination of these vectors:

$$\begin{equation}\label{eq:tN3giIYOyd8F0yp5Gok} \boldsymbol{x}= c_1\boldsymbol{v}_1+ c_2\boldsymbol{v}_2+ \cdots+ c_n\boldsymbol{v}_n \end{equation}$$

Where $c_1$, $c_2$, $\cdots$, $c_n$ are scalar coefficients. Multiplying both sides of \eqref{eq:tN3giIYOyd8F0yp5Gok} by $\boldsymbol{A}$ gives:

$$\begin{equation}\label{eq:ZGPPmu4lCwoPLgmCfiC} \begin{aligned}[b] \boldsymbol{Ax}&= \boldsymbol{A}(c_1\boldsymbol{v}_1+ c_2\boldsymbol{v}_2+ \cdots+ c_n\boldsymbol{v}_n)\\ &=c_1\boldsymbol{A}\boldsymbol{v}_1+ c_2\boldsymbol{A}\boldsymbol{v}_2+ \cdots+ c_n\boldsymbol{A}\boldsymbol{v}_n\\ &=c_1\boldsymbol{A}\boldsymbol{v}_1+ c_2\boldsymbol{A}\boldsymbol{v}_2+ \cdots+ c_r\boldsymbol{A}\boldsymbol{v}_r+ c_{r+1}\boldsymbol{A}\boldsymbol{v}_{r+1}+ \cdots+ c_n\boldsymbol{A}\boldsymbol{v}_n \end{aligned} \end{equation}$$

By \eqref{eq:FNbXxNbL1tZ4IR2PotZ}, we have that $\boldsymbol{Av}_i=\boldsymbol{0}$ for $i\gt{r}$. Therefore, \eqref{eq:ZGPPmu4lCwoPLgmCfiC} becomes:

$$\begin{equation}\label{eq:QSbgYKJpILVoI3RG6y1} \begin{aligned}[b] \boldsymbol{Ax} &=c_1\boldsymbol{A}\boldsymbol{v}_1+ c_2\boldsymbol{A}\boldsymbol{v}_2+ \cdots+ c_r\boldsymbol{A}\boldsymbol{v}_r+ c_{r+1}(\boldsymbol{0})+\cdots+c_n(\boldsymbol{0})\\ &=c_1\boldsymbol{A}\boldsymbol{v}_1+ c_2\boldsymbol{A}\boldsymbol{v}_2+ \cdots+c_r\boldsymbol{A}\boldsymbol{v}_r \end{aligned} \end{equation}$$

This means that any vector $\boldsymbol{Ax}$ in the column spacelink of $\boldsymbol{A}$ can be expressed as a linear combination of the vectors $\boldsymbol{Av}_1$, $\boldsymbol{Av}_2$, $\cdots$, $\boldsymbol{Av}_r$. Since these vectors are non-zero and mutually orthogonal by \eqref{eq:Zb2WQDDVVHYSvKHRLXB}, they are linearly independent, and so $\{\boldsymbol{Av}_1, \boldsymbol{Av}_2, \cdots,\boldsymbol{Av}_r\}$ is a basis for the column space of $\boldsymbol{A}$. Now, recall that the ranklink of $\boldsymbol{A}$ is defined as the dimensionlink of the column space of $\boldsymbol{A}$, that is, the number of basis vectors for the column space of $\boldsymbol{A}$. Since the column space of $\boldsymbol{A}$ has $r$ basis vectors, the rank of $\boldsymbol{A}$ is $r$.
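The claim that $r$ equals the rank of $\boldsymbol{A}$ is easy to test numerically. Below is a small NumPy sketch: the matrix is deliberately built as a product of two rank-$2$ factors so that its rank is known in advance, and the threshold 1e-10 is an arbitrary cutoff for treating an eigenvalue as zero:

import numpy as np

rng = np.random.default_rng(1)
# A 6x4 matrix of rank 2, built as the product of a 6x2 and a 2x4 factor.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

# Count the strictly positive eigenvalues of A^T A.
eigenvalues = np.linalg.eigvalsh(A.T @ A)
r = np.count_nonzero(eigenvalues > 1e-10)

print(r)                          # 2
print(np.linalg.matrix_rank(A))   # 2, matching r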

* * *

Let's now define the so-called singular values.

Definition.

Singular values of a matrix

Let $\boldsymbol{A}$ be an $m\times{n}$ matrix. If $\lambda_1$, $\lambda_2$, $\cdots$, $\lambda_n$ are the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$, then the singular values $\sigma_1$, $ \sigma_2$, $\cdots$, $\sigma_n$ of $\boldsymbol{A}$ are defined as:

$$\begin{gather*} \sigma_1=\sqrt{\lambda_1}\\ \sigma_2=\sqrt{\lambda_2}\\ \vdots\\ \sigma_n=\sqrt{\lambda_n} \end{gather*}$$

In our case, the singular values of $\boldsymbol{A}$ would also be:

$$\begin{equation}\label{eq:Sjf1KMk4qGwHx7AgCU2} \begin{gathered} \sigma_1=\sqrt{\lambda_1}\\ \sigma_2=\sqrt{\lambda_2}\\ \vdots\\ \sigma_n=\sqrt{\lambda_n} \end{gathered} \end{equation}$$

Remember that the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ have been rearranged such that:

$$\begin{equation}\label{eq:JpNrPF3PbBF7Ofd8jEc} \begin{cases} \;\lambda_1\ge\lambda_2\ge \cdots\ge\lambda_r\gt0\\ \;\lambda_{r+1}= \lambda_{r+2}= \cdots=\lambda_{n}=0 \end{cases} \end{equation}$$

Since the function $f(x)=\sqrt{x}$ is monotonically increasing, taking the square root of the eigenvalues will not change the inequality \eqref{eq:JpNrPF3PbBF7Ofd8jEc}. This means that the singular values will have the following ordering:

$$\begin{equation}\label{eq:GpOVpGoLvwRm2xIK8Tm} \begin{cases} \;\sigma_1\ge\sigma_2\ge \cdots\ge\sigma_r\gt0\\ \;\sigma_{r+1}=\sigma_{r+2}= \cdots=\sigma_{n}=0 \end{cases} \end{equation}$$

Now, recall from \eqref{eq:P7Eg1JUr76KH2L9RVwJ} the following:

$$\Vert\boldsymbol{Av}_i\Vert^2=\lambda_i, \;\;\;\text{for }\;i=1,2,\cdots,r$$

Taking the square root of both sides and making $\sqrt{\lambda_i}$ the subject gives:

$$\begin{equation}\label{eq:Cim4aRi3GxTLo9LVX9j} \sqrt{\lambda_i}=\Vert\boldsymbol{Av}_i\Vert, \;\;\;\; \text{for }\;i=1,2,\cdots,r \end{equation}$$

Remember, $\lambda_i$ is non-negative so we are allowed to take its square root here. Using the definition of singular values \eqref{eq:Sjf1KMk4qGwHx7AgCU2}, we can write \eqref{eq:Cim4aRi3GxTLo9LVX9j} as:

$$\sigma_i= \Vert\boldsymbol{Av}_i\Vert ,\;\;\;\;\text{for }\; i=1,2,\cdots,r$$

By \eqref{eq:GpOVpGoLvwRm2xIK8Tm}, we have that $\sigma_i=0$ for $i\gt{r}$. Let's summarize our result:

$$\begin{equation}\label{eq:WpNcxedNGzx7R6RkhNn} \begin{cases} \;\sigma_i=\Vert\boldsymbol{Av}_i\Vert, &\text{for }\; i=1,2,\cdots,r\\ \;\sigma_i=0,&\mathrm{for }\;i\gt{r} \end{cases} \end{equation}$$
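We can also confirm the relation $\sigma_i=\Vert\boldsymbol{Av}_i\Vert$ numerically. The sketch below, assuming NumPy and an arbitrary test matrix, compares the square roots of the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ against the norms $\Vert\boldsymbol{Av}_i\Vert$:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

# Unit eigenvectors v_i of A^T A (columns of V), sorted by descending eigenvalue.
eigenvalues, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

sigma_from_eigenvalues = np.sqrt(np.clip(eigenvalues, 0, None))
sigma_from_norms = np.linalg.norm(A @ V, axis=0)   # ||A v_i|| for each column v_i

print(np.allclose(sigma_from_eigenvalues, sigma_from_norms))   # True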

We now define the so-called singular matrix.

Definition.

Singular matrix of a matrix

Let $\boldsymbol{A}$ be an $m\times{n}$ matrix of ranklink $r$ with singular values:

$$\begin{cases} \sigma_1\ge \sigma_2\ge\cdots\ge \sigma_r\gt0\\ \sigma_i=0,&\mathrm{for }\;i\gt{r} \end{cases}$$

Let diagonal matrixlink $\boldsymbol{D}_\boldsymbol{A}$ be defined as:

$$\boldsymbol{D}_\boldsymbol{A} =\begin{pmatrix} \sigma_{1}&0&\cdots&0\\ 0&\sigma_2&\cdots&0\\ \vdots&\vdots&\smash\ddots&\vdots\\ 0&0&\cdots&\sigma_r \end{pmatrix}$$

The singular matrix of $\boldsymbol{A}$, denoted as $\boldsymbol{\Sigma}_\boldsymbol{A}$ or just $\boldsymbol{\Sigma}$, is defined as the following $m\times{n}$ block matrixlink:

$$\boldsymbol{\Sigma}=\begin{pmatrix} \boldsymbol{D}_\boldsymbol{A}&\boldsymbol{0}_{r\times(n-r)}\\ \boldsymbol{0}_{(m-r)\times{r}}&\boldsymbol{0}_{(m-r)\times(n-r)} \end{pmatrix}$$

Where $\boldsymbol{0}_{r\times(n-r)}$ is a zero matrix with $r$ rows and $n-r$ columns.
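To make the block form concrete, here is a minimal sketch of how one might assemble $\boldsymbol{\Sigma}$ with NumPy; the helper name singular_matrix is ours, not a library function:

import numpy as np

def singular_matrix(singular_values, m, n):
    """Place the positive singular values on the diagonal of an m x n zero matrix."""
    Sigma = np.zeros((m, n))
    r = len(singular_values)
    Sigma[:r, :r] = np.diag(singular_values)
    return Sigma

# Example: a 3x2 singular matrix with singular values sqrt(3) and 1.
print(singular_matrix([np.sqrt(3), 1.0], m=3, n=2))
# [[1.732... 0.]
#  [0.       1.]
#  [0.       0.]]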

Now, recall from \eqref{eq:Zb2WQDDVVHYSvKHRLXB} that $\{\boldsymbol{Av}_1,\boldsymbol{Av}_2,\cdots,\boldsymbol{Av}_r\}$ is an orthogonal setlink. We can convert each vector into a unit vector by methodlink to obtain the orthonormal setlink $\{\boldsymbol{u}_1,\boldsymbol{u}_2, \cdots,\boldsymbol{u}_r\}$ where:

$$\begin{equation}\label{eq:a4Ekl1HkJcaSL9jGZ0y} \boldsymbol{u}_i= \frac{1}{\Vert\boldsymbol{Av}_i\Vert} \boldsymbol{Av}_i,\;\;\;\; \text{ for }\;i=1,2,\cdots,r \end{equation}$$

We now extend this set by methodlink such that $\{\boldsymbol{u}_1,\boldsymbol{u}_2, \cdots,\boldsymbol{u}_r, \boldsymbol{u}_{r+1}, \cdots,\boldsymbol{u}_m\}$ becomes an orthonormal basislink for $\mathbb{R}^m$.
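One concrete way to perform this extension, assuming SciPy is available, is to take an orthonormal basis of the orthogonal complement of $\mathrm{span}\{\boldsymbol{u}_1,\cdots,\boldsymbol{u}_r\}$ via scipy.linalg.null_space; the helper name extend_to_orthonormal_basis below is ours:

import numpy as np
from scipy.linalg import null_space

def extend_to_orthonormal_basis(U_r):
    """Given an m x r matrix with orthonormal columns, append m - r further
    orthonormal columns so that the result is an m x m orthogonal matrix."""
    # null_space(U_r.T) returns an orthonormal basis of the orthogonal
    # complement of span{u_1, ..., u_r}.
    U_rest = null_space(U_r.T)
    return np.hstack([U_r, U_rest])

# Example: a single unit vector in R^3, extended to a full orthonormal basis.
U_r = np.array([[1.0], [0.0], [0.0]])
U = extend_to_orthonormal_basis(U_r)
print(np.allclose(U.T @ U, np.eye(3)))   # True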

We now define an $m\times{m}$ orthogonal matrix $\boldsymbol{U}$ such that:

$$\begin{equation}\label{eq:X33q8I4RC6wN25CRmQB} \boldsymbol{U}=\begin{pmatrix} \vert&\vert&\cdots&\vert\\ \boldsymbol{u}_1&\boldsymbol{u}_2&\cdots&\boldsymbol{u}_m\\ \vert&\vert&\cdots&\vert\\ \end{pmatrix} \end{equation}$$

Recall from \eqref{eq:WpNcxedNGzx7R6RkhNn} that:

$$\begin{equation}\label{eq:ioSe2ViZ9mErlEyEsUL} \begin{cases} \;\sigma_i=\Vert\boldsymbol{Av}_i\Vert, &\text{for }\; i=1,2,\cdots,r\\ \;\sigma_i=0,&\mathrm{for }\;i\gt{r} \end{cases} \end{equation}$$

Multiplying both sides of the top equation by $\boldsymbol{u}_i$ gives:

$$\begin{equation}\label{eq:vmO3KfLfrAyzizbdMl7} \sigma_i\boldsymbol{u}_i= \Vert\boldsymbol{Av}_i\Vert\boldsymbol{u}_i ,\;\;\;\; \text{ for }\;i=1,2,\cdots,r \end{equation}$$

Now, if we multiply both sides of \eqref{eq:a4Ekl1HkJcaSL9jGZ0y} by $\Vert\boldsymbol{Av}_i\Vert$, we get:

$$\begin{equation}\label{eq:OVHV9f2yTu44jdbRKDW} \Vert\boldsymbol{Av}_i\Vert\boldsymbol{u}_i= \boldsymbol{Av}_i,\;\;\;\; \text{ for }\;i=1,2,\cdots,r \end{equation}$$

Substituting \eqref{eq:OVHV9f2yTu44jdbRKDW} into \eqref{eq:vmO3KfLfrAyzizbdMl7} and making $\boldsymbol{Av}_i$ the subject gives:

$$\begin{equation}\label{eq:xEkZKEaLdhb7rzU4vqm} \boldsymbol{Av}_i =\sigma_i\boldsymbol{u}_i ,\;\;\;\; \text{ for }\;i=1,2,\cdots,r \end{equation}$$

Recalling all the way back to \eqref{eq:P7Eg1JUr76KH2L9RVwJ}, we have that:

$$\begin{equation}\label{eq:B8RBJYWVYSEl3ljqapZ} \boldsymbol{Av}_i=\boldsymbol{0},\;\;\;\; \text{ for }\;i\gt{r} \end{equation}$$

Let's summarize \eqref{eq:xEkZKEaLdhb7rzU4vqm} and \eqref{eq:B8RBJYWVYSEl3ljqapZ} as:

$$\begin{equation}\label{eq:RvMbvxleHxW8j8jJnDL} \begin{cases} \boldsymbol{Av}_i =\sigma_i\boldsymbol{u}_i ,&\text{for }\;i=1,2,\cdots,r\\ \boldsymbol{Av}_i=\boldsymbol{0}, &\text{for }\;i\gt{r} \end{cases} \end{equation}$$

Now, using theoremlink and \eqref{eq:RvMbvxleHxW8j8jJnDL}, we can express the matrix product $\boldsymbol{AV}$ as:

$$\begin{equation}\label{eq:kngBv52Kwm7GUderB5t} \begin{aligned}[b] \boldsymbol{AV}&= \boldsymbol{A}\begin{pmatrix} \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \boldsymbol{v}_1&\boldsymbol{v}_2&\cdots& \boldsymbol{v}_r&\boldsymbol{v}_{r+1}& \cdots&\boldsymbol{v}_n\\ \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \end{pmatrix}\\&= \begin{pmatrix} \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \boldsymbol{A}\boldsymbol{v}_1&\boldsymbol{A}\boldsymbol{v}_2&\cdots& \boldsymbol{A}\boldsymbol{v}_r&\boldsymbol{A}\boldsymbol{v}_{r+1}& \cdots&\boldsymbol{A}\boldsymbol{v}_n\\ \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \end{pmatrix} \end{aligned} \end{equation}$$

By \eqref{eq:xEkZKEaLdhb7rzU4vqm} and \eqref{eq:B8RBJYWVYSEl3ljqapZ}, we can express the columns of $\boldsymbol{AV}$ as:

$$\begin{equation}\label{eq:yq1WwOcMw0iWRvuVHFe} \boldsymbol{AV}=\begin{pmatrix} \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \sigma_1\boldsymbol{u}_1& \sigma_2\boldsymbol{u}_2&\cdots& \sigma_r\boldsymbol{u}_r &\boldsymbol{0}&\cdots&\boldsymbol{0}\\ \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \end{pmatrix} \end{equation}$$

Now, consider the matrix product $\boldsymbol{U}\boldsymbol{\Sigma}$ where $\boldsymbol{\Sigma}$ is the singular matrixlink of $\boldsymbol{A}$ defined earlier:

$$\begin{equation}\label{eq:eAtcXAwjLqeefFE3XXh} \begin{aligned}[b] \boldsymbol{U} \boldsymbol{\Sigma}&= \begin{pmatrix}\vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \boldsymbol{u}_1&\boldsymbol{u}_2&\cdots&\boldsymbol{u}_r&\boldsymbol{u}_{r+1}& \cdots&\boldsymbol{u}_m\\\vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \end{pmatrix} \begin{pmatrix} \sigma_{1}&0&\cdots&0&0&\cdots&0\\ 0&\sigma_2&\cdots&0&0&\cdots&0\\ \vdots&\vdots&\smash\ddots&\vdots&\vdots&\smash\ddots&\vdots\\ 0&0&\cdots&\sigma_r&0&\cdots&0\\ 0&0&\cdots&0&0&\cdots&0\\ \vdots&\vdots&\smash\ddots&\vdots&\vdots&\smash\ddots&\vdots\\ 0&0&\cdots&0&0&\cdots&0\end{pmatrix} \\&= \begin{pmatrix}\vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \sigma_1\boldsymbol{u}_1&\sigma_2\boldsymbol{u}_2&\cdots& \sigma_r\boldsymbol{u}_r &\boldsymbol{0}&\cdots&\boldsymbol{0}\\ \vert&\vert&\cdots&\vert&\vert&\cdots&\vert\\ \end{pmatrix} \end{aligned} \end{equation}$$

Here, the second equality holds by a similar logic to that of the propertylink of diagonal matrices. Amazingly, \eqref{eq:eAtcXAwjLqeefFE3XXh} matches \eqref{eq:yq1WwOcMw0iWRvuVHFe}, which means:

$$\begin{equation}\label{eq:PgmN8RDyxJjeHavKhPb} \boldsymbol{AV}= \boldsymbol{U\Sigma} \end{equation}$$

By propertylink, since $\boldsymbol{V}$ is an orthogonal matrix, $\boldsymbol{V}$ is invertible and $\boldsymbol{V}^{-1}=\boldsymbol{V}^T$. Therefore, \eqref{eq:PgmN8RDyxJjeHavKhPb} becomes:

$$\boldsymbol{A}= \boldsymbol{U\Sigma}\boldsymbol{V}^T$$

This completes the proof.

We typically wouldn't compute the SVD of a matrix by hand because of how tedious the calculations are. Instead, most numerical programming libraries offer a method to find the SVD of any matrix efficiently. Nonetheless, we will still demonstrate how to perform SVD by hand for learning purposes!
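For instance, in NumPy the full decomposition is available through np.linalg.svd, which returns the singular values as a vector rather than as the matrix $\boldsymbol{\Sigma}$; the sketch below (with an arbitrary random matrix) rebuilds $\boldsymbol{\Sigma}$ and checks the factorization:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

# full_matrices=True gives U as 4x4 and Vt as 3x3; s holds the singular
# values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 4x3 singular matrix Sigma from the vector s.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))   # True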

Example.

Performing singular value decomposition on a 3x2 matrix

Consider the following matrix:

$$\boldsymbol{A}= \begin{pmatrix} 1&0\\1&1\\0&1 \end{pmatrix}$$

Find the singular value decomposition of $\boldsymbol{A}$.

Solution. To find the singular value decomposition of $\boldsymbol{A}$, we must find three matrices $\boldsymbol{U}$, $\boldsymbol{\Sigma}$ and $\boldsymbol{V}$. We typically find these matrices in the following order:

  1. $\boldsymbol{\Sigma}$ can be found by obtaining the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$.

  2. $\boldsymbol{V}$ can be found by obtaining the unit eigenvectors of $\boldsymbol{A}^T\boldsymbol{A}$.

  3. $\boldsymbol{U}$ can be found by using the column vectors of $\boldsymbol{V}$.

Finding the singular values of A

Let's start by finding the singular valueslink of $\boldsymbol{A}$, which are the square roots of the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$. Firstly, the matrix $\boldsymbol{A}^T\boldsymbol{A}$ is:

$$\boldsymbol{A}^T\boldsymbol{A}= \begin{pmatrix} 1&1&0\\0&1&1 \end{pmatrix} \begin{pmatrix} 1&0\\1&1\\0&1 \end{pmatrix}= \begin{pmatrix} 2&1\\1&2 \end{pmatrix}$$

The characteristic polynomiallink of $\boldsymbol{A}^T\boldsymbol{A}$ is:

$$\begin{align*} \det(\boldsymbol{A}^T\boldsymbol{A}-\lambda\boldsymbol{I}_2) &=\begin{vmatrix}2-\lambda&1\\1&2-\lambda\end{vmatrix}\\ &=(2-\lambda)(2-\lambda)-(1)(1)\\ &=\lambda^2-4\lambda+4-1\\ &=\lambda^2-4\lambda+3\\ &=(\lambda-3)(\lambda-1) \end{align*}$$

Therefore, the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ are $\lambda_1=3$ and $\lambda_2=1$. Note that singular value decomposition requires the eigenvalues to be arranged in descending order, so we must assign the larger eigenvalue to $\lambda_1$; setting $\lambda_1=1$ and $\lambda_2=3$ would be incorrect here.

The singular values of $\boldsymbol{A}$ are therefore:

$$\begin{align*} \sigma_1&=\sqrt{\lambda_1}=\sqrt{3}\\ \sigma_2&=\sqrt{\lambda_2}=1 \end{align*}$$
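As a quick check of these hand computations, assuming NumPy:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

eigenvalues = np.linalg.eigvalsh(A.T @ A)       # ascending order: [1., 3.]
singular_values = np.sqrt(eigenvalues)[::-1]    # descending order

print(eigenvalues)        # [1. 3.]
print(singular_values)    # [1.7320508..., 1.]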

Finding the singular matrix of A

With the singular values computed, we can now easily find the $3\times2$ singular matrixlink $\boldsymbol{\Sigma}$ of $\boldsymbol{A}$ like so:

$$\begin{equation}\label{eq:UMDfNzrY9ipxYWqfkF4} \begin{aligned}[b] \boldsymbol{\Sigma}= \begin{pmatrix} \sigma_1&0\\0&\sigma_2\\0&0 \end{pmatrix}= \begin{pmatrix} \sqrt3&0\\0&1\\0&0 \end{pmatrix} \end{aligned} \end{equation}$$

Finding the orthogonal matrix V

The matrix $\boldsymbol{V}$ orthogonally diagonalizeslink $\boldsymbol{A}^T\boldsymbol{A}$, which means that the column vectors of $\boldsymbol{V}$ are the (unit) eigenvectors of $\boldsymbol{A}^T\boldsymbol{A}$. We've already found the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$ - they are $\lambda_1=3$ and $\lambda_2=1$, so let's now find the corresponding eigenvectors!

For your reference, here's $\boldsymbol{A}^T\boldsymbol{A}$ again:

$$\boldsymbol{A}^T\boldsymbol{A}= \begin{pmatrix} 2&1\\1&2 \end{pmatrix}$$

To get an eigenvector associated with $\lambda_1=3$, we solve the following:

$$\begin{align*} (\boldsymbol{A}^T\boldsymbol{A}-\lambda_1\boldsymbol{I}_2) \boldsymbol{v}_1&=\boldsymbol{0}\\ \begin{pmatrix}2-3&1\\1&2-3\end{pmatrix} \begin{pmatrix}v_{11}\\v_{12}\end{pmatrix} &=\begin{pmatrix}0\\0\end{pmatrix}\\ \begin{pmatrix}-1&1\\1&-1\end{pmatrix} \begin{pmatrix}v_{11}\\v_{12}\end{pmatrix}&= \begin{pmatrix}0\\0\end{pmatrix}\\ \begin{pmatrix}1&-1\\0&0\end{pmatrix} \begin{pmatrix}v_{11}\\v_{12}\end{pmatrix}&= \begin{pmatrix}0\\0\end{pmatrix}\\ \end{align*}$$

Here, $v_{12}$ is a free variablelink so let $v_{12}=t$ where $t$ is any scalar. The first row then yields $v_{11}=t$. Therefore, an eigenvector corresponding to $\lambda_1=3$ can be expressed as:

$$\boldsymbol{v}_1= \begin{pmatrix} t\\t \end{pmatrix}= \begin{pmatrix} 1\\1 \end{pmatrix}t$$

For simplicity, we can let $t=1$. Since $\boldsymbol{V}$ is an orthogonal matrix whose columns are unit eigenvectors, we need to convert $\boldsymbol{v}_1$ into a unit vector:

$$\begin{equation}\label{eq:HapfPURNLkxYe31TtoQ} \boldsymbol{v}_1= \frac{1}{\sqrt{1^2+1^2}} \begin{pmatrix}1\\1\end{pmatrix}= \begin{pmatrix}1/\sqrt2\\1/\sqrt2\end{pmatrix} \end{equation}$$

We now need to repeat the same process for $\lambda_2=1$ by solving:

$$\begin{align*} (\boldsymbol{A}^T\boldsymbol{A}-\lambda_2\boldsymbol{I}_2) \boldsymbol{v}_2&=\boldsymbol{0}\\ \begin{pmatrix}2-1&1\\1&2-1\end{pmatrix} \begin{pmatrix}v_{21}\\v_{22}\end{pmatrix} &=\begin{pmatrix}0\\0\end{pmatrix}\\ \begin{pmatrix}1&1\\0&0\end{pmatrix} \begin{pmatrix}v_{21}\\v_{22}\end{pmatrix}&= \begin{pmatrix}0\\0\end{pmatrix}\\ \end{align*}$$

Here, $v_{22}$ is a free variablelink so let $v_{22}=t$ where $t$ is any scalar. The first row then yields $v_{21}=-t$. Therefore, an eigenvector associated with $\lambda_2=1$ can be expressed as:

$$\boldsymbol{v}_2= \begin{pmatrix} -t\\t \end{pmatrix}= \begin{pmatrix} -1\\1 \end{pmatrix}t$$

Again, we can let $t=1$ for simplicity. We then convert $\boldsymbol{v}_2$ into a unit eigenvector:

$$\begin{equation}\label{eq:f5YcLVz0suU2EaAtms4} \boldsymbol{v}_2= \frac{1}{\sqrt{(-1)^2+1^2}} \begin{pmatrix}-1\\1\end{pmatrix}= \begin{pmatrix}-1/\sqrt2\\1/\sqrt2\end{pmatrix} \end{equation}$$

Great, we've now managed to find $\boldsymbol{V}$ as below:

$$\begin{equation}\label{eq:DKLgvGWs4ckCDnFExOi} \boldsymbol{V}=\begin{pmatrix} \vert&\vert\\ \boldsymbol{v}_1&\boldsymbol{v}_2\\ \vert&\vert \end{pmatrix}= \begin{pmatrix} 1/\sqrt2&-1/\sqrt2\\ 1/\sqrt2&1/\sqrt2 \end{pmatrix} \end{equation}$$
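We can verify numerically that this $\boldsymbol{V}$ indeed orthogonally diagonalizes $\boldsymbol{A}^T\boldsymbol{A}$, again assuming NumPy:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
V = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)

# V^T (A^T A) V should be the diagonal matrix of eigenvalues, diag(3, 1).
print(V.T @ (A.T @ A) @ V)
# approximately [[3. 0.]
#                [0. 1.]]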

Finding the orthogonal matrix U

Let's now move on to finding the $3\times3$ orthogonal matrix $\boldsymbol{U}$. The first two column vectors $\boldsymbol{u}_1$ and $\boldsymbol{u}_2$ are computed by:

$$\begin{equation}\label{eq:Tle0jXlo6zq2thkLdtd} \boldsymbol{u}_1= \frac{1}{\Vert\boldsymbol{Av}_1\Vert} \boldsymbol{Av}_1,\;\;\;\;\;\;\; \boldsymbol{u}_2= \frac{1}{\Vert\boldsymbol{Av}_2\Vert} \boldsymbol{Av}_2 \end{equation}$$

Where $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$ are given by \eqref{eq:HapfPURNLkxYe31TtoQ} and \eqref{eq:f5YcLVz0suU2EaAtms4} respectively. Let's first work out $\boldsymbol{Av}_1$ and $\boldsymbol{Av}_2$ like so:

$$\begin{align*} \boldsymbol{Av}_1&= \begin{pmatrix}1&0\\1&1\\0&1\end{pmatrix} \begin{pmatrix}1/\sqrt2\\1/\sqrt2\end{pmatrix} =\begin{pmatrix}1/\sqrt2\\2/\sqrt2\\1/\sqrt2\end{pmatrix}\\ \boldsymbol{Av}_2&= \begin{pmatrix}1&0\\1&1\\0&1\end{pmatrix} \begin{pmatrix}-1/\sqrt2\\1/\sqrt2\end{pmatrix} =\begin{pmatrix}-1/\sqrt2\\0\\1/\sqrt2\end{pmatrix} \end{align*}$$

The magnitudes of $\boldsymbol{Av}_1$ and $\boldsymbol{Av}_2$ are:

$$\begin{align*} \Vert\boldsymbol{Av}_1\Vert&= \big((1/\sqrt2)^2+(2/\sqrt2)^2+(1/\sqrt2)^2\big)^{1/2}\\ &=\big((1/2)+(4/2)+(1/2)\big)^{1/2}\\ &=\sqrt3\\\\ \Vert\boldsymbol{Av}_2\Vert&= \big((-1/\sqrt2)^2+(0)^2+(1/\sqrt2)^2\big)^{1/2}\\ &=\big((1/2)+(1/2)\big)^{1/2}\\ &=1 \end{align*}$$

We can now compute the unit column vectors $\boldsymbol{u}_1$ and $\boldsymbol{u}_2$ using \eqref{eq:Tle0jXlo6zq2thkLdtd} to get:

$$\boldsymbol{u}_1= \begin{pmatrix} 1/\sqrt6\\2/\sqrt6\\1/\sqrt6 \end{pmatrix},\;\;\;\;\;\; \boldsymbol{u}_2= \begin{pmatrix}-1/\sqrt2\\0\\1/\sqrt2\end{pmatrix}$$

Remember, the theory of SVD guarantees that $\boldsymbol{u}_1$ and $\boldsymbol{u}_2$ are orthogonal.

We now must find a vector $\boldsymbol{u}_3$ that is orthogonal to both $\boldsymbol{u}_1$ and $\boldsymbol{u}_2$. Instead of using unit vectors, let's multiply $\boldsymbol{u}_1$ by $\sqrt6$ and $\boldsymbol{u}_2$ by $\sqrt2$ so that we can remove the fractions:

$$\boldsymbol{u}_1'= \begin{pmatrix}1\\2\\1 \end{pmatrix},\;\;\;\;\;\; \boldsymbol{u}_2'= \begin{pmatrix}-1\\0\\1\end{pmatrix}$$

Note that multiplying vectors by scalars does not change their direction, so a vector that is orthogonal to these simplified vectors $\boldsymbol{u}'_1$ and $\boldsymbol{u}'_2$ will also be orthogonal to the original $\boldsymbol{u}_1$ and $\boldsymbol{u}_2$.

We can use methodlink to find a vector that is orthogonal to both $\boldsymbol{u}_1'$ and $\boldsymbol{u}_2'$. This involves solving the following homogeneous linear system:

$$\begin{pmatrix} 1&2&1\\ -1&0&1\\ \end{pmatrix} \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} =\begin{pmatrix}0\\0\end{pmatrix}$$

Let's use Gaussian elimination to solve the system:

$$\begin{pmatrix} 1&2&1\\ -1&0&1\\ \end{pmatrix}\sim \begin{pmatrix} 1&2&1\\ 0&2&2\\ \end{pmatrix} \sim \begin{pmatrix} 1&2&1\\ 0&1&1\\ \end{pmatrix}\sim \begin{pmatrix} 1&1&0\\ 0&1&1\\ \end{pmatrix}$$

If we let $x_3=t$ where $t$ is any scalar, then:

  • the second row gives us $x_2=-t$.

  • the first row gives us $x_1=t$.

Therefore, the solution of the homogeneous linear system can be expressed as:

$$\begin{pmatrix} x_1\\x_2\\x_3 \end{pmatrix}= \begin{pmatrix} t\\-t\\t \end{pmatrix}= \begin{pmatrix} 1\\-1\\1 \end{pmatrix}t$$

For simplicity, let's pick $t=1$. Remember, this vector is orthogonal to $\boldsymbol{u}_1$ and $\boldsymbol{u}_2$. We convert this vector into a unit vector to obtain $\boldsymbol{u}_3$ like so:

$$\begin{align*} \boldsymbol{u}_3= \frac{1}{\sqrt{(1)^2+(-1)^2+(1)^2}} \begin{pmatrix}1\\-1\\1\end{pmatrix}= \begin{pmatrix}1/\sqrt3\\-1/\sqrt3\\1/\sqrt3\end{pmatrix}\\ \end{align*}$$
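Since we happen to be working in $\mathbb{R}^3$, a quick way to double-check $\boldsymbol{u}_3$ is to take the cross product of $\boldsymbol{u}_1'$ and $\boldsymbol{u}_2'$, which is orthogonal to both (this shortcut is specific to three dimensions and is not part of the general SVD procedure):

import numpy as np

u1_prime = np.array([1.0, 2.0, 1.0])
u2_prime = np.array([-1.0, 0.0, 1.0])

# The cross product is orthogonal to both inputs.
u3 = np.cross(u1_prime, u2_prime)        # array([ 2., -2.,  2.])
u3 = u3 / np.linalg.norm(u3)

print(u3)   # approximately [ 0.577 -0.577  0.577], i.e. (1, -1, 1)/sqrt(3)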

We now have an orthonormal setlink $\{\boldsymbol{u}_1,\boldsymbol{u}_2,\boldsymbol{u}_3\}$ which forms the column vectors of orthogonal matrix $\boldsymbol{U}$, that is:

$$\begin{equation}\label{eq:E7TMPcrLQZTZ4Gwvh4l} \boldsymbol{U}= \begin{pmatrix} \vert&\vert&\vert\\ \boldsymbol{u}_1& \boldsymbol{u}_2& \boldsymbol{u}_3\\ \vert&\vert&\vert \end{pmatrix}= \begin{pmatrix} 1/\sqrt6&-1/\sqrt2&1/\sqrt3\\ 2/\sqrt6&0&-1/\sqrt3\\ 1/\sqrt6&1/\sqrt2&1/\sqrt3 \end{pmatrix} \end{equation}$$

We now have all the factors $\boldsymbol{U}$, $\boldsymbol{\Sigma}$ and $\boldsymbol{V}$ as \eqref{eq:E7TMPcrLQZTZ4Gwvh4l}, \eqref{eq:UMDfNzrY9ipxYWqfkF4} and \eqref{eq:DKLgvGWs4ckCDnFExOi} respectively. The singular value decomposition of $\boldsymbol{A}$ is therefore:

$$\boldsymbol{A}= \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T= \begin{pmatrix} 1/\sqrt6&-1/\sqrt2&1/\sqrt3\\ 2/\sqrt6&0&-1/\sqrt3\\ 1/\sqrt6&1/\sqrt2&1/\sqrt3 \end{pmatrix} \begin{pmatrix} \sqrt3&0\\0&1\\0&0 \end{pmatrix} \begin{pmatrix} 1/\sqrt2&1/\sqrt2\\ -1/\sqrt2&1/\sqrt2 \end{pmatrix}$$

We can verify that the decomposition is correct by explicitly multiplying the three matrices.
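A short NumPy check of this multiplication (the entries are exactly the factors derived above):

import numpy as np

U = np.array([[1/np.sqrt(6), -1/np.sqrt(2),  1/np.sqrt(3)],
              [2/np.sqrt(6),  0.0,          -1/np.sqrt(3)],
              [1/np.sqrt(6),  1/np.sqrt(2),  1/np.sqrt(3)]])
Sigma = np.array([[np.sqrt(3), 0.0],
                  [0.0,        1.0],
                  [0.0,        0.0]])
V = np.array([[1/np.sqrt(2), -1/np.sqrt(2)],
              [1/np.sqrt(2),  1/np.sqrt(2)]])

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

print(np.allclose(U @ Sigma @ V.T, A))   # True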
