We examine properties of a matrix in an inner product space which guarantee it is diagonalizable. We also lay the groundwork for the singular value decomposition of an arbitrary matrix.
Such a result is remarkable in and of itself, since these properties must somehow guarantee that each eigenspace of such a matrix has geometric multiplicity equal to its algebraic multiplicity. It also leads us to an important result about the representation of any real or complex
\(m\times n\) matrix
\(A.\) The key is that for any such matrix, both
\(A^*A\) and
\(AA^*\) are
Hermitian matrices. What is even more interesting is that diagonalization of
\(A^*A\) still tells us very important information about the original matrix
\(A.\)
Subsection 5.4.2 A closer look at matrices \(A^*A\) and \(AA^*\text{.}\)
In
Corollary 5.3.2, we have seen both of these products of matrices when the columns of
\(A\) are orthonormal; one product producing an identity matrix, the other the matrix of the orthogonal projection into the column space of
\(A.\) But what can we say in general (when the columns are not orthonormal vectors)?
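As a quick numerical illustration of that corollary (a sketch using NumPy; the \(4\times 2\) matrix and the random seed are our own arbitrary choices), we can verify both products when the columns are orthonormal:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(M)          # 4x2 matrix with orthonormal columns

print(np.allclose(Q.T @ Q, np.eye(2)))             # Q^T Q = I_2
P = Q @ Q.T                                        # projection into col(Q)
print(np.allclose(P @ P, P), np.allclose(P, P.T))  # idempotent, symmetric
\end{verbatim}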
Proposition 5.4.2.
Let \(A\) be any \(m\times n\) matrix with real or complex entries. Then
\begin{equation*}
\rank A = \rank(A^*A) = \rank(AA^*).
\end{equation*}
Proof.
We first show that
\(\rank A = \rank(A^*A)\text{.}\) Since
\(A\) is
\(m\times n\) and
\(A^*A\) is
\(n\times
n\text{,}\) both matrices represent linear transformations from a domain of dimension
\(n.\) As such, the
rank-nullity theorem says that
\begin{equation*}
n = \rank A + \nullity A = \rank(A^*A)+
\nullity(A^*A).
\end{equation*}
We show that the two nullspaces (kernels) are equal, hence have the same dimension, and the statement about ranks will follow.
Since any linear map takes 0 to 0, it is clear that
\(\ker A \subseteq \ker A^*A.\) Conversely, suppose that
\(x\in \ker A^*A.\) Then
\(A^*Ax=0\text{,}\) hence
\(\la x, A^*Ax\ra =0\text{,}\) so by
Proposition 5.2.16,
\begin{equation*}
0=\la x, A^*Ax\ra = \la Ax,Ax\ra
\end{equation*}
which implies \(Ax=0\) by the positivity of the inner product. Thus \(\ker A^*A \subseteq \ker A\text{,}\) giving us the desired equality.
To show that
\(\rank A = \rank(AA^*)\text{,}\) we show equivalently (see
Proposition 5.4.1) that
\(\rank A^* = \rank(AA^*)\text{.}\) We showed above that for any matrix
\(B\text{,}\) \(\rank B = \rank(B^*B)\text{,}\) so letting
\(B = A^*\text{,}\) we conclude
\begin{equation*}
\rank A^*= \rank ((A^*)^*A^*) = \rank (AA^*).
\end{equation*}
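The proposition is easy to check numerically. In the following NumPy sketch, the matrix is an arbitrary choice of ours, built as a product through a 3-dimensional space so that its rank is at most 3:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
# Force rank at most 3 by factoring through a 3-dimensional space.
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))

rank = np.linalg.matrix_rank
print(rank(A), rank(A.conj().T @ A), rank(A @ A.conj().T))  # three equal values
\end{verbatim}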
Let us note another common property of \(AA^*\) and \(A^*A\text{.}\)
Proposition 5.4.3.
Let \(A\) be any \(m\times n\) matrix with real or complex entries. Then the nonzero eigenvalues of \(A^*A\) and \(AA^*\) are the same. Note that zero may be an eigenvalue of one product, but not the other.
Proof.
This result is fairly general. Suppose that \(A,B\) are two matrices for which both products \(AB\) and \(BA\) are defined, and suppose that \(\lambda\) is a nonzero eigenvalue for \(AB.\) This implies there exists a nonzero vector \(v\) for which \(AB v = \lambda v.\) Multiplying both sides by \(B\) and noting multiplication by \(B\) is a linear map, we conclude that
\begin{equation*}
(BA)Bv = \lambda (Bv),
\end{equation*}
which shows that \(\lambda\) is an eigenvalue of \(BA\) so long as \(Bv\ne 0\) (eigenvectors need to be nonzero). But if \(Bv=0\text{,}\) then \(AB v = \lambda
v=0\) which implies \(\lambda =0\text{,}\) contrary to assumption.
For the eigenvalue \(\lambda=0\text{,}\) the situation can be (and often is) different. Let \(A=\ba{rr}1\amp 1\ea\text{,}\) and consider \(B=A^T.\) Then
\begin{equation*}
AB = \ba{r}2\ea \text{ while } BA = \ba{rr}1\amp
1\\1\amp 1\ea.
\end{equation*}
The matrix \(AB\) is clearly non-singular, while \(BA\) has rank one, hence a non-trivial nullspace, so zero is an eigenvalue of \(BA\) but not of \(AB\text{.}\)
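The example is easy to verify numerically; the following NumPy sketch uses the very matrices above:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 1.0]])        # the 1x2 matrix from the example
B = A.T

print(np.linalg.eigvals(A @ B))   # [2.]: AB is the 1x1 matrix (2)
print(np.linalg.eigvals(B @ A))   # eigenvalues 2 and 0 (in some order)
\end{verbatim}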
Before proceeding, we need to make a few more definitions and raise one cautionary note. For the caution, observe that, in general, results we state for complex matrices
\(A\) hold analogously for real matrices, replacing
\(A^*\) by
\(A^T\text{.}\) The
Spectral Theorem for complex matrices and the
Spectral Theorem for real matrices have distinctly different hypotheses, and we want to spend a bit of time explaining why.
While all the terms we list below are defined in the section on definitions, it is useful for comparison to list them explicitly here; a numerical check of these conditions appears after the list. Let \(A \in M_n(\C)\) and \(B\in M_n(\R)\text{:}\)
\(A\) is normal if \(AA^* = A^*A\text{.}\)
\(A\) is unitary if \(AA^* = A^*A=I_n\text{.}\)
\(A\) is Hermitian if \(A=A^*\text{.}\)
\(B\) is normal if \(BB^T = B^TB\text{.}\)
\(B\) is orthogonal if \(BB^T = B^TB=I_n\text{.}\)
\(B\) is symmetric if \(B=B^T\text{.}\)
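These conditions are straightforward to test numerically. Below is a small NumPy sketch; the helper names and the sample matrix are our own choices, not standard library functions:
\begin{verbatim}
import numpy as np

def is_normal(A):    return np.allclose(A @ A.conj().T, A.conj().T @ A)
def is_unitary(A):   return np.allclose(A @ A.conj().T, np.eye(len(A)))
def is_hermitian(A): return np.allclose(A, A.conj().T)

A = np.array([[2.0, 1j], [-1j, 3.0]])   # Hermitian, hence normal, not unitary
print(is_hermitian(A), is_normal(A), is_unitary(A))   # True True False
\end{verbatim}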
Note that both Hermitian and unitary matrices are normal, though, for example, a Hermitian matrix \(A\) is unitary only if
\(A^2 = I_n.\) Analogous observations are true for real matrices. The point here is that the
complex Spectral Theorem holds for the broad class of normal matrices, but the
real Spectral Theorem holds only for the narrower class of real symmetric matrices. We still need to understand why.
We first consider some properties of real
orthogonal matrices and complex
unitary matrices.
Proposition 5.4.4.
Let \(P\in M_n(\R)\) (resp. \(U\in M_n(\C)\)). The following statements are equivalent:
\(P\) is an orthogonal matrix (resp. \(U\) is a unitary matrix).
\(\displaystyle P^TP = I_n \text{ (resp. } U^*U = I_n \text{)}.\)
\(\displaystyle PP^T = I_n \text{ (resp. } UU^* = I_n\text{)}.\)
\(\displaystyle P^{-1} = P^T \text{ (resp. } U^{-1}=U^*\text{)}.\)
\(\la Pv,Pw\ra = \la v,w\ra\) for all \(v,w \in
\R^n\) (resp. \(\la Uv, Uw\ra = \la v,w\ra\) for all \(v,w\in \C^n.\))
Proof.
As a sample, consider the case where \(A^*A = I_n\) (writing \(A\) for \(P\) or \(U\)). This says that \(A\) has a left inverse; since \(A\) is a square matrix, the left inverse is a two-sided inverse, so \(A^{-1} = A^*\) and \(AA^* = I_n\) as well.
For the last statement, recall from
Proposition 5.2.16 that for any matrix
\(A\in M_n(\C)\text{,}\)
\begin{equation*}
\la Av,w\ra = \la v,A^*w\ra
\end{equation*}
for all \(v,w \in
\C^n.\) It follows that for an orthogonal/unitary matrix
\begin{equation*}
\la Av, Aw\ra = \la v,A^*Aw\ra = \la v,w\ra.
\end{equation*}
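The equivalences are easy to observe numerically. In the following NumPy sketch (the seed and sizes are arbitrary choices of ours), a QR factorization supplies an orthogonal matrix:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random orthogonal matrix
v, w = rng.standard_normal(3), rng.standard_normal(3)

print(np.allclose(Q.T @ Q, np.eye(3)))             # left inverse ...
print(np.allclose(Q @ Q.T, np.eye(3)))             # ... is also a right inverse
print(np.isclose((Q @ v) @ (Q @ w), v @ w))        # <Qv, Qw> = <v, w>
\end{verbatim}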
Below we state some simple versions of the spectral theorems.
Theorem 5.4.5. The Spectral Theorem for normal matrices.
If \(A \in M_n(\C)\) is a normal matrix, then there is a unitary matrix \(U\) and complex scalars \(\lambda_1,
\dots, \lambda_n\) so that
\begin{equation*}
\diag(\lambda_1,
\dots, \lambda_n) = U^*AU.
\end{equation*}
In particular, any complex normal matrix can be unitarily diagonalized. The columns of \(U\) are eigenvectors for \(A\) and form an orthonormal basis for \(\C^n.\)
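The converse direction is easy to test numerically: any matrix of the form \(U\diag(\lambda_1, \dots, \lambda_n)U^*\) with \(U\) unitary is automatically normal. A NumPy sketch with random data of our own choosing:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(Z)                      # a random 3x3 unitary matrix
lam = rng.standard_normal(3) + 1j * rng.standard_normal(3)

A = U @ np.diag(lam) @ U.conj().T           # unitarily diagonalized by construction
print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # A is normal
\end{verbatim}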
Theorem 5.4.6. The Spectral Theorem for real symmetric matrices.
If \(A\in M_n(\R)\) is a symmetric matrix, then there exists an orthogonal matrix \(P\) and real scalars \(\lambda_1,
\dots, \lambda_n\) so that
\begin{equation*}
\diag(\lambda_1,
\dots, \lambda_n) = P^TAP.
\end{equation*}
In particular, any real symmetric matrix can be orthogonally diagonalized. The columns of \(P\) are eigenvectors for \(A\) and form an orthonormal basis for \(\R^n.\)
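Numerically, this is exactly what a symmetric eigensolver computes. A NumPy sketch (the symmetric matrix is an arbitrary choice of ours):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
S = rng.standard_normal((4, 4))
A = S + S.T                                 # a real symmetric matrix

lam, P = np.linalg.eigh(A)                  # routine for symmetric/Hermitian input
print(np.allclose(P.T @ P, np.eye(4)))          # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(lam)))   # P^T A P is diagonal
\end{verbatim}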
We state the following proposition for complex Hermitian matrices, but it also applies to real symmetric matrices since for a real matrix, \(A^T = A^*\text{.}\) Also note that every real or complex matrix has all its eigenvalues in \(\C.\)
Proposition 5.4.8.
Let \(A\) be a complex Hermitian matrix, and \(\lambda\) an eigenvalue for \(A.\) Then \(\lambda\) is necessarily a real number.
Proof.
Let \(\lambda\) be an eigenvalue of \(A\text{,}\) and let \(v\) be an eigenvector for \(\lambda.\) Then
\begin{equation*}
\lambda \la v,v\ra = \la \lambda v,v\ra=\la Av,v\ra=
\la v,A^*v\ra = \la v,Av\ra = \la v,\lambda v\ra =
\overline\lambda\la v,v\ra,
\end{equation*}
where we have used the Hermitian property (\(A^* = A\)) and the sesquilinearity of the inner product. Since \(v\ne 0\text{,}\) we know that \(\la v,v\ra \ne 0\text{,}\) from which we conclude \(\overline \lambda = \lambda\text{,}\) hence \(\lambda\) is real.
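A quick numerical check (a NumPy sketch; the Hermitian matrix is our own example): even a general-purpose eigenvalue routine returns eigenvalues whose imaginary parts vanish up to roundoff.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2 - 1j], [2 + 1j, 3.0]])   # Hermitian: A = A^*
lam = np.linalg.eigvals(A)                     # generic (non-Hermitian) routine
print(lam)                                     # imaginary parts are ~0
print(np.allclose(lam.imag, 0.0))              # True
\end{verbatim}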
Proposition 5.4.9.
Let \(A \in M_n(\C)\) be a Hermitian matrix. Then eigenspaces corresponding to distinct eigenvalues are orthogonal.
Proof.
Suppose that \(\lambda\) and \(\mu\) are distinct eigenvalues for \(A.\) Let \(v\) be an eigenvector with eigenvalue \(\lambda\) and \(w\) be an eigenvector with eigenvalue \(\mu.\) Then
\begin{align*}
\lambda \la v,w\ra \amp= \la \lambda v,w\ra = \la Av,w\ra \stackrel{(1)}{=} \la v, A^*w\ra\\
\amp\stackrel{(2)}{=}\la v, Aw\ra = \la v,\mu w \ra =
\overline\mu\la v,w\ra \stackrel{(3)}{=} \mu\la v,w\ra,
\end{align*}
where (1) is true by
Proposition 5.2.16, (2) is true since
\(A\) is Hermitian, (3) is true by
Proposition 5.4.8, and the remaining equalities hold by standard properties of the inner product. Rewriting the expression, we have
\begin{equation*}
(\lambda-\mu) \la v,w \ra =0,
\end{equation*}
and since \(\lambda\ne \mu\text{,}\) we conclude \(\la v,w\ra =0\) as desired.
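A small numerical check (a NumPy sketch with a symmetric matrix of our choosing, which has the two distinct eigenvalues 1 and 3):
\begin{verbatim}
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])      # symmetric, eigenvalues 1 and 3
lam, V = np.linalg.eigh(A)
v, w = V[:, 0], V[:, 1]                     # eigenvectors for distinct eigenvalues

print(lam)                                  # [1. 3.]
print(np.isclose(v @ w, 0.0))               # True: the eigenvectors are orthogonal
\end{verbatim}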
The proofs of the spectral theorems are rather involved. Of course any matrix over
\(\C\) will have the property that its characteristic polynomial splits, but we have also shown this for real symmetric matrices. The hard part is showing that each eigenspace has dimension equal to the algebraic multiplicity of the eigenvalue. For this, something like Schur’s theorem is used as a starting point. See Theorem 6.21 of
[1].
We would like to use the spectral theorems to advance the proof of the singular value decomposition (SVD) of a matrix, though it is interesting to note that other authors do the reverse; see Section 5.4 of
[3].
We restrict our attention to
\(n\times n\) matrices
\(A\) over the real or complex numbers which are Hermitian (i.e., symmetric for a real matrix), and consequently for which all the eigenvalues are real. We list the eigenvalues
\(\lambda_1, \dots, \lambda_n\text{,}\) though they need not all be distinct. By
Theorem 5.4.5, there exists a unitary matrix
\(U\) whose columns
\(\{u_1, \dots, u_n\}\) form an orthonormal basis of
\(\C^n\) consisting of eigenvectors for
\(A\) so that
\begin{equation*}
\diag(\lambda_1, \dots, \lambda_n) = U^* A U.
\end{equation*}
In the discussion preceding
Corollary 5.3.2 we used the column-row rule for matrix multiplication to show that
\begin{equation*}
UU^* = u_1 u_1^*
+ \cdots + u_n u_n^*
\end{equation*}
which is the orthogonal projection into the column space of \(U\) (all of \(\C^n\) in this case), viewed as the sum of one-dimensional orthogonal projections onto the spaces spanned by each \(u_i.\) It follows that
Proposition 5.4.11. Spectral decomposition of a Hermitian matrix.
Let \(A \in M_n(\C)\) be a Hermitian matrix with (real) eigenvalues \(\lambda_1, \dots, \lambda_n\text{.}\) Let \(U\) be the unitary matrix whose orthonormal columns \(u_i\) are eigenvectors for the \(\lambda_i.\) Then
\begin{equation*}
A = U\diag(\lambda_1, \dots, \lambda_n)U^* =
\lambda_1u_1u_1^* + \cdots + \lambda_n u_n u_n^*.
\end{equation*}
Proof.
By
Theorem 5.4.5,
\begin{equation*}
\diag(\lambda_1, \dots, \lambda_n) = U^*A U,
\end{equation*}
or equivalently
\begin{equation*}
A = (U^*)^{-1}\diag(\lambda_1, \dots,
\lambda_n)U^{-1} = U\diag(\lambda_1, \dots,
\lambda_n)U^*,
\end{equation*}
since \(U\) is unitary, so \(U^{-1} = U^*\text{,}\) and the result follows.
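The decomposition is easy to verify numerically. In this NumPy sketch (the Hermitian matrix is built from random data of our choosing), we reconstruct \(A\) from the rank-one pieces \(\lambda_i u_i u_i^*\text{:}\)
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = Z + Z.conj().T                          # a 3x3 Hermitian matrix

lam, U = np.linalg.eigh(A)                  # real eigenvalues, unitary U
S = sum(lam[i] * np.outer(U[:, i], U[:, i].conj()) for i in range(3))
print(np.allclose(A, S))                    # A equals the sum of rank-one pieces
\end{verbatim}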