
Section 3.4 Diagonalization of matrices in Inner Product Spaces

We examine properties of a matrix in an inner product space which guarantee it is diagonalizable. We also lay the groundwork for the singular value decomposition of an arbitrary matrix.
In particular, we shall show that a real symmetric matrix and a complex normal matrix can always be diagonalized.
Such a result is remarkable in and of itself, since these properties must somehow guarantee that for such matrices each eigenvalue has geometric multiplicity equal to its algebraic multiplicity. It also leads us to an important result about the representation of any real or complex \(m\times n\) matrix \(A.\) The key is that for any such matrix, both \(A^*A\) and \(AA^*\) are Hermitian matrices. What is even more interesting is that the diagonalization of \(A^*A\) still tells us very important information about the original matrix \(A.\)

Subsection 3.4.1 Some relations between \(A\) and \(A^*\)

Let’s begin with some simple properties concerning the rank of a matrix.

Proof of (1).

See Theorem 3.4 of [1].

Proof of (2).

Recall that the number of pivots is equal to both the row rank and the column rank, so consider the reduced row-echelon form of the matrix, noting that elementary row operations change neither the row space nor the dimension of the column space.

Proof of (3).

The difference between \(A^*\) and \(A^T\) is simply that the entries of \(A^T\) have been replaced by their complex conjugates, so if there were a linear dependence among the rows of (say) \(A^*\text{,}\) conjugating that relation would produce a linear dependence among the rows of \(A^T.\)
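As a quick numerical sanity check of these rank equalities (a minimal NumPy sketch, not part of the formal development; the matrix below is an arbitrary choice):

```python
import numpy as np

# An arbitrary 3x4 complex matrix whose third row is the first row plus twice
# the second, so its rank is 2.
A = np.array([[1, 2j, 0, 1],
              [0, 1, 1j, 2],
              [1, 2 + 2j, 2j, 5]])

rank = np.linalg.matrix_rank
# rank A = rank A^T = rank A^* (conjugate transpose): expect 2 2 2
print(rank(A), rank(A.T), rank(A.conj().T))
```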

Subsection 3.4.2 A closer look at matrices \(A^*A\) and \(AA^*\text{.}\)

In Corollary 3.3.2, we have seen both of these products of matrices when the columns of \(A\) are orthonormal; one product producing an identity matrix, the other the matrix of the orthogonal projection into the column space of \(A.\) But what can we say in general (when the columns are not orthonormal vectors)?
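Before answering, here is a small numerical reminder of the orthonormal case from Corollary 3.3.2 (a NumPy sketch; the particular columns are an arbitrary choice):

```python
import numpy as np

# Two orthonormal columns in R^3.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])
A = np.column_stack([u1, u2])

print(np.allclose(A.T @ A, np.eye(2)))   # True: A^T A is the 2x2 identity
P = A @ A.T                              # orthogonal projection onto col(A)
print(P @ np.array([1.0, -1.0, 2.0]))    # [0, 0, 2]: only the component in col(A) survives
```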

Proof.

We first show that \(\rank A = \rank(A^*A)\text{.}\) Since \(A\) is \(m\times n\) and \(A^*A\) is \(n\times n\text{,}\) both matrices represent linear transformations from a domain of dimension \(n.\) As such, the rank-nullity theorem says that
\begin{equation*} n = \rank A + \nullity A = \rank(A^*A)+ \nullity(A^*A). \end{equation*}
We show that the two nullspaces (kernels) are equal, hence have the same dimension, and the statement about ranks will follow.
Since any linear map takes 0 to 0, it is clear that \(\ker A \subseteq \ker A^*A.\) Conversely, suppose that \(x\in \ker A^*A.\) Then \(A^*Ax=0\text{,}\) hence \(\la x, A^*Ax\ra =0\text{,}\) so by Proposition 3.2.16,
\begin{equation*} 0=\la x, A^*Ax\ra = \la Ax,Ax\ra \end{equation*}
which implies \(Ax=0\) by the positivity of the inner product. Thus \(\ker A^*A \subseteq \ker A\text{,}\) giving us the desired equality.
To show that \(\rank A = \rank AA^*,\) we show equivalently (see Proposition 3.4.1) that \(\rank A^* = \rank AA^*.\) We showed above that for any matrix \(B\text{,}\) \(\rank B = \rank B^*B,\) so letting \(B = A^*,\) we conclude
\begin{equation*} \rank A^*= \rank ((A^*)^*A^*) = \rank (AA^*). \end{equation*}
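The equality of the three ranks is easy to observe numerically; the following NumPy sketch (using an arbitrary random matrix) is offered only as an illustration of the result just proved:

```python
import numpy as np

rng = np.random.default_rng(0)
# A random complex 5x3 matrix; its rank is (almost surely) 3.
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

rank = np.linalg.matrix_rank
# rank A = rank(A^*A) = rank(AA^*), even though A^*A is 3x3 and AA^* is 5x5.
print(rank(A), rank(A.conj().T @ A), rank(A @ A.conj().T))   # expect 3 3 3
```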
Let us note another common property of \(AA^*\) and \(A^*A\text{.}\)

Proof.

This result is fairly general. Suppose that \(A,B\) are two matrices for which both products \(AB\) and \(BA\) are defined, and suppose that \(\lambda\) is a nonzero eigenvalue for \(AB.\) This implies there exists a nonzero vector \(v\) for which \(AB v = \lambda v.\) Multiplying both sides by \(B\) and noting multiplication by \(B\) is a linear map, we conclude that
\begin{equation*} (BA)Bv = \lambda (Bv), \end{equation*}
which shows that \(\lambda\) is an eigenvalue of \(BA\) so long as \(Bv\ne 0\) (eigenvectors need to be nonzero). But if \(Bv=0\text{,}\) then \(AB v = \lambda v=0\) which implies \(\lambda =0\text{,}\) contrary to assumption.
For the eigenvalue \(\lambda=0\text{,}\) the situation can be (and often is) different. Let \(A=\ba{rr}1\amp 1\ea\text{,}\) and consider \(B=A^T.\) Then
\begin{equation*} AB = \ba{r}2\ea \text{ while } BA = \ba{rr}1\amp 1\\1\amp 1\ea. \end{equation*}
The matrix \(AB\) is clearly non-singular, while \(BA\) has rank one and hence a non-trivial nullspace.
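The following NumPy sketch reproduces this example, confirming that the nonzero eigenvalue 2 is shared while only \(BA\) picks up the eigenvalue 0:

```python
import numpy as np

# The example from the text: A is 1x2 and B = A^T.
A = np.array([[1.0, 1.0]])
B = A.T

print(np.linalg.eigvals(A @ B))   # [2.]: AB is the 1x1 matrix (2)
print(np.linalg.eigvals(B @ A))   # 2 and 0 (in some order): BA acquires the extra eigenvalue 0
```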
Before proceeding, we need to make a few more definitions and raise one cautionary note. For the caution, observe that, in general, results we state for complex matrices \(A\) hold analogously for real matrices, replacing \(A^*\) by \(A^T\text{.}\) The Spectral Theorem for complex matrices and the Spectral Theorem for real matrices have distinctly different hypotheses, and we want to spend a bit of time explaining why.
While all the terms we list below are defined in the section on definitions, it is useful for comparison to list them explicitly here. Let \(A \in M_n(\C)\) and \(B\in M_n(\R)\text{.}\)
  • \(A\) is normal if \(AA^* = A^*A\text{.}\)
  • \(A\) is unitary if \(AA^* = A^*A=I_n\text{.}\)
  • \(A\) is Hermitian if \(A=A^*\text{.}\)
  • \(B\) is normal if \(BB^T = B^TB\text{.}\)
  • \(B\) is orthogonal if \(BB^T = B^TB=I_n\text{.}\)
  • \(B\) is symmetric if \(B=B^T\text{.}\)
Note that both Hermitian and unitary matrices are normal, though, for example, a Hermitian matrix \(A\) is unitary only if \(A^2 = I_n.\) Analogous observations are true for real matrices. The point here is that the complex Spectral Theorem holds for the broad class of normal matrices, but the real Spectral Theorem holds only for the narrower class of real symmetric matrices. We still need to understand why.
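The defining conditions above are straightforward to test numerically. Here is a minimal NumPy sketch (the helper names are chosen here only for illustration); for a real matrix the same tests reduce to the normal/orthogonal/symmetric conditions, since conjugation has no effect:

```python
import numpy as np

def is_normal(A):
    return np.allclose(A @ A.conj().T, A.conj().T @ A)

def is_unitary(A):
    return np.allclose(A.conj().T @ A, np.eye(A.shape[0]))

def is_hermitian(A):
    return np.allclose(A, A.conj().T)

H = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])
print(is_hermitian(H), is_normal(H), is_unitary(H))   # True True False
```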
We first consider some properties of real orthogonal matrices and complex unitary matrices.

Proof.

As a sample, consider the case where \(A^*A = I_n\text{.}\) This says that \(A\) has a left inverse, but since \(A\) is a square matrix, it also has a right inverse and the two are equal.
For the last statement, recall from Proposition 3.2.16 that for any matrix \(A\in M_n(\C)\text{,}\)
\begin{equation*} \la Av,w\ra = \la v,A^*w\ra \end{equation*}
for all \(v,w \in \C^n.\) It follows that for an orthogonal/unitary matrix
\begin{equation*} \la Av, Aw\ra = \la v,A^*Aw\ra = \la v,w\ra. \end{equation*}
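As an illustration that orthogonal/unitary matrices preserve the inner product, here is a small NumPy sketch (the particular unitary matrix and vectors are arbitrary choices):

```python
import numpy as np

# The inner product used in the text: linear in the first slot, conjugate-linear in the second.
def inner(v, w):
    return np.sum(v * np.conj(w))

theta = 0.7
# A unitary 2x2 matrix: a real rotation times a complex phase.
U = np.exp(0.3j) * np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0 + 2.0j, -1.0])
w = np.array([0.5, 2.0 - 1.0j])
print(np.isclose(inner(U @ v, U @ w), inner(v, w)))   # True: U preserves the inner product
```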
Below we state some simple versions of the spectral theorems.

Remark 3.4.7.

To gain some appreciation of why there is a difference in hypotheses between the real and complex versions of the spectral theorem, consider the matrix \(A = \ba{rr}0\amp 1\\-1\amp 0\ea\text{,}\) and note that \(A\) is orthogonal (hence normal), but not symmetric. One immediately checks that the characteristic polynomial of \(A\text{,}\) \(\chi_A = x^2 + 1\text{,}\) has no real roots, which means \(A\) has no real eigenvalues, so it cannot possibly be diagonalized over \(\R\text{,}\) to say nothing of orthogonally diagonalized. Clearly one important element of the spectral theorem is that the characteristic polynomial must split completely (factor into all linear factors) over the field. This is given for the complex numbers since they are algebraically closed, but not so for the real numbers. So in the real case, we must somehow guarantee that a real symmetric matrix has only real eigenvalues.
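A quick NumPy check of this example (offered only as illustration) confirms that \(A\) is orthogonal but not symmetric, and that its eigenvalues \(\pm i\) are not real:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

print(np.allclose(A @ A.T, np.eye(2)))   # True: A is orthogonal, hence normal
print(np.allclose(A, A.T))               # False: A is not symmetric
print(np.linalg.eigvals(A))              # approximately i and -i: no real eigenvalues
```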
We state the following proposition for complex Hermitian matrices, but it also applies to real symmetric matrices since for a real matrix, \(A^T = A^*\text{.}\) Also note that every real or complex matrix has all its eigenvalues in \(\C.\)

Proof.

Let \(\lambda\) be an eigenvalue of \(A\text{,}\) and let \(v\) be an eigenvector for \(\lambda.\) Then
\begin{equation*} \lambda \la v,v\ra = \la \lambda v,v\ra=\la Av,v\ra= \la v,A^*v\ra = \la v,Av\ra = \la v,\lambda v\ra = \overline\lambda\la v,v\ra, \end{equation*}
where we have used the Hermitian property (\(A^* = A\)) and the sesquilinearity of the inner product. Since \(v\ne 0\text{,}\) we know that \(\la v,v\ra \ne 0\text{,}\) from which we conclude \(\overline \lambda = \lambda\text{,}\) hence \(\lambda\) is real.
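A numerical illustration of this proposition (a NumPy sketch; the Hermitian matrix is built from an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                  # A is Hermitian by construction

evals = np.linalg.eigvals(A)
print(np.max(np.abs(evals.imag)))   # essentially 0: every eigenvalue is real
```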
Analogous to Proposition 1.5.5 for arbitrary matrices, we have

Proof.

Suppose that \(\lambda\) and \(\mu\) are distinct eigenvalues for \(A.\) Let \(v\) be an eigenvector with eigenvalue \(\lambda\) and \(w\) be an eigenvector with eigenvalue \(\mu.\) Then
\begin{align*} \lambda \la v,w\ra \amp= \la \lambda v,w\ra = \la Av,w\ra \stackrel{(1)}{=} \la v, A^*w\ra\\ \amp\stackrel{(2)}{=}\la v, Aw\ra = \la v,\mu w \ra = \overline\mu\la v,w\ra \stackrel{(3)}{=} \mu\la v,w\ra, \end{align*}
where (1) is true by Proposition 3.2.16, (2) is true since \(A\) is Hermitian, (3) is true by Proposition 3.4.8 and the remaining equalities hold using standard properties of the inner product. Rewriting the expression, we have
\begin{equation*} (\lambda-\mu) \la v,w \ra =0, \end{equation*}
and since \(\lambda\ne \mu\text{,}\) we conclude \(\la v,w\ra =0\) as desired.
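As a numerical illustration (a NumPy sketch with an arbitrary symmetric matrix), note that even a generic eigen-solver returns orthogonal eigenvectors for the two distinct eigenvalues, as the proposition requires:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # real symmetric, eigenvalues 1 and 3

evals, vecs = np.linalg.eig(A)          # a generic eigen-solver, no symmetry assumed
v, w = vecs[:, 0], vecs[:, 1]
print(evals)                            # 1 and 3 (in some order)
print(np.vdot(v, w))                    # essentially 0: the eigenvectors are orthogonal
```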
The proofs of the spectral theorems are rather involved. Of course any matrix over \(\C\) will have the property that its characteristic polynomial splits, but we have also shown this for real symmetric matrices. The hard part is showing that each eigenspace has dimension equal to the algebraic multiplicity of the eigenvalue. For this, something like Schur’s theorem is used as a starting point. See Theorem 6.21 of [1].
We would like to use the spectral theorems to advance the proof of the singular value decomposition (SVD) of a matrix, though it is interesting to note that other authors do the reverse; see Section 5.4 of [3].

Remark 3.4.10.

We conclude this section with another interpretation of the spectral theorem, giving a spectral decomposition which will be mirrored in the next section on the singular value decomposition.
We restrict our attention to \(n\times n\) matrices \(A\) over the real or complex numbers which are Hermitian (i.e., symmetric for a real matrix), and consequently for which all the eigenvalues are real. We list the eigenvalues as \(\lambda_1, \dots, \lambda_n,\) though they need not all be distinct. By Theorem 3.4.5, there exists a unitary matrix \(U\) whose columns \(\{u_1, \dots, u_n\}\) form an orthonormal basis of \(\C^n\) consisting of eigenvectors for \(A\text{,}\) so that
\begin{equation*} \diag(\lambda_1, \dots, \lambda_n) = U^* A U. \end{equation*}
In the discussion preceding Corollary 3.3.2 we used the column-row rule for matrix multiplication to show that
\begin{equation*} UU^* = u_1 u_1^* + \cdots + u_n u_n^* \end{equation*}
which is the orthogonal projection into the column space of \(U\) (all of \(\C^n\) in this case), but viewed as the sum of one-dimensional orthogonal projections onto the spaces spanned by each \(u_i.\) It follows that

Proof.

\begin{equation*} \diag(\lambda_1, \dots, \lambda_n) = U^*A U \end{equation*}
or
\begin{equation*} A = (U^*)^{-1}\diag(\lambda_1, \dots, \lambda_n)U^{-1} = U\diag(\lambda_1, \dots, \lambda_n)U^*, \end{equation*}
since \(U\) is unitary, so \(U^{-1} = U^*\text{.}\) Expanding \(U\diag(\lambda_1, \dots, \lambda_n)U^*\) by the column-row rule, just as we did for \(UU^*\text{,}\) gives \(A = \lambda_1 u_1u_1^* + \cdots + \lambda_n u_nu_n^*\text{,}\) and the result follows.
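As a final numerical illustration of this spectral decomposition (a NumPy sketch; the Hermitian matrix is an arbitrary choice), we rebuild \(A\) from its eigenvalues and the rank-one projections \(u_iu_i^*\text{:}\)

```python
import numpy as np

A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])              # a Hermitian matrix

evals, U = np.linalg.eigh(A)             # columns of U: an orthonormal basis of eigenvectors
# Rebuild A as the sum of eigenvalues times the rank-one projections u_i u_i^*.
A_rebuilt = sum(lam * np.outer(u, u.conj()) for lam, u in zip(evals, U.T))
print(np.allclose(A, A_rebuilt))         # True
```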