Subsection 4.4.1 An overview
Given a linear operator \(T:V \to V\) on a finite-dimensional vector space \(V\text{,}\) \(T\) is said to be diagonalizable if there exists a basis \(\cB = \{v_1, \dots, v_n\}\) of \(V\) so that the matrix of \(T\) with respect to \(\cB\) is diagonal:
\begin{equation*}
[T]_{\cB}=\begin{bmatrix}
\lambda_1&0&\cdots&0\\
0&\lambda_2&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&\lambda_n
\end{bmatrix}
\end{equation*}
where the \(\lambda_i\) are scalars in \(F\text{,}\) not necessarily distinct. A trivial example is the identity linear operator, which is diagonalizable with respect to any basis; its matrix is the \(n\times n\) identity matrix.
Note that the diagonal form of the matrix above encodes the information that \(T(v_i) = \lambda_i v_i\) for \(i=1,
\dots, n.\)
In general, given a linear map \(T:V\to V\) on a vector space \(V\) over a field \(F\text{,}\) one can ask whether for a given scalar \(\lambda\in F\text{,}\) there exist nonzero vectors \(v\in
V\text{,}\) so that \(T(v) = \lambda v.\) If they exist, \(\lambda\) is called an eigenvalue of \(T,\) and \(v\ne 0\) an eigenvector for \(T\) corresponding to the eigenvalue \(\lambda.\) Thus \(T\) is diagonalizable if and only if there is a basis for \(V\) consisting of eigenvectors for \(T.\)
Let’s look at several examples. Let \(U= \R[x]\) be the vector space of all polynomials with coefficients in \(\R,\) and let \(V=C^\infty(\R)\) be the vector space of all functions which are infinitely differentiable. Note that \(U\) is a subspace of \(V\text{.}\)
Example 4.4.2. \(T:\R[x]\to\R[x]\) given by \(T(f) =
f'\).
Let \(T:\R[x]\to\R[x]\) be the linear map which takes a polynomial to its first derivative, \(T(f) =f'.\) Does \(T\) have any eigenvectors or eigenvalues?
We must ask: how is it possible that
\begin{equation*}
T(f) = f' = \lambda f
\end{equation*}
for a nonzero polynomial \(f?\)
If \(\lambda\ne 0\text{,}\) there can be no nonzero \(f\text{:}\) if \(f\) has degree at least 1, the degree of \(f'\) is one less than that of \(\lambda f\text{,}\) and if \(f\) is a nonzero constant, then \(f' = 0 \ne \lambda f\text{.}\) So the only possibility left is \(\lambda = 0.\) Do we know any nonzero polynomials \(f\) so that \(T(f) = f' = 0\cdot f = 0?\) Calculus tells us that the only solutions are the constant polynomials, so the eigenvectors for \(\lambda = 0\) are the nonzero constant polynomials. Well, maybe not so interesting, but still instructive.
Example 4.4.3. \(T:C^\infty(\R)\to C^\infty(\R)\) given by \(T(f) =
f'\).
Next consider \(T:C^\infty(\R)\to C^\infty(\R)\) to be the same derivative map, but now on the vector space \(V=C^\infty(\R).\) We consider the same problem of finding scalars \(\lambda\) and nonzero functions \(f\) so that
\begin{equation*}
f' = \lambda f.
\end{equation*}
Once again, calculus solves this problem completely: the functions \(f\) are the solutions to the first-order homogeneous linear differential equation \(y' - \lambda y = 0,\) all of which have the form \(f(x) = Ce^{\lambda x}.\) Thus every \(\lambda\in\R\) is an eigenvalue of \(T\) (take any \(C\ne 0\)), and this includes \(\lambda=0\) from the previous case.
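If one wants a quick machine check of this claim, the following short sketch uses the SymPy library (assumed to be available) to solve \(y' - \lambda y = 0\) symbolically and recover the family \(Ce^{\lambda x}\text{:}\)
\begin{verbatim}
# Sketch (assumes SymPy is installed): solve y' - lambda*y = 0 symbolically.
import sympy as sp

x, lam = sp.symbols('x lambda')
f = sp.Function('f')

sol = sp.dsolve(sp.Eq(f(x).diff(x) - lam * f(x), 0), f(x))
print(sol)   # Eq(f(x), C1*exp(lambda*x)), i.e. f(x) = C*e^(lambda*x)
\end{verbatim}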
Example 4.4.4. \(S:C^\infty(\R)\to C^\infty(\R)\) given by \(S(f) =
f''\).
Finally consider the map \(S:C^\infty(\R)\to C^\infty(\R)\) given by \(S(f)
= f''\text{,}\) the second derivative map, so now we seek functions for which \(S(f) = f'' = \lambda f,\) or in calculus terms solutions to the second order homogeneous differential equation
\begin{equation*}
y'' - \lambda y = 0\text{.}
\end{equation*}
This is an interesting example since the answer depends on the sign of \(\lambda.\) For \(\lambda =0\text{,}\) the fundamental theorem of calculus tells us that the solutions are exactly the linear polynomials \(f(x) = ax + b.\)
For \(\lambda <0,\) we can write \(\lambda = -\omega^2.\) We see that \(\sin(\omega x)\) and \(\cos(\omega x)\) are eigenvectors for \(S\) with eigenvalue \(\lambda = -\omega^2.\) Indeed every eigenvector with eigenvalue \(\lambda = -\omega^2 < 0\) is a linear combination of these two.
For \(\lambda > 0\text{,}\) writing \(\lambda = \omega^2\text{,}\) we see that \(e^{\pm \omega x}\) are solutions, and as above every eigenvector with eigenvalue \(\lambda=\omega^2 > 0\) is a linear combination of these two.
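These claims are easy to verify symbolically. Here is a small SymPy sketch (again assuming the library is available) checking that each of the four functions above satisfies \(f'' = \lambda f\) for the stated value of \(\lambda\text{:}\)
\begin{verbatim}
# Sketch (assumes SymPy): check f'' - lambda*f == 0 for the claimed eigenvectors.
import sympy as sp

x = sp.symbols('x')
omega = sp.symbols('omega', positive=True)

pairs = [(sp.sin(omega * x), -omega**2),
         (sp.cos(omega * x), -omega**2),
         (sp.exp(omega * x),  omega**2),
         (sp.exp(-omega * x), omega**2)]

for f, lam in pairs:
    print(sp.simplify(f.diff(x, 2) - lam * f))   # prints 0 four times
\end{verbatim}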
With a few examples under our belt, we return to the problem of finding a systematic way to determine eigenvalues and eigenvectors. The condition \(T(v) = \lambda v\) is the same as the condition that \((T-\lambda I)v = 0\text{,}\) where \(I\) is the identity linear operator (\(I(v) = v\)) on \(V.\) So let’s put
\begin{equation*}
E_\lambda= \{v\in V\mid T(v) =
\lambda v\}.
\end{equation*}
Then as we just said, \(E_\lambda = \ker(T-\lambda I),\) so we know that \(E_\lambda\) (being the kernel of a linear map) is a subspace of \(V,\) called the \(\lambda\)-eigenspace of \(T.\)
Since \(E_\lambda\) is a subspace of \(V,\) the zero vector is always an element, but \(T(0) = \lambda 0=0\) for every \(\lambda\text{,}\) which is not terribly discriminating; because our goal is to find a basis of the space consisting of eigenvectors, the zero vector is excluded from being an eigenvector. In particular, \(\lambda\) is an eigenvalue of \(T\) exactly when \(E_\lambda \ne \{0\}\text{.}\)
On a finite-dimensional vector space, finding the eigenvalues and a basis for the corresponding eigenspace is rather algorithmic, at least in principle. Let
\(A\) be the matrix of
\(T\) with respect to any basis
\(\cB\) (it does not matter which). By
Corollary 4.3.8, since
\(T(v) = \lambda v\) if and only if
\begin{equation*}
A[v]_\cB = [T]_\cB[v]_\cB = [T(v)]_\cB =
[\lambda v]_\cB = \lambda [v]_\cB,
\end{equation*}
it suffices to find the eigenvalues and the corresponding eigenvectors of the matrix \(A.\)
So now we are looking for scalars \(\lambda\) for which there are nonzero vectors \(v\in F^n\) with \(Av = \lambda v.\) As before, it is more useful to phrase this as seeking values of \(\lambda\) for which \((A-\lambda I_n)\) has a nontrivial kernel. But now remember that \((A-\lambda
I_n):F^n \to F^n\) is a linear operator on \(F^n\text{,}\) so it has a nontrivial kernel if and only if it is not invertible, and invertibility can be detected with the determinant. Thus \(E_\lambda \ne 0\) if and only if \(\det(A-\lambda I_n) = 0\text{.}\)
Since we want to find all values of \(\lambda\) with \(\det(\lambda I_n-A) = 0\) (equivalently \(\det(A-\lambda I_n)=0\text{,}\) since the two determinants differ only by a factor of \((-1)^n\)), we set the problem up with a variable and define the function
\begin{equation*}
\chi_A(x) :=
\det(xI-A).
\end{equation*}
One proves that \(\chi_A\) is a monic polynomial of degree \(n\text{,}\) called the characteristic polynomial of \(A.\) The roots of this polynomial that lie in \(F\) are precisely the eigenvalues of \(A,\) so the first part of the algorithm is to find the roots of the characteristic polynomial. In particular, an \(n\times n\) matrix can have at most \(n\) eigenvalues in \(F,\) counted with multiplicity.
Now for each eigenvalue \(\lambda\text{,}\) there is a corresponding eigenspace, \(E_\lambda\) which is the kernel of \(\lambda I_n - A\text{,}\) or equivalently of \(A-\lambda
I_n.\) Finding the kernel is simply finding the solutions for the system of homogeneous linear equations \((A-\lambda
I_n)X = 0,\) which one can easily do via row reduction.
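To see the whole algorithm in one place, here is a short SymPy sketch (the \(2\times 2\) matrix is a made-up example, and the library is assumed to be available) that computes the characteristic polynomial, finds its roots, and row reduces \(A-\lambda I\) to obtain a basis of each eigenspace:
\begin{verbatim}
# Sketch (assumes SymPy): eigenvalues and eigenspaces of a small example matrix.
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1],
               [1, 2]])

# Characteristic polynomial chi_A(x) = det(x*I - A).
chi = (x * sp.eye(2) - A).det()
print(sp.factor(chi))                      # (x - 1)*(x - 3)

# For each root lambda, the eigenspace is ker(A - lambda*I),
# which SymPy computes by row reduction (nullspace).
for lam in sp.roots(chi, x):
    basis = (A - lam * sp.eye(2)).nullspace()
    print(lam, [list(v) for v in basis])
# lambda = 1 gives basis vector [-1, 1]; lambda = 3 gives basis vector [1, 1].
\end{verbatim}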
Subsection 4.4.3 An alternate characterization of diagonalizability
We want to make sense of an alternate definition that an
\(n\times
n\) matrix
\(A\in M_n(F)\) is
diagonalizable if there is an invertible matrix
\(P\in M_n(F)\text{,}\) so that
\(D=P^{-1}AP\) is a diagonal matrix. Recall that in this setting we say that the matrix
\(A\) is
similar to a diagonal matrix.
Suppose that the matrix
\(A\) is given to us as the matrix of a linear transformation
\(T:V\to V\) with respect to a basis
\(\cB\) for
\(V\text{,}\) \(A = [T]_\cB.\) Now
\(T\) is diagonalizable if and only if there is a basis
\(\cE\) of
\(V\) consisting of eigenvectors for
\(T.\) We know that
\([T]_\cE\) is diagonal. But we recall from
Theorem 4.3.12 that
\begin{equation*}
[T]_\cE =
[I]_{\cB}^{\cE}[T]_\cB [I]_\cE^\cB= P^{-1}A P,
\end{equation*}
where \(P = [I]_\cE^\cB\) is the (invertible) change of basis matrix. Also note that when \(V = F^n\) and \(\cB\) is the standard basis, the columns of \(P = [I]_\cE^\cB\) are simply the coordinate vectors of the eigenvector basis \(\cE.\) This is quite a mouthful, so we should look at some examples.
Example 4.4.9. A simple example to start.
Let \(A=\begin{bmatrix} 5 & 6 & 0 \\ 0 & 5 & 8 \\
0 & 0 & 9 \end{bmatrix}\text{.}\) Then \(\chi_A(x) =
(x-5)^2(x-9)\text{,}\) so we have two eigenvalues 5 and 9. We need to compute the corresponding eigenspaces.
For each eigenvalue \(\lambda\text{,}\) we compute \(\ker(A-\lambda
I_3)\text{,}\) that is, we find all solutions to \((A-\lambda I_3)x = \0.\)
\begin{equation*}
A-9I= \ba{rrr}
-4 & 6 & 0 \\
0 & -4 & 8 \\
0 & 0 & 0
\ea
\stackrel{RREF}{\mapsto}
\ba{rrr}
1 & 0 & -3 \\
0 & 1 & -2 \\
0 & 0 & 0
\ea,
\end{equation*}
so \(E_9(A) = \ker (A-9I) = \Span\left\{
\ba{r}3\\2\\1\ea \right\}\text{.}\) Similarly,
\begin{equation*}
A-5I= \ba{rrr}
0 & 6 & 0 \\
0 & 0 & 8 \\
0 & 0 & 4 \ea
\stackrel{RREF}{\mapsto}
\ba{rrr}
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \ea,
\end{equation*}
so \(E_5(A) = \ker(A-5I) =
\Span\left\{\ba{r}1\\0\\0\ea\right\}\text{.}\)
But \(\left\{\ba{r}3\\2\\1\ea, \ba{r}1\\0\\0\ea \right\}\) spans only a two-dimensional subspace of \(\R^3\text{,}\) and since every eigenvector of \(A\) lies in \(E_5\) or \(E_9\text{,}\) there can be no basis of \(\R^3\) consisting of eigenvectors of \(A\text{.}\) Thus \(A\) is not diagonalizable.
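This conclusion is easy to confirm computationally; a SymPy sketch (library assumed available) lists each eigenvalue with its algebraic multiplicity and an eigenspace basis, and checks diagonalizability directly:
\begin{verbatim}
# Sketch (assumes SymPy): the matrix of Example 4.4.9.
import sympy as sp

A = sp.Matrix([[5, 6, 0],
               [0, 5, 8],
               [0, 0, 9]])

# Each entry is (eigenvalue, algebraic multiplicity, eigenspace basis).
for lam, mult, vecs in A.eigenvects():
    print(lam, mult, [list(v) for v in vecs])
# 5 has multiplicity 2 but a 1-dimensional eigenspace spanned by [1, 0, 0];
# 9 has multiplicity 1 with eigenspace spanned by [3, 2, 1].

print(A.is_diagonalizable())   # False
\end{verbatim}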
Example 4.4.11. A more involved example.
Let \(A=\ba{rrrr} 3 & 0 & 2 & 0 \\ 1 & 3 & 1 & 0
\\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 4 \ea.\) Think of \(A\) as \(A=[T]_\cB\text{,}\) the matrix of the linear transformation \(T:\R^4\to\R^4\) with respect to the standard basis \(\cB\) of \(\R^4.\) Then \(A \) has characteristic polynomial \(\chi_A(x) = x^4 - 11 x^3 + 42
x^2 - 64 x + 32= (x -1)(x -2)(x-4)^2\text{.}\)
We know that the eigenspaces \(E_1\) and \(E_2\) will each have dimension one, so they pose no obstruction to diagonalizability, but since we want to do a bit more with this example, we compute bases for the eigenspaces. If we let \(\cE_\lambda\) denote a basis for the eigenspace \(E_\lambda=\ker (A-\lambda I),\) then as in the previous example via row reduction, we find \(\cE_1 =
\left\{v_1=\ba{r}1\\0\\-1\\0\ea\right\}\) and \(\cE_2 =
\left\{v_2=\ba{r}2\\-1\\-1\\0\ea\right\}\text{.}\)
By Equation
(4.4.1), we know that
\(1 \le \dim E_4 \le 2.\) If the dimension is 1, then
\(A\) is not diagonalizable. As it turns out, the dimension is 2, and
\(\cE_4 = \{v_3,
v_4\}=\left\{\ba{r}2\\3\\1\\0\ea,\ba{r}0\\0\\0\\1\ea
\right\}\) is a basis for
\(E_4.\)
Let \(\cE = \cE_1\cup \cE_2\cup\cE_4 = \{v_1, v_2, v_3,
v_4\}\) be the basis of eigenvectors. Then
\begin{equation*}
D = [T]_\cE = \ba{rrrr}
1&0&0&0\\0&2&0&0\\0&0&4&0\\0&0&0&4 \ea = P^{-1}AP,
\end{equation*}
where
\begin{equation*}
P = [I]_\cE^\cB=
\ba{rrrr} 1&2&2&0\\0&-1&3&0\\-1&-1&1&0\\0&0&0&1\ea.
\end{equation*}
Note how the columns of \(P\) are (the coordinate vectors of) the eigenvectors in the basis \(\cE\text{.}\)
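As a final sanity check, a short SymPy sketch (library assumed available) confirms that this \(P\) does conjugate \(A\) to the diagonal matrix \(D\text{:}\)
\begin{verbatim}
# Sketch (assumes SymPy): verify P^{-1} A P = D for Example 4.4.11.
import sympy as sp

A = sp.Matrix([[3, 0, 2, 0],
               [1, 3, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 4]])
P = sp.Matrix([[ 1,  2, 2, 0],
               [ 0, -1, 3, 0],
               [-1, -1, 1, 0],
               [ 0,  0, 0, 1]])

print(P.inv() * A * P)                        # the diagonal matrix D
print(P.inv() * A * P == sp.diag(1, 2, 4, 4)) # True
\end{verbatim}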