Subsection 4.4.1 An overview
Given a linear operator \(T:V \to V\) on a finite-dimensional vector space \(V\text{,}\) \(T\) is said to be diagonalizable if there exists a basis \(\cB = \{v_1, \dots, v_n\}\) of \(V\) so that the matrix of \(T\) with respect to \(\cB\) is diagonal:
\begin{equation*}
[T]_{\cB}=\begin{bmatrix}
\lambda_1&0&\cdots&0\\
0&\lambda_2&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&\lambda_n
\end{bmatrix}
\end{equation*}
where the \(\lambda_i\) are scalars in \(F\text{,}\) not necessarily distinct. A trivial example is the identity linear operator, which is diagonalizable with respect to any basis; its matrix is the \(n\times n\) identity matrix.
Note that the diagonal form of the matrix above encodes the information that \(T(v_i) = \lambda_i v_i\) for \(i=1,
\dots, n.\)
In general, given a linear map \(T:V\to V\) on a vector space \(V\) over a field \(F\text{,}\) one can ask whether for a given scalar \(\lambda\in F\text{,}\) there exist nonzero vectors \(v\in
V\text{,}\) so that \(T(v) = \lambda v.\) If they exist, \(\lambda\) is called an eigenvalue of \(T,\) and \(v\ne 0\) an eigenvector for \(T\) corresponding to the eigenvalue \(\lambda.\) Thus \(T\) is diagonalizable if and only if there is a basis for \(V\) consisting of eigenvectors for \(T.\)
Let’s look at several examples. Let \(U= \R[x]\) be the vector space of all polynomials with coefficients in \(\R,\) and let \(V=C^\infty(\R)\) be the vector space of all functions which are infinitely differentiable. Note that \(U\) is a subspace of \(V\text{.}\)
Example 4.4.2. \(T:\R[x]\to\R[x]\) given by \(T(f) =
f'\).
Let \(T:\R[x]\to\R[x]\) be the linear map which takes a polynomial to its first derivative, \(T(f) =f'.\) Does \(T\) have any eigenvectors or eigenvalues?
We must ask: how is it possible that
\begin{equation*}
T(f) = f' = \lambda f
\end{equation*}
for a nonzero polynomial \(f?\)
If \(\lambda\ne 0\text{,}\) there can be no nonzero \(f\text{:}\) if \(f\) has degree at least 1, the degree of \(f'\) is one less than that of \(\lambda f\text{,}\) and if \(f\) is a nonzero constant, then \(f' = 0 \ne \lambda f\text{.}\) So the only possibility left is \(\lambda = 0.\) Do we know any nonzero polynomials \(f\) so that \(T(f) = f' = 0\cdot f = 0?\) Calculus tells us that the only solutions are the constant polynomials, so the eigenvectors for \(\lambda = 0\) are the nonzero constant polynomials. Well, maybe not so interesting, but still instructive.
Example 4.4.3. \(T:C^\infty(\R)\to C^\infty(\R)\) given by \(T(f) =
f'\).
Next consider \(T:C^\infty(\R)\to C^\infty(\R)\) to be the same derivative map, but now on the vector space \(V=C^\infty(\R).\) We consider the same problem of finding scalars \(\lambda\) and nonzero functions \(f\) so that
\begin{equation*}
f' = \lambda f.
\end{equation*}
Once again, calculus solves this problem completely: the functions \(f\) are the solutions to the first-order homogeneous linear differential equation \(y' - \lambda y = 0,\) all of which have the form \(f(x) = Ce^{\lambda x}.\) Thus every \(\lambda\in\R\) is an eigenvalue of \(T\) (take any \(C\ne 0\)), and this includes \(\lambda=0\) from the previous case.
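If one wants a quick machine check of this claim, the following short sketch uses the SymPy library (assumed to be available) to solve \(y' - \lambda y = 0\) symbolically and recover the family \(Ce^{\lambda x}\text{:}\)
\begin{verbatim}
# Sketch (assumes SymPy is installed): solve y' - lambda*y = 0 symbolically.
import sympy as sp

x, lam = sp.symbols('x lambda')
f = sp.Function('f')

sol = sp.dsolve(sp.Eq(f(x).diff(x) - lam * f(x), 0), f(x))
print(sol)   # Eq(f(x), C1*exp(lambda*x)), i.e. f(x) = C*e^(lambda*x)
\end{verbatim}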
Example 4.4.4. \(S:C^\infty(\R)\to C^\infty(\R)\) given by \(S(f) =
f''\).
Finally consider the map \(S:C^\infty(\R)\to C^\infty(\R)\) given by \(S(f)
= f''\text{,}\) the second derivative map, so now we seek functions for which \(S(f) = f'' = \lambda f,\) or in calculus terms solutions to the second order homogeneous differential equation
\begin{equation*}
y'' - \lambda y = 0\text{.}
\end{equation*}
This is an interesting example since the answer depends on the sign of \(\lambda.\) For \(\lambda =0\text{,}\) the fundamental theorem of calculus tells us that the solutions are exactly the linear polynomials \(f(x) = ax + b.\)
For \(\lambda <0,\) we can write \(\lambda = -\omega^2.\) We see that \(\sin(\omega x)\) and \(\cos(\omega x)\) are eigenvectors for \(S\) with eigenvalue \(\lambda = -\omega^2.\) Indeed every eigenvector with eigenvalue \(\lambda = -\omega^2 < 0\) is a linear combination of these two.
For \(\lambda > 0\text{,}\) writing \(\lambda = \omega^2\text{,}\) we see that \(e^{\pm \omega x}\) are solutions, and as above every eigenvector with eigenvalue \(\lambda=\omega^2 > 0\) is a linear combination of these two.
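These claims are easy to verify symbolically. Here is a small SymPy sketch (again assuming the library is available) checking that each of the four functions above satisfies \(f'' = \lambda f\) for the stated value of \(\lambda\text{:}\)
\begin{verbatim}
# Sketch (assumes SymPy): check f'' - lambda*f == 0 for the claimed eigenvectors.
import sympy as sp

x = sp.symbols('x')
omega = sp.symbols('omega', positive=True)

pairs = [(sp.sin(omega * x), -omega**2),
         (sp.cos(omega * x), -omega**2),
         (sp.exp(omega * x),  omega**2),
         (sp.exp(-omega * x), omega**2)]

for f, lam in pairs:
    print(sp.simplify(f.diff(x, 2) - lam * f))   # prints 0 four times
\end{verbatim}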
With a few examples under our belt, we return to the problem of finding a systematic way to determine eigenvalues and eigenvectors. The condition \(T(v) = \lambda v\) is the same as the condition that \((T-\lambda I)v = 0\text{,}\) where \(I\) is the identity linear operator (\(I(v) = v\)) on \(V.\) So let’s put
\begin{equation*}
E_\lambda= \{v\in V\mid T(v) =
\lambda v\}.
\end{equation*}
Then as we just said, \(E_\lambda = \ker(T-\lambda I),\) so we know that \(E_\lambda\) (being the kernel of a linear map) is a subspace of \(V,\) called the \(\lambda\)-eigenspace of \(T.\)
Since \(E_\lambda\) is a subspace of \(V,\) the zero vector is always an element, but \(T(0) = \lambda 0=0\) for every \(\lambda\text{,}\) which is not terribly discriminating; because our goal is to find a basis of the space consisting of eigenvectors, the zero vector is excluded from being an eigenvector. In particular, \(\lambda\) is an eigenvalue of \(T\) exactly when \(E_\lambda \ne \{0\}\text{.}\)
On a finite-dimensional vector space, finding the eigenvalues and a basis for the corresponding eigenspace is rather algorithmic, at least in principle. Let
\(A\) be the matrix of
\(T\) with respect to any basis
\(\cB\) (it does not matter which). By
Corollary 4.3.8, since
\(T(v) = \lambda v\) if and only if
\begin{equation*}
A[v]_\cB = [T]_\cB[v]_\cB = [T(v)]_\cB =
[\lambda v]_\cB = \lambda [v]_\cB,
\end{equation*}
it suffices to find the eigenvalues and the corresponding eigenvectors of the matrix \(A.\)
So now we are looking for scalars \(\lambda\) for which there are nonzero vectors \(v\in F^n\) with \(Av = \lambda v.\) As before, it is more useful to phrase this as seeking values of \(\lambda\) for which \((A-\lambda I_n)\) has a nontrivial kernel. But now remember that \((A-\lambda
I_n):F^n \to F^n\) is a linear operator on \(F^n\text{,}\) so it has a nontrivial kernel if and only if it is not invertible, and invertibility can be detected with the determinant. Thus \(E_\lambda \ne 0\) if and only if \(\det(A-\lambda I_n) = 0\text{.}\)
Since we want to find all values of \(\lambda\) with \(\det(\lambda I_n-A) = 0\) (equivalently \(\det(A-\lambda I_n)=0\text{,}\) since the two determinants differ only by a factor of \((-1)^n\)), we set the problem up with a variable and define the function
\begin{equation*}
\chi_A(x) :=
\det(xI-A).
\end{equation*}
One proves that \(\chi_A\) is a monic polynomial of degree \(n\text{,}\) called the characteristic polynomial of \(A.\) The roots of this polynomial that lie in \(F\) are precisely the eigenvalues of \(A,\) so the first part of the algorithm is to find the roots of the characteristic polynomial. In particular, an \(n\times n\) matrix can have at most \(n\) eigenvalues in \(F,\) counted with multiplicity.
Now for each eigenvalue \(\lambda\text{,}\) there is a corresponding eigenspace, \(E_\lambda\) which is the kernel of \(\lambda I_n - A\text{,}\) or equivalently of \(A-\lambda
I_n.\) Finding the kernel is simply finding the solutions for the system of homogeneous linear equations \((A-\lambda
I_n)X = 0,\) which one can easily do via row reduction.
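To see the whole algorithm in one place, here is a short SymPy sketch (the \(2\times 2\) matrix is a made-up example, and the library is assumed to be available) that computes the characteristic polynomial, finds its roots, and row reduces \(A-\lambda I\) to obtain a basis of each eigenspace:
\begin{verbatim}
# Sketch (assumes SymPy): eigenvalues and eigenspaces of a small example matrix.
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1],
               [1, 2]])

# Characteristic polynomial chi_A(x) = det(x*I - A).
chi = (x * sp.eye(2) - A).det()
print(sp.factor(chi))                      # (x - 1)*(x - 3)

# For each root lambda, the eigenspace is ker(A - lambda*I),
# which SymPy computes by row reduction (nullspace).
for lam in sp.roots(chi, x):
    basis = (A - lam * sp.eye(2)).nullspace()
    print(lam, [list(v) for v in basis])
# lambda = 1 gives basis vector [-1, 1]; lambda = 3 gives basis vector [1, 1].
\end{verbatim}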
Subsection 4.4.3 An alternate characterization of diagonalizability
We want to make sense of an alternate definition that an
\(n\times
n\) matrix
\(A\in M_n(F)\) is
diagonalizable if there is an invertible matrix
\(P\in M_n(F)\text{,}\) so that
\(D=P^{-1}AP\) is a diagonal matrix. Recall that in this setting we say that the matrix
\(A\) is
similar to a diagonal matrix.
Suppose that the matrix
\(A\) is given to us as the matrix of a linear transformation
\(T:V\to V\) with respect to a basis
\(\cB\) for
\(V\text{,}\) \(A = [T]_\cB.\) Now
\(T\) is diagonalizable if and only if there is a basis
\(\cE\) of
\(V\) consisting of eigenvectors for
\(T.\) We know that
\([T]_\cE\) is diagonal. But we recall from
Theorem 4.3.12 that
\begin{equation*}
[T]_\cE =
[I]_{\cB}^{\cE}[T]_\cB [I]_\cE^\cB= P^{-1}A P,
\end{equation*}
where \(P = [I]_\cE^\cB\) is the (invertible) change of basis matrix. Also note that when \(V = F^n\) and \(\cB\) is the standard basis, the columns of \(P = [I]_\cE^\cB\) are simply the coordinate vectors of the eigenvector basis \(\cE.\) This is quite a mouthful, so we should look at some examples.
Example 4.4.9. A simple example to start.
Let \(A=\begin{bmatrix} 5 & 6 & 0 \\ 0 & 5 & 8 \\
0 & 0 & 9 \end{bmatrix}\text{.}\) Then \(\chi_A(x) =
(x-5)^2(x-9)\text{,}\) so we have two eigenvalues 5 and 9. We need to compute the corresponding eigenspaces.
For each eigenvalue \(\lambda\text{,}\) we compute \(\ker(A-\lambda
I_3)\text{,}\) that is, we find all solutions to \((A-\lambda I_3)x = \0.\)
\begin{equation*}
A-9I= \ba{rrr}
-4 & 6 & 0 \\
0 & -4 & 8 \\
0 & 0 & 0
\ea
\stackrel{RREF}{\mapsto}
\ba{rrr}
1 & 0 & -3 \\
0 & 1 & -2 \\
0 & 0 & 0
\ea,
\end{equation*}
so \(E_9(A) = \ker (A-9I) = \Span\left\{
\ba{r}3\\2\\1\ea \right\}\text{.}\) Similarly,
\begin{equation*}
A-5I= \ba{rrr}
0 & 6 & 0 \\
0 & 0 & 8 \\
0 & 0 & 4 \ea
\stackrel{RREF}{\mapsto}
\ba{rrr}
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \ea,
\end{equation*}
so \(E_5(A) = \ker(A-5I) =
\Span\left\{\ba{r}1\\0\\0\ea\right\}\text{.}\)
But \(\left\{\ba{r}3\\2\\1\ea, \ba{r}1\\0\\0\ea \right\}\) spans only a two-dimensional subspace of \(\R^3\text{,}\) and since every eigenvector of \(A\) lies in \(E_5\) or \(E_9\text{,}\) there can be no basis of \(\R^3\) consisting of eigenvectors of \(A\text{.}\) Thus \(A\) is not diagonalizable.
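This conclusion is easy to confirm computationally; a SymPy sketch (library assumed available) lists each eigenvalue with its algebraic multiplicity and an eigenspace basis, and checks diagonalizability directly:
\begin{verbatim}
# Sketch (assumes SymPy): the matrix of Example 4.4.9.
import sympy as sp

A = sp.Matrix([[5, 6, 0],
               [0, 5, 8],
               [0, 0, 9]])

# Each entry is (eigenvalue, algebraic multiplicity, eigenspace basis).
for lam, mult, vecs in A.eigenvects():
    print(lam, mult, [list(v) for v in vecs])
# 5 has multiplicity 2 but a 1-dimensional eigenspace spanned by [1, 0, 0];
# 9 has multiplicity 1 with eigenspace spanned by [3, 2, 1].

print(A.is_diagonalizable())   # False
\end{verbatim}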
Example 4.4.11. A more involved example.
Let \(A=\ba{rrrr} 3 & 0 & 2 & 0 \\ 1 & 3 & 1 & 0
\\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 4 \ea.\) Think of \(A\) as \(A=[T]_\cB\text{,}\) the matrix of the linear transformation \(T:\R^4\to\R^4\) with respect to the standard basis \(\cB\) of \(\R^4.\) Then \(A \) has characteristic polynomial \(\chi_A(x) = x^4 - 11 x^3 + 42
x^2 - 64 x + 32= (x -1)(x -2)(x-4)^2\text{.}\)
We know that the eigenspaces \(E_1\) and \(E_2\) will each have dimension one, so they pose no obstruction to diagonalizability, but since we want to do a bit more with this example, we compute bases for the eigenspaces. If we let \(\cE_\lambda\) denote a basis for the eigenspace \(E_\lambda=\ker (A-\lambda I),\) then as in the previous example via row reduction, we find \(\cE_1 =
\left\{v_1=\ba{r}1\\0\\-1\\0\ea\right\}\) and \(\cE_2 =
\left\{v_2=\ba{r}2\\-1\\-1\\0\ea\right\}\text{.}\)
By Equation
(4.4.1), we know that
\(1 \le \dim E_4 \le 2.\) If the dimension is 1, then
\(A\) is not diagonalizable. As it turns out, the dimension is 2, and
\(\cE_4 = \{v_3,
v_4\}=\left\{\ba{r}2\\3\\1\\0\ea,\ba{r}0\\0\\0\\1\ea
\right\}\) is a basis for
\(E_4.\)
Let \(\cE = \cE_1\cup \cE_2\cup\cE_4 = \{v_1, v_2, v_3,
v_4\}\) be the basis of eigenvectors. Then
\begin{equation*}
D = [T]_\cE = \ba{rrrr}
1&0&0&0\\0&2&0&0\\0&0&4&0\\0&0&0&4 \ea = P^{-1}AP,
\end{equation*}
where
\begin{equation*}
P = [I]_\cE^\cB=
\ba{rrrr} 1&2&2&0\\0&-1&3&0\\-1&-1&1&0\\0&0&0&1\ea.
\end{equation*}
Note how the columns of \(P\) are (the coordinate vectors of) the eigenvectors in the basis \(\cE\text{.}\)
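As a final sanity check, a short SymPy sketch (library assumed available) confirms that this \(P\) does conjugate \(A\) to the diagonal matrix \(D\text{:}\)
\begin{verbatim}
# Sketch (assumes SymPy): verify P^{-1} A P = D for Example 4.4.11.
import sympy as sp

A = sp.Matrix([[3, 0, 2, 0],
               [1, 3, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 4]])
P = sp.Matrix([[ 1,  2, 2, 0],
               [ 0, -1, 3, 0],
               [-1, -1, 1, 0],
               [ 0,  0, 0, 1]])

print(P.inv() * A * P)                        # the diagonal matrix D
print(P.inv() * A * P == sp.diag(1, 2, 4, 4)) # True
\end{verbatim}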