Section 3.2 Orthogonality and applications

Throughout, all vector spaces are inner product spaces over the field \(F = \R \text{ or } \C\) with inner product \(\la\cdot,\cdot\ra.\) Unless noted otherwise, the vector spaces are finite-dimensional.

Subsection 3.2.1 Orthogonal and Orthonormal Bases

Recall that a set \(S\) of vectors is orthogonal if every pair of distinct vectors in \(S\) is orthogonal, and the set is orthonormal if \(S\) is an orthogonal set of unit vectors.

Example 3.2.1. The standard basis in \(F^n\).

Let \(\cE=\{e_1, e_2, \dots, e_n\}\) be the standard basis in \(F^n\) (\(e_i\) has a one in the \(i\)th coordinate and zeros elsewhere). It is immediate to check that this is an orthonormal basis for \(F^n.\)
We first make a very simple observation about orthogonal sets of nonzero vectors: they are linearly independent.

Proof.

Suppose that \(S\) is a linearly dependent set. Then there exist vectors \(v_{i_1}, \dots, v_{i_k} \in S\) and scalars \(a_{i_j}\) not all zero so that
\begin{equation*} v= a_{i_1}v_{i_1} + \cdots + a_{i_k}v_{i_k} = 0. \end{equation*}
Indeed, there is no loss of generality in assuming all the coefficients are nonzero, so in particular \(a_{i_1} \ne 0.\) Since \(v=0,\) we know that \(\la v,v_{i_1}\ra=0,\) but computing this inner product term by term gives
\begin{equation*} 0 = \la v,v_{i_1}\ra = \sum_{j=1}^k a_{i_j}\la v_{i_j},v_{i_1}\ra = a_{i_1}\la v_{i_1},v_{i_1}\ra = a_{i_1}\|v_{i_1}\|^2. \end{equation*}
But \(v_{i_1}\ne 0,\) so its length is nonzero, forcing \(a_{i_1}=0,\) a contradiction.
Orthonormal bases offer distinct advantages in terms of representing coordinate vectors or the matrix of a linear map. For example, if \(\cB=\{v_1, \dots, v_n\}\) is a basis for a vector space \(V,\) we know that every \(v\in V\) has a unique representation as \(v = a_1v_1 + \cdots + a_nv_n,\) the coefficients of which provide the coordinate vector \([v]_\cB.\) But determining these coordinates often requires some work. With an orthonormal basis, the process is completely mechanical.

Proof of (1).

Write \(v = a_1e_1 + \cdots + a_n e_n\text{.}\) Then using the linearity of the inner product in the first variable and \(\la e_i,e_j\ra = \delta_{ij},\) the Kronecker delta, we have
\begin{equation*} \la v, e_j\ra = \sum_{i=1}^n a_i\la e_i,e_j\ra = a_j. \end{equation*}
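For instance, in \(\R^2\) with the standard inner product, \(u_1 = \tfrac{1}{\sqrt 2}(1,1)\) and \(u_2 = \tfrac{1}{\sqrt 2}(1,-1)\) form an orthonormal basis. For \(v=(3,1)\) we compute \(\la v,u_1\ra = \tfrac{4}{\sqrt 2} = 2\sqrt 2\) and \(\la v,u_2\ra = \tfrac{2}{\sqrt 2} = \sqrt 2,\) and indeed
\begin{equation*} 2\sqrt 2\, u_1 + \sqrt 2\, u_2 = 2(1,1) + (1,-1) = (3,1) = v. \end{equation*}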

Proof of (2).

In Subsection 1.4.2, we saw that the matrix of \(T\) is given by \(A = [T]_{\cB_V}^{\cB_W}\) where
\begin{equation*} T(e_j) = \sum_{k=1}^m A_{kj}f_k. \end{equation*}
We now compute
\begin{equation*} \la T(e_j),f_i\ra = \la \sum_{k=1}^m A_{kj}f_k, f_i\ra =\sum_{k=1}^m A_{kj}\la f_k,f_i\ra = A_{ij}. \end{equation*}
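For instance, if \(T:\R^2\to\R^2\) is rotation by an angle \(\theta\) and we use the standard (orthonormal) basis for both the domain and codomain, then \(T(e_1) = (\cos\theta,\sin\theta)\) and \(T(e_2)=(-\sin\theta,\cos\theta)\text{,}\) so the formula \(A_{ij} = \la T(e_j),e_i\ra\) immediately gives \(A_{11}=\cos\theta,\) \(A_{21}=\sin\theta,\) \(A_{12}=-\sin\theta,\) and \(A_{22}=\cos\theta\text{,}\) i.e., \(A\) is the usual rotation matrix.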
It is clear that orthonormal bases have distinct advantages, and there is a standard algorithm to produce one from an arbitrary basis; but to understand why the algorithm should work, we need to review projections.
From applications of vector calculus, one recalls the orthogonal projection of a vector \(v\) onto the line spanned by a vector \(u\text{.}\) The projection is a vector parallel to \(u,\) so is of the form \(\lambda u\) for some scalar \(\lambda.\) Referring to the figure below, if \(\theta\) is the angle between the vectors \(u\) and \(v\text{,}\) then the length of \(\proj_u v\) is \(\|v\| \cos\theta\) (technically its absolute value). But \(\cos\theta = \la u,v\ra/(\|u\|\|v\|)\text{,}\) and the direction of \(u\) is given by the unit vector, \(\ds\frac{u}{\|u\|},\) parallel to \(u\text{,}\) so putting things together we see that
\begin{equation*} \proj_u v = (\|v\|\cos\theta) \frac{u}{\|u\|}= \|v\| \frac{\la u,v\ra}{\|u\|\|v\|}\frac{u}{\|u\|} = \frac{\la u,v\ra}{\|u\|^2}u, \end{equation*}
so the scalar \(\lambda\) referred to above is \(\ds\frac{\la u,v\ra}{\|u\|^2}.\) We also note that the vector \(w:= v - \proj_u v\) is orthogonal to \(u.\)
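For a concrete instance, take \(u=(1,2,2)\) and \(v=(3,0,3)\) in \(\R^3.\) Then \(\la u,v\ra = 9\) and \(\|u\|^2 = 9,\) so
\begin{equation*} \proj_u v = \frac{9}{9}\,u = (1,2,2) \quad\text{and}\quad w = v - \proj_u v = (2,-2,1), \end{equation*}
and one checks directly that \(\la w,u\ra = 2-4+2 = 0.\)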
The above construction is the key to an algorithm which takes an arbitrary basis to an orthogonal one. Note that in the figure below, the vectors \(u\) and \(v\) are not parallel, so they form a linearly independent set. The vectors \(u\) and \(w\) are orthogonal (hence linearly independent) and have the same span as the original vectors. Thus we have turned an arbitrary basis of two elements into an orthogonal one. The Gram-Schmidt process below extends this idea inductively.
Figure 3.2.4. Orthogonal projection of vector \(v\) onto \(u\)
We note that the first two steps of the Gram-Schmidt process are exactly what we did above with the orthogonal projection.
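Since each step of the process is just the projection construction above, a short numerical sketch may help make it concrete. The following is an illustrative implementation in Python with NumPy (the function name gram_schmidt is ours, not a library routine), not the formal statement of the process.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis with the same span as `vectors`.

    `vectors` is a list of 1-D NumPy arrays assumed to be linearly
    independent.  Each new vector has its projections onto the
    previously constructed orthogonal vectors subtracted off.
    """
    basis = []
    for v in vectors:
        w = v.astype(float)
        for u in basis:
            # subtract proj_u(v) = (<v,u>/<u,u>) u
            w = w - (np.dot(v, u) / np.dot(u, u)) * u
        basis.append(w)
    return basis

# Turn an arbitrary basis of R^3 into an orthogonal one.
b = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])
# All pairwise inner products below are (numerically) zero.
print([np.dot(b[i], b[j]) for i in range(3) for j in range(i + 1, 3)])
```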

Subsection 3.2.2 Orthogonal complements and projections

Let \(V\) be an inner product space and \(W\) a subspace. Define
\begin{equation*} W^\perp =\{v\in V\mid \la v,W\ra = 0\}. \end{equation*}
The set \(W^\perp\) is called the orthogonal complement of \(W\) in \(V.\) The notation \(\la v,W\ra =0\) means that \(\la v,w\ra = 0\) for all \(w\in W\text{,}\) so every vector in \(W^\perp\) is orthogonal to every vector of \(W\text{.}\)

Example 3.2.6. The orthogonal complement of a plane.

For example, if \(V=\R^3\) and \(W\) is a line through the origin, then \(W^\perp\text{,}\) the orthogonal complement of \(W\text{,}\) is the plane through the origin for which the line defines the normal vector. Likewise, if \(W\) is a plane through the origin, then \(W^\perp\) is the line through the origin in the direction of the plane's normal vector.

Checkpoint 3.2.7. Is the orthogonal complement a subspace?

If \(W\) is a subspace of a vector space \(V\text{,}\) is \(W^\perp\) necessarily a subspace of \(V?\)
Hint.
How do we check? Is \(0 \in W^\perp\) (and why)? If \(u_1, u_2 \in W^\perp\text{,}\) what about \(u_1+u_2\) and \(\lambda u_1\) (and why)?
It may occur to you that the task of finding a vector in \(W^\perp\) could be daunting, since you have to check that it is orthogonal to every vector in \(W\text{.}\) Or do you?

Checkpoint 3.2.8. How do we check if a vector is in the orthogonal complement?

Let \(S\) be a set of vectors in a vector space \(V,\) and \(W = \Span(S)\text{.}\) Show that a vector \(v \in W^\perp\) if and only if \(\la v,s\ra = 0\) for every \(s\in S.\) This means there is only a finite amount of work for any subspace with a finite basis.
Moreover, we know that \(W^\perp\) is a subspace of \(V\text{;}\) what you have shown is that \(S^\perp = W^\perp\text{,}\) so \(S^\perp\) is a subspace as well, even though \(S\) is merely a set of vectors.
Hint.
Everything in \(\Span(S)\) is a linear combination of the elements of \(S,\) and we know how to expand \(\la v, \sum_{i=1}^m \lambda_i s_i \ra.\)
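To see this in action, take \(S = \{(1,0,1),\,(0,1,1)\}\) in \(\R^3\) with the standard inner product and \(W=\Span(S)\text{.}\) A vector \(v=(x,y,z)\) lies in \(W^\perp\) exactly when \(x+z=0\) and \(y+z=0\text{,}\) that is, when \(v = z(-1,-1,1)\) for some scalar \(z\text{:}\) two equations settle membership in \(W^\perp\text{,}\) even though \(W\) is infinite.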
We shall see below that if \(V\) is an inner product space and \(W\) a finite-dimensional subspace, then every vector in \(V\) can be written uniquely as \(v = w + w^\perp\text{,}\) i.e., for unique \(w\in W\) and \(w^\perp \in W^\perp.\) In different notation, that will say that \(V = W \oplus W^\perp,\) that \(V\) is the direct sum of \(W\) and \(W^\perp.\)
For now, let us verify only the simple part of showing it is a direct sum: that \(W\cap W^\perp = \{0\}.\)

Proof.

Let \(w \in W\cap W^\perp.\) If \(w\ne 0\text{,}\) then by the properties of an inner product \(\la w,w\ra\ne 0.\) But since \(w\in W^\perp,\) the vector \(w\) is orthogonal to every vector in \(W,\) in particular to \(w,\) a contradiction.

Subsection 3.2.3 What good is an orthogonal complement anyway?

Let’s say that after a great deal of work we have obtained an \(m\times n\) matrix \(A\) and column vector \(b\text{,}\) and desperately want to solve the linear system \(Ax=b.\)
We know that the system is solvable if and only if \(b\) is in \(C(A)\text{,}\) the column space of \(A\text{.}\) But what if \(b\) is not in the column space? We want to solve this problem, right? Should we just throw up our hands?
This dilemma is not dissimilar from trying to find a rational number equal to \(\sqrt 2.\) It cannot be done. But there are rational numbers arbitrarily close to \(\sqrt 2.\) Perhaps an approximation to a solution would be good enough.
So now let’s make the problem geometric. Suppose we have a plane \(P\) in \(\R^3\) and a point \(x\) not on the plane. How would we find the point on \(P\) closest to the point \(x\text{?}\) Intuitively, we might “drop a perpendicular” from the point to the plane and the point \(x_0\) where it intersects would be the desired closest point.
This is correct and gives us the intuition to develop the notion of an orthogonal projection. To apply it to our inconsistent linear system, we want to find a column vector \(\hat b\) (in the column space of \(A\)) closest to \(b\text{.}\) We then check (see Corollary 3.2.15) that the solution \(\hat x\) to \(Ax=\hat b\) satisfies the property that
\begin{equation*} \| A\hat x -b\| \le \|Ax -b\| \text{ for any } x\in \R^n. \end{equation*}
Since the original system \(Ax=b\) is not solvable, we know that \(\|Ax-b\| >0\) for every \(x\text{,}\) and that difference is an error term given by the distance between \(Ax\) and \(b.\) The value \(\hat x\) minimizes the error, and is called the least squares solution to \(Ax=b\) (since there is no exact solution). We shall explore this in more detail a bit later.
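As a computational preview (a minimal sketch in Python with NumPy on a made-up system, not a prescribed method from this text), NumPy's least squares routine computes exactly this minimizer \(\hat x\text{:}\)

```python
import numpy as np

# A small inconsistent system: b is not in the column space of A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

# np.linalg.lstsq returns the x minimizing ||Ax - b||.
x_hat, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# A @ x_hat is the vector in C(A) closest to b; the error below is the
# (nonzero) minimal distance ||A x_hat - b||.
print(x_hat)                          # [0.5 0.5]
print(np.linalg.norm(A @ x_hat - b))  # sqrt(1.5), about 1.2247
```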

Subsection 3.2.4 Orthogonal Projections

Now we want to take our intuitive example of “dropping a perpendicular” and develop it into a formal tool for inner product spaces.
Let \(V\) be an inner product space and \(W\) be a finite-dimensional subspace. Since \(W\) has a basis, we can use the Gram-Schmidt process to produce an orthogonal basis \(\{w_1, \dots , w_r\}\) for \(W\text{.}\)

Proof.

Certainly \(w\) as defined is an element of \(W\text{,}\) and to see that \(w^\perp = v-w\) is orthogonal to \(W\text{,}\) it is sufficient by Checkpoint 3.2.8 to verify that \(\la w^\perp, w_i\ra = 0\) for each \(i=1, \dots, r.\)
Using the definition of \(w^\perp\) and bilinearity of the inner product we have
\begin{equation*} \la w^\perp, w_i\ra = \la v-w,w_i\ra = \la v,w_i\ra - \la w,w_i\ra, \end{equation*}
and since the \(\{w_j\}\) form an orthogonal basis, only the \(i\)th term of the expression for \(w\) in (3.2.1) contributes, so
\begin{equation*} \la w,w_i\ra = \left\la\frac{\la v,w_i\ra}{\la w_i,w_i\ra} w_i, w_i\right\ra = \frac{\la v,w_i\ra}{\la w_i,w_i\ra} \la w_i,w_i\ra = \la v,w_i\ra. \end{equation*}
It is now immediate from the first displayed equation that
\begin{equation*} \la w^\perp, w_i\ra = \la v-w,w_i\ra = \la v,w_i\ra - \la w,w_i\ra = \la v,w_i\ra - \la v,w_i\ra =0, \end{equation*}
as desired.
Finally, to see that \(w\) and \(w^\perp\) are uniquely determined by these conditions, suppose that \(v= w^\perp + w\) as above, and also \(v = u^\perp + u\) with \(u\in W\) and \(u^\perp \in W^\perp\text{.}\)
Setting the two expressions equal to each other and rearranging gives
\begin{equation*} w-u = u^\perp - w^\perp. \end{equation*}
But the left-hand side is an element of \(W\) while the right-hand side is an element of \(W^\perp,\) so by Proposition 3.2.9, both sides equal zero, which gives the uniqueness.
Another useful property of the orthogonal complement is

Proof.

Recall that
\begin{equation*} W^\perp = \{v \in V\mid \la v,W\ra =0\}, \end{equation*}
so
\begin{equation*} (W^\perp)^\perp = \{v \in V\mid \la v,W^\perp\ra = 0\}. \end{equation*}
In particular, every \(w \in W\) is orthogonal to all of \(W^\perp,\) so that \(W \subseteq (W^\perp)^\perp.\) The other containment takes a bit more care.
Let \(v \in (W^\perp)^\perp.\) Since \(W\) is finite-dimensional, Theorem 3.2.10 says that \(v\) can be written uniquely as
\begin{equation*} v = w^\perp + w \end{equation*}
where \(w\in W\) and \(w^\perp \in W^\perp.\) The goal is to show that \(w^\perp = 0.\)
Consider \(w^\perp = v-w.\) Since \(v \in (W^\perp)^\perp\text{,}\) and \(w\in W \subseteq (W^\perp)^\perp,\) we conclude \(w^\perp \in (W^\perp)^\perp,\) so \(\la w^\perp , W^\perp\ra = 0.\) But \(w^\perp \in W^\perp\) by the theorem, so \(\la w^\perp,w^\perp\ra =0\) implying that \(w^\perp =0\) by the axioms for an inner product. Thus \(v = w \in W,\) meaning \((W^\perp)^\perp \subseteq W,\) giving us the desired equality.

Definition 3.2.13.

If \(V\) is an inner product space and \(W\) a finite-dimensional subspace with orthogonal basis \(\{w_1, \dots , w_r\}\text{,}\) then the orthogonal projection of a vector \(v\) onto the subspace \(W\) is given by the expression in Theorem 3.2.10:
\begin{equation*} \proj_W v := \frac{\la v,w_1\ra}{\la w_1,w_1\ra} w_1 + \cdots+ \frac{\la v,w_r\ra}{\la w_r,w_r\ra} w_r. \end{equation*}
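For example, let \(W\subset\R^3\) be spanned by the orthogonal vectors \(w_1=(1,1,0)\) and \(w_2=(1,-1,0)\) (so \(W\) is the \(xy\)-plane), and let \(v=(2,3,4).\) Then \(\la v,w_1\ra = 5,\) \(\la v,w_2\ra = -1,\) and \(\la w_1,w_1\ra = \la w_2,w_2\ra = 2,\) so
\begin{equation*} \proj_W v = \frac{5}{2}(1,1,0) - \frac{1}{2}(1,-1,0) = (2,3,0), \end{equation*}
and \(v - \proj_W v = (0,0,4)\) is visibly orthogonal to \(W.\)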

Proof.

Combining Theorem 3.2.10 with the definition of projection, we know that \(w\) can be written uniquely as \(w=w^\perp+\proj_W w,\) where \(w^\perp \in W^\perp.\) But \(w = 0+w\text{,}\) so \(w^\perp = 0\) and \(w = \proj_W w\text{.}\)
To complete our formalization of the idea of dropping a perpendicular, we now show that the projection \(\proj_W v\) of a vector \(v\) is the unique vector in \(W\) closest to \(v.\)

Proof.

By Corollary 3.2.14, we may assume that \(v \notin W,\) so consider any \(w \in W\) with \(w \ne \proj_W v.\) We certainly know that
\begin{equation*} v - w = v -\proj_W v + \proj_W v - w, \end{equation*}
and we know that \(\proj_W v - w \in W\) while by Theorem 3.2.10 we know that \(v -\proj_W v \in W^\perp.\) Thus the vectors \(v-w\text{,}\) \(v -\proj_W v\) and \(\proj_W v -w\) form a right triangle whose lengths satisfy the Pythagorean identity:
\begin{equation*} \|v - w\|^2 = \|v -\proj_W v \|^2 + \|\proj_W v -w\|^2. \end{equation*}
It follows that if \(w \ne \proj_W v\text{,}\) then \(\|\proj_W v -w\|>0\text{,}\) so that \(\|v - w\| > \|v -\proj_W v\|.\)

Subsection 3.2.5 A first look at the four fundamental subspaces

While in the previous section, we have seen how orthogonal projections and complements are related, there is another prominent place in which orthogonal complements arise naturally.
Let \(A \in M_{m\times n}(\C)\text{.}\) Associated to \(A\) we have a linear transformation \(L_A: \C^n \to \C^m\) given by left multiplication by \(A\text{.}\) To lighten the notation, we often write \(\ker A\) for \(\ker L_A\text{,}\) and \(\range A\) for \(\range L_A\text{,}\) which we know is the column space, \(C(A)\text{,}\) of \(A\text{.}\)
We also have a linear transformation \(L_{A^*}: \C^m \to \C^n\) given by left multiplication by \(A^*\text{.}\) The following very useful property relates \(A\) and \(A^*\text{:}\)

Proof.

Recall that the inner product \(\la v,w\ra\) in \(\C^\ell\) is \(w^* v\text{,}\) the matrix product of a \(1\times \ell\) row vector with an \(\ell\times 1\) column vector. Thus
\begin{equation*} \la Ax,y\ra_m = y^* Ax = (A^*y)^* x = \la x, A^* y\ra_n. \end{equation*}
Many authors, e.g., [2] and [3], define the four fundamental subspaces. For complex matrices, these are most easily described by the kernel and range of \(A\) and \(A^*.\) For real matrices, the same identities can be rewritten in terms of the row and column spaces of \(A\) and \(A^T.\) The significance of these four subspaces will be evident when we discuss the singular value decomposition of a matrix in Section 3.6, but for now we reveal their basic relations.

Proof.

Let \(w\in \ker A^*\text{.}\) Then \(A^*w = 0,\) hence \(\la A^*w , v\ra = 0\) for all \(v\in \C^n.\) By taking complex conjugates in Proposition 3.2.16,
\begin{equation*} 0 = \la A^*w , v\ra = \la w, Av\ra, \end{equation*}
so \(w\) is orthogonal to everything in \(\range(A)=C(A).\) This gives the inclusion \(\ker(A^*) \subseteq \range(A)^\perp.\)
Conversely, if \(w \in \range(A)^\perp,\) then for all \(v \in \C^n\text{,}\)
\begin{equation*} 0= \la w, Av \ra = \la A^* w,v\ra. \end{equation*}
In particular, taking \(v=A^*w\text{,}\) we have \(\la A^* w, A^*w\ra=0\) which means that \(A^*w =0,\) showing that \(\range(A)^\perp \subseteq \ker(A^*)\text{,}\) giving us the first equality.
Since the first equality is valid for any matrix \(A,\) we replace \(A\) by \(A^*\text{,}\) and use that \(A^{**} = A\) to conclude that
\begin{equation*} \ker(A) = \range(A^*)^\perp\text{.} \end{equation*}
Using Corollary 3.2.12 yields
\begin{equation*} \ker(A)^\perp = \range(A^*). \end{equation*}
For real matrices, these become

Proof.

The first statement is immediate from the previous theorem since \(\range(A) = C(A).\) For the second, we deduced above that \(\ker(A) = \range(A^*)^\perp\text{.}\) Now if \(A\) is a real matrix,
\begin{equation*} \range(A^*) = \range(A^T) = C(A^T) = R(A) \end{equation*}
which finishes the proof.
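As a numerical sanity check (a sketch in Python with NumPy; the random matrix and tolerance are ours, purely for illustration), one can verify the identity \(\ker(A^*) = \range(A)^\perp\) by computing a basis of \(\ker(A^*)\) and checking that it is orthogonal to the columns of \(A\text{:}\)

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
A_star = A.conj().T                      # the conjugate transpose A*

# The right singular vectors of A* beyond its rank span ker(A*); for a
# generic 3x4 matrix A* the rank is 3, so ker(A*) is 1-dimensional in C^4.
U, s, Vh = np.linalg.svd(A_star)
rank = int(np.sum(s > 1e-10))
null_A_star = Vh.conj().T[:, rank:]      # columns span ker(A*)

# Each basis vector of ker(A*) should be orthogonal to every column of A,
# i.e., to all of range(A) = C(A); the maximum below is essentially zero.
print(np.abs(null_A_star.conj().T @ A).max())
```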