Jordan normal form – Generalised Eigenspaces

by Arjun Jain


It’s been a long time since I last posted. I’ve been studying about the Jordan Normal Form recently, and found some good but incomplete websites elaborating this interesting construction. The first time I read about this, I was very impressed- these are representative matrices for a whole families of similar matrices. Moreover, so many results about matrices requiring them to be diagonalizable can be generalised using the normal forms- the next best things to a diagonal representation.

As I searched for clear proofs surrounding this, I found about Sheldon Axler‘s book Linear Algebra Done Right, and the article The Jordan Canonical Form: an Old Proof by Richard A. Brualdi. The first one does it by introducing generalised eigenspaces, while the second one uses the language of graph theory. Investigating the relation between these two approaches will be an excellent thing to do, I think.

In this first part, I’ll outline the usual generalised eigenspaces approach, as described in Sheldon Axler’s book.


We want to describe an operator by decomposing it’s domain into invariant subspaces (If W is an invariant subspace of V, where T:V\to V, then if v\in W, T(v)\in W also). If the operator is diagonalizable, these invariant subspaces are the eigenspaces, and V= null(T-\lambda_1 I)\oplus null(T-\lambda_2 I)\oplus.....

As normally there aren’t enough eigenvectors, we want to show that in general, V= null(T-\lambda_1 I)^{dimV}\oplus null(T-\lambda_2 I)^{dimV}\oplus...., where \lambda_1, \lambda_2... are the distinct eigenvalues of T.

Let’s start by studying the nullspaces of powers of an operator.
First of all \{0\}\subset null(T)\subset null(T^2)\subset null(T^3)....

Of course, as V is finite dimensional, this can’t go on forever. So, we must have null(T^{dimV})=null(T^{dimV+1})=null(T^{dimV+2})..., because if we don’t, the dimensions of the nullspaces, which are subspaces of V, will go on increasing.

A similar thing is true for ranges of powers of operators. Here, V=ran(T^0)\supset ran(T)\supset ran(T^2)\supset .... Again this stops at T^{dimV} after which all subsequent ranges are equal.

Now, Schur’s lemma in linear algebra says that every square complex matrix is unitarily triangularizable (T=Q\Lambda Q^T). The diagonal elements of \Lambda are the eigenvalues of T. We can prove this by seeing that if \lambda is an eigenvalue of T, then det(T-\lambda I)=det(Q\Lambda Q^T -\lambda I)=det (Q (\Lambda-\lambda I) Q^T)=0 which shows that \lambda is one of the diagonal elements of \Lambda. The same reasoning in reverse can be used to prove the converse.

If T has dimV distinct eigenvalues, all of these are the diagonal elements of \Lambda. If not an eigenvalue \lambda is repeated dim(null(T-\lambda I)^{dimV})) times.

To prove this, consider the case of \lambda =0. We prove by induction.

For dimV=n,

if n=1, the fact is clearly true.

We assume that it is true for dimV=n-1.

Now, suppose that (v_1,v_2,v_3,...) is a basis for which T has an upper triangular representation \Lambda with \lambda_1, \lambda_2...\lambda_n as diagonal elements. If U=span(v_1,v_2,...,v_{n-1}), then U is clearly invariant due to \Lambda being triangular.

The matrix for T|U with the basis v_1,v_2,...v_{n-1} is \Lambda without the last row and column. By the induction hypothesis, 0 appears dim(null(T|U)^{n-1})) times. As null(T|U)^{n-1}=null(T|U)^{n} , 0 appears dim(null(T|U)^{n})) times.

Now for the remaining matrix, which has \lambda_n as the diagonal element, we consider two cases:

1. \lambda_n\neq 0: If \Lambda has \lambda_1, \lambda_2...\lambda_n as the diagonal elements, then the matrix representation of T^n is the upper triangular \Lambda^n with \lambda_1^n, \lambda_2^n...\lambda_n^n as diagonal elements. So T^n(v_n)=u+\lambda_n^n(v_n) for some u\in U. Now suppose that p\in null(T^n). Then p=u'+av_n where u'\in U and a\in F (field of V), which gives T^n(p)=0=T^n(u')+a(u)+a(\lambda_n^n(v_n)). The first two terms are in U, but the third one is not. So, a=0, meaning that p\in U. As a result null(T^n)\subset U, giving null (T^n)=null(T|U)^n. 0 appears dim(null(T)^{n})) times.

2. \lambda_n=0: If \lambda_n=0, then T(v_n)\in U giving T^n(v_n)=T^{n-1}(T(v_n))\in ran(T|U)^{n-1}=ran(T|U)^{n}. So, we can construct a vector p=u'+v_n such that p\not\in U and T^n(p)=0. Now,  dim(null(T^n))=dim(null(T|U)^n)+dim(U+null(T^n))-dimU. Here, as dim(U)<dim(U+null(T^n))\leq dim(V), and as dim(U)=n-1, therefore, dim(null(T^n))=dim(null(T|U)^n)+1, which is as desired.

Statement proved.

Note that the multiplicity corresponding to an eigenvalue \lambda is defined as dim(null(T-\lambda I)^{dimV}), i.e. the dimension of the associated generalised eigenspace. The sum of these multiplicities is equal to dimV as all the diagonal elements of \Lambda are eigenvalues of T. The characteristic polynomial associated with a matrix is (z-\lambda_1)^{d_1}.(z-\lambda_2)^{d_2}..., where d_1,d_2.. are the multiplicities.

Using Schur’s theorem as above, we can also prove the Cayley- Hamilton theorem very easily, through induction. The theorem states that if q is the characteristic polynomial of T, then q(T)=0. We need only show that q(T)v_j=0 for all basis vectors of T, v_j, where the v_js are basis vectors for which the matrix of T is upper triangular, as in Schur’s theorem.

Suppose that j=1. Using the triangular form of T, we have (T-\lambda_1)v_1=0.

Now, assume that for j between 1 to n :

0=(T-\lambda_1 I)v_1=(T-\lambda_1 I)(T-\lambda_2 I)v_2=(T-\lambda_1 I)(T-\lambda_2 I)(T-\lambda_3 I)v_3=...=(T-\lambda_1 I)(T-\lambda_2 I)....(T-\lambda_{j-1} I)v_{j-1}.

Now, because of the triangular form of T, (T-\lambda_j I)v_j\in span(v_1,v_2,...v_{j-1}). So (T-\lambda_1 I)(T-\lambda_2 I)....(T-\lambda_{j-1} I)(T-\lambda_{j} I)v_j=0.

Now come the main theorems leading up to Jordan’s form.

If T is an operator on a complex vector space V with distinct eigenvalues \lambda_1, \lambda_2, ... \lambda_m, with corresponding subspaces of generalised eigenvectors U_1, U_2, ...U_m, then V=U_1\oplus U_2\oplus ...\oplus U_m. For the proof, we have already seen that dimV=dimU_1+dimU_2+...dimU_m. Now we can see that each U_j is invariant under T, as if (T-\lambda_jI)^{d_j}x=0, so is (T(T-\lambda_jI)^{d_j})x=0=(T-\lambda_jI)^{d_j}Tx. As a result, if we consider T|U, where U=U_1+U_2+....U_m, then T|U has the same eigenvalues and multiplicities as T. We therefore get the desire result.

Note that the generalised eigenspaces are disjoint, as they should be. Generalised eigenspaces of T are invariant under T. Consider the eigenvalues \lambda_1 and \lambda_2 as examples. So if \nu\in null(T-\lambda_1 I)^{d_1}, then so do T(\nu) and c\nu, where c is any scalar. As a result, even (T-\lambda_2 I)^{d_2}\nu\in null(T-\lambda_1 I)^{d_1}. Now assume that \nu \in null(T-\lambda_2 I)^{d_2}. Then (T-\lambda_2 I)^{d_2}\nu=(T-\lambda_2I)(T-\lambda_2I)^{d_2-1}\nu=0. Therefore T(T-\lambda_2 I)^{d_2-1}\nu=\lambda_2(T-\lambda_2 I)^{d_2-1}\nu. So now (T-\lambda_1 I)^{d_1}(T-\lambda_2 I)^{d_2-1}\nu=(\lambda_2-\lambda_1)^{d_1}(T-\lambda_2 I)^{d_2-1}\nu. So if (T-\lambda_2 I)^{d_2-1}\nu\neq 0, then \lambda_1=\lambda_2. If \nu \in null((T-\lambda_2 I)^{d_2})\cap null((T-\lambda_1 I)^{d_1}) , then \nu \in null((T-\lambda_1 I)^{d_1-1}). Applying the same argument as above, we get that if \lambda_1\neq\lambda_2, then \nu \in null((T-\lambda_1 I)^{d_1-2}) an so on till we reach \nu \in null(I), giving \nu=0.

Now, if N\in L(V) is nilpotent, there exist vectors (\nu_1,\nu_2...\nu_k\in V), such that (\nu_1,N\nu_1,...,N^{m(\nu_1)}\nu_1,.....\nu_k,N\nu_k,...,N^{m(\nu_k)}\nu_k) is a basis of V, and (N^{m(\nu_1)}\nu_1,N^{m(\nu_2)}\nu_2,....,N^{m(\nu_k)}\nu_k) is a basis of null(N), where m(\nu_j) is the largest non negative integer such that N^{m(\nu_j)}\nu_j\neq 0.

For the proof, we use induction. As N is nilpotent, dim(ran(N))<dim(V). Assume that the claim holds for all vector spaces of lesser dimensions. So, (u_1,Nu_1,...,N^{m(u_1)}u_1,.....u_j,Nu_j,...,N^{m(u_j)}u_j) is a basis of ran(N) and (N^{m(u_1)}u_1,N^{m(u_2)}\nu_2,....,N^{m(u_j)}u_j) is a basis of null(N)\cap ran(N).

As each u_r \in ran(N), we can choose a corresponding \nu_r\in V, such that N\nu_r=u_r for each r. Therefore, m(\nu_r)=m(u_r)+1. Now we choose a subspace W of null(N) such that null(N)=(null(N)\cap ran(N))\oplus W, and then a basis (\nu_{j+1},\nu_{j+2}...\nu_k). As these are in null(N), m(\nu_{j+1},m(\nu_{j+2})...=0.

To show that the basis for V in the statement is linearly independent, suppose that \displaystyle{ \sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^s(\nu_r)=0}. Then \displaystyle{ \sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^{s+1}(\nu_r)=0=\sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^{s}(u_r)}.

Now by the induction hypothesis, a_{r,s}=0 for 1\leq r\leq j. Also, a_{1,m}(\nu_1)N^{m(\nu_1)}\nu_1+ ... + a_{j,m}(\nu_j)N^{m(\nu_j)}\nu_j=0 as (N^{m(u_1)}u_1,N^{m(u_2)}\nu_2,....,N^{m(u_j)}u_j) is a basis of null(N)\cap ran(N) and a_{j+1,o}\nu_{j+1}+....+a_{k,0}\nu_k=0 as (\nu_{j+1},\nu_{j+2}...\nu_k) is a basis of W.

Now, as assumed, dim(null(n)\cap ran(N))=j and dim(null(N))=k.  It can be seen that \displaystyle{ \sum_{r=0}^{k}(m(\nu_r)+1)= dim(V)}. Therefore the set of vectors under consideration is indeed a basis of V. Also, (N^{m(\nu_1)}\nu_1,N^{m(\nu_2)}\nu_2,....,N^{m(\nu_k)}\nu_k )

= (N^{m(u_1)}u_1,N^{m(u_2)}u_2,....,N^{m(u_k)}u_k, \nu_{j+1},\nu_{j+2},...,\nu_k) is a basis of null(N).

To get to the Jordan Canonical form, consider a nilpotent operator N\in L(V). Then for the vectors (N^{m(\nu_j)}\nu_j,N^{m(\nu_j)-1}\nu_j,....,N\nu_j). With these, N(first vector)=0, N(second vector)=(first vector). The resultant block has 0s on the diagonal and 1s on the super diagonal.

For a T \in L(V), with distinct eigenvalues \lambda_1, \lambda_2,...\lambda_m, as V=U_1\oplus U_2 \oplus ...\oplus U_m where each (T-\lambda_iI)|U_i is nilpotent, we have our Jordan basis, giving T the form as in the opening image. The exact structure of the Jordan form depends not only on the arithmetic and geometric multiplicities of the eigenvalues, but also the dimensions of the powers of (T-\lambda_iI), with dim(null(T-\lambda_iI)^{k+1}-dim(null(T-\lambda_iI)^{k} being the number of Jordan blocks of size >k corresponding to the eigenvalue \lambda.

Note that two matrices are conjugate if and only if they have the same Jordan canonical forms, up to a permutation of the Jordan Blocks.