### Jordan normal form – Generalised Eigenspaces

It’s been a long time since I last posted. I’ve been studying about the Jordan Normal Form recently, and found some good but incomplete websites elaborating this interesting construction. The first time I read about this, I was very impressed- these are representative matrices for a whole families of similar matrices. Moreover, so many results about matrices requiring them to be diagonalizable can be generalised using the normal forms- the next best things to a diagonal representation.

As I searched for clear proofs surrounding this, I found about Sheldon Axler‘s book Linear Algebra Done Right, and the article The Jordan Canonical Form: an Old Proof by Richard A. Brualdi. The first one does it by introducing generalised eigenspaces, while the second one uses the language of graph theory. Investigating the relation between these two approaches will be an excellent thing to do, I think.

In this first part, I’ll outline the usual generalised eigenspaces approach, as described in Sheldon Axler’s book.

->

We want to describe an operator by decomposing it’s domain into invariant subspaces (If W is an invariant subspace of V, where $T:V\to V$, then if $v\in W$, $T(v)\in W$ also). If the operator is diagonalizable, these invariant subspaces are the eigenspaces, and $V= null(T-\lambda_1 I)\oplus null(T-\lambda_2 I)\oplus....$.

As normally there aren’t enough eigenvectors, we want to show that in general, $V= null(T-\lambda_1 I)^{dimV}\oplus null(T-\lambda_2 I)^{dimV}\oplus....$, where $\lambda_1, \lambda_2...$ are the distinct eigenvalues of T.

Let’s start by studying the nullspaces of powers of an operator.
First of all $\{0\}\subset null(T)\subset null(T^2)\subset null(T^3)...$.

Of course, as V is finite dimensional, this can’t go on forever. So, we must have $null(T^{dimV})=null(T^{dimV+1})=null(T^{dimV+2})...$, because if we don’t, the dimensions of the nullspaces, which are subspaces of V, will go on increasing.

A similar thing is true for ranges of powers of operators. Here, $V=ran(T^0)\supset ran(T)\supset ran(T^2)\supset ...$. Again this stops at $T^{dimV}$ after which all subsequent ranges are equal.

Now, Schur’s lemma in linear algebra says that every square complex matrix is unitarily triangularizable ($T=Q\Lambda Q^T$). The diagonal elements of $\Lambda$ are the eigenvalues of T. We can prove this by seeing that if $\lambda$ is an eigenvalue of T, then $det(T-\lambda I)=det(Q\Lambda Q^T -\lambda I)=det (Q (\Lambda-\lambda I) Q^T)=0$ which shows that $\lambda$ is one of the diagonal elements of $\Lambda$. The same reasoning in reverse can be used to prove the converse.

If T has $dimV$ distinct eigenvalues, all of these are the diagonal elements of $\Lambda$. If not an eigenvalue $\lambda$ is repeated $dim(null(T-\lambda I)^{dimV}))$ times.

To prove this, consider the case of $\lambda =0$. We prove by induction.

For $dimV=n$,

if $n=1$, the fact is clearly true.

We assume that it is true for $dimV=n-1$.

Now, suppose that $(v_1,v_2,v_3,...)$ is a basis for which T has an upper triangular representation $\Lambda$ with $\lambda_1, \lambda_2...\lambda_n$ as diagonal elements. If $U=span(v_1,v_2,...,v_{n-1})$, then $U$ is clearly invariant due to $\Lambda$ being triangular.

The matrix for $T|U$ with the basis $v_1,v_2,...v_{n-1}$ is $\Lambda$ without the last row and column. By the induction hypothesis, 0 appears $dim(null(T|U)^{n-1}))$ times. As $null(T|U)^{n-1}=null(T|U)^{n}$ , 0 appears $dim(null(T|U)^{n}))$ times.

Now for the remaining matrix, which has $\lambda_n$ as the diagonal element, we consider two cases:

1. $\lambda_n\neq 0$: If $\Lambda$ has $\lambda_1, \lambda_2...\lambda_n$ as the diagonal elements, then the matrix representation of $T^n$ is the upper triangular $\Lambda^n$ with $\lambda_1^n, \lambda_2^n...\lambda_n^n$ as diagonal elements. So $T^n(v_n)=u+\lambda_n^n(v_n)$ for some $u\in U$. Now suppose that $p\in null(T^n)$. Then $p=u'+av_n$ where $u'\in U$ and $a\in F$ (field of V), which gives $T^n(p)=0=T^n(u')+a(u)+a(\lambda_n^n(v_n))$. The first two terms are in $U$, but the third one is not. So, $a=0$, meaning that $p\in U$. As a result $null(T^n)\subset U$, giving $null (T^n)=null(T|U)^n$. 0 appears $dim(null(T)^{n}))$ times.

2. $\lambda_n=0$: If $\lambda_n=0$, then $T(v_n)\in U$ giving $T^n(v_n)=T^{n-1}(T(v_n))\in ran(T|U)^{n-1}=ran(T|U)^{n}$. So, we can construct a vector $p=u'+v_n$ such that $p\not\in U$ and $T^n(p)=0$. Now,  $dim(null(T^n))=dim(null(T|U)^n)+dim(U+null(T^n))-dimU$. Here, as $dim(U), and as $dim(U)=n-1$, therefore, $dim(null(T^n))=dim(null(T|U)^n)+1$, which is as desired.

Statement proved.

Note that the multiplicity corresponding to an eigenvalue $\lambda$ is defined as $dim(null(T-\lambda I)^{dimV})$, i.e. the dimension of the associated generalised eigenspace. The sum of these multiplicities is equal to $dimV$ as all the diagonal elements of $\Lambda$ are eigenvalues of $T$. The characteristic polynomial associated with a matrix is $(z-\lambda_1)^{d_1}.(z-\lambda_2)^{d_2}...$, where $d_1,d_2..$ are the multiplicities.

Using Schur’s theorem as above, we can also prove the Cayley- Hamilton theorem very easily, through induction. The theorem states that if q is the characteristic polynomial of $T$, then $q(T)=0$. We need only show that $q(T)v_j=0$ for all basis vectors of $T$, $v_j$, where the $v_j$s are basis vectors for which the matrix of T is upper triangular, as in Schur’s theorem.

Suppose that $j=1$. Using the triangular form of $T$, we have $(T-\lambda_1)v_1=0$.

Now, assume that for $j$ between $1$ to $n$ :

$0=(T-\lambda_1 I)v_1=(T-\lambda_1 I)(T-\lambda_2 I)v_2=(T-\lambda_1 I)(T-\lambda_2 I)(T-\lambda_3 I)v_3=...=(T-\lambda_1 I)(T-\lambda_2 I)....(T-\lambda_{j-1} I)v_{j-1}$.

Now, because of the triangular form of $T$, $(T-\lambda_j I)v_j\in span(v_1,v_2,...v_{j-1})$. So $(T-\lambda_1 I)(T-\lambda_2 I)....(T-\lambda_{j-1} I)(T-\lambda_{j} I)v_j=0$.

Now come the main theorems leading up to Jordan’s form.

If $T$ is an operator on a complex vector space $V$ with distinct eigenvalues $\lambda_1, \lambda_2, ... \lambda_m$, with corresponding subspaces of generalised eigenvectors $U_1, U_2, ...U_m$, then $V=U_1\oplus U_2\oplus ...\oplus U_m$. For the proof, we have already seen that $dimV=dimU_1+dimU_2+...dimU_m$. Now we can see that each $U_j$ is invariant under T, as if $(T-\lambda_jI)^{d_j}x=0$, so is $(T(T-\lambda_jI)^{d_j})x=0=(T-\lambda_jI)^{d_j}Tx$. As a result, if we consider $T|U$, where $U=U_1+U_2+....U_m$, then $T|U$ has the same eigenvalues and multiplicities as T. We therefore get the desire result.

Note that the generalised eigenspaces are disjoint, as they should be. Generalised eigenspaces of T are invariant under T. Consider the eigenvalues $\lambda_1$ and $\lambda_2$ as examples. So if $\nu\in null(T-\lambda_1 I)^{d_1}$, then so do $T(\nu)$ and $c\nu$, where c is any scalar. As a result, even $(T-\lambda_2 I)^{d_2}\nu\in null(T-\lambda_1 I)^{d_1}$. Now assume that $\nu \in null(T-\lambda_2 I)^{d_2}$. Then $(T-\lambda_2 I)^{d_2}\nu=(T-\lambda_2I)(T-\lambda_2I)^{d_2-1}\nu=0$. Therefore $T(T-\lambda_2 I)^{d_2-1}\nu=\lambda_2(T-\lambda_2 I)^{d_2-1}\nu$. So now $(T-\lambda_1 I)^{d_1}(T-\lambda_2 I)^{d_2-1}\nu=(\lambda_2-\lambda_1)^{d_1}(T-\lambda_2 I)^{d_2-1}\nu$. So if $(T-\lambda_2 I)^{d_2-1}\nu\neq 0$, then $\lambda_1=\lambda_2$. If $\nu \in null((T-\lambda_2 I)^{d_2})\cap null((T-\lambda_1 I)^{d_1})$, then $\nu \in null((T-\lambda_1 I)^{d_1-1})$. Applying the same argument as above, we get that if $\lambda_1\neq\lambda_2$, then $\nu \in null((T-\lambda_1 I)^{d_1-2})$ an so on till we reach $\nu \in null(I)$, giving $\nu=0$.

Now, if $N\in L(V)$ is nilpotent, there exist vectors ($\nu_1,\nu_2...\nu_k\in V$), such that ($\nu_1,N\nu_1,...,N^{m(\nu_1)}\nu_1,.....\nu_k,N\nu_k,...,N^{m(\nu_k)}\nu_k$) is a basis of $V$, and ($N^{m(\nu_1)}\nu_1,N^{m(\nu_2)}\nu_2,....,N^{m(\nu_k)}\nu_k$) is a basis of $null(N)$, where $m(\nu_j)$ is the largest non negative integer such that $N^{m(\nu_j)}\nu_j\neq 0$.

For the proof, we use induction. As $N$ is nilpotent, $dim(ran(N)). Assume that the claim holds for all vector spaces of lesser dimensions. So, ($u_1,Nu_1,...,N^{m(u_1)}u_1,.....u_j,Nu_j,...,N^{m(u_j)}u_j$) is a basis of $ran(N)$ and ($N^{m(u_1)}u_1,N^{m(u_2)}\nu_2,....,N^{m(u_j)}u_j$) is a basis of $null(N)\cap ran(N)$.

As each $u_r \in ran(N)$, we can choose a corresponding $\nu_r\in V$, such that $N\nu_r=u_r$ for each r. Therefore, $m(\nu_r)=m(u_r)+1$. Now we choose a subspace $W$ of $null(N)$ such that $null(N)=(null(N)\cap ran(N))\oplus W$, and then a basis ($\nu_{j+1},\nu_{j+2}...\nu_k$). As these are in $null(N)$, $m(\nu_{j+1},m(\nu_{j+2})...=0$.

To show that the basis for V in the statement is linearly independent, suppose that $\displaystyle{ \sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^s(\nu_r)=0}$. Then $\displaystyle{ \sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^{s+1}(\nu_r)=0=\sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^{s}(u_r)}$.

Now by the induction hypothesis, $a_{r,s}=0$ for $1\leq r\leq j$. Also, $a_{1,m}(\nu_1)N^{m(\nu_1)}\nu_1+ ... + a_{j,m}(\nu_j)N^{m(\nu_j)}\nu_j=0$ as ($N^{m(u_1)}u_1,N^{m(u_2)}\nu_2,....,N^{m(u_j)}u_j$) is a basis of $null(N)\cap ran(N)$ and $a_{j+1,o}\nu_{j+1}+....+a_{k,0}\nu_k=0$ as ($\nu_{j+1},\nu_{j+2}...\nu_k$) is a basis of $W$.

Now, as assumed, $dim(null(n)\cap ran(N))=j$ and $dim(null(N))=k$.  It can be seen that $\displaystyle{ \sum_{r=0}^{k}(m(\nu_r)+1)= dim(V)}$. Therefore the set of vectors under consideration is indeed a basis of $V$. Also, ($N^{m(\nu_1)}\nu_1,N^{m(\nu_2)}\nu_2,....,N^{m(\nu_k)}\nu_k$)

= ($N^{m(u_1)}u_1,N^{m(u_2)}u_2,....,N^{m(u_k)}u_k, \nu_{j+1},\nu_{j+2},...,\nu_k$) is a basis of $null(N)$.

To get to the Jordan Canonical form, consider a nilpotent operator $N\in L(V)$. Then for the vectors ($N^{m(\nu_j)}\nu_j,N^{m(\nu_j)-1}\nu_j,....,N\nu_j$). With these, $N$(first vector)=0, $N$(second vector)=(first vector). The resultant block has $0$s on the diagonal and $1$s on the super diagonal.

For a $T \in L(V)$, with distinct eigenvalues $\lambda_1, \lambda_2,...\lambda_m$, as $V=U_1\oplus U_2 \oplus ...\oplus U_m$ where each $(T-\lambda_iI)|U_i$ is nilpotent, we have our Jordan basis, giving $T$ the form as in the opening image. The exact structure of the Jordan form depends not only on the arithmetic and geometric multiplicities of the eigenvalues, but also the dimensions of the powers of $(T-\lambda_iI)$, with $dim(null(T-\lambda_iI)^{k+1}-dim(null(T-\lambda_iI)^{k}$ being the number of Jordan blocks of size $>k$ corresponding to the eigenvalue $\lambda$.

Note that two matrices are conjugate if and only if they have the same Jordan canonical forms, up to a permutation of the Jordan Blocks.