I wonder why I wonder

Weblog on Natural Philosophy | REMOVED TO ART WEBSITE (click above)

The Large-Number Limit for Reaction Networks

I worked with Prof. John Baez on reaction networks at the Center for Quantum Technologies last summer. Here is a jointly written article just published on his blog which summarises what I was up to for the most part. It is concerned with how the master equation reduces to the rate equation in the limit where there are very large numbers of things of each kind. I also wrote an article on Coherence for Solutions of the Master Equation before. To know more about network theory, you can read John Baez’s book on the same.

The Lagrange Mesh Technique

As part of my final year project, I am studying about the Lagrange Mesh Technique.

The Lagrange-mesh technique is an approximation method, adapting Gaussian Quadrature to solve differential equations, particularly quantum-mechanical bound-state and scattering problems. There is no need for analytically evaluating matrix elements and the accuracy is exponential in the number of mesh points.

1. Gaussian Quadrature:

  • Gaussian Quadrature is the starting point of the Lagrange mesh technique. Here, integrals whose integrands are known only at a few points of the domain are estimated by approximating the integrands by the Lagrange interpolating polynomials, defined as: \displaystyle{\pi(x)=\prod_{j=1}^m(x-x_j)}. Note that \displaystyle{\pi'(x_j)=\prod_{i=1, i\neq j}^m(x_j-x_i)}. For a function f(x), the approximation used is \displaystyle{\Phi(x)=\sum_{j=1}^m \frac{\pi(x)}{(x-x_j)\pi' (x_j)} f(x_j)}. Obviously, \Phi(x_j)=f(x_j).
  • Consider the integral \displaystyle{\int_a^b dx~ W(x) f(x) \approx\int_a^b dx~ W(x) \Phi(x) }. Here W(x) is a weight function, positive in the domain of integration. Substituting the expression of \Phi(x), we get \displaystyle{\int_a^b dx~ \left( W(x) \sum_{j=1}^m \frac{\pi(x)}{(x-x_j)\pi' (x_j)} f(x_j)\right) = \sum_{j=1}^m w_j f(x_j)}, where \displaystyle{w_j=\int_a^b dx~ W(x)\frac{\pi(x)}{(x-x_j)\pi' (x_j)}}, also known as the Christoffel symbols.
  • The n-point quadrature is exact for polynomials upto degree n-1, but Gauss extended the method to give exact results for degree \leq 2n-1. This is done by introducing restrictions on the points where the function is to be known. For this, a family of orthogonal polynomials \{p_k\} is chosen, orthogonal with respect to the inner product: \displaystyle{\left< f,g\right> = \int_a^b dx~ f(x)g(x)W(x)}. For an n-point Gaussian quadrature then, the x_js are chosen as the zeroes of p_n(x).
  • As I prove in my project by induction, any polynomial p(x) orthogonal to all polynomials upto order n-1 in a given interval (a,b) with respect to a weight function W must have at least n roots in (a,b), as is required for Gaussian Quadrature to work.

2. Lagrange functions:


  • The Lagrange mesh technique uses the idea of Gaussian Quadrature to solve differential equations. For this, a set of N Lagrange functions f_i(x) are defined over (a,b) associated with N mesh points x_i over this interval. To the mesh is associated a Gaussian Quadrature.
  • The Lagrange functions have the following properties:

1. f_i(x_j)=\lambda_i ^{-1/2} \delta_{ij} (\lambda_i is a normalizing wieght defined below).

2. Gaussian Quadrature is exact for products f_i(x)f_j(x).

Property 2 implies that \displaystyle{\int_a^b dx~ f_i(x)f_j(x) = \sum_{k=1}^N \lambda_k f_i(x_k)f_j(x_k) =\delta_{ij}}.

  • If the corresponding Gaussian Quadrature involves the family of orthogonal polynomials \{ \Phi(x) \} with weight function W(x), then explicitly \displaystyle{f_i(x)=\frac{\lambda_i(x)^{-1/2} \Phi_N(x)}{\Phi_N'(x_i)(x-x_i)}}, where \displaystyle{\lambda_i(x)=\frac{w_i}{W(x)}}, where w_i is the Christoffel symbol corresponding to x_i.

3.  Schrödinger’s Equation, using the Lagrange Mesh Technique:


  • Consider Schrödinger’s eigenvalue equation in 1D: \displaystyle{\frac{-\hbar^2}{2m}(\frac{\partial^2\psi}{\partial x^2}) + V(x)\psi = E\psi } and approximate \psi by \displaystyle{\psi(x)=\sum_{j=1}^Nc_jf_j(x)}.
  • Using Gaussian quadrature to estimate the integrals got by multiplying the above by f_i(x) and integrating, we get \displaystyle{\frac{-\hbar^2}{2m}\sum_j (T_{ij}+V_{ij})c_j = E_i c_i }, where \displaystyle{V_{ij}\approx V(x_i)\delta_{ij}} is the potential energy matrix element and \displaystyle{T_{ij}\approx -\lambda_i^{1/2} f_j''(x_i)}, the kinetic energy matrix element.
  • Note that the potential matrix is diagonal, where each diagonal element need only be evaluated at a single mesh point and the kinetic matrix must be calculated only once for a given Lagrange basis, thus reducing computation time significantly.

Some examples of Lagrange meshes are meshes obtained using classical orthogonal polynomials like Legendre, Hermite, Laguerre. We can also use non-classical orthogonal polynomials to define other meshes like the Fourier mesh, sinc mesh, and the shifted Gaussians mesh. For the multidimensional case, we can also define a multidimensional Lagrange function as a product of 1-D Lagrange functions: F_{ijk}(x,y,z)=f_i(x)g_j(y)h_k(z).


1. Brandon Charles Bryant, Lagrange Meshes in Hadronic Physics, Honors in the Major Program, Theses, Paper 6, Florida State University, 2011

2. Daniel Baye, Lagrange-mesh method for quantum-mechanical problems, physica status solidi (b) 243.5 : 1095-1109, 2006

3. Daniel Baye and M. Vincke, Lagrange meshes from nonclassical orthogonal polynomials, Physical Review E, Volume 59, Number 6, June, 1999

Jordan normal form – Graph Theory

digraphThe last post was about the usual generalized eigenspaces approach to Jordan normal forms. Here I’ll describe another approach using the language of graph theory as in Turnbull and Aitken’s proof, in the article The Jordan Canonical Form: an Old Proof by Richard A. Brualdi.

Consider an nXn complex matrix A. The associated digraph (directed graph) D(B) has n vertices (labelled 1,2,...n) and has an arc (i,j) from vertex i to j if A(i,j)\neq 0.

Now by Schur’s Theorem, every nXn matrix A is similar to an upper triangular matrix, T. Note that for upper triangular matrices, the associated digraphs have arcs only from lower to higher vertices.

In a digraph D(i_1, i_2, i_3, ... i_{k-1}, i_k, i_1) where all is are distinct and where (i_1, i_2), (i_2, i_3), ...(i_{k-1}, i_k), (i_k, i_1) are all arcs of D, is called a cycle.

Note that since the arcs of the digraph of a triangular matrix only follow one direction, these are acyclic graphs. The converse is not true. However, a matrix with an acyclic digraph can be made triangular by a permutation similarity. To see this, note that if we have an acyclic graph we only need to reorder the vertices so that all arcs point towards higher numbered or lower numbered vertices. So for example, if you want to exchange the vertices 1 and 3, the permutation matrix P(1,3) operated on the right will redirect all arcs going to 3, to 1 and vice verse. Operated on the left, it will redirect all arcs coming from 3 to now come from 1, and vice verse.

Also, a path in D is a sequence (i_1, i_2, i_3, ..., i_k) of distinct vertices such that (i_1, i_2), (i_2, i_3), ...(i_{k-1}, i_k) are all arcs of D. A digraph with such an arc is called a path digraph. Again, any matrix with a path digraph can be brought to the form where the path goes from the smallest vertex to the largest vertex or vice verse, by a permutation similarity.

An example of a matrix with a path digraph is \begin{bmatrix} 3 & 2 & 0 & 0 \\ 0 & 1 & 7 & 0 \\ 0 & 0 & 0 & 3 \\ 0& 0 & 0 & 4 \end{bmatrix}. This looks like a Jordan block!

To make all the non zero super diagonal entries 1, we can use a diagonal similarity.  So if a diagonal matrix is represented by diag(d_1, d_2, ..., d_n), then for the above example the matrix that works is diag(1,1/2,1,1)diag(1,1,1/14,1)diag(1,1,1,1/42).

The digraph of the Jordan form of a given A, say A_J has a more complicated form.  It is a collection of paths which considered pairwise, have no common vertices. Each path here is associated with a single eigenvalue, although an eigenvalue can be associated with multiple paths. As an example \begin{bmatrix} 3 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0& 0 & 0 & 0 \end{bmatrix} has the following digraph:


Now to the main point. Jordan’s theorem can be rephrased in the language of graph theory as: Every nXn matrix A is similar to a matrix A_J, whose digraph is a collection of pairwise vertex disjoint paths where each path is associated with exactly one eigenvalue.

For the proof, we need the permutation, diagonal and combination similarities. The combination similarity C_{(i,j)} applied to a matrix B, adds h times column i of B to j, and -h times row j of BC_{(i,j)} to row i. The actual form of C_{(i,j)} is the identity matrix with h in the (i,j) position, where i \neq j. The inverse has a -h in place of the h.

To see how a combination symmetry affects an upper triangular matrix B:

\begin{array}{lcl} BC_{(i,j)}(a,b) & = & B(a,b), b\neq j \\ " & = & B(a,j)+hB(a,i), b=j \end{array}

\begin{array}{lcl} C_{(i,j)}^{-1}BC_{(i,j)}(a,b) & = & B(a,b), a\neq i, b\neq j \\ " & = & B(a,j)+hB(a,i), a\neq i, b=j \\ " & = & B(i,b)-hB(j,b), a=i, b\neq j \\ " & = & B(i,j)+h(B(i,i)-B(j,j)) - h^2 B(j,i), a=i, b=j \end{array}

If i<j, then of course B(j,i) =0.

Theorem: An nXn matrix A is similar to a matrix which is a direct sum of Jordan blocks, where a Jordan Block J_k is a kXk matrix with a single eigenvalue as all its diagonal terms and ones on the super diagonal.


1. By Schur’s theorem, there exists an invertible P such that T=P^{-1}AP, where T is upper triangular. Here equal diagonal entries can be made to occur consecutively.

2. Consider the block form of T, i.e. \begin{bmatrix} T_1 & X \\ 0 & T_2 \end{bmatrix}. Here T_1 is an upper triangular matrix with equal diagonal entries, say a, and T_2 is an upper triangular matrix, none of whose diagonal entries are equal to a. Using combination similarities, X can be made 0 without altering T_1 or T_2.

We can repeat the process in 2. again and again, till T is a direct sum of constant diagonal upper triangular matrices. …

Let’s first consider an nXn upper triangular matrix T, with all diagonal entries equal to some a. We now use an induction argument:

a. If n=1, T=\begin{bmatrix} a \end{bmatrix}, which is already a Jordan block. If n=2, T=\begin{bmatrix} a & p \\ 0 & a \end{bmatrix}. The p here can be made 1 by a diagonal symmetry, giving a Jordan block.

b. Suppose that (n-1)X(n-1) upper triangular matrices as above can be transformed into Jordan forms. Then so can the leading (n-1)X(n-1) upper triangular block of an nXn upper triangular matrix. So,

T_1 = \begin{bmatrix} F_{(n-1)X(n-1)} & *_{(n-1)X1} \\ 0_{1X(n-1)} & a \end{bmatrix}, where subscripts indicate the size of the blocks, and * is for a possible non zero block.

Following b., we have

\begin{bmatrix} F_{(n-1)X(n-1)} & *_{(n-1)X1} \\ 0_{1X(n-1)} & a \end{bmatrix} = \begin{bmatrix} (Q^{-1}SQ)_{(n-1)X(n-1)} & *_{(n-1)X1} \\ 0_{1X(n-1)} & a \end{bmatrix}

= \begin{bmatrix} Q^{-1}_{(n-1)X(n-1)} & 0_{(n-1)X1} \\ 0_{1X(n-1)} & 1 \end{bmatrix} \begin{bmatrix} S_{(n-1)X(n-1)} & **_{(n-1)X1} \\ 0_{1X(n-1)} & a \end{bmatrix} \begin{bmatrix} Q_{(n-1)X(n-1)} & 0_{(n-1)X1} \\ 0_{1X(n-1)} & 1 \end{bmatrix}

Now, to complete the argument, we just need to show that the middle matrix in the above expression can be reduced to it’s Jordan form. As an example, consider the following matrix:

\begin{bmatrix} a & 1 & 0 &0 & 0 & * \\ 0 & a & 0 & 0 & 0 & * \\ 0 & 0 & a & 1 & 0 & * \\ 0 & 0 & 0 & a & 1 & * \\ 0 & 0 & 0 & 0 & a & * \\ 0 & 0 & 0 & 0 & 0 & a \end{bmatrix}

As the combination symmetry C_{(i,j)} only affects the parts of row i and column j to the right and to the top of the element (i,j) respectively, we can make most of the elements of the last column in the above matrix 0. For instance, if element (3,6) is h, then C_{(4,6)}(-h) makes this 0 due to presence of a 1 in the 4th column. This can be done for all elements in the last column except (2,6) and (5,6) which do not have 1s available in their respective rows.

We’re almost there. If both (2,6) and (5,6) are 0, we already have the Jordan form. If both of them are non zero,  using a diagonal symmetry we can make the second last element of the last column, (5,6), here equal 1. Now if (2,6) is s, then if we do a C_{(2,5)}(s) followed by a C_{(1,4)}(s), element (2,6) becomes 0. We follow a similar procedure for any other pairs of non zero entries left in the general case, always eliminating the non zero entry across the smaller Jordan block. This will leave us with one non zero non diagonal entry in the last column. With a permutation similarity, this can be brought to be the second last element of the last column giving us a Jordan Form.

Thus, we have proved the existence of a Jordan form for every matrix.

\begin{bmatrix} a & 1 & 0 &0 & 0 & * \\ 0 & a & 0 & 0 & 0 & * \\ 0 & 0 & a & 1 & 0 & * \\ 0 & 0 & 0 & a & 1 & * \\ 0 & 0 & 0 & 0 & a & * \\ 0 & 0 & 0 & 0 & 0 & a \end{bmatrix}

As the combination symmetry C_{(i,j)} only affects the parts of row i and column j to the right and to the top of the element (i,j) respectively, we can make most of the elements of the last column in the above matrix 0. For instance, if element (3,6) is h, then C_{(4,6)}(-h) makes this 0 due to presence of a 1 in the 4th column. This can be done for all elements in the last column except (2,6) and (5,6) which do not have 1s available in their respective rows.

We’re almost there. If both (2,6) and (5,6) are 0, we already have the Jordan form. If both of them are non zero,  using a diagonal symmetry we can make the second last element of the last column, (5,6), here equal 1. Now if (2,6) is s, then if we do a C_{(2,5)}(s) followed by a C_{(1,4)}(s), element (2,6) becomes 0. We follow a similar procedure for any other pairs of non zero entries left in the general case, always eliminating the non zero entry across the smaller Jordan block. This will leave us with one non zero non diagonal entry in the last column. With a permutation similarity, this can be brought to be the second last element of the last column giving us a Jordan Form.

Thus, we have proved the existence of a Jordan form for every matrix.

Jordan normal form – Generalised Eigenspaces


It’s been a long time since I last posted. I’ve been studying about the Jordan Normal Form recently, and found some good but incomplete websites elaborating this interesting construction. The first time I read about this, I was very impressed- these are representative matrices for a whole families of similar matrices. Moreover, so many results about matrices requiring them to be diagonalizable can be generalised using the normal forms- the next best things to a diagonal representation.

As I searched for clear proofs surrounding this, I found about Sheldon Axler‘s book Linear Algebra Done Right, and the article The Jordan Canonical Form: an Old Proof by Richard A. Brualdi. The first one does it by introducing generalised eigenspaces, while the second one uses the language of graph theory. Investigating the relation between these two approaches will be an excellent thing to do, I think.

In this first part, I’ll outline the usual generalised eigenspaces approach, as described in Sheldon Axler’s book.


We want to describe an operator by decomposing it’s domain into invariant subspaces (If W is an invariant subspace of V, where T:V\to V, then if v\in W, T(v)\in W also). If the operator is diagonalizable, these invariant subspaces are the eigenspaces, and V= null(T-\lambda_1 I)\oplus null(T-\lambda_2 I)\oplus.....

As normally there aren’t enough eigenvectors, we want to show that in general, V= null(T-\lambda_1 I)^{dimV}\oplus null(T-\lambda_2 I)^{dimV}\oplus...., where \lambda_1, \lambda_2... are the distinct eigenvalues of T.

Let’s start by studying the nullspaces of powers of an operator.
First of all \{0\}\subset null(T)\subset null(T^2)\subset null(T^3)....

Of course, as V is finite dimensional, this can’t go on forever. So, we must have null(T^{dimV})=null(T^{dimV+1})=null(T^{dimV+2})..., because if we don’t, the dimensions of the nullspaces, which are subspaces of V, will go on increasing.

A similar thing is true for ranges of powers of operators. Here, V=ran(T^0)\supset ran(T)\supset ran(T^2)\supset .... Again this stops at T^{dimV} after which all subsequent ranges are equal.

Now, Schur’s lemma in linear algebra says that every square complex matrix is unitarily triangularizable (T=Q\Lambda Q^T). The diagonal elements of \Lambda are the eigenvalues of T. We can prove this by seeing that if \lambda is an eigenvalue of T, then det(T-\lambda I)=det(Q\Lambda Q^T -\lambda I)=det (Q (\Lambda-\lambda I) Q^T)=0 which shows that \lambda is one of the diagonal elements of \Lambda. The same reasoning in reverse can be used to prove the converse.

If T has dimV distinct eigenvalues, all of these are the diagonal elements of \Lambda. If not an eigenvalue \lambda is repeated dim(null(T-\lambda I)^{dimV})) times.

To prove this, consider the case of \lambda =0. We prove by induction.

For dimV=n,

if n=1, the fact is clearly true.

We assume that it is true for dimV=n-1.

Now, suppose that (v_1,v_2,v_3,...) is a basis for which T has an upper triangular representation \Lambda with \lambda_1, \lambda_2...\lambda_n as diagonal elements. If U=span(v_1,v_2,...,v_{n-1}), then U is clearly invariant due to \Lambda being triangular.

The matrix for T|U with the basis v_1,v_2,...v_{n-1} is \Lambda without the last row and column. By the induction hypothesis, 0 appears dim(null(T|U)^{n-1})) times. As null(T|U)^{n-1}=null(T|U)^{n} , 0 appears dim(null(T|U)^{n})) times.

Now for the remaining matrix, which has \lambda_n as the diagonal element, we consider two cases:

1. \lambda_n\neq 0: If \Lambda has \lambda_1, \lambda_2...\lambda_n as the diagonal elements, then the matrix representation of T^n is the upper triangular \Lambda^n with \lambda_1^n, \lambda_2^n...\lambda_n^n as diagonal elements. So T^n(v_n)=u+\lambda_n^n(v_n) for some u\in U. Now suppose that p\in null(T^n). Then p=u'+av_n where u'\in U and a\in F (field of V), which gives T^n(p)=0=T^n(u')+a(u)+a(\lambda_n^n(v_n)). The first two terms are in U, but the third one is not. So, a=0, meaning that p\in U. As a result null(T^n)\subset U, giving null (T^n)=null(T|U)^n. 0 appears dim(null(T)^{n})) times.

2. \lambda_n=0: If \lambda_n=0, then T(v_n)\in U giving T^n(v_n)=T^{n-1}(T(v_n))\in ran(T|U)^{n-1}=ran(T|U)^{n}. So, we can construct a vector p=u'+v_n such that p\not\in U and T^n(p)=0. Now,  dim(null(T^n))=dim(null(T|U)^n)+dim(U+null(T^n))-dimU. Here, as dim(U)<dim(U+null(T^n))\leq dim(V), and as dim(U)=n-1, therefore, dim(null(T^n))=dim(null(T|U)^n)+1, which is as desired.

Statement proved.

Note that the multiplicity corresponding to an eigenvalue \lambda is defined as dim(null(T-\lambda I)^{dimV}), i.e. the dimension of the associated generalised eigenspace. The sum of these multiplicities is equal to dimV as all the diagonal elements of \Lambda are eigenvalues of T. The characteristic polynomial associated with a matrix is (z-\lambda_1)^{d_1}.(z-\lambda_2)^{d_2}..., where d_1,d_2.. are the multiplicities.

Using Schur’s theorem as above, we can also prove the Cayley- Hamilton theorem very easily, through induction. The theorem states that if q is the characteristic polynomial of T, then q(T)=0. We need only show that q(T)v_j=0 for all basis vectors of T, v_j, where the v_js are basis vectors for which the matrix of T is upper triangular, as in Schur’s theorem.

Suppose that j=1. Using the triangular form of T, we have (T-\lambda_1)v_1=0.

Now, assume that for j between 1 to n :

0=(T-\lambda_1 I)v_1=(T-\lambda_1 I)(T-\lambda_2 I)v_2=(T-\lambda_1 I)(T-\lambda_2 I)(T-\lambda_3 I)v_3=...=(T-\lambda_1 I)(T-\lambda_2 I)....(T-\lambda_{j-1} I)v_{j-1}.

Now, because of the triangular form of T, (T-\lambda_j I)v_j\in span(v_1,v_2,...v_{j-1}). So (T-\lambda_1 I)(T-\lambda_2 I)....(T-\lambda_{j-1} I)(T-\lambda_{j} I)v_j=0.

Now come the main theorems leading up to Jordan’s form.

If T is an operator on a complex vector space V with distinct eigenvalues \lambda_1, \lambda_2, ... \lambda_m, with corresponding subspaces of generalised eigenvectors U_1, U_2, ...U_m, then V=U_1\oplus U_2\oplus ...\oplus U_m. For the proof, we have already seen that dimV=dimU_1+dimU_2+...dimU_m. Now we can see that each U_j is invariant under T, as if (T-\lambda_jI)^{d_j}x=0, so is (T(T-\lambda_jI)^{d_j})x=0=(T-\lambda_jI)^{d_j}Tx. As a result, if we consider T|U, where U=U_1+U_2+....U_m, then T|U has the same eigenvalues and multiplicities as T. We therefore get the desire result.

Note that the generalised eigenspaces are disjoint, as they should be. Generalised eigenspaces of T are invariant under T. Consider the eigenvalues \lambda_1 and \lambda_2 as examples. So if \nu\in null(T-\lambda_1 I)^{d_1}, then so do T(\nu) and c\nu, where c is any scalar. As a result, even (T-\lambda_2 I)^{d_2}\nu\in null(T-\lambda_1 I)^{d_1}. Now assume that \nu \in null(T-\lambda_2 I)^{d_2}. Then (T-\lambda_2 I)^{d_2}\nu=(T-\lambda_2I)(T-\lambda_2I)^{d_2-1}\nu=0. Therefore T(T-\lambda_2 I)^{d_2-1}\nu=\lambda_2(T-\lambda_2 I)^{d_2-1}\nu. So now (T-\lambda_1 I)^{d_1}(T-\lambda_2 I)^{d_2-1}\nu=(\lambda_2-\lambda_1)^{d_1}(T-\lambda_2 I)^{d_2-1}\nu. So if (T-\lambda_2 I)^{d_2-1}\nu\neq 0, then \lambda_1=\lambda_2. If \nu \in null((T-\lambda_2 I)^{d_2})\cap null((T-\lambda_1 I)^{d_1}) , then \nu \in null((T-\lambda_1 I)^{d_1-1}). Applying the same argument as above, we get that if \lambda_1\neq\lambda_2, then \nu \in null((T-\lambda_1 I)^{d_1-2}) an so on till we reach \nu \in null(I), giving \nu=0.

Now, if N\in L(V) is nilpotent, there exist vectors (\nu_1,\nu_2...\nu_k\in V), such that (\nu_1,N\nu_1,...,N^{m(\nu_1)}\nu_1,.....\nu_k,N\nu_k,...,N^{m(\nu_k)}\nu_k) is a basis of V, and (N^{m(\nu_1)}\nu_1,N^{m(\nu_2)}\nu_2,....,N^{m(\nu_k)}\nu_k) is a basis of null(N), where m(\nu_j) is the largest non negative integer such that N^{m(\nu_j)}\nu_j\neq 0.

For the proof, we use induction. As N is nilpotent, dim(ran(N))<dim(V). Assume that the claim holds for all vector spaces of lesser dimensions. So, (u_1,Nu_1,...,N^{m(u_1)}u_1,.....u_j,Nu_j,...,N^{m(u_j)}u_j) is a basis of ran(N) and (N^{m(u_1)}u_1,N^{m(u_2)}\nu_2,....,N^{m(u_j)}u_j) is a basis of null(N)\cap ran(N).

As each u_r \in ran(N), we can choose a corresponding \nu_r\in V, such that N\nu_r=u_r for each r. Therefore, m(\nu_r)=m(u_r)+1. Now we choose a subspace W of null(N) such that null(N)=(null(N)\cap ran(N))\oplus W, and then a basis (\nu_{j+1},\nu_{j+2}...\nu_k). As these are in null(N), m(\nu_{j+1},m(\nu_{j+2})...=0.

To show that the basis for V in the statement is linearly independent, suppose that \displaystyle{ \sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^s(\nu_r)=0}. Then \displaystyle{ \sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^{s+1}(\nu_r)=0=\sum_{r=1}^k \sum_{s=0}^{m(\nu_r)} a_{r,s}N^{s}(u_r)}.

Now by the induction hypothesis, a_{r,s}=0 for 1\leq r\leq j. Also, a_{1,m}(\nu_1)N^{m(\nu_1)}\nu_1+ ... + a_{j,m}(\nu_j)N^{m(\nu_j)}\nu_j=0 as (N^{m(u_1)}u_1,N^{m(u_2)}\nu_2,....,N^{m(u_j)}u_j) is a basis of null(N)\cap ran(N) and a_{j+1,o}\nu_{j+1}+....+a_{k,0}\nu_k=0 as (\nu_{j+1},\nu_{j+2}...\nu_k) is a basis of W.

Now, as assumed, dim(null(n)\cap ran(N))=j and dim(null(N))=k.  It can be seen that \displaystyle{ \sum_{r=0}^{k}(m(\nu_r)+1)= dim(V)}. Therefore the set of vectors under consideration is indeed a basis of V. Also, (N^{m(\nu_1)}\nu_1,N^{m(\nu_2)}\nu_2,....,N^{m(\nu_k)}\nu_k )

= (N^{m(u_1)}u_1,N^{m(u_2)}u_2,....,N^{m(u_k)}u_k, \nu_{j+1},\nu_{j+2},...,\nu_k) is a basis of null(N).

To get to the Jordan Canonical form, consider a nilpotent operator N\in L(V). Then for the vectors (N^{m(\nu_j)}\nu_j,N^{m(\nu_j)-1}\nu_j,....,N\nu_j). With these, N(first vector)=0, N(second vector)=(first vector). The resultant block has 0s on the diagonal and 1s on the super diagonal.

For a T \in L(V), with distinct eigenvalues \lambda_1, \lambda_2,...\lambda_m, as V=U_1\oplus U_2 \oplus ...\oplus U_m where each (T-\lambda_iI)|U_i is nilpotent, we have our Jordan basis, giving T the form as in the opening image. The exact structure of the Jordan form depends not only on the arithmetic and geometric multiplicities of the eigenvalues, but also the dimensions of the powers of (T-\lambda_iI), with dim(null(T-\lambda_iI)^{k+1}-dim(null(T-\lambda_iI)^{k} being the number of Jordan blocks of size >k corresponding to the eigenvalue \lambda.

Note that two matrices are conjugate if and only if they have the same Jordan canonical forms, up to a permutation of the Jordan Blocks.

The Lombard Effect


A few weeks ago, I had an ear infection due to which I could not hear properly through my left ear. Apart from the  general discomfort, I observed something totally unexpected- I started speaking more softly, that is at a lower volume. Although I couldn’t notice this change myself, I was told of it by my friends and family.

Remembering the well known fact that people start talking very loudly when hearing loud music through headphones, I knew for sure that there had to be some relation.

So, I contacted some professors of Auditory cognition in hopes of finding out more. Josh Mcdermott from MIT suggested I look into the Lombard Effect.

-From Wikipedia:


The Lombard effect or Lombard reflex is the involuntary tendency of speakers to increase their vocal effort when speaking in loud noise to enhance the audibility of their voice. This change includes not only loudness but also other acoustic features such as pitch, rate, and duration of sound syllables. This compensation effect results in an increase in the auditory signal-to-noise ratio of the speaker’s spoken words.

The effect links to the needs of effective communication as there is a reduced effect when words are repeated or lists are read where communication intelligibility is not important. Since the effect is also involuntary it is used as a means to detect malingering in those simulating hearing loss. Research upon birds and monkeys finds that the effect also occurs in the vocalizations of nonhuman animals.

The effect was discovered in 1909 by Étienne Lombard, a French otolaryngologist.


So my speaking softer than before, was kind of an inverse Lombard Effect.

I searched for otolaryngologists on the web and contacted Mario A. Svirsky from the NYU Langone Medical Center, who clarified and explained many of my doubts regarding this.

Regarding the observation with my infected ear, the obstruction in my middle ear attenuated the air conducted signal but not the bone conducted one that much. This made my own speech appear louder, leading  me to decrease its amplitude to compensate.  To others, this sounded like just my speech becoming softer.

If you are not aware of air and bone conduction: sound can reach the inner ear by way of two separate paths, and those paths in turn affect what we perceive. Air-conducted sound is transmitted from the surrounding environment through the external auditory canal, eardrum and middle ear to the cochlea, the fluid-filled spiral in the inner ear. Bone-conducted sound reaches the cochlea directly through the tissues of the head.

I then asked whether deaf people could hear their own voices, as bone conduction could still be there. This depends on the type of deafness. Sensorineural hearing loss affect both air and bone conducted signals while conductive hearing loss affects air conducted signals more.

Another question was that when singers cup their ears sometimes, they are eliminating the air conducted part of the sound. Why do they do that? When they cover their ears, it emphasizes the bone conducted signal, so the singer can hear himself better even if other instruments are very loud.

Also, the skull conducts lower frequencies better than air. This is why people perceive their own voices to be lower and fuller than others do, and a recording of one’s own voice frequently sounds higher than one expects it to sound.

-From an article in Scientific American, Why does my voice sound so different when it is recorded and played back?,


The voice you hear when you speak is the combination of sound carried along both paths. When you listen to a recording of yourself speaking, the bone-conducted pathway that you consider part of your “normal” voice is eliminated, and you hear only the air-conducted component in unfamiliar isolation. You can experience the reverse effect by putting in earplugs so you hear only bone-conducted vibrations.


Zorn’s Lemma – An elementary proof under the Axiom of Choice

Last year, around the same time, I had completed a summer project on Elementary Set Theory at the Indian Statistical Institute Delhi, under Prof. Ajit Iqbal Singh. The topic that took most of my time there, was the famous Zorn’s Lemma.

The statement of the Theorem is as follows:

If X is a partially ordered set such that every chain in X has an upper bound in X, then X contains a maximal element.

After my work there,  I wrote an article about this, which is posted on the arXiv.

It presents an elementary proof of Zorn’s Lemma under the Axiom of Choice, simplifying and supplying necessary details in the original proof by Paul R. Halmos in his book, Naive Set Theory. Also provided, is a preamble to Zorn’s Lemma, introducing the reader to a brief history of this important maximal principle.

Please feel free to ask me questions related to the article.

‘String’ Theory- Solving the Guitar

A few months ago, I gave a talk about some findings related to Guitars. Here are the details:

Basic Facts:

  • Vibrations of the strings in a guitar, are transmitted to the ear through disturbances in air pressure (sound waves). In an acoustic guitar, the string vibrations cause the hollow body to vibrate along, thus amplifying and adding pleasant overtones to the disturbances, which propagate as sound waves. In an electric guitar,  the string vibrations are picked up by the pickup coil inductors, which transmit the electric signals to an amplifier.
  • The group and phase velocities of sound waves are the same as they come through a non dispersive medium here. So, we don’t need to worry about different frequency waves coming  at different speeds.
  • Small variations in frequencies go unnoticed by the ear, atleast in the frequency range of a guitar ( around 82 Hz- 1319 Hz), with more resolution in the lower frequencies than the higher frequencies. This behavior holds for the whole spectrum, as obviously you can distinguish between 1 Hz and 1.5 Hz, but it is amazingly difficult to distinguish between 10000 Hz and 10020 Hz.
  • When more than one pure tone (i.e. sinusoidal sound wave) is played at the same time, the waves interfere to form a wave with a time period equal to the LCM of all the constituent waves.

Experiments and Observations:

I noticed that Fret widths change in a very non trivial way, as one goes down the neck of the guitar. So the basic question leading to these findings was:

Why and how do fret widths change the way they do ?

So, I

  • measured all the fret widths along the neck,
  • calculated the ratios of consecutive fret widths, and
  • measured the distance of each fret from the bottom of the Guitar.

Observations (in cm):


  • From the distance from fret bottom to string end in the table, we get that if we start with any fret i, the fret 12+i  is half the distance from the bottom, compared to fret i.
  • Motivated by this, and the Distance from bottom graph, we see that e^{0.058*12} = 2 and e^{0.058x} = 2^{x/12}.  Therefore, we get the brilliant result that each successive fret’s distance for the bottom is 2^{1/12} of the previous fret’s distance.
  • This conclusion also explains the constant ratio of successive fret widths as  2^{-1/12} = 0.94.

So how does a guitar work?

  • The Guitar is tuned by turning the knobs (tuning pegs) on the head. The sole purpose of this is to increase the tension in the corresponding string. Once tuned, the tension in the string is effectively constant.
  • The frets are there to change the effective length of the string, i.e. the part of the string that vibrates, i.e. the part of the string between the pressed down fret and the end of the string.
  • When we pluck a string, we are in effect, activating the fundamental mode corresponding to the effective string length. Though, other overtones are present, the amplitude of the fundamental mode is the greatest. In fact, if we pluck a string at half the effective string length, we hear a purer tune than when plucked above the pickup coil.
  • The 5th Fret of the 6th string produces sound at 110 Hz. Armed with the knowledge of the previous slide and how to tune different strings relative to each other, every position on the guitar can be mapped to a specific frequency.

The Logic behind tuning:

For resonance on a string with fixed ends, the condition is λ= 2L/n where L is the effective string length, n a natural number, and λ the wavelength.

Therefore, f=v*n/2L, where v is the velocity of waves on the string, and f the frequency.

Now,  v= (T/\mu)^{1/2}.

As, each string’s tension can be varied from the tuning pegs and the mass/length(\mu) is different for each, according to whether the string is coiled or not, we can set the frequency of any position on the guitar. Now the word tuning is commonly used by guitarists to mean tuning the different strings of a guitar relative to each other. There are many ways of doing this (the most common being the EADGBE tuning), but that is preference and convenience dependent. But the more physical tuning, is the tuning of frets relative to each other on a single string. This is the same for all strings and cannot be changed, which provides a clue to it being a more fundamental tuning. The most important observation is that the sounds transmitted from frets separated by 12 frets are perceived by us as similar. The note from the 12+i_{th} fret is said to be an octave (for reasons which will be apparent later) higher than the note from the i_{th} fret. The octave(also refers to the interval between a note and it’s octave) is divided into 12 notes, each note corresponding to one of the 12 frets between the i_{th} and the 12+i_{th} fret. The qualifier ‘Harmonic’ comes from music. As can be seen, the condition for resonance is λ= 2L*(1/n), where 1/n is the nth term of the Harmonic Series. In terms of frequencies, the Harmonic series is an arithmetic progression, as f=(v/2L)*n.

Now, as mentioned before, we always produce only fundamental modes in a guitar. This is why we change the effective length of the string instead of n, to vary frequency.

Therefore, from the previous slide, we get the astonishing result that the fundamental mode of a string and the fundamental mode of half the string sound similar. Therefore an octave is twice the frequency of the root note. Note that this also corresponds to the first overtone of the original unhalved string.

So, for any frequency f, 2^n times f sounds similar(n here is any integer). The use of this property is called transposition.

Therefore, the word pitch is better suited to music. Although the actual frequency of a sound may be doubled or  halved, the pitch is said to remain the same.

How much?


What are the denominations of currency required to pay someone each and every amount of money, so that you need a minimum number of notes, i.e. you have the lightest wallet possible? So, suppose you want to pay someone M Rs., the best case is of course, if you have an M Rs. note, but that does not make much sense if you might have to pay other amounts as well.

To be precise, we want to find the minimum number of denominations, that will allow us to pay every amount below a certain number, say W. Further, we assume that both the payer and the receiver have the same denominations.

For every denomination, you have 3 options:

  • 1. You either give some of those notes,
  • 2. or you receive some of those notes from the receiver,
  • 3. or you don’t do anything with those notes.

Now to make your wallet lighter, you would want to keep only one note of each denomination. So, we can replace ‘some’ by 1 in the above paragraph. More on this in the end.

Suppose the number of denominations required is N. Then let w=({w_1,\dotsc,w_n,\dotsc,w_N},) be the N tuple having the N required denominations as elements. Also, let {v_n} be the number of notes corresponding to {w_n}. From the previous paragraph, we see that each {v_n} can either be 1 (you give 1 note of denomination {w_n}), or -1 (you receive 1 note of denomination {w_n}) or 0 (you do not do a transaction of that denomination). So, let v=({v_1,\dotsc,v_n,\dotsc,v_N}) be the N tuple having the {v_n}s as elements.

Consequently, w.v=x, where {x\le W} and {w_n,x\in \mathbb{N}}, {v_n\in\{-1,0,1\}} for all n.

Recall that we have 3 options for each denomination- this suggests the use of Base 3 numbers or the Ternary number system. This is because in the ternary number system, we can represent any number with the use of only 0, 1, and 2 as digits compared to 0,1,2,3,4,5,6,7,8,9 in the decimal system. Eg.: 43=1121 in the ternary system as 43= ({1.3^3+1.3^2+2.3^1+1.3^0}).

So, given a number {y\le (3^N-1)}, we have a unique ternary representation of y in the form w.v' where w=(1,3,9,27,…,{3^N}) and v'=({v'_1,\dotsc,v'_n,\dotsc,v'_N}) with each {v'_n} being 0, 1 or 2.

To use this idea for our problem, all we have to do is to subtract (1,1,…n times) from v' to form {v=(v_1,\dotsc,v_n,\dotsc,v_N}) with each {v_n} being -1, 0 or 1.

Therefore, w.v'=y {\implies} w.v=y-w.(1,1,…n times)=x.

Now our problem is solved. We can pay every amount equal to or below W Rs., where {W=(3^N-1)/2}. Further, if we want to find the combination of notes required to pay a certain amount {x\le W}, we simply add {(3^n-1)/2} to x and find the ternary representation of that number. Then we subtract 1 from each digit in this ternary representation and get our required combination. Note that as ternary representations are unique for any number (not allowing negative powers of 3 to appear in the representation), the combination to pay x is also unique.

As an example, suppose that N is 5, so W=121. Therefore we require only one each of the 1, 3, 9, 27, 81 Rs. notes. Now if we have to pay 83 Rs., 83+121=204=2.81+1.27+1.9+2.3+0.1=21120 in the ternary system. So, v=(1,0,0,1,-1), i.e. you give the 81 and 3 Rs. notes, and take back a 1 Re. note. Simple.

Extending this, if the person you had to pay did not have any money, this would not work. Then for the case where you are not allowed to accept any money from the other person, the binary system would work perfectly as you have 2 options then: to give a note of a particular(nth) denomination ({v_n=1}) or not do anything({v_n=0}). Here W has to be {2^N-1} as we do not need to subtract (1,1,…n times) from {v'} as {v'} is equal to v.

So for W close to 121, say 127, N=7. Therefore, only one each of 1, 2, 4, 8, 16, 32, 64 Rs. notes is required. Then to give 83 Rs., as 83=1.64+0.32+1.16+0.8+0.4+1.2+1.1=1010011 (in binary system), you give the 64, 16, 2 and 1 Re. notes.

Now, you could be wondering whether keeping 2 or more notes of some other denominations, might be able to make your wallet lighter, but that is no true. So for example, in the base 5 number system, we have 0,1,2,3,4 as the basic digits which we would have to change to -2,-1,0,1,2 to make the case symmetric for both the payer and the receiver. In general if a is the base of the number system, n is the maximum number below which every amount can be paid, and N is the number of notes you would have to carry around, by a similar analysis as above, we get N=(a-1).ln((2n/(a+1))+1)/ln(a). Below is a graph with the x,y,z axes representing a, n and N respectively.


So, we can see that for a given y, z increases with x.

Does anyone know the logic behind choosing 1,2,5,10,20,50,100,1000 as the denominations that are actually used ?

Phyllotaxy- A Serendipitous Encounter


Simply put, Phyllotaxy is the arrangement of leaves on a plant stem. But this is not to be dismissed as a naive curiosity; Phyllotaxy is one of the most easily available and fascinating naturally occurring mathematical phenomena. It was a sunny afternoon on the {4^{th}} of March 2012, that I chanced upon something very beautiful. Observing the collection of plants at my home, I got curious about a certain tree. It had a symmetric looking canopy, but noticing the branches individually, I could not guess any pattern whatsoever. Luckily, I was in a mood to experiment, and decided to spend the day trying to find out more. At the beginning of my experiment, I had no idea what Phyllotaxy was and thought that the branch positioning would be random, but after analysing my results, I found something truly unexpected.

Below are the images of that tree:

1. Experiment

Though there were many interesting properties of the tree that could be measured, I restricted myself to measuring only the angle at which the branches of the tree emerged. If a cylindrical coordinate system is imagined to be set up with the longitudinal axis passing through the centre of the trunk, the azimuthal angle is what I measured.

I was lucky as the tree left marks wherever branches had emerged in the past, which made it easier to take more measurements. I drew a reference line along the trunk using a straight ruler as a guide. To measure the azimuthal angle of each branch, I measured the diameter of the trunk using a screw gauge and the distance of the mark from the reference line (along the circumference of the trunk, at the mark) using a measuring tape. Then, the required angle is:

\displaystyle \frac{distance\;from\;reference\;line}{circumference}\times 360^o

2. Observations

Branch No.Dia. (mm)Circ. (cm)Distance from reference line (cm)Angle (deg)

Even after tabulating the observations, I could not detect any pattern in the sequence of angles.

3. Plots

In hope of getting some insight, I decided to make a plot of the azimuthal angle vs. the branch no. . But the real turning point was when I joined consecutive points by line segments. Below are both the graphs.

4. Conclusion

From the figure above, it is evident that every fifth branch emerges at approximately the same angle. Moreover, the sequence of angles at which the intermediary branches emerge also repeats. Below is a computer generated image showing the emergence of branches around the main trunk (The upper vertical is {0^o}).

Actual photographs supporting the conclusion:

5 More about Phyllotaxy

After the experiment, I searched about my conclusions on the internet and found that this was called Phyllotaxy, and people as early as the Ancient Egyptians had investigated about it. Since then, numerous models based on packing efficiency, energy minimization, uneven distribution of Biochemical agents etc.., have been proposed by men like Leonardo Da Vinci, Johannes Kepler, Auguste Bravais, Louis Bravais, Hofmeister, Airy, Adler, Douady, and Couder.

5.1. Classification

Distichous Phyllotaxy: Botanical elements (leaves, branches etc..) grow one by one, each at 180 degrees from the previous one.

Whorled Phyllotaxy: Two or more Botanical elements grow at the same node on the stem. Elements in a node are evenly spread around the stem, midway between those in the previous node.

Spiral Phyllotaxy: Botanical elements grow one by one, each at a constant divergence angle d from the previous one. This is the most common pattern, and most often the divergence angle d is close to the Golden Angle, which is about 137.5 degrees. Eg.: Alternate distichous leaves will have an angle of 1/2 of a full rotation. In beech and hazel the angle is 1/3, in oak and apricot it is 2/5, in sunflowers, poplar, and pear, it is 3/8, and in willow and almond the angle is 5/13 of a full rotation. The numerator and denominator normally consist of a Fibonacci number and its second successor.

Periodic Orbits: The rarest type of Phyllotaxy, is when the divergence angle is not constant but repeats periodically a finite sequence of values. On inspecting the other plants at my home, I found all the above Phyllotaxies. I was extremely lucky that the tree I had chosen had an almost periodic orbit Phyllotaxy.

5. Afterthoughts

After reading through some relevant literature, I could not help but try fitting the observations with that of a spiral Phyllotaxy (being the most common Phyllotaxy). The average of the difference between consecutive angles, i.e. the average divergence angle comes out to be {141^o}, which is quite close to the golden angle ({137.5^o}). It can be seen that the graphs below match quite nicely.

As observed

As observed

Divergence angle= 141 degrees

Divergence angle= 141 degrees

Divergence angle= 137.5 degrees

Divergence angle= 137.5 degrees

Perhaps Spiral Phyllotaxys give way to Phyllotaxys with almost periodic orbits as the tree grows. Also, to really discuss about the Phyllotaxy one has to determine at the early stages when the branches originate from groups of cells called primordia. But still, this simple experiment was quite informative for the analysis of pattern generation with respect to branches.  Though many mathematical models which explain this phenomenon do exist, I wish I can find a general article on the recent advances in this direction and whether a clear reason for such patterns exists as of now. Also, what are the evolutionary advantages of Fibonacci Phyllotaxy?

What are Symmetric Differences made of?

Here, I extend the notion of symmetric differences of 2 sets to a finite family of sets and present some results regarding a unique representation of symmetric differences. The style of the article follows Paul Halmos’s advocacy of using “words more than numbers” in mathematical exposition (Paul R. Halmos, I want to be a Mathematician, Springer- Verlag, 1985). Also, provided are illustrations, which apart from reinforcing the main results graphically, also display the aesthetic beauty of symmetric differences and lead into a fascinating question related to rotationally symmetric venn diagrams. Although this is my own work, I have found that, similar theorems do exist in literature (Chi Woo, Symmetric difference on a finite number of sets, (version 5), PlanetMath.org).

1. Symmetric Differences

The symmetric difference of two sets A and B is specified as {A\Delta B=(A\setminus B)\cup (B\setminus A)}.

1.1. Symmetric differences are commutative, as can be seen by interchanging A and B in the definition.

1.2. If all sets considered have the same parent set, then {A\Delta B=(A\cap B')\cup (A'\cap B)}. Similarly, {(A\Delta B)\Delta C}, on simplification, is equal to {(A\cap B'\cap C')\cup (A'\cap B\cap C')\cup (A'\cap B'\cap C)\cup (A\cap B\cap C)}. From subsection 1.1, as {\Delta} is commutative, {(A\Delta B)\Delta C=C\Delta (A\Delta B)} . Also, as the simplified form of {(A\Delta B)\Delta C} is symmetric with respect to A, B and C, therefore {C\Delta (A\Delta B)=A\Delta (B\Delta C)} and so {\Delta} is associative. Note that now, {A\Delta (B\Delta C)} can be written as {A\Delta B\Delta C}, i.e., without brackets.

1.3. The power set of a set X, P(X), with {\Delta} as addition and {\cap} as multiplication is a ring with identity as:

a) From subsection 1.1, {\Delta} is commutative.

b) From subsection 1.2, {\Delta} is associative.

c) {\varnothing} is the additive identity, as for any subset of X, say A, {A\Delta\varnothing=A}.

d) Also, each set is its own additive inverse as {A\Delta A=\varnothing}.

e) {\cap} is associative and commutative.

f) The set X is the multiplicative identity, as {A\cap X=A} for any A in P(X).

g) {\cap} is distributive over {\Delta}.

The proof of g) is as follows: {A\cap (B\Delta C)} {=A\cap ((B\cap C')\cup (C\cap B'))} {=(A\cap (B\cap C'))\cup (A\cap (C\cap B'))} {=((A\cap B)\cap C')\cup ((A\cap C)\cap B')} {=((A\cap B)\cap (C'\cup A'))\cup ((A\cap C)\cap (B'\cup A'))} {=((A\cap B)\cap (C\cap A)')\cup ((A\cap C)\cap (B\cap A)')} {=(A\cap B)\Delta (A\cap C)}.

1.4. The notion of symmetric differences is not restricted to only 2 or 3 sets. In fact, subsections 1.1 and 1.2 permit us to speak of the symmetric difference of a finite family of sets. As an example, given the symmetric difference of 3 sets, as in subsection 1.2, we can form the symmetric difference of 4 sets, by taking the symmetric difference of a new set, say D, with the given symmetric difference of 3 sets. This would be written as {A\Delta B\Delta C\Delta D}. Therefore, given a family of sets whose terms are, say {A_1,A_2,\dotsc,A_{n-1},A_n} , the symmetric difference of the family can be written as {A_1\Delta A_2\dotsc A_{n-1}\Delta A_n}, without any brackets as such. Also, the order in which the symmetric differences are taken does not matter, i.e., rearranging the brackets, if any, does not change the value.

1.5. We now set up some notation, terminology, and related basic facts:

i) Let r and n be natural numbers with {0\leq r<n} and {n\geq 2}. For {r\geq1}, an r-dashed intersection of a family of sets {\{ A_i\}} (where r varies over natural numbers {\leq} n and {\geq 1}), is {A'_1\cap A'_2\dotsc\cap A'_{r-1}\cap A'_r\cap A_{r+1}\cap A_{r+2}\dotsc\cap A_{n-1}\cap A_n}, where { '} is used to indicate the complement of the set, in a universal set X. The indices can be permuted to give different r-dashed intersections of the given family. The 0-dashed intersection of a family is simply the intersection of all its terms.

ii) The symmetric difference of a family whose terms are n given sets is abbreviated as The symmetric difference of n sets. Similarly, An r-dashed intersection of a family whose terms are n given sets is abbreviated as an r-dashed intersection of n sets.

iii) Note that any two different r-dashed intersections of the same family(if all constituent sets are distinct) are disjoint.

iv) As observed in subsection 1.2 :

a) The symmetric difference of 2 sets is the union of all 1-dashed intersections of the 2 given sets.

b) The symmetric difference of 3 sets is the union of all 0-dashed and 2-dashed intersections of the 3 given sets.

Theorem 1.6: Let {n\geq 2}.

I) If n is even, the symmetric difference of n sets, not necessarily distinct, is the union of all r-dashed intersections of the n given sets, where r varies over odd numbers less than n.

II) If n is odd, the symmetric difference of n sets, not necessarily distinct, is the union of all r-dashed intersections of the n given sets, where r varies over even numbers less than n.

III) The number of sets, not necessarily distinct or non empty, in the representation of the symmetric difference of n sets is {2^{n-1}}. In particular, the number of sets representing the symmetric difference of n sets in an n-Venn diagram( According to Stan Wagon and Peter Webb, Venn symmetry and prime numbers: a seductive proof revisited, American Mathematical Monthly 115 (2008), 645 – 648 , an n-Venn diagram is a Venn diagram on n sets, which is defined to be a collection of n Jordan curves {C_1,C_2,\dotsc ,C_n} in the plane such that any two intersect in finitely many points and each of the {2^n} sets of the form {\cap C_i ^{\epsilon _i}} is nonempty and connected, where {\epsilon _i} is one of “interior” or “exterior”. Thus the Venn regions are all bounded except for the region exterior to all curves; each bounded region is the interior of a Jordan curve. An n-Venn diagram is symmetric if each curve {C_i} is {\rho ^i (C_1)}, where {\rho} is a rotation of order n about some center. The universally familiar three-circle Venn diagram is symmetric, as is the one on two sets using two circles. According to Lewis, Clarence Irving (1918). A Survey of Symbolic Logic. Berkeley: University of California Press. p. 157., the “principle of these diagrams is that classes [or sets] be represented by regions in such relation to one another that all the possible logical relations of these classes can be indicated in the same diagram. That is, the diagram initially leaves room for any possible relation of the classes, and the actual or given relation, can then be specified by indicating that some particular region is null or is not-null”.) is {2^{n-1}}.

In short, statement I) will be expressed as: the symmetric difference of n sets, where n is odd is the union of all evenly-dashed intersections of the n given sets. Similarly, statement II) will be expressed as the symmetric difference of n sets, where n is even is the union of all oddly-dashed intersections of the n given sets.

Joint Proof of statements I) and II):

For all {n\in\mathbb{N}} and {n\geq 1}, let {S_n} denote the following two phrases: a) Statement I) for the symmetric difference of 2n sets b) Statement II) for the symmetric difference of 2n+1 sets.  {S_n} is said to be true when both a) and b) corresponding to {S_n} are true.

From part iv) of subsection 1.5, {S_1} is true. Suppose that {S_n} is true. According to subsection 1.4, rearranging the brackets in the expression for the symmetric difference of 2(n+1) sets does not change its value. Thus, the symmetric difference of 2(n+1) sets can be written as the symmetric difference of the first 2n+1 sets and the (2n+2)th set. This is equal to (((2n+2)th set{)'\cap}(union of all evenly-dashed intersections of first 2n+1 sets)){\cup}((union of all evenly-dashed intersections of first 2n+1 sets{)'\cap}((2n+2)th set)). The next step is explained in the NOTE below. Now by De Morgan’s Laws, this is equal to (((2n+2)th set{)')\cap}(union of all evenly-dashed intersections of first 2n+1 sets){\cup}((intersection of all oddly-dashed unions of first 2n+1 sets){\cap}((2n+2)th set))). So, this is equal to (union of all oddly-dashed intersections of 2(n+1) sets where each intersection contains ((2n+2)th set{)'}) {\cup} (union of all oddly-dashed intersections of 2(n+1) sets where each intersection contains ((2n+2)th set)). This is equal to the union of all oddly-dashed intersections of 2(n+1) sets. Similarly, we can show that the symmetric difference of 2(n+1)+1 sets is the union of all evenly-dashed intersections of 2(n+1)+1 sets. Therefore {S_{n+1}} is true. Therefore, as {S_1} is true, and if {S_n} is true, then {S_{n+1}} is also true; by induction {S_n} is true for all n.


(union of all evenly-dashed intersections of 2n+1 sets{)'} =(intersection of all (evenly-dashed intersections of 2n+1 sets{)')}. Consider a typical evenly dashed intersection of 2n+1 sets, say {A'_1\cap A'_2\dotsc\cap A'_{2r-1}\cap A'_{2r}\cap A_{2r+1}\cap A_{2r+2}\dotsc\cap A_{2n}\cap A_{2n+1}}, where {0\leq r\leq n}. Then {(A'_1\cap A'_2\dotsc\cap A'_{2r-1}\cap A'_{2r}\cap A_{2r+1}\cap A_{2r+2}\dotsc\cap A_{2n}\cap A_{2n+1})'} = {A_1\cup A_2\dotsc\cup A_{2r-1}\cup A_{2r}\cup A'_{2r+1}\cup A'_{2r+2}\dotsc\cup A'_{2n}\cup A'_{2n+1}}. This will be referred to as a (2(n-r)+1)-dashed union of 2n+1 sets. As {0\leq (n-r)\leq n}, all oddly-dashed unions of 2n+1 sets are produced by varying r. Therefore, (intersection of all (evenly-dashed intersections of 2n+1 sets{)')} = (intersection of all oddly-dashed unions of 2n+1 sets).

Proof of statement III) using construction in the joint proof of statements I) and II) and elementary combinatorics identities:

If n is odd, the number of evenly-dashed intersections of the n sets (not necessarily distinct or disjoint), occuring in the representation, is equal to {\displaystyle\sum_{r=even\, numbers\,\leq n}\ {n\choose r}=2^{n-1}}. Similarly, if n is even, the number of oddly-dashed intersections of the n sets (not necessarily distinct or disjoint), occuring in the representation, is equal to {\displaystyle\sum_{r=odd\, numbers\,\leq n}\ {n\choose r}=2^{n-1}}. The elementary identities used here can be obtained by substituting 1 and -1 in place of x in the binomial expansion of {(1-x)^n}.

Alternate proof of III:

When n sets intersect, the number of regions, excluding the space outside the union of the sets, formed in an n-Venn diagram is {2^n-1}. If the number of regions formed by the symmetric difference of n-1 sets is {p_{n-1}}, then the number of regions formed by the symmetric difference of n sets is maximum, when the nth set intersects all the {2^{n-1}} regions formed by the first n-1 sets, along with the {p_{n-1}} regions of the symmetric difference and is still left with some elements which are not in any of the other n-1 sets. As the symmetric difference of 2 sets does not contain their intersection, therefore the symmetric difference of n sets, will not contain {p_{n-1}} of the {2^{n-1}-1} regions formed by the intersection of n-1 sets. So, the number of regions formed by the symmetric difference of n sets, {p_n=(p_{n-1})+(2^{n-1}-1-p_{n-1})+1=2^{n-1}}.

An illustration of the above proved theorem is shown below.

Note: The diagrams on the left are ordinary Venn diagrams with pink, tan, blue, green and cyan used for the constituent sets. The Venn diagrams on the right represent the corresponding symmetric differences.

Colour key for diagrams on the Right:

0-dashed intersections: blue,

1-dashed intersections: green,

2-dashed intersections: yellow,

3-dashed intersections: red,

4-dashed intersections: white.



Note that the more aesthetically pleasing, rotationally symmetric diagram for 4 sets does not exist, due to the following: Rotationally symmetric Venn diagrams for n sets exist, if and only if n is prime.(David Henderson, Venn diagrams for more than four classes, Amer. Math. Monthly, 70, 1963, 425-426; Stan Wagon and Peter Webb, Venn symmetry and prime numbers: a seductive proof revisited, American Mathematical Monthly 115 (2008), 645 – 648 }; Jerry Griggs, Charles Killian, and Carla Savage, Venn diagrams and symmetric chain decompositions in the Boolean Lattice, Electronic Journal of Combinatorics, Volume 11, January 2, 2004.)