Let {(X, \mu)} be a probability space and {T: X \rightarrow X} a measure-preserving transformation. In many cases, it turns out that the averages of a function {f} given by

\displaystyle \frac{1}{N} \sum_{i=0}^{N-1} f \circ T^i

actually converge a.e. to a constant.

This is the case if {T} is ergodic, which we define as follows: {T} is ergodic if for all measurable {E \subset X} with {T^{-1}E = E}, we have {\mu(E)=1} or {0}. This is a form of irreducibility; the system {(X,T)} has no smaller subsystem (disregarding measure zero sets). It is easy to see that this is equivalent to the statement: {f} measurable (one could assume measurable and bounded if one prefers) and {T}-invariant implies {f} constant a.e. (One first shows that if {T} is ergodic, then {\mu(T^{-1}E \Delta E ) = 0} implies {\mu(E)=0} or {1}, by constructing something close to {E} that is {T}-invariant.)

In this case, therefore, the ergodic theorem takes the following form. Let {f: X \rightarrow \mathbb{C}} be integrable. Then almost everywhere,

\displaystyle \boxed{ \frac{1}{N} \sum_{i=0}^{N-1} f ( T^i (x)) \rightarrow \int_X f d\mu .}

In other words, for an ergodic transformation the time averages of {f} along almost every orbit equal its space average. This fact has many applications.

Example: rotations of the circle

Consider the unit circle {S^1} and the rotation {T_\xi: S^1 \rightarrow S^1}, {z \mapsto e^{2 \pi i \xi} z}, where {\xi} is irrational. I claim that it is ergodic. Indeed, suppose {f \in L^2(S^1)} is invariant under the rotation, and let its Fourier expansion be {\sum_{n \in \mathbb{Z}} c_n z^n}. Comparing the Fourier coefficients of {f} and {f \circ T_\xi} gives {c_n = c_n e^{2 \pi i n \xi}} for all {n}. Since {\xi} is irrational, {e^{2 \pi i n \xi} \neq 1} for {n \neq 0}, so {c_n = 0} for all {n \neq 0} and {f} is constant a.e. In the same vein, it can be shown that a rotation by {a} of a compact abelian group (with respect to Haar measure) is ergodic iff the powers of {a} are dense.
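As a numerical illustration of the ergodic theorem for this rotation, here is a minimal sketch in Python. It writes the circle additively as {[0,1)}, so that the rotation becomes {t \mapsto t + \xi \bmod 1}. The choices {\xi = \sqrt{2} - 1}, the test function {\cos^2(2\pi t)} (whose integral over {[0,1)} is {1/2}), the starting point, and the helper name `birkhoff_average` are all assumptions made for the example. Floating-point arithmetic only approximates an irrational {\xi}, so this is an illustration rather than a proof.

```python
import math

def birkhoff_average(f, xi, x0, n):
    """Average of f over the first n points of the orbit x0, x0 + xi, ... (mod 1)."""
    total = 0.0
    for i in range(n):
        # Compute each orbit point directly to avoid accumulating rounding error.
        total += f((x0 + i * xi) % 1.0)
    return total / n

xi = math.sqrt(2) - 1  # floating-point stand-in for an irrational rotation number


def f(t):
    # Test function on [0, 1); its integral over the circle is 1/2.
    return math.cos(2 * math.pi * t) ** 2


avg = birkhoff_average(f, xi, x0=0.1, n=100_000)
print(avg)  # should be close to the space average 1/2
```

Increasing `n` should bring the printed time average closer to the space average, in line with the boxed formula.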


An averaging interpretation of ergodicity

We now prove:

Proposition 1 {T} is ergodic iff for all {A,B} measurable,

\displaystyle \frac{1}{N} \sum_{k=0}^{N-1} \mu( T^{-k} A \cap B ) \rightarrow \mu(A) \mu(B), \ N \rightarrow \infty.


The proof is an easy application of the ergodic theorem, but let’s see what it means intuitively. If {C,D} are independent sets (independent in the sense of probability theory), then {\mu(C)\mu(D) = \mu(C \cap D)}. Now {\mu(T^{-k} A) \mu(B) = \mu(A) \mu(B)} because {T} is measure-preserving, so the proposition says that ergodicity is equivalent to the statement that for any {A,B}, the sets {T^{-k}A , B} are asymptotically independent of each other in the sense of Cesàro summability. This in turn leads to the stronger notions of weak and strong mixing given below.

Suppose first {T} is ergodic. Then we have

\displaystyle \frac{1}{N} \sum_{k=0}^{N-1} \chi_{T^{-k}A} \rightarrow \mu(A) \quad \mathrm{a.e.}

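as {N \rightarrow \infty}, by the Birkhoff theorem. If we multiply by {\chi_B} and integrate (the dominated convergence theorem applies, since the averages are bounded by {1}), we get the claim in the proposition, because {\int_X \chi_{T^{-k}A} \chi_B \, d\mu = \mu(T^{-k}A \cap B)}.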

Now suppose the limit exists as stated for any {A,B}; we prove ergodicity. Suppose {T^{-1}E = E}. Taking {A = B = E}, every term {\mu(T^{-k}E \cap E)} equals {\mu(E)}, so the limit gives {\mu(E) = \mu(E)^2} and hence {\mu(E)=0} or {1}.
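The Cesàro averages in Proposition 1 can be computed explicitly for an irrational rotation when {A} and {B} are arcs. The following Python sketch writes the circle as {[0,1)} with Lebesgue measure, so that {T^{-k}A} is the arc {A} shifted back by {k\xi}. The arcs, the value {\xi = \sqrt{2}-1}, and the helper names `arc_overlap` and `cesaro_correlation` are assumptions chosen for illustration.

```python
import math

def arc_overlap(s1, l1, s2, l2):
    """Measure of the intersection of the arcs [s1, s1+l1) and [s2, s2+l2) on R/Z.

    Both lengths are assumed to be at most 1.
    """
    s1 %= 1.0
    s2 %= 1.0
    total = 0.0
    # Lift the first arc to the real line and intersect it with the
    # integer translates of the second arc that can meet it.
    for m in (-1, 0, 1):
        lo = max(s1, s2 + m)
        hi = min(s1 + l1, s2 + m + l2)
        total += max(0.0, hi - lo)
    return total

def cesaro_correlation(xi, a, la, b, lb, n):
    """(1/n) * sum over k < n of mu(T^{-k}A ∩ B) for the rotation t -> t + xi mod 1."""
    total = 0.0
    for k in range(n):
        # T^{-k}A is the arc A shifted back by k * xi.
        total += arc_overlap(a - k * xi, la, b, lb)
    return total / n

xi = math.sqrt(2) - 1  # floating-point stand-in for an irrational rotation number
la, lb = 0.3, 0.5      # lengths (measures) of the arcs A and B
avg = cesaro_correlation(xi, a=0.0, la=la, b=0.2, lb=lb, n=50_000)
print(avg, la * lb)  # the two numbers should be close
```

The individual terms {\mu(T^{-k}A \cap B)} keep oscillating as {k} varies; only their averages settle near {\mu(A)\mu(B)}.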

Weak and strong mixing

Say that {T} is weak-mixing if for all {A, B} measurable

\displaystyle \frac{1}{N} \sum_{k=0}^{N-1} | \mu( T^{-k} A \cap B ) - \mu(A) \mu(B) | \rightarrow 0

as {N \rightarrow \infty}. By Proposition 1 and the triangle inequality, this is a strengthening of ergodicity. Say that {T} is strong-mixing if for all {A,B} measurable,

\displaystyle \mu( T^{-N} A \cap B ) \rightarrow \mu(A) \mu(B), \ N \rightarrow \infty.

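These conditions often need to be checked only on some subset of all measurable sets. If any measurable set can be arbitrarily approximated by an element of some class {\mathcal{S}} (which is to say that if {A} is measurable and {\epsilon>0}, there is {S \in \mathcal{S}} with {\mu( A \Delta S) < \epsilon}), then one only needs to check these conditions on {\mathcal{S}}. This follows from a standard approximation argument: if {\mu(A \Delta A') < \epsilon} and {\mu(B \Delta B') < \epsilon}, then replacing {A,B} by {A',B'} changes each of {\mu(T^{-k}A \cap B)} and {\mu(A)\mu(B)} by at most {2\epsilon}, uniformly in {k}, since {T} is measure-preserving.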

As an example, consider the space {X = \prod_{\mathbb{Z}} Y} with the product measure, where {Y} is the discrete measure space {\{0, 1, \dots, k-1\}} in which each point {\{x\}} has measure {\frac{1}{k}}. Then one has a shift {S: X \rightarrow X} that shifts the coordinates by one. It is easy to check that the strong-mixing condition holds when {A,B} are sets that depend on only finitely many coordinates: for large {N}, the sets {S^{-N}A} and {B} depend on disjoint sets of coordinates and are therefore independent. Since such sets approximate arbitrary measurable sets, {S} is strong-mixing.
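For cylinder sets, which fix the symbols at finitely many coordinates, the quantity {\mu(S^{-N}A \cap B)} can be computed exactly. The Python sketch below represents a cylinder as a dictionary from coordinates to symbols and assumes the convention {(Sx)_i = x_{i+1}} for the shift (the opposite convention works the same way). The particular sets {A,B}, the alphabet size, and the helper names are assumptions for illustration.

```python
from fractions import Fraction

def cylinder_measure(constraints, k):
    """Product measure of the cylinder fixing len(constraints) coordinates."""
    return Fraction(1, k ** len(constraints))

def shift_preimage(constraints, n):
    """Constraints defining S^{-n}A, assuming (S x)_i = x_{i+1}."""
    # x lies in S^{-n}A iff S^n x lies in A iff x_{i+n} = s for each (i, s).
    return {i + n: s for i, s in constraints.items()}

def intersect(c1, c2):
    """Constraints of the intersection, or None if the cylinders are disjoint."""
    merged = dict(c1)
    for i, s in c2.items():
        if merged.get(i, s) != s:
            return None  # conflicting symbols at coordinate i
        merged[i] = s
    return merged

def correlation(a, b, n, k):
    """mu(S^{-n}A ∩ B) for cylinders A and B over an alphabet of size k."""
    inter = intersect(shift_preimage(a, n), b)
    return Fraction(0) if inter is None else cylinder_measure(inter, k)

k = 3
A = {0: 1, 1: 2}  # x_0 = 1 and x_1 = 2
B = {0: 1, 2: 0}  # x_0 = 1 and x_2 = 0
product = cylinder_measure(A, k) * cylinder_measure(B, k)
values = [correlation(A, B, n, k) for n in range(6)]
print(values, product)
```

Once {N} is large enough that the coordinates constrained by {S^{-N}A} are disjoint from those constrained by {B} (here {N \geq 3}), the value equals {\mu(A)\mu(B)} exactly. A general set depending on finitely many coordinates is a finite disjoint union of such cylinders.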