Let ${X}$ be a measure space with measure ${\mu}$; let ${T: X \rightarrow X}$ be a measure-preserving transformation. Last time we looked at how the averages

$\displaystyle A_N := \frac{1}{N} \sum_{i=0}^{N-1} f \circ T^i$

behave in ${L^2}$. Now we want pointwise convergence.
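Here is a concrete picture of pointwise convergence. It is a sketch of our own (the function and parameter names are hypothetical) using the rotation ${T(y) = y + \theta \pmod 1}$ of ${[0,1)}$, which preserves Lebesgue measure, with ${f}$ the indicator function of ${[0, 1/2)}$. For irrational ${\theta}$ the averages ${A_N f(x)}$ tend to ${\int f d\mu = 1/2}$ at every starting point ${x}$ (Weyl's equidistribution theorem). For rational ${\theta}$ the orbit is finite, and the averages tend to the fraction of the orbit lying in ${[0, 1/2)}$.

```python
import math

def rotation_average(x: float, theta: float, N: int) -> float:
    """A_N f(x) for T(y) = y + theta (mod 1) on [0, 1), with f the indicator of [0, 1/2)."""
    hits = 0
    for i in range(N):
        y = (x + i * theta) % 1.0  # T^i(x), computed directly in floating point
        if y < 0.5:
            hits += 1
    return hits / N

golden = (math.sqrt(5) - 1) / 2  # an irrational rotation number
for x0 in (0.0, 0.1, 0.37):
    print(x0, rotation_average(x0, golden, 100_000))

# A rational rotation: the orbit of 0.1 under theta = 1/3 is {0.1, 0.433..., 0.766...}.
print(rotation_average(0.1, 1 / 3, 3000))
```

With the irrational rotation number the printed averages are all close to 1/2. With ${\theta = 1/3}$ the average is 2/3, because two of the three orbit points lie in ${[0, 1/2)}$.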

The pointwise ergodic theorem

We consider the pointwise ergodic theorem of George David Birkhoff:

Theorem 1 (Birkhoff) Let ${f \in L^1(\mu)}$. Then the averages ${A_N}$ converge almost everywhere to a function ${f^* \in L^1(\mu)}$ with ${f^* \circ T = f^*}$ a.e.
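The limit in Theorem 1 can be written down explicitly in the simplest possible model. Take a finite set ${\{0, \dots, n-1\}}$ with counting measure and a permutation ${T}$, which is automatically measure-preserving. There, ${f^*(x)}$ is the average of ${f}$ over the cycle of ${T}$ through ${x}$. The sketch below illustrates this model only, and its helper names are hypothetical. It shows that ${f^*}$ is ${T}$-invariant but need not be constant when ${T}$ has more than one cycle.

```python
from fractions import Fraction

def birkhoff_average(perm, f, x, N):
    """A_N f(x) = (1/N) * sum_{i<N} f(T^i x), where T(y) = perm[y]."""
    total = 0
    for _ in range(N):
        total += f[x]
        x = perm[x]
    return Fraction(total, N)

def cycle_mean(perm, f, x):
    """Average of f over the cycle of T through x: the limit f*(x) in this toy model."""
    total, length, y = f[x], 1, perm[x]
    while y != x:
        total += f[y]
        length += 1
        y = perm[y]
    return Fraction(total, length)

perm = [1, 2, 0, 4, 3]  # T has two cycles, (0 1 2) and (3 4), so T is not ergodic
f = [3, 0, 6, 1, 7]
for x in range(len(perm)):
    print(x, float(birkhoff_average(perm, f, x, 1001)), cycle_mean(perm, f, x))
```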

There is an analogy between Birkhoff’s theorem and the well-known fact from real analysis that a function in ${L^1}$ on Euclidean space can be recovered from its averages over small balls, i.e., for almost all ${x \in \mathbb{R}^n}$,

$\displaystyle f(x) = \lim_{r \rightarrow 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} f$

where ${m}$ is Lebesgue measure. The proof of this latter theorem usually proceeds by associating to a locally integrable function ${f}$ on ${\mathbb{R}^n}$ the Hardy-Littlewood maximal function

$\displaystyle Mf(x) := \sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f|$

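and proving that it defines a bounded sublinear operator from ${L^1}$ to weak ${L^1}$, i.e., ${m(\{Mf > \alpha\}) \leq \frac{C}{\alpha} \|f\|_1}$ for a constant ${C}$ depending only on ${n}$. One then finishes with an approximation argument: the limit formula holds everywhere for continuous functions with compact support, these are dense in ${L^1}$, and the weak-type bound controls the error.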
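The one-dimensional discrete analogue is easy to compute. This sketch is a toy model under our own assumptions: the integers with counting measure stand in for ${\mathbb{R}^n}$ with Lebesgue measure, ${f}$ is finitely supported, and the helper names are hypothetical. It evaluates the centered maximal function and checks the weak-type bound with the constant 3 that the Vitali covering argument gives in this setting. That constant is not claimed to be sharp.

```python
import random
from itertools import accumulate

def maximal_function(f):
    """Centered discrete maximal function of a sequence f on {0, ..., n-1}.

    f is extended by zero outside the window, and for x in the window
    Mf(x) = max over r >= 0 of (1 / (2r + 1)) * sum_{|y - x| <= r} |f(y)|.
    Radii r >= n only add zeros to the sum, so r in range(n) suffices.
    """
    n = len(f)
    prefix = [0] + list(accumulate(abs(v) for v in f))  # prefix[k] = sum of |f(y)| for y < k
    M = []
    for x in range(n):
        best = 0.0
        for r in range(n):
            lo, hi = max(0, x - r), min(n, x + r + 1)
            best = max(best, (prefix[hi] - prefix[lo]) / (2 * r + 1))
        M.append(best)
    return M

def weak_type_holds(f, alpha):
    """Check #{x in window : Mf(x) > alpha} <= 3 * ||f||_1 / alpha.

    Points outside the window are not counted, which can only lower the left side.
    """
    count = sum(1 for m in maximal_function(f) if m > alpha)
    return count <= 3 * sum(abs(v) for v in f) / alpha

rng = random.Random(0)
f = [rng.choice([0, 0, 0, 4, -2]) for _ in range(150)]
print(weak_type_holds(f, 0.5), weak_type_holds(f, 2.0))
```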

The maximal ergodic theorem

To prove Birkhoff’s theorem, we define the analogous maximal operator

$\displaystyle M_Tf(x) = \sup_{N \in \mathbb{Z}_{\geq 0}} \frac{1}{N} \sum_{i=0}^{N-1} f(T^i(x)).$

(For ${N=0}$ the average is interpreted as zero, so ${M_Tf \geq 0}$ everywhere.) There is a weak-type inequality for ${M_T}$ as well, which we will derive from the maximal ergodic theorem:

Theorem 2 (Maximal ergodic theorem) For ${f \in L^1(\mu)}$,

$\displaystyle \boxed{ \int_{M_T f > 0 } f d \mu \geq 0. }$
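On a finite set ${\{0, \dots, n-1\}}$ with counting measure every permutation ${T}$ is measure-preserving, so Theorem 2 can be sanity-checked on small random examples. The sketch below is a toy check under that assumption, not a proof, and its helper names are hypothetical. It computes ${\{M_T f > 0\}}$ exactly for integer-valued ${f}$ and verifies that the sum of ${f}$ over this set is nonnegative.

```python
import random

def positive_set(perm, f):
    """{x : M_T f(x) > 0} for T = perm, a permutation of {0, ..., n-1} with counting measure.

    M_T f(x) > 0 iff some partial sum S_N f(x) = f(x) + ... + f(T^{N-1} x), N >= 1,
    is positive. Scanning N = 1..n suffices: if L is the length of the cycle of x
    and S_N f(x) <= 0 for N = 1..L, then S_L f(x) <= 0, hence
    S_{N+L} f(x) = S_N f(x) + S_L f(x) <= S_N f(x), so no later sum is positive.
    """
    n = len(perm)
    result = set()
    for x in range(n):
        s, y = 0, x
        for _ in range(n):
            s += f[y]
            y = perm[y]
            if s > 0:
                result.add(x)
                break
    return result

def maximal_inequality_holds(perm, f):
    """Check the boxed inequality: the sum of f over {M_T f > 0} is >= 0."""
    return sum(f[x] for x in positive_set(perm, f)) >= 0

rng = random.Random(1)
for _ in range(5):
    perm = list(range(8))
    rng.shuffle(perm)
    f = [rng.randint(-5, 5) for _ in range(8)]
    print(sorted(positive_set(perm, f)), maximal_inequality_holds(perm, f))
```

Integer values keep every comparison exact, so a failure here could not be blamed on rounding.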

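To prove this, we use the abbreviation ${U_T g := g \circ T}$. Since ${T}$ is measure-preserving, ${U_T}$ is a linear map of ${L^1}$ into itself with ${\|U_T g\|_1 = \|g\|_1}$, and ${U_T g \geq 0}$ whenever ${g \geq 0}$. The key identity is cleanest for sums rather than averages, and ${M_T f}$ itself need not be integrable, so we work with finite maxima of partial sums. Put ${S_0 f := 0}$, ${S_n f := \sum_{i=0}^{n-1} U_T^i f}$, and for ${K \geq 1}$

$\displaystyle F_K := \max_{0 \leq n \leq K} S_n f.$

Then ${0 \leq F_K \in L^1}$. Since ${S_n f > 0}$ exactly when ${\frac{1}{n} S_n f > 0}$, the sets ${\{F_K > 0\}}$ increase to ${\{M_T f > 0\}}$ as ${K \rightarrow \infty}$. For ${1 \leq n \leq K}$ we have ${S_n f = f + U_T S_{n-1} f \leq f + U_T F_K}$, and on ${\{F_K > 0\}}$ the maximum defining ${F_K}$ is attained at some ${n \geq 1}$, so there

$\displaystyle F_K \leq f + U_T F_K.$

Integrating over ${\{F_K > 0\}}$ gives

$\displaystyle \int_{F_K > 0} f d \mu \geq \int_{F_K > 0} F_K d \mu - \int_{F_K > 0} U_T F_K d \mu \geq \|F_K\|_1 - \|U_T F_K\|_1 = 0,$

since ${F_K \geq 0}$ gives ${\int_{F_K > 0} F_K d \mu = \|F_K\|_1}$, while ${U_T F_K \geq 0}$ gives ${\int_{F_K > 0} U_T F_K d \mu \leq \|U_T F_K\|_1}$. Finally, let ${K \rightarrow \infty}$. The indicator functions of ${\{F_K > 0\}}$ increase to that of ${\{M_T f > 0\}}$ and ${|f|}$ is integrable, so dominated convergence yields the boxed inequality. This completes the proof.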
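The two facts the proof of Theorem 2 needs can be checked numerically on a toy model. The model is a permutation ${T}$ of ${\{0, \dots, n-1\}}$ with counting measure and integer-valued ${f}$. It uses partial sums ${S_n f = \sum_{i<n} f \circ T^i}$ with ${S_0 f = 0}$ and the truncated maximum ${F_K = \max_{0 \leq n \leq K} S_n f}$; this notation and the helper names are assumptions of the sketch. On random examples it confirms the pointwise bound ${F_K \leq f + U_T F_K}$ on ${\{F_K > 0\}}$ and the integrated inequality ${\int_{F_K > 0} f d\mu \geq 0}$.

```python
import random

def truncated_max(perm, f, K):
    """F_K(x) = max(S_0 f(x), ..., S_K f(x)) with S_0 f = 0 and S_n f(x) = sum_{i<n} f(T^i x)."""
    F = []
    for x in range(len(perm)):
        s, best, y = 0, 0, x
        for _ in range(K):
            s += f[y]
            y = perm[y]
            best = max(best, s)
        F.append(best)
    return F

def key_step_holds(perm, f, K):
    """Check F_K <= f + U_T F_K on {F_K > 0} (U_T F_K(x) = F_K(Tx)), and sum of f over {F_K > 0} >= 0."""
    F = truncated_max(perm, f, K)
    pointwise = all(F[x] <= f[x] + F[perm[x]] for x in range(len(perm)) if F[x] > 0)
    integrated = sum(f[x] for x in range(len(perm)) if F[x] > 0) >= 0
    return pointwise and integrated

rng = random.Random(2)
perm = list(range(10))
rng.shuffle(perm)
f = [rng.randint(-4, 4) for _ in range(10)]
print([key_step_holds(perm, f, K) for K in (1, 3, 10, 25)])
```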

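It isn’t immediately clear why the maximal ergodic theorem is a weak-type inequality. To see this, fix ${\alpha > 0}$ and note that, because ${\frac{1}{N} \sum_{i=0}^{N-1} (f - \alpha) \circ T^i = \frac{1}{N} \sum_{i=0}^{N-1} f \circ T^i - \alpha}$ for every ${N \geq 1}$,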

$\displaystyle M_T f > \alpha \quad \mathrm{iff} \quad M_T(f-\alpha) > 0.$

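Applying the maximal ergodic theorem to ${f - \alpha}$, which is integrable as long as ${\mu(X) < \infty}$ (we assume this from now on), gives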

$\displaystyle \boxed{ \int_{M_T f > \alpha} f d \mu \geq \alpha \mu( \{ M_T f > \alpha \} ) }$

which implies ${\mu( \{ M_T f > \alpha \} ) \leq \frac{1}{\alpha} ||f||_1}$, a weak-type bound, because the integral on the left is at most ${||f||_1}$. What we will actually use, however, is the boxed statement above, or rather a variant of it. If ${E \subset X}$ with ${T^{-1}E = E}$, then

$\displaystyle \boxed{ \int_{E \cap \{ M_T f > \alpha \}} f d \mu \geq \alpha \mu( E \cap \{ M_T f > \alpha \} ) }$

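which follows by running the same argument with ${E}$ and ${T|_E}$ in place of ${X}$ and ${T}$; note that ${T}$ maps ${E}$ into itself and ${T|_E}$ preserves the restriction of ${\mu}$ to ${E}$.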
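Consider a toy model where ${X = \{0, \dots, n-1\}}$ carries counting measure and ${T}$ is a permutation. There, the sets ${E}$ with ${T^{-1}E = E}$ are exactly the unions of cycles of ${T}$, and ${\{M_T f > \alpha\}}$ can be computed exactly with rational arithmetic. The sketch below illustrates the model under these assumptions, and its helper names are hypothetical. On random examples it checks the boxed inequality restricted to such an ${E}$ and the weak-type bound ${\mu(\{M_T f > \alpha\}) \leq \frac{1}{\alpha}||f||_1}$.

```python
import random
from fractions import Fraction

def level_set(perm, f, alpha):
    """{x : M_T f(x) > alpha} for a permutation T of {0, ..., n-1} and alpha > 0.

    M_T f(x) > alpha iff some partial sum of f - alpha along the orbit of x is
    positive; since every cycle has length <= n, scanning n terms suffices.
    """
    n = len(perm)
    out = set()
    for x in range(n):
        s, y = Fraction(0), x
        for _ in range(n):
            s += f[y] - alpha
            y = perm[y]
            if s > 0:
                out.add(x)
                break
    return out

def cycle_of(perm, x):
    """The cycle of T through x; unions of cycles are exactly the sets E with T^{-1}E = E."""
    cyc, y = {x}, perm[x]
    while y != x:
        cyc.add(y)
        y = perm[y]
    return cyc

def boxed_inequalities_hold(perm, f, alpha, E):
    """Check the E-restricted boxed inequality and the weak-type bound on all of X."""
    A = level_set(perm, f, alpha) & E
    boxed = sum(f[x] for x in A) >= alpha * len(A)
    everything = level_set(perm, f, alpha)
    weak = len(everything) <= Fraction(sum(abs(v) for v in f)) / alpha
    return boxed and weak

rng = random.Random(3)
perm = list(range(9))
rng.shuffle(perm)
f = [rng.randint(-4, 6) for _ in range(9)]
E = cycle_of(perm, 0)  # an invariant set
alpha = Fraction(3, 2)
print(sorted(level_set(perm, f, alpha)), boxed_inequalities_hold(perm, f, alpha, E))
```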

Proof of the ergodic theorem

Given ${f \in L^1(\mu)}$ and ${\alpha, \beta \in \mathbb{R}}$, consider the sets

$\displaystyle U_{\alpha} := \Big\{ x : \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{i=0}^{N-1} U_T^i f(x) > \alpha \Big\}$

and

$\displaystyle L_{\beta} := \Big\{ x : \liminf_{N \rightarrow \infty} \frac{1}{N} \sum_{i=0}^{N-1} U_T^i f(x) < \beta \Big\}.$

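I will show that ${\mu(U_{\alpha} \cap L_{\beta}) = 0}$ whenever ${\alpha > \beta}$. This is enough. The union of the sets ${U_{\alpha} \cap L_{\beta}}$ over rational ${\alpha > \beta}$ is then a null set, and outside it the ${\limsup}$ and ${\liminf}$ of the averages coincide, so the averages converge to a limit ${f^*}$ with values in ${[-\infty, \infty]}$. Since ${\|A_N\|_1 \leq \|f\|_1}$, Fatou’s lemma gives ${\int |f^*| d\mu \leq \|f\|_1}$, so ${f^* \in L^1(\mu)}$. Also ${f^* \circ T = f^*}$ a.e., because the averages at ${Tx}$ and at ${x}$ differ only by terms that vanish as ${N \rightarrow \infty}$. For the same reason, ${T^{-1}(U_{\alpha} \cap L_{\beta}) = U_{\alpha} \cap L_{\beta}}$. Moreover, at each point of ${U_{\alpha} \cap L_{\beta}}$ some average ${\frac{1}{N} \sum_{i=0}^{N-1} U_T^i f}$ with ${N \geq 1}$ exceeds ${\alpha}$, so ${M_T(f - \alpha) > 0}$ there. We now apply Theorem 2 to ${f - \alpha}$, with ${E = U_{\alpha} \cap L_{\beta}}$ and ${T|_E}$ in place of ${X}$ and ${T}$. This is the argument behind the last boxed statement, and used this way it needs no sign condition on ${\alpha}$. It gives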

$\displaystyle \alpha \mu(U_{\alpha} \cap L_{\beta}) \leq \int_{U_{\alpha} \cap L_{\beta}} f d \mu .$

Now we can do the same thing with ${\beta}$: the set ${U_{\alpha} \cap L_{\beta}}$ is exactly ${L_{-\alpha} \cap U_{-\beta}}$ computed for ${-f}$ in place of ${f}$, so the same argument applied to ${-f}$ and ${-\beta}$ gives

$\displaystyle -\beta \mu( U_{\alpha} \cap L_{\beta} ) \leq \int_{U_{\alpha} \cap L_{\beta} } - f d \mu$

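Combining the two inequalities gives ${\alpha \mu(U_{\alpha} \cap L_{\beta}) \leq \int_{U_{\alpha} \cap L_{\beta}} f d \mu \leq \beta \mu (U_{\alpha} \cap L_{\beta})}$. Since ${\alpha > \beta}$ and ${\mu(U_{\alpha} \cap L_{\beta})}$ is finite, this is possible only if ${\mu( U_{\alpha} \cap L_{\beta}) = 0}$.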
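The ${T}$-invariance of ${U_{\alpha} \cap L_{\beta}}$ used in the proof comes from the identity ${A_N f(Tx) = \frac{N+1}{N} A_{N+1} f(x) - \frac{f(x)}{N}}$. As ${N \rightarrow \infty}$ the factor ${\frac{N+1}{N}}$ tends to ${1}$ and ${\frac{f(x)}{N}}$ tends to ${0}$, so the ${\limsup}$ and ${\liminf}$ of the averages at ${Tx}$ equal those at ${x}$. The sketch below checks the identity in exact rational arithmetic. An arbitrary integer list stands in for the orbit values ${f(T^i x)}$, and the names are hypothetical.

```python
import random
from fractions import Fraction

def average(values, N):
    """A_N at a point whose orbit values f(x), f(Tx), f(T^2 x), ... are listed in `values`."""
    return Fraction(sum(values[:N]), N)

def shift_identity_holds(values, N):
    """Check A_N f(Tx) == ((N + 1) * A_{N+1} f(x) - f(x)) / N."""
    lhs = average(values[1:], N)  # the orbit of Tx is the orbit of x shifted by one
    rhs = ((N + 1) * average(values, N + 1) - values[0]) / N
    return lhs == rhs

rng = random.Random(4)
orbit = [rng.randint(-10, 10) for _ in range(60)]  # stand-in for f(T^i x), i = 0..59
print(all(shift_identity_holds(orbit, N) for N in range(1, 59)))
```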