As I mentioned earlier, I’ve been reading a little about the probabilistic method, especially in combinatorics and related fields, as a result of a course I’m TAing. This material is certainly fundamental to a lot of mathematics and computer science, but I’ve never really studied probability. I’ve been trying to change that. The material in this post is from Alon and Spencer’s The Probabilistic Method.
1. First moments
For the purposes of the probabilistic method, we often want to show that a certain event is possible, which is done by showing that it occurs with positive probability as the outcome of a random process. Let be a nonnegative random variable on a probability space . A basic observation is that:
that is, a random variable takes at least its expectation value with positive probability. This alone is sufficient to obtain many interesting results.
Example: Let’s go back to the problem of bounding below the Ramsey numbers. Consider a random 2-coloring of the edges of the graph , and let be the random variable which outputs the number of monochromatic -cliques, and . We note that if we remove one vertex from from each monochromatic edge, we get a 2-colored complete graph with vertices and no -cliques.
In particular, for any 2-coloring of , we can find a 2-coloring of with no monochromatic -clique. Moreover, we find that:
where we note that is the sum of the indicator random variables for the events “ is monochromatic,” each of which occurs with probability . It follows that is at least this value with positive probability, so that there exists a 2-coloring of the graph with no monochromatic -clique. In particular, for every ,
This method of “alterations” gives a slight improvement to the probabilistic method in its first form applied to this problem, but it’s not as good as the improvements made using the Lovasz local lemma.
Example: A tournament is an orientation of the complete graph . It is known that every tournament has a Hamiltonian path. The probabilistic method can be used to show that there is a tournament on vertices with at least
Hamiltonian paths. Namely, for every ordering of via , we consider the event that a random tournament on (that is, an independent choice of each orientation with probability ) has a Hamiltonian path in . Its probability is .
Let be the random variable assigning to a tournament the number of Hamiltonian paths. Then is the sum of random variables of the form , one for each linear ordering of . We have seen that , so that
There is thus a tournament on which takes at least this value, or a tournament with at least Hamiltonian paths.
2. Chebyshev’s inequality
However, this method is a little crude; it used no information on other than its expectation value. We can get better control on the distribution of if we know not only the expectation value of , but the higher moments. Let be the variance of . Then we have Chebyshev’s inequality,
which in turn follows from Markov’s inequality by writing
and using the definition . This has two benefits: one, we get both lower and upper bounds on the values of for “most” points in the probability space, though we are required to know the variance; and two, we get a quadratic term in the denominator.
For a simple example, let’s suppose that are identically distributed, independent simple random variables with expectation zero. Then if is large, we would expect that
tends to be fairly clustered near the origin. The law of large numbers and the central limit theorem provide justifications for this fact. From our point of view, we can see some of this by noting that the variance of is significantly controlled by the assumption of independence. If is the common standard deviation of the , then has variance , so that
This is the weak version of the law of large numbers: the probabilities of deviations above a certain level tends to zero as . Again, the observation that enabled us to do this was the independence of . We can do a little better if we consider not just the first and second moments but rather the moment generating function; this leads to the Chernoff bound.
3. The Hardy-Ramanujan theorem
Let’s give an example: how many prime factors does a random integer have? Let , the probability space assigning each integer between and equal probability. In particular, we have a random variable with the definition
so that is the sum of the indicator variables testing whether a number is divisible by . We’d like to understand the distribution of .
To do this, note first that since
we can conclude
where we use the prime number theorem. This suggests that a “random” number has about factors. To make this precise, though, we need a second moment inequality: that is, we need to bound the variance of . It turns out that we can do this, because is the sum of the , which are “almost” independent in the sense that their covariances are small.
The result is given by:
Theorem 1 (Hardy; Ramanujan) Let approach infinity. Then
The proof of this result is based upon estimating the variance of the random variable on and Chebyshev’s inequality. In fact, it is of the same order as the expectation value. Namely, we need to estimate
where denotes the covariance
To estimate the first term, we use
where we take advantage of the fact that is an indicator random variable.
Next, we need to show that the covariances are small, so that the random variables are “almost” independent. Since is the event that an integer is divisible by , we find:
This is at most
For , there are at most primes , so that this sum is at most
In particular, combining the bound on the covariance and , we find that
Chebyshev’s inequality now gives the Hardy-Ramanujan theorem: the probabilities in the statement are in fact .