Often, we want to test that two polynomials are equal. Many problems can be encoded as the equality of two polynomials—as we are going to see later, the existence of a bipartite matching in a graph can be phrased by saying that a certain determinant polynomial is nonzero.

1. Polynomials

Testing that a polynomial in the usual encoding (as a list of coefficients) is nonzero is, of course, a trivial task, one that a simple linear scan would accomplish. But, as we are going to see, the polynomials that one may wish to test for being nonzero need not be presented to us in this form. Rather, they may come to us recursively.

Here is one example. We can think of a boolean formula as a tree of gates. Each gate has one or two edges flowing into it and one edge leaving. The gates can be and, or, and not (in the last case, only one edge flows in). Given the boolean inputs that flow into it, the gate outputs true or false according to the usual truth tables. Finally, there are input gates where one can choose to place either true or false, and one final output gate at the end. We can connect the output lines of gates as input lines of new gates, and by combining them, we get a boolean formula.
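As a concrete illustration (the tuple encoding here is my own, not from the text), such a gate tree can be evaluated recursively, leaf to root:

```python
def eval_formula(node, inputs):
    """Recursively evaluate a boolean gate tree; leaves are ("input", name)."""
    op = node[0]
    if op == "input":
        return inputs[node[1]]
    if op == "not":
        return not eval_formula(node[1], inputs)
    left = eval_formula(node[1], inputs)
    right = eval_formula(node[2], inputs)
    return (left and right) if op == "and" else (left or right)

# (x AND y) OR (NOT x)
formula = ("or", ("and", ("input", "x"), ("input", "y")), ("not", ("input", "x")))
print(eval_formula(formula, {"x": True, "y": False}))   # False
print(eval_formula(formula, {"x": False, "y": False}))  # True
```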

Similarly, we could present a polynomial as a tree of gates. Each gate is either addition or multiplication; the input gates can hold variables or constants. When we put the inputs in, we get a polynomial. But in a circuit of size ${n}$, the degree of the resulting polynomial (because of all the repeated multiplications) could be exponential in ${n}$! The upshot of this is that, if we are presented with a polynomial in the form of a tree of gates, actually expanding out the coefficients of the polynomial may be an exponential-time, intractable task. But if we wanted to evaluate the polynomial at a specific value, modulo some (not too large) modulus, we could do it efficiently, by working through the gates and reducing mod the modulus at each step to keep intermediate values from getting too large.
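To make this concrete, here is a minimal sketch (the tuple encoding and the memoization on shared subtrees are my additions, not from the text): a chain of squaring gates builds a polynomial of exponential degree, yet evaluating it mod ${k}$ is cheap because we reduce at every gate.

```python
def eval_circuit_mod(node, values, k, memo=None):
    """Evaluate a tree/DAG of add and mul gates modulo k, gate by gate."""
    if memo is None:
        memo = {}
    if id(node) in memo:          # reuse results for shared subtrees
        return memo[id(node)]
    op = node[0]
    if op == "const":
        result = node[1] % k
    elif op == "var":
        result = values[node[1]] % k
    else:
        a = eval_circuit_mod(node[1], values, k, memo)
        b = eval_circuit_mod(node[2], values, k, memo)
        result = (a + b) % k if op == "add" else (a * b) % k
    memo[id(node)] = result
    return result

# Squaring a variable 64 times yields x^(2^64), whose coefficient list could
# never be written down, but evaluating mod k is just 64 cheap gate operations.
node = ("var", "x")
for _ in range(64):
    node = ("mul", node, node)
print(eval_circuit_mod(node, {"x": 3}, 1_000_003) == pow(3, 2**64, 1_000_003))  # True
```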

So we get the idea that to test whether a polynomial is nonzero, it may be useful, instead of expanding it out, to evaluate it at some specific values, in which case we will get a probabilistic algorithm. However, nonzero polynomials do have roots, so a single fixed evaluation could fool us; to make such an approach work, we will have to randomize things.

2. The Schwartz-Zippel lemma

The Schwartz-Zippel lemma bounds the probability that a nonzero multivariate polynomial over a field vanishes at a point chosen at random from a given finite set.

Lemma 1 (Schwartz-Zippel) Let ${F}$ be a field, ${P(x_1, \dots, x_n) \in F[x_1, \dots, x_n]}$ a nonzero polynomial of total degree ${d}$. Suppose ${S \subset F}$ is a finite subset. Then if the tuple ${(a_1, \dots, a_n)}$ is chosen uniformly at random from ${S^n}$, the probability that ${P(a_1, \dots, a_n) = 0}$ is at most ${\frac{d}{|S|}}$.

We are going to prove this by induction on the number ${n}$ of variables. When ${n=1}$, this is just the statement that a nonzero one-variable polynomial of degree ${d}$ can have at most ${d}$ roots. Now suppose the result is proved for ${n-1}$ variables. We can write

$\displaystyle P = \sum_{i=0}^d x_n^i P_i(x_1, \dots, x_{n-1})$

where ${P_i}$ involves only the first ${n-1}$ variables and has degree at most ${d-i}$. Let ${i'}$ be the largest index such that ${P_{i'}}$ is not identically zero. Then at the tuples ${(a_1, \dots, a_{n-1})}$ such that ${P_{i'}(a_1, \dots, a_{n-1}) \neq 0}$, ${P}$ becomes a nonzero polynomial of degree ${i'}$ in ${x_n}$, so there are at most ${i'}$ choices of ${a_n}$ that make ${P}$ vanish. So we have

$\displaystyle \mathcal{P}(P(a_1, \dots, a_n) = 0) \leq \mathcal{P}(P_{i'}(a_1, \dots, a_{n-1}) = 0 ) + \mathcal{P}(P_{i'}(a_1, \dots, a_{n-1}) \neq 0) \frac{i'}{|S|} .$

By induction (applied to ${P_{i'}}$, which has degree at most ${d - i'}$), this is at most

$\displaystyle \frac{d - i'}{|S|} + \frac{i'}{|S|} = \frac{d}{|S|}$

which proves the lemma.
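As a sanity check (my own toy example, not from the text), one can verify the bound exhaustively in a small case: ${P(x, y) = xy - 1}$ over ${\mathbb{F}_p}$ has total degree ${2}$, and taking ${S = \mathbb{F}_p}$, its root fraction ${(p-1)/p^2}$ indeed lies below ${2/|S|}$.

```python
# Exhaustive check of the Schwartz-Zippel bound for P(x, y) = x*y - 1 over F_p.
p = 101                       # S = F_p, so |S| = p
d = 2                         # total degree of P
roots = sum(1 for x in range(p) for y in range(p) if (x * y - 1) % p == 0)
print(roots)                  # 100: one root y = x^(-1) for each nonzero x
print(roots / p**2 <= d / p)  # True: the root fraction respects d/|S|
```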

3. Polynomial identity testing

So, we are given the following situation. We have two polynomials ${P_1(x_1, \dots, x_n), P_2(x_1, \dots, x_n) \in \mathbb{Z}[x_1, \dots, x_n]}$ expressed in some funny form (like via an arithmetic circuit), and these polynomials may have very large degree (exponential in the input size ${n}$). However, if ${k}$ is a reasonable modulus (with polynomially many bits in ${n}$), we can evaluate the polynomials ${P_1, P_2}$ mod ${k}$ in polynomial time. This is the case for the arithmetic circuit, for instance.

Then actually computing the coefficients of ${P_1, P_2}$ cannot be done efficiently, because the degree is exponential. Even computing the exact integer values of ${P_1, P_2}$ at a point is infeasible, because the output may need exponentially many bits! However, we can compute their values mod any small ${k}$. From this, we want a probabilistic algorithm that will tell us whether ${P_1 = P_2}$ or not.

Here is the idea. Let us suppose we know that the degrees of ${P_1, P_2}$ are at most ${M}$ (which is generally exponential in ${n}$). Given this, choose ${k}$ randomly in ${[2, M^2]}$. (Then ${k}$ has ${O(n)}$ bits, so we are ok.) Choose inputs ${a_1, \dots, a_n}$ randomly in ${[0, M^2]}$. Then we evaluate ${P_1 - P_2}$ mod ${k}$ at ${a_1, \dots, a_n}$. This can be done in polynomial time. If the result is zero, output that ${P_1, P_2}$ are equal; if not, output that they are unequal.
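One run of this test can be sketched as follows, with the circuits abstracted as black-box functions evaluable mod ${k}$ (an interface of my own choosing, standing in for the gate-by-gate evaluation):

```python
import random

def looks_equal(P1_mod, P2_mod, n_vars, M):
    """One run of the test: False means P1 != P2 for certain; True means the
    evaluations agreed mod a random k at a random point (so "probably equal")."""
    k = random.randrange(2, M**2 + 1)                      # random small modulus
    a = [random.randrange(M**2 + 1) for _ in range(n_vars)]  # random inputs
    return (P1_mod(a, k) - P2_mod(a, k)) % k == 0

# Hypothetical example: (x + y)^2 vs x^2 + 2xy + y^2 are genuinely equal,
# so the test can never report "unequal" for them.
f = lambda a, k: pow(a[0] + a[1], 2, k)
g = lambda a, k: (a[0] * a[0] + 2 * a[0] * a[1] + a[1] * a[1]) % k
print(looks_equal(f, g, n_vars=2, M=4))  # True
```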

What we need to show is that if ${P_1 \neq P_2}$, then there is a decent probability that the algorithm will output “unequal.” Well, by the Schwartz-Zippel lemma, there is a high probability that ${P_1(a_1, \dots, a_n) \neq P_2(a_1, \dots, a_n)}$ as integers. Suppose that these two values are indeed unequal; then the claim is that for many choices of ${k}$, we will get something nonzero mod ${k}$. But ${(P_1 - P_2)(a_1, \dots, a_n)}$ is ${O(M^{2M})}$ in absolute value, so it has ${O(M \log M)}$ distinct prime factors. On the other hand, there are about ${M^2/\log M}$ primes in ${[0, M^2]}$. So the probability that ${k}$ is prime and does not divide ${(P_1 - P_2)(a_1, \dots, a_n)}$ is at least

$\displaystyle \frac{M^2/\log M - M \log M}{M^2} ,$

which is comparable to ${\frac{1}{\log M}}$, i.e. comparable to ${\frac{1}{n}}$. It follows that a single run detects two unequal polynomials with probability only about ${\frac{1}{n}}$, so the probability that it falsely concludes they are equal is about ${1 - \frac{1}{n}}$. To make this better, we run the test independently about ${n^2}$ times, outputting “unequal” if any run says so; then the probability that every run misses is at most ${(1 - \frac{1}{n})^{n^2} \leq e^{-n}}$, which is tiny, while the algorithm still runs in polynomial time.
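The amplification by repetition can be sketched as follows (a toy model: the per-run detection probability of about ${1/n}$ is simulated with a coin flip rather than coming from an actual circuit). If one run detects inequality with probability about ${1/n}$, then roughly ${n^2}$ independent runs all miss with probability at most ${(1 - 1/n)^{n^2} \leq e^{-n}}$.

```python
import random

def amplified_equal(single_run, repetitions):
    """Report "equal" only if every independent run reports "equal";
    a single "unequal" verdict is conclusive."""
    return all(single_run() for _ in range(repetitions))

# Simulated run against unequal polynomials: it misses (wrongly reports
# "equal") with probability 1 - 1/n, mimicking the 1/n detection rate.
n = 20
miss_prone_run = lambda: random.random() < 1 - 1 / n
# With n**2 repetitions, the chance that every run misses is about e**(-n),
# so the amplified test almost surely (correctly) reports "unequal".
print(amplified_equal(miss_prone_run, n * n))  # almost surely False
```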

I should probably mention that I’m drawing the material from these computer science posts primarily from Papadimitriou’s and Arora-Barak’s textbooks on complexity theory, and some of my (limited) understanding of basic algorithms comes from CLRS (as well as the course I am taking).