1 Randomized Algorithm Settings

A randomized algorithm is an algorithm that, in addition to its inputs, also uses internal coin-tosses (i.e. uniformly random bits) in order to decide on its output, so different invocations of the algorithm with the same inputs may yield different outputs that depend on the results of these coins. One can think about the internal coin-tosses as a binary string of length $t$ that is fixed before the algorithm starts and is consumed sequentially during the algorithm’s execution. Such a string is often called a seed, and the randomness within the algorithm now comes from choosing a different string at the beginning of each execution.
Note that tossing $t$ fair coins is exactly like drawing uniformly at random a string from the set $\{0,1\}^t$ (i.e. each string has a probability of $2^{-t}$).
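Concretely, one can picture this model with a minimal Python sketch (the function names and the placeholder computation below are illustrative, not part of the notes): the algorithm body is a deterministic function of the input and the seed, and only the seed is random.

```python
import random

def run_with_seed(x, seed):
    """The algorithm proper: fully deterministic once `seed` is fixed.
    Coin tosses are modeled by consuming the seed bits sequentially."""
    bits = iter(seed)
    acc = 0
    for ch in x:
        if next(bits) == 1:   # one "coin toss" per input character (toy logic)
            acc ^= ord(ch)
    return acc

def randomized_run(x):
    """Each invocation draws a fresh seed uniformly from {0,1}^t, so two
    runs on the same input may return different outputs."""
    t = len(x)  # this toy algorithm uses one random bit per input character
    seed = [random.randint(0, 1) for _ in range(t)]
    return run_with_seed(x, seed)

print(randomized_run("example"), randomized_run("example"))  # may differ
```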
Suppose we have a randomized algorithm $A$ that solves a problem of interest. We have a guarantee that for any input $x$ of length $n$, algorithm $A$ runs in $\mathrm{poly}(n)$ time using a seed of $t = t(n)$ random bits, and outputs either a valid solution if it succeeds, or $\bot$ (“failed”) otherwise. The success probability of the algorithm is some $p > 0$ (which may be even infinitesimally small).
The first question to ask is: can we convert the randomized algorithm into a deterministic one?

2 Brute Force Derandomization

The answer is yes, and there is a simple way to do that: upon receiving an input $x$, iterate over all possible strings $r \in \{0,1\}^t$ and run $A(x, r)$. That is, simulate $A$ on $x$ with $r$ as the random seed. Return the first solution that $A$ outputs (i.e. anything that is not $\bot$).
The idea behind this method is that if there is a non-zero probability for $A$ to output a valid solution for the input $x$, then there must be a random seed $r$ that leads to that output when executing algorithm $A$ on $x$. So all we need to do is iterate over all $2^t$ possible seeds of length $t$, and that yields a deterministic algorithm.
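As a sketch, this brute-force derandomization could look as follows, where `A(x, seed)` stands for a hypothetical deterministic simulation of the algorithm on a fixed seed and `None` plays the role of $\bot$:

```python
from itertools import product

def derandomize(A, x, t):
    """Brute-force derandomization: simulate A(x) on every seed in {0,1}^t
    and return the first non-failure output found.

    If A succeeds on x with any non-zero probability, then some seed
    witnesses a valid solution, so the search finds one. The price is a
    2^t factor in the running time."""
    for seed in product((0, 1), repeat=t):   # all 2^t seeds of length t
        out = A(x, seed)                     # deterministic run on this seed
        if out is not None:                  # None plays the role of "failed"
            return out
    return None  # the success probability on x was exactly zero
```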
You probably already see the problem here. If $t = \Theta(n)$, then the running time of the deterministic procedure is $2^{\Theta(n)} \cdot \mathrm{poly}(n)$.
That is, of course, a terrible running time.
On the other hand, if $t$ is a constant, or even up to $O(\log n)$, then the resulting running time $2^{O(\log n)} \cdot \mathrm{poly}(n) = \mathrm{poly}(n)$ stays polynomial, and everyone is happy. In the next section we will see that with an additional assumption, we can “decrease” the seed length from $t$ to $O(\log t)$ bits.

3 Pairwise Independent Seed

We start off with a definition.
k-wise Independent Family. Let $X = \{X_1, \ldots, X_m\}$ be a set of random variables. We say that $X$ is a $k$-wise independent family if for any subset of $k$ variables $X_{i_1}, \ldots, X_{i_k}$ and any values $v_1, \ldots, v_k$,
$$\Pr\left[X_{i_1} = v_1, \ldots, X_{i_k} = v_k\right] = \prod_{j=1}^{k} \Pr\left[X_{i_j} = v_j\right];$$
that is, any subset of $k$ random variables from $X$ is fully independent. The family is called a pairwise independent family when $k = 2$.
What does this definition have to do with our setting? Observe that up until now we implicitly assumed that the algorithm needs its coins to be fully independent. That is, if we think of the seed as a series of Bernoulli RVs $X_1, \ldots, X_t$, then we actually require $\{X_1, \ldots, X_t\}$ to be a $t$-wise independent family, which is a much stronger property compared to the pairwise independence property. If we relax that requirement and assume that the algorithm only needs its coins to be pairwise independent, then there is a way (described in the next section) to construct $t$ pairwise independent bits using only $O(\log t)$ fully independent bits. That construction can be “embedded” into our algorithm such that we ultimately end up with a procedure that only requires a fully independent seed of $O(\log t)$ bits. Applying the derandomization method from the previous section will then result in a deterministic algorithm that still runs in polynomial time, since $2^{O(\log t)} = \mathrm{poly}(t) \leq \mathrm{poly}(n)$.
Of course, the aforementioned derandomization method is valid only if the assumption of a pairwise independent seed doesn’t break the correctness analysis of the randomized algorithm.
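To see that pairwise independence is genuinely weaker than full independence, consider the family $\{X_1, X_2, X_1 \oplus X_2\}$ for two fair independent bits $X_1, X_2$: it is pairwise independent, yet not $3$-wise independent, since the third bit is determined by the first two. A minimal exhaustive check (the helper names are illustrative):

```python
from itertools import combinations, product

def pr(event):
    """Probability of `event` over the 4 equally likely outcomes of (X1, X2)."""
    outcomes = list(product((0, 1), repeat=2))
    return sum(bool(event(x1, x2)) for x1, x2 in outcomes) / len(outcomes)

family = {
    "X1": lambda x1, x2: x1,
    "X2": lambda x1, x2: x2,
    "X1+X2": lambda x1, x2: x1 ^ x2,  # sum modulo 2
}

# Pairwise independence: for every pair of variables and every value pair,
# Pr[A = a, B = b] = Pr[A = a] * Pr[B = b] (all four probabilities are 1/4).
for (_, A), (_, B) in combinations(family.items(), 2):
    for a, b in product((0, 1), repeat=2):
        joint = pr(lambda x1, x2: A(x1, x2) == a and B(x1, x2) == b)
        assert joint == pr(lambda x1, x2: A(x1, x2) == a) * \
                        pr(lambda x1, x2: B(x1, x2) == b)

# ...but the triple is not fully independent: this event would have
# probability 1/8 under full independence, yet it never happens.
assert pr(lambda x1, x2: x1 == 0 and x2 == 0 and (x1 ^ x2) == 1) == 0.0
```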

4 Stretching the Seed

Suppose we have $k$ fair, fully independent coins; we want a method to generate from them $2^k - 1$ fair pairwise independent bits. The following claim shows such a construction.
Claim. Suppose we have $k$ fully independent Bernoulli random variables, each with probability $\frac{1}{2}$. Fix them in some arbitrary order to get a (random) vector $X = (X_1, \ldots, X_k)$ of length $k$. Let $Z = \{Z_a : a \in \{0,1\}^k \setminus \{0^k\}\}$, where $Z_a = \sum_{i : a_i = 1} X_i \bmod 2$, be the set of all possible sums modulo 2 of those RVs, and note that $|Z| = 2^k - 1$. Then $Z$ is a family of pairwise independent bits.
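A short Python sketch of this construction (the function name is mine): enumerate all non-empty subsets of the $k$ seed bits and output one subset-sum modulo 2 per subset.

```python
from itertools import product
import random

def stretch_seed(x):
    """Given k fully independent fair bits x = (x_1, ..., x_k), return the
    2^k - 1 bits Z_a (one per non-zero a in {0,1}^k), where Z_a is the sum
    modulo 2 of the x_i's selected by a. By the claim, the output bits are
    pairwise independent."""
    return [sum(a_i & x_i for a_i, x_i in zip(a, x)) % 2  # subset-sum mod 2
            for a in product((0, 1), repeat=len(x))       # all vectors a
            if any(a)]                                    # skip a = 0^k

short_seed = [random.randint(0, 1) for _ in range(4)]  # k = 4 independent bits
print(stretch_seed(short_seed))                        # 2^4 - 1 = 15 bits
```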
To prove that, we start with a very basic lemma.
Lemma. Let $X, Y \sim \mathrm{Ber}(\frac{1}{2})$ be two independent random variables, and let $Z = X + Y \bmod 2$ (i.e. $Z = X \oplus Y$); then $Z \sim \mathrm{Ber}(\frac{1}{2})$. Moreover, $Z$ is independent from $X$ and from $Y$.
Proof. For any $z \in \{0,1\}$ we have $\Pr[Z = z] = \Pr[X = 0, Y = z] + \Pr[X = 1, Y = 1 - z] = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$, thus $Z \sim \mathrm{Ber}(\frac{1}{2})$. Next, w.l.o.g., we prove that $Z$ is independent from $X$. Take any $x, z \in \{0,1\}$. In case $z = x$, then $\Pr[Z = z, X = x] = \Pr[X = x, Y = 0] = \frac{1}{4} = \Pr[Z = z]\Pr[X = x]$, so $Z$ and $X$ are independent. And in case $z \neq x$, then similarly we get $\Pr[Z = z, X = x] = \Pr[X = x, Y = 1] = \frac{1}{4} = \Pr[Z = z]\Pr[X = x]$, which concludes the proof.
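For concreteness, the four equally likely outcomes of $(X, Y)$ are:
$$
\begin{array}{cc|c}
X & Y & Z = X \oplus Y \\
\hline
0 & 0 & 0 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 0
\end{array}
$$
Each value of $Z$ appears in exactly two of the four rows, and fixing $X$ (or $Y$) still leaves $Z$ taking each value with probability $\frac{1}{2}$.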
Now we prove the claim from above.
Proof. Take two different variables $Z_a$ and $Z_b$, for non-zero $a \neq b \in \{0,1\}^k$; we want to show that $Z_a$ and $Z_b$ are independent. Obviously, we only care about the sets of indices which are non-zero in the vectors $a$ and $b$; let these sets be $S_a, S_b$ respectively. Denote $U = \sum_{i \in S_a \setminus S_b} X_i$, $V = \sum_{i \in S_b \setminus S_a} X_i$, and $W = \sum_{i \in S_a \cap S_b} X_i$, where summation over an empty set is defined as the constant $0$ (all the operations in the proof are modulo 2). Note that any pair of variables from $\{U, V, W\}$ are independent, because they are defined on mutually disjoint sets of indices. Moreover, by the lemma from above, each of them, if it is not the constant zero, is distributed as $\mathrm{Ber}(\frac{1}{2})$. Plugging in the new notation, we get $Z_a = U + W$ and $Z_b = V + W$. Because $a$ and $b$ are non-zero and distinct, at most one variable from $\{U, V, W\}$ may be constant (e.g. if both $U$ and $W$ were empty sums then $S_a = \emptyset$, contradicting $a \neq 0$). In particular, at least one of $U, V$ is non-constant; w.l.o.g. let it be $U$. Then $Z_a$ has the component $U$, which does not appear in $Z_b$ and can flip the result of the sum with probability $\frac{1}{2}$ regardless of what happens anywhere else; hence, conditioned on any value of $Z_b$, the bit $Z_a$ is uniform. Therefore $Z_a$ is independent of $Z_b$, and that completes the proof.
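The claim can also be checked exhaustively for small $k$; the following self-contained sketch re-implements the subset-sum construction and verifies that every pair of output bits has the uniform joint distribution over $\{0,1\}^2$:

```python
from itertools import combinations, product

def subset_xors(x):
    """The 2^k - 1 subset-sums modulo 2 of x, one per non-empty subset."""
    return [sum(a_i & x_i for a_i, x_i in zip(a, x)) % 2
            for a in product((0, 1), repeat=len(x)) if any(a)]

k = 3
counts = {}  # (position i, position j, bit value u, bit value v) -> count
for x in product((0, 1), repeat=k):            # all 2^k equally likely seeds
    z = subset_xors(x)
    for i, j in combinations(range(len(z)), 2):
        key = (i, j, z[i], z[j])
        counts[key] = counts.get(key, 0) + 1

# Pairwise independence of fair bits means every value pair (u, v) occurs
# for exactly 2^k / 4 of the seeds, for every pair of output positions.
for i, j in combinations(range(2 ** k - 1), 2):
    assert all(counts.get((i, j, u, v), 0) == 2 ** k // 4
               for u, v in product((0, 1), repeat=2))
print("all", (2 ** k - 1) * (2 ** k - 2) // 2, "pairs are uniform")
```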
One last remark. In our setting, the randomized algorithm either returns a solution or says “failed”. If we have a polynomial-time procedure to validate a solution, we can generalize the setting to the case where the algorithm always outputs a candidate solution, and then we verify it to determine whether it is a “success” case or a “failed” case.
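Under that generalization, the brute-force derandomization needs only one extra line; a hypothetical sketch, where `A` always returns a candidate and `verify` is the assumed polynomial-time validation procedure:

```python
from itertools import product

def derandomize_with_verifier(A, verify, x, t):
    """As before, try every seed in {0,1}^t; here A always outputs a
    candidate solution and `verify` decides "success" vs. "failed"."""
    for seed in product((0, 1), repeat=t):
        candidate = A(x, seed)
        if verify(x, candidate):   # assumed polynomial-time validation
            return candidate
    return None  # no seed leads to a valid solution
```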