Derandomization for Pairwise Independent Seed
1 Randomized Algorithm Settings
A randomized algorithm is an algorithm that, in addition to its inputs, also uses internal coin-tosses (i.e., uniformly random bits) in order to decide on its output, so different invocations of the algorithm with the same inputs may yield different outputs that depend on the results of these coins. One can think of the internal coin-tosses as a binary string $r \in \{0,1\}^t$ of length $t$ that is fixed before the algorithm starts and consumed sequentially during the algorithm’s execution. Such a string is often called a seed, and the randomness within the algorithm now comes from choosing a different string at the beginning of each execution. Note that tossing $t$ fair coins is exactly like drawing a string $r$ uniformly at random from the set $\{0,1\}^t$ (i.e., each string has probability $2^{-t}$).
Suppose we have a randomized algorithm $A$ that solves a problem of interest. We have a guarantee that for any input $x$ of length $n$, algorithm $A$ runs in time $\mathrm{poly}(n)$, and outputs either a valid solution if it succeeds or $\bot$ (“failed”) if it does not. The success probability of the algorithm is some $p > 0$ (which may be even infinitesimally small).
The first question to ask is, can we convert the randomized algorithm into a deterministic one?
2 Brute Force Derandomization
The answer is yes, and there is a simple way to do that: upon receiving an input $x$, iterate over all possible strings $r \in \{0,1\}^t$ and run $A(x; r)$. That is, simulate $A$ on $x$ with $r$ as the random seed. Return the first solution that $A$ outputs (i.e., anything that is not $\bot$).
The idea behind this method is that if there is a non-zero probability for $A$ to output a valid solution for the input $x$, then there must be at least one random seed that leads to that output when executing algorithm $A$ on $x$. So all we need to do is iterate over all possible seeds of length $t$, and that yields a deterministic algorithm.
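As a concrete illustration, here is a minimal Python sketch of this brute-force derandomization. It assumes the randomized algorithm is given as a function `A(x, seed)` that returns `None` on failure; the toy algorithm below, which merely guesses an index of a 1-bit in `x` from its seed, is hypothetical and only serves as an example.

```python
from itertools import product

def derandomize(A, x, t):
    """Brute-force derandomization: try every seed in {0,1}^t.

    A(x, seed) is assumed to return a valid solution, or None on failure.
    """
    for seed in product([0, 1], repeat=t):  # all 2^t possible seeds
        result = A(x, seed)
        if result is not None:  # first seed that makes A succeed
            return result
    return None  # A fails on every seed: its success probability was 0

# Toy randomized algorithm: guess an index of a 1-bit in x by
# interpreting the seed as a number (hypothetical example).
def A(x, seed):
    i = int("".join(map(str, seed)), 2) % len(x)
    return i if x[i] == 1 else None

print(derandomize(A, [0, 0, 1, 0], 2))  # prints 2
```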
You probably already see the problem here. If $t = \Theta(n)$, then the running time of the deterministic procedure is $2^{\Theta(n)} \cdot \mathrm{poly}(n)$.
That is, of course, a terrible running time.
On the other hand, if $t$ is a constant or even up to $O(\log n)$, then the resulting running time stays polynomial, and everyone is happy. In the next section we will see that, with an additional assumption, we can “decrease” the seed length from $t$ to $O(\log t)$ bits.
3 Pairwise Independent Seed
We start off with a definition.
k-wise Independent Family. Let $X = \{X_1, \ldots, X_m\}$ be a set of random variables. We say that $X$ is a $k$-wise independent family if for any subset of $k$ variables $X_{i_1}, \ldots, X_{i_k}$ and values $v_1, \ldots, v_k$,
$$\Pr[X_{i_1} = v_1, \ldots, X_{i_k} = v_k] = \prod_{j=1}^{k} \Pr[X_{i_j} = v_j],$$
that is, any subset of $k$ random variables from $X$ is fully independent. The family $X$ is called a pairwise independent family when $k = 2$.
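To see that pairwise independence is genuinely weaker than full independence, consider the classic example $X_1, X_2, X_1 \oplus X_2$ with $X_1, X_2$ independent fair bits. The following Python snippet (an illustration, not part of the notes’ formal development) checks exhaustively that every pair is independent while the triple is not.

```python
from itertools import product

# All outcomes of the triple (X1, X2, X1 xor X2) as the seed (X1, X2)
# ranges uniformly over {0,1}^2.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

def pr(event):
    """Probability of an event under the uniform seed distribution."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

# Every pair is independent: Pr[Xi=a, Xj=b] = Pr[Xi=a] * Pr[Xj=b].
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in product([0, 1], repeat=2):
        joint = pr(lambda o: o[i] == a and o[j] == b)
        assert joint == pr(lambda o: o[i] == a) * pr(lambda o: o[j] == b)

# But the family is not 3-wise independent: the third bit is fully
# determined by the first two, so (1,1,1) never occurs (prob 0, not 1/8).
assert pr(lambda o: o == (1, 1, 1)) == 0
```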
What does this definition have to do with our setting? Observe that up until now we implicitly assumed that the algorithm needs its coins to be fully independent. That is, if we think of the seed $r = (r_1, \ldots, r_t)$ as a series of Bernoulli RVs, then we actually require $\{r_1, \ldots, r_t\}$ to be a $t$-wise independent family, which is a much stronger property compared to the pairwise independence property. If we relax that requirement and assume that the algorithm only needs its coins to be pairwise independent, then there is a way (described in the next section) to construct $t$ pairwise independent bits using only $O(\log t)$ fully independent bits. That construction can be “embedded” into our algorithm such that we ultimately end up with a procedure that only requires an $O(\log t)$-bit fully independent seed. Applying the derandomization method from the previous section then results in a deterministic algorithm that still runs in $2^{O(\log t)} \cdot \mathrm{poly}(n) = \mathrm{poly}(n)$ time.
Of course, the aforementioned derandomization method is valid only if the assumption of a pairwise independent seed doesn’t break the correctness analysis of the randomized algorithm.
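The whole scheme can be sketched in Python as follows, assuming a hypothetical algorithm `A(x, coins)` that needs only $t$ pairwise independent coins and returns `None` on failure. The `stretch` helper is the subset-XOR construction described in the next section.

```python
from itertools import product

def stretch(seed):
    """Expand k fully independent bits into 2**k - 1 pairwise independent
    bits: one output bit per non-empty subset S of seed positions,
    equal to the XOR of seed[i] over i in S."""
    k = len(seed)
    out = []
    for mask in range(1, 2 ** k):  # each mask encodes a non-empty subset
        bit = 0
        for i in range(k):
            if mask & (1 << i):
                bit ^= seed[i]
        out.append(bit)
    return out

def derandomize_pairwise(A, x, t):
    """Brute force over the *short* seed only: k = ceil(log2(t + 1))
    fully independent bits suffice for t pairwise independent coins,
    so the loop runs 2**k = O(t) times instead of 2**t."""
    k = t.bit_length()  # smallest k with 2**k - 1 >= t
    for seed in product([0, 1], repeat=k):
        result = A(x, stretch(list(seed))[:t])
        if result is not None:
            return result
    return None
```

The point of the wrapper is visible in the loop bound: the outer iteration is over $2^k = O(t)$ short seeds rather than $2^t$ long ones, so if $t \leq \mathrm{poly}(n)$ the whole procedure stays polynomial.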
4 Stretching the Seed
Suppose we have $k$ fair, fully independent coins; we want a method to generate from them $2^k - 1$ fair pairwise independent bits. The following claim shows such a construction.
Suppose we have $k$ fully independent Bernoulli random variables, each with probability $\frac{1}{2}$. Fix them in some arbitrary order to get a (random) vector $X = (X_1, \ldots, X_k)$ of length $k$. Let
$$F = \Big\{ \sum_{i \in S} X_i \bmod 2 \;:\; \emptyset \neq S \subseteq \{1, \ldots, k\} \Big\}$$
be the set of all possible sums modulo 2 of those RVs, and note that $|F| = 2^k - 1$; then $F$ is a family of pairwise independent bits.
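For a small $k$ the claim can be checked exhaustively. The following Python snippet (an illustration, not a substitute for the proof below) enumerates all $2^k$ values of the seed and verifies that every pair of distinct bits in the family takes each of the four value combinations with probability exactly $\frac{1}{4}$.

```python
from itertools import product

k = 3  # number of fully independent seed bits

def family(seed):
    """One bit per non-empty subset S of {0,...,k-1}: the XOR (sum mod 2)
    of the seed bits indexed by S."""
    return [
        sum(seed[i] for i in range(k) if mask & (1 << i)) % 2
        for mask in range(1, 2 ** k)
    ]

m = 2 ** k - 1  # family size
samples = [family(seed) for seed in product([0, 1], repeat=k)]

# Exhaustive check: every pair of distinct family members is a pair of
# independent fair bits, i.e. each of the 4 outcomes has probability 1/4.
for i, j in product(range(m), repeat=2):
    if i == j:
        continue
    for a, b in product([0, 1], repeat=2):
        count = sum(1 for f in samples if f[i] == a and f[j] == b)
        assert count / len(samples) == 1 / 4
print("all", m, "bits are pairwise independent")
```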
To prove this we start with a very basic lemma.
Let $X, Y \sim \mathrm{Ber}(\frac{1}{2})$ be two independent random variables, and let $Z = X + Y \bmod 2$; then $Z \sim \mathrm{Ber}(\frac{1}{2})$. Moreover, $Z$ is independent of $X$ and of $Y$.
For any $z \in \{0,1\}$,
$$\Pr[Z = z] = \Pr[X = 0]\Pr[Y = z] + \Pr[X = 1]\Pr[Y = 1 - z] = \frac{1}{4} + \frac{1}{4} = \frac{1}{2},$$
thus $Z \sim \mathrm{Ber}(\frac{1}{2})$. Next, w.l.o.g., we prove that $Z$ is independent of $X$. Take any $x, z \in \{0,1\}$. In case $x = 0$,
$$\Pr[Z = z, X = 0] = \Pr[Y = z, X = 0] = \frac{1}{4} = \Pr[Z = z]\Pr[X = 0],$$
so $Z$ and $X$ are independent. And in case $x = 1$, we similarly get $\Pr[Z = z, X = 1] = \Pr[Y = 1 - z, X = 1] = \frac{1}{4} = \Pr[Z = z]\Pr[X = 1]$, which concludes the proof.
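The lemma is also easy to confirm by enumerating the four equally likely values of $(X, Y)$; this Python snippet is just such a sanity check.

```python
from itertools import product

# Exhaustive check of the lemma: for independent fair bits X, Y and
# Z = X + Y mod 2, Z is a fair bit that is independent of X.
outcomes = list(product([0, 1], repeat=2))  # (x, y), each with prob 1/4

def pr(event):
    """Probability, under the uniform distribution on (x, y), of an
    event phrased in terms of x, y and z = x + y mod 2."""
    return sum(1 for x, y in outcomes if event(x, y, (x + y) % 2)) / 4

assert pr(lambda x, y, z: z == 0) == 1 / 2  # Z ~ Ber(1/2)
for xv, zv in product([0, 1], repeat=2):
    joint = pr(lambda x, y, z: x == xv and z == zv)
    assert joint == pr(lambda x, y, z: x == xv) * pr(lambda x, y, z: z == zv)
```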
Now we prove the claim from above.
Take two different variables $Z_1, Z_2 \in F$, say $Z_1 = \sum_{i \in S_1} X_i$ and $Z_2 = \sum_{i \in S_2} X_i$; we want to show that $Z_1$ and $Z_2$ are independent. Obviously, we only care about the sets of indices that participate in each sum; these are $S_1, S_2 \subseteq \{1, \ldots, k\}$ respectively, with $S_1 \neq S_2$. Denote $A = \sum_{i \in S_1 \setminus S_2} X_i$, $B = \sum_{i \in S_2 \setminus S_1} X_i$ and $C = \sum_{i \in S_1 \cap S_2} X_i$, where summation over the empty set is defined as the constant $0$ (all the operations in the proof are modulo 2). Note that any pair of variables from $\{A, B, C\}$ are independent because they are defined on mutually disjoint sets of indices. Moreover, by the lemma above, each of them, if it is not the constant zero, is distributed as $\mathrm{Ber}(\frac{1}{2})$. Plugging in the new notation we get
$$Z_1 = A + C, \qquad Z_2 = B + C.$$
Because $S_1 \neq S_2$, at most one of $A, B$ may be the constant zero, so at least one of them is distributed as $\mathrm{Ber}(\frac{1}{2})$; w.l.o.g. let it be $A$. Then $Z_1$ has the component $A$, which does not appear in $Z_2$ and flips the result of the sum with probability $\frac{1}{2}$ regardless of what happens anywhere else; therefore $Z_1$ is independent of $Z_2$, and that completes the proof.
One last remark. In our setting the randomized algorithm either returns a solution or says “failed”. If we have a polynomial-time procedure to validate a solution, we can generalize the setting to the case where the algorithm always outputs a candidate solution, and then we verify it to determine whether it is a “success” case or a “failure” case.
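A minimal sketch of this verifier-based variant, assuming hypothetical callables `A` (always outputs some candidate) and `verify` (a polynomial-time validity check):

```python
from itertools import product

def derandomize_with_verifier(A, verify, x, t):
    """Brute-force derandomization when A always outputs a candidate:
    try every seed and keep the first candidate the verifier accepts."""
    for seed in product([0, 1], repeat=t):
        candidate = A(x, seed)
        if verify(x, candidate):
            return candidate
    return None  # no seed leads to a valid solution
```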