On the Hardness of Counting Problems - Part 2
1 Recap
The previous post discussed the $P$ and $NP$ families. We said that the hierarchy between decision problems from $P$ and $NP$ is not preserved when moving to their Counting versions. We defined $CNF$ and $DNF$ formulas (see section 2 in the previous post), and the problems CNF-SAT, which is $NP$-complete, and DNF-SAT, which is in $P$. We used the relation between their Counting versions to show that $\#CNF\text{-}SAT$ and $\#DNF\text{-}SAT$ are equally hard to solve exactly. The relation is as follows: if $\varphi$ is a $CNF$ formula over $n$ variables, then $\neg\varphi$ is a $DNF$ formula and
$$\#SAT(\varphi) = 2^n - \#SAT(\neg\varphi).$$
Furthermore, we showed that both problems are $\#P$-complete, demonstrating that even an “easy” decision problem like DNF-SAT can be very hard when considering its Counting counterpart.
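As a quick sanity check, the relation can be verified by brute force on a tiny formula. The sketch below uses a signed-literal encoding (a positive integer for a variable, a negative one for its negation); the encoding and names are my own convention, not notation from the posts:

```python
from itertools import product

def count_sat(formula, n, dnf):
    """Brute-force #SAT over variables 1..n.  A clause is a list of
    signed literals; dnf=True means OR of AND-clauses, else CNF."""
    count = 0
    for bits in product([False, True], repeat=n):
        assign = dict(enumerate(bits, start=1))
        clause_vals = [
            (all if dnf else any)(assign[abs(l)] == (l > 0) for l in c)
            for c in formula
        ]
        if (any if dnf else all)(clause_vals):
            count += 1
    return count

# phi = (x1 OR x2) AND (NOT x1 OR x3) is a CNF over n = 3 variables;
# its negation is the DNF (NOT x1 AND NOT x2) OR (x1 AND NOT x3)
n = 3
phi = [[1, 2], [-1, 3]]
neg_phi = [[-1, -2], [1, -3]]
assert count_sat(phi, n, dnf=False) == 2 ** n - count_sat(neg_phi, n, dnf=True)
```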
In this post we will show that there is some hierarchy inside $\#P$ when considering approximate solutions instead of exact solutions.
2 Approximation Algorithm for #DNF-SAT
2.1 Deterministic Approximation
Let $\varphi = C_1 \vee C_2 \vee \dots \vee C_m$ be a $DNF$ formula over $n$ variables, and denote by $\#\varphi$ the number of its satisfying assignments. We want to find a polynomial time algorithm that yields some value $E$ which is an approximation for $\#\varphi$. Formally, for an approximation factor of $\alpha \ge 1$, we require $\#\varphi \le E \le \alpha \cdot \#\varphi$. We will talk more about the approximation parameter later.
First note that if a clause contains both a variable and its negation then we can immediately conclude that there are no satisfying assignments (s.a.) for that clause, as it contains a contradiction, so we omit it from the formula (this can be done in linear time). So from now on, assume that $\varphi$ does not contain any contradiction. For the rest of this discussion, we say that a boolean variable appears in a clause $C$ if either it or its negation is contained in $C$.
Let’s focus on an arbitrary clause $C_i$. We can easily count the number of s.a. it has; it is exactly $2^{r_i}$, where $r_i$ is the number of variables that do not appear in $C_i$. That is true because the value of any variable appearing in $C_i$ is determined by its appearance (whether it is negated or not), so only variables which are not in $C_i$ can be either $True$ or $False$, having two options.
Moreover, we can efficiently sample uniformly at random an s.a. for that clause. Simply iterate over the $n$ variables: if a variable appears in $C_i$ then it has only one allowed value, so assign it that value; and if it does not appear in $C_i$ then toss a fair coin and assign $True$ or $False$ accordingly.
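That sampler can be sketched in a few lines (the signed-literal clause encoding and the function name are mine; the clause is assumed contradiction-free, as arranged above):

```python
import random

def sample_clause_assignment(clause, n):
    """Uniformly sample a satisfying assignment of one DNF clause.

    The clause is a list of signed literals (positive = variable,
    negative = its negation) over variables 1..n and is assumed to
    contain no contradiction.  Variables appearing in the clause get
    their forced value; the rest are fair coin tosses.
    """
    assign = {abs(l): (l > 0) for l in clause}  # forced values
    for v in range(1, n + 1):
        if v not in assign:            # variable does not appear in the clause
            assign[v] = random.random() < 0.5
    return assign
```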
A trivial approximation algorithm will calculate the exact count $c_i$ for each clause $C_i$ in $\varphi$ and return their sum $E = \sum_{i=1}^{m} c_i$. However, we may have s.a. that were counted more than once (if they satisfy more than a single clause). Thus the approximation factor of that algorithm is $m$, because in the worst case all the clauses have the same set of satisfying assignments, so we counted each s.a. $m$ times, yielding $E = m \cdot \#\varphi$.
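A minimal sketch of that trivial algorithm, under the same signed-literal encoding (function names are illustrative):

```python
def clause_count(clause, n):
    # A clause over n variables, k of which appear in it, has
    # 2^(n-k) satisfying assignments
    return 2 ** (n - len({abs(l) for l in clause}))

def trivial_estimate(dnf, n):
    # Sum of per-clause exact counts: assignments satisfying several
    # clauses are overcounted, so this is within a factor m of #phi
    return sum(clause_count(c, n) for c in dnf)
```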
We can improve that algorithm by using the inclusion-exclusion principle: instead of accumulating the number of s.a. for each clause, accumulate the number of s.a. that weren't counted so far.
Unfortunately, finding which assignments were already counted forces us to enumerate all possible assignments, which obviously takes exponential time. Hence we will give a probabilistic estimation instead. When working with estimations, we need to state the success probability of our estimation, and it is also preferable to have a way to increase that probability (mostly at the cost of more computation and drawing more random samples).
In the next section we will describe how to get an estimation $E$ of $\#\varphi$ such that $(1 - \varepsilon m)\#\varphi \le E \le (1 + \varepsilon m)\#\varphi$ for any $\varepsilon > 0$, even a polynomially small one (i.e. $\varepsilon = 1/poly(n)$), with a very high probability.
2.2 Probabilistic Approximation
The overall procedure is still simple: iterate over each clause $C_i$ and estimate the fraction of s.a. for $C_i$ that were not counted before (as a number between zero and one); denote that estimation by $Y_i$. Let $c_i$ be the total number of s.a. for $C_i$, calculated as mentioned in the deterministic approximation section. Then output $E = \sum_{i=1}^{m} Y_i \cdot c_i$.
It is left to show how to compute $Y_i$, so we do the following. Let $T = \lceil t/\varepsilon^2 \rceil$, where $t$ will be determined later. Sample, $T$ independent times, a uniformly random s.a. for $C_i$. For each sampled assignment define a corresponding binary variable $X_j$ that is $1$ if the assignment does not satisfy any previous clause, and $0$ otherwise. Note that evaluating all the $X_j$ takes $O(T \cdot mn)$ time. Then return $Y_i = \frac{1}{T}\sum_{j=1}^{T} X_j$. That completes our probabilistic approximation algorithm for $\#\varphi$.
Analysis
We need to analyze the quality of approximation of each $Y_i$, and then show that with high probability $E$ is within the claimed approximation bounds. For the first part, we focus on a single estimation $Y_i$ for the clause $C_i$.
Let $c_i$ be the number of all s.a. for $C_i$ and let $d_i$ be the number of s.a. for $C_i$ that don't satisfy any clause before $C_i$. Our goal is to show that $Y_i$ is, with high probability, close to $p_i = d_i/c_i$. Observe that all the variables $X_j$ used for estimating $Y_i$ were actually sampled i.i.d. from a Bernoulli distribution with parameter $p_i$ (which is also its expectation). As a result $\mathbb{E}[Y_i] = p_i$, so apply Hoeffding's inequality to get
$$\Pr\left[\,|Y_i - p_i| \ge \varepsilon\,\right] \le 2e^{-2T\varepsilon^2} \le 2e^{-2t},$$
which means that with high probability the estimator is “successful”, that is, $|Y_i - p_i| \le \varepsilon$, and equivalently,
$$p_i - \varepsilon \le Y_i \le p_i + \varepsilon.$$
If all the estimators $Y_1, \dots, Y_m$ are successful, then
$$E = \sum_{i=1}^{m} Y_i c_i \le \sum_{i=1}^{m} (p_i + \varepsilon) c_i = \#\varphi + \varepsilon \sum_{i=1}^{m} c_i \le (1 + \varepsilon m)\#\varphi,$$
where we used the facts that $\sum_i p_i c_i = \sum_i d_i = \#\varphi$ (each s.a. of $\varphi$ is counted exactly once, by the first clause it satisfies) and that $c_i \le \#\varphi$. Similarly $E \ge (1 - \varepsilon m)\#\varphi$, which are exactly the approximation bounds that we want to achieve. To conclude the analysis, note that by applying the union bound, all the estimators will be successful simultaneously with probability at least $1 - 2me^{-2t}$.
Trade-off between the parameters
It is interesting to examine the relations between the different parameters, and how each affects the running time, the success probability, and the quality of approximation.
Consider the parameter $T = \lceil t/\varepsilon^2 \rceil$, which is the number of samples the algorithm has to draw to estimate each $Y_i$. That $T$ depends linearly on the parameter $t$, which controls the “success probability” of all the estimators (simultaneously). On the one hand, a large $t$ gives a success probability that is extremely close to $1$. But on the other hand, it increases the number of samples $T$ per clause, and hence the total running time of the algorithm, which is $O(m \cdot T \cdot mn) = O(tm^2n/\varepsilon^2)$.
Similarly, the better the approximation we want for $\#\varphi$, the smaller the $\varepsilon$ that we should set (even smaller than $1/m$, so that the $\varepsilon m$ error term is small); the number of samples and the running time then increase inversely proportionally to $\varepsilon^2$.
3 What About Approximating #CNF-SAT?
The previous section dealt with approximating, in either the deterministic or the probabilistic manner, the $\#DNF\text{-}SAT$ problem. Unfortunately, as opposed to what we did with exact solutions, approximating $\#DNF\text{-}SAT$ does not help us approximate $\#CNF\text{-}SAT$. We will prove that by proving a stronger claim.
Let $A$ be any $NP$-complete problem, and denote by $\#A$ its Counting version. There is no polynomial-time approximation algorithm for $\#A$ to any factor, unless $P = NP$.
Assume that $P \ne NP$. Suppose toward contradiction that we have some polynomial-time approximation algorithm for $\#A$, and denote it by $M$. Given an instance $x$ for the decision problem $A$, simulate $M(x)$ and get an approximation for the number of solutions of $x$. Then declare $No$ as the output of the decision problem if that approximation is zero, and $Yes$ otherwise. Note that this reduction runs in polynomial time. Moreover, the Counting version has zero solutions if and only if $x$ is a $No$ instance. Since any multiplicative approximation of zero is still zero, we can use the approximation algorithm for the counting problem to efficiently solve the $NP$-complete decision problem, contradicting $P \ne NP$.
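The reduction is short enough to write out; here `approx_count` stands for the hypothetical approximation algorithm $M$, passed in as a function (the names are illustrative, not from the post):

```python
def decide_via_counting(x, approx_count):
    """Reduction sketch: approx_count is any multiplicative
    approximation of the Counting version #A.  Since every
    multiplicative approximation of zero is still zero, the answer
    is exact on the No side, so this decides A."""
    return "Yes" if approx_count(x) > 0 else "No"
```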
4 Formulas are Not Always Easy!
During this and the previous post, we mentioned $DNF$ formulas as something which is easy to handle. We showed that $DNF\text{-}SAT \in P$, and always compared it to its “difficult” variation - the $CNF$ version. However, a $DNF$ formula can also be “difficult”. It just depends on what we are asking. For example, deciding if a $DNF$ formula $\varphi$ is a tautology (i.e. evaluates to $True$ under any assignment) is a $coNP$-complete problem. One can show that by noting that $\neg\varphi$ is a $CNF$ formula, and $\neg\varphi$ is a $No$ instance for CNF-SAT if and only if $\varphi$ is a $Yes$ instance for the DNF-TAUTOLOGY problem. Thus, the problems are equally hard.
Similarly, we can define an “easy” problem for $CNF$ formulas: deciding whether a $CNF$ formula $\varphi$ is a tautology can be done in polynomial time, by checking whether its negation $\neg\varphi$ has a s.a. ($\varphi$ is a tautology if and only if it has none), which is polynomial since $DNF\text{-}SAT \in P$ and $\neg\varphi$ is a $DNF$ formula.
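This polynomial-time check can be sketched as follows, again in the signed-literal encoding (my convention). It uses the observation from section 2.1 that a contradiction-free $DNF$ clause always has a satisfying assignment:

```python
def negate_cnf(cnf):
    # De Morgan: NOT(AND of OR-clauses) = OR of AND-clauses, where
    # every literal is flipped; the result is a DNF
    return [[-l for l in clause] for clause in cnf]

def dnf_is_satisfiable(dnf):
    # A DNF is satisfiable iff some clause is contradiction-free
    return any(all(-l not in clause for l in clause) for clause in dnf)

def cnf_is_tautology(cnf):
    # phi is a tautology iff its DNF negation has no satisfying assignment
    return not dnf_is_satisfiable(negate_cnf(cnf))
```

For example, $x_1 \vee \neg x_1$ (a one-clause $CNF$) is a tautology, while $x_1 \wedge \neg x_1$ is not.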
In conclusion, the hardness of those decision problems depends not only on the form of the formula, but also on what we are asking to decide.