On the Hardness of Counting Problems - Part 2
1 Recap
The previous post discussed the $NP$ and $\#P$ families. We said that the hierarchy between decision problems from $NP$ is not preserved when moving to their counting versions. We defined $CNF$ and $DNF$ formulas (see section 2 in the previous post), and the problems CNF-SAT, which is $NP$-complete, and DNF-SAT, which is in $P$. We used the relation between their counting versions to show that $\#\text{CNF-SAT}$ and $\#\text{DNF-SAT}$ are equally hard to solve exactly. The relation is as follows: if $\varphi$ is an $n$-variables $CNF$ formula, then
$$\#\text{CNF-SAT}(\varphi) = 2^n - \#\text{DNF-SAT}(\neg\varphi).$$
Furthermore, we showed that both problems are $\#P$-complete, demonstrating that even an "easy" decision problem like DNF-SAT can be very hard when considering its counting counterpart.
In this post we will show that there is some hierarchy inside $\#P$ when considering approximate solutions instead of exact solutions.
2 Approximation Algorithm for #DNF-SAT
2.1 Deterministic Approximation
Let $\varphi = C_1 \vee C_2 \vee \dots \vee C_m$ be a $DNF$ formula over $n$ variables, and denote by $\#\varphi$ its number of satisfying assignments. We want to find a polynomial-time algorithm that yields some $Y$ which is an approximation for $\#\varphi$. Formally, for an approximation factor of $\rho \ge 1$, we require $\#\varphi / \rho \le Y \le \rho \cdot \#\varphi$. We will talk more about the approximation parameter later.
First note that if a clause contains a variable and its negation, then immediately we can conclude that there are no satisfying assignments (s.a.) for that clause, as it contains a contradiction, so we will omit it from the formula (this can be done in linear time). So from now on, assume that $\varphi$ does not contain any contradiction. For the rest of this discussion, we say that a boolean variable appears in a clause $C$ if either it or its negation is contained in $C$.
Let’s focus on an arbitrary clause $C_i$. We can easily count the number of s.a. it has; it is exactly $2^{u_i}$, where $u_i$ is the number of variables that do not appear in $C_i$. That is true because the value of any variable in $C_i$ is determined by its appearance (whether or not it is negated), so only variables which are not in $C_i$ can be either $True$ or $False$, having two options.
Moreover, we can efficiently sample uniformly at random a s.a. for that clause. Simply iterate over the $n$ variables: if a variable appears in $C_i$ then it has only one allowed value, so assign it that value, and if it does not appear in $C_i$ then toss a fair coin and assign $True$ or $False$ accordingly.
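As a concrete illustration, here is a minimal Python sketch of both operations. It assumes a clause is represented as a dict mapping a variable's index to the value its literal forces (a representation chosen here for illustration, not taken from the post):

```python
import random

def count_clause_sat(clause, n):
    # A clause forces len(clause) variables; the remaining u variables
    # are free, so the clause has exactly 2**u satisfying assignments.
    u = n - len(clause)
    return 2 ** u

def sample_clause_sat(clause, n):
    # Forced variables get their required value; each free variable is
    # set by a fair coin toss, so the result is uniform over the
    # clause's satisfying assignments.
    return [clause[i] if i in clause else random.random() < 0.5
            for i in range(n)]

# Hypothetical clause (x0 AND NOT x2) over n = 4 variables.
clause = {0: True, 2: False}
print(count_clause_sat(clause, 4))  # 4, since u = 2
```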
A trivial approximation algorithm calculates the exact number of s.a. $S_i = 2^{u_i}$ for each clause $C_i$ in $\varphi$ and returns their sum $\sum_{i=1}^m S_i$. However, we may have s.a. that were counted more than once (if they satisfy more than a single clause). Thus the approximation factor of that algorithm is $m$, because in the worst case all the clauses have the same set of satisfying assignments, so we counted each s.a. $m$ times, yielding $\sum_{i=1}^m S_i = m \cdot \#\varphi$.
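A tiny numeric sanity check of that worst case, on a toy formula made up for illustration (clauses as dicts mapping a variable index to its forced value):

```python
# Toy worst case: m = 3 identical clauses, each just the literal x0,
# over n = 2 variables.
n = 2
clauses = [{0: True}, {0: True}, {0: True}]

# Trivial estimate: sum the exact per-clause counts 2**(n - |C_i|).
trivial = sum(2 ** (n - len(c)) for c in clauses)

# The true count is 2 (x0 = True, x1 free), but each of those two
# assignments is counted once per clause, so the estimate is m * 2 = 6.
print(trivial)  # 6
```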
We can improve that algorithm by using the inclusion-exclusion principle: instead of accumulating the number of s.a. for each clause, accumulate the number of s.a. that weren't counted so far.
Unfortunately, finding which assignments were already counted forces us to enumerate all possible assignments, which obviously takes exponential time. Hence we will give a probabilistic estimation instead. When working with estimations, we need to state the success probability of our estimation, and it is also preferable to have a way to increase that probability (mostly at the cost of more computation and drawing more random samples).
In the next section we will describe how to get an estimation $Y$ of $\#\varphi$ such that $(1-\epsilon)\#\varphi \le Y \le (1+\epsilon)\#\varphi$ for any $\epsilon > 0$, even a polynomially small one (i.e. $\epsilon = 1/poly(n)$), with a very high probability.
2.2 Probabilistic Approximation
The overall procedure is still simple: iterate over each clause $C_i$ and estimate the fraction of s.a. for $C_i$ that were not counted before (as a number between zero and one); denote that estimation by $\alpha_i$, and let $S_i$ be the total number of s.a. for $C_i$, calculated as mentioned in the deterministic approximation section. Then output $Y = \sum_{i=1}^m \alpha_i S_i$.
It is left to show how to compute $\alpha_i$, so we do the following. Let $T = \lceil t \cdot m^2 / \epsilon^2 \rceil$, where $t$ will be determined later. Sample, $T$ independent times, a uniformly random s.a. for $C_i$. For each sampled assignment define a corresponding binary variable $X_j$ that is $1$ if it does not satisfy any previous clause $C_1, \dots, C_{i-1}$, and $0$ otherwise. Note that evaluating all the $X_j$ takes $O(T \cdot m \cdot n)$ time. Then return $\alpha_i = \frac{1}{T}\sum_{j=1}^T X_j$. That completes our probabilistic approximation algorithm for $\#\varphi$.
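The whole procedure fits in a short Python sketch. The sample count $T = \lceil t \cdot m^2/\epsilon^2 \rceil$ matches the analysis below; the function names and the dict-based clause representation are illustrative choices:

```python
import math
import random

def satisfies(assignment, clause):
    # A clause (dict: variable index -> forced value) is satisfied iff
    # every forced variable holds its required value.
    return all(assignment[v] == b for v, b in clause.items())

def approx_count_dnf(clauses, n, eps, t):
    # Probabilistic approximation of the number of satisfying
    # assignments of a DNF formula, as described above.
    m = len(clauses)
    T = math.ceil(t * m * m / (eps * eps))  # samples per clause
    total = 0.0
    for i, clause in enumerate(clauses):
        s_i = 2 ** (n - len(clause))  # exact count S_i for clause i
        hits = 0
        for _ in range(T):
            # Draw a uniformly random s.a. of clause i.
            a = [clause[v] if v in clause else random.random() < 0.5
                 for v in range(n)]
            # X_j = 1 iff the sample satisfies no earlier clause.
            if not any(satisfies(a, clauses[j]) for j in range(i)):
                hits += 1
        total += (hits / T) * s_i  # alpha_i * S_i
    return total
```

For example, on the formula $(x_0) \vee (x_1)$ over two variables, whose true count is $3$, the returned estimate concentrates around $3$.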
Analysis
We need to analyze the quality of approximation of each $\alpha_i$, then show that with high probability $Y$ is within the claimed approximation bounds. For the first part, we focus on a single estimation $\alpha_i$ for the clause $C_i$.
Let $S_i$ be the number of all s.a. for $C_i$ and let $A_i$ be the number of s.a. for $C_i$ that don't satisfy any clause before $C_i$. Our goal is to show that $\alpha_i$ is, with high probability, close to $A_i/S_i$. Observe that all the variables $X_j$ for estimating $\alpha_i$ were actually sampled i.i.d. from a Bernoulli distribution with parameter $A_i/S_i$ (which is also its expectation). As a result $\mathbb{E}[\alpha_i] = A_i/S_i$, so apply Hoeffding's inequality to get
$$\Pr\left[\left|\alpha_i - \frac{A_i}{S_i}\right| \ge \frac{\epsilon}{m}\right] \le 2\exp\left(-2T\frac{\epsilon^2}{m^2}\right) \le 2e^{-2t},$$
which means that with high probability the estimator is "successful", that is,
$$\frac{A_i}{S_i} - \frac{\epsilon}{m} \le \alpha_i \le \frac{A_i}{S_i} + \frac{\epsilon}{m},$$
and equivalently,
$$A_i - \frac{\epsilon}{m}S_i \le \alpha_i S_i \le A_i + \frac{\epsilon}{m}S_i.$$
If all the estimators $\alpha_i$ are successful, then
$$Y = \sum_{i=1}^m \alpha_i S_i \le \sum_{i=1}^m \left(A_i + \frac{\epsilon}{m}S_i\right) = \#\varphi + \frac{\epsilon}{m}\sum_{i=1}^m S_i \le \#\varphi + \epsilon\cdot\#\varphi = (1+\epsilon)\#\varphi,$$
where we used the facts that $\sum_{i=1}^m A_i = \#\varphi$ (each s.a. is counted exactly once, at the first clause it satisfies) and that $S_i \le \#\varphi$ for every $i$. Similarly, $Y \ge (1-\epsilon)\#\varphi$, which are exactly the approximation bounds that we want to achieve. To conclude the analysis, note that by applying the union bound, all the estimators will be successful with probability at least $1 - 2m e^{-2t}$.
Trade-off between the parameters
It is interesting to examine the relation between the different parameters, and how each affects the running time, the success probability, and the quality of the approximation.
Consider the parameter $T = \lceil t \cdot m^2/\epsilon^2 \rceil$, which is the number of samples the algorithm has to draw to estimate each $\alpha_i$. That $T$ depends linearly on the parameter $t$, which controls the "success probability" of all the estimators (simultaneously). On the one hand, a large $t$ gives a success probability that is extremely close to $1$. But on the other hand, it increases the number of samples $T$ per clause, and the total running time of the algorithm, which is $O(T \cdot m \cdot n)$ per clause and $O(T \cdot m^2 \cdot n)$ overall.
Similarly, the better the approximation we want for $Y$, the smaller the $\epsilon$ that we should set (even a polynomially small one, i.e. $\epsilon = 1/poly(n)$); then the number of samples and the running time increase proportionally to $1/\epsilon^2$.
3 What About Approximating #CNF-SAT?
The previous section dealt with approximating, either in a deterministic or a probabilistic manner, the $\#\text{DNF-SAT}$ problem. Unfortunately, as opposed to what we did with exact solutions, approximating $\#\text{DNF-SAT}$ does not help us approximate $\#\text{CNF-SAT}$. We will prove that by proving a stronger claim.
Let $D$ be any $NP$-complete problem, and denote by $\#D$ its counting version. There is no polynomial-time approximation algorithm for $\#D$ to any factor, unless $P = NP$.
Assume that $P \ne NP$. Suppose toward contradiction that we have some polynomial-time approximation algorithm for $\#D$, and denote it by $A$. Given an instance $x$ for the decision problem $D$, simulate $A$ on $x$ and get an approximation for $\#D(x)$. Then declare $No$ as the output of the decision problem if that approximation is zero, and $Yes$ otherwise. Note that this reduction runs in polynomial time. Moreover, the counting version has zero solutions if and only if $x$ is a $No$ instance. Since any multiplicative approximation of zero is still zero, we can use the approximation algorithm for the counting problem to efficiently solve the $NP$-complete decision problem, contradicting that $P \ne NP$.
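The reduction is essentially one line. A sketch, where `approx_count` stands in for the hypothetical black-box algorithm $A$ (here mocked, purely for illustration, by a function that distorts a known witness count by a multiplicative factor):

```python
def decide_via_approx_counter(x, approx_count):
    # Any multiplicative approximation of zero is still zero, so the
    # counter returns a positive value iff x has at least one witness.
    return "Yes" if approx_count(x) > 0 else "No"

# Hypothetical stand-in for A: for this toy, an instance is identified
# with its true witness count, distorted by a factor of 3.
mock_approx = lambda true_count: 3 * true_count

print(decide_via_approx_counter(0, mock_approx))  # No
print(decide_via_approx_counter(7, mock_approx))  # Yes
```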
4 Formulas are Not Always Easy!
During this and the previous post, we mentioned $DNF$ formulas as something which is easy to solve. We showed that $\text{DNF-SAT} \in P$, and always compared it to its "difficult" variation - the $CNF$ version. However, a $DNF$ formula can also be "difficult"; it just depends on what we are asking. For example, deciding if a $DNF$ formula $\varphi$ is a tautology (i.e. evaluates to $True$ under any assignment) is a $coNP$-complete problem. One can show that by noting that if $\varphi$ is a $CNF$ formula then $\neg\varphi$ is a $DNF$ formula, and $\varphi$ is a $No$ instance for CNF-SAT if and only if $\neg\varphi$ is a $Yes$ instance for the DNF-TAUTOLOGY problem. Thus, the problems are equally hard.
Similarly, we can define an "easy" problem for $CNF$ formulas: deciding whether a $CNF$ formula $\varphi$ is a tautology can be done in polynomial time, by checking whether its negation $\neg\varphi$ has a s.a., which is polynomial since $\text{DNF-SAT} \in P$ and $\neg\varphi$ is a $DNF$ formula.
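This easy direction can be sketched in a few lines of Python, assuming a $CNF$ clause is represented as a list of `(variable, is_positive)` literal pairs (again an illustrative representation):

```python
def negate_cnf(cnf):
    # De Morgan: the negation of an AND of OR-clauses is an OR of
    # AND-clauses with every literal flipped.
    return [[(v, not pos) for (v, pos) in clause] for clause in cnf]

def dnf_satisfiable(dnf):
    # A DNF formula is satisfiable iff some clause is contradiction-free,
    # i.e. no variable in it is forced to both True and False.
    for clause in dnf:
        forced = {}
        if all(forced.setdefault(v, pos) == pos for v, pos in clause):
            return True
    return False

def cnf_is_tautology(cnf):
    # phi is a tautology iff its negation (a DNF formula) is unsatisfiable.
    return not dnf_satisfiable(negate_cnf(cnf))

print(cnf_is_tautology([[(0, True), (0, False)]]))  # True: x0 OR NOT x0
```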
As a conclusion, the hardness of those decision problems depends not only on the form of the formula, but also on what we are asking to decide.