1 Recap

The previous post discussed the $\mathcal{P}$, $\mathcal{NP}$ and $\#\mathcal{P}$ families. We said that the hierarchy between decision problems from $\mathcal{P}$ and $\mathcal{NP}$ is not preserved when moving to their Counting versions. We defined $CNF$ and $DNF$ formulas (see section 2 in the previous post), and the problems CNF-SAT, which is $\mathcal{NP}$-complete, and DNF-SAT, which is in $\mathcal{P}$. We used the relation between their Counting versions to show that $\#DNF$ and $\#CNF$ are equally hard to solve exactly. The relation is as follows: if $\phi$ is an $n$-variable $CNF$ formula, then $$\#CNF\left(\phi\right)+\#DNF\left(\neg\phi\right)=2^{n}$$ Furthermore, we showed that both problems are $\#\mathcal{P}$-complete, demonstrating that even an “easy” decision problem like DNF-SAT can be very hard when considering its Counting counterpart.
In this post we will show that there is some hierarchy inside $\#\mathcal{P}$ when considering approximate solutions instead of exact solutions.

2 Approximation Algorithm for #DNF-SAT

2.1 Deterministic Approximation

Let $\psi=\bigvee_{i=1}^{m}C_{i}$ be a $DNF$ formula, and let $k^{*}=\#DNF\left(\psi\right)$. We want to find a polynomial-time algorithm that yields some $k$ which approximates $k^{*}$. Formally, for an approximation factor of $p>1$, $k^{*}\le k\le p\cdot k^{*}$. We will talk more about the approximation parameter later.
First, note that if a clause contains both a variable and its negation then we can immediately conclude that it has no satisfying assignments (s.a.), as it contains a contradiction, so we omit it from the formula (this can be done in linear time). From now on, assume that $\psi$ does not contain any contradiction. For the rest of this discussion, we say that a boolean variable appears in a clause $C$ if either it or its negation is contained in $C$.
Let’s focus on an arbitrary clause $C_{i}\in\psi$. We can easily count the number of s.a. it has; it is exactly $\#DNF\left(C_{i}\right)=2^{t}$, where $t\in\left\{ 0,1,\dots,n-1\right\}$ is the number of variables that do not appear in $C_{i}$. This holds because the value of any variable in $C_{i}$ is determined by how it appears (negated or not), so only variables which are not in $C_{i}$ can be either $True$ or $False$, giving two options each.
Moreover, we can efficiently sample a uniformly random s.a. for that clause. Simply iterate over the $n$ variables: if a variable appears in $C_{i}$ then it has only one allowed value, so assign it that value; if it does not appear in $C_{i}$ then toss a fair coin and assign $True$ or $False$ accordingly.
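Both operations can be sketched in a few lines of Python. The encoding is an assumption of this sketch, not from the original text: a clause is a list of DIMACS-style signed integers over variables $1,\dots,n$, where $+v$ means the variable appears positively and $-v$ negated.

```python
import random

def clause_sat_count(clause, n):
    """Number of satisfying assignments of a single DNF clause
    over n variables: 2**t, where t counts the free variables."""
    fixed = {abs(lit) for lit in clause}
    return 2 ** (n - len(fixed))

def sample_clause_assignment(clause, n, rng=random):
    """Sample a uniformly random satisfying assignment of the clause.
    Variables appearing in the clause are forced; the rest are coin tosses."""
    assignment = {abs(lit): lit > 0 for lit in clause}   # forced values
    for v in range(1, n + 1):
        if v not in assignment:
            assignment[v] = rng.random() < 0.5           # free variable
    return assignment
```

For example, the clause $x_{1}\wedge\neg x_{2}$ over $n=3$ variables has $2^{1}=2$ satisfying assignments, since only $x_{3}$ is free.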
A trivial approximation algorithm would calculate the exact $\#DNF\left(C_{i}\right)$ for each $C_{i}$ in $\psi$ and return their sum $k=\sum_{i=1}^{m}\#DNF\left(C_{i}\right)$. However, some s.a. may be counted more than once (if they satisfy more than a single clause). Thus the approximation factor of that algorithm is $m$: in the worst case all the clauses have the same set of satisfying assignments, so each s.a. is counted $m$ times, yielding $k\le m\cdot k^{*}$.
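As a minimal sketch (same assumed clause encoding as above), the trivial upper bound is a one-liner:

```python
def trivial_dnf_upper_bound(clauses, n):
    """Sum of exact per-clause counts: at most an m-factor
    over-approximation of #DNF, since an assignment satisfying
    several clauses is counted once per clause it satisfies."""
    return sum(2 ** (n - len({abs(lit) for lit in C})) for C in clauses)
```

The worst case is visible on the formula $x_{1}\vee x_{1}$ over one variable: $k^{*}=1$, but the bound returns $2=m\cdot k^{*}$.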
We can improve that algorithm by using the inclusion-exclusion principle: instead of accumulating the number of s.a. for each clause, accumulate the number of s.a. that were not counted so far.
Unfortunately, finding which assignments were already counted forces us to enumerate all possible assignments, which obviously takes exponential time. Hence we will give a probabilistic estimation instead. When working with estimations, we need to state the success probability of our estimation, and it is also preferable to have a way to increase that probability (mostly at the cost of more computation and more random samples).
In the next section we will describe how to get an estimate $k$ such that $\left(1-p\right)k^{*}\le k\le\left(1+p\right)k^{*}$ for any $p$, even a polynomially small one (i.e. $p=\frac{1}{poly\left(n\right)}$), with very high probability.

2.2 Probabilistic Approximation

The overall procedure is still simple: iterate over each clause $C_{i}$ and estimate the fraction of s.a. for $C_{i}$ that were not counted before (as a number between zero and one); denote that estimate by $\tilde{k}_{i}$, and let $S_{i}$ be the total number of s.a. for $C_{i}$, calculated as in the deterministic approximation section. Then output $k=\sum_{i=1}^{m}\tilde{k}_{i}\cdot S_{i}$.
It is left to show how to compute $\tilde{k}_{i}$, so we do the following. Let $t=c\left(\frac{m}{p}\right)^{2}\ln m$, where $c$ will be determined later. Sample, $t$ independent times, a uniformly random s.a. for $C_{i}$. For each assignment define a corresponding binary variable $X_{j}$ that is $1$ if the assignment does not satisfy any previous clause, and $0$ otherwise. Note that evaluating all $\left\{ X_{j}\right\} _{j=1}^{t}$ takes $\mathcal{O}\left(poly\left(n,m\right)\cdot t\right)$ time. Then return $\tilde{k}_{i}=\frac{1}{t}\sum_{j=1}^{t}X_{j}$. That completes our probabilistic approximation algorithm for $\#DNF$.
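The whole estimator can be sketched in Python. This is a sketch under stated assumptions, not a definitive implementation: clauses are lists of DIMACS-style signed integers over variables $1,\dots,n$, and the default values of $p$ and $c$ are arbitrary illustrative choices.

```python
import math
import random

def approx_dnf_count(clauses, n, p=0.1, c=1.0, rng=random):
    """Estimate #DNF(psi) for psi given as a list of clauses.
    With high probability returns k with (1-p)k* <= k <= (1+p)k*."""
    def satisfies(clause, a):
        # A DNF clause is satisfied iff every literal agrees with a.
        return all(a[abs(lit)] == (lit > 0) for lit in clause)

    def sample(clause):
        # Uniform s.a. of the clause: forced literals + coin tosses.
        a = {abs(lit): lit > 0 for lit in clause}
        for v in range(1, n + 1):
            if v not in a:
                a[v] = rng.random() < 0.5
        return a

    # Drop contradictory clauses (contain both x and not-x).
    clauses = [C for C in clauses if not any(-lit in C for lit in C)]
    m = len(clauses)
    if m == 0:
        return 0.0

    # t = c * (m/p)^2 * ln m samples per clause (clamped for tiny m).
    t = max(1, math.ceil(c * (m / p) ** 2 * math.log(max(m, 2))))

    k = 0.0
    for i, C in enumerate(clauses):
        S_i = 2 ** (n - len({abs(lit) for lit in C}))  # exact #DNF(C_i)
        hits = 0
        for _ in range(t):
            a = sample(C)
            # X_j = 1 iff the sample satisfies no previous clause.
            if not any(satisfies(prev, a) for prev in clauses[:i]):
                hits += 1
        k += (hits / t) * S_i                          # k~_i * S_i
    return k
```

On formulas whose clauses have disjoint satisfying assignments the estimator is exact regardless of the random samples, which makes for a handy sanity check.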

Analysis

We need to analyze the quality of approximation of each $\tilde{k}_{i}$, then show that with high probability $\sum_{i=1}^{m}\tilde{k}_{i}\cdot S_{i}$ is within the claimed approximation bounds. For the first part, we focus on a single estimate $\tilde{k}_{i}$ for the clause $C_{i}$.
Let $S_{i}$ be the number of all s.a. for $C_{i}$ and let $s_{i}$ be the number of s.a. for $C_{i}$ that do not satisfy any clause before $C_{i}$. Our goal is to show that $\tilde{k}_{i}$ is, with high probability, close to $\frac{s_{i}}{S_{i}}$. Observe that the variables $\left\{ X_{j}\right\} _{j=1}^{t}$ used for estimating $\tilde{k}_{i}$ were actually sampled i.i.d. from a Bernoulli distribution with parameter $q_{i}=\frac{s_{i}}{S_{i}}$ (which is also its expectation). As a result $\mathbb{E}\left[\tilde{k}_{i}\right]=q_{i}$, so applying Hoeffding's inequality gives $$\begin{aligned} \mathbf{P}\left[\left|\tilde{k}_{i}-q_{i}\right|>\frac{p}{m}\right] & \le2\exp\left(\frac{-2\left(\frac{p}{m}\right)^{2}t^{2}}{t}\right)\\ & =2\exp\left(-2c\ln m\right)=2m^{-2c} \end{aligned}$$ which means that with high probability the estimator is “successful”, that is, $$q_{i}-\frac{p}{m}\le\tilde{k}_{i}\le q_{i}+\frac{p}{m}$$ and equivalently, $$s_{i}-S_{i}\frac{p}{m}\le\tilde{k}_{i}\cdot S_{i}\le s_{i}+S_{i}\frac{p}{m}$$ Note that $k^{*}=\sum_{i=1}^{m}s_{i}$, since every s.a. of $\psi$ is counted exactly once, at the first clause it satisfies, and that $S_{i}\le k^{*}$ for every $i$. If all the estimators $\tilde{k}_{1},\tilde{k}_{2},\dots$ are successful, then $$k\le\sum_{i=1}^{m}\left(s_{i}+S_{i}\frac{p}{m}\right)=k^{*}+\sum_{i=1}^{m}S_{i}\frac{p}{m}\le k^{*}+\frac{p}{m}\left(mk^{*}\right)=\left(1+p\right)k^{*}$$ and similarly $k\ge\left(1-p\right)k^{*}$, which are exactly the approximation bounds we wanted to achieve. To conclude the analysis, note that by the union bound, all the estimators are successful with probability $\ge1-\sum_{i=1}^{m}2m^{-2c}=1-\frac{2}{m^{2c-1}}$.

Trade-off between the parameters

It is interesting to examine the relation between the different parameters, and how each affects the running time, success probability, and quality of approximation.
Consider the parameter $t$, the number of samples the algorithm has to draw to estimate each $\tilde{k}_{i}$. That $t$ depends linearly on the parameter $c$, which controls the success probability of all the estimators (simultaneously). On the one hand, a large $c$ gives a success probability extremely close to $1$. On the other hand, it increases the number of samples $t$ per clause, and the total running time of the algorithm, which is $\mathcal{O}\left(poly\left(n,m\right)\cdot c\cdot p^{-2}\right)$.
Similarly, the better the approximation we want for $k^{*}$, the smaller the $p$ we should set (even smaller than $1$); the number of samples and the running time then grow inversely proportional to $p^{2}$.

3 What About Approximating #CNF-SAT?

The previous section dealt with approximating, either deterministically or probabilistically, the $\#DNF$ problem. Unfortunately, as opposed to what we did with exact solutions, approximating $\#DNF$ does not help us approximate $\#CNF$. We will prove that by proving a stronger claim.
Let $P$ be any $\mathcal{NP}$-complete problem, and denote by $\#P$ its Counting version. There is no polynomial-time approximation algorithm for $\#P$ to any factor, unless $\mathcal{P}=\mathcal{NP}$.
Assume that $\mathcal{P}\ne\mathcal{NP}$. Suppose toward contradiction that we have some polynomial-time approximation algorithm for $\#P$, and denote it by $Alg$. Given an instance $x$ of the decision problem $P$, simulate $Alg\left(x\right)$ to get an approximation of $\#P\left(x\right)$. Then declare $No$ as the output of the decision problem if that approximation is zero, and $Yes$ otherwise. Note that this reduction runs in polynomial time. Moreover, the Counting version has zero solutions if and only if $P\left(x\right)=No$. Since any multiplicative approximation of zero is still zero, we can use the approximation algorithm for the counting problem to efficiently solve the $\mathcal{NP}$-complete decision problem, contradicting $\mathcal{P}\ne\mathcal{NP}$.
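The reduction above is a one-liner once a counter is available. In the Python sketch below, the hypothetical $Alg$ is stood in for by an exact brute-force #CNF counter (exponential, for illustration only); the names and the clause encoding (lists of DIMACS-style signed literals) are assumptions of this sketch:

```python
from itertools import product

def brute_count_cnf(clauses, n):
    """Exact #CNF by enumerating all 2^n assignments.
    Stands in for the hypothetical approximation algorithm Alg."""
    count = 0
    for bits in product([False, True], repeat=n):
        a = dict(enumerate(bits, start=1))
        # CNF: every clause must have at least one true literal.
        if all(any(a[abs(lit)] == (lit > 0) for lit in C) for C in clauses):
            count += 1
    return count

def decide_cnf_sat(clauses, n, counter=brute_count_cnf):
    """Any multiplicative approximation of zero is still zero,
    so a single call to the counter decides satisfiability exactly."""
    return "Yes" if counter(clauses, n) > 0 else "No"
```

Swapping in any multiplicative-approximation counter for `counter` leaves the decision output unchanged, which is exactly the point of the proof.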

4 DNF Formulas are Not Always Easy!

During this and the previous post, we mentioned $DNF$ formulas as something which is easy to solve. We showed that $DNFSAT\in\mathcal{P}$, and always compared it to its “difficult” variation, the $CNF$ version. However, a $DNF$ formula can also be “difficult”; it just depends what we are asking. For example, deciding if a $DNF$ formula $\psi$ is a tautology (i.e., it evaluates to $True$ under every assignment) is a hard problem: it is $co\mathcal{NP}$-complete, that is, its complement is $\mathcal{NP}$-complete. One can show that by noting that $\phi:=\neg\psi$ is a $CNF$ formula, and $\phi$ is a $No$ instance of CNF-SAT if and only if $\psi$ is a $Yes$ instance of the DNF-TAUTOLOGY problem. Thus, the problems are equally hard.
Similarly, we can define an “easy” problem for $CNF$ formulas: deciding whether a $CNF$ formula $\phi$ is a tautology can be done in polynomial time, by checking if its negation $\psi:=\neg\phi$ has a s.a., which is polynomial since $DNFSAT\in\mathcal{P}$ and $\psi$ is a $DNF$ formula.
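This easy direction fits in a few lines of Python (a sketch assuming the DIMACS-style clause encoding used in the earlier examples): negate the $CNF$ via De Morgan's laws and run the polynomial DNF-SAT check.

```python
def negate_cnf(clauses):
    """De Morgan: not(AND of ORs) = OR of ANDs with literals flipped,
    so each CNF clause becomes a DNF clause with every literal negated."""
    return [[-lit for lit in C] for C in clauses]

def dnf_sat(clauses):
    """DNF-SAT in P: satisfiable iff some clause is contradiction-free."""
    return any(not any(-lit in C for lit in C) for C in clauses)

def cnf_is_tautology(cnf):
    """phi is a tautology iff its negation (a DNF formula) is unsatisfiable."""
    return not dnf_sat(negate_cnf(cnf))
```

For instance, $\left(x_{1}\vee\neg x_{1}\right)$ is a tautology, while the single-clause formula $\left(x_{1}\right)$ is not.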
In conclusion, the hardness of these decision problems depends not only on the form of the formula but also on what we are asking to decide.