2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 2015
We show how to compute any symmetric Boolean function on n variables over any field (as well as the integers) with a probabilistic polynomial of degree O(√(n log(1/ε))) and error at most ε. The degree dependence on n and ε is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution. This polynomial construction is combined with other algebraic ideas to give the first subquadratic-time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let c(n) : N → N. Suppose we are given a database D of n vectors in {0,1}^{c(n) log n} and a collection of n query vectors Q in the same dimension. For all u ∈ Q, we wish to compute a v ∈ D with minimum Hamming distance from u. We solve this problem in n^{2−1/O(c(n) log^2 c(n))} randomized time. Hence, the problem is in "truly subquadratic" time for O(log n) dimensions, and in subquadratic time for d = o((log^2 n)/(log log n)^2). We apply the algorithm to computing pairs with maximum inner product, closest pair in ℓ_1 for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.
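To make the batch Hamming nearest-neighbor problem concrete, here is a minimal Python sketch of the naive O(n^2 · d)-time baseline that the abstract's algorithm improves upon. It is purely illustrative and is not the paper's construction; all names are ours.

    # Naive baseline for batch Hamming nearest neighbors: for every query u in Q,
    # scan the whole database D and keep the vector minimizing the Hamming distance.
    # Runs in O(|Q| * |D| * d) time; the paper beats n^2 for dimension d = c(n) log n.

    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    def batch_nearest(D, Q):
        answers = []
        for u in Q:
            best = min(D, key=lambda v: hamming(u, v))
            answers.append(best)
        return answers

    D = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)]
    Q = [(0, 1, 0, 0), (1, 1, 1, 1)]
    print(batch_nearest(D, Q))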
Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 2014
We present a new randomized method for computing the min-plus product (a.k.a. tropical product) of two n × n matrices, yielding a faster algorithm for solving the all-pairs shortest path problem (APSP) in dense n-node directed graphs with arbitrary edge weights. On the real RAM, where additions and comparisons of reals are unit cost (but all other operations have typical logarithmic cost), the algorithm runs in time n^3/2^{Ω((log n)^{1/2})} and is correct with high probability. On the word RAM, the algorithm runs in n^3/2^{Ω((log n)^{1/2})} + n^{2+o(1)} log M time for edge weights in ([0, M] ∩ Z) ∪ {∞}. Prior algorithms took either O(n^3/log^c n) time for various c ≤ 2, or O(M^α n^β) time for various α > 0 and β > 2. The new algorithm applies a tool from circuit complexity, namely the Razborov-Smolensky polynomials for approximately representing AC^0[p] circuits, to efficiently reduce a matrix product over the (min, +) algebra to a relatively small number of rectangular matrix products over F_2, each of which is computable using a particularly efficient method due to Coppersmith. We also give a deterministic version of the algorithm running in n^3/2^{log^δ n} time for some δ > 0, which utilizes the Yao-Beigel-Tarui translation of AC^0[m] circuits into "nice" depth-two circuits.
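For readers unfamiliar with the (min,+) product, the following small Python sketch shows its cubic-time definition and how repeated squaring of the weight matrix in this algebra yields all-pairs shortest paths. It illustrates the object being sped up, not the paper's algorithm.

    INF = float('inf')

    def min_plus(A, B):
        # (min,+) product: C[i][j] = min over k of (A[i][k] + B[k][j]); cubic time.
        n = len(A)
        return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    def apsp(W):
        # Repeatedly square the weight matrix in the (min,+) algebra; after
        # ceil(log2 n) squarings, entry (i, j) holds the shortest i -> j distance
        # (assuming no negative cycles, since shortest paths use <= n-1 edges).
        n, D = len(W), W
        steps = 1
        while steps < n:
            D = min_plus(D, D)
            steps *= 2
        return D

    W = [[0, 3, INF], [INF, 0, 1], [2, INF, 0]]
    print(apsp(W))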
Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 2014
Let ACC ∘ THR be the class of constant-depth circuits comprised of AND, OR, and MODm gates (for some constant m > 1), with a bottom layer of gates computing arbitrary linear threshold functions. This class of circuits can be seen as a "midpoint" between ACC (where we know nontrivial lower bounds) and depth-two linear threshold circuits (where nontrivial lower bounds remain open). We give an algorithm for evaluating an arbitrary symmetric function of 2^{n^{o(1)}} ACC ∘ THR circuits of size 2^{n^{o(1)}}, on all possible inputs, in 2^n · poly(n) time. Several consequences are derived: • The number of satisfying assignments to an ACC ∘ THR circuit of subexponential size can be computed in 2^{n−n^ε} time (where ε > 0 depends on the depth and modulus of the circuit). • NEXP does not have quasi-polynomial size ACC ∘ THR circuits, and NEXP does not have quasi-polynomial size ACC ∘ SYM circuits. Nontrivial size lower bounds were not known even for AND ∘ OR ∘ THR circuits. • Every 0-1 integer linear program with n Boolean variables and s linear constraints is solvable in 2^{n−Ω(n/((log M)(log s)^5))} · poly(s, n, M) time with high probability, where M upper bounds the bit complexity of the coefficients. (For example, 0-1 integer programs with weights in [−2^{poly(n)}, 2^{poly(n)}] and poly(n) constraints can be solved in 2^{n−Ω(n/log^6 n)} time.) Impagliazzo, Paturi, and Schneider [IPS13] recently gave an algorithm for Õ(n) constraints; ours is the first asymptotic improvement over exhaustive search for up to subexponentially many constraints. We also present an algorithm for evaluating depth-two linear threshold circuits (a.k.a. THR ∘ THR) with exponential weights and 2^{n/24} size on all 2^n input assignments, running in 2^n · poly(n) time. This is evidence that non-uniform lower bounds for THR ∘ THR are within reach.
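As a point of reference for the 0-1 integer-programming result above, here is the trivial 2^n-time exhaustive-search baseline in Python that the stated 2^{n−Ω(n/((log M)(log s)^5))} bound improves upon. It is not the algorithm from the paper; the encoding of constraints is our own illustrative choice.

    from itertools import product

    def ilp_feasible(constraints, n):
        # Each constraint is (coeffs, bound), encoding sum_i coeffs[i]*x_i <= bound
        # over Boolean variables x_1..x_n.  Exhaustive search over 2^n assignments.
        for x in product((0, 1), repeat=n):
            if all(sum(c * xi for c, xi in zip(coeffs, x)) <= b
                   for coeffs, b in constraints):
                return x
        return None

    # x1 + x2 <= 2 and x1 + x2 >= 2 (written as -x1 - x2 <= -2); solved by (1, 1).
    print(ilp_feasible([([1, 1], 2), ([-1, -1], -2)], 2))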
Proceedings of the forty-fifth annual ACM symposium on Theory of Computing, 2013
We study connections between Natural Proofs, derandomization, and the problem of proving "weak" circuit lower bounds such as NEXP ⊄ TC^0, which are still wide open. Natural Proofs have three properties: they are constructive (an efficient algorithm A is embedded in them), have largeness (A accepts a large fraction of strings), and are useful (A rejects all strings which are truth tables of small circuits). Strong circuit lower bounds that are "naturalizing" would contradict present cryptographic understanding, yet the vast majority of known circuit lower bound proofs are naturalizing. So it is imperative to understand how to pursue unNatural Proofs. Some heuristic arguments say constructivity should be circumventable: largeness is inherent in many proof techniques, and it is probably our presently weak techniques that yield constructivity. We prove: • Constructivity is unavoidable, even for NEXP lower bounds. Informally, we prove that for all "typical" non-uniform circuit classes C, NEXP ⊄ C if and only if there is a polynomial-time algorithm distinguishing some function from all functions computable by C-circuits. Hence NEXP ⊄ C is equivalent to exhibiting a constructive property useful against C. • There are no P-natural properties useful against C if and only if randomized exponential time can be "derandomized" using truth tables of circuits from C as random seeds. Therefore the task of proving there are no P-natural properties is inherently a derandomization problem, weaker than but implied by the existence of strong pseudorandom functions. These characterizations are applied to yield several new results, including improved ACC^0 lower bounds and new unconditional derandomizations. In general, we develop and apply several new connections between the existence of certain algorithms for analyzing truth tables, and the non-existence of small circuits for problems in large classes such as NEXP.
In circuit complexity, the polynomial method is a general approach to proving circuit lower bounds in restricted settings. One shows that functions computed by sufficiently restricted circuits are "correlated" in some way with a low-complexity polynomial, where complexity may be measured by the degree of the polynomial or the number of monomials. Then, results limiting the capabilities of low-complexity polynomials are extended to the restricted circuits. Old theorems proved by this method have recently found interesting applications to the design of algorithms for basic problems in the theory of computing. This paper surveys some of these applications, and gives a few new ones.
Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014
We study algorithms for the satisfiability problem for quantified Boolean formulas (QBFs), and consequences of faster algorithms for circuit complexity. • We show that satisfiability of quantified 3-CNFs with m clauses, n variables, and two quantifier blocks (one existential block and one universal) can be solved deterministically in time 2^{n−Ω(√n)} · poly(m). For the case of multiple quantifier blocks (alternations), we show that satisfiability of quantified CNFs of size poly(n) on n variables with q quantifier blocks can be solved in 2^{n−n^{1/(q+1)}} · poly(n) time by a zero-error randomized algorithm. These are the first provable improvements over brute-force search in the general case, even for quantified polynomial-sized CNFs with two quantifier blocks. A second zero-error randomized algorithm solves QBF on circuits of size s in 2^{n−Ω(q)} · poly(s) time when the number of quantifier blocks is q. • We complement these algorithms by showing that improvements on them would imply new circuit complexity lower bounds. For example, if satisfiability of quantified CNF formulas with n variables, poly(n) size and at most q quantifier blocks can be solved in time 2^{n−n^{ω_q(1/q)}}, then the complexity class NEXP does not have O(log n) depth circuits of polynomial size. Furthermore, solving satisfiability of quantified CNF formulas with n variables, poly(n) size and O(log n) quantifier blocks in 2^{n−ω(log n)} time would imply the same circuit complexity lower bound. The proofs of these results proceed by establishing strong relationships between the time complexity of QBF satisfiability over CNF formulas and the time complexity of QBF satisfiability over arbitrary Boolean formulas.
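The 2^n · poly brute-force baseline that these QBF algorithms improve on can be written in a few lines. The sketch below evaluates a quantifier prefix over a CNF by straightforward recursion; it is purely illustrative and not taken from the paper.

    def eval_cnf(cnf, assignment):
        # cnf: list of clauses; each clause is a list of nonzero ints,
        # literal v means x_v, literal -v means NOT x_v.
        return all(any((lit > 0) == assignment[abs(lit)] for lit in clause)
                   for clause in cnf)

    def eval_qbf(prefix, cnf, assignment=None):
        # prefix: list of ('E', v) / ('A', v) pairs; brute force tries all 2^n
        # assignments to the quantified variables.
        assignment = dict(assignment or {})
        if not prefix:
            return eval_cnf(cnf, assignment)
        q, v = prefix[0]
        branches = (eval_qbf(prefix[1:], cnf, {**assignment, v: b})
                    for b in (False, True))
        return any(branches) if q == 'E' else all(branches)

    # Exists x1 Forall x2 : (x1 OR x2) AND (x1 OR NOT x2)  -- true, with x1 = 1.
    print(eval_qbf([('E', 1), ('A', 2)], [[1, 2], [1, -2]]))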
I will discuss the recent proof that the complexity class NEXP (nondeterministic exponential time) lacks nonuniform ACC circuits of polynomial size. The proof will be described from the perspective of someone trying to discover it.
We formally study two methods for data sanitization that have been used extensively in the database community: k-anonymity and ℓ-diversity. We settle several open problems concerning the difficulty of applying these methods optimally, proving both positive and negative results: • 2-anonymity is in P. • The problem of partitioning the edges of a triangle-free graph into 4-stars (degree-three vertices) is NP-hard. This yields an alternative proof that 3-anonymity is NP-hard even when the database attributes are all binary. • 3-anonymity with only 27 attributes per record is MAX SNP-hard. • For databases with n rows, k-anonymity is in O(4^n · poly(n)) time for all k > 1. • For databases with ℓ attributes, alphabet size c, and n rows, k-anonymity can be solved in 2^{O(k^2 (2c)^ℓ)} + O(nℓ) time. • 3-diversity with binary attributes is NP-hard, with one sensitive attribute. • 2-diversity with binary attributes is NP-hard, with three sensitive attributes.
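To fix intuition for the two notions being analyzed, the following Python sketch checks whether a table (rows of quasi-identifier values plus one sensitive value) is k-anonymous and ℓ-diverse under the standard definitions. The optimization problems in the paper ask for the cheapest modification of a table that achieves these properties; the checker below is only an illustration of the definitions.

    from collections import defaultdict

    def is_k_anonymous(rows, k):
        # Every combination of quasi-identifier values must occur in >= k rows.
        counts = defaultdict(int)
        for quasi, _sensitive in rows:
            counts[quasi] += 1
        return all(c >= k for c in counts.values())

    def is_l_diverse(rows, l):
        # Every group with identical quasi-identifiers must contain >= l distinct
        # sensitive values.
        groups = defaultdict(set)
        for quasi, sensitive in rows:
            groups[quasi].add(sensitive)
        return all(len(s) >= l for s in groups.values())

    table = [(("30s", "NY"), "flu"), (("30s", "NY"), "cold"),
             (("40s", "CA"), "flu"), (("40s", "CA"), "flu")]
    print(is_k_anonymous(table, 2), is_l_diverse(table, 2))   # True False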
The algebraic framework introduced in [Koutis, Proc. of the 35th ICALP 2008] reduces several combinatorial problems in parameterized complexity to the problem of detecting multilinear degree-k monomials in polynomials presented as circuits. The best known (randomized) algorithm for this problem requires only O*(2^k) time and oracle access to an arithmetic circuit, i.e. the ability to evaluate the circuit on elements from a suitable group algebra. This algorithm has been used to obtain the best known algorithms for several parameterized problems. In this paper we use communication complexity to show that the O*(2^k) algorithm is essentially optimal within this evaluation oracle framework. On the positive side, we give new applications of the method: finding a copy of a given tree on k nodes, a spanning tree with at least k leaves, a minimum set of nodes that dominate at least t nodes, and an m-dimensional k-matching. In each case we achieve a faster algorithm than what was known. We also apply the algebraic method to problems in exact counting. Among other results, we show that a combination of dynamic programming and a variation of the algebraic method can break the trivial upper bounds for exact parameterized counting in fairly general settings.
Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2004
The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science - ITCS '15, 2015
We revisit a natural zero-sum game from several prior works. A circuit player, armed with a collection of Boolean circuits, wants to compute a function f with one (or some) of its circuits. An input player has a collection of inputs, and wants to find one (or some) inputs on which the circuit player cannot compute f. Several results are known on the existence of small-support strategies for zero-sum games, in particular the above circuit-input game. We give two new applications of these classical results to circuit complexity: Natural properties useful against self-checking circuits are equivalent to circuit lower bounds. We show how the Natural Proofs barrier may be potentially sidestepped, by simply focusing on analyzing circuits that check their answers. Slightly more precisely, we prove NP ⊄ P/poly if and only if there are natural properties that (a) accept the SAT function and (b) are useful against polynomial-size circuits that never err when they report SAT. (Note that, via self-reducibility, any small circuit can be turned into one of this kind!) The proof is very general; similar equivalences hold for other lower bound problems. Our message is that one should search for lower bound methods that are designed to succeed (only) against circuits with "one-sided error." Circuit Complexity versus Testing Circuits With Data. We reconsider the problem of program testing, which we formalize as deciding if a given circuit computes a (fixed) function f. We define the "data complexity" of f (as a function of circuit size s) to be the minimum cardinality of a test suite of inputs: a set of input/output pairs necessary and sufficient for deciding if any given circuit of size at most s computes a slice of f. (This is a "gray-box testing" problem, where the value s is side information.) We prove that designing small test suites for f is equivalent to proving circuit lower bounds on f: the data complexity of testing f is "small" if and only if the circuit complexity of f is "large." Therefore, circuit lower bounds may be constructively viewed as data-design problems for circuit testing.
We present a new way to encode weighted sums into unweighted pairwise constraints, obtaining the following results. • Define the k-SUM problem to be: given n integers in [−n^{2k}, n^{2k}], are there k of them which sum to zero? (It is well known that the same problem over arbitrary integers is equivalent to the above definition, by linear-time randomized reductions.) We prove that this definition of k-SUM remains W[1]-hard, and is in fact W[1]-complete: k-SUM can be reduced to f(k) · n^{o(1)} instances of k-Clique. • The maximum node-weighted k-Clique and node-weighted k-dominating set problems can be reduced to n^{o(1)} instances of the unweighted k-Clique and k-dominating set problems, respectively. This implies a strong equivalence between the time complexities of the node-weighted problems and the unweighted problems: any polynomial improvement on one would imply an improvement for the other. • A triangle of weight 0 in a node-weighted graph with m edges can be deterministically found in m^{1.41} time.
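For concreteness, here is the textbook O(n^k) exhaustive check for the k-SUM problem as defined above. It only illustrates the problem statement and is far from the reductions discussed in the abstract.

    from itertools import combinations

    def k_sum(nums, k):
        # Return k entries (at distinct positions) summing to zero, or None.
        # Brute force: O(n^k) time.
        for combo in combinations(nums, k):
            if sum(combo) == 0:
                return combo
        return None

    print(k_sum([8, -3, 5, -10, 2], 3))   # (8, -10, 2)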
Proceedings of the 4th conference on Innovations in Theoretical Computer Science, 2013
We consider a model of teaching in which the learners are consistent and have bounded state, but are otherwise arbitrary. The teacher is non-interactive and "massively open": the teacher broadcasts a sequence of examples of an arbitrary target concept, intended for every possible on-line learning algorithm to learn from. We focus on the problem of designing interesting teachers: efficient sequences of examples that lead all capable and consistent learners to learn concepts, regardless of the underlying algorithm used by the learner. We use two measures of teaching efficiency: the number of mistakes made by the worst-case learner, and the maximum length of the example sequence needed for the worst-case learner. Our results are summarized as follows: • Given a uniform random sequence of examples of an n-bit concept function, learners (capable of consistently learning the concept) with s(n) bits of state are guaranteed to make only O(n · s(n)) mistakes and exactly learn the concept, with high probability. This theorem has interesting corollaries; for instance, every concept c has a sequence of examples that can teach c to all capable consistent on-line learners implementable with s(n)-size circuits, such that every learner makes only Õ(s(n)^2) mistakes. That is, all resource-bounded algorithms capable of consistently learning a concept can be simultaneously taught that concept with few mistakes, on a single example sequence. We also show how to efficiently generate such a sequence of examples on-line: using Nisan's pseudorandom generator, each example in the sequence can be generated with polynomial-time overhead per example, with an O(n · s(n))-bit initial seed.
A fertile area of recent research has demonstrated concrete polynomial-time lower bounds for natural hard problems on restricted computational models. Among these problems are Satisfiability, Vertex Cover, Hamilton Path, MOD6-SAT, Majority-of-Majority-SAT, and Tautologies, to name a few. The proofs of these lower bounds follow a proof-by-contradiction strategy that we call resource trading or alternation trading. An important open problem is to determine how powerful such proofs can possibly be. We propose a methodology for studying these proofs that makes them amenable to both formal analysis and automated theorem proving. We prove that the search for better lower bounds can often be turned into a problem of solving a large series of linear programming instances. Implementing a small-scale theorem prover based on these results, we extract new human-readable time lower bounds for several problems and identify patterns that allow for further generalization. The framework can also ...
2009 24th Annual IEEE Conference on Computational Complexity, 2009
In 1982, Kannan showed that Σ_2^P does not have n^k-sized circuits for any k. Do smaller classes also admit such circuit lower bounds? Despite several improvements of Kannan's result, we still cannot prove that P^NP does not have linear size circuits. Work of Aaronson and Wigderson provides strong evidence (the "algebrization" barrier) that current techniques have inherent limitations in this respect. We explore questions about fixed-polynomial size circuit lower bounds around and beyond the algebrization barrier. We find several connections, including ...
2010 IEEE 25th Annual Conference on Computational Complexity, 2010
We consider two natural extensions of the communication complexity model that are inspired by distributed computing. In both models, two parties are equipped with synchronized discrete clocks, and we assume that a bit can be sent from one party to another in one step of time. Both models allow implicit communication, by allowing the parties to choose whether to send a bit during each step. We examine trade-offs between time (total number of possible time steps elapsed) and communication (total number of bits actually sent). In the synchronized bit model, we measure the total number of bits sent between the two parties (e.g., email). We show that, in this model, communication costs can differ from the usual communication complexity by a factor roughly logarithmic in the number of time steps, and no more than such a factor. In the synchronized connection model, both parties choose whether or not to open their end of the communication channel at each time step. An exchange of bits takes place only when both ends of the channel are open (e.g., instant messaging), in which case we say that a connection has occurred. If a party does not open its end, it does not learn whether the other party opened its channel. When we restrict the number of time steps to be polynomial in the input length, and the number of connections to be polylogarithmic in the input length, the class of problems solved with this model turns out to be roughly equivalent to the communication complexity analogue of P^NP ([BFS86]). Using our new model, we give what we believe to be the first lower bounds for this class, separating P^NP from Σ_2 ∩ Π_2 in the communication complexity setting. Although these models are both quite natural, they have unexpected power, and lead to a refinement of problem classifications in communication complexity.
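The logarithmic gap between the synchronized bit model and ordinary communication complexity comes from the fact that the time step at which a single bit is sent can itself convey about log T bits of information. A minimal sketch of that encoding idea (our own illustration, not taken from the paper) follows.

    def send_via_timing(value, T):
        # Encode an integer value in [0, T) by staying silent until step `value`
        # and sending a single 1-bit there: one transmitted bit conveys ~log2(T)
        # bits of information via its timing.
        return [1 if t == value else None for t in range(T)]  # None = silent step

    def receive_via_timing(schedule):
        # The receiver recovers the value from the index of the unique sent bit.
        return next(t for t, b in enumerate(schedule) if b is not None)

    T = 16
    print(receive_via_timing(send_via_timing(11, T)))   # 11, one bit over 16 steps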
2009 50th Annual IEEE Symposium on Foundations of Computer Science, 2009
We present new combinatorial algorithms for Boolean matrix multiplication (BMM) and preprocessing a graph to answer independent set queries. We give the first asymptotic improvements on combinatorial algorithms for dense BMM in many years, improving on the "Four Russians" O(n^3/(w log n)) bound for machine models with wordsize w. (For a pointer machine, we can set w = log n.) The algorithms utilize notions from Regularity Lemmas for graphs in a novel way.
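For orientation, the sketch below shows the standard word-parallel (bitset) Boolean matrix product that underlies the roughly n^3/w-operation baselines mentioned here, using Python integers as bit vectors. The paper's regularity-based algorithm is considerably more involved; this is only the baseline idea.

    def pack_rows(M):
        # Pack each 0/1 row of M into one Python integer, with bit j = column j.
        return [sum(bit << j for j, bit in enumerate(row)) for row in M]

    def bool_matmul(A, B):
        # C[i][j] = OR over k of (A[i][k] AND B[k][j]).  With packed rows of B,
        # the inner loop ORs whole machine words, giving about n^3 / w operations.
        n = len(A)
        Bp = pack_rows(B)
        C = []
        for i in range(n):
            acc = 0
            for k in range(n):
                if A[i][k]:
                    acc |= Bp[k]
            C.append([(acc >> j) & 1 for j in range(n)])
        return C

    A = [[1, 0], [0, 1]]
    B = [[0, 1], [1, 1]]
    print(bool_matmul(A, B))   # [[0, 1], [1, 1]]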
Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014
In low-depth circuit complexity, the polynomial method is a way to prove lower bounds by translating weak circuits into low-degree polynomials, then analyzing properties of these polynomials. Recently, this method found an application to algorithm design: Williams (STOC 2014) used it to compute all-pairs shortest paths in n^3/2^{Ω(√(log n))} time on dense n-node graphs. In this paper, we extend this methodology to solve a number of problems in combinatorial pattern matching and Boolean algebra, considerably faster than previously known methods. First, we give an algorithm for BOOLEAN ORTHOGONAL DETECTION, which is to detect among two sets A, B ⊆ {0, 1}^d of size n if there is an x ∈ A and y ∈ B such that ⟨x, y⟩ = 0. For vectors of dimension d = c(n) log n, we solve BOOLEAN ORTHOGONAL DETECTION in n^{2−1/O(log c(n))} time by a Monte Carlo randomized algorithm. We apply this as a subroutine in several other new algorithms.
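The problem being sped up has a trivial O(n^2 · d) algorithm; a short Python sketch of that baseline (not the paper's polynomial-method algorithm) is given below.

    def has_orthogonal_pair(A, B):
        # True iff some x in A and y in B satisfy <x, y> = 0 over the integers,
        # i.e. the 0/1 vectors share no coordinate that is 1 in both.  O(|A|*|B|*d).
        return any(all(xi * yi == 0 for xi, yi in zip(x, y)) for x in A for y in B)

    A = [(1, 0, 1, 0), (1, 1, 0, 0)]
    B = [(0, 1, 0, 1), (1, 0, 0, 0)]
    print(has_orthogonal_pair(A, B))   # True: (1,0,1,0) is orthogonal to (0,1,0,1)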
2013 IEEE Conference on Computational Complexity, 2013
We explore relationships between circuit complexity, the complexity of generating circuits, and algorithms for analyzing circuits. Our results can be divided into two parts: 1. Lower Bounds Against Medium-Uniform Circuits. Informally, a circuit class is "medium uniform" if it can be generated by an algorithmic process that is somewhat complex (stronger than LOGTIME) but not infeasible. Using a new kind of indirect diagonalization argument, we prove several new unconditional lower bounds against medium-uniform circuit classes, including: • For all k, P is not contained in P-uniform SIZE(n^k). That is, for all k there is a language L_k ∈ P that does not have O(n^k)-size circuits constructible in polynomial time. This improves Kannan's lower bound from 1982 that NP is not in P-uniform SIZE(n^k) for any fixed k. • For all k, NP is not in P^NP_||-uniform SIZE(n^k). This also improves Kannan's theorem, but in a different way: the uniformity condition on the circuits is stronger than that on the language itself. • For all k, LOGSPACE does not have LOGSPACE-uniform branching programs of size n^k. 2. Eliminating Non-Uniformity and (Non-Uniform) Circuit Lower Bounds. We complement these results by showing how to convert any potential simulation of LOGTIME-uniform NC^1 in ACC^0/poly or TC^0/poly into a medium-uniform simulation using small advice. This lemma can be used to simplify the proof that faster SAT algorithms imply NEXP circuit lower bounds, and leads to the following new connection: • Consider the following task: given a TC^0 circuit C of n^{O(1)} size, output yes when C is unsatisfiable, and output no when C has at least 2^{n−2} satisfying assignments. (Behavior on other inputs can be arbitrary.) Clearly, this problem can be solved efficiently using randomness. If this problem can be solved deterministically in 2^{n−ω(log n)} time, then NEXP ⊄ TC^0/poly. The lemma can also be used to derandomize randomized TC^0 simulations of NC^1 on almost all inputs: • Suppose NC^1 ⊆ BPTC^0. Then for every ε > 0 and every language L in NC^1, there is a (uniform) TC^0 circuit family of polynomial size recognizing a language L′ such that L and L′ differ on at most 2^{n^ε} inputs of length n, for all n.
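The claim that the gap problem above is easy with randomness can be made concrete: sample a few uniform assignments and answer "no" if any of them satisfies the circuit. The Python sketch below uses a generic evaluation callback as a stand-in for a TC^0 circuit; the function and parameter names are ours, not from the paper.

    import random

    def gap_unsat(evaluate, n, trials=64):
        # 'yes' (True) should mean unsatisfiable; 'no' (False) should mean the
        # circuit has >= 2^(n-2) satisfying assignments.  If at least a 1/4
        # fraction of assignments are satisfying, a random sample hits one with
        # probability >= 1 - (3/4)^trials, so errors are exponentially unlikely.
        for _ in range(trials):
            x = [random.randint(0, 1) for _ in range(n)]
            if evaluate(x):
                return False          # found a satisfying assignment
        return True                   # plausibly unsatisfiable

    # Example "circuit": majority of 9 input bits (satisfied by half of all inputs).
    print(gap_unsat(lambda x: sum(x) > len(x) // 2, n=9))   # almost surely False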
I will describe prior and current work on connecting the art of finding good satisfiability algorithms with the art of proving complexity lower bounds: proofs of limitations on what problems can be solved by good algorithms. Surprisingly, even minor algorithmic progress on solving the circuit satisfiability problem faster than exhaustive search can be applied to prove strong circuit complexity lower bounds. These connections have made it possible to prove new complexity lower bounds that had long been conjectured, and they suggest concrete directions for further progress.