2019, ArXiv
Set partitioning is a key component of many algorithms in machine learning, signal processing, and communications. In general, the problem of finding a partition that minimizes a given impurity (loss function) is NP-hard. As such, there exists a wealth of literature on approximate algorithms and theoretical analyses of the partitioning problem under different settings. In this paper, we formulate and solve a variant of the partition problem called the minimum impurity partition under constraint (MIPUC). MIPUC finds an optimal partition that minimizes a given loss function under a given concave constraint. MIPUC generalizes the recently proposed deterministic information bottleneck problem which finds an optimal partition that maximizes the mutual information between the input and partition output while minimizing the partition output entropy. Our proposed algorithm is developed based on a novel optimality condition, which allows us to find a locally optimal solution efficiently. Mor...
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Partitioning algorithms play a key role in machine learning, signal processing, and communications. They are used in many well-known NP-hard problems such as k-means clustering and vector quantization. The goodness of a partition scheme is measured by a given impurity function over the resulting partitions; an optimal partition is one with minimum impurity. Practical algorithms for finding an optimal partition are approximate, heuristic, and often assume certain properties of the given impurity function, such as concavity/convexity. In this paper, we propose a heuristic, efficient (linear-time) algorithm for finding the minimum impurity for a broader class of impurity functions, which includes popular impurities such as the Gini index and entropy. We also make a connection to a well-known result which states that the optimal partitions correspond to regions separated by hyperplane cuts in the probability space of the posterior distribution.
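As a concrete illustration of the impurity objective in this abstract, here is a minimal Python sketch (the function names and class-count representation are illustrative, not from the paper):

```python
import math

def gini(counts):
    """Gini index of one partition cell, given per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy (bits) of one partition cell, given per-class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def weighted_impurity(cells, impurity):
    """Size-weighted impurity of a partition: the quantity being minimized."""
    total = sum(sum(c) for c in cells)
    return sum(sum(c) / total * impurity(c) for c in cells)

# A pure partition (each cell holds a single class) has zero impurity.
assert weighted_impurity([[4, 0], [0, 6]], gini) == 0.0
```

Both impurities are concave in the class proportions, which is exactly the property the partitioning literature above exploits.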
2012 IEEE Information Theory Workshop, 2012
We investigate certain optimization problems for Shannon information measures, namely, minimization of joint and conditional entropies H(X, Y), H(X|Y), H(Y |X), and maximization of mutual information I(X; Y), over convex regions. When restricted to the so-called transportation polytopes (sets of distributions with fixed marginals), very simple proofs of NP-hardness are obtained for these problems because in that case they are all equivalent, and their connection to the well-known SUBSET SUM and PARTITION problems is revealed. The computational intractability of the more general problems over arbitrary polytopes is then a simple consequence. Further, a simple class of polytopes is shown over which the above problems are not equivalent and their complexity differs sharply, namely, minimization of H(X, Y) and H(Y |X) is trivial, while minimization of H(X|Y) and maximization of I(X; Y) are strongly NP-hard problems. Finally, two new (pseudo)metrics on the space of discrete probability distributions are introduced, based on the so-called variation of information quantity, and NP-hardness of their computation is shown.
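The four Shannon measures named above can all be computed directly from a joint pmf; a minimal sketch (the dictionary representation and function names are choices made here for illustration):

```python
import math

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def info_measures(joint):
    """H(X,Y), H(X|Y), H(Y|X), and I(X;Y) from a joint pmf {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    hxy, hx, hy = H(joint.values()), H(px.values()), H(py.values())
    return {"H(X,Y)": hxy, "H(X|Y)": hxy - hy,
            "H(Y|X)": hxy - hx, "I(X;Y)": hx + hy - hxy}
```

Evaluating these quantities at a point is easy; the hardness results above concern optimizing them over polytopes of joint distributions, e.g. with both marginals fixed.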
2019
We view the Information Bottleneck Principle (IBP: Tishby et al., 1999; Schwartz-Ziv and Tishby, 2017) and Predictive Information Bottleneck Principle (PIBP: Still et al., 2007; Alemi, 2019) as special cases of a family of general information bottleneck objectives (IBOs). Each IBO corresponds to a particular constrained optimization problem where the constraints apply to: (a) the mutual information between the training data and the learned model parameters or extracted representation of the data, and (b) the mutual information between the learned model parameters or extracted representation of the data and the test data (if any). The heuristics behind the IBP and PIBP are shown to yield different constraints in the corresponding constrained optimization problem formulations. We show how other heuristics lead to a new IBO, different from both the IBP and PIBP, and use the techniques from (Alemi, 2019) to derive and optimize a variational upper bound on the new IBO. We then apply the ...
Entropy, 2012
Information Bottleneck-based methods use mutual information as a distortion function in order to extract relevant details about the structure of a complex system by compression. One of the approaches used to generate optimal compressed representations is by annealing a parameter. In this manuscript we present a common framework for the study of annealing in information distortion problems. We identify features that should be common to any annealing optimization problem. The main mathematical tools that we use come from the analysis of dynamical systems in the presence of symmetry (equivariant bifurcation theory). Through the compression problem, we make connections to the world of combinatorial optimization and pattern recognition. The two approaches use very different vocabularies and consider different problems to be "interesting". We provide an initial link, through the Normalized Cut Problem, where the two disciplines can exchange tools and ideas.
ArXiv, 2020
An original discrete source X with distribution p_X is corrupted by noise to produce the noisy data Y, with given joint distribution p(X, Y). A quantizer/classifier Q : Y -> Z then classifies/quantizes the data Y to the discrete partitioned output Z with probability distribution p_Z. Next, Z is transmitted over a deterministic channel with a given channel matrix A that produces the final discrete output T. One wants to design the optimal quantizer/classifier Q^* such that a cost function F(X; T) between the input X and the final output T is minimized while the distribution of the partitioned output Z satisfies a concave constraint G(p_Z) < C. Our results generalize several well-known previous results. First, an iterative algorithm with linear time complexity is proposed to find a locally optimal quantizer. Second, we show that the optimal partition should produce a hard partition that is equivalent to the cuts by hyper-planes in the probability space of the pos...
PLOS ONE
In analysis of multi-component complex systems, such as neural systems, identifying groups of units that share similar functionality aids understanding of the underlying structure of the system. To find such a grouping, it is useful to evaluate to what extent the units of the system are separable. Separability or inseparability can be evaluated by quantifying how much information would be lost if the system were partitioned into subsystems and the interactions between the subsystems were hypothetically removed. A system of two independent subsystems is completely separable without any loss of information, while a system of strongly interacting subsystems cannot be separated without a large loss of information. Among all possible partitions of a system, the partition that minimizes the loss of information, called the Minimum Information Partition (MIP), can be considered the optimal partition for characterizing the underlying structure of the system. Although the MIP would reveal novel characteristics of the neural system, an exhaustive search for the MIP is numerically intractable due to the combinatorial explosion of possible partitions. Here, we propose a computationally efficient search that precisely identifies the MIP among all possible partitions when the measure of information loss is submodular. Submodularity is a mathematical property of set functions that is analogous to convexity of continuous functions. Mutual information is one such submodular information-loss function, and is a natural choice for measuring the degree of statistical dependence between paired sets of random variables. Using mutual information as the loss function, we show that the search for the MIP can be performed in a practical amount of computational time for a reasonably large system (N = 100 ∼ 1000).
We also demonstrate that the MIP search allows for the detection of underlying global structures in a network of nonlinear oscillators.
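For intuition, the MIP objective with a mutual-information loss can be brute-forced on a tiny system. The submodularity-based search in the paper is far more efficient; this exhaustive sketch (with names of my own choosing) only illustrates what is being minimized:

```python
import itertools
import math
from collections import defaultdict

def mutual_information(joint, part):
    """I between the units indexed by `part` and the remaining units,
    from a joint pmf over state tuples: {state_tuple: prob}."""
    n = len(next(iter(joint)))
    rest = [i for i in range(n) if i not in part]
    pa, pb = defaultdict(float), defaultdict(float)
    for s, p in joint.items():
        pa[tuple(s[i] for i in part)] += p
        pb[tuple(s[i] for i in rest)] += p
    mi = 0.0
    for s, p in joint.items():
        if p > 0:
            a = tuple(s[i] for i in part)
            b = tuple(s[i] for i in rest)
            mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi

def minimum_information_bipartition(joint):
    """Exhaustive MIP search over bipartitions (feasible only for tiny N)."""
    n = len(next(iter(joint)))
    return min((mutual_information(joint, part), part)
               for r in range(1, n // 2 + 1)
               for part in itertools.combinations(range(n), r))
```

For example, in a three-unit system where units 0 and 1 are perfect copies and unit 2 is independent, the MIP cuts unit 2 away at zero information loss.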
IEEE Transactions on Information Theory, 2000
Electronic Notes in Theoretical Computer Science, 2019
The Gini impurity is a very popular criterion to select attributes during decision trees construction. In the problem of finding a partition with minimum weighted Gini impurity (PMWGP), the one faced during the construction of decision trees, a set of vectors must be partitioned into k different clusters such that the partition's overall Gini impurity is minimized. We show that PMWGP is APX-hard for arbitrary k and admits a randomized PTAS when the number of clusters is fixed. These results significantly improve the current knowledge on the problem. The key idea to obtain these results is to explore connections between PMWGP and the geometric k-means clustering problem.
IEEE Transactions on Information Theory, 2012
A new histogram-based mutual information estimator using data-driven tree-structured partitions (TSP) is presented in this paper. The derived TSP is a solution to a complexity-regularized empirical information maximization, with the objective of finding a good tradeoff between the known estimation and approximation errors. A distribution-free concentration inequality for this tree-structured learning problem, as well as finite-sample performance bounds for the proposed histogram-based solution, are derived. It is shown that this solution is density-free strongly consistent and that it provides, with arbitrarily high probability, an optimal balance between the mentioned estimation and approximation errors. Finally, for the emblematic scenario of independence, I(X; Y) = 0, it is shown that the TSP estimate converges to zero.
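A fixed-grid plug-in histogram estimator is a simpler cousin of the adaptive TSP described above; a minimal sketch (the equal-width grid and function names are assumptions for illustration, not the paper's construction):

```python
import math
from collections import Counter

def histogram_mi(xs, ys, bins=8, lo=0.0, hi=1.0):
    """Plug-in MI estimate (bits) on a fixed equal-width grid over [lo, hi).
    The TSP estimator instead adapts the partition to the data."""
    n = len(xs)
    w = (hi - lo) / bins
    cell = lambda v: min(int((v - lo) / w), bins - 1)
    # Joint and marginal cell counts.
    nxy = Counter((cell(x), cell(y)) for x, y in zip(xs, ys))
    nx, ny = Counter(), Counter()
    for (i, j), c in nxy.items():
        nx[i] += c
        ny[j] += c
    # Plug-in estimate: sum of p_hat * log(p_hat / (p_hat_x * p_hat_y)).
    return sum(c / n * math.log2(c * n / (nx[i] * ny[j]))
               for (i, j), c in nxy.items())
```

Fixing the grid trades the adaptivity (and the performance guarantees) of the TSP for simplicity.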
2021 IEEE International Symposium on Information Theory (ISIT)
We generalize the information bottleneck (IB) and privacy funnel (PF) problems by introducing the notion of a sensitive attribute, which arises in a growing number of applications. In this generalization, we seek to construct representations of observations that are maximally (or minimally) informative about a target variable, while also satisfying constraints with respect to a variable corresponding to the sensitive attribute. In the Gaussian and discrete settings, we show that by suitably approximating the Kullback-Leibler (KL) divergence defining traditional Shannon mutual information, the generalized IB and PF problems can be formulated as semi-definite programs (SDPs), and thus efficiently solved, which is important in applications of high-dimensional inference. We validate our algorithms on synthetic data and demonstrate their use in imposing fairness in machine learning on real data as an illustrative application.
2017 International ITG Conference on Systems, Communications and Coding (SCC), 2017
Lossy data compression has been studied under the celebrated rate-distortion theory, which provides the compression rate needed to quantize a signal without exceeding a given distortion measure. Recently, with the information bottleneck, an alternative approach has emerged in the field of machine learning. The fundamental idea is to include the original source in the problem setup when quantizing an observation variable, and to use strictly information-theoretic measures to design the quantizer. This paper provides insight into this framework, discusses corresponding algorithms and their performance, and presents a new algorithmic approach of low complexity.
2010
We offer a fresh perspective on solving the set covering problem to near optimality with off-the-shelf methods. We formulate minimizing the gap of a generic primal-dual heuristic for the set covering problem as an integer program and analyze its performance. The empirical insights from this analysis lead to a simple and powerful primal-dual approach for solving the set covering problem to near optimality with a state-of-the-art standard solver.
Entropy
The information bottleneck (IB) framework, proposed in [...]
ArXiv, 2017
The privacy-utility tradeoff problem is formulated as determining the privacy mechanism (random mapping) that minimizes the mutual information (a metric for privacy leakage) between the private features of the original dataset and a released version. The minimization is studied with two types of constraints on the distortion between the public features and the released version of the dataset: (i) subject to a constraint on the expected value of a cost function $f$ applied to the distortion, and (ii) subject to bounding the complementary CDF of the distortion by a non-increasing function $g$. The first scenario captures various practical cost functions for distorted released data, while the second scenario covers large deviation constraints on utility. The asymptotic optimal leakage is derived in both scenarios. For the distortion cost constraint, it is shown that for convex cost functions there is no asymptotic loss in using stationary memoryless mechanisms. For the complementary CD...
Journal of Mathematical Chemistry, 2005
A methodology, derived by analogy to Shannon's information-theoretic theory of communication and utilizing the concept of mutual information, has been developed to characterize partitioned property spaces. A family of non-intersecting subsets that cover the "universe" of objects represents a partitioned property space; each subset is thus an equivalence class. A partition and its associated equivalence classes can be generated using any one of a number of procedures, including hierarchical and non-hierarchical clustering, direct approaches using rough-set methods, and cell-based partitioning, to name a few. Thus, partitioned property spaces arise in many instances and represent a very large class of problems. The approach is based on set-valued mappings from equivalence classes in one partition to those in another and provides a coarse-grained means for comparing property spaces. From these mappings it is possible to compute a number of Shannon entropies that afford calculation of mutual information, which represents the amount of information shared by two partitions of a set of objects. Taking the ratio of the mutual information to the maximum possible mutual information yields a quantity that measures the similarity of the two partitions. While the focus in this work is directed towards small sets of objects, the approach can be extended to many more classes of problems that can be put into a similar form, including many types of cheminformatic and biological problems. A number of scenarios are presented that illustrate the concept and indicate the broader class of problems that can be handled by this method.
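The similarity ratio described above can be sketched for two labelings of the same objects. The normalization by min(H(A), H(B)) used here is one common choice for the "maximum possible mutual information" and is an assumption, not necessarily the paper's exact definition:

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy (bits) from class counts over n objects."""
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def partition_similarity(a, b):
    """I(A;B) / min(H(A), H(B)) for two labelings of the same objects."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log2(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    m = min(entropy(ca.values(), n), entropy(cb.values(), n))
    return mi / m if m > 0 else 1.0
```

Identical partitions score 1, and statistically independent partitions score 0, regardless of how the equivalence classes are named.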
Given a set S of real-valued members, each associated with one of two possible types, a multi-partitioning of S is a sequence of the members of S such that if x, y ∈ S have different types and x < y, then x precedes y in the sequence. We give two distribution-sensitive algorithms for the set multi-partitioning problem and a matching lower bound in the algebraic decision-tree model. One of the two algorithms can be made stable and can be implemented in place. We also give an output-sensitive algorithm for the problem.
This paper considers the arbitrary-proportional finite-set-partitioning problem, which involves partitioning a finite set of size N into T subsets with respect to arbitrary nonnegative proportions p_t, t = 1, 2, ..., T, where N and T are positive integers. This is at the core of many fundamental problems, such as determining quotas for individuals of different weights, or sampling from a discrete-valued weighted sample set to obtain a new identically distributed but non-weighted sample set (e.g., the resampling needed in the particle filter). The challenge arises because the size n_t of each subset must be an integer while the unbiased expectation N·p_t often is not, given that ∑ n_t = N and ∑ p_t = 1. To solve this problem, a metric (cost function) is defined on the discrepancies, and a solution is correspondingly proposed to determine the sizes of the subsets with minimal cost. A theoretical proof and simulation demonstrations of the optimality of the scheme, in the sense of the proposed metric, are provided.
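One standard scheme for choosing integer subset sizes close to the target proportions is the largest-remainder rule, sketched below; the paper's metric-optimal solution may differ in detail, so this is an illustration of the problem rather than its method:

```python
def apportion(n, proportions):
    """Largest-remainder rule: integer sizes summing to n, each close to
    the (generally non-integer) quota n * p_i."""
    quotas = [n * p for p in proportions]
    sizes = [int(q) for q in quotas]  # start from the floors
    leftover = n - sum(sizes)
    # Hand the remaining units to the subsets with the largest fractional parts.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes
```

For example, splitting 7 items in proportions (1/3, 1/3, 1/3) necessarily gives unequal integer sizes (2, 2, 3) while keeping the total exact.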
Journal of Physics A: Mathematical and Theoretical
The partial information decomposition (PID) is perhaps the leading proposal for resolving information shared between a set of sources and a target into redundant, synergistic, and unique constituents. Unfortunately, the PID framework has been hindered by a lack of a generally agreed-upon, multivariate method of quantifying the constituents. Here, we take a step toward rectifying this by developing a decomposition based on a new method that quantifies unique information. We first develop a broadly applicable method, the dependency decomposition, that delineates how statistical dependencies influence the structure of a joint distribution. The dependency decomposition then allows us to define a measure of the information about a target that can be uniquely attributed to a particular source as the least amount by which the source-target statistical dependency can influence the information shared between the sources and the target. The result is the first measure that satisfies the core axioms of the PID framework while not satisfying the Blackwell relation, which depends on a particular interpretation of how the variables are related. This is a key step toward a practical PID.
Physical Review E, 2020
Complex systems often exhibit multiple levels of organization covering a wide range of physical scales, so it is frequently convenient to study the hierarchical decomposition of their structure and function. To better understand this phenomenon, we introduce a generalization of information theory that works with hierarchical partitions. We begin by revisiting the recently introduced Hierarchical Mutual Information (HMI) and show that it can be written as a level-by-level summation of classical conditional mutual information terms. We then prove that the HMI is bounded from above by the corresponding hierarchical joint entropy. In this way, in analogy to the classical case, we derive hierarchical generalizations of many other classical information-theoretic quantities. In particular, we prove that, as opposed to its classical counterpart, the hierarchical generalization of the Variation of Information is not a metric distance, although it admits a transformation into one. Moreover, focusing on potential applications of the theory, we show how to adjust the HMI for chance. We also corroborate and analyze all the presented theoretical results with exhaustive numerical computations, and include an illustrative application example of the introduced formalism. Finally, we mention some open problems that should eventually be addressed for the proposed generalization of information theory to reach maturity.
For n ∈ ℕ, we consider the problem of partitioning the interval [0, n) into k subintervals of positive integer lengths ℓ_1, ..., ℓ_k such that the lengths satisfy a set of simple constraints of the form ℓ_i R_ij ℓ_j, where R_ij is one of <, >, or =. In the full information case, R_ij is given for all 1 ≤ i, j ≤ k. In the sequential information case, R_ij is given for all 1 ≤ i ≤ k and j = i ± 1. That is, only the relations between the lengths of consecutive intervals are specified. The cyclic information case is an extension of the sequential information case in which the relation R_1k between ℓ_1 and ℓ_k is also given. We show that all three versions of the problem can be solved in time polynomial in k and log n.