2019, ArXiv
Set partitioning is a key component of many algorithms in machine learning, signal processing, and communications. In general, the problem of finding a partition that minimizes a given impurity (loss function) is NP-hard. As such, there exists a wealth of literature on approximate algorithms and theoretical analyses of the partitioning problem under different settings. In this paper, we formulate and solve a variant of the partition problem called the minimum impurity partition under constraint (MIPUC). MIPUC finds an optimal partition that minimizes a given loss function under a given concave constraint. MIPUC generalizes the recently proposed deterministic information bottleneck problem which finds an optimal partition that maximizes the mutual information between the input and partition output while minimizing the partition output entropy. Our proposed algorithm is developed based on a novel optimality condition, which allows us to find a locally optimal solution efficiently. Mor...
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Partitioning algorithms play a key role in machine learning, signal processing, and communications. They are used in many well-known NP-hard problems such as k-means clustering and vector quantization. The goodness of a partition scheme is measured by a given impurity function over the resulting partitions; an optimal partition is one with minimum impurity. Practical algorithms for finding an optimal partition are approximate, heuristic, and often assume certain properties of the given impurity function, such as concavity/convexity. In this paper, we propose a heuristic, efficient (linear-time) algorithm for finding the minimum impurity for a broader class of impurity functions, which includes popular impurities such as the Gini index and entropy. We also make a connection to a well-known result which states that the optimal partitions correspond to regions separated by hyperplane cuts in the probability space of the posterior distribution.
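As a concrete illustration of the impurity objective in this abstract, here is a minimal Python sketch (the function names and class-count representation are illustrative, not from the paper):

```python
import math

def gini(counts):
    """Gini index of one partition cell, given per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy (bits) of one partition cell, given per-class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def weighted_impurity(cells, impurity):
    """Size-weighted impurity of a partition: the quantity being minimized."""
    total = sum(sum(c) for c in cells)
    return sum(sum(c) / total * impurity(c) for c in cells)

# A pure partition (each cell holds a single class) has zero impurity.
assert weighted_impurity([[4, 0], [0, 6]], gini) == 0.0
```

Both impurities are concave in the class proportions, which is exactly the property the partitioning literature above exploits.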
2012 IEEE Information Theory Workshop, 2012
We investigate certain optimization problems for Shannon information measures, namely, minimization of joint and conditional entropies H(X, Y), H(X|Y), H(Y |X), and maximization of mutual information I(X; Y), over convex regions. When restricted to the so-called transportation polytopes (sets of distributions with fixed marginals), very simple proofs of NP-hardness are obtained for these problems because in that case they are all equivalent, and their connection to the well-known SUBSET SUM and PARTITION problems is revealed. The computational intractability of the more general problems over arbitrary polytopes is then a simple consequence. Further, a simple class of polytopes is shown over which the above problems are not equivalent and their complexity differs sharply, namely, minimization of H(X, Y) and H(Y |X) is trivial, while minimization of H(X|Y) and maximization of I(X; Y) are strongly NP-hard problems. Finally, two new (pseudo)metrics on the space of discrete probability distributions are introduced, based on the so-called variation of information quantity, and NP-hardness of their computation is shown.
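The four Shannon measures named above can all be computed directly from a joint pmf; a minimal sketch (the dictionary representation and function names are choices made here for illustration):

```python
import math

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def info_measures(joint):
    """H(X,Y), H(X|Y), H(Y|X), and I(X;Y) from a joint pmf {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    hxy, hx, hy = H(joint.values()), H(px.values()), H(py.values())
    return {"H(X,Y)": hxy, "H(X|Y)": hxy - hy,
            "H(Y|X)": hxy - hx, "I(X;Y)": hx + hy - hxy}
```

Evaluating these quantities at a point is easy; the hardness results above concern optimizing them over polytopes of joint distributions, e.g. with both marginals fixed.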
2019
We view the Information Bottleneck Principle (IBP: Tishby et al., 1999; Schwartz-Ziv and Tishby, 2017) and Predictive Information Bottleneck Principle (PIBP: Still et al., 2007; Alemi, 2019) as special cases of a family of general information bottleneck objectives (IBOs). Each IBO corresponds to a particular constrained optimization problem where the constraints apply to: (a) the mutual information between the training data and the learned model parameters or extracted representation of the data, and (b) the mutual information between the learned model parameters or extracted representation of the data and the test data (if any). The heuristics behind the IBP and PIBP are shown to yield different constraints in the corresponding constrained optimization problem formulations. We show how other heuristics lead to a new IBO, different from both the IBP and PIBP, and use the techniques from (Alemi, 2019) to derive and optimize a variational upper bound on the new IBO. We then apply the ...
Entropy, 2012
Information Bottleneck-based methods use mutual information as a distortion function in order to extract relevant details about the structure of a complex system by compression. One of the approaches used to generate optimal compressed representations is by annealing a parameter. In this manuscript we present a common framework for the study of annealing in information distortion problems. We identify features that should be common to any annealing optimization problem. The main mathematical tools that we use come from the analysis of dynamical systems in the presence of symmetry (equivariant bifurcation theory). Through the compression problem, we make connections to the world of combinatorial optimization and pattern recognition. The two approaches use very different vocabularies and consider different problems to be "interesting". We provide an initial link, through the Normalized Cut Problem, where the two disciplines can exchange tools and ideas.
ArXiv, 2020
An original discrete source X with distribution p_X is corrupted by noise to produce the noisy data Y, with given joint distribution p(X, Y). A quantizer/classifier Q : Y -> Z then classifies/quantizes the data Y to the discrete partitioned output Z with probability distribution p_Z. Next, Z is transmitted over a deterministic channel with a given channel matrix A that produces the final discrete output T. One wants to design the optimal quantizer/classifier Q^* such that a cost function F(X; T) between the input X and the final output T is minimized while the distribution of the partitioned output Z satisfies a concave constraint G(p_Z) < C. Our results generalize several well-known previous results. First, an iterative algorithm with linear time complexity is proposed to find a locally optimal quantizer. Second, we show that the optimal partition should produce a hard partition that is equivalent to the cuts by hyper-planes in the probability space of the pos...
PLOS ONE
In analysis of multi-component complex systems, such as neural systems, identifying groups of units that share similar functionality aids understanding of the underlying structure of the system. To find such a grouping, it is useful to evaluate to what extent the units of the system are separable. Separability or inseparability can be evaluated by quantifying how much information would be lost if the system were partitioned into subsystems and the interactions between the subsystems were hypothetically removed. A system of two independent subsystems is completely separable without any loss of information, while a system of strongly interacting subsystems cannot be separated without a large loss of information. Among all possible partitions of a system, the partition that minimizes the loss of information, called the Minimum Information Partition (MIP), can be considered the optimal partition for characterizing the underlying structure of the system. Although the MIP would reveal novel characteristics of the neural system, an exhaustive search for the MIP is numerically intractable due to the combinatorial explosion of possible partitions. Here, we propose a computationally efficient search that precisely identifies the MIP among all possible partitions when the measure of information loss is submodular. Submodularity is a mathematical property of set functions that is analogous to convexity of continuous functions. Mutual information is one such submodular information-loss function, and is a natural choice for measuring the degree of statistical dependence between paired sets of random variables. Using mutual information as the loss function, we show that the search for the MIP can be performed in a practical amount of computational time for a reasonably large system (N = 100 ∼ 1000).
We also demonstrate that the MIP search allows for the detection of underlying global structures in a network of nonlinear oscillators.
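For intuition, the MIP objective with a mutual-information loss can be brute-forced on a tiny system. The submodularity-based search in the paper is far more efficient; this exhaustive sketch (with names of my own choosing) only illustrates what is being minimized:

```python
import itertools
import math
from collections import defaultdict

def mutual_information(joint, part):
    """I between the units indexed by `part` and the remaining units,
    from a joint pmf over state tuples: {state_tuple: prob}."""
    n = len(next(iter(joint)))
    rest = [i for i in range(n) if i not in part]
    pa, pb = defaultdict(float), defaultdict(float)
    for s, p in joint.items():
        pa[tuple(s[i] for i in part)] += p
        pb[tuple(s[i] for i in rest)] += p
    mi = 0.0
    for s, p in joint.items():
        if p > 0:
            a = tuple(s[i] for i in part)
            b = tuple(s[i] for i in rest)
            mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi

def minimum_information_bipartition(joint):
    """Exhaustive MIP search over bipartitions (feasible only for tiny N)."""
    n = len(next(iter(joint)))
    return min((mutual_information(joint, part), part)
               for r in range(1, n // 2 + 1)
               for part in itertools.combinations(range(n), r))
```

For example, in a three-unit system where units 0 and 1 are perfect copies and unit 2 is independent, the MIP cuts unit 2 away at zero information loss.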
IEEE Transactions on Information Theory, 2000
Electronic Notes in Theoretical Computer Science, 2019
The Gini impurity is a very popular criterion to select attributes during decision trees construction. In the problem of finding a partition with minimum weighted Gini impurity (PMWGP), the one faced during the construction of decision trees, a set of vectors must be partitioned into k different clusters such that the partition's overall Gini impurity is minimized. We show that PMWGP is APX-hard for arbitrary k and admits a randomized PTAS when the number of clusters is fixed. These results significantly improve the current knowledge on the problem. The key idea to obtain these results is to explore connections between PMWGP and the geometric k-means clustering problem.
IEEE Transactions on Information Theory, 2012
A new histogram-based mutual information estimator using data-driven tree-structured partitions (TSP) is presented in this paper. The derived TSP is a solution to a complexity-regularized empirical information maximization, with the objective of finding a good tradeoff between the known estimation and approximation errors. A distribution-free concentration inequality for this tree-structured learning problem, as well as finite-sample performance bounds for the proposed histogram-based solution, are derived. It is shown that this solution is density-free strongly consistent and that it provides, with arbitrarily high probability, an optimal balance between the mentioned estimation and approximation errors. Finally, for the emblematic scenario of independence, I(X; Y) = 0, it is shown that the TSP estimate converges to zero.
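A fixed-grid plug-in histogram estimator is a simpler cousin of the adaptive TSP described above; a minimal sketch (the equal-width grid and function names are assumptions for illustration, not the paper's construction):

```python
import math
from collections import Counter

def histogram_mi(xs, ys, bins=8, lo=0.0, hi=1.0):
    """Plug-in MI estimate (bits) on a fixed equal-width grid over [lo, hi).
    The TSP estimator instead adapts the partition to the data."""
    n = len(xs)
    w = (hi - lo) / bins
    cell = lambda v: min(int((v - lo) / w), bins - 1)
    # Joint and marginal cell counts.
    nxy = Counter((cell(x), cell(y)) for x, y in zip(xs, ys))
    nx, ny = Counter(), Counter()
    for (i, j), c in nxy.items():
        nx[i] += c
        ny[j] += c
    # Plug-in estimate: sum of p_hat * log(p_hat / (p_hat_x * p_hat_y)).
    return sum(c / n * math.log2(c * n / (nx[i] * ny[j]))
               for (i, j), c in nxy.items())
```

Fixing the grid trades the adaptivity (and the performance guarantees) of the TSP for simplicity.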
2021 IEEE International Symposium on Information Theory (ISIT)
We generalize the information bottleneck (IB) and privacy funnel (PF) problems by introducing the notion of a sensitive attribute, which arises in a growing number of applications. In this generalization, we seek to construct representations of observations that are maximally (or minimally) informative about a target variable, while also satisfying constraints with respect to a variable corresponding to the sensitive attribute. In the Gaussian and discrete settings, we show that by suitably approximating the Kullback-Leibler (KL) divergence defining traditional Shannon mutual information, the generalized IB and PF problems can be formulated as semi-definite programs (SDPs), and thus efficiently solved, which is important in applications of high-dimensional inference. We validate our algorithms on synthetic data and demonstrate their use in imposing fairness in machine learning on real data as an illustrative application.
2017 International ITG Conference on Systems, Communications and Coding (SCC), 2017
Lossy data compression has been studied under the celebrated rate-distortion theory, which provides the compression rate needed to quantize a signal without exceeding a given distortion measure. Recently, with the information bottleneck, an alternative approach has emerged in the field of machine learning. The fundamental idea is to include the original source in the problem setup when quantizing an observation variable, and to use strictly information-theoretic measures to design the quantizer. This paper provides insight into this framework, discusses corresponding algorithms and their performance, and presents a new algorithmic approach of low complexity.
2010
We offer a fresh perspective on solving the set covering problem to near optimality with off-the-shelf methods. We formulate minimizing the gap of a generic primal-dual heuristic for the set covering problem as an integer program and analyze its performance. The empirical insights from this analysis lead to a simple and powerful primal-dual approach for solving the set covering problem to near optimality with a state-of-the-art standard solver.
Entropy
The information bottleneck (IB) framework, proposed in [...]
ArXiv, 2017
The privacy-utility tradeoff problem is formulated as determining the privacy mechanism (random mapping) that minimizes the mutual information (a metric for privacy leakage) between the private features of the original dataset and a released version. The minimization is studied with two types of constraints on the distortion between the public features and the released version of the dataset: (i) subject to a constraint on the expected value of a cost function $f$ applied to the distortion, and (ii) subject to bounding the complementary CDF of the distortion by a non-increasing function $g$. The first scenario captures various practical cost functions for distorted released data, while the second scenario covers large deviation constraints on utility. The asymptotic optimal leakage is derived in both scenarios. For the distortion cost constraint, it is shown that for convex cost functions there is no asymptotic loss in using stationary memoryless mechanisms. For the complementary CD...
Journal of Mathematical Chemistry, 2005
A methodology, derived by analogy to Shannon's information-theoretic theory of communication and utilizing the concept of mutual information, has been developed to characterize partitioned property spaces. A family of non-intersecting subsets that cover the "universe" of objects represents a partitioned property space; each subset is thus an equivalence class. A partition and its associated equivalence classes can be generated using any one of a number of procedures, including hierarchical and non-hierarchical clustering, direct approaches using rough-set methods, and cell-based partitioning, to name a few. Thus, partitioned property spaces arise in many instances and represent a very large class of problems. The approach is based on set-valued mappings from equivalence classes in one partition to those in another and provides a coarse-grained means for comparing property spaces. From these mappings it is possible to compute a number of Shannon entropies that afford calculation of mutual information, which represents the amount of information shared by two partitions of a set of objects. Taking the ratio of the mutual information to the maximum possible mutual information yields a quantity that measures the similarity of the two partitions. While the focus in this work is directed towards small sets of objects, the approach can be extended to many more classes of problems that can be put into a similar form, including many types of cheminformatic and biological problems. A number of scenarios are presented that illustrate the concept and indicate the broader class of problems that can be handled by this method.
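The similarity ratio described above can be sketched for two labelings of the same objects. The normalization by min(H(A), H(B)) used here is one common choice for the "maximum possible mutual information" and is an assumption, not necessarily the paper's exact definition:

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy (bits) from class counts over n objects."""
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def partition_similarity(a, b):
    """I(A;B) / min(H(A), H(B)) for two labelings of the same objects."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log2(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    m = min(entropy(ca.values(), n), entropy(cb.values(), n))
    return mi / m if m > 0 else 1.0
```

Identical partitions score 1, and statistically independent partitions score 0, regardless of how the equivalence classes are named.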
Given a set S of real-valued members, each associated with one of two possible types, a multi-partitioning of S is a sequence of the members of S such that if x, y ∈ S have different types and x < y, then x precedes y in the sequence. We give two distribution-sensitive algorithms for the set multi-partitioning problem and a matching lower bound in the algebraic decision-tree model. One of the two algorithms can be made stable and can be implemented in place. We also give an output-sensitive algorithm for the problem.
This paper considers the arbitrary-proportional finite-set-partitioning problem, which involves partitioning a finite set of size N into T subsets with respect to arbitrary nonnegative proportions p_t, t = 1, 2, ..., T, where N and T are positive integers. This is at the core of many fundamental problems, such as determining quotas for individuals of different weights, or sampling from a discrete-valued weighted sample set to obtain a new identically distributed but non-weighted sample set (e.g., the resampling needed in the particle filter). The challenge arises because the size n_t of each subset must be an integer while the unbiased expectation N·p_t often is not, given that ∑ n_t = N and ∑ p_t = 1. To solve this problem, a metric (cost function) is defined on the discrepancies, and a solution is correspondingly proposed to determine the sizes of the subsets with minimal cost. A theoretical proof and simulation demonstrations of the optimality of the scheme, in the sense of the proposed metric, are provided.
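One standard scheme for choosing integer subset sizes close to the target proportions is the largest-remainder rule, sketched below; the paper's metric-optimal solution may differ in detail, so this is an illustration of the problem rather than its method:

```python
def apportion(n, proportions):
    """Largest-remainder rule: integer sizes summing to n, each close to
    the (generally non-integer) quota n * p_i."""
    quotas = [n * p for p in proportions]
    sizes = [int(q) for q in quotas]  # start from the floors
    leftover = n - sum(sizes)
    # Hand the remaining units to the subsets with the largest fractional parts.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes
```

For example, splitting 7 items in proportions (1/3, 1/3, 1/3) necessarily gives unequal integer sizes (2, 2, 3) while keeping the total exact.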
Journal of Physics A: Mathematical and Theoretical
The partial information decomposition (PID) is perhaps the leading proposal for resolving information shared between a set of sources and a target into redundant, synergistic, and unique constituents. Unfortunately, the PID framework has been hindered by a lack of a generally agreed-upon, multivariate method of quantifying the constituents. Here, we take a step toward rectifying this by developing a decomposition based on a new method that quantifies unique information. We first develop a broadly applicable method, the dependency decomposition, that delineates how statistical dependencies influence the structure of a joint distribution. The dependency decomposition then allows us to define a measure of the information about a target that can be uniquely attributed to a particular source as the least amount by which the source-target statistical dependency can influence the information shared between the sources and the target. The result is the first measure that satisfies the core axioms of the PID framework while not satisfying the Blackwell relation, which depends on a particular interpretation of how the variables are related. This is a key step toward a practical PID.
Physical Review E, 2020
Complex systems often exhibit multiple levels of organization covering a wide range of physical scales, so it is frequently convenient to study the hierarchical decomposition of their structure and function. To better understand this phenomenon, we introduce a generalization of information theory that works with hierarchical partitions. We begin by revisiting the recently introduced Hierarchical Mutual Information (HMI) and show that it can be written as a level-by-level summation of classical conditional mutual information terms. We then prove that the HMI is bounded from above by the corresponding hierarchical joint entropy. In this way, in analogy to the classical case, we derive hierarchical generalizations of many other classical information-theoretic quantities. In particular, we prove that, as opposed to its classical counterpart, the hierarchical generalization of the Variation of Information is not a metric distance, although it admits a transformation into one. Moreover, focusing on potential applications of the theory, we show how to adjust the HMI for chance. We also corroborate and analyze all the presented theoretical results with exhaustive numerical computations, and include an illustrative application example of the introduced formalism. Finally, we mention some open problems that should eventually be addressed for the proposed generalization of information theory to reach maturity.
For n ∈ ℕ, we consider the problem of partitioning the interval [0, n) into k subintervals of positive integer lengths ℓ_1, ..., ℓ_k such that the lengths satisfy a set of simple constraints of the form ℓ_i R_ij ℓ_j, where R_ij is one of <, >, or =. In the full information case, R_ij is given for all 1 ≤ i, j ≤ k. In the sequential information case, R_ij is given for all 1 ≤ i ≤ k and j = i ± 1. That is, only the relations between the lengths of consecutive intervals are specified. The cyclic information case is an extension of the sequential information case in which the relation R_1k between ℓ_1 and ℓ_k is also given. We show that all three versions of the problem can be solved in time polynomial in k and log n.