Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004)
Conventional clustering algorithms utilize a single criterion that may not conform to the diverse shapes of the underlying clusters. We offer a new clustering approach that uses multiple clustering objective functions simultaneously. The proposed multiobjective clustering is a two-step process: detection of clusters by a set of candidate objective functions, followed by their integration into the target partition. A key ingredient of the approach is a cluster goodness function that evaluates the utility of multiple clusters using re-sampling techniques. Multiobjective data clustering is obtained as a solution to a discrete optimization problem in the space of clusters. At the meta-level, our algorithm incorporates conflict-resolution techniques along with natural data constraints. An empirical study on a number of artificial and real-world data sets demonstrates that multiobjective data clustering leads to valid and robust data partitions.
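The cluster goodness idea can be illustrated with a small sketch: a candidate cluster is scored by how reliably it reappears when the data are re-clustered on bootstrap samples. The Jaccard-based stability score, the use of k-means to re-cluster, and the function names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_goodness(X, member_mask, k, n_boot=20, seed=0):
    """Stability of one candidate cluster, given as a boolean membership mask."""
    rng = np.random.default_rng(seed)
    n = len(X)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # bootstrap sample (with replacement)
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=int(rng.integers(1_000_000))).fit_predict(X[idx])
        orig = member_mask[idx]                      # candidate-cluster membership on the sample
        # best Jaccard overlap between the candidate cluster and any bootstrap cluster
        best = max(np.sum(orig & (labels == c)) / np.sum(orig | (labels == c))
                   for c in np.unique(labels))
        scores.append(best)
    return float(np.mean(scores))
```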
Memetic Computing, 2015
Clustering is an unsupervised classification method in the field of data mining. Many population-based evolutionary and swarm intelligence optimization methods have been proposed to optimize clustering solutions globally with respect to a single selected objective function, which leads to a single best solution. The optimized solution is therefore biased towards that single objective and does not perform equally well on data sets whose clusters have different geometrical properties. Clustering with multiple objectives should thus be optimized through multiobjective optimization methods in order to capture different properties of the data set. To achieve this clustering goal, many multiobjective population-based optimization methods, e.g., the multiobjective genetic algorithm and multiobjective particle swarm optimization (MOPSO), have been proposed to obtain diverse trade-off solutions on the Pareto front. Because the single-directional diversity mechanism in particle swarm optimization converges prematurely to local optima, this paper presents a two-stage diversity mechanism in MOPSO that improves its exploratory capabilities by incorporating the crossover operator of the genetic algorithm. An external archive is used to store non-dominated solutions, from which the single best solution, the one with the highest F-measure value, is selected at the end of the run. Two conceptually orthogonal internal measures, SSE and connectedness, are used to estimate the clustering quality. Results demonstrate the effectiveness of the proposed method over its competitors MOPSO and the non-dominated sorting genetic algorithm…
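A minimal sketch of the two internal objectives named above (SSE and connectedness) and of the Pareto-dominance test that maintains the external archive; the L-nearest-neighbour connectivity definition follows common practice in the multiobjective clustering literature and may differ from the paper's exact measure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sse(X, labels):
    """Sum of squared distances of points to their cluster centroid (compactness, minimize)."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

def connectivity(X, labels, L=10):
    """Penalty accrued when a point's L nearest neighbours lie in a different cluster (minimize)."""
    _, idx = NearestNeighbors(n_neighbors=L + 1).fit(X).kneighbors(X)
    return sum(1.0 / (j + 1)
               for i, row in enumerate(idx)
               for j, nb in enumerate(row[1:])       # skip the point itself
               if labels[nb] != labels[i])

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
```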
Applied Soft Computing, 2013
In this paper a new multiobjective (MO) clustering technique (GenClustMOO) is proposed which can automatically partition the data into an appropriate number of clusters. Each cluster is divided into several small hyperspherical sub-clusters, and the centers of all these sub-clusters are encoded in a string to represent the whole clustering. For assigning points to different clusters, these local sub-clusters are considered individually. For the purpose of objective function evaluation, the sub-clusters are merged appropriately to form a variable number of global clusters. Three objective functions are considered here: one reflecting the total compactness of the partitioning based on the Euclidean distance, another reflecting the total symmetry of the clusters, and the last reflecting cluster connectedness. These are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method, in order to detect the appropriate number of clusters as well as the appropriate partitioning. The symmetry present in a partitioning is measured using a newly developed point symmetry based distance. Connectedness is measured using the relative neighborhood graph concept. Since AMOSA, like any other MO optimization technique, provides a set of Pareto-optimal solutions, a new method is also developed to determine a single solution from this set. Thus the proposed GenClustMOO is able to detect the appropriate number of clusters and the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. The effectiveness of GenClustMOO is comprehensively demonstrated, in comparison with another recent multiobjective clustering technique (MOCK), a single objective genetic algorithm based automatic clustering technique (VGAPS-clustering), K-means and single linkage clustering, on nineteen artificial and seven real-life data sets of varying complexities. In a part of the experiment the effectiveness of AMOSA as the underlying optimization technique in GenClustMOO is also demonstrated in comparison to another evolutionary MO algorithm, PESA2.
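The symmetry objective can be sketched via the point-symmetry distance commonly used in this line of work: a point is reflected through the candidate centre, and the closeness of the reflected point to actual data points scales the Euclidean distance. This is a generic illustration, not necessarily the exact distance developed in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def point_symmetry_distance(x, centre, X, k=2):
    """Point-symmetry distance of point x with respect to a candidate cluster centre."""
    reflected = 2.0 * centre - x                     # mirror image of x about the centre
    dists, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(reflected.reshape(1, -1))
    d_sym = dists.mean()                             # small if a near-symmetric counterpart exists
    return d_sym * np.linalg.norm(x - centre)
```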
2018
The task of clustering is to group data items that are similar into clusters in such a way that the similarity within each cluster is high and the dissimilarity between clusters is also high. A novel partitional clustering algorithm called the HB K-Means algorithm (High Dimensional Bisecting K-Means) for high dimensional data sets was developed in our previous work. To improve this algorithm, constraints such as a stability-based measure and the Mean Square Error (MSE) were incorporated, resulting in the CHB K-Means (Constraint Based HB K-Means) algorithm. In addition to these constraints, cluster compactness and density are also important for obtaining better clustering results. In this paper, a Multi-Objective Optimization (MOO) technique is developed that includes different indices such as the DB-Index, XB-Index and Sym-Index. These three indices are used as fitness functions for the proposed Fractional Genetic PSO algorithm (FGPSO), which is the hybrid...
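As an illustration of the fitness evaluation, the sketch below computes two of the indices named above, the Davies-Bouldin index (via scikit-learn) and a crisp Xie-Beni index; the Sym-Index would additionally require a point-symmetry distance. The FGPSO search itself is not shown.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

def xie_beni(X, labels):
    """Crisp Xie-Beni index: compactness over minimum centre separation (lower is better)."""
    clusters = np.unique(labels)
    centres = np.array([X[labels == c].mean(axis=0) for c in clusters])
    compactness = sum(((X[labels == c] - centres[i]) ** 2).sum()
                      for i, c in enumerate(clusters))
    separation = min(np.sum((a - b) ** 2)
                     for i, a in enumerate(centres) for b in centres[i + 1:])
    return compactness / (len(X) * separation)

def fitness(X, labels):
    # Both indices are to be minimized; a third objective (Sym-Index) is omitted here.
    return davies_bouldin_score(X, labels), xie_beni(X, labels)
```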
Neural Computing and Applications, 2016
The multi-objective clustering with automatic determination of the number of clusters (MOCK) approach is improved in this work by means of an empirical comparison of three multi-objective evolutionary algorithms added to MOCK in place of the algorithm originally used in that approach. The results of two different experiments using seven real data sets from the UCI repository are reported: (1) using two multi-objective optimization performance metrics (hypervolume and two-set coverage) and (2) using the F-measure to evaluate the clustering quality. The results are compared against the original version of MOCK and also against other algorithms representative of the state of the art. These results indicate that the new versions are highly competitive and capable of dealing with different types of data sets.
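The two performance metrics used in the first experiment can be sketched directly: two-set coverage C(A, B) is the fraction of solutions in B dominated by at least one solution in A, and hypervolume is shown here for the two-objective minimization case with a fixed reference point. Both assume objective vectors given as tuples; this is illustrative rather than the exact experimental code.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def two_set_coverage(A, B):
    """Fraction of solutions in B that are dominated by at least one solution in A."""
    return sum(any(dominates(a, b) for a in A) for b in B) / len(B)

def hypervolume_2d(front, ref):
    """Area dominated by a 2-D minimization front, bounded above by the reference point."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):                     # ascending in the first objective
        if f2 < prev_f2:
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```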
Artificial Intelligence Research, 2017
Multiobjective clustering techniques have been used to simultaneously consider several complementary aspects of clustering quality. They optimize two or more cluster validity indices simultaneously, they lead to high-quality results, and they have emerged as attractive and robust alternatives for solving clustering problems. This paper provides a brief review of bio-inspired multiobjective clustering and proposes a bee-inspired multiobjective optimization (MOO) algorithm, named cOptBees-MO, to solve multiobjective data clustering problems. In its survey part, a brief tutorial on MOO and multiobjective clustering optimization (MOCO) is presented, followed by a review of the main works in the area. Particular attention is given to the many objective functions used in MOCO. To evaluate the performance of the algorithm, it was executed on various datasets; the results show high-quality clusters, diverse solutions, and the automatic determination of a suitable number of clusters.
Proceedings of the 2005 SIAM International Conference on Data Mining, 2005
This paper investigates validity analysis of alternative clustering results obtained using the algorithm named Multiobjective K-Means Genetic Algorithm (MOKGA). The reported results are promising. MOKGA gives the optimal number of clusters as a solution set. The achieved clustering results are then analyzed and validated under several cluster validity techniques proposed in the literature. The optimal clusters are ranked for each validity index. The approach is tested by conducting experiments using three well-known data sets. The obtained results for each dataset are compared with those reported in the literature to demonstrate the applicability and effectiveness of the proposed approach.
Machine Learning, 2013
Supervised alternative clustering is the problem of finding a set of clusterings which are of high quality and different from a given negative clustering. The task is therefore a clear multi-objective optimization problem. Optimizing two conflicting objectives at the same time requires dealing with trade-offs. Most approaches in the literature optimize these objectives sequentially (one objective after another) or indirectly (by some heuristic combination of the objectives). Solving a multi-objective optimization problem in these ways can result in solutions which are dominated, and not Pareto-optimal. We develop a direct algorithm, called COGNAC, which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. COGNAC performs the recombination operator at the cluster level instead of at the object level, as in traditional genetic algorithms. It can accept arbitrary clustering quality and dissimilarity objectives and provides solutions dominating those obtained by other state-of-the-art algorithms. Based on COGNAC, we propose another algorithm called SGAC for the sequential generation of alternative clusterings, where each newly found alternative clustering is guaranteed to be different from all previous ones. The experimental results on widely used benchmarks demonstrate the advantages of our approach.
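The idea of recombining at the cluster level rather than the object level can be sketched as follows: the child inherits whole clusters from the two parents and attaches any leftover object to the nearest inherited centroid. The real COGNAC operator differs in its details; this only illustrates the principle.

```python
import numpy as np

def cluster_level_crossover(X, labels_a, labels_b, seed=0):
    """Build a child partition by inheriting whole clusters from two parent partitions."""
    rng = np.random.default_rng(seed)
    child = np.full(len(X), -1)
    next_id = 0
    for parent in (labels_a, labels_b):
        for c in np.unique(parent):
            if rng.random() < 0.5:                   # inherit this whole cluster
                mask = (parent == c) & (child == -1)
                if mask.any():
                    child[mask] = next_id
                    next_id += 1
    if next_id == 0:                                 # degenerate case: nothing inherited
        return labels_a.copy()
    centres = np.array([X[child == c].mean(axis=0) for c in range(next_id)])
    for i in np.where(child == -1)[0]:               # attach leftovers to the nearest centroid
        child[i] = int(np.argmin(((centres - X[i]) ** 2).sum(axis=1)))
    return child
```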
IET Image Processing, 2012
We develop a metaheuristic procedure for multiobjective clustering problems. Our goal is to find good approximations of the efficient frontier for this class of problems and to provide a means for improving decision making in multiple areas of application, in particular those related to marketing. The procedure is based on the tabu search and scatter search methodologies. Clustering problems have been the subject of numerous studies; however, most of the work has focused on single-objective problems. Clustering using multiple criteria and/or multiple data sources has received limited attention in the OR and marketing literature. Our procedure is general and tackles several problem classes within this area of combinatorial data analysis. We conduct extensive experimentation with both artificial and real data (in a marketing-segmentation problem) to show the effectiveness of the proposed procedure.
The present survey provides the state of the art of research devoted to Evolutionary Approaches (EAs) for clustering, exemplified with a diversity of evolutionary computations. The survey provides a nomenclature that highlights aspects that are very important in the context of evolutionary data clustering. The paper examines the clustering trade-offs addressed by a wide range of Multi-Objective Evolutionary Approach (MOEA) methods. Finally, this study addresses the potential challenges of MOEA design and data clustering, along with conclusions and recommendations for novices and researchers, by positioning the most promising paths of future research. MOEAs have had substantial success across a variety of MOP applications, from pedagogical multifunction optimization to real-world engineering design. The survey organizes the developments witnessed in the past three decades in EA-based metaheuristics for solving multiobjective optimization problems (MOPs) and for obtaining high-quality solutions in a single run. Data clustering is a demanding task whose intricacy is caused by the lack of a unique and precise definition of a cluster. Multiobjective data clustering can be obtained as the solution of a discrete optimization problem in the space of clusters. Discovery of most or all of the clusters (of arbitrary shapes) present in the data is a long-standing goal of unsupervised predictive learning and exploratory pattern analysis.
Neurocomputing, 2010
Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms often contains a high number of partitions, making it difficult for an expert to manually analyze all of them. In order to deal with this problem, we present two selection strategies, based on the corrected Rand index, to choose a subset of solutions. To test them, they are applied to the sets of solutions produced by MOCK and MOCLE on several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both versions of the proposed selection strategy are very effective. They can significantly reduce the number of solutions and, at the same time, keep the quality and the diversity of the partitions in the original set of solutions.
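A minimal sketch of a corrected-Rand-based selection: partitions are kept only if they are sufficiently different (low adjusted Rand index) from those already selected. The published strategies are more elaborate; the threshold and greedy scheme here are illustrative assumptions.

```python
from sklearn.metrics import adjusted_rand_score

def select_diverse(partitions, max_ari=0.8):
    """partitions: list of label arrays produced by, e.g., MOCK or MOCLE."""
    selected = []
    for labels in partitions:
        # keep a partition only if it is dissimilar to every partition already kept
        if all(adjusted_rand_score(labels, s) < max_ari for s in selected):
            selected.append(labels)
    return selected
```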
ACM Transactions on Knowledge Discovery from Data, 2018
We present a new multiview clustering approach based on multiobjective optimization. In contrast to existing clustering algorithms based on multiobjective optimization, it is generally applicable to data represented by two or more views and does not require specifying the number of clusters a priori. The approach builds upon the search capability of a multiobjective simulated annealing based technique, AMOSA, as the underlying optimization technique. In the first version of the proposed approach, an internal cluster validity index is used to assess the quality of different partitionings obtained using different views. A new way of checking the compatibility of these different partitionings is also proposed, and this is used as another objective function. A new encoding strategy and some new mutation operators are introduced. Finally, a new way of computing a consensus partitioning from multiple individual partitions obtained on multiple views is proposed. As a baseline and for compa...
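One way to picture the compatibility objective is as the agreement between the partitionings obtained on the different views; the sketch below approximates it with the mean pairwise adjusted Rand index. The paper proposes its own compatibility check, so treat this as purely illustrative.

```python
from itertools import combinations
from sklearn.metrics import adjusted_rand_score

def compatibility(view_partitions):
    """Mean pairwise agreement between the partitionings obtained on each view."""
    pairs = list(combinations(view_partitions, 2))
    return sum(adjusted_rand_score(a, b) for a, b in pairs) / len(pairs)
```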
Communications in computer and information science, 2021
We present a data-driven analysis of MOCK, ∆-MOCK, and MOCLE. These are three closely related approaches that use multiobjective optimization for crisp clustering. More specifically, based on a collection of 12 datasets presenting different properties, we investigate the performance of MOCLE and MOCK compared to the recently proposed ∆-MOCK. Besides performing a quantitative analysis identifying which methods perform well or poorly relative to one another, we also conduct a more detailed analysis of why such behavior occurs. The results of our analysis provide useful insights into the strengths and weaknesses of the methods investigated.
IEEE Access
Evolutionary multiobjective algorithms have become a popular choice to tackle the clustering problem. On the one hand, the simultaneous optimization of complementary clustering criteria offers an increased robustness to changes in data characteristics. On the other hand, the evolutionary search is able to approximate the Pareto optimal front and deliver a set of trade-offs between these criteria in a single algorithm execution. Decision making is the concluding stage of the pipeline, having as its goal the selection of a single, final solution from the set of candidate trade-offs produced. This is a complex task for which a definitive answer does not seem to be available, as the underlying assumptions of existing techniques may not hold for all applications. In this paper, we investigate an alternative approach to address this challenge: posing it as a learning problem. The key idea is to build a model that, given a proper characterization of solutions and their context (defined by the full approximation solution set and the specific clustering task at hand), is able to estimate quality and facilitate the identification of the best choice. To evaluate the suitability of this approach, we conduct a series of experiments over diverse synthetic and real-world datasets, including comparisons against a range of representative decision-making strategies from the literature. Our proposal exhibits greater flexibility in dealing with problems of varying characteristics, consistently outperforming the reference methods considered. This study demonstrates that it is possible to learn from the decision-making process in example settings and generalize the acquired knowledge to new scenarios.
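A minimal sketch of the learning-based decision-making idea, assuming simple hand-crafted features and a random-forest regressor trained to predict the adjusted Rand index on problems with known ground truth; the features, model, and function names are assumptions, not those used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import adjusted_rand_score, silhouette_score

def features(X, labels, front):
    """Simple per-solution features plus minimal context from the whole approximation set."""
    k = len(np.unique(labels))
    return [k, silhouette_score(X, labels), len(front)]

def train_selector(training_problems):
    """training_problems: list of (X, ground_truth, front), where front is a list of label arrays."""
    F, y = [], []
    for X, truth, front in training_problems:
        for labels in front:
            F.append(features(X, labels, front))
            y.append(adjusted_rand_score(truth, labels))   # quality target on known problems
    return RandomForestRegressor(random_state=0).fit(F, y)

def pick_solution(model, X, front):
    """On a new problem, return the candidate partition with the highest predicted quality."""
    preds = model.predict([features(X, labels, front) for labels in front])
    return front[int(np.argmax(preds))]
```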
Pattern Recognition Letters, 2016
Multiobjective evolutionary clustering algorithms are based on the optimization of several objective functions that guide the search following a cycle based on evolutionary algorithms. Their capabilities allow them to find better solutions than conventional clustering algorithms when more than one criterion is necessary to obtain understandable patterns from the data. However, these kinds of techniques are expensive in terms of computational time and memory usage, and specific strategies are required to ensure their successful scalability when facing large-scale data sets. This work proposes the application of a data subset approach for scaling up multiobjective clustering algorithms and also analyzes the impact of three stratification methods. The experiments show that the use of the proposed data subset approach improves the performance of multiobjective evolutionary clustering algorithms without considerably penalizing the accuracy of the final clustering solution.
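The data subset strategy can be sketched as follows: cluster a subsample and extend the labels to the full data set by the model's assignment rule. Here a plain random subsample and k-means stand in for the stratified sample and the multiobjective clusterer; the paper evaluates three specific stratification methods.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_on_subset(X, k, fraction=0.1, seed=0):
    """Cluster a random subsample, then extend the labels to the full data set."""
    rng = np.random.default_rng(seed)
    size = max(k, int(fraction * len(X)))
    idx = rng.choice(len(X), size=size, replace=False)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx])
    return model.predict(X)                          # nearest-centroid assignment for all points
```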
IEEE Congress on Evolutionary Computation, 2007
Categorical data clustering has been gaining significant attention from researchers in recent years, because most real-life data sets are categorical in nature. In contrast to the numerical domain, no natural ordering can be found among the elements of a categorical domain. Hence no inherent distance measure, like the Euclidean distance, can be used to compute the distance between two categorical objects. Most of the clustering algorithms designed for categorical data are based on optimizing a single objective function. However, a single objective function is often not suitable for different kinds of categorical data sets. Motivated by this fact, in this article the categorical data clustering problem is modeled as a multiobjective optimization problem. A popular multiobjective genetic algorithm is used in this regard to optimize two objectives simultaneously, thus generating a set of non-dominated solutions. The performance of the proposed algorithm is compared with that of several well-known categorical data clustering algorithms on a variety of synthetic and real-life categorical data sets. A statistical significance test is also performed to establish the superiority of the proposed algorithm.
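Because no Euclidean distance applies, a common building block for categorical clustering is the simple matching dissimilarity, with the attribute-wise mode acting as a cluster representative; the sketch below (assuming integer category codes) illustrates these ingredients rather than the paper's specific objective functions.

```python
import numpy as np

def matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical objects differ (a, b: arrays of codes)."""
    return float(np.mean(a != b))

def cluster_mode(X_cat):
    """Attribute-wise mode (most frequent category code) acting as the cluster representative."""
    return np.array([np.bincount(col).argmax() for col in X_cat.T])
```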
2019
Supervised clustering organizes data instances into clusters on the basis of similarities between the data instances as well as class labels for the data instances. Supervised clustering seeks to meet multiple objectives, such as compactness of clusters, homogeneity of data in clusters with respect to their class labels, and separateness of clusters. With these objectives in mind, a new supervised clustering algorithm based on a multi-objective crowding genetic algorithm, named SC-MOGA, is proposed in this paper. The algorithm searches for the optimal clustering solution that simultaneously achieves the three objectives mentioned above. SC-MOGA performs very well on small datasets, but for a large dataset it may not be able to converge to an optimal solution or may take a very long time to do so. Hence, a data sampling method based on the Bisecting K-Means algorithm is also introduced to find representatives for supervised clustering. This method group...
2020
Clustering is the grouping of data into sets of similar objects. Each group is known as a cluster; each object is similar to the other objects in its cluster and different from objects in other clusters. In this paper, we carry out an experimental study comparing clustering algorithms using multiple objective functions. We investigate K-means (a partitioning-based clustering), hierarchical clustering, spectral clustering, Gaussian mixture model clustering, and clustering using a hidden Markov model. The performance of these methods was compared using multiple objective functions, centred on two core objectives: cluster homogeneity and separation. These multiple objective functions are a great help in discovering robust clusters more efficiently.
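A sketch of such a comparison, assuming two simple objective definitions: intra-cluster homogeneity as the mean distance to the cluster centroid (lower is better) and separation as the minimum distance between centroids (higher is better). The algorithms mirror those listed in the text, except the hidden Markov model variant, which is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

def homogeneity_separation(X, labels):
    """Return (mean within-cluster distance to centroid, minimum distance between centroids)."""
    clusters = np.unique(labels)
    centres = np.array([X[labels == c].mean(axis=0) for c in clusters])
    homog = np.mean([np.linalg.norm(X[labels == c] - centres[i], axis=1).mean()
                     for i, c in enumerate(clusters)])
    sep = min(np.linalg.norm(a - b)
              for i, a in enumerate(centres) for b in centres[i + 1:])
    return homog, sep

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
partitions = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "hierarchical": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "spectral": SpectralClustering(n_clusters=3, random_state=0).fit_predict(X),
    "gmm": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
}
for name, labels in partitions.items():
    print(name, homogeneity_separation(X, labels))
```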
ArXiv, 2021
This article presents how studies of evolutionary multi-objective clustering have evolved over the years, based on a mapping of articles indexed in ACM, IEEE, and Scopus. We present the most relevant approaches, considering high-impact journals and conferences, to provide an overview of this field of study. We analyzed the algorithms based on the features and components presented in the proposed general architecture of evolutionary multi-objective clustering. These algorithms were grouped considering common clustering strategies and applications. Furthermore, we discuss issues regarding the difficulty of defining appropriate clustering criteria for evolutionary multi-objective clustering and the importance of evaluating the evolutionary process to have a clear view of the optimization efficiency. It is essential to observe these aspects, besides specific clustering properties, when designing new approaches or selecting/using existing ones. Finally...