Papers by Alexander Topchy
Methods and Apparatus to Track Web Browsing Sessions
Methods and Apparatus to Measure Exposure to Mobile Advertisements

Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004
Numerous clustering algorithms, their taxonomies and evaluation studies are available in the literature. Despite the diversity of different clustering algorithms, solutions delivered by these algorithms exhibit many commonalities. An analysis of the similarity and properties of clustering objective functions is necessary from the operational/user perspective. We revisit conventional categorization of clustering algorithms and attempt to relate them according to the partitions they produce. We empirically study the similarity of clustering solutions obtained by many traditional as well as relatively recent clustering algorithms on a number of real-world data sets. Sammon's mapping and a complete-link clustering of the inter-clustering dissimilarity values are performed to detect a meaningful grouping of the objective functions. We find that only a small number of clustering algorithms are sufficient to represent a large spectrum of clustering criteria. For example, interesting groups of clustering algorithms are centered around the graph partitioning, linkage-based and Gaussian mixture model based algorithms.
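A minimal sketch of the experimental idea follows: fit several standard clustering algorithms on one data set, measure how much their partitions disagree, and then group the algorithms by that disagreement. The iris data, the choice of 1 - adjusted Rand index as the inter-clustering dissimilarity, and the specific algorithm list are illustrative assumptions; the paper's Sammon's mapping step is omitted here.

```python
# Sketch: group clustering algorithms by the similarity of the partitions they
# produce on a common data set. Data set and dissimilarity measure are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

X = load_iris().data
k = 3

algorithms = {
    "kmeans": KMeans(n_clusters=k, n_init=10, random_state=0),
    "ward": AgglomerativeClustering(n_clusters=k, linkage="ward"),
    "complete_link": AgglomerativeClustering(n_clusters=k, linkage="complete"),
    "single_link": AgglomerativeClustering(n_clusters=k, linkage="single"),
    "spectral": SpectralClustering(n_clusters=k, random_state=0),
    "gmm": GaussianMixture(n_components=k, random_state=0),
}

# One partition per clustering objective function.
labels = {name: alg.fit_predict(X) for name, alg in algorithms.items()}

# Pairwise dissimilarity between partitions: 1 - adjusted Rand index.
names = list(labels)
D = np.zeros((len(names), len(names)))
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        D[i, j] = D[j, i] = 1 - adjusted_rand_score(labels[names[i]], labels[names[j]])

# Complete-link clustering of the algorithms by their inter-clustering dissimilarity
# reveals which objective functions behave alike on this data set.
Z = linkage(squareform(D, checks=False), method="complete")
groups = fcluster(Z, t=3, criterion="maxclust")
for name, g in zip(names, groups):
    print(f"{name}: group {g}")
```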

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004
Conventional clustering algorithms utilize a single criterion that may not conform to the diverse shapes of the underlying clusters. We offer a new clustering approach that uses multiple clustering objective functions simultaneously. The proposed multiobjective clustering is a two-step process. It includes detection of clusters by a set of candidate objective functions as well as their integration into the target partition. A key ingredient of the approach is a cluster goodness function that evaluates the utility of multiple clusters using re-sampling techniques. Multiobjective data clustering is obtained as a solution to a discrete optimization problem in the space of clusters. At meta-level, our algorithm incorporates conflict resolution techniques along with the natural data constraints. An empirical study on a number of artificial and real-world data sets demonstrates that multiobjective data clustering leads to valid and robust data partitions.

Third IEEE International Conference on Data Mining (ICDM 2003), 2003
A data set can be clustered in many ways depending on the clustering algorithm employed, parameter settings used and other factors. Can multiple clusterings be combined so that the final partitioning of data provides better clustering? The answer depends on the quality of clusterings to be combined as well as the properties of the fusion method. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. As a result, we show that the consensus function is related to the classical intra-class variance criterion using the generalized mutual information definition. Second, we show the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. We analyze the combination accuracy as a function of parameters controlling the power and resolution of component partitions as well as the learning dynamics vs. the number of clusterings involved. Finally, some empirical studies compare the effectiveness of several consensus functions.
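The combination step can be illustrated with a co-association consensus, one simple choice of consensus function (the paper relates consensus functions to the intra-class variance and generalized mutual information criteria). The weak components below are k-means runs on random one-dimensional projections, as in the abstract; the blob data, the number of component partitions and the linkage used are illustrative assumptions.

```python
# Sketch: combine many "weak" partitions, each produced by k-means on a random
# 1-D projection of the data, through a co-association matrix. The co-association
# consensus is an assumed, simple consensus function.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
n, d = X.shape

n_partitions = 50
coassoc = np.zeros((n, n))

for _ in range(n_partitions):
    # Weak clustering component: k-means in a random one-dimensional projection.
    proj = X @ rng.normal(size=(d, 1))
    labels = KMeans(n_clusters=3, n_init=5).fit_predict(proj)
    # Points assigned to the same cluster "vote" for each other.
    coassoc += labels[:, None] == labels[None, :]

coassoc /= n_partitions

# Consensus partition: cluster the co-association similarity matrix
# (older scikit-learn versions use affinity="precomputed" instead of metric=).
consensus = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)

print("consensus cluster sizes:", np.bincount(consensus))
```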
A Mixture Model for Clustering Ensembles
Proceedings of the 2004 SIAM International Conference on Data Mining, 2004
Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, ...
Lecture Notes in Computer Science, 2003
Feature extraction based on evolutionary search offers new possibilities for improving classification accuracy and reducing measurement complexity in many data mining and machine learning applications. We present a family of genetic algorithms for feature synthesis through clustering of discrete attribute values. The approach uses a new compact graph-based encoding for cluster representation, in which the size of the GA search space is reduced exponentially with respect to the number of items in the partitioning, compared to the original idea of Park and Song. We apply the developed algorithms and study their effectiveness for DNA fingerprinting in population genetics and for text categorization.
Methods and Apparatus for Generating Signatures
This paper describes two algorithms based on cooperative evolution of internal hidden network representations and a combination of global evolutionary and local search procedures. The experimental results obtained compare favorably with prototype methods. It is demonstrated that applying pure gradient or pure genetic algorithms to the network training problem performs much worse than hybrid procedures that reasonably combine the advantages of global and local search.
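A hedged sketch of the hybrid global/local idea: a small evolutionary loop over candidate weight vectors, with a few gradient (backpropagation) steps applied to each individual before selection. The tiny XOR task, the one-hidden-layer network, the Lamarckian update and every hyper-parameter below are assumptions for illustration, not the paper's setup.

```python
# Sketch of hybrid evolutionary + gradient training of a tiny neural network.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0., 1., 1., 0.])                      # XOR targets (assumed task)

H = 4                                               # hidden units
n_w = 2 * H + H + H + 1                             # W1 (2xH), b1, W2 (H), b2

def unpack(w):
    W1 = w[:2 * H].reshape(2, H)
    b1 = w[2 * H:3 * H]
    W2 = w[3 * H:4 * H]
    b2 = w[4 * H]
    return W1, b1, W2, b2

def forward(w, X):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    return h @ W2 + b2, h

def mse(w):
    out, _ = forward(w, X)
    return np.mean((out - y) ** 2)

def grad(w):
    # Backpropagation for the tiny network: the local search direction.
    W1, b1, W2, b2 = unpack(w)
    out, h = forward(w, X)
    d_out = 2 * (out - y) / len(y)
    gW2 = h.T @ d_out
    gb2 = d_out.sum()
    d_pre = np.outer(d_out, W2) * (1 - h ** 2)
    gW1 = X.T @ d_pre
    gb1 = d_pre.sum(axis=0)
    return np.concatenate([gW1.ravel(), gb1, gW2, [gb2]])

pop = rng.normal(scale=0.5, size=(20, n_w))         # initial population of weight vectors

for gen in range(100):
    # Local search: a few gradient steps per individual (Lamarckian update).
    for i in range(len(pop)):
        for _ in range(5):
            pop[i] -= 0.1 * grad(pop[i])
    # Global search: keep the better half, refill with mutated copies.
    order = np.argsort([mse(w) for w in pop])
    survivors = pop[order[:10]]
    children = survivors + rng.normal(scale=0.1, size=survivors.shape)
    pop = np.vstack([survivors, children])

best = pop[np.argmin([mse(w) for w in pop])]
print("best MSE:", round(mse(best), 4), "outputs:", np.round(forward(best, X)[0], 2))
```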
SDM, 2005
The problem of clustering with constraints is receiving increasing attention. Many existing algorithms assume the specified constraints are correct and consistent. We take a new approach and model the uncertainty of constraints in a principled manner by treating the constraints as random variables. The effect of specified constraints on a subset of points is propagated to other data points by biasing the search for cluster boundaries. By combining the a posteriori enforcement of constraints with the log-likelihood, we obtain a new objective function. An EM-type algorithm derived by a variational method is used for efficient parameter estimation. Experimental results demonstrate the usefulness of the proposed algorithm. In particular, our approach can identify the desired clusters even when only a small portion of the data participates in constraints.
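A heavily simplified sketch of the flavor of the method follows: a hand-written EM loop for a Gaussian mixture in which must-link pairs softly pull each other's responsibilities together in the E-step. The blending rule and weight w are stand-ins for the paper's treatment of constraints as random variables and its variational lower-bound maximization; the data and the constraint indices are hypothetical.

```python
# Simplified soft-constraint EM for a Gaussian mixture (illustrative stand-in only).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=200, centers=3, cluster_std=2.0, random_state=1)
n, d, K = X.shape[0], X.shape[1], 3

must_link = [(0, 1), (10, 11), (50, 51)]   # hypothetical pairwise constraints
w = 0.8                                     # blending strength (assumption)

# Initialise parameters with k-means.
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)
means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
covs = np.array([np.cov(X[labels == k].T) + 1e-6 * np.eye(d) for k in range(K)])
pis = np.bincount(labels) / n

for _ in range(50):
    # E-step: standard responsibilities under the current Gaussian mixture.
    dens = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, means[k], covs[k]) for k in range(K)
    ])
    resp = dens / dens.sum(axis=1, keepdims=True)

    # Soft constraint step: must-linked points pull their responsibilities together.
    for i, j in must_link:
        avg = (resp[i] + resp[j]) / 2.0
        resp[i] = (1 - w) * resp[i] + w * avg
        resp[j] = (1 - w) * resp[j] + w * avg

    # M-step: re-estimate mixture parameters from the blended responsibilities.
    Nk = resp.sum(axis=0)
    pis = Nk / n
    means = (resp.T @ X) / Nk[:, None]
    covs = np.array([
        ((resp[:, k, None] * (X - means[k])).T @ (X - means[k])) / Nk[k] + 1e-6 * np.eye(d)
        for k in range(K)
    ])

print("labels of the first must-link pair:", resp[[0, 1]].argmax(axis=1))
```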
GECCO, 2001
We examine the effectiveness of gradient search optimization of numeric leaf values for Genetic Programming. Genetic search for tree-like programs at the population level is complemented by the optimization of terminal values at the individual level. Local adaptation of individuals is made easier by algorithmic differentiation. We show how conventional random constants are tuned by gradient descent with minimal overhead. Several experiments with symbolic regression problems are performed to demonstrate the approach's effectiveness. Effects of local learning are clearly manifest in both improved approximation accuracy and selection changes when periods of local and global search are interleaved. Special attention is paid to the low overhead of the local gradient descent. Finally, the inductive bias of local learning is quantified.
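The local-learning step alone can be sketched as follows: gradient descent on the numeric leaves of one fixed candidate program for a symbolic regression target. Numerical differentiation stands in here for the paper's algorithmic differentiation, and the target function, program shape, learning rate and iteration count are all illustrative assumptions; the surrounding genetic search over tree structures is not shown.

```python
# Sketch of tuning the numeric leaves (constants) of one candidate GP program.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
target = 1.5 * x**2 + 0.7 * np.sin(x) - 0.3          # unknown to the learner (assumed)

def program(consts, x):
    # Candidate program with three numeric leaves c0, c1, c2.
    c0, c1, c2 = consts
    return c0 * x**2 + c1 * np.sin(x) + c2

def loss(consts):
    return np.mean((program(consts, x) - target) ** 2)

def grad(consts, eps=1e-6):
    # Central-difference numerical differentiation stands in for
    # algorithmic differentiation of the expression tree.
    g = np.zeros_like(consts)
    for i in range(len(consts)):
        step = np.zeros_like(consts)
        step[i] = eps
        g[i] = (loss(consts + step) - loss(consts - step)) / (2 * eps)
    return g

consts = rng.normal(size=3)              # the "random constants" of one individual
lr = 0.05
for _ in range(500):
    consts -= lr * grad(consts)

print("tuned constants:", np.round(consts, 3), "loss:", round(loss(consts), 6))
```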

Model-based Clustering With Soft And Probabilistic Constraints
The problem of clustering with constraints has received a lot of attention lately. Many existing algorithms assume the specified constraints are correct and consistent. We take a new approach and model a constraint as a random variable. This enables us to model the uncertainty of constraints in a principled manner. The effect of constraints can be readily propagated to the neighborhood by biasing the search of the optimal parameters in each cluster. This enforces "smooth" cluster labels. The posterior probabilities of these constraint random variables represent the a posteriori enforcement of the corresponding constraints. By combining these probability values with the data likelihood, we arrive at an objective function for parameter estimation. An EM algorithm that maximizes the lower bound of the objective function is derived for efficient parameter estimation, using the variational method. Experimental results demonstrate the usefulness of the proposed algorithm. In particular, our approach can identify the desired clusters when only a small portion of data participate in constraints.
Genetic algorithm for solution of the traveling salesman problem with new features against premature convergence. Working paper
Methods and Apparatus to Control a State of Data Collection Devices
Methods and apparatus to present supplemental media on a second screen
Methods and Apparatus for Performing Variable Block Length Watermarking of Media
Methods and Apparatus to Encode and Decode Audio for Shopper Location and Advertisement Presentation Tracking
Methods and Apparatus to Predict Audience Composition And/Or Solicit Audience Members
Methods and Apparatus to Select Media Based on Engagement Levels
Methods and Apparatus for Generating Signatures