Papers by Wojciech Czarnecki
Multithreshold Entropy Linear Classifier (MELC) is a recent classifier that employs information-theoretic concepts to create a multithreshold maximum-margin model. In this paper we analyze its consistency over multithreshold linear models and show that its objective function upper-bounds the number of misclassified points, much as the hinge loss does in support vector machines. For further confirmation we also conduct numerical experiments on five datasets.
Linear classifiers separate the data with a hyperplane. In this paper we focus on a novel method of constructing a multithreshold linear classifier, which separates the data with multiple parallel hyperplanes. The proposed model is based on information-theoretic concepts, namely Renyi's quadratic entropy and the Cauchy-Schwarz divergence.
Online Extreme Entropy Machines for Streams Classification and Active Learning
Advances in Intelligent Systems and Computing, 2016
Lecture Notes in Computer Science, 2015
Representation learning is currently a very hot topic in modern machine learning, mostly due to the great success of deep learning methods. In particular, a low-dimensional representation which discriminates classes can not only enhance the classification procedure but also make it faster and, in contrast to high-dimensional embeddings, can be used efficiently for visual exploratory data analysis.
Pattern Analysis and Applications, 2015
Most existing classification methods aim to minimize empirical risk (through some simple point-based error measured with a loss function) with added regularization. We propose to approach this problem in a more information-theoretic way by investigating the applicability of entropy measures as a classification model objective function. We focus on Renyi's quadratic entropy and the related Cauchy-Schwarz divergence, which leads to the construction of Extreme Entropy Machines (EEM).
Multithreshold Entropy Linear Classifier (MELC) is a density-based model which searches for a linear projection maximizing the Cauchy-Schwarz divergence of the kernel density estimates of the dataset. Despite its good empirical results, one of its drawbacks is the optimization speed. In this paper we analyze how one can speed it up by solving an approximate problem. We analyze two methods, both similar to the approximate solutions of kernel density estimation querying, and provide adaptive schemes for selecting the crucial parameters based on a user-specified acceptable error. Furthermore, we show how one can exploit the well-known conjugate gradient and L-BFGS optimizers despite the fact that the original optimization problem should be solved on the sphere. All of the above methods and modifications are tested on 10 real-life datasets from the UCI repository to confirm their practical usability.
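The quantity named in this abstract can be illustrated with a minimal sketch. It only evaluates the Cauchy-Schwarz divergence between Gaussian kernel density estimates of two classes after projecting them onto a fixed direction; it does not reproduce the paper's optimization or its approximation schemes. The function names, the shared fixed bandwidth `sigma`, and the toy points are all assumptions made for illustration. The sketch relies on the standard identity that the integral of the product of two Gaussians with variance σ² centred at a and b equals a Gaussian of variance 2σ² evaluated at a − b.

```python
import math

def gaussian(d, var):
    """Density of a zero-mean 1D Gaussian with variance `var`, evaluated at d."""
    return math.exp(-d * d / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def cross_information_potential(a, b, sigma):
    """Integral of the product of two Gaussian KDEs (common bandwidth sigma,
    an assumption of this sketch) built on the 1D samples a and b."""
    var = 2.0 * sigma * sigma  # variances of the two kernels add up
    total = sum(gaussian(x - y, var) for x in a for y in b)
    return total / (len(a) * len(b))

def cauchy_schwarz_divergence(a, b, sigma):
    """D_CS(f, g) = log(int f^2) + log(int g^2) - 2 log(int f g)."""
    ip_aa = cross_information_potential(a, a, sigma)
    ip_bb = cross_information_potential(b, b, sigma)
    ip_ab = cross_information_potential(a, b, sigma)
    return math.log(ip_aa) + math.log(ip_bb) - 2.0 * math.log(ip_ab)

def project(points, w):
    """Project points onto the direction w, normalised to unit length."""
    norm = math.sqrt(sum(c * c for c in w))
    return [sum(p_i * w_i for p_i, w_i in zip(p, w)) / norm for p in points]

# Hypothetical toy data: the two classes differ only along the first axis.
positives = [(0.0, 0.0), (0.0, 1.0), (0.2, 0.5)]
negatives = [(3.0, 0.0), (3.0, 1.0), (3.2, 0.5)]

d_first_axis = cauchy_schwarz_divergence(
    project(positives, (1.0, 0.0)), project(negatives, (1.0, 0.0)), sigma=0.5)
d_second_axis = cauchy_schwarz_divergence(
    project(positives, (0.0, 1.0)), project(negatives, (0.0, 1.0)), sigma=0.5)
```

In this toy setting the projection onto the first axis separates the classes and yields a larger divergence than the projection onto the second axis, where the two projected samples coincide. A model of the kind described would search over directions for the one with the highest value.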
Journal of Cheminformatics, 2015
Background: Support Vector Machine has become one of the most popular machine learning tools used in virtual screening campaigns aimed at finding new drug candidates. Although it can be extremely effective in finding new potentially active compounds, its application requires the optimization of the hyperparameters with which the assessment is run, particularly the C and γ values. This optimization requirement, in turn, establishes the need to develop fast and effective approaches to the optimization procedure that provide the best predictive power of the constructed model.
Compounds Activity Prediction in Large Imbalanced Datasets with Substructural Relations Fingerprint and EEM
2015 IEEE Trustcom/BigDataSE/ISPA, 2015

Extremely Randomized Machine Learning Methods for Compound Activity Prediction
Molecules, 2015
Speed, a relatively low requirement for computational resources and high effectiveness of the evaluation of the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, due to the growth of the amount of data also in cheminformatics and related fields, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of previously existing methods to obtain results more quickly. In the study, we tested two approaches belonging to the group of so-called 'extremely randomized methods' (Extreme Entropy Machine and Extremely Randomized Trees) for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their 'non-extreme' competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches were not only found to improve the efficiency of the classification of bioactive compounds, but were also shown to be less computationally complex, requiring fewer steps to perform an optimization procedure.
arXiv e-print 1408.2869, Aug 12, 2014
We call the constructed method C$_k$RBF, where $k$ stands for the number of clusters used in k-means. We show empirically on nine datasets from the UCI repository that C$_2$RBF increases the stability of the grid search (measured as the probability of finding good parameters).
Weighted Tanimoto Extreme Learning Machine with Case Study in Drug Discovery
IEEE Computational Intelligence Magazine, 2015
Expert Systems with Applications, 2015
Linear classifiers separate the data with a hyperplane. In this paper we focus on a novel method of constructing a multithreshold linear classifier, which separates the data with multiple parallel hyperplanes. The proposed model is based on information-theoretic concepts, namely Renyi's quadratic entropy and the Cauchy-Schwarz divergence.
In classical Gaussian SVM classification we use a feature space projection transforming points to normal distributions with fixed covariance matrices (identity in the standard RBF and the covariance of the whole dataset in the Mahalanobis RBF). In this paper we add additional information to the Gaussian SVM by considering a local geometry-dependent feature space projection. We emphasize that our approach is in fact an algorithm for the construction of a new Gaussian-type kernel.

Two ellipsoid Support Vector Machines
Expert Systems with Applications, 2014
In classification problems classes usually have different geometrical structures, and therefore it seems natural for each class to have its own margin type. Existing methods using this principle lead to optimization problems that differ from that of the SVM. Although they outperform the standard model, they also prevent the use of existing SVM libraries. We propose an approach, named 2eSVM, which allows such a method to be used within the classical SVM framework. This enables a detailed comparison with the standard SVM. It turns out that classes in the resulting feature space are geometrically easier to separate and the trained model has better generalization properties. Moreover, based on evaluation on standard datasets, 2eSVM brings considerable gains for the linear classification process in terms of training time and quality. We also construct a kernelization of 2eSVM and evaluate it on the 5-HT2A ligand activity prediction problem (real, fingerprint-based data from the cheminformatics domain), which shows increased classification quality as well as reduced training time and complexity of the resulting model.
Lecture Notes in Computer Science, 2013
Uncertainty of the input data is a common issue in machine learning. In this paper we show how one can incorporate knowledge of an uncertainty measure regarding particular points in the training set. This may boost the model's accuracy as well as reduce overfitting. We show an approach based on the classical training with jitter for Artificial Neural Networks (ANNs). We prove that our method, which can be applied to a wide class of models, is approximately equivalent to generalised Tikhonov regularisation learning. We also compare our results with some alternative methods. In the end we discuss further prospects and applications.
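The link between jitter and Tikhonov regularisation mentioned in this abstract can be seen in the simplest textbook case, sketched here as an illustration rather than as the paper's own proof. Assume a linear model $f(x) = w^\top x$, squared loss, and zero-mean input noise $\varepsilon_i$ with covariance $\Sigma_i$ attached to the training point $(x_i, y_i)$; these symbols are assumptions introduced for the sketch. Averaging the loss over the noise gives

$$
\mathbb{E}_{\varepsilon_i}\!\left[\left(w^\top (x_i + \varepsilon_i) - y_i\right)^2\right]
= \left(w^\top x_i - y_i\right)^2
+ 2\left(w^\top x_i - y_i\right) w^\top \mathbb{E}[\varepsilon_i]
+ w^\top \mathbb{E}\!\left[\varepsilon_i \varepsilon_i^\top\right] w
= \left(w^\top x_i - y_i\right)^2 + w^\top \Sigma_i\, w .
$$

Summed over the training set, the extra term is a quadratic penalty $w^\top \big(\sum_i \Sigma_i\big) w$, which has the generalised Tikhonov form; with $\Sigma_i = \sigma^2 I$ it reduces to the ridge penalty $\sigma^2 \lVert w \rVert^2$ per point. For a linear model the identity is exact, whereas for nonlinear models such as ANNs it holds only approximately, which is consistent with the "approximately equivalent" wording of the abstract.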
InFeST – ImageJ Plugin for Rapid Development of Image Segmentation Pipelines
Advances in Intelligent Systems and Computing, 2014
Uncertainty of the in Vitro Experiments in the Construction of Predictive Models

Exploiting uncertainty measures in compounds activity prediction using support vector machines
Bioorganic & Medicinal Chemistry Letters, 2015
The great majority of molecular modeling tasks require the construction of a model that is then used to evaluate new compounds. Although various types of these models exist, at some stage, they all use knowledge about the activity of a given group of compounds, and the performance of the models is dependent on the quality of these data. Biological experiments verifying the activity of chemical compounds are often not reproducible; hence, databases containing these results often possess various activity records for a given molecule. In this study, we developed a method that incorporates the uncertainty of biological tests in machine-learning-based experiments using the Support Vector Machine as a classification model. We show that the developed methodology improves the classification effectiveness in the tested conditions.
Adaptive Active Learning as a Multi-armed Bandit Problem
Journal of Automation, Mobile Robotics & Intelligent Systems, 2014
Arrangements for a competition are not only difficult in terms of the logistics of the event, but also require an assurance of quality. In this paper we analyze the limitations which arise from the design of a contest for robots equipped with a very poor sensor set. This issue is little explored: up to now, research work has usually focused on the results of a certain task and, in addition, assumed an almost free hand in the choice of components. The discussed question is significant on the grounds of primary principles: objectivity in grading, equal opportunities among participants and preservation of the attractiveness of the tournament at the same time.