Wojciech Czarnecki

Jagiellonian University, Institute of Computer Science, Graduate Student

Papers by Wojciech Czarnecki

On the consistency of Multithreshold Entropy Linear Classifier

Multithreshold Entropy Linear Classifier (MELC) is a recent classifier idea which employs information-theoretic concepts in order to create a multithreshold maximum margin model. In this paper we analyze its consistency over multithreshold linear models and show that its objective function upper bounds the number of misclassified points in much the same way that hinge loss does in support vector machines. For further confirmation we also conduct numerical experiments on five datasets.

Multithreshold Entropy Linear Classifier

Linear classifiers separate the data with a hyperplane. In this paper we focus on a novel method of constructing a multithreshold linear classifier, which separates the data with multiple parallel hyperplanes. The proposed model is based on information-theoretic concepts, namely Renyi's quadratic entropy and the Cauchy-Schwarz divergence.

Online Extreme Entropy Machines for Streams Classification and Active Learning

Advances in Intelligent Systems and Computing, 2016

Maximum Entropy Linear Manifold for Learning Discriminative Low-Dimensional Representation

Lecture Notes in Computer Science, 2015

Representation learning is currently a very active topic in machine learning, mostly due to the great success of deep learning methods. In particular, a low-dimensional representation which discriminates classes can not only enhance the classification procedure but also make it faster and, in contrast to high-dimensional embeddings, can be used efficiently for visual exploratory data analysis.

Extreme entropy machines: robust information theoretic classification

Pattern Analysis and Applications, 2015

Most of the existing classification methods aim at minimizing empirical risk (through some simple point-based error measured with a loss function) with added regularization. We propose to approach this problem in a more information-theoretic way by investigating the applicability of entropy measures as the objective function of a classification model. We focus on Renyi's quadratic entropy and the connected Cauchy-Schwarz divergence, which leads to the construction of Extreme Entropy Machines (EEM).

Fast optimization of Multithreshold Entropy Linear Classifier

Multithreshold Entropy Linear Classifier (MELC) is a density-based model which searches for a linear projection maximizing the Cauchy-Schwarz divergence of the dataset's kernel density estimation. Despite its good empirical results, one of its drawbacks is the optimization speed. In this paper we analyze how one can speed it up by solving an approximate problem. We analyze two methods, both similar to the approximate solutions of kernel density estimation querying, and provide adaptive schemes for selecting crucial parameters based on a user-specified acceptable error. Furthermore, we show how one can exploit the well-known conjugate gradient and L-BFGS optimizers despite the fact that the original optimization problem should be solved on the sphere. All of the above methods and modifications are tested on 10 real-life datasets from the UCI repository to confirm their practical usability.
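
The objective described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the kernel bandwidth `sigma` is fixed by hand rather than estimated, and the divergence is taken in its common form D_CS(f, g) = log ∫f² + log ∫g² − 2 log ∫fg, evaluated in closed form for one-dimensional Gaussian kernel density estimates.

```python
import math

def gaussian_overlap(a, b, sigma):
    """Integral over x of N(x; a, sigma^2) * N(x; b, sigma^2), which equals N(a - b; 0, 2 sigma^2)."""
    var = 2.0 * sigma ** 2
    return math.exp(-((a - b) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def cross_information_potential(xs, ys, sigma):
    """Integral of the product of two Gaussian KDEs built on samples xs and ys."""
    total = sum(gaussian_overlap(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def cs_divergence(xs, ys, sigma=1.0):
    """Cauchy-Schwarz divergence between the KDEs of two 1-D samples (assumed fixed bandwidth)."""
    ip_xx = cross_information_potential(xs, xs, sigma)
    ip_yy = cross_information_potential(ys, ys, sigma)
    ip_xy = cross_information_potential(xs, ys, sigma)
    return math.log(ip_xx) + math.log(ip_yy) - 2.0 * math.log(ip_xy)

def project(points, v):
    """Project each point onto direction v (dot product)."""
    return [sum(p_i * v_i for p_i, v_i in zip(p, v)) for p in points]

# Toy data (made up for illustration): two classes separated along the first axis only.
positives = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
negatives = [(3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]

# Compare two candidate projection directions by the divergence they induce.
d_first_axis = cs_divergence(project(positives, (1.0, 0.0)), project(negatives, (1.0, 0.0)))
d_second_axis = cs_divergence(project(positives, (0.0, 1.0)), project(negatives, (0.0, 1.0)))
print(d_first_axis, d_second_axis)
```

On this toy data the projection onto the first axis yields the larger divergence, which is the kind of direction such an objective favours; a real optimizer would search over unit-norm directions instead of comparing two by hand.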

Robust optimization of SVM hyperparameters in the classification of bioactive compounds

by Wojciech Czarnecki and Sabina Smusz

Journal of Cheminformatics, 2015

Background: Support Vector Machine has become one of the most popular machine learning tools used in virtual screening campaigns aimed at finding new drug candidates. Although it can be extremely effective in finding new potentially active compounds, its application requires the optimization of the hyperparameters with which the assessment is being run, particularly the C and γ values. The optimization requirement, in turn, establishes the need to develop fast and effective approaches to the optimization procedure, providing the best predictive power of the constructed model.

Compounds Activity Prediction in Large Imbalanced Datasets with Substructural Relations Fingerprint and EEM

2015 IEEE Trustcom/BigDataSE/ISPA, 2015

Extremely Randomized Machine Learning Methods for Compound Activity Prediction

Molecules, 2015

Speed, a relatively low requirement for computational resources and high effectiveness of the evaluation of the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, due to the growth of the amount of data also in cheminformatics and related fields, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of previously existing methods to obtain results more quickly. In the study, we tested two approaches belonging to the group of so-called 'extremely randomized methods' (Extreme Entropy Machine and Extremely Randomized Trees) for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their 'non-extreme' competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches were not only found to improve the efficiency of the classification of bioactive compounds, but they also proved to be less computationally complex, requiring fewer steps to perform an optimization procedure.

Cluster based RBF Kernel for Support Vector Machines

Eprint Arxiv 1408 2869, Aug 12, 2014

We call the constructed method C$_k$RBF, where $k$ stands for the number of clusters used in k-me...

Weighted Tanimoto Extreme Learning Machine with Case Study in Drug Discovery

IEEE Computational Intelligence Magazine, 2015

Multithreshold Entropy Linear Classifier: Theory and applications

Expert Systems with Applications, 2015

Linear classifiers separate the data with a hyperplane. In this paper we focus on a novel method of constructing a multithreshold linear classifier, which separates the data with multiple parallel hyperplanes. The proposed model is based on information-theoretic concepts, namely Renyi's quadratic entropy and the Cauchy-Schwarz divergence.

Cluster based RBF Kernel for Support Vector Machines

In classical Gaussian SVM classification we use a feature space projection transforming points to normal distributions with fixed covariance matrices (identity in the standard RBF and the covariance of the whole dataset in Mahalanobis RBF). In this paper we add additional information to Gaussian SVM by considering a local geometry-dependent feature space projection. We emphasize that our approach is in fact an algorithm for the construction of a new Gaussian-type kernel.

The most interesting effect of using the proposed method is easier metaparameter selection. In practice, many applied researchers (for example in cheminformatics [17, 4, 12]) neglect metaparameter optimization and use the default values. In most existing SVM libraries (including libSVM and WEKA), the default value of the C metaparameter is 1. Table 4 shows the accuracy obtained by the considered models once we narrow down to the optimization of only γ. C$_k$RBF obtains significantly better results than both RBF and mRBF in most cases. It achieves worse performance than the RBF kernel in only two tests, where mRBF also behaved worse, which simply shows that in these datasets covariance-based geometry is not a good base for building a kernel.

Table 4: Comparison of accuracy obtained by different kernels when using the default parameter value C = 1. For C$_k$RBF, k-means is used as the clustering technique.

Two ellipsoid Support Vector Machines

Expert Systems with Applications, 2014

In classification problems classes usually have different geometrical structure, and therefore it seems natural for each class to have its own margin type. Existing methods using this principle lead to optimization problems that differ from that of SVM. Although they outperform the standard model, they also prevent the use of existing SVM libraries. We propose an approach, named 2eSVM, which allows such a method to be used within the classical SVM framework. This makes it possible to perform a detailed comparison with the standard SVM. It turns out that classes in the resulting feature space are geometrically easier to separate and the trained model has better generalization properties. Moreover, based on evaluation on standard datasets, 2eSVM brings considerable benefit for the linear classification process in terms of training time and quality. We also construct the 2eSVM kernelization and perform the evaluation on the 5-HT2A ligand activity prediction problem (real, fingerprint-based data from the cheminformatic domain), which shows increased classification quality as well as reduced training time and complexity of the resulting model.

Machine Learning with Known Input Data Uncertainty Measure

Lecture Notes in Computer Science, 2013

Uncertainty of the input data is a common issue in machine learning. In this paper we show how one can incorporate knowledge of an uncertainty measure regarding particular points in the training set. This may boost the model's accuracy as well as reduce overfitting. We show an approach based on the classical training with jitter for Artificial Neural Networks (ANNs). We prove that our method, which can be applied to a wide class of models, is approximately equivalent to generalised Tikhonov regularisation learning. We also compare our results with some alternative methods. In the end we discuss further prospects and applications.

InFeST – ImageJ Plugin for Rapid Development of Image Segmentation Pipelines

Advances in Intelligent Systems and Computing, 2014

Uncertainty of the in Vitro Experiments in the Construction of Predictive Models

by D. Warszycki and Wojciech Czarnecki

Exploiting uncertainty measures in compounds activity prediction using support vector machines

by D. Warszycki, Sabina Smusz, and Wojciech Czarnecki

Bioorganic & Medicinal Chemistry Letters, 2015

The great majority of molecular modeling tasks require the construction of a model that is then used to evaluate new compounds. Although various types of these models exist, at some stage, they all use knowledge about the activity of a given group of compounds, and the performance of the models is dependent on the quality of these data. Biological experiments verifying the activity of chemical compounds are often not reproducible; hence, databases containing these results often possess various activity records for a given molecule. In this study, we developed a method that incorporates the uncertainty of biological tests in machine-learning-based experiments using the Support Vector Machine as a classification model. We show that the developed methodology improves the classification effectiveness in the tested conditions.

Adaptive Active Learning as a Multi-armed Bandit Problem

Designing a competition for autonomous robots with a restricted set of sensors with a case study of LEGO NXT

by Wojciech Czarnecki and Andrzej Wójtowicz

Journal of Automation, Mobile Robotics & Intelligent Systems, 2014

Arrangements for a competition are not only difficult in terms of the logistics of the event, but also require an assurance of quality. In this paper we analyze limitations which arise from the design of a contest for robots equipped with a very poor sensor set. This issue is faintly explored: up to now research work has usually focused on the results of a certain task and, in addition, it assumed an almost free hand in the choice of components. The discussed question is significant on the grounds of primary principles: objectivity in grading, equal opportunities among participants and preservation of the attractiveness of the tournament at the same time.
