2006, Machine Learning
Backpropagation, similar to most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an intuitive approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using an implicit search of common error minimization heuristics, such as sum-squared-error (SSE) and cross-entropy (CE). CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
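The CB1 idea of backpropagating error only for misclassified patterns, and only from "culprit" output nodes, can be illustrated with a small sketch. This is a minimal approximation of that idea, not the authors' exact formulation: a pattern contributes no error if its target output already exceeds every competitor, and otherwise error is assigned only to the target node and to competitor nodes whose activations tie or exceed it.

```python
# Minimal sketch of a CB1-style, classification-based error signal (illustrative,
# not the paper's exact rule): error is produced only for misclassified patterns
# and only at the target node and the "culprit" competitor nodes.
import numpy as np

def cb1_style_error(outputs: np.ndarray, target_idx: int) -> np.ndarray:
    """Return a per-output error signal with the same shape as `outputs`."""
    error = np.zeros_like(outputs)
    target_out = outputs[target_idx]
    competitors = np.delete(np.arange(outputs.size), target_idx)
    top_competitor = outputs[competitors].max()
    if target_out > top_competitor:
        return error                                  # correctly classified: nothing backpropagated
    culprits = [i for i in competitors if outputs[i] >= target_out]
    error[target_idx] = top_competitor - target_out   # push the target output up
    for i in culprits:
        error[i] = -(outputs[i] - target_out)         # push culprit outputs down
    return error

# Example: the target class (index 2) is not the highest output, so errors are nonzero.
print(cb1_style_error(np.array([0.30, 0.55, 0.40]), target_idx=2))
```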
Neural Processing Letters, 2006
Effective backpropagation training of multi-layer perceptrons depends on the incorporation of an appropriate error or objective function. Classification-based (CB) error functions are heuristic approaches that attempt to guide the network directly to correct pattern classification rather than using common error minimization heuristics, such as sum-squared error and cross-entropy, which do not explicitly minimize classification error. This work presents CB3, a novel CB approach that learns the error function to be used while training. This is accomplished by learning pattern confidence margins during training, which are used to dynamically set output target values for each training pattern. On eleven applications, CB3 significantly outperforms previous CB error functions, and also reduces average test error over conventional error metrics using 0-1 targets without weight decay by 1.8%, and by 1.3% over metrics with weight decay. CB3 also exhibits lower model variance and a tighter mean confidence interval.
Neural Net Classifier (NNC). The objective is to compare the performance of BP, SA, and BCGA in terms of yielding accurate outputs in less time. Four standard datasets, namely Iris, Diabetes, Glass, and Teaching Assistant Evaluation (TAE), are used for implementation and testing. It is observed that BP loses its merit as a good NNC optimizer because of its tendency to become trapped in local minima. SA, on the other hand, takes more time to converge but yields better classification accuracy than BP. BCGA, being a global search algorithm, is able to find a better solution among all available solutions and thus yields the best accuracy, but with a slow convergence rate. Based on these observations, the paper suggests that SA may be a reasonable choice for NNC optimization when both accuracy and convergence speed are considered.
Biological and Artificial Intelligence Environments, 2005
One way of using entropy criteria in learning systems is to minimize the entropy of the error between two variables: typically, one is the output of the learning system and the other is the target. This framework has been used for regression. In this paper we show how to use the minimization of the entropy of the error for classification. Minimizing the entropy of the error implies a constant value for the errors, which in general does not imply that the errors are zero. In regression, this problem is solved by shifting the final result so that its average equals the average value of the desired target. We prove that, under mild conditions, this algorithm, when used in a classification problem, makes the error converge to zero and can thus be used in classification.
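Error-entropy minimization is commonly implemented with Renyi's quadratic entropy estimated by a Gaussian Parzen window over the error samples. The following is a minimal sketch under that assumption (the kernel width sigma is a free choice, and the error samples are hypothetical); minimizing the entropy amounts to maximizing the "information potential" below, and constant errors yield the minimum entropy, which is the property the abstract discusses.

```python
# Minimal sketch of error-entropy minimization (MEE), assuming Renyi's quadratic
# entropy with a Gaussian Parzen estimate over the error samples e_i = y_i - t_i.
import numpy as np

def information_potential(errors: np.ndarray, sigma: float = 0.5) -> float:
    # Pairwise Gaussian interactions between error samples (kernel width sigma*sqrt(2)).
    diffs = errors[:, None] - errors[None, :]
    kernel = np.exp(-diffs**2 / (4 * sigma**2)) / np.sqrt(4 * np.pi * sigma**2)
    return kernel.mean()

def renyi_quadratic_entropy(errors: np.ndarray, sigma: float = 0.5) -> float:
    # Minimizing this entropy is equivalent to maximizing the information potential.
    return -np.log(information_potential(errors, sigma))

errors = np.array([0.1, -0.2, 0.05, 0.15])        # hypothetical error samples
print(renyi_quadratic_entropy(errors))
print(renyi_quadratic_entropy(np.full(4, 0.1)))   # constant errors give the minimum entropy
```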
Neural Networks, IEEE …, 2002
Constructive learning algorithms offer an attractive approach for the incremental construction of near-minimal neural network architectures for pattern classification. They help overcome the need for ad hoc and often inappropriate choices of network topology in the use of algorithms that search for suitable weights in a priori fixed network architectures. Several such algorithms have been proposed in the literature and are shown to converge to zero classification errors (under certain assumptions) on tasks that involve learning a binary to binary mapping (i.e., classification problems involving binary valued input attributes and two output categories). We present two constructive learning algorithms, MPyramid-real and MTiling-real, that extend the pyramid [19] and tiling [32] algorithms respectively for learning real to M-ary mappings (i.e., classification problems involving real valued input attributes and multiple output classes). We prove the convergence of these algorithms and empirically demonstrate their applicability on practical pattern classification problems. Additionally, we show how the incorporation of a local pruning step can eliminate several redundant neurons from MTiling-real networks.
2020 IEEE Congress on Evolutionary Computation (CEC), 2020
As the complexity of neural network models has grown, it has become increasingly important to optimize their design automatically through metalearning. Methods for discovering hyperparameters, topologies, and learning rate schedules have led to significant increases in performance. This paper shows that loss functions can be optimized with metalearning as well, and result in similar improvements. The method, Genetic Loss-function Optimization (GLO), discovers loss functions de novo, and optimizes them for a target task. Leveraging techniques from genetic programming, GLO builds loss functions hierarchically from a set of operators and leaf nodes. These functions are repeatedly recombined and mutated to find an optimal structure, and then a covariance-matrix adaptation evolutionary strategy (CMA-ES) is used to find optimal coefficients. Networks trained with GLO loss functions are found to outperform the standard cross-entropy loss on standard image classification tasks. Training with these new loss functions requires fewer steps, results in lower test error, and allows for smaller datasets to be used. Loss function optimization thus provides a new dimension of metalearning, and constitutes an important step towards AutoML.
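A candidate loss in this kind of genetic-programming search is an expression tree over a small operator set. The sketch below only shows how one such candidate might be represented and evaluated; the operator set, the tuple encoding, and the example tree are assumptions for illustration, and the evolutionary loop (recombination, mutation, CMA-ES coefficient tuning) is omitted.

```python
# Minimal sketch of evaluating one GLO-style candidate loss represented as an
# expression tree (illustrative encoding; the search over trees is not shown).
import numpy as np

def eval_tree(node, y, p):
    """Evaluate an expression tree on targets y and predictions p."""
    op, args = node[0], node[1:]
    if op == "y":     return y
    if op == "p":     return p
    if op == "const": return np.full_like(p, args[0])
    if op == "add":   return eval_tree(args[0], y, p) + eval_tree(args[1], y, p)
    if op == "mul":   return eval_tree(args[0], y, p) * eval_tree(args[1], y, p)
    if op == "log":   return np.log(np.clip(eval_tree(args[0], y, p), 1e-12, None))
    raise ValueError(f"unknown operator: {op}")

# A candidate equivalent to -y*log(p), i.e. cross-entropy rediscovered by the search.
candidate = ("mul", ("const", -1.0), ("mul", ("y",), ("log", ("p",))))
y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.6])
print(eval_tree(candidate, y, p).mean())   # mean loss of this candidate on the batch
```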
International Journal of Computing and Digital Systems, 2019
ANN is a very well-known approach used for classification based on supervised machine learning. This approach faces some issues, notably the local minima problem, which leads to diminished accuracy in the results. To solve the problem of local minima, a new algorithm called RPSOGAC has been proposed. The proposed algorithm combines the strengths of both optimization algorithms PSO and GA to improve the classification accuracy of ANNs. RPSOGAC starts by finding the best weights that lead to the best ANN classification result using the backpropagation algorithm and adds these weights to the initial population. The other individuals of the population are randomly generated. Based on randomness, the algorithm reciprocally and continually switches between applying the GA and PSO algorithms until it reaches the best solution. There are two major differences between RPSOGAC and previous algorithms: first, the random selection between GA and PSO, which gives both equal opportunity to improve the classification; second, during PSO, a competition between two population sets is performed to produce a new population containing the best individuals, giving promising individuals a chance to improve further in later iterations. Various experiments on six datasets drawn from four domains have been conducted to evaluate the classification accuracy of RPSOGAC. Also, a comparative study has been performed to compare the classification accuracy of RPSOGAC with that of other algorithms. The obtained results show that RPSOGAC outperforms the other approaches on four datasets, and on the remaining two the results are very close.
American Journal of Physics and Applications, 2014
In a remote sensing system operating in a complex and changing environment that requires quick and informed decisions, connectionist methods have made a significant contribution, in particular to the reduction and classification of spectral data. In this context, this paper studies the parameters that optimize the results of a multilayer-perceptron-based artificial neural network (ANN) for the classification of chemical agents in multi-spectral images. The mean squared error cost function remains one of the major factors governing network convergence during the learning phase, and our approach addresses it by improving gradient descent with the conjugate gradient method, which appears fast and efficient.
1998
Various techniques for optimizing the multiple-class cross-entropy error function to train single hidden layer neural network classifiers with softmax output transfer functions are investigated on a real-world multispectral pixel-by-pixel classification problem that is of fundamental importance in remote sensing. These techniques include epoch-based and batch versions of backpropagation using gradient descent, PR-conjugate gradient, and BFGS quasi-Newton methods. The method of choice depends upon the nature of the learning task and whether one wants to optimize learning for speed or generalization performance. It was found that, comparatively considered, gradient descent error backpropagation provided the best and most stable out-of-sample performance results across batch and epoch-based modes of operation. If the goal is to maximize learning speed and a sacrifice in generalization is acceptable, then PR-conjugate gradient error backpropagation tends to be superior. If the training set is very large, stochastic epoch-based versions of local optimizers should be chosen, utilizing a larger rather than a smaller epoch size to avoid unacceptable instabilities in the generalization results.
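For the multiple-class cross-entropy criterion with softmax outputs, the gradient with respect to the output pre-activations reduces to the difference between the softmax probabilities and the one-hot targets, which is what all of the compared optimizers ultimately backpropagate. The sketch below shows one batch gradient-descent step on that criterion for a plain softmax layer; the data, learning rate, and the absence of a hidden layer are simplifying assumptions, not the paper's setup.

```python
# Minimal sketch of batch gradient descent on multi-class cross-entropy with
# softmax outputs (single softmax layer; data and hyperparameters are placeholders).
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                   # hypothetical feature vectors
T = np.eye(3)[rng.integers(0, 3, size=100)]     # one-hot targets, 3 classes
W = np.zeros((8, 3)); b = np.zeros(3)
lr = 0.1

for _ in range(200):                            # batch gradient descent
    P = softmax(X @ W + b)
    grad_z = (P - T) / len(X)                   # softmax + cross-entropy gradient
    W -= lr * X.T @ grad_z
    b -= lr * grad_z.sum(axis=0)

ce = -(T * np.log(softmax(X @ W + b) + 1e-12)).sum(axis=1).mean()
print(f"cross-entropy after training: {ce:.3f}")
```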
2009
The main problem for Supervised Multi-layer Neural Network (SMNN) models such as the backpropagation network lies in finding suitable weights during training in order to improve training time as well as achieve high accuracy. An important issue in the training process of existing SMNN models is the initialization of the weights, which is random, creates a paradox, and leads to low accuracy with high training time. In this paper, a new Supervised Feed Forward Multi-layer Neural Network (SFFMNN) model for classification problems is proposed. It includes a new preprocessing technique that combines data preprocessing and pre-training, offering a number of advantages: training cycles, the gradient of the mean square error function, and weight updates are not needed in this model. In the new SFFMNN model, thresholds for the training and test sets are computed using input values and potential weights. In the training set, each instance has one specific threshold and class label. In the test set, the threshold of each instance is compared with the range of thresholds of the training set, and the class label of each instance is predicted. To evaluate the performance of the proposed SFFMNN model, a series of experiments on the XOR problem and two datasets, SPECT Heart and SPECTF Heart, was carried out with 10-fold cross-validation. As noted in the literature, these two datasets are difficult for classification and most conventional methods do not perform well on them. Our results, however, show that the proposed model achieves high accuracy in one epoch without a training cycle.
2021
A novel model called the error loss network (ELN) is proposed to build an error loss function for supervised learning. The ELN is similar in structure to a radial basis function (RBF) neural network, but its input is an error sample and its output is the loss corresponding to that error sample. That is, the nonlinear input-output mapping of the ELN creates an error loss function. The proposed ELN provides a unified model for a large class of error loss functions, which includes some information theoretic learning (ITL) loss functions as special cases. The activation function, weight parameters and network size of the ELN can be predetermined or learned from the error samples. On this basis, we propose a new machine learning paradigm in which the learning process is divided into two stages: first, learning a loss function using an ELN; second, using the learned loss function to continue the learning. Experimental results are presented to demonstrate the desirable performance of the new method.
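Structurally, such a loss can be written as an RBF expansion over the error variable: a set of centers, widths, and output weights that together map an error sample to a loss value. The sketch below is a minimal illustration of that mapping only; the centers, widths, and weights shown are placeholders, whereas in the paper they can be predetermined or learned from the error samples before the second learning stage.

```python
# Minimal sketch of an ELN-style loss: an RBF expansion mapping an error sample
# to a loss value (centers, widths, and weights here are illustrative placeholders).
import numpy as np

class ErrorLossNetwork:
    def __init__(self, centers, weights, sigma=1.0):
        self.centers = np.asarray(centers, dtype=float)
        self.weights = np.asarray(weights, dtype=float)
        self.sigma = sigma

    def loss(self, e):
        """Loss assigned to error sample(s) e."""
        e = np.atleast_1d(e).astype(float)
        phi = np.exp(-(e[:, None] - self.centers[None, :])**2 / (2 * self.sigma**2))
        return phi @ self.weights

# With a single center at 0 and a negative weight, the loss is lowest when the
# error is 0 and rises toward 0 for large errors, recovering a correntropy-like
# (Gaussian) loss as a special case.
eln = ErrorLossNetwork(centers=[0.0], weights=[-1.0], sigma=0.5)
print(eln.loss([0.0, 0.5, 2.0]))
```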
1991
One connectionist approach to the classification problem, which has gained popularity in recent years, is the use of backpropagation-trained feed-forward neural networks. In practice, however, we find that the rate of convergence of net output error is especially low when training networks for multi-class problems. In this paper, we show that while backpropagation will reduce the Euclidean distance between the actual and desired output vectors, the difference between some of the components of these vectors will actually increase in the first iteration. Furthermore, the magnitudes of subsequent weight changes in each iteration are very small, so that many iterations are required to compensate for the increased error in some components in the initial iterations. We describe a modular network architecture to improve the rate of learning for such classification problems. Our basic approach is to reduce a K-class problem to a set of K two-class problems, with a separately trained network for each.
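The modular decomposition described here is essentially a one-vs-rest scheme: K independently trained binary networks, whose scores are combined at prediction time. A minimal sketch follows; scikit-learn's MLPClassifier is used purely as a stand-in for the separately trained backpropagation sub-networks, and the synthetic data and hyperparameters are illustrative.

```python
# Minimal sketch of a K-class problem decomposed into K two-class problems,
# each handled by a separately trained binary network (MLPClassifier is a stand-in).
import numpy as np
from sklearn.neural_network import MLPClassifier

class OneVsRestModular:
    def __init__(self, n_classes, **mlp_kwargs):
        self.nets = [MLPClassifier(**mlp_kwargs) for _ in range(n_classes)]

    def fit(self, X, y):
        for k, net in enumerate(self.nets):
            net.fit(X, (y == k).astype(int))        # train sub-network: class k vs. the rest
        return self

    def predict(self, X):
        scores = np.column_stack([net.predict_proba(X)[:, 1] for net in self.nets])
        return scores.argmax(axis=1)                # highest per-class score wins

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 2 * (X[:, 2] > 0.5)   # hypothetical 4-class labels
model = OneVsRestModular(n_classes=4, hidden_layer_sizes=(8,), max_iter=500).fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())
```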
Neural Networks, 2001
The Constraint Based Decomposition (CBD) is a constructive neural network technique that builds a three or four layer network, has guaranteed convergence and can deal with binary, n-ary, class labeled and real-value problems. CBD is shown to be able to solve complicated problems in a simple, fast and reliable manner. The technique is further enhanced by two modifications (locking detection and redundancy elimination) which address the training speed and the efficiency of the internal representation built by the network. The redundancy elimination aims at building more compact architectures while the locking detection aims at improving the training speed. The computational cost of the redundancy elimination is negligible and this enhancement can be used for any problem. However, the computational cost of the locking detection is exponential in the number of dimensions and should only be used in low dimensional spaces. The experimental results show the performance of the algorithm presented in a series of classical benchmark problems including the 2-spiral problem and the Iris, Wine, Glass, Lenses, Ionosphere, Lung cancer, Pima Indians, Bupa, TicTacToe, Balance and Zoo data sets from the UCI machine learning repository. CBD's generalization accuracy is compared with that of C4.5, C4.5 with rules, incremental decision trees, oblique classifiers, linear machine decision trees, CN2, learning vector quantization (LVQ), backpropagation, nearest neighbor, Q* and radial basis functions (RBFs). CBD provides the second best average accuracy on the problems tested as well as the best reliability (the lowest standard deviation).
Enfoque UTE, 2024
Artificial neural networks (ANNs) have become indispensable tools for solving classification tasks across various domains. This systematic literature review explores the landscape of ANN utilization in classification, addressing three key research questions: the types of architectures employed, their accuracy, and the data utilized. The review encompasses 30 studies published between 2019 and 2024, revealing Convolutional Neural Networks (CNNs) as the predominant architecture in image-related tasks, followed by Multilayer Perceptron (MLP) architectures for general classification tasks. Feed Forward Neural Networks (FFNN) exhibited the highest average accuracy at 97.12%, with specific studies achieving exceptional results across diverse classification tasks. Moreover, the review identifies digitized images as a commonly utilized data source, reflecting the broad applicability of ANNs in tasks such as medical diagnosis and remote sensing. The findings underscore the importance of machine learning approaches, highlight the robustness of ANNs in achieving high accuracy, and suggest avenues for future research to enhance interpretability, efficiency, and generalization capabilities, as well as address challenges related to data quality.
arXiv (Cornell University), 2021
We design a new adaptive learning algorithm for misclassification cost problems that attempts to reduce the cost of misclassified instances derived from the consequences of various errors. Our algorithm (adaptive cost-sensitive learning, AdaCSL) adaptively adjusts the loss function such that the classifier bridges the difference between the class distributions of subgroups of samples in the training and test data sets that have similar predicted probabilities (i.e., local training-test class distribution mismatch). We provide some theoretical performance guarantees for the proposed algorithm and present empirical evidence that a deep neural network used with the proposed AdaCSL algorithm yields better cost results on several binary classification data sets with class-imbalanced and class-balanced distributions compared to other alternative approaches.
The standard back propagation algorithm for training artificial neural networks utilizes two terms, a learning rate and a momentum factor. The major limitations of this standard algorithm are the existence of temporary, local minima resulting from the saturation behaviour of the activation function, and the slow rates of convergence. Previous research demonstrated that in the 'feed forward' algorithm, the slope of the activation function is directly influenced by a parameter referred to as 'gain'. This research proposes an algorithm for improving the performance of the back propagation algorithm by introducing an adaptive gain of the activation function. The efficiency of the proposed algorithm is compared with the conventional gradient descent method and verified by means of simulation on four classification problems. In learning the patterns, the simulation results demonstrate that the proposed method converged faster on the Wisconsin breast cancer and diabetes classification problems, with improvement ratios of nearly 2.8 and 1.2 respectively, performed 65% better on the thyroid data set, and achieved 97% success on the IRIS classification problem. The results clearly show that the proposed algorithm significantly improves the learning speed of the conventional back-propagation algorithm.
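The role of the gain parameter is easy to see in code: for a sigmoid activation, the gain directly scales the slope, and its own gradient is available for adaptation during training. The sketch below only illustrates this relationship; the gradient expressions are standard calculus for the gain-scaled sigmoid, and the adaptation rule itself (how the paper updates the gain) is not reproduced here.

```python
# Minimal sketch of a sigmoid activation with an explicit gain parameter c,
# showing that the slope of the activation scales with the gain.
import numpy as np

def sigmoid(x, gain):
    return 1.0 / (1.0 + np.exp(-gain * x))

def sigmoid_deriv_wrt_x(x, gain):
    s = sigmoid(x, gain)
    return gain * s * (1 - s)          # slope grows with the gain

def sigmoid_deriv_wrt_gain(x, gain):
    s = sigmoid(x, gain)
    return x * s * (1 - s)             # gradient used when adapting the gain itself

x = np.linspace(-3, 3, 7)
for c in (0.5, 1.0, 2.0):
    print(c, np.round(sigmoid_deriv_wrt_x(x, c).max(), 3))   # steeper maximum slope at higher gain
```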
Pattern Recognition, 2014
This paper presents a new loss function for neural network classification, inspired by the recently proposed similarity measure called Correntropy. We show that this function essentially behaves like the conventional square loss for samples that are well within the decision boundary and have small errors, and like the L0 or counting norm for samples that are outliers or are difficult to classify. Depending on the value of the kernel size parameter, the proposed loss function moves smoothly from convex to non-convex and becomes a close approximation to the misclassification loss (ideal 0-1 loss). We show that the discriminant function obtained by optimizing the proposed loss function in the neighborhood of the ideal 0-1 loss function to train a neural network is immune to overfitting, more robust to outliers, and has consistent and better generalization performance as compared to other commonly used loss functions, even after prolonged training. The results also show that it is a close competitor to the SVM. Since the proposed method is compatible with simple gradient based online learning, it is a practical way of improving the performance of neural network classifiers.
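A minimal sketch of such a correntropy-induced loss is given below: for small errors it behaves like a scaled square loss, and for large errors it saturates, approximating the 0-1 misclassification loss. The normalization constant and the printed values are illustrative; the kernel size sigma controls how quickly the loss moves between the two regimes, matching the convex-to-non-convex behaviour described in the abstract.

```python
# Minimal sketch of a correntropy-induced loss: quadratic for small errors,
# saturating (0-1-like) for large errors; sigma is the kernel size parameter.
import numpy as np

def c_loss(error, sigma=1.0):
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2 * sigma**2)))   # scales the loss to 1 at |error| = 1
    return beta * (1.0 - np.exp(-error**2 / (2 * sigma**2)))

errors = np.array([0.1, 0.5, 1.0, 3.0, 10.0])
for s in (0.5, 1.0, 2.0):
    print(s, np.round(c_loss(errors, sigma=s), 3))        # saturates for outlier-sized errors
```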
IEEE Transactions on Systems, Man, and Cybernetics, 1992
While multilayer neural networks (NN's) are a powerful tool for supervised classification, their intrinsic nonlinearity often leads to slow convergence or divergence when the training sets include multimodal and/or overlapping classes. Well-known optimization techniques improve classification performance and convergence rate and reduce the tendency for divergence. Optimization techniques are also applied to the development of a noniterative perceptron-like algorithm, called vector valued perceptrons (VVP). A comparison of the VVP and the backpropagation (BP) algorithms for supervised classification indicates that the performance of VVP's is comparable to BP. VVP's are capable of solving multiclass classification problems such as the exclusive-or problem, but require significantly less time (by as much as several orders of magnitude) than BP. This is especially the case for sample data with overlapping classes, where BP may converge very slowly, perform poorly or diverge. VVP's applied as an adjunct and preprocessor for NN's in such cases result in improved NN classification performance and a reduction in computational time.
1996 8th European Signal Processing Conference (EUSIPCO 1996), 1996
This paper presents a novel architecture based on a constructive algorithm that allows the network to grow according to both supervised and unsupervised criteria. The main goal is to end up with a set of discriminant functions able to solve a multi-class classification problem. The main difference from well-known NN classifiers lies in the fact that training is performed over labeled sets of patterns that we call high-level structures (HLS). Every set contains patterns linked to each other by some physical evidence, like neighboring pixels in a subimage or a time sequence of frequency vectors in a speech utterance, but the membership of every individual pattern in the high-level structure may not be so clear. This architecture has been tested on a number of artificial and real data sets with very good results. We are now applying the algorithm to the classification of real images drawn from the database created for the ALINSPEC project.