Papers by Peter Auer

Proceedings of the Sixth Annual Conference on Computational Learning Theory - COLT '93, 1993
We investigate the implications of noise in the equivalence query model. Besides some results for general target and hypothesis classes, we prove bounds on the learning complexity of d-dimensional rectangles (of size at most n^d) in the case where only rectangles are allowed as hypotheses. Our noise model assumes that a certain fraction of the examples is noisy. We show that d-dimensional rectangles are learnable if and only if the fraction of noisy examples is less than 1/(d + 1), where learnable means that the learner can learn the target from a finite number of examples. Besides this structural result we present an algorithm which learns rectangles in polynomial time using polynomially many examples, provided the fraction of noise r is sufficiently below this threshold. As a related result we prove for the noise-free case that the number of examples necessary to learn is at least Ω(d log n), while the best known upper bound on the learning complexity is O(d² log n).
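In compact form, the abstract's main quantitative claims (the symbol m for the number of examples is chosen here for readability):

\[
r < \frac{1}{d+1} \;\Longleftrightarrow\; d\text{-dimensional rectangles are learnable},
\qquad
\Omega(d \log n) \;\le\; m \;\le\; O(d^{2} \log n) \;\;\text{(noise-free case)}.
\]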
Proceedings of IEEE 36th Annual Symposium on Foundations of Computer Science
Littlestone developed a simple deterministic on-line learning algorithm for learning k-literal disjunctions. This algorithm (called Winnow) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of Winnow and prove bounds for an adaptation of the algorithm for the case when the disjunction may change over time. In this case a possible target disjunction schedule T is a sequence of disjunctions (one per trial) and the shift size is the total number of literals that are added/removed from the disjunctions as one progresses through the sequence. We develop an algorithm that predicts nearly as well as the best disjunction schedule for an arbitrary sequence of examples. This algorithm, which allows us to track the predictions of the best disjunction, is hardly more complex than the original version. However, the amortized analysis needed for obtaining worst-case mistake bounds requires new techniques. In some cases our lower bounds show that the upper bounds of our algorithm have the right constant in front of the leading term in the mistake bound and almost the right constant in front of the second leading term. Computer experiments support our theoretical findings.
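For concreteness, here is a minimal sketch of the basic deterministic Winnow update that the paper builds on; the randomized and shifting-target variants add further machinery on top of this. The promotion/demotion factor and the threshold are conventional choices, not taken from the paper.

```python
import numpy as np

def winnow(examples, n, alpha=2.0):
    """Basic Winnow sketch: multiplicative updates for learning a
    monotone k-literal disjunction over n Boolean variables.
    examples yields (x, y) with x a 0/1 vector and y a 0/1 label."""
    w = np.ones(n)                    # one weight per variable
    threshold = n                     # conventional threshold choice
    mistakes = 0
    for x, y in examples:
        x = np.asarray(x)
        pred = 1 if w @ x >= threshold else 0
        if pred != y:
            mistakes += 1
            if y == 1:                # false negative: promote active weights
                w[x == 1] *= alpha
            else:                     # false positive: demote active weights
                w[x == 1] /= alpha
    return w, mistakes
```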
We present theoretical results in terms of lower and upper bounds on the query complexity of noisy search with comparative feedback. In this search model, the noise in the feedback depends on the distance between query points and the search target. Consequently, the error probability in the feedback is not fixed but varies for the queries posed by the search algorithm. Our results show that a target out of n items can be found in O(log n) queries. We also show the surprising result that for k possible answers per query, the speedup is not log k (as for k-ary search) but only log log k in some cases.
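As a toy illustration of searching under noisy comparisons, here is a binary search that repeats each query and takes a majority vote. Note that this assumes a fixed error probability per comparison, whereas in the paper's model the error probability depends on the distance between the query point and the target, so the sketch below is only a simplification.

```python
import random

def noisy_binary_search(n, target, p_err=0.2, reps=15):
    """Find `target` in 0..n-1 using noisy 'is the target <= m?'
    comparisons.  Each comparison is wrong with probability p_err;
    a majority vote over `reps` repetitions recovers each answer with
    high probability, keeping the number of search rounds at O(log n)."""
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        votes = sum(
            (target <= mid) != (random.random() < p_err)  # noisy answer
            for _ in range(reps)
        )
        if votes > reps // 2:         # majority says target is left of mid
            hi = mid
        else:
            lo = mid + 1
    return lo
```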
We present a multiclass classification system for gray value images through boosting. The feature selection is done using the LPBoost algorithm which selects suitable features of adequate type. In our experiments we use up to nine different kinds of feature types simultaneously. Furthermore, a greedy search strategy within the weak learner is used to find simple geometric relations between selected features from previous boosting rounds. The final hypothesis can also consist of more than one geometric model for an object class. Finally, we provide a weight optimization method for combining the learned one-vs-one classifiers for the multiclass classification. We tested our approach on a publicly available data set and compared our results to other state-of-the-art approaches, such as the "bag of keypoints" method.
We investigate a variant of the on-line learning model for classes of {0, 1}-valued functions (concepts) in which the labels of a certain amount of the input instances are corrupted by adversarial noise. We propose an extension of a general learning strategy, known as "Closure Algorithm", to this noise model, and show a worst-case mistake bound of m+(d+1)K for learning an arbitrary intersection-closed concept class C, where K is the number of noisy labels, d is a combinatorial parameter measuring C's complexity, and m is the worst-case mistake bound of the Closure Algorithm for learning C in the noise-free model. For several concept classes our extended Closure Algorithm is efficient and can tolerate a noise rate up to the information-theoretic upper bound. Finally, we show how to efficiently turn any algorithm for the on-line noise model into a learning algorithm for the PAC model with malicious noise.
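A minimal sketch of the noise-free Closure Algorithm, instantiated for axis-parallel rectangles as the intersection-closed class; the paper's extension, which additionally guards against up to K flipped labels, is not implemented here.

```python
import numpy as np

def closure_algorithm(stream):
    """Closure Algorithm sketch for axis-parallel rectangles in R^d:
    the hypothesis is always the smallest rectangle (the closure)
    containing all positive examples seen so far.
    stream yields (x, y) with x a point in R^d and y a 0/1 label."""
    lo = hi = None                # current bounding box; None = empty hypothesis
    mistakes = 0
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        inside = lo is not None and np.all(lo <= x) and np.all(x <= hi)
        pred = 1 if inside else 0
        if pred != y:
            mistakes += 1
        if y == 1:                # closure update: grow the box to cover x
            if lo is None:
                lo, hi = x.copy(), x.copy()
            else:
                lo, hi = np.minimum(lo, x), np.maximum(hi, x)
    return (lo, hi), mistakes
```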
We investigate models for content-based image retrieval with relevance feedback, in particular focusing on the exploration-exploitation dilemma. We propose quantitative models for the user behavior and investigate implications of these models. Three search algorithms for efficient searches based on the user models are proposed and evaluated. In the first model a user queries a database for the most (or a sufficiently) relevant image. The user gives feedback to the system by selecting the most relevant image from a number of images presented by the system. In the second model we consider a filtering task where relevant images should be extracted from a database and presented to the user. The feedback of the user is a binary classification of each presented image as relevant or irrelevant. While these models are related, they differ significantly in the kind of feedback provided by the user. This requires very different mechanisms to trade off exploration (finding out what the user wants) and exploitation (serving images which the system believes relevant for the user).
We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "real-world" datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAC-learning algorithm is shown to be applicable to "real-world" classification problems. Since one can prove that T2 is an agnostic PAC-learning algorithm, T2 is guaranteed to produce close to optimal 2-level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for which no guarantee can be given about their performance on new datasets. We also demonstrate that this algorithm T2 can be used as a diagnostic tool for the investigation of the expressive limits of 2-level decision trees. Finally, T2, in combination with new bounds on the VC-dimension of decision trees of bounded depth that we derive, provides us for the first time with the tools necessary for comparing learning curves of decision trees for "real-world" datasets with the theoretical estimates of PAC-learning theory.
One of the striking differences between current reinforcement learning algorithms and early human learning is that animals and infants appear to explore their environments with autonomous purpose, in a manner appropriate to their current level of skills. An important intuition for autonomously motivated exploration was proposed by Schmidhuber [1,2]: an agent should be interested in making observations that reduce its uncertainty about future observations. However, there is not yet a theoretical analysis of the usefulness of autonomous exploration with respect to the overall performance of a learning agent. We discuss models for a learning agent's autonomous exploration and present some recent results. In particular, we investigate the exploration time for navigating effectively in a Markov Decision Process (MDP) without rewards, and we consider extensions to MDPs with infinite state spaces.
In content-based image retrieval (CBIR) with relevance feedback we would like to retrieve relevant images based on their content features and the feedback given by users. In this paper we view CBIR as an Exploration-Exploitation problem and apply a kernel version of the LinRel algorithm to solve it. By using multiple feature extraction methods and utilising the feedback given by users, we adopt a strategy of multiple kernel learning to find a relevant feature space for the kernel LinRel algorithm. We call this algorithm LinRelMKL. Furthermore, when we have access to eye movement data of users viewing images we can enrich our (multiple) feature spaces by using a tensor kernel SVM. When learning in this enriched space we show that we can significantly improve the search results over the LinRel and LinRelMKL algorithms. Our results suggest that the use of exploration-exploitation with multiple feature spaces is an efficient way of constructing CBIR systems, and that when eye movement features are available, they should be used to help improve CBIR.
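A sketch of the confidence-bound selection rule underlying LinRel, in its simpler LinUCB-style form and without the kernelization and multiple-kernel learning the paper adds; parameter names are illustrative.

```python
import numpy as np

class LinUCBStyle:
    """Confidence-bound exploration sketch: score each candidate image by
    estimated relevance plus the width of a confidence interval, so both
    apparently relevant images (exploitation) and uncertain ones
    (exploration) get shown to the user."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)          # regularized Gram matrix of seen features
        self.b = np.zeros(dim)        # feedback-weighted feature sum
        self.alpha = alpha            # exploration strength

    def select(self, X):              # X: candidate features, shape (m, dim)
        A_inv = np.linalg.inv(self.A)
        w = A_inv @ self.b            # ridge-regression relevance estimate
        width = np.sqrt(np.sum((X @ A_inv) * X, axis=1))
        return int(np.argmax(X @ w + self.alpha * width))

    def update(self, x, reward):      # reward: user's relevance feedback
        self.A += np.outer(x, x)
        self.b += reward * x
```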
We present a PAC-learning algorithm and an on-line learning algorithm for nested differences of intersection-closed classes. Examples of intersection-closed classes include axis-parallel rectangles, monomials, and linear sub-spaces. Our PAC-learning algorithm uses a pruning technique that we rigorously prove correct. As a result we show that the tolerable noise rate for this algorithm does not depend on the complexity (VC-dimension) of the target class but only on the VC-dimension of the underlying intersection-closed class. For our on-line algorithm we show an optimal mistake bound in the sense that there are concept classes for which each on-line learning algorithm (using nested differences as hypotheses) can be forced to make at least that many mistakes.
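To make the hypothesis class concrete, a small sketch of how membership in a nested difference is evaluated; the learning algorithms themselves (which build the chain from closures of the data) are omitted.

```python
def in_nested_difference(x, chain):
    """Membership in a nested difference c1 \\ (c2 \\ (c3 \\ ...)):
    for a nested chain c1 >= c2 >= c3 >= ..., x is positive iff the
    number of prefix sets containing x is odd.  `chain` is a list of
    membership predicates, each set containing the next."""
    depth = 0
    for contains in chain:
        if contains(x):
            depth += 1
        else:
            break
    return depth % 2 == 1

# Example with nested intervals [0,10] >= [2,8] >= [4,6]:
chain = [lambda x: 0 <= x <= 10, lambda x: 2 <= x <= 8, lambda x: 4 <= x <= 6]
assert in_nested_difference(1, chain) and in_nested_difference(5, chain)
assert not in_nested_difference(3, chain) and not in_nested_difference(12, chain)
```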
RAIRO - Theoretical Informatics and Applications, 2006
We study the problem of designing a distributed voting scheme for electing a candidate that maximizes the preferences of a set of agents. We assume the preference of agent i for candidate j is a real number xi,j, and we do not make any assumptions on the mechanism generating these preferences. We show simple randomized voting schemes guaranteeing the election of a candidate whose expected total preference is nearly the highest among all candidates. The algorithms we consider are designed so that each agent has to disclose only a few bits of information from his preference table. Finally, in the important special case in which each agent is forced to vote for at most one candidate we show that our voting scheme is essentially optimal.
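A hypothetical single-vote scheme in the spirit of the abstract (not necessarily the paper's exact mechanism), assuming preferences are scaled to [0, 1]; the point is that vote tallies become unbiased estimates of total preference while each agent discloses almost nothing.

```python
import random

def randomized_election(prefs):
    """prefs[i][j] in [0, 1] is agent i's preference for candidate j.
    Each agent picks one candidate uniformly at random and votes for it
    with probability equal to its preference for that candidate, so
    E[votes_j] = (1/m) * sum_i prefs[i][j]: tallies are unbiased
    estimates of total preferences, and each agent casts at most one vote."""
    m = len(prefs[0])                   # number of candidates
    votes = [0] * m
    for row in prefs:
        j = random.randrange(m)         # candidate the agent considers
        if random.random() < row[j]:    # vote with probability = preference
            votes[j] += 1
    return max(range(m), key=votes.__getitem__)
```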
The majority of results in computational learning theory are concerned with concept learning, i.e. with the special case of function learning for classes of functions with range {0, 1}. Much less is known about the theory of learning functions with a larger range such as N or R. In particular, relatively few results exist about the general structure of common models for function learning, and there are only very few nontrivial function classes for which positive learning results have been exhibited in any of these models. We introduce in this paper the notion of a binary branching adversary tree for function learning, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any "learning" model). Another general structural result of this paper relates the cost for learning a union of function classes to the learning costs for the individual function classes. Furthermore, we exhibit an efficient learning algorithm for learning convex piecewise linear functions from R^d into R. Previously, the class of linear functions from R^d into R was the only class of functions with multidimensional domain that was known to be learnable within the rigorous framework of a formal model for on-line learning. Finally we give a sufficient condition for an arbitrary class F of functions from R into R that allows us to learn the class of all functions that can be written as the pointwise maximum of k functions from F. This allows us to exhibit a number of further nontrivial classes of functions from R into R for which there exist efficient learning algorithms.
For hyper-rectangles in R^d Auer (1997) proved a PAC bound of O((1/ε)(d + log(1/δ))), where ε and δ are the accuracy and confidence parameters. It is still an open question whether one can obtain the same bound for intersection-closed concept classes of VC-dimension d in general. We present a step towards a solution of this problem showing on one hand a new PAC bound of O((1/ε)(d log d + log(1/δ))) for arbitrary intersection-closed concept classes, complementing the well-known bounds O((1/ε)(log(1/δ) + d log(1/ε))) and O((d/ε) log(1/δ)) of Blumer et al. (1989) and Haussler, Littlestone and Warmuth (1994). Our bound is established using the closure algorithm, which generates as its hypothesis the intersection of all concepts that are consistent with the positive training examples. On the other hand, we show that many intersection-closed concept classes, including e.g. maximum intersection-closed classes, satisfy an additional combinatorial property that allows a proof of the optimal bound of O((1/ε)(d + log(1/δ))). For such improved bounds the choice of the learning algorithm is crucial, as there are consistent learning algorithms that need Ω((1/ε)(d log(1/ε) + log(1/δ))) examples to learn some particular maximum intersection-closed concept classes.
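Side by side, the four sample-size bounds mentioned above in display form (left to right: the optimal bound, this paper's bound for arbitrary intersection-closed classes, and the two classical bounds):

\[
O\!\left(\tfrac{1}{\varepsilon}\bigl(d + \log\tfrac{1}{\delta}\bigr)\right),\quad
O\!\left(\tfrac{1}{\varepsilon}\bigl(d\log d + \log\tfrac{1}{\delta}\bigr)\right),\quad
O\!\left(\tfrac{1}{\varepsilon}\bigl(\log\tfrac{1}{\delta} + d\log\tfrac{1}{\varepsilon}\bigr)\right),\quad
O\!\left(\tfrac{d}{\varepsilon}\log\tfrac{1}{\delta}\right).
\]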
We solve an open problem of Maass and Turán, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of mistakes plus membership queries required by an algorithm that can ask membership queries. Previously known results imply that the constant factor in our bound is best possible.
We study on-line learning in the linear regression framework. Most of the performance bounds for on-line algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the whole sequence of examples and thus it is not available to any strictly on-line algorithm. We introduce new techniques for adaptively tuning the learning rate as the data sequence is progressively revealed. Our techniques allow us to prove essentially the same bounds as if we knew the optimal learning rate in advance. Moreover, such techniques apply to a wide class of on-line algorithms, including p-norm algorithms for generalized linear regression and Weighted Majority for linear regression with absolute loss. Our adaptive tunings are radically different from previous techniques, such as the so-called doubling trick. Whereas the doubling trick restarts the on-line algorithm several times using a constant learning rate for each run, our methods save information by changing the value of the learning rate very smoothly. In fact, for Weighted Majority over a finite set of experts our analysis provides a better leading constant than the doubling trick.
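A sketch of the "self-confident" flavor of adaptive tuning for exponentially weighted forecasting (Weighted Majority/Hedge), where the learning rate shrinks smoothly as the best expert's cumulative loss grows; the schedule below is a common textbook choice, not the paper's exact tuning.

```python
import numpy as np

def hedge_adaptive(loss_matrix):
    """Exponentially weighted forecaster with an on-the-fly learning rate.
    loss_matrix[t, i] in [0, 1] is expert i's loss at trial t.
    Instead of fixing eta in advance from a posteriori information,
    eta is recomputed each trial from the smallest cumulative loss
    seen so far, changing smoothly rather than via doubling restarts."""
    T, N = loss_matrix.shape
    cum = np.zeros(N)                  # cumulative losses of the experts
    total = 0.0                        # forecaster's cumulative expected loss
    for t in range(T):
        eta = np.sqrt(2 * np.log(N) / max(cum.min(), 1.0))  # adaptive rate
        w = np.exp(-eta * (cum - cum.min()))                # stable weights
        p = w / w.sum()
        total += p @ loss_matrix[t]
        cum += loss_matrix[t]
    return total, cum.min()            # vs. best expert in hindsight
```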
We give an adversary strategy that forces the Perceptron algorithm to make Ω(kN) mistakes in learning monotone disjunctions over N variables with at most k literals. In contrast, Littlestone's algorithm Winnow makes at most O(k log N) mistakes for the same problem. Both algorithms use thresholded linear functions as their hypotheses. However, Winnow does multiplicative updates to its weight vector instead of the additive updates of the Perceptron algorithm. In general, we call an algorithm additive if its weight vector is always a sum of a fixed initial weight vector and some linear combination of already seen instances. Thus, the Perceptron algorithm is an example of an additive algorithm. We show that an adversary can force any additive algorithm to make (N + k − 1)/2 mistakes in learning a monotone disjunction of at most k literals. Simple experiments show that for k ≪ N, Winnow clearly outperforms the Perceptron algorithm also on nonadversarial random data. © 1997 Elsevier Science B.V.
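In the spirit of the paper's "simple experiments", a small script comparing the two algorithms' mistake counts on random data; the instance distribution, density, and parameter values here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, T = 200, 5, 2000
target = np.zeros(N); target[:k] = 1           # monotone k-literal disjunction

X = (rng.random((T, N)) < 0.1).astype(float)   # sparse random 0/1 instances
y = (X @ target > 0).astype(int)               # disjunction labels

w_p, mist_p = np.zeros(N), 0                   # Perceptron: additive updates
w_w, mist_w = np.ones(N), 0                    # Winnow: multiplicative updates

for x, label in zip(X, y):
    if (w_p @ x > 0) != bool(label):           # Perceptron mistake
        mist_p += 1
        w_p += (2 * label - 1) * x             # add/subtract the instance
    if (w_w @ x >= N) != bool(label):          # Winnow mistake (threshold N)
        mist_w += 1
        w_w[x > 0] *= 2.0 if label else 0.5    # promote or demote

print("Perceptron mistakes:", mist_p, " Winnow mistakes:", mist_w)
```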
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: an MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. This bound holds with high probability. We also present a corresponding lower bound of Ω(√(DSAT)) on the total regret of any learning algorithm.
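Stated compactly (ρ* denotes the optimal average reward; this regret definition is the standard one for the undiscounted setting and is assumed here):

\[
\mathrm{Regret}(T) \;=\; T\rho^{*} - \sum_{t=1}^{T} r_t,
\qquad
\mathrm{Regret}_{\mathrm{alg}}(T) \;=\; \tilde{O}\!\left(D S \sqrt{A T}\right),
\qquad
\sup_{\mathrm{MDP}} \mathrm{Regret}(T) \;=\; \Omega\!\left(\sqrt{D S A T}\right)
\;\text{ for any algorithm}.
\]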
We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.
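A minimal sketch of the sliding-window ingredient, assuming the rest of the algorithm (optimistic planning over confidence sets) is built on top of these windowed estimates; the class and method names are hypothetical.

```python
from collections import deque

class SlidingWindowCounts:
    """Estimate transition probabilities from only the last W observed
    transitions, so stale data from before a change is forgotten.
    The window size W trades off estimation accuracy (larger W) against
    adaptivity to change (smaller W), the trade-off behind the paper's
    optimal-window analysis."""
    def __init__(self, window):
        self.window = window
        self.buf = deque()                  # (state, action, next_state)

    def record(self, s, a, s_next):
        self.buf.append((s, a, s_next))
        if len(self.buf) > self.window:
            self.buf.popleft()              # forget data older than W steps

    def p_hat(self, s, a, s_next):
        n = sum(1 for (x, u, _) in self.buf if (x, u) == (s, a))
        if n == 0:
            return None                     # no recent data for (s, a)
        c = sum(1 for t in self.buf if t == (s, a, s_next))
        return c / n
```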
2021 13th International Conference on Knowledge and Smart Technology (KST), 2021
Autonomous driving cars are important due to improved safety and fuel efficiency. Various techniques have been described that consider only a single task, for example recognition, prediction, or planning, using supervised learning. Some limitations of previous studies are: (1) human bias from human demonstration; (2) the need for multiple components such as localization, road mapping, etc., with complicated fusion logic; (3) in reinforcement learning, the focus was mostly on the learning algorithms but less on the evaluation of different sensors and reward functions. We describe end-to-end reinforcement learning for an autonomous car that uses only a single reinforcement learning model. Further, we designed a new efficient reward function to make the agent learn faster (an 18% improvement over the baseline reward function across all settings) and built the car with only the necessary perception and sensors. We show that the agent performs better when trained with state-of-the-art off-policy reinforcement learning algorithms for continuous actions (SAC, TD3).
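Purely as a hypothetical illustration of the reward-shaping idea (the paper's actual reward function, simulator, and telemetry are not specified here), a wrapper that replaces a sparse crash/finish signal with dense progress-based feedback:

```python
import gymnasium as gym

class ShapedRewardWrapper(gym.Wrapper):
    """Hypothetical reward shaping for a driving environment: reward
    forward progress, penalize lane deviation and large steering, and
    penalize crashes, giving the agent dense feedback at every step.
    The info keys and weights below are assumptions; replace them with
    your simulator's telemetry."""
    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (info.get("progress", 0.0)             # forward progress
                  - 0.1 * abs(info.get("lane_offset", 0.0))
                  - 0.05 * abs(float(action[0])))       # steering penalty
        if terminated and info.get("crashed", False):
            reward -= 10.0                              # crash penalty
        return obs, reward, terminated, truncated, info
```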