Papers by Christian Osendorfer
2011 10th International Conference on Machine Learning and Applications and Workshops, 2011
Existing content-based music similarity estimation methods largely build on complex hand-crafted feature extractors, which are difficult to engineer. As an alternative, unsupervised machine learning makes it possible to learn features empirically from data. We train a recently proposed model, the mean-covariance Restricted Boltzmann Machine [1], on music spectrogram excerpts and employ it for music similarity estimation. In k-NN based genre retrieval experiments on three datasets, it clearly outperforms MFCC-based methods, beats simple unsupervised feature extraction using k-Means and comes close to the state-of-the-art. This shows that unsupervised feature extraction is a viable alternative to engineered features.
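A minimal sketch of the k-NN genre retrieval evaluation described above, assuming the mcRBM features have already been extracted and pooled per song; all data below are random placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the learned representation: one feature vector per song,
# e.g. mcRBM hidden activations pooled over the song's spectrogram excerpts.
rng = np.random.default_rng(0)
song_features = rng.standard_normal((500, 256))   # (n_songs, n_features), placeholder
genre_labels = rng.integers(0, 10, size=500)      # genre id per song, placeholder

# k-NN genre retrieval as a proxy for similarity quality: if the feature
# space captures musical similarity, nearest neighbours share the genre.
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
scores = cross_val_score(knn, song_features, genre_labels, cv=5)
print("mean genre retrieval accuracy: %.3f" % scores.mean())
```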
We present a computationally efficient architecture for image super-resolution that achieves state-of-the-art results on images with large spatial extent. Apart from utilizing Convolutional Neural Networks, our approach leverages recent advances in fast approximate inference for sparse coding. We empirically show that upsampling methods work much better on latent representations than in the original spatial domain. Our experiments indicate that the proposed architecture can serve as a basis for further improvements in image super-resolution.
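A minimal PyTorch sketch of the central idea, upsampling a learned latent representation instead of raw pixels; the layer sizes and the bilinear upsampling used here are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentUpsampleSR(nn.Module):
    """Encode the low-resolution image, upsample the latent feature maps,
    then decode back to pixel space (illustrative sizes only)."""
    def __init__(self, scale=2, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Upsampling acts on the latent representation, not on the input image.
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, kernel_size=5, padding=2),
        )

    def forward(self, low_res):
        latent = self.encoder(low_res)
        return self.decoder(self.upsample(latent))

x = torch.randn(1, 3, 32, 32)          # toy low-resolution input
print(LatentUpsampleSR()(x).shape)      # torch.Size([1, 3, 64, 64])
```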
Lecture Notes in Computer Science, 2011
In most real-world information processing problems, data is not a free resource; its acquisition is rather time-consuming and/or expensive. We investigate how these two factors can be included in supervised classification tasks by casting classification as a sequential decision process and making it accessible to Reinforcement Learning. Our method performs sequential feature selection, learning which features are most informative at each timestep and choosing the next feature depending on the already selected features and the internal belief of the classifier. Experiments on a handwritten digit classification task show a significant reduction in the data required for correct classification, while a medical diabetes prediction task illustrates variable feature cost minimization as a further property of our algorithm.
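A schematic sketch of classification as a sequential decision process. A random policy and a toy classifier stand in for the learned components, and the reward couples accuracy with a per-feature acquisition cost; none of this is the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_episode(x, y, predict_proba, feature_cost=0.01, threshold=0.9):
    """One classification episode: acquire features one at a time (here
    chosen at random as a stand-in for the learned policy), update the class
    belief from the features seen so far, and stop once the belief is
    confident.  The reward trades accuracy against acquisition cost."""
    acquired = np.zeros(x.shape[0], dtype=bool)
    belief = np.full(10, 0.1)                            # uniform prior over 10 classes
    while not acquired.all() and belief.max() < threshold:
        action = rng.choice(np.flatnonzero(~acquired))   # a trained policy would pick here
        acquired[action] = True
        belief = predict_proba(x, acquired)              # classifier conditioned on the mask
    reward = float(belief.argmax() == y) - feature_cost * acquired.sum()
    return reward, int(acquired.sum())

def toy_predict_proba(x, mask, n_classes=10):
    """Placeholder classifier whose belief sharpens as more features arrive."""
    logits = np.zeros(n_classes)
    logits[int(x[mask].sum()) % n_classes] = float(mask.sum())
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(run_episode(rng.integers(0, 5, size=16), y=3, predict_proba=toy_predict_proba))
```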

2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force- and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, which greatly limits the types of objects that can be used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows finger force estimation without any physical interference with the hand or the object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNNs) are used to predict the transformation matrix from the 2D image to the 3D model. Two CNN variants are designed, predicting orientation and position either separately or jointly. After learning, our system shows an alignment accuracy of over 98% on unseen data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results show an accuracy of about 95% for force estimation and 90% for torque estimation.
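A hedged sketch of the final regression stage only, mapping features of the aligned fingertip images to force and torque with a Gaussian process; the CNN alignment step is not shown, the kernel choice is an assumption, and all data are random placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder stand-ins for per-frame features computed from the aligned
# fingertip images (colour changes and deformation of nail and surrounding
# skin) and the force-torque sensor readings used as training targets.
rng = np.random.default_rng(0)
image_features = rng.standard_normal((300, 12))        # (n_frames, n_features)
force_torque = rng.standard_normal((300, 6))           # fx, fy, fz, tx, ty, tz

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gp.fit(image_features[:250], force_torque[:250])
mean, std = gp.predict(image_features[250:], return_std=True)   # predictions with uncertainty
```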

Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, it has not yet been quantitatively investigated how well unsupervised learning methods can find low-level representations for image patches without any additional supervision. In this paper we examine the performance of pure unsupervised methods on a low-level correspondence task, a problem that is central to many Computer Vision applications. We find that a special type of Restricted Boltzmann Machine (RBM) performs comparably to hand-crafted descriptors. Additionally, a simple binarization scheme produces compact representations that perform better than several state-of-the-art …
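A sketch of the binarization and matching step, assuming RBM hidden-unit probabilities for each patch are already available; the threshold and code length are illustrative.

```python
import numpy as np

def binarize(hidden_probs, threshold=0.5):
    """Turn RBM hidden-unit activation probabilities into a compact binary code."""
    return (hidden_probs > threshold).astype(np.uint8)

def hamming_match(codes_left, codes_right):
    """For every left patch, find the right patch with the smallest Hamming distance."""
    # Broadcasting: (n_left, 1, n_bits) != (1, n_right, n_bits) -> distance matrix.
    dist = (codes_left[:, None, :] != codes_right[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

# Placeholder hidden-unit probabilities for two sets of image patches.
rng = np.random.default_rng(0)
left, right = rng.random((100, 128)), rng.random((120, 128))
matches = hamming_match(binarize(left), binarize(right))
```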

Recent advances in the estimation of deep directed graphical models and recurrent networks let us contribute to the removal of a blind spot in the area of probabilistic modelling of time series. The proposed methods i) can infer distributed latent state-space trajectories with nonlinear transitions, ii) scale to large data sets thanks to the use of a stochastic objective and fast, approximate inference, iii) enable the design of rich emission models which iv) naturally lead to structured outputs. Two different paths of introducing latent state sequences are pursued, leading to the variational recurrent auto-encoder (VRAE) and the variational one-step predictor (VOSP). The use of independent Wiener processes as priors on the latent state sequence is a viable compromise between efficient computation of the Kullback-Leibler divergence from the variational approximation of the posterior and maintaining a reasonable belief in the dynamics. We verify our methods empirically, obtaining results close to or better than the state of the art. We also show qualitative results for denoising and missing value imputation.
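A sketch of why a Wiener-process (Gaussian random-walk) prior keeps the KL term cheap: with a diagonal Gaussian posterior per timestep, the KL against the conditional prior p(z_t | z_{t-1}) = N(z_{t-1}, q) is available in closed form. Treating z_{t-1} as the posterior sample and using N(0, q) at the first step are assumptions of this sketch, not necessarily the paper's exact construction.

```python
import math
import torch

def kl_wiener_prior(post_mean, post_logvar, z_samples, prior_var=1.0):
    """KL( q(z_t) || p(z_t | z_{t-1}) ) summed over time, with a Gaussian
    random-walk prior p(z_t | z_{t-1}) = N(z_{t-1}, prior_var).
    post_mean, post_logvar: (T, d) diagonal Gaussian posterior parameters.
    z_samples: (T, d) posterior samples used as the conditioning values."""
    prior_mean = torch.cat([torch.zeros_like(z_samples[:1]), z_samples[:-1]], dim=0)
    post_var = post_logvar.exp()
    kl = 0.5 * (post_var / prior_var
                + (post_mean - prior_mean) ** 2 / prior_var
                - 1.0
                - post_logvar + math.log(prior_var))
    return kl.sum()
```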
Leveraging advances in variational inference, we propose to enhance recurrent neural networks with latent variables, resulting in Stochastic Recurrent Networks (STORNs). The model i) can be trained with stochastic gradient methods, ii) allows structured and multi-modal conditionals at each time step, iii) features a reliable estimator of the marginal likelihood and iv) is a generalisation of deterministic recurrent neural networks. We evaluate the method on four polyphonic musical data sets and on motion capture data.
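A minimal sketch of the stochastic recurrence, injecting a latent sample at every time step so that the per-step conditional can be multi-modal; the GRU cell, the standard-normal prior and the sizes are illustrative choices, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class StochasticRecurrentCell(nn.Module):
    """One step of a recurrent net that receives a latent sample z_t in
    addition to the input x_t (schematic, inference network not shown)."""
    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.z_dim = z_dim
        self.cell = nn.GRUCell(x_dim + z_dim, h_dim)
        self.out = nn.Linear(h_dim, x_dim)

    def forward(self, x_t, h):
        z_t = torch.randn(x_t.shape[0], self.z_dim)   # latent sample, standard-normal prior assumed
        h = self.cell(torch.cat([x_t, z_t], dim=1), h)
        return self.out(h), h

cell = StochasticRecurrentCell(x_dim=8, z_dim=4, h_dim=32)
x_t, h = torch.randn(2, 8), torch.zeros(2, 32)
y_t, h = cell(x_t, h)
```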

Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of vanishing and exploding gradients. The control of overfitting has received considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are based exclusively on the training error signal. One consequence of this is the absence of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. We positively test the hypothesis that this improves the performance of RNNs on four musical data sets.
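For reference, a minimal sketch of the fast dropout forward pass for a single linear layer, which propagates the mean and variance of the pre-activation under the dropout distribution instead of sampling masks. This is the standard fast dropout construction that the analysis above builds on, not the paper's regularizer derivation itself.

```python
import numpy as np

def fast_dropout_linear(x, W, keep_prob=0.5):
    """Fast dropout for a linear layer: with Bernoulli(keep_prob) masks m_i,
    the pre-activation sum_i w_i m_i x_i is approximated by a Gaussian with
    the mean and variance computed in closed form (central limit theorem)."""
    mean = keep_prob * x @ W.T                                   # E[sum_i w_i m_i x_i]
    var = keep_prob * (1 - keep_prob) * (x ** 2) @ (W ** 2).T    # Var[sum_i w_i m_i x_i]
    return mean, var

rng = np.random.default_rng(0)
x, W = rng.standard_normal((4, 100)), rng.standard_normal((10, 100))
mu, sigma2 = fast_dropout_linear(x, W)    # per-unit Gaussian pre-activations
```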
We demonstrate a simple approach with which finger force can be measured from nail coloration. By automatically extracting features from nail images taken by a finger-mounted CCD camera, we can directly relate these images to the force measured by a force-torque sensor. The method automatically corrects for orientation and illumination differences.
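A hedged sketch of the regression step, relating automatically extracted nail-image features to the sensor-measured force. Ridge regression is an illustrative stand-in rather than the paper's model, and all data below are random placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Placeholder stand-ins for the automatically extracted nail-image features
# and the simultaneously recorded force-torque sensor readings.
rng = np.random.default_rng(0)
nail_features = rng.standard_normal((400, 20))   # (n_frames, n_features)
measured_force = rng.standard_normal(400)        # normal force per frame

X_tr, X_te, y_tr, y_te = train_test_split(nail_features, measured_force, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```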
Safety is one of the key issues in the use of robots, especially when human–robot interaction is targeted. Although unforeseen situations in the environment, such as collisions or unexpected user interaction, can be handled with specially tailored control algorithms, hardware or software failures typically lead to situations where excessive torques are commanded, causing an emergency state (hitting an end stop, exceeding a torque limit, and so on) that often halts the robot only when it is too late. No sufficiently fast and reliable methods exist which can detect faults early in the abundance of sensor and controller data. This is especially difficult since, in most cases, no anomaly data are available. In this paper we introduce a new robot anomaly detection system (RADS) which can cope with abundant data in which no or very little anomaly information is present.
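The paper's detector (RADS) is not reproduced here; as an illustration of anomaly detection trained on normal-operation data only, a one-class model such as an isolation forest can stand in.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder windows of sensor and controller signals recorded during
# normal operation only; no anomaly labels are needed to fit the detector.
rng = np.random.default_rng(0)
normal_windows = rng.standard_normal((1000, 24))   # (n_windows, n_signals)

detector = IsolationForest(random_state=0).fit(normal_windows)

def is_anomalous(window):
    """Flag a new window as anomalous if the detector scores it as an outlier."""
    return detector.predict(window.reshape(1, -1))[0] == -1

print(is_anomalous(10.0 * np.ones(24)))   # far from normal operation, likely flagged
```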
ICONIP
We investigate whether a deep Convolutional Neural Network can learn representations of local image patches that are usable in the important task of keypoint matching. We examine several possible loss functions for this correspondence task and show empirically that a newly suggested loss formulation allows a Convolutional Neural Network to find compact local image descriptors that perform comparably to state-of-the-art approaches.
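The newly suggested loss formulation is not reproduced here; as an illustration of training a CNN descriptor on patch pairs for correspondence, a standard contrastive loss over descriptor distances looks like this.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Standard contrastive loss on L2-normalised descriptors: pull matching
    patch pairs together, push non-matching pairs at least `margin` apart.
    (Illustrative only; not the loss formulation proposed in the paper.)"""
    d = F.pairwise_distance(F.normalize(desc_a), F.normalize(desc_b))
    return (is_match * d.pow(2) + (1 - is_match) * F.relu(margin - d).pow(2)).mean()

# Toy usage with random descriptors and random match labels.
a, b = torch.randn(32, 128), torch.randn(32, 128)
labels = torch.randint(0, 2, (32,)).float()
loss = contrastive_loss(a, b, labels)
```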
Lecture Notes in Computer Science, 2012
Recurrent neural networks (RNNs) in combination with a pooling operator and the neighbourhood components analysis (NCA) objective function are able to detect the characterizing dynamics of sequences and embed them into a fixed-length vector space of arbitrary dimensionality. The resulting features are meaningful and can be used for visualization or nearest-neighbour classification in linear time. This kind of metric learning for sequential data enables the use of algorithms tailored towards fixed-length vector spaces such as R^n.
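A sketch of the overall construction: a recurrent net with a pooling operator produces fixed-length embeddings, which are trained with an NCA-style objective. The GRU, mean pooling and layer sizes are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SequenceEmbedder(nn.Module):
    """GRU over the sequence, mean-pooling over time, linear map to the
    embedding space (illustrative sizes)."""
    def __init__(self, in_dim, hid_dim=64, emb_dim=16):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hid_dim, batch_first=True)
        self.proj = nn.Linear(hid_dim, emb_dim)

    def forward(self, x):                      # x: (batch, time, in_dim)
        h, _ = self.rnn(x)
        return self.proj(h.mean(dim=1))        # pooling operator over time

def nca_loss(emb, labels):
    """NCA objective on the embeddings: maximise the probability of picking a
    same-class neighbour under a softmax over negative squared distances."""
    d2 = torch.cdist(emb, emb).pow(2)
    eye = torch.eye(len(emb), dtype=torch.bool)
    p = torch.softmax(-d2.masked_fill(eye, float("inf")), dim=1)
    same = (labels[:, None] == labels[None, :]).float()
    return -(p * same).sum(dim=1).clamp_min(1e-8).log().mean()

x = torch.randn(8, 20, 12)                 # 8 sequences, 20 timesteps, 12 features
labels = torch.randint(0, 3, (8,))
loss = nca_loss(SequenceEmbedder(12)(x), labels)
```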
Lecture Notes in Computer Science, 2013
We present a novel method to train predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or explicitly model it as another response variable, in our case it is trained implicitly. Stochasticity is established by injecting noise into the input and hidden units, and the outputs are approximated with a Gaussian distribution via the forward propagation method introduced for fast dropout [1]. The loss function is designed to respect this probabilistic interpretation of the output units. The method is evaluated on a synthetic task and an inverse robot dynamics task, yielding superior performance to plain neural networks, Gaussian processes and LWPR in terms of mean squared error and likelihood.
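A sketch of the kind of training criterion implied above: treat the propagated mean and variance as a Gaussian predictive distribution and minimise its negative log-likelihood. The noisy forward pass that produces the variance is not shown, and this is an illustrative criterion rather than the paper's exact objective.

```python
import math
import torch

def gaussian_nll(pred_mean, pred_var, target, eps=1e-6):
    """Negative log-likelihood of the targets under the predicted Gaussian
    p(z|x) = N(pred_mean, pred_var); the variance is produced implicitly by
    the noisy forward pass rather than by an extra output head."""
    var = pred_var.clamp_min(eps)
    return 0.5 * (math.log(2 * math.pi) + torch.log(var)
                  + (target - pred_mean) ** 2 / var).mean()
```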
International Journal of Machine Learning and Cybernetics, 2013
In most real-world information processing problems, data is not a free resource. Its acquisition is often expensive and time-consuming. We investigate how such cost factors can be included in supervised classification tasks by casting classification as a sequential decision process and making it accessible to Reinforcement Learning. Depending on the previously selected features and the internal belief of the classifier, the next feature is chosen by a sequential online feature selection that learns which features are most informative at each time step. Experiments on toy datasets and a handwritten digit classification task show a significant reduction in the data required for correct classification, while a medical diabetes prediction task illustrates variable feature cost minimization as a further property of our algorithm.
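As a small illustration of the variable-cost aspect, a reward that pays for a correct label and charges each acquired feature its own cost; the paper's exact reward shaping is not given here.

```python
import numpy as np

def episode_reward(correct, acquired_mask, feature_costs):
    """Reward for one classification episode: a fixed payoff for the correct
    label minus the (variable) cost of every feature that was acquired, so
    the policy can trade accuracy against acquisition expense."""
    return float(correct) - np.asarray(feature_costs)[acquired_mask].sum()

# Example: a cheap and an expensive medical test, only the cheap one used.
print(episode_reward(correct=True,
                     acquired_mask=np.array([True, False]),
                     feature_costs=[0.05, 0.4]))
```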
While there is an enormous amount of music data available, the field of music analysis almost exclusively uses manually designed features. In this work we learn features from music data in a completely unsupervised way and evaluate them on a musical genre classification task. We achieve results very close to state-of-the-art performance, which relies on highly hand-tuned feature extractors.
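The feature learner itself is not specified in this excerpt; as a generic illustration of unsupervised feature learning on spectrogram patches, a k-means dictionary with triangle encoding is sketched below (the paper's actual model may differ).

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Placeholder spectrogram patches (a few consecutive frames each, flattened);
# no labels are used at this stage.
rng = np.random.default_rng(0)
patches = rng.standard_normal((2000, 48))

kmeans = MiniBatchKMeans(n_clusters=128, n_init=3, random_state=0).fit(patches)

def encode(batch):
    """'Triangle' encoding: a feature is the amount by which a centroid is
    closer to the patch than the average centroid, clipped at zero."""
    d = np.linalg.norm(batch[:, None, :] - kmeans.cluster_centers_[None], axis=2)
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

song_descriptor = encode(patches[:100]).mean(axis=0)   # pool patch codes over a song
```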
So-called Physical Unclonable Functions (PUFs) are an emerging cryptographic and security primitive. They can potentially replace secret binary keys in vulnerable hardware systems and have other security advantages. In this paper, we deal with the cryptanalysis of this new primitive by means of machine learning methods. In particular, we investigate to what extent the security of circuit-based PUFs can be challenged by a new machine learning technique named Policy Gradients with Parameter-based Exploration (PGPE). Our findings show that this technique has several important advantages in the cryptanalysis of Physical Unclonable Functions compared to other machine learning methods and to other policy gradient methods.
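A minimal sketch of Policy Gradients with Parameter-based Exploration (PGPE) as it could be applied to model-building attacks: a Gaussian search distribution over the parameters of a candidate model is adapted towards candidates that reproduce more of the recorded challenge-response pairs. The toy target below is an illustrative linear-threshold model, not a specific PUF architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def pgpe(reward_fn, dim, iters=2000, lr_mu=0.1, lr_sigma=0.05):
    """Minimal PGPE: sample one candidate parameter vector per iteration from
    N(mu, sigma^2), then shift mean and spread towards candidates whose
    reward beats a running baseline."""
    mu, sigma, baseline = np.zeros(dim), np.ones(dim), 0.0
    for _ in range(iters):
        theta = mu + sigma * rng.standard_normal(dim)     # candidate parameters
        r = reward_fn(theta)
        t = theta - mu
        mu += lr_mu * (r - baseline) * t                               # mean update
        sigma += lr_sigma * (r - baseline) * (t**2 - sigma**2) / sigma # spread update
        sigma = np.maximum(sigma, 1e-3)
        baseline = 0.9 * baseline + 0.1 * r               # moving-average baseline
    return mu

# Toy usage: fit a hidden linear-threshold model from challenge/response data.
w_true = rng.standard_normal(16)
challenges = rng.choice([-1.0, 1.0], size=(500, 16))
responses = np.sign(challenges @ w_true)

def accuracy(w):
    return np.mean(np.sign(challenges @ w) == responses)

w_found = pgpe(accuracy, dim=16)
print("model accuracy on recorded pairs:", accuracy(w_found))
```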