Deep Feedforward Networks

Syllabus
Probabilistic Theory of Deep Learning - Gradient Learning - Chain Rule and Backpropagation - Regularization : Dataset Augmentation - Noise Robustness - Early Stopping, Bagging and Dropout - Batch Normalization - VC Dimension and Neural Nets.
Contents
4.1 History of Deep Learning
4.2 A Probabilistic Theory of Deep Learning
4.3 Deep Networks
4.4 Challenges and Motivation of Deep Learning
4.5 Gradient Learning
4.6 Chain Rule and Backpropagation
4.7 Regularization : Dataset Augmentation
4.8 Bagging and Dropout
4.9 VC Dimension
4.10 Two Marks Questions with Answers
4.1 History of Deep Learning
• The history of deep learning can be traced back to 1943, when Walter Pitts and Warren McCulloch created a computer model based on the neural networks of the human brain. They used a combination of algorithms and mathematics they called "threshold logic" to mimic the thought process.
• Their basic aim was to mimic the human thought process; they used algorithms and mathematics to make the threshold logic mimic human thought. Alan Turing, called the father of AI, concluded in 1951 that machines would not take much time to start thinking on their own; at some point of time, they would be able to talk to each other, and it was also expected that they would take control of the universe.
• Warren McCulloch and Walter Pitts used a combination of mathematics and algorithms called threshold logic to mimic the thought process. Since then, deep learning has evolved steadily over the years, with two significant breakthroughs in its development.
• The development of the basics of a continuous backpropagation model is credited to Henry J. Kelley in 1960. Stuart Dreyfus came up with a simpler version based only on the chain rule in 1962. The concept of backpropagation existed in the early 1960s but did not become useful until 1985.
• The next significant evolutionary step for deep learning took place in 1999, when computers started becoming faster at processing data and graphics processing units were developed.
• Neural networks also have the advantage of continuing to improve as more training data is added.
• Around the year 2000, the vanishing gradient problem appeared. It was discovered that "features" formed in lower layers were not being learned by the upper layers, because no learning signal reached these layers.
• In 2001, a research report by META Group described the challenges and opportunities of data growth as three-dimensional : the increasing volume of data, the increasing speed of data, and the increasing range of data sources and types. This was a call to prepare for the onslaught of Big Data, which was just starting.
• In 2009, Fei-Fei Li, an AI professor at Stanford, launched ImageNet and assembled a database of more than 14 million labeled images. The Internet is, and was, full of unlabeled images. Labeled images were needed to "train" neural nets.
• By 2011, the speed of GPUs had increased significantly, making it possible to train convolutional neural networks "without" the layer-by-layer pre-training. With the increased computing speed, it became obvious that deep learning had significant advantages in terms of efficiency and speed.
• Generative Adversarial Networks (GANs) are a class of machine learning systems invented by Ian Goodfellow and his colleagues in 2014. Continuing the history, 2016 saw the Google DeepMind challenge match between AlphaGo and Lee Sedol, in which AlphaGo defeated the world champion Lee Sedol.
• AlphaGo and AlphaZero are computer programs developed by the artificial intelligence research company DeepMind (2016 - 2017); they play the board game Go.
• The transformer, introduced in 2017, is a deep learning model used especially for Natural Language Processing (NLP).
• Although a large community has contributed to deep learning, Yann LeCun, Geoffrey Hinton and Yoshua Bengio received the Turing Award in 2018.
4.2 A Probabilistic Theory of Deep Learning
• Probabilistic modeling is the application of the principles of statistics to data analysis. It was one of the earliest forms of machine learning and it is still widely used to this day. One of the best-known algorithms in this category is the Naive Bayes algorithm.
• Naive Bayes is a type of machine-learning classifier based on applying Bayes' theorem while assuming that the features in the input data are all independent. This form of data analysis predates computers and was applied by hand decades before its first computer implementation.
• A closely related model is logistic regression, which is sometimes considered to be the "hello world" of modern machine learning. Much like Naive Bayes, logistic regression predates computing by a long time, yet it is still useful to this day.
• Bayes' theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
• Bayes' theorem is a method to revise the probability of an event given additional information.
• Bayes' theorem calculates a conditional probability, called a posterior or revised probability.
• Bayes' theorem is a result in probability theory that relates conditional probabilities. If A and B denote two events, P(A|B) denotes the conditional probability of A occurring, given that B occurs. The two conditional probabilities P(A|B) and P(B|A) are in general different.
• This theorem gives a relation between P(A|B) and P(B|A). An important application of Bayes' theorem is that it gives a rule how to update or revise the strength of evidence-based belief in light of new evidence, a posteriori.
• A prior probability is an initial probability value originally obtained before any additional information is obtained.
• A posterior probability is a probability value that has been revised by using additional information that is later obtained.
• If A and B are two random variables,
P(A|B) = P(B|A) P(A) / P(B)
• In the context of a classifier with hypothesis h and training data T,
P(h|T) = P(T|h) P(h) / P(T)
where P(h) = prior probability of hypothesis h
P(T) = prior probability of training data T
P(h|T) = probability of h given T
P(T|h) = probability of T given h
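As a brief illustration of how these quantities fit together, the sketch below (not part of the original text; the prior and likelihood values are made-up numbers) computes a posterior for a binary hypothesis using Bayes' rule and the law of total probability.

```python
# Hypothetical illustration of Bayes' theorem: P(h|T) = P(T|h) * P(h) / P(T).
# The numbers below are invented only to show the arithmetic.

def posterior(prior_h, likelihood_T_given_h, likelihood_T_given_not_h):
    """Return P(h|T) for a binary hypothesis h."""
    # P(T) = P(T|h) P(h) + P(T|not h) P(not h)  (law of total probability)
    p_T = likelihood_T_given_h * prior_h + likelihood_T_given_not_h * (1.0 - prior_h)
    return likelihood_T_given_h * prior_h / p_T

# Example: prior P(h) = 0.3, P(T|h) = 0.8, P(T|not h) = 0.2
print(posterior(0.3, 0.8, 0.2))   # posterior P(h|T) is roughly 0.632
```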
• A probability distribution is a description of how likely a random variable or set of random variables is to take on each of its possible states. The way we describe probability distributions depends on whether the variables are discrete or continuous.
• A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.
4.3 Deep Networks

• The term "deep" usually refers to the number of hidden layers in the neural network.
• Deep learning is a subset of machine learning, which is predicated on the idea of learning from example. In machine learning, instead of teaching a computer a massive list of rules to solve the problem, we give it a model with which it can evaluate examples, and a small set of instructions to modify the model when it makes a mistake.
• The basic idea of deep learning is that repeated composition of functions can often reduce the requirements on the number of base functions (computational units) by a factor that is exponentially related to the number of layers in the network.
• Deep learning eliminates some of the data pre-processing that is typically involved with machine learning.
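To make the idea of repeated composition concrete, here is a minimal sketch (not taken from the text; the layer sizes and random weights are arbitrary) of a feedforward network written as a composition of simple affine-plus-nonlinearity functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One computational unit: an affine transformation followed by a ReLU nonlinearity."""
    return np.maximum(0.0, W @ x + b)

# Arbitrary layer sizes just for illustration: 4 inputs -> 8 -> 8 -> 3 outputs.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)
# "Deep" simply means the output is a repeated composition f3(f2(f1(x))).
output = W3 @ layer(layer(x, W1, b1), W2, b2) + b3
print(output)
```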
• Fig. 4.3.1 shows the relation between AI, ML and deep learning.

Fig. 4.3.1 Relation between AI, ML and deep learning
• For example, let's say that we had a set of photos of different pets, and we wanted to categorize them by "cat" and "dog". Deep learning algorithms can determine which features (e.g. ears) are most important to distinguish each animal from another. In machine learning, this hierarchy of features is established manually by a human expert.
• In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance. Models are trained by using a large set of labeled data and neural network architectures that contain many layers.
• Deep learning classifies information through layers of neural networks, which have a set of inputs that receive raw data. For example, if a neural network is trained with images of birds, it can be used to recognize images of birds. More layers enable more precise results, such as distinguishing a crow from a raven as compared to distinguishing a crow from a chicken.
• Deep learning consists of the following methods and their variations :
a) Unsupervised learning systems such as Boltzmann machines for preliminary training, auto-encoders and generative adversarial networks.
b) Supervised learning, such as convolutional neural networks, which brought the technology of pattern recognition to a new level.
c) Recurrent neural networks, which allow training on processes in time.
d) Recursive neural networks, which allow feedback between circuit elements and chains to be included.
4.3.1 Reasons for Using Deep Learning
1. Analyzing unstructured data : Deep learning algorithms can be trained to look at text data by analyzing social media posts, news and surveys to provide valuable business and customer insights.
2. Data labelling : Deep learning requires labeled data for training. Once trained, it can label new data and identify different types of data on its own.
3. Feature engineering : A deep learning algorithm can save time because it does not require humans to extract features manually from raw data.
4. Efficiency : When a deep learning algorithm is properly trained, it can perform thousands of tasks over and over again, faster than humans.
5. Training : The neural networks used in deep learning have the ability to be applied to many different data types and applications. Additionally, a deep learning model can adapt by retraining it with new data.
4.3.2 Applications of Deep Learning
1. Aerospace and defense : Deep learning is utilized extensively to help satellites identify specific objects or areas of interest and classify them as safe or unsafe for soldiers.
2. Financial services : Financial institutions regularly use predictive analytics to drive algorithmic trading of stocks, assess business risks for loan approvals, detect fraud, and help manage credit and investment portfolios for clients.
3. Medical research : The medical research field uses deep learning extensively. For example, in ongoing cancer research, deep learning is used to detect the presence of cancer cells automatically.
4. Industrial automation : The heavy machinery sector is one that requires a large number of safety measures. Deep learning helps with the improvement of worker safety in such environments by detecting any person or object that comes within the unsafe radius of a heavy machine.
5. Facial recognition : This feature utilizing deep learning is being used not just for a range of security purposes but will soon enable purchases at stores. Facial recognition is already being extensively used in airports to enable seamless, paperless check-ins.
4.3.3 Difference between Machine Learning and Deep Learning
Machine Learning | Deep Learning
Machine learning uses algorithms to parse data, learn from that data, and make informed decisions based on what it has learned. | Deep learning structures algorithms in layers to create an "artificial neural network" that can learn and make intelligent decisions on its own.
Machine learning gives lesser accuracy. | Deep learning gives more accuracy.
Machine learning requires less time for training. | Deep learning requires more time for training.
Features are accurately identified by human intervention. | Deep learning can create new features on its own.
Machine learning models mostly require data in structured form. | Deep learning models can work with both structured and unstructured data, as they rely on the layers of the artificial neural network.
Algorithms are directed by data analysts to examine specific variables in data sets. | Algorithms are largely self-directed on data analysis once they are put into production.
Machine learning can work on low-end machines. | A deep learning model needs a huge amount of data to work efficiently, so it needs GPUs and hence a high-end machine.
(Figure : a machine learning pipeline with separate feature extraction and classification stages, compared with a deep learning pipeline in which feature extraction and classification are learned together.)
4.3.4 Difference between ML, AI and Data Science
Sr. No. | Machine Learning | Artificial Intelligence | Data Science
1. | Focuses on providing a means for algorithms and systems to learn from experience with data and use that experience to improve over time. | Focuses on giving machines cognitive and intellectual capabilities similar to those of humans. | Focuses on extracting information needles from data haystacks to aid in decision-making and planning.
2. | Machine learning uses statistical models. | Artificial intelligence uses logic and decision trees. | Data science works with structured and unstructured data.
3. | A subset of AI in which machines learn from data and find patterns. | Development of computerized applications that simulate human intelligence and interaction. | The process of using advanced analytics to extract relevant information from data.
4. | Objective is to maximize accuracy. | Objective is to maximize the chance of success. | Objective is to extract actionable insights from the data.
5. | ML can be done through supervised, unsupervised or reinforcement learning approaches. | AI encompasses a collection of intelligence concepts, including elements of perception, planning and prediction. | Uses statistics, mathematics, data wrangling, big data analytics, machine learning and various other methods to answer analytics questions.
6. | ML is concerned with knowledge accumulation. | AI is concerned with knowledge dissemination and conscious machine actions. | Data science is all about data engineering.
4.3.5 Difference between AI, ML and Deep Learning
Sr. No. | AI | ML | DL
1. | AI aims towards building machines that are capable of thinking like humans. | ML aims to learn through data to solve the problem. | DL aims to build neural networks that automatically discover patterns for feature detection.
2. | AI is a subset of data science. | ML is a subset of AI and data science. | DL is a subset of AI, ML and data science.
3. | All systems of artificial intelligence fall into three types : a) Artificial Narrow Intelligence, b) Artificial General Intelligence, c) Artificial Super Intelligence. | ML algorithms can be broadly classified into three categories : a) Supervised learning, b) Unsupervised learning, c) Reinforcement learning. | Deep learning architectures are as follows : a) Convolutional neural networks, b) Recurrent neural networks, c) Recursive neural networks.
4. | Making machines intelligent may or may not need high computational power. | These algorithms can work easily on normal low performance computers without GPUs. | Algorithms are dependent on high performance hardware components that include GPUs.
4.3.6 Advantages and Disadvantages of Deep Learning

Advantages of Deep Learning
• No need for feature engineering.
• DL solves the problem on an end-to-end basis.
• Deep learning gives more accuracy.

Disadvantages of Deep Learning
• DL needs high-performance hardware.
• It needs much more time to train.
• It is difficult to assess its performance in real world applications.
• It is very hard to understand.
4.4 Challenges and Motivation of Deep Learning

• The development of deep learning was motivated in part by the failure of traditional algorithms to generalize well on such AI tasks.
4.4.1 Curse of Dimensionality
Many machine learning problems become exceedingly difficult when the number of
dimensions in the data is high. This phenomenon is known as the curse of dimensionality.
• The curse of dimensionality refers to the phenomena that occur when classifying, organizing, and analyzing high dimensional data but do not occur in low dimensional spaces, specifically the issues of data sparsity and "closeness" of data.
«The volume of the space represented grows so quickly that the data cannot keep up and thus
becomes sparse, as shown in Fig. 4.4.1. The sparsity issue is a major one for anyone whose
goal has some statistical significance.
Fig. 4.4.1 As the number of relevant dimensions of the data increases : (a) 1D - 4 regions, (b) 2D - 16 regions, (c) 3D - 64 regions
• As the data space seen above moves from one dimension to two dimensions and finally to three dimensions, the given data fills less and less of the data space. In order to maintain an accurate representation of the space, the data for analysis grows exponentially.
• The second issue that arises is related to sorting or classifying the data. In low dimensional spaces, data may seem very similar, but the higher the dimension, the further apart these data points may seem to be.
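The sparsity effect can be illustrated numerically. The short sketch below (an illustrative assumption, not from the text) keeps the sample count fixed, divides each axis into 4 bins as in Fig. 4.4.1, and reports what fraction of the available regions actually contain data as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, bins_per_axis = 100, 4

for dim in (1, 2, 3, 6):
    points = rng.uniform(0.0, 1.0, size=(n_samples, dim))
    # Assign every point to a grid cell and count the distinct occupied cells.
    cells = np.floor(points * bins_per_axis).astype(int)
    occupied = len({tuple(c) for c in cells})
    total = bins_per_axis ** dim
    print(f"{dim}D: {occupied}/{total} regions occupied ({occupied / total:.1%})")
```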
4.4.2 Local Constancy and Smoothness Regularization
• In order to generalize well, machine learning algorithms need to be guided by prior beliefs about what kind of function they should learn. Among the most widely used priors is the smoothness or local constancy prior.
• There are many different ways to implicitly or explicitly express a prior belief that the learned function should be smooth or locally constant. All of these different methods are designed to encourage the learning process to learn a function f* that satisfies the condition
f*(x) ≈ f*(x + ε)
for most configurations x and small changes ε.
• If we know a good answer for an input x, then that answer is probably good in the neighborhood of x. If we have several good answers in some neighborhood, we would combine them to produce an answer that agrees with as many of them as possible.
• An extreme example of the local constancy approach is the k-nearest neighbors family of learning algorithms.
• The k-nearest neighbors algorithm copies the output from nearby training examples. Most kernel machines interpolate between training set outputs associated with nearby training examples. An important class of kernels is the family of local kernels, where k(u, v) is large when u = v and decreases as u and v grow farther apart from each other.
• A local kernel can be thought of as a similarity function that performs template matching, by measuring how closely a test example x resembles each training example x^(i); a small sketch of this view is given at the end of this subsection.
• Decision trees also suffer from the limitations of exclusively smoothness-based learning, because they break the input space into as many regions as there are leaves and use a separate parameter in each region.
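A minimal sketch of the template-matching view mentioned above (illustrative only; the Gaussian kernel and its width are arbitrary choices, and the data are made up): the prediction is a kernel-weighted average of nearby training outputs, which behaves like a smoothed version of k-nearest neighbors.

```python
import numpy as np

def local_kernel(u, v, width=1.0):
    """Gaussian local kernel: large when u is close to v, decaying as they move apart."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * width ** 2))

def predict(x_test, X_train, y_train, width=1.0):
    """Kernel-weighted average of training outputs: a smooth version of nearest neighbors."""
    weights = np.array([local_kernel(x_test, x, width) for x in X_train])
    return weights @ y_train / weights.sum()

# Tiny made-up 1-D training set.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(predict(np.array([1.5]), X_train, y_train))  # interpolates between nearby outputs
```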
4.4.3 Manifold Learning
• Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.
• Manifold learning was introduced in the case of continuous-valued data and the unsupervised learning setting, although this probability concentration idea can be generalized to both discrete data and the supervised learning setting : the key assumption remains that probability mass is highly concentrated.
TECHNICAL PUBLICATIONS® - an up-thrust for knowledgeid Deep Learning 4-
works ft Doop Feedforward Networks
1 ooted to be very difficult to visualize. While data in two or three
sca lo ir
ns can plotted to show the inherent structure of the data, equivalent high-
nal plots are much less intui
igh
e. To aid visualization of the structure of a datasct,
ienension® $
ine gimension must be reduced in some way.
• The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.
• When the data lies on a low-dimensional manifold, it can be most natural for machine learning algorithms to represent the data in terms of coordinates on the manifold, rather than in terms of coordinates in R^n.
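As a rough sketch of the random-projection idea mentioned above (the data, the embedding and the projection matrix here are all made up for illustration), high-dimensional points that really lie near a low-dimensional manifold can be mapped to two coordinates with a random matrix; any structure that does not align with the random directions is lost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 points lying near a 1-D curve embedded in a 50-dimensional space.
t = rng.uniform(0, 2 * np.pi, size=200)
embedding = rng.normal(size=(2, 50))
X = np.column_stack([np.cos(t), np.sin(t)]) @ embedding + 0.01 * rng.normal(size=(200, 50))

# Random projection down to 2 dimensions: cheap, but blind to the manifold structure.
R = rng.normal(size=(50, 2)) / np.sqrt(50)
X_2d = X @ R
print(X_2d.shape)  # (200, 2)
```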
4.5 Gradient Learning
• Designing and training a neural network is not much different from training any other machine learning model with gradient descent. The choices for gradient learning are as follows :
a) We must choose a cost function.
b) We must choose how to represent the output of the model.
c) We then visit these design considerations.
gradient-based optimizers. Gradient-
« Neural networks are usually trained by using iterative,
wich easier to minimize a reasonably
based learning draws on the fact that it is generally rm
smooth, continuous function than a discrete function.
• The loss function can be minimized by estimating the impact of small variations of the parameter values on the loss function. Convex optimization converges starting from any initial parameters. Stochastic gradient descent applied to non-convex loss functions has no such convergence guarantee and is sensitive to the values of the initial parameters.
• For feedforward neural networks, it is important to initialize all weights to small random values. The biases may be initialized to zero or to small positive values. The same iterative gradient-based optimization algorithms are used to train feedforward networks and almost all other deep models.
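A minimal sketch of these choices (the layer size, learning rate and training example below are arbitrary assumptions, not part of the text): weights start as small random values, the bias starts at zero, and each gradient step moves the parameters against the gradient of the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random weights and a zero bias, as described above (sizes are arbitrary).
W = 0.01 * rng.normal(size=(1, 5))
b = np.zeros(1)

def loss_and_grads(W, b, x, y):
    """Squared error of a single linear unit and its gradients w.r.t. W and b."""
    err = W @ x + b - y
    return 0.5 * float(err @ err), np.outer(err, x), err

x, y = rng.normal(size=5), np.array([1.0])   # one made-up training example
lr = 0.1
for step in range(5):
    loss, dW, db = loss_and_grads(W, b, x, y)
    W -= lr * dW                              # gradient descent update
    b -= lr * db
    print(f"step {step}: loss = {loss:.4f}")
```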
4.5.1 Cost Function
• An important aspect of the design of deep neural networks is the cost function. Cost functions for neural networks are similar to those for parametric models such as linear models. In most cases, a parametric model defines a distribution p(y|x ; θ) and we simply use the principle of maximum likelihood.
• The cross-entropy between the training data and the model's predictions is used as the cost function. Most modern neural networks are trained using maximum likelihood. The cost function is given by
J(θ) = − E_{x,y ~ p̂_data} log p_model(y | x)
• This means the cost is simply the negative log-likelihood, or equivalently, the cross-entropy between the training set and the model distribution. The specific form of the cost function changes from model to model, depending on the form of log p_model.
• Cost function with Gaussian model : If
p_model(y | x) = N(y ; f(x ; θ), I)
then using maximum likelihood the mean squared error cost is
J(θ) = (1/2) E_{x,y ~ p̂_data} ||y − f(x ; θ)||² + const
where "const" depends on the variance of the Gaussian.
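The sketch below (illustrative values only, not from the text) checks this correspondence numerically : with unit variance, the average Gaussian negative log-likelihood equals half the mean squared error plus a constant that does not depend on the model.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10)            # made-up targets
f_x = rng.normal(size=10)          # made-up model predictions f(x; theta)

# Negative log-likelihood under y ~ N(f(x; theta), I), averaged over examples.
nll = np.mean(0.5 * (y - f_x) ** 2 + 0.5 * np.log(2.0 * np.pi))

# Half mean squared error plus the constant coming from the (unit) variance.
half_mse = 0.5 * np.mean((y - f_x) ** 2)
const = 0.5 * np.log(2.0 * np.pi)
print(np.isclose(nll, half_mse + const))   # True: same objective up to a constant
```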
• An advantage of this approach to cost is that deriving the cost from maximum likelihood removes the burden of designing cost functions for each model.
Desirable property of gradient : Gradient must be large and predictable enough to serve as
a good guide to the learning algorithm.
• Cross entropy and regularization : A property of the cross-entropy cost used for MLE is that it does not have a minimum value. For discrete output variables, they cannot represent a probability of zero or one but come arbitrarily close; logistic regression is an example.
• For real-valued output variables it becomes possible to assign extremely high density to correct training set outputs, e.g. by learning the variance parameter of a Gaussian output, and the resulting cross-entropy approaches negative infinity.
• Learning conditional statistics : Instead of learning a full probability distribution, we often want to learn just one conditional statistic of y given x.
• Learning a function : If we have a sufficiently powerful neural network, we can think of it as being powerful enough to determine any function "f". This function is limited only by boundedness and continuity.
• From this point of view, the cost is a functional rather than a function. We can view the cost as a functional, not a function, and think of learning as the task of choosing a function rather than a set of parameters. We can design our cost functional to have its minimum occur at a specific function we desire. For example, we can design the cost functional to have its minimum lie on the function that maps x to the expected value of y given x.
• Solving an optimization problem with respect to a function requires a mathematical tool called calculus of variations.
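As a compact sketch of that variational argument (stated here for illustration, not quoted from the text) : minimizing the squared-error functional
f* = argmin_f E_{x,y ~ p_data} ||y − f(x)||²
over a sufficiently rich family of functions yields
f*(x) = E_{y ~ p_data(y|x)} [y],
i.e. the optimal function predicts the mean of y for each value of x.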
• Mean squared error and mean absolute error often lead to poor results when used with gradient-based optimization. Some output units saturate and produce very small gradients when combined with these cost functions. This is one reason the cross-entropy cost is more popular.
• Cross-entropy is measured between the data distribution and the model distribution. The choice of how to represent the output then determines the form of the cross-entropy function. In logistic regression, the output is binary-valued. Any kind of neural network unit that may be used as an output can also be used as a hidden unit.
• The role of the output units is to complete the task that the network must perform. One simple kind of output unit is an output based on an affine transformation with no nonlinearity; these are often just called linear units.
• The main types of output units are linear units for Gaussian output distributions, sigmoid units for Bernoulli output distributions, softmax units for Multinoulli output, and other output types.

1. Linear units for Gaussian output distributions
• Linear unit : Simple output based on an affine transformation with no nonlinearity. Given features h, a layer of linear output units produces a vector ŷ = W^T h + b.
• Linear units are often used to produce the mean ŷ of a conditional Gaussian distribution
p(y | x) = N(y ; ŷ, I).
Maximizing the log-likelihood is then equivalent to minimizing the mean squared error.
• Linear units can be used to learn the covariance of a Gaussian too, or to make the covariance a function of the input. However, the covariance needs to be constrained to be a positive definite matrix.
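A brief sketch of a linear (affine) output layer producing the mean of a conditional Gaussian; the feature size, output size and parameter values below are placeholders, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

h = rng.normal(size=8)                 # features from the last hidden layer (size assumed)
W = 0.01 * rng.normal(size=(8, 3))     # output weights: 8 features -> 3 outputs
b = np.zeros(3)

# Linear output unit: y_hat = W^T h + b, interpreted as the mean of p(y|x) = N(y; y_hat, I).
y_hat = W.T @ h + b

y = rng.normal(size=3)                 # a made-up target
mse = 0.5 * np.sum((y - y_hat) ** 2)   # minimizing MSE corresponds to maximizing the Gaussian log-likelihood
print(y_hat, mse)
```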
2. Sigmoid units for Bernoulli output distributions
• Many tasks require predicting the value of a binary variable y. Classification problems with two classes can be cast in this form. The maximum-likelihood approach is to define a Bernoulli distribution over y conditioned on x.
• A Bernoulli distribution is defined by a single number. The neural net needs to predict only P(y = 1 | x). For this number to be a valid probability, it must lie in the interval [0, 1].
• To ensure a strong gradient whenever the model has the wrong answer, use sigmoid output units. A sigmoid output unit has two components :
a) A linear layer to compute z = w^T h + b.
b) A sigmoid activation function to convert z into a probability.
• Probability distribution using sigmoid : We describe a probability distribution over y using the value z = w^T h + b, where y is the output and z is the input to the sigmoid.
• Probability distributions based on exponentiation and normalization are common throughout statistical modeling. The variable z defining such a distribution over binary variables is called a logit.
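A small sketch of the two components of a sigmoid output unit (the feature vector and weights below are arbitrary placeholders) : a linear layer computes the logit z, the sigmoid turns z into P(y = 1 | x), and the corresponding maximum-likelihood cost is the binary cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

h = rng.normal(size=8)            # features from the last hidden layer (size assumed)
w = 0.01 * rng.normal(size=8)
b = 0.0

z = w @ h + b                     # a) linear layer computes the logit z = w^T h + b
p = 1.0 / (1.0 + np.exp(-z))      # b) sigmoid converts z into a probability P(y = 1 | x)

y = 1.0                           # made-up binary label
bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))   # negative log-likelihood (cross-entropy)
print(p, bce)
```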
3. Softplus function
• The sigmoid saturates when its argument is very positive or very negative, i.e. the function is insensitive to small changes in its input. Any time we want a probability distribution over a discrete variable with n values we may use the softmax function.
• Compare it to the softplus function :
ζ(x) = log(1 + exp(x))
• Softmax functions are most often used as the output of a classifier, to represent the probability distribution over n different classes. Softmax functions can also be used inside the model itself, if we wish the model to choose between one of n different options for some internal variable.
• Like the sigmoid, the softmax activation can saturate. The sigmoid function has a single output that saturates when its input is extremely negative or extremely positive. In the case of the softmax, there are multiple output values.
• These output values can saturate when the differences between input values become extreme. When the softmax saturates, many cost functions based on the softmax also saturate, unless they are able to invert the saturating activation function.
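The following sketch (arbitrary logits, illustration only) shows the usual numerically stable softmax; subtracting the maximum logit does not change the result but avoids overflow, and the second set of logits illustrates saturation when the differences between inputs become extreme.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by max(z) before exponentiating."""
    shifted = z - np.max(z)
    e = np.exp(shifted)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))    # moderate logits: spread-out probabilities
print(softmax(np.array([1.0, 2.0, 30.0])))   # extreme difference: output saturates near one-hot
```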