
Probability density function
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample.[2][3] Probability density is the probability per unit length: while the absolute likelihood of a continuous random variable taking on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared with the other.
[Figure: Box plot and probability density function of a normal distribution N(0, σ²).]

[Figure: Geometric visualisation of the mode, median and mean of an arbitrary unimodal probability density function.[1]]

In a more precise sense, the PDF is used to specify the probability of the random variable falling
within a particular range of values, as opposed to taking on any one value. This probability is given by
the integral of this variable's PDF over that range—that is, it is given by the area under the density
function but above the horizontal axis and between the lowest and greatest values of the range. The
probability density function is nonnegative everywhere, and the area under the entire curve is equal
to 1.

The terms probability distribution function and probability function have also sometimes been used to
denote the probability density function. However, this use is not standard among probabilists and
statisticians. In other sources, "probability distribution function" may be used when the probability
distribution is defined as a function over general sets of values or it may refer to the cumulative
distribution function, or it may be a probability mass function (PMF) rather than the density. "Density
function" itself is also used for the probability mass function, leading to further confusion.[4] In
general though, the PMF is used in the context of discrete random variables (random variables that
take values on a countable set), while the PDF is used in the context of continuous random variables.
Example

[Figure: Examples of four continuous probability density functions.]

Suppose bacteria of a certain species typically live 4 to 6 hours. The probability that a bacterium
lives exactly 5 hours is equal to zero. A lot of bacteria live for approximately 5 hours, but there is no
chance that any given bacterium dies at exactly 5.00... hours. However, the probability that the
bacterium dies between 5 hours and 5.01 hours is quantifiable. Suppose the answer is 0.02 (i.e., 2%).
Then, the probability that the bacterium dies between 5 hours and 5.001 hours should be about
0.002, since this time interval is one-tenth as long as the previous. The probability that the bacterium
dies between 5 hours and 5.0001 hours should be about 0.0002, and so on.

In this example, the ratio (probability of dying during an interval) / (duration of the interval) is approximately constant, and equal to 2 per hour (or 2 hour⁻¹). For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability / 0.01 hours) = 2 hour⁻¹. This quantity 2 hour⁻¹ is called the probability density for dying at around 5 hours. Therefore, the probability that the bacterium dies at around 5 hours can be written as (2 hour⁻¹) dt. This is the probability that the bacterium dies within an infinitesimal window of time around 5 hours, where dt is the duration of this window. For example, the probability that it lives longer than 5 hours, but shorter than (5 hours + 1 nanosecond), is (2 hour⁻¹) × (1 nanosecond) ≈ 6 × 10⁻¹³ (using the unit conversion 3.6 × 10¹² nanoseconds = 1 hour).

There is a probability density function f with f(5 hours) = 2 hour⁻¹. The integral of f over any window of time (not only infinitesimal windows but also large windows) is the probability that the bacterium dies in that window.
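A minimal numerical sketch of this idea follows (SciPy assumed; not part of the article). The text only gives f(5 hours) = 2 hour⁻¹, so the normal lifetime density below, which takes roughly that value at t = 5, is purely an illustrative assumption; the sketch checks that the probability of a short window is approximately the density times the window width.

    from scipy.integrate import quad
    from scipy.stats import norm

    lifetime = norm(loc=5.0, scale=0.2)      # hypothetical lifetime density (hours)
    print(lifetime.pdf(5.0))                 # ~1.99 per hour, close to the 2 hour^-1 above

    for dt in (0.01, 0.001, 0.0001):
        window_prob, _ = quad(lifetime.pdf, 5.0, 5.0 + dt)   # exact probability of the window
        print(dt, window_prob, lifetime.pdf(5.0) * dt)       # integral vs. density * width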
Absolutely continuous univariate distributions
A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable X has density fX, where fX is a non-negative Lebesgue-integrable function, if:

\Pr[a \le X \le b] = \int_a^b f_X(x)\,dx.

Hence, if FX is the cumulative distribution function of X, then:

F_X(x) = \int_{-\infty}^{x} f_X(u)\,du,

and (if fX is continuous at x)

f_X(x) = \frac{d}{dx} F_X(x).

Intuitively, one can think of fX(x) dx as being the probability of X falling within the infinitesimal interval [x, x + dx].
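A short numerical sketch of these relations (SciPy assumed; the standard normal is only an illustrative choice of density):

    from scipy.integrate import quad
    from scipy.stats import norm

    X = norm(loc=0.0, scale=1.0)     # illustrative absolutely continuous distribution
    x = 0.7

    # F_X(x) as the integral of the density up to x, compared with the library CDF
    cdf_by_integration, _ = quad(X.pdf, -10.0, x)
    print(cdf_by_integration, X.cdf(x))

    # f_X(x) as the derivative of the CDF, approximated by a central difference
    h = 1e-6
    pdf_by_differentiation = (X.cdf(x + h) - X.cdf(x - h)) / (2 * h)
    print(pdf_by_differentiation, X.pdf(x))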
Formal definition
(This definition may be extended to any probability distribution using the measure-theoretic definition of probability.)

A random variable X with values in a measurable space (usually Rⁿ with the Borel sets as measurable subsets) has as probability distribution the pushforward measure X∗P on that space: the density of X with respect to a reference measure μ on that space is the Radon–Nikodym derivative:

f = \frac{d X_* P}{d\mu}.

That is, f is any measurable function with the property that:

\Pr[X \in A] = \int_{X^{-1}A} dP = \int_A f \, d\mu

for any measurable set A.
Discussion
In the continuous univariate case above, the reference measure is the Lebesgue measure. The
probability mass function of a discrete random variable is the density with respect to the counting
measure over the sample space (usually the set of integers, or some subset thereof).

It is not possible to define a density with reference to an arbitrary measure (e.g., one cannot choose the counting measure as a reference for a continuous random variable). Furthermore, when it does exist, the density is almost unique, meaning that any two such densities coincide almost everywhere.
Further details
Unlike a probability, a probability density function can take on values greater than one; for
example, the continuous uniform distribution on the interval [0, 1/2] has probability density f(x) =
2 for 0 ≤ x ≤ 1/2 and f(x) = 0 elsewhere.
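A quick numerical check of this point (SciPy assumed) shows a density that exceeds 1 while still integrating to 1:

    from scipy.integrate import quad

    def f(x):
        # Uniform density on [0, 1/2]: the value 2 is greater than 1 ...
        return 2.0 if 0.0 <= x <= 0.5 else 0.0

    # ... but the total area under the curve is still 1
    total, _ = quad(f, -1.0, 1.0, points=[0.0, 0.5])
    print(total)   # 1.0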

The standard normal distribution has probability density

f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.

If a random variable X is given and its distribution admits a probability density function f, then the expected value of X (if the expected value exists) can be calculated as

\operatorname{E}[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx.
Not every probability distribution has a density function: the distributions of discrete random
variables do not; nor does the Cantor distribution, even though it has no discrete component, i.e.,
does not assign positive probability to any individual point.

A distribution has a density function if and only if its cumulative distribution function F(x) is absolutely continuous. In this case: F is almost everywhere differentiable, and its derivative can be used as probability density:

\frac{d}{dx} F(x) = f(x).

If a probability distribution admits a density, then the probability of every one-point set { a} is zero;
the same holds for finite and countable sets.

Two probability densities f and g represent the same probability distribution precisely if they differ
only on a set of Lebesgue measure zero.

In the field of statistical physics, a non-formal reformulation of the relation above between the derivative of the cumulative distribution function and the probability density function is generally used as the definition of the probability density function. This alternate definition is the following: if dt is an infinitely small number, the probability that X is included within the interval (t, t + dt) is equal to f(t) dt, or:

\Pr(t < X < t + dt) = f(t) \, dt.
Link between discrete and continuous distributions
It is possible to represent certain discrete random variables as well as random variables involving both a continuous and a discrete part with a generalized probability density function using the Dirac delta function. (This is not possible with a probability density function in the sense defined above; it may be done with a distribution.) For example, consider a binary discrete random variable having the Rademacher distribution, that is, taking −1 or 1 as values, with probability 1/2 each. The density of probability associated with this variable is:

f(t) = \frac{1}{2}\big(\delta(t+1) + \delta(t-1)\big).

More generally, if a discrete variable can take n different values among real numbers, then the associated probability density function is:

f(t) = \sum_{i=1}^{n} p_i \, \delta(t - x_i),

where x1, ..., xn are the discrete values accessible to the variable and p1, ..., pn are the probabilities associated with these values.
This substantially unifies the treatment of discrete and continuous probability distributions. The
above expression allows for determining statistical characteristics of such a discrete variable (such
as the mean, variance, and kurtosis), starting from the formulas given for a continuous distribution
of the probability.
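As a minimal sketch of this unification (NumPy assumed), the continuous-distribution formulas for the moments of the Rademacher variable above reduce to weighted sums over its two values:

    import numpy as np

    values = np.array([-1.0, 1.0])   # discrete values x_i
    probs = np.array([0.5, 0.5])     # probabilities p_i

    # With the Dirac-delta density, expectations of the form E[g(X)], written as
    # the integral of g(t) f(t) dt, collapse to the weighted sums below.
    mean = np.sum(probs * values)
    variance = np.sum(probs * (values - mean) ** 2)
    kurtosis = np.sum(probs * (values - mean) ** 4) / variance ** 2
    print(mean, variance, kurtosis)  # 0.0, 1.0, 1.0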

Families of densities
It is common for probability density functions (and probability mass functions) to be parametrized, that is, to be characterized by unspecified parameters. For example, the normal distribution is parametrized in terms of the mean and the variance, denoted by μ and σ² respectively, giving the family of densities

f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
Different values of the parameters describe different distributions of different random variables on the same sample space (the same set of all possible values of the variable); this sample space is the domain of the family of random variables that this family of distributions describes. A given set of parameters describes a single distribution within the family sharing the functional form of the density. From the perspective of a given distribution, the parameters are constants, and terms in a density function that contain only parameters, but not variables, are part of the normalization factor of a distribution (the multiplicative factor that ensures that the area under the density—the probability of something in the domain occurring—equals 1). This normalization factor is outside the kernel of the distribution.
Since the parameters are constants, reparametrizing a density in terms of different parameters to give a characterization of a different random variable in the family means simply substituting the new parameter values into the formula in place of the old ones.

Densities associated with multiple variables
For continuous random variables X1, ..., Xn, it is also possible to define a probability density function associated to the set as a whole, often called the joint probability density function. This density function is defined as a function of the n variables, such that, for any domain D in the n-dimensional space of the values of the variables X1, ..., Xn, the probability that a realisation of the set of variables falls inside the domain D is

\Pr(X_1, \ldots, X_n \in D) = \int_D f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) \, dx_1 \cdots dx_n.

If F(x1, ..., xn) = Pr(X1 ≤ x1, ..., Xn ≤ xn) is the cumulative distribution function of the vector (X1, ..., Xn), then the joint probability density function can be computed as a partial derivative

f(x) = \frac{\partial^n F}{\partial x_1 \cdots \partial x_n}\bigg|_x.
Marginal densities
For i = 1, 2, ..., n, let fXi(xi) be the probability density function associated with variable Xi alone. This is called the marginal density function, and can be deduced from the probability density associated with the random variables X1, ..., Xn by integrating over all values of the other n − 1 variables:

f_{X_i}(x_i) = \int f(x_1, \ldots, x_n) \, dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_n.
Independence
Continuous random variables X1, ..., Xn admitting a joint density are all independent from each other if and only if

f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n).
Corollary
If the joint probability density function of a vector of n random variables can be factored into a product of n functions of one variable

f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)

(where each fi is not necessarily a density), then the n variables in the set are all independent from each other, and the marginal probability density function of each of them is given by

f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}.
Example
This elementary example illustrates the above definition of multidimensional probability density functions in the simple case of a function of a set of two variables. Let us call R a 2-dimensional random vector of coordinates (X, Y): the probability to obtain R in the quarter plane of positive x and y is

\Pr(X > 0, Y > 0) = \int_0^{\infty} \int_0^{\infty} f_{X,Y}(x, y) \, dx \, dy.

Function of random variables and change of variables in the probability density function
If the probability density function of a random variable (or vector) X is given as fX(x), it is possible (but often not necessary; see below) to calculate the probability density function of some variable Y = g(X). This is also called a "change of variable" and is in practice used to generate a random variable of arbitrary shape f_{g(X)} = f_Y using a known (for instance, uniform) random number generator.

It is tempting to think that in order to find the expected value E(g(X)), one must first find the probability density f_{g(X)} of the new random variable Y = g(X). However, rather than computing

\operatorname{E}\big(g(X)\big) = \int_{-\infty}^{\infty} y \, f_{g(X)}(y) \, dy,

one may find instead

\operatorname{E}\big(g(X)\big) = \int_{-\infty}^{\infty} g(x) \, f_X(x) \, dx.
The values of the two integrals are the same in all cases in which both X and g(X) actually have
probability density functions. It is not necessary that g be a one-to-one function. In some cases the
latter integral is computed much more easily than the former. See Law of the unconscious
statistician.
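A small numerical illustration (SciPy assumed; X standard normal and g(x) = x² are illustrative choices, for which Y = g(X) happens to have the known chi-squared density with one degree of freedom):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm, chi2

    # E[g(X)] computed directly from the density of X (law of the unconscious statistician)
    lotus, _ = quad(lambda x: (x ** 2) * norm.pdf(x), -np.inf, np.inf)

    # The same expectation computed from the density of Y = X^2, which is chi-squared(1)
    via_density_of_y, _ = quad(lambda y: y * chi2.pdf(y, df=1), 0.0, np.inf)

    print(lotus, via_density_of_y)   # both approximately 1.0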

Scalar to scalar
Let g : R → R be a monotonic function, then the resulting density function is[5]

f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{d}{dy}\big(g^{-1}(y)\big) \right|.

Here g⁻¹ denotes the inverse function.

This follows from the fact that the probability contained in a differential area must be invariant under change of variables. That is,

\left| f_Y(y) \, dy \right| = \left| f_X(x) \, dx \right|,

or

f_Y(y) = \left| \frac{dx}{dy} \right| f_X(x) = \left| \frac{d}{dy}\big(g^{-1}(y)\big) \right| f_X\big(g^{-1}(y)\big).
For functions that are not monotonic, the probability density function for y is

f_Y(y) = \sum_{k=1}^{n(y)} \left| \frac{d}{dy} g_k^{-1}(y) \right| f_X\big(g_k^{-1}(y)\big),

where n(y) is the number of solutions in x for the equation g(x) = y, and g_k^{-1}(y) are these solutions.

Vector to vector


Suppose x is an n-dimensional random variable with joint density f. If y = G(x), where G is a bijective, differentiable function, then y has density pY:

p_Y(\mathbf{y}) = f\big(G^{-1}(\mathbf{y})\big) \left| \det\!\left[ \frac{dG^{-1}(\mathbf{z})}{d\mathbf{z}} \bigg|_{\mathbf{z}=\mathbf{y}} \right] \right|

with the differential regarded as the Jacobian of the inverse of G(⋅), evaluated at y.[6]

For example, in the 2-dimensional case x = (x1, x2), suppose the transform G is given as y1 = G1(x1, x2), y2 = G2(x1, x2) with inverses x1 = G1⁻¹(y1, y2), x2 = G2⁻¹(y1, y2). The joint distribution for y = (y1, y2) has density[7]

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}\big(G_1^{-1}(y_1, y_2),\, G_2^{-1}(y_1, y_2)\big) \left| \frac{\partial G_1^{-1}}{\partial y_1} \frac{\partial G_2^{-1}}{\partial y_2} - \frac{\partial G_1^{-1}}{\partial y_2} \frac{\partial G_2^{-1}}{\partial y_1} \right|.

Vector to scalar
Let V : Rⁿ → R be a differentiable function and X be a random vector taking values in Rⁿ, fX be the probability density function of X and δ(⋅) be the Dirac delta function. It is possible to use the formulas above to determine fY, the probability density function of Y = V(X), which will be given by

f_Y(y) = \int_{\mathbb{R}^n} f_X(\mathbf{x}) \, \delta\big(y - V(\mathbf{x})\big) \, d\mathbf{x}.

This result leads to the law of the unconscious statistician:

\operatorname{E}_Y[Y] = \int_{\mathbb{R}} y \, f_Y(y) \, dy = \int_{\mathbb{R}} y \int_{\mathbb{R}^n} f_X(\mathbf{x}) \, \delta\big(y - V(\mathbf{x})\big) \, d\mathbf{x} \, dy = \int_{\mathbb{R}^n} V(\mathbf{x}) \, f_X(\mathbf{x}) \, d\mathbf{x} = \operatorname{E}_X[V(X)].

Proof: Let Z be a collapsed random variable with probability density function p_Z(z) = δ(z) (i.e., a constant equal to zero), and define the transform H by

H(Z, X) = \begin{bmatrix} Z + V(X) \\ X \end{bmatrix} = \begin{bmatrix} Y \\ \tilde{X} \end{bmatrix}.

H is a bijective mapping, and the Jacobian of H⁻¹ is

\frac{dH^{-1}(y, \tilde{\mathbf{x}})}{dy \, d\tilde{\mathbf{x}}} = \begin{bmatrix} 1 & -\frac{dV(\tilde{\mathbf{x}})}{d\tilde{\mathbf{x}}} \\ \mathbf{0} & \mathbf{I} \end{bmatrix},

which is an upper triangular matrix with ones on the main diagonal, therefore its determinant is 1. Applying the change of variable theorem from the previous section we obtain that

f_{Y,X}(y, \mathbf{x}) = f_X(\mathbf{x}) \, \delta\big(y - V(\mathbf{x})\big),

which if marginalized over x leads to the desired probability density function.

Sums of independent random variables
The probability density function of the sum of two independent random variables U and V, each of which has a probability density function, is the convolution of their separate density functions:

f_{U+V}(x) = \int_{-\infty}^{\infty} f_U(y) \, f_V(x - y) \, dy = \big(f_U * f_V\big)(x).

It is possible to generalize the previous relation to a sum of N independent random variables, with densities f_{U_1}, ..., f_{U_N}:

f_{U_1 + \cdots + U_N}(x) = \big(f_{U_1} * \cdots * f_{U_N}\big)(x).

This can be derived from a two-way change of variables involving Y = U + V and Z = V, similarly to the example below for the quotient of independent random variables.
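A numerical sketch of the convolution (SciPy assumed; two independent standard normals are an illustrative choice, for which the sum is normal with variance 2):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def density_of_sum(x):
        # Convolution: the integral of f_U(y) * f_V(x - y) over all y
        value, _ = quad(lambda y: norm.pdf(y) * norm.pdf(x - y), -np.inf, np.inf)
        return value

    for x in (0.0, 1.0, 2.5):
        print(density_of_sum(x), norm.pdf(x, scale=np.sqrt(2)))   # should match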

Products and quotients of independent random variables
Given two independent random variables U and V, each of which has a probability density
function, the density of the product Y = UV and quotient Y = U/V can be computed by a change
of variables.

Example: Quotient distribution
To compute the quotient Y = U/V of two independent random variables U and V, define the following transformation:

Y = U/V, \qquad Z = V.

Then, the joint density p(y, z) can be computed by a change of variables from U, V to Y, Z, and Y can be derived by marginalizing out Z from the joint density.

The inverse transformation is

U = YZ, \qquad V = Z.

The absolute value of the Jacobian matrix determinant of this transformation is:

\left| \det \begin{bmatrix} \frac{\partial u}{\partial y} & \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial y} & \frac{\partial v}{\partial z} \end{bmatrix} \right| = \left| \det \begin{bmatrix} z & y \\ 0 & 1 \end{bmatrix} \right| = |z|.

Thus:

p(y, z) = p_U(yz) \, p_V(z) \, |z|.

And the distribution of Y can be computed by marginalizing out Z:

f_Y(y) = \int_{-\infty}^{\infty} p_U(yz) \, p_V(z) \, |z| \, dz.

This method crucially requires that the transformation from U,V to Y,Z be bijective. The above
transformation meets this because Z can be mapped directly back to V, and for a given V the quotient
U/V is monotonic. This is similarly the case for the sum U + V, difference U − V and product UV.

Exactly the same method can be used to compute the distribution of other functions of multiple
independent random variables.
Example: Quotient of two standard normals
Given two standard normal variables U and V, the quotient can be computed as follows. First, the variables have the following density functions:

p_U(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}, \qquad p_V(v) = \frac{1}{\sqrt{2\pi}} e^{-v^2/2}.

We transform as described above:

Y = U/V, \qquad Z = V.

This leads to:

f_Y(y) = \int_{-\infty}^{\infty} p_U(yz) \, p_V(z) \, |z| \, dz = \int_{-\infty}^{\infty} \frac{1}{2\pi} e^{-(y^2+1)z^2/2} \, |z| \, dz = 2 \int_0^{\infty} \frac{1}{2\pi} e^{-(y^2+1)z^2/2} \, z \, dz = \frac{1}{\pi (1 + y^2)}.

This is the density of a standard Cauchy distribution.
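A numerical check of this result (SciPy assumed), evaluating the marginalization integral and comparing with the standard Cauchy density 1/(π(1 + y²)):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm, cauchy

    def quotient_density(y):
        # Integrand p_U(yz) p_V(z) |z|; by symmetry, integrate over z > 0 and double
        value, _ = quad(lambda z: norm.pdf(y * z) * norm.pdf(z) * z, 0.0, np.inf)
        return 2.0 * value

    for y in (0.0, 1.0, 3.0):
        print(quotient_density(y), cauchy.pdf(y))   # both equal 1 / (pi * (1 + y^2))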

See also
Density estimation – Estimate of an unobservable underlying probability density function
Kernel density estimation – Estimator
Likelihood function – Function related to statistics and probability theory
List of probability distributions
Probability amplitude – Complex number whose squared absolute value is a probability
Probability mass function – Discrete-variable probability distribution
Secondary measure
Uses as position probability density:
Atomic orbital – Function describing an electron in an atom
Home range – The area in which an animal lives and moves on a periodic basis

References

1. "AP Statistics Review - Density Curves and the Normal Distributions". Archived from the original on 2 April 2015. Retrieved 16 March 2015.
2. Grinstead, Charles M.; Snell, J. Laurie (2009). "Conditional Probability - Discrete Conditional" (PDF). Grinstead & Snell's Introduction to Probability. Orange Grove Texts. ISBN 978-1616100469. Archived (PDF) from the original on 2003-04-25. Retrieved 2019-07-25.
3. "probability - Is a uniformly random number over the real line a valid distribution?". Cross Validated. Retrieved 2021-10-06.
4. Ord, J.K. (1972). Families of Frequency Distributions. Griffin. ISBN 0-85264-137-0. (For example, Table 5.1 and Example 5.4.)
5. Siegrist, Kyle. "Transformations of Random Variables". LibreTexts Statistics. Retrieved 22 December 2023.
6. Devore, Jay L.; Berk, Kenneth N. (2007). Modern Mathematical Statistics with Applications. Cengage. p. 263. ISBN 978-0-534-40473-4.
7. Stirzaker, David (2007-01-01). Elementary Probability. Cambridge University Press. ISBN 978-0521534284. OCLC 851313783.

Further reading

Billingsley, Patrick (1979). Probability and Measure. New York, Toronto, London: John Wiley and Sons. ISBN 0-471-00710-2.
Casella, George; Berger, Roger L. (2002). Statistical Inference (Second ed.). Thomson Learning. pp. 34–37. ISBN 0-534-24312-6.
Stirzaker, David (2003). Elementary Probability. Cambridge University Press. ISBN 0-521-42028-8. Chapters 7 to 9 are about continuous variables.

External links

Ushakov, N.G. (2001) [1994], "Density of a probability distribution", Encyclopedia of Mathematics, EMS Press.
Weisstein, Eric W. "Probability density function". MathWorld.
