0% found this document useful (0 votes)
13 views18 pages

Content

This document discusses the T2 control chart with estimated parameters. It provides three key points: 1) When parameters are estimated from data rather than known, the Average Run Length (ARL) of the T2 control chart will be larger than if the parameters were known. This increases the chance of a false alarm. 2) To design an efficient T2 chart with estimated parameters, the ARL function needs to be approximated. Existing approaches use extensive simulations, while this paper proposes an analytic approximation in limited cases. 3) Each T2 statistic becomes dependent when using estimated parameters instead of known parameters. This dependence must be accounted for when setting control limits, as it will affect the actual ARL

Uploaded by

chu.xujohn
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views18 pages

Content

This document discusses the T2 control chart with estimated parameters. It provides three key points: 1) When parameters are estimated from data rather than known, the Average Run Length (ARL) of the T2 control chart will be larger than if the parameters were known. This increases the chance of a false alarm. 2) To design an efficient T2 chart with estimated parameters, the ARL function needs to be approximated. Existing approaches use extensive simulations, while this paper proposes an analytic approximation in limited cases. 3) Each T2 statistic becomes dependent when using estimated parameters instead of known parameters. This dependence must be accounted for when setting control limits, as it will affect the actual ARL

Uploaded by

chu.xujohn
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

On the T 2 control chart with estimated

parameters
Jiaqi Chen1∗and Hualong Yang2 , Jianfeng Yao2
1
Department of Mathematics, Harbin Institute of Technology
2
Department of Statistics and Actuarial Science, University of Hong Kong

November 20, 2016

Abstract
Statistical monitoring of multivariate processes is becoming increasingly important
in modern manufacturing environments. Typical equipment may have multiple key
variables to be measured continuously. Hotelling’s T 2 chart was originally applied
for monitoring the mean vector of multivariate quality measurements. In practical
problems, estimated parameters are needed and their use will modify the properties
of control charts. The Average Run Length (ARL), an indicator of the performance
of the control charts, will be larger when the estimated parameters are used. As one
contribution of the paper, we provide a rigorous proof of this phenomenon which has
been reported in several empirical studies. Furthermore, in order to design an efficient
T 2 chart with estimated parameters, it is necessary to have a method to calculate or
approximate the ARL function. An existing approach in the literature is based on
extensive Monte-Carlo simulations. In this paper, we propose a novel approach by
providing an analytic approximation of the ARL function in the however limited case
of univariate observations.

Keywords: Average Run Length; Estimated parameters; Multivariate statistical process


control; T 2 chart


Corresponding author. E-mail:[email protected]

1
1 Introduction
Statistical monitoring of multivariate processes is becoming increasingly important in mod-
ern manufacturing environments. Typical equipment may have multiple key variables to
be measured continuously. Hotelling’s T 2 chart (Hotelling , 1947) was originally applied
to the problem of monitoring the mean vector of multivariate quality measurements. In
practice, parameters of the observation process need to be estimated first during a Phase I
analysis. As a consequence, the use of such estimated parameters will modify the properties
of control charts in Phase II study. The performance of multivariate charts with estimated
parameters has been studied in the literature. Ryan (2011) analyzed the dependence of T 2
chart on estimated parameters. Lowry and Montgomery (1995) provided recommended
sample sizes for multivariate T 2 control charts. A more detailed study on the effects of
the parameter estimation on multivariate T 2 charts with χ2 based control limits was done
by Nedumaran and Pignatiello (1999). Champ et al. (2005) studied the T 2 charts with
corrected control limits. Jensen et al. (2006) and Bersimis et al. (2006) give a thorough
review of the literature concerning the effects of estimation on univariate and multivariate
charts with estimated parameters.
Throughout the paper, we assume that the base vector X of quality measurements is
distributed as the p-variate normal distribution Np (µ, Σ0 ). The process is said in-control
when µ = µ0 , a preassigned mean vector. When the in-control parameters µ0 and Σ0 are
known, Hotelling’s T 2 chart uses the statistics at time k = 1, 2, . . . ,

T
Tk2 = n Xk − µ0 Σ−1

0 X k − µ0 , (1.1)

where Xk is the mean of the k-th sample with sample size n. The chart gives an out-of-
control signal at time i when Ti is above a control limit h.
A widely used method to measure the performance of a control chart is through the
average run length (ARL), which is the expected number of the plotted chart statistics
before a signal is observed. When µ0 and Σ0 are unknown, we replace them by some
sample estimates for them. Sample estimators µ̂0 and Σ̂0 are based on data acquired prior
to the on-going inspection process. Assume that there are m such independent random
samples {Xi,1 , Xi,2 , . . . , Xi,n } of size n (i = 1, 2, . . . , m ) from an in-control process. Then
the sample estimators µ̂0 and Σ̂0 are:

m n
1 XX
µ̂0 = X = Xi,j ,
mn i=1 j=1
m n
1 XX  T
Σ̂0 = S = Xi,j − Xi Xi,j − Xi .
m(n − 1) i=1 j=1

2
The T 2 statistics are then replaced with
T
Tk2 = n Xk − µ̂0 Σ̂−1

0 Xk − µ̂0 . (1.2)

The whole procedure of T 2 chart is then divided into two phases. In Phase One, the
chart is used to determine whether the m independent random samples each with size n is
generated from an in-control process and then the parameter estimators can be obtained
through these samples. In Phase Two, these estimated parameters are directly used in
testing whether new sample points deviate from the in-control status based on some control
limits. Sullivan and Woodall (1996) and Kim et al. (2003) have discussed the problem of
adapting control charts for the preliminary analysis of multivariate observations and also
recommend a method for preliminary analysis of multivariate observations that does not
require any simulation for the control limit for Phase One. For Phase Two, the main task
is to determine a control limit h which keeps the in-control ARL at a given level (e.g. 200
or 500). The out-of-control ARL are also to be estimated and they should be kept as small
as possible. If µ0 and Σ0 are known, each statistic Tk2 in (1.1) (k = 1, 2, . . .) follow a χ2
distribution independently with p degree of freedom and the RL of a chart is a geometric
random variable. For example, the multivariate Shewart control chart has an upper control
limit h = χ2p,1−α with an in-control ARL = 1/α. Several authors provided an analysis on
{Tk2 } when sample estimators µ̂0 and Σ̂0 are used as in (1.2). For example, Wierda (1994)
showed that [(mn − m − p + 1) /p (m + 1) (n − 1)] Tk2 is distributed as a Fisher distribution
Fp,mn−m−p+1 . In Phase Two, if the subsequent statistics Tk2 were independent at different
time point k = 1, 2, . . ., the control limit h∗ corresponding to an in-control ARL = 1/α
would be
p (m + 1) (n − 1)
h∗ = F1−α,p,mn−m−p+1, and P T12 > h∗ = α,

(1.3)
m (n − 1) + 1 − p

where F1−α,p,mn−m−p+1 is the α-th upper quantile of Fp,mn−m−p+1 . However, as estimators


are used in place of known parameters, the Tk2 ’s become dependent so that with the control
limit h∗ specified above, the actual in-control ARL indeed differs from the target value α−1 .
Champ et al. (2005) took the dependence among the Tk2 into consideration and offered
corrected control limit h using intensive simulations. More precisely, using the sample
estimators µ̂0 and Σ̂0 , the current statistic Tk2 of (1.2) with inspected sample mean X̄k
computed from the sample Xk = {Xk,1 , Xk,2 , . . . , Xk,n } can be written as:

T
√ √
  
1 −1 1
Tk2 = m (n − 1) Zk + nδ − √ Z0 W0 Zk + nδ − √ Z0 , (1.4)
m m
√   √ −1  k 
Z0 = mnP−1 0 X − µ0 ∼ N p (0, I), Zk = nP 0 X − µ −1
0 ∼ Np (0, I), δ = P0 (µ − µ0 ),
−1 T
W0 = m (n − 1) P−1

0 S P0 ∼ Wishartp (I, m (n − 1)), and the matrix P0 is defined by
T
the factorization Σ0 = P0 P0 . Moreover, Z0 , Zk , and W0 are independent in this repre-

3
sentation. If we further use an orthogonal transformation B such that Bδ = (d, 0, . . . , 0)T ,
we have
T
√ √
  
2 ∗ 1 ∗ ∗−1 ∗ 1 ∗
Tk = m (n − 1) Zk + nde − √ Z0 W0 Zk + nde − √ Z0 , (1.5)
m m

where e is the p × 1 vector (1, 0, . . . , 0)T , Z∗0 = BZ0 ∼ Np (0, I) , Z∗k = BZk ∼ Np (0, I) ,
W0∗ = BW0 BT ∼ Wishartp (I, mn − m) and they are independent. Notice that d2 =
kδk2 = (µ − µ0 )T Σ−1
0 (µ − µ0 ) is the non-central parameter. Two important features of the
T 2 chart with estimated parameters emerge from this representation. First, the distribution
of each Tk2 depends on the unknown parameters µ0 and Σ0 through d only; second, even
though the samples, say k = 1, 2 . . . , in Phase II are independent, the statistics {Tk2 }k>1
are dependent since they share the same random variables Z∗0 , {Z∗k }k>1 and W0∗ which are
functions of the estimated parameters. However, the {Tk2 }k>1 still have the same marginal
distribution.
Despite the interesting representation (1.5) , the joint distribution of {Tk2 }k>1 is far
from being known. This makes it a particular challenge to determine a control limit h for
a given in-control ARL. One attempt consists of keeping the same control limit h as if
the parameters were known and not estimated, namely the traditional limit h∗ . However,
it has been reported in the literature that this seemingly natural approach inflates the
actual ARL0 (in-control) which becomes larger than the target ARL0 , see e.g. Bersimis et
al. (2006). Moreover, Champ et al. (2005) provided detailed simulations to verify this
phenomenon. As first main contribution of the paper, we will provide a formal proof of
the phenomenon.
Furthermore, the design of an efficient T 2 chart with estimated parameters requires
an appropriate method to calculate or approximate the ARL function. Champ et al.
(2005) proposes an effective approach where the ARL function is tabulated using extensive
Monte-Carlo simulation. Typically, given a target ARL0 , say 200 with m samples of block
length n in Phase I, the corrected UCL h̄ can be found by large size simulations and
binary adjustments of h. Tables in Champ et al. (2005) report the corrected UCL h̄
for ARL0 = 200 and a wide range of parameters: dimension 2 6 p 6 10, sample size
30 6 m 6 100 and block length 3 6 n 6 15.
Although this Monte-Carlo approach is valuable and precious, one may search for other
approximation methods for several reasons. First, since the random ARL function has a
quite large variance, such simulation procedures may suffer from numerical instability.
Second, as for all tabulating approaches, a practitioner may face a problem where the
parameters p, m and n are not documented in the established tables. A method based on
analytic approximation relaxes the restriction on m and n and provides a more convenient
alternative to designing a T 2 chart. As second main contribution of this paper, we establish
such an analytic approximation for the ARL function. Due to the complexity of the
problem, we limit ourselves to the univariate case p = 1 and we establish a numerical
method for the determination of a control limit h when an arbitrary in-control ARL is
given.

4
2 Comparison between target and actual values of ARL
when using the traditional control limit
For a given upper control limit h, the ARL function of the T 2 chart is

X ∞
X
ARL (h) = kP (Run length = k) = P (Run length > k)
k=0 k=0

X
2
P T12 6 h, T22 6 h, . . . , Tk−1 6 h, Tk2 6 h .

= (2.1)
k=0

Meanwhile, the traditional control limit h∗ is derived by assuming the independence be-
tween the Tk2 ’s, which is

X k 1 1
P T12 6 h∗
 
ARL0 w = = , (2.2)
k=0
P (T12 ∗
>h ) α

where h∗ equals to the α-th upper quantile of T12 as given in (1.3). For example, if the
target ARL0 is 200, one finds α = 2001
and the value of h∗ is derived.
2
However, the statistics Tk ’s are indeed dependent. There will be therefore a bias be-
tween the actual ARL and the target ARLusing the independence approximation (2.2)
above. Actually, it has been observed for years that using estimated parameters leads to
an in-control ARL that is higher than the expected one.
To the best of our knowledge, this paper is the first to give a formal proof of this
phenomenon.
Theorem 1. For the T 2 chart in (1.5) using estimated parameters µ̂0 and Σ̂0 , we have
ARL (h) > P T12 >h for all h > 0. In particular, if the traditional control limit h∗ in (1.3) is
( 1 )
used, then the actual in-control ARL is always larger than the target ARL. i.e. ARL0 > α1 .
Proof. By (2.1),


X ∞ n o
 X
ARL (h) = P T12 6 h, . . . , Tk2 6h = E 1(T 2 6h) , . . . , 1(T 2 6h) . (2.3)
1 k
k=0 k=0

Here 1A denotes the indicator function of an event A. Note that in the representation
(1.5), the {Z∗k }k>1 are independent of Z∗0 and W0∗ . As a result, conditionally to Z∗0 and
W0∗ , the statistics T2k ’s are independent. Therefore, by the conditioning method,
n o n  o
∗ ∗
E 1(T 2 6h) , . . . , 1(T 2 6h) = E E 1(T 2 6h) , . . . , 1(T 2 6h) | Z0 , W0 (2.4)
1 k
n  1 k
ok 
= E E 1(T 2 6h) | Z∗0 , W0∗ .
1

5
 k k
After applying
 the Jensen’s
 inequality E X > {E [X]}h to the i nonnegative variables
∗ ∗
X = E 1(T 2 6h) | Z0 , W0 and observing that E [X] = E 1(T 2 6h) = P (T12 6 h), we have
1 1

n o n h  iok 
∗ ∗ k
= P T12 6 h

E 1(T 2 6h) , . . . , 1(T 2 6h) > E E 1(T 2 6h) | Z0 , W0 .
1 k 1

Hence,

X k 1
P T12 6 h
 
ARL (h) > = . (2.5)
k=0
P (T12 > h)

In particular, for the in-control ARL using traditional control limit as UCL,
1 1
ARL (h∗ ) > = .
P0 (T12 > h∗ ) α

Theorem 1 formally establishes the fact reported in the literature (Bersimis et al. ,
2006) that using the traditional limit h∗ in the presence of estimated parameters will
inflates the in-control ARL. Moreover, we observe that Equation (2.5) shows that the
ARL(h) function is increasing in h, so that the needed UCL h̄ for a given target in-control
ARL must be smaller than the traditional limit h∗ . As for the out-of-control ARL, again
by the monotonicity of the ARL function, the wrong use of h∗ instead of the needed UCL
h̄ will also increases the underlying out-of-control ARL.

3 Analytic approximations for the in-control ARL in the


univariate case
As explained in the introduction, we develop an approximation method for the in-control
ARL function with univariate observations. To begin with, we give an exact analytic
expression of the in-control ARL.
Proposition 1. When p = 1,
 
1
ARL0 = E   √   √  , (3.1)
Φ b 0 h + a0 + Φ b 0 h − a0

√ p p R∞  2
where a0 = Z∗0 / m , b0 = W∗0 / m (n − 1) , Φ (x) = x √1

exp − t2 dt and Z∗0 , W0∗
are the variables defined in (1.5) .
Proof. First, we rewrite (1.5) when p = 1 and assume all Phase II observation {Tk2 }k=1,2,...
are in-control (d = 0):

6
 2  ∗ 2
m (n − 1) ∗ 1 ∗ Zk − a0
Tk2= Zk − √ Z0 = , (3.2)
W0∗ m b0
√ p ∗ p
where a0 = Z∗0 / m , b0 = W0 / m (n − 1). Here Z∗0 ∼ N (0, 1) , W0∗ ∼ χ2m(n−1) .
 ∗ 2
Z −a
Particularly, T12 = 1b0 0 .
Denote the conditional probability P∗ = P (T12 > h | Z∗0 , W0∗ ) ∈ (0, 1). Then based on
the distribution of T12 above, we have


 √ ∗ ∗
  √ ∗ ∗

P = P T1 < − h | Z0 , W0 + P T1 > h | Z0 , W0
 √   √ 
= P Z∗1 < −b0 h + a0 | Z∗0 , W0∗ + P Z∗k > b0 h + a0 | Z∗0 , W0∗
 √   √ 
= Φ b 0 h + a0 + Φ b 0 h − a0 . (3.3)

According to (2.3), (2.4) and (3.3) above,



X hn  oik
ARL0 (h) = E E 1(T 2 6h) | Z∗0 , W0∗
1
k=0

X  k
P T12 6 h | Z∗0 , W0∗

= E
k=0
∞  
X
∗ k 1
= E [1 − P ] = E
k=0
P∗
 
1
= E  √   √ 
Φ b 0 h + a0 + Φ b 0 h − a0

We have obtained an analytic


  √  of ARL
expression  √0 , which
depends
−1
on the conditional
probabilities G (a0 , b0 , h) = Φ b0 h + a0 + Φ b0 h − a0 . We seek for an approx-
imation of G (a0 , b0 , h) in order to find an effective approximation of the ARL function
(3.1) .
There is a traditional approximation to evaluate the complementary cumulative distri-
bution function Φ (x) (see Abramowitz and Stegun (1965)):

7
( )
Z (x) 1 1·3 (−1)N1 · 1 · 3 · · · (2N1 − 1)
Φ (x) w 1 − 2 + 4 −··· +
x x x x2N1
N1
X (−1)i · 1 · 3 · · · (2i − 1)
= Z(x) , x 6= 0, (3.4)
i=0
x2i+1
2
where Z (x) = (2π)−1/2 e−x /2 is the standard normal density function and N1 ≥ 1 deter-
mines the degree of the expansion (here for i = 0, the product 1 · 3 · · · (2i − 1) is empty
so that its value equals to 1 by convention). The main idea is then to approximate the
denominator of the Gqfunction with (3.4).
√ W∗0 h Z∗0
Let y = b0 h = m(n−1) and z = a0 = √m where Z∗0 ∼ N (0, 1) , W0∗ ∼ χ2m(n−1) . We
1
find that z is small with high probability since E (z) = 0 and var (z) = m
, where m is
usually large. Expanding Φ around y , we can rewrite G as

1 1
G (a0 , b0 , h) = = − l (z, y) , (3.5)
2Φ (y) + m (z, y) 2Φ (y)

where m (z, y) and l (z, y) are the errors associated to these approximations which are
1
small compared with 2Φ(y) . Therefore, the procedure of approximating G (a0 , b0 , h) can be
1
decomposed into two steps. First, we estimate the main part 2Φ(y) . Second, we find an
estimation for l (z, y), which will helpfully lead to an approximation of G and finally of the
ARL function in (3.1) .

1
3.1 Approximation of 2Φ(y)

We rewrite (3.4) as
N1
Z (y) 1 X (−1)i · 1 · 3 · · · (2i − 1)
Φ (y) w , with = , (3.6)
f1 (y) f1 (y) i=0
y 2i+1

where y 6= 0 and Z (y) 6= 0. Then


1 1
w · f1 (y) . (3.7)
2Φ (y) 2Z (y)

Next, we expend the rational function f1 (y) as N 1−2i


P 2
i=0 ci y , a sum of negative powers in
y, where N2 is the order of the expansion to be fixed according to a desired accuracy. To
be clear, we list below two examples of f1 (y) (N1 = 2 or 3) and the associated expansions
using N2 = 3 or 5.

8
f1 (y) N2 = 3 N2 = 5
−1
N1 = 2 y −1 − y −3 + 3y −5 y+ y −1 − 2y −3 − 5y −5 y+ y −1 − 2y −3 − 5y −5 + y −7 + 16y −9
N1 = 3 y −1 − y −3 + 3y −5 − 15y −7 −1 y + y −1 − 2y −3 + 10y −5 y + y −1 − 2y −3 + 10y −5 + 31y −7 − 29y −9


Finally, we obtain the target approximation as follows


N2
!
1 1 X
' ci y 1−2i , (3.8)
2Φ (y) 2Z (y) i=0

and its accuracy will be determined by an appropriate choice of N1 > 0 and N2 > 1.

3.2 Approximation of l (z, y)


1 1
The remainder l (z, y) is the actual difference between Φ(y+z)+Φ(y−z)
and 2Φ(y)
. First, we
estimate Φ (y + z) and Φ (y − z) by rational functions in x. According to (3.4),

N1
X (−1)i 1 · 3 . . . (2i − 1)
Φ (y + z) w Z (y + z)
(y + z)2i+1
i=0
 N1
2yz + z 2 X (−1)i 1 · 3 . . . (2i − 1)

= Z (y) exp − 2i+1 .
2 i=0
(y + z)
 P
2 N1 (−1)i 1·3...(2i−1)
Next, we expand the term exp − 2yz+z
2 i=0 (y+z)2i+1
in a power series of z according
to z’s property (z is small) by keeping the four first powers as e0 (y) + e1 (y) z + e2 (y) z 2 +
e3 (y) z 3 . More powers in z may be used but we find that the approximation is accurate
enough with four terms. Then Φ (y + z) can be written as

Φ (y + z) = Z (y) e0 (y) + e1 (y) z + e2 (y) z 2 + e3 (y) z 3 + o z 3 ,


 
(3.9)

where each ei (y) (i = 0, 1, 2, 3) is a power function in y.


Similarly, Φ (y − z) can be written as

Φ (y − z) = Z (y) e0 (y) − e1 (y) z + e2 (y) z 2 − e3 (y) z 3 + o z 3 .


 
(3.10)

Φ(y) Z(y)
According to (3.9) and (3.10) and by assuming z = 0, we find e0 (y) = Z(y)
w · 1
f1 (y) Z(y)
=

9
1
f1 (y)
. Thus with the notation f2 (y) = (f1 (y))2 e1 (y), l (z, y) can be estimated as:

1 1
l (z, y) = −
2Φ (y) Φ (y + z) + Φ (y − z)
 
f1 (y) 1
≈ 1−
2Z (y) 1 + f1 (y) e2 (y) z 2 + o (z 3 )

f1 (y) X i
= (−1)i+1 f1 (y) e2 (y) z 2 + o z 3
2Z (y) i=1
z2
· f2 (y) + o z 3 .

= (3.11)
2Z (y)

PNFurthermore, by a combination of the expansions e2 (y) and f1 (y), we get f2 (y) of form
2 (3−2i)
i=0 di y . We list several examples of f2 (y) and their expansions with different values
of N1 and N2 (N1 = 2 or 3, N2 = 3 or 5).

N1 = 2 N1 = 3
e2 (y)
   
1y
+ 15
2 2
y −5 + 45y −7 1y
− 105
2 2
y −7 − 420y −9
−2 −2
f2 (y)
   
1 y + 15 y −5 + 45y −7 y −1 − y −3 + 3y −5 1 y − 105 y −7 − 420y −9 y −1 − y −3 + 3y −5 − 15y −7
2 2 2 2

N2 = 3 1 y3
2
3 y −1 + 1 y −3 + 58y −5
+y− 2 2
1 y3
2
+y− 3
2
y −1 + 8y −3 − 19
2
y −5

N2 = 5 1 y3
2
3 y −1 + 1 y −3 + 58y −5 + 189 y −7 − 401 y −9
+y− 2 2 2 2
1 y3
2
3 y −1 + 8y −3 − 19 y −5 − 543y −7 − 1391 y −9
+y− 2 2 2

The final approximation for l (z, y) is

N2
!
z2 X
l (z, y) ≈ di y (3−2i) . (3.12)
2Z (y) i=0

By combining (3.8) and (3.12) together with (3.5), finally, we get the following approx-
imation

N2
! N2
!
1 X z2 X
G (a0 , b0 , h) w ci y 1−2i − di y (3−2i) , (3.13)
2Z (y) i=0
2Z (y) i=0
q
W∗0 h Z∗
where y = m(n−1)
, z= √0
m
and Z∗0 ∼ N (0, 1) W0∗ ∼ χ2m(n−1) .

Proposition 2. Based on the approximation (3.13), an analytic approximation for ARL0


formula is given by
√ N2  
2π X di
ARL
\0 (h) = ci − gi (h) , (3.14)
2 i=0 m

10
where
mn−m 2i−3
 h
( 2i−3 − mn−m )
Γ 2
− 2
1 − mn−m 2 2

gi (h) = 2i−3  2i−3 .


mn−m h

2 2 Γ 2
2
mn−m
2
 
(Z∗0 ) 1
Proof. By (3.13), (3.1) and E (z 2 ) = E m
= m
, we have

" N2
! N2
!#
2
1 X z X
ARL
\0 (h) =E ci y (3−2i) − di y (3−2i)
2Z (y) i=0 2Z (y) i=0
√ N2     2 
2π X di y (3−2i)
= ci − E exp y .
2 i=0 m 2
h  2 i √ q ∗
W0 h
Let gi (h) = E exp y2 y (3−2i) . Since W0∗ ∼ χ2m(n−1) and y = b0 h = m(n−1)
is
bounded with high probability, for each i (0 6 i 6 N2 ) , we have
"∞ #
X 1  y 2 k
(3−2i)
gi (h) = E y
k=0
k! 2

X 1  (3−2i+2k) 
= k k!
E y
k=0
2
∞   3−2i+2k  (3−2i+2k) 
X 1 h 2

= k
E W0 2
k=0
2 k! mn − m
∞  3−2i+2k 3 mn−m



X 1 h 2 (3−2i+2k) Γ k + i +
2  2
= k k!
·2 2 mn−m
k=0
2 mn − m Γ 2
P∞ 1 h k
Γ k + mn−m − 2i−3
 
k=0 k! mn−m 2 2
= 2i−3  2i−3 . (3.15)
mn−m h

2 2 Γ 2 mn−m
2

m(n−1)+3
Here the condition N2 < 2
must be satisfied for the definition of the Gamma
function.

11
h mn−m 2i−3
Now, we calculate the numerator of (3.15) with the notation a = mn−m
, b= 2
− 2 .
∞ ∞ ∞
ak ak
X X Z
Γ (k + b) = exp (−t) tk+b−1 dt
k=0
k! k=0
k! 0
Z ∞ "X ∞
#
(at)k
= exp (−t) tb−1 dt
0 k!
Z ∞ k=0
= exp [− (1 − a) t] tb−1 dt
0
= Γ (b) (1 − a)−b , (3.16)
P∞ 1 h
k ( 2i−3 − mn−m )
Γ k + mn−m 2i−3 mn−m 2i−3 h
 
i.e. k=0 k! mn−m 2
− 2
=Γ 2
− 2
1− mn−m
2 2
.
Substituting (3.16) into (3.15) , we get

mn−m 2i−3
 h
( 2i−3 − mn−m )
Γ 2
− 2
1 − mn−m 2 2

gi (h) = 2i−3  2i−3 ,


mn−m h

2 2 Γ 2
2
mn−m
√ PN2
2π di

and finally we have ARL
\0 (h) =
2 i=0 ci − m
gi (h).

4 The approximation orders N1 and N2


Our approximation formula (3.14) depends on two parameters, namely, the orders N1 and
N2 introduced in the expansions. These parameters determine the coefficients {ci } and
{di } and finally the accuracy of the approximation (3.14). Below we give some indications
on the choice of these parameters.
First, we decide the value of N1 . The criterion is to choose N1 such that the approxi-
mation (3.6) of Φ (y) is accurate enough.
According to Abramowitz and Stegun (1965), the difference between Φ (y) and the
approximation (3.6)
N1
Z (y) X (−1)i 1 · 3 . . . (2i − 1)
= Z (y)
f1 (y) i=0
y 2i+1
has the form
Z ∞
Z(y) N1 +1 Z (t)
RN1 (y) = Φ (y) − f1 (y)
= (−1) · 1 · 3 · · · · · (2N1 + 1) dt
y t2N1 +2

whose absolute value is a decreasing function of y (y > 0) when N1 is given. So if we


choose an N1 to ensureq|RN1 (y0 )| < 10−3 at y0 , then this also holds for any y > y0, . Since
W∗0 h
in our application, y = m(n−1)
where W0∗ ∼ χ2m(n−1) , we can see that y has the magnitude

12

of h, which usually ranges from 2 to 10 (usually the h is ranging from 4 to 100). So we
P 1 (−1)i 1·3...(2i−1)
compare Φ (y) with the approximation fZ(y)
1 (y)
= Z (y) N
i=0 y 2i+1
at the point y0 = 2
in order to find an appropriate N1 .

Table 1: Comparison between RN1 (y0 ) and fZ(y 0)


1 (y0 )
when y0 = 2 (accurate to five decimal
places)
N1 1 2 3 4 5 6
RN1 (y0 ) 0.00250 -0.00256 0.00377 -0.00730 0.01761 -0.05090

According to Table 1, we can see that the absolute values of RN1 (y0 ) are larger than
−3
10 with alternate signs (y0 = 2), which are not negligible. In order to reduce the
error RN1 (y0 ), we decide to average the approximation fZ(y) 1 (y)
(N1 = 1, 2, 3, 4) by taking
advantage of the RN1 (y)’s signs. Equivalently, we average (3.14) when N1 = 1, 2, 3, 4
and y = 2. Moreover, the averaged approximation (3.4) will perform better when y > 2,
since RN1 (y) is smaller for larger y. As a result, we choose several N1 = 1, 2, 3, 4 rather
than choosing a single N1 because of RN1 (y0 )’s non-negligible values.
P −1
N1 (−1)i 1·3...(2i−1)
The choice of N2 directly affects the deviation of f1 (y) = i=0 y 2i+1
in
(3.7) from its Taylor expansions i=0 ci y 1−2i and the deviation of f2 (y) = (f1 (y))2 e2 (y)
PN2

in (3.11) from its Taylor expansions N (3−2i)


P 2
i=0 di y . The values of f1 (y) and f2 (y) =
2
(f1 (y)) e2 (y) converge to their Taylor expansions when y > 2. Therefore, we should
select large N2 in order to approximate f1 (y) and f2 (y) precisely by using their Taylor
expansions. Then the final approximation can also be more accurate by sharing the same
coefficients {ci } and {di }. We choose the smallest N2 ensuring f1 (y) − N 1−2i
P 2
i=0 ci y <
10−4 and f2 (y) − N (3−2i)
< 10−4 . By numerical computation, we obtain several
P 2
i=0 di y
N2 ’s corresponding to different values of y for 1 6 N1 6 4 and they are given in Table 2.

13
Table 2: Minimum values of N2 for different values of y when 1 6 N1 6 4 ensur-
ing the error bound f1 (y) − N 1−2i
< 10−4 (top), and ensuring the error bound
P 2
i=0 ci y

f2 (y) − N (3−2i)
< 10−4 (bottom).
P 2
i=0 di y

y 2 2.5 3 3.5 4 4.5

1
N1 = 1 f1 (y) = 
1
 N2 7 5 3 3 3 3
y
− 13
y

1
N1 = 2 f1 (y) = 
1
 N2 11 7 5 5 4 3
y
− 13 + 35
y y

1
N1 = 3 f1 (y) = 
1
 N2 17 10 7 5 4 4
y
− 13 + 35 − 157
y y y

1
N1 = 4 f1 (y) = 
1
 N2 36 13 9 6 5 5
− 13 + 35 − 157 + 105
y y y y y9

y 2 2.5 3 3.5 4 4.5

( 12 y−32 y−3 −6y −5


)
N1 = 1 f2 (y) = 1
 N2 10 7 6 5 5 4
y
− 13
y

( 12 y+ 152 y−5 +45y−7 )


N1 = 2 f2 (y) = 
1
2 N2 13 10 8 6 6 5
y
− 13 + 35
y y

( 12 y− 105
2
y −7 −420y −9 )
N1 = 3 f1 (y) = 
1
2 N2 28 14 10 8 7 5
y
− 13 + 35 − 157
y y y

( 12 y− 945
2
y −9 −4725y −11 )
N1 = 4 f1 (y) = 
1
2 N2 65 18 11 9 6 6
− 13 + 35 − 157 + 105
y y y y y9

14
From Table 2, we find that for a given N1 , the value of N2 decreases when y increases
to ensure an approximation error less than 10−4 for both f1 (y) and f2 (y), that is, the
appropriate value of N2 for y = 2 will remain appropriate for all y > 2. According to Table
2 and Table 3, we know that the optimal choice of N2 when y = 2 is N2 = 65 to satisfy all
the circumstances 1 6 N1 6 4.
Summarizing the discussion above, we have made our decision to approximate ARL0
by choosing 1 6 N1 6 4 , N2 = 65 and averaging this four cases as follows:
4
\0 (h) = 1
X
ARL \0 (h | N1 = i, N2 = 65)
ARL (4.1)
4 i=1

\0 (h | N1 = i, N2 = 65) (i = 1, 2, 3, 4) are from (3.14) . This is the final


where ARL
approximation used for the remaining of the paper.

5 Monte-Carlo evaluation of the proposed analytic ARL


approximation
The previous section lays the groundwork that allows us to approximate the values of ARL0
of the T 2 chart with estimated parameters. Here we focus on practical implementation of
the estimations.
According to the analytic approximation (4.1) , we can obtain a series of control limits
with different m and n by setting the right side of the equation (4.1) a given value, for
example, ARL0 = 200 or 500. Then by programming, we can compute the corresponding
control limit h and thus the control chart criterion can be confirmed with given m and n
when p = 1.
According to (3.14) and (4.1), we first compute and list each h (N1 = i, N2 = 65) en-
\0 (h | N1 = i, N2 = 65) = 200 and 500 (i = 1, 2, 3, 4) when m = 50, 70, 100
suring ARL
with size of subgroup n = 5, 10, 15. The computation is based on Monte Carlo. Then
for each group, we average four values of h (N1 = i, N2 = 65) (i = 1, 2, 3, 4) to get final
control limit h̄ that we propose to use in practice for given m and n. The results are given
in Tables 3 and 4 for ARL0 of value 200 and 500, respectively. In the last column of these
tables, the values of the corresponding traditional UCL h∗ are given for comparison. Recall
that given m, n and ARL0 , h∗ is determined in (1.3). These tables show in particular that
the actual UCL h, hence our approximated value h, are smaller than the traditional h∗ as
predicted by Theorem 1, Although these differences seem not that big, the corresponding
ARL values can be indeed very different due to the large variation of the ARL values.
After computing all the values of control limit h̄ for various combination of m and
n, we next check its accuracy by simulations. The simulation method is based on (1.5)
following Champ et al. (2005). For each case with specific m, n and ARL0 , the number
of independent repetitions is 10000.

15
Table 3: Control limits for each N1 = 1, 2, 3, 4 ensuring ARL0 = 200 using m subgroups
of size n when N2 = 65 and their averages.
m n h (N1 = 1) h (N1 = 2) h (N1 = 3) h (N1 = 4) h̄ h∗
50 5 7.8122 7.9149 7.8469 7.9115 7.8714 8.2183
10 7.9035 8.0093 7.9398 8.0052 7.9645 8.1169
15 7.9299 8.0366 7.9665 8.0323 7.9913 8.0882
70 5 7.8109 7.9121 7.8462 7.9078 7.8693 8.1202
10 7.8761 7.9796 7.9125 7.9748 7.9358 8.0486
15 7.8950 7.9991 7.9316 7.9941 7.9550 8.0283
100 5 7.8111 7.9113 7.8468 7.9062 7.8689 8.0473
10 7.8569 7.9585 7.8933 7.9532 7.9154 7.9976
15 7.8700 7.9721 7.9066 7.9666 7.9289 7.9835

Table 4: Control limits for each N1 = 1, 2, 3, 4 ensuring ARL0 = 500 using m subgroups
of size n when N2 = 65 and their averages.
m n h (N1 = 1) h (N1 = 2) h (N1 = 3) h (N1 = 4) h̄ h∗
50 5 9.4637 9.5318 9.4949 9.5238 9.5036 10.0023
10 9.5970 9.6680 9.6296 9.6594 9.6385 9.8557
15 9.6356 9.7074 9.6687 9.6987 9.6776 9.8143
70 5 9.4719 9.5395 9.5034 9.5312 9.5115 9.8708
10 9.5673 9.6369 9.5998 9.6282 9.6081 9.7675
15 9.5948 9.6650 9.6276 9.6562 9.6359 9.7383
100 5 9.4799 9.5471 9.5115 9.5386 9.5193 9.7734
10 9.5467 9.6154 9.5790 9.6065 9.5869 9.7018
15 9.5660 9.6350 9.5985 9.6261 9.6064 9.6815

16
Table 5: Empirical ARL0 performance of the approximate control limit h̄ with target
ARL0 = 200
m n h Empirical ARL (s.e.)
50 5 7.8714 202.2(243.9)
10 7.9645 200.8(216.7)
15 7.9913 198.5(213.0)
70 5 7.8693 200.4(225.5)
10 7.9358 199.3(213.9)
15 7.9550 198.3(209.8)
100 5 7.8689 205.1(220.0)
10 7.9154 196.0(207.1)
15 7.9289 200.2(203.6)

Table 6: Empirical ARL0 performance of the approximate control limit h̄ with target
ARL0 = 500
m n h Empirical ARL0 (s.e.)
50 5 9.5036 510.0(691.0)
10 9.6385 503.8(570.2)
15 9.6776 510.5(557.1)
70 5 9.5115 506.0(612.0)
10 9.6081 499.4(556.3)
15 9.6359 504.4(538.7)
100 5 9.5193 505(576)
10 9.5869 506.4(531.3)
15 9.6064 505.7(529.4)

Table 5 and Table 6 give empirical ARL0 values for several combinations of m and n
when p = 1 using the control limits h̄ for target ARL0 = 200 and 500, respectively. We
observe that in both Tables, the empirical values of the ARL0 from the approximation
UCL h̄ are very close to the target one within an error of 2% at most. One may also
observe quite large standard errors of the empirical ARL0 which are however common in
such Monte-Carlo experiments.

6 Concluding remarks
When estimated parameters from a Phase I study are used in the design of a Phase II
control chart, it has been widely observed in the literature that the properties of the chart
are modified. As a theoretical contribution of the paper, we have provided a rigorous
proof of this phenomenon in the paper. Next, in addition to existing Monte-Carlo based

17
determination method of control limits, we have proposed an analytic method based on
an approximation of the ARL function. However, due to the complexity of the approx-
imation problem, our solution is limited to univariate observations. How to extend such
approximation procedure to more general multivariate observations remains an important
question to explore in the future.

References
Abramowitz, M. and Stegun, I. (1965). Handbook of mathematical functions: with formulas,
graphs, mathematical tables, volume 55. Dover publications.

Bersimis, S., Psarakis, S. and Panaretos, J. (2006). Multivariate statistical process control
charts: an overview. Quality and Reliability Engineering International, 23(5), 517–543.

Champ, C., Jones-Farmer, L. and Rigdon, S. (2005). Properties of the t 2 control chart
when parameters are estimated. Technometrics, 47(4), 437–445.

Hotelling, H. (1947). A generalized t test and measure of multivariate dispersion.

Jensen, W., Jones-Farmer, L., Champ, C., Woodall, W., et al. (2006). Effects of pa-
rameter estimation on control chart properties: a literature review. Journal of Quality
Technology, 38(4), 349–364.

Kim, K., Mahmoud, M. and Woodall, W. (2003). On the monitoring of linear profiles.
Journal of Quality Technology, 35(3), 317–328.

Lowry, C. and Montgomery, D. (1995). A review of multivariate control charts. IIE


transactions, 27(6), 800–810.

Nedumaran, G. and Pignatiello, J. (1999). On constructing t2 control charts for on-line


process monitoring. IIE transactions, 31(6), 529–536.

Ryan, T. (2011). Statistical methods for quality improvement. Wiley.

Sullivan, J. and Woodall, W. (1996). A comparison of multivariate control charts for


individual observations. Journal of Quality Technology, 28(4), 398–408.

Wierda, S. (1994). Multivariate statistical process control: recent results and directions
for future research. Statistica neerlandica, 48(2), 147–168.

18

You might also like