6. Discussion and Recommendations
This section discusses the broader applicability of the methodology presented here to other
factors in the Exposure Factors Handbook (EFH). Data quality issues are discussed first, and
recommendations are then presented.
6.1 Adequacy of Data
As defined in Section 1, a statistical methodology is a combination of an experimental design or
data set, a class of models, and an approach to inference. Although each of these three components is
important, their relative importance as determinants of the overall quality of the output follows the order
given: the quality of the data is the most important factor, the quality of the models is second in
importance, and the approach to inference is third (Cox, 1990; Johnson, 1978).
The greatest possible gains in overall risk assessment quality would come from designing and
conducting a survey of the population of interest for each risk assessment. Ideally, individuals selected
from the population by probability-based sampling would be monitored for periods of time, which would
allow both long-term and short-term parameter estimates. Duplicate diet techniques would be employed,
whereby exact copies of all foods and beverages consumed would be obtained, weighed, catalogued, and
chemically analyzed for each subject. Similarly representative direct techniques would be employed for other
routes of exposure for the same sample subjects. Probability-based surveys are uniquely qualified to
produce representative data on the population of interest and entail minimal assumptions.
However, the customized survey approach will rarely be used. In most cases, exposure assessors
must do the best they can, working with data from diverse sources, summarized in the fashion of the
EFH. For some of the EFH factors, it may be possible to update the key studies.
In some cases, EFH data are extremely limited and may consist of only a single number, such as
an estimated mean. In such a case, one is tempted to claim that the data are inadequate for choosing
distributions. However, the risk assessor may not have this luxury. In many cases, something must be
done, no matter how limited the data are, or even in a complete absence of data. In fact, cases of such
limited data are precisely the kind where quantification of uncertainty is most important. Expert
judgment may have to be substituted for data, and sweeping, apparently unwarranted, assumptions may
be necessary. Sensitivity analysis is almost essential in such a case. In implementing the sensitivity
analyses, two or three plausible assignments should be made for the distribution of the factor, and a
corresponding number of risk assessment simulations should be done, based on each assignment. Of
course, if F factors each require D different distributions for sensitivity analysis, then F*D separate risk
assessment simulations are required, which could be prohibitively expensive.
Given only a mean and standard deviation, or only a mean and 99th percentile, one would produce
the corresponding gamma, lognormal, and Weibull distributions. Since each would fit the two given
numbers perfectly, one would have no data-based method for preferring one of the three, and each would
have to be used in risk assessment simulations to investigate sensitivity of conclusions to the type of
distribution. If only a point estimate, such as a mean, were available, one would try to obtain a plausible
population coefficient of variation (CV) or standard deviation by considering similar factors or eliciting
expert judgment. Then, one would determine the gamma, lognormal, and Weibull distributions with the
given mean and standard deviation and recommend that all three be used in risk assessment simulations
to investigate sensitivity of conclusions.
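As an illustration of the matching step just described, the following sketch (in Python, with hypothetical summary values) determines the gamma, lognormal, and Weibull distributions that reproduce a given mean and standard deviation exactly. The gamma and lognormal solutions are available in closed form; the Weibull shape parameter must be found numerically because its coefficient of variation depends only on the shape.

```python
# Sketch: matching two-parameter gamma, lognormal, and Weibull distributions
# to a given mean and standard deviation.  The summary values are hypothetical.
import numpy as np
from scipy import stats, optimize, special

def match_gamma(mean, sd):
    # gamma: mean = shape*scale, var = shape*scale**2
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return stats.gamma(shape, scale=scale)

def match_lognormal(mean, sd):
    # lognormal: mean = exp(mu + sigma^2/2), var = (exp(sigma^2) - 1) * mean^2
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2.0
    return stats.lognorm(np.sqrt(sigma2), scale=np.exp(mu))

def match_weibull(mean, sd):
    # The Weibull CV depends only on the shape c; solve for c, then the scale.
    cv2 = (sd / mean) ** 2
    def f(c):
        g1, g2 = special.gamma(1 + 1 / c), special.gamma(1 + 2 / c)
        return g2 / g1 ** 2 - 1 - cv2
    c = optimize.brentq(f, 0.1, 50.0)
    scale = mean / special.gamma(1 + 1 / c)
    return stats.weibull_min(c, scale=scale)

mean, sd = 1.2, 0.9  # hypothetical reported statistics
for name, dist in [("gamma", match_gamma(mean, sd)),
                   ("lognormal", match_lognormal(mean, sd)),
                   ("weibull", match_weibull(mean, sd))]:
    print(name, dist.mean(), dist.std(), dist.ppf(0.99))
```

All three fitted distributions reproduce the two given statistics, so only features such as the upper percentiles printed above distinguish them; this is precisely the sensitivity that the recommended parallel risk assessment simulations are intended to expose.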
To conclude the discussion of data adequacy, it is important to acknowledge again that situations
will arise where distributional assumptions are difficult to justify. In some of these cases, empirical
distributions may be used.
6.2 Application of Methodology to Other Exposure Factors
The remainder of this section concerns the applicability of this methodology to other exposure
factors of the EFH. The available EFH data summaries can be roughly classified into four cases: (1) six
or more percentiles available, (2) three to five statistics available, (3) two statistics available, and (4) at
most one statistic available. Raw data are rarely available. However, if raw data were available, they
would probably be treated as Case 1, unless the sample sizes were very small.
6.2.1 Case 1: Percentile Data
Summary of Methodology
• Models: 12-model hierarchy based on generalized F with point mass at zero
• Estimation: maximum likelihood
• Goodness-of-fit (GOF) tests: chi-square and likelihood ratio tests (LRTs)
• Uncertainty: asymptotic normality for large samples, bootstrap or normalized likelihood for small samples
Many EFH data summaries contain six or more empirical percentiles for the given population
and factor. In many cases, other information also is provided, which may include a sample mean,
standard deviation, sample size, and percent exposed or percent consuming. This is referred to as the
percentile case, even though other information besides percentiles is also usually available.
Because more percentiles than moments are available, it seems reasonable to focus the analysis
on the percentiles, using the moment information as a check or validation on the distribution estimated
from the percentiles. However, the possibility of tailoring the inference to all the available information is
not ruled out. For example, the tap water data of Section 3 include nine empirical percentiles, the sample
mean, and sample standard deviation for each age group. Model parameters could be estimated to
minimize the average percent error in all 11 of these quantities. The resulting nonstandard estimate
would not have a nice textbook distribution, but simulation or bootstrap techniques could be used to
approximate its distribution to obtain GOF tests and parameter uncertainty distributions.
The joint asymptotic distribution of any specified sample percentiles is known to be multivariate
normal, with known means, variances, and covariances (Serfling, 1980). The joint asymptotic
distribution of specified sample moments is also known (Serfling, 1980). Conceivably, the joint
asymptotic distribution of specified percentiles and moments also could be determined. This would
make it possible to apply a conventional type of asymptotic analysis that takes into account all of the
available sample percentile and moment information.
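For reference, the result cited from Serfling (1980) for sample percentiles can be written out explicitly. With population percentiles defined as the inverse of the distribution function and the underlying density f assumed positive at each of them,

$$
\sqrt{n}\,\bigl(\hat{\xi}_{p_1}-\xi_{p_1},\ \ldots,\ \hat{\xi}_{p_k}-\xi_{p_k}\bigr)
\;\xrightarrow{d}\; N_k(0,\Sigma),
\qquad
\Sigma_{ij}=\frac{p_i\,(1-p_j)}{f(\xi_{p_i})\,f(\xi_{p_j})},\quad i\le j,
$$

where $\xi_{p} = F^{-1}(p)$. The asymptotic covariances depend on the unknown density, which in practice would be replaced by the density of the fitted model evaluated at the estimated percentiles.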
If six or more percentiles are available, the methods applied in Section 3 to the tap water data are
recommended. Specifically, use maximum likelihood estimation (MLE) to fit the five-parameter
generalized F distribution and all of the special cases identified in Sections 1 and 2 and used in Section 3.
For formal GOF, use both the chi-square test of absolute fit and the LRT of fit relative to the five-parameter model. To obtain distributions for parameter uncertainty, use asymptotic normality for large
samples, and use bootstrapping or the normalized likelihood for small samples. Ideally, simulation
studies would be used to at least check on coverage probabilities associated with the uncertainty analysis.
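A minimal sketch of this fitting and testing step is given below, in Python and for a single two-parameter special case (the gamma) rather than the full 12-model hierarchy; the reported percentile levels, percentile values, and sample size are hypothetical. The percentiles are treated as bin boundaries, so the likelihood maximized is the grouped-data (multinomial) likelihood, and the chi-square test of absolute fit compares the bin counts implied by the reported percentiles with the expected counts under the fitted model.

```python
# Sketch: maximum likelihood fit of a two-parameter gamma (one special case of
# the generalized F hierarchy) to reported percentiles, using a grouped-data
# (multinomial) likelihood, followed by a chi-square test of absolute fit.
# Percentile values and sample size are hypothetical.
import numpy as np
from scipy import stats, optimize

probs = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])   # reported percentile levels
quants = np.array([0.20, 0.31, 0.55, 0.93, 1.49, 2.20, 2.74])  # reported percentile values
n = 500                                                        # reported sample size

# Bin counts implied by the reported percentiles (including the two tails).
bin_probs = np.diff(np.concatenate(([0.0], probs, [1.0])))
counts = n * bin_probs

def neg_loglik(theta):
    shape, scale = np.exp(theta)              # log scale keeps parameters positive
    cdf = stats.gamma.cdf(quants, shape, scale=scale)
    cell = np.diff(np.concatenate(([0.0], cdf, [1.0])))
    cell = np.clip(cell, 1e-12, None)
    return -np.sum(counts * np.log(cell))

res = optimize.minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
shape, scale = np.exp(res.x)

# Chi-square test of absolute fit: observed vs. expected bin counts, with
# degrees of freedom = (number of bins - 1) - number of estimated parameters.
cdf = stats.gamma.cdf(quants, shape, scale=scale)
expected = n * np.diff(np.concatenate(([0.0], cdf, [1.0])))
chisq = np.sum((counts - expected) ** 2 / expected)
df = len(counts) - 1 - 2
print("shape", shape, "scale", scale,
      "chi-square", chisq, "p-value", stats.chi2.sf(chisq, df))
```

The same template applies to the other members of the hierarchy; the LRT of relative fit then compares twice the difference in maximized log-likelihoods between a special case and the five-parameter generalized F against the appropriate chi-square reference distribution.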
6.2.2 Case 2: Three to Five Statistics Available
Summary of Methodology
• Models: two-parameter gamma, lognormal, and Weibull
• Estimation: minimize average absolute percent error in the available statistics
• GOF tests: bootstrapping
• Uncertainty: bootstrapping
Because the available information is quite limited, consideration should be given to obtaining the
raw data.
If only three to five statistics are available, information is very limited, and it seems important to
use all available quantities in the estimation process. Such limited data also make it difficult to justify
going beyond the two-parameter models. Accordingly, fitting the two-parameter gamma, lognormal, and
Weibull models, using estimation to minimize the average absolute percent error in all available
quantities, is recommended. (With four or five statistics available, it would also be possible to fit the
generalized gamma, in addition to the two-parameter models.)
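The estimation criterion can be implemented directly. The sketch below (in Python, with hypothetical target values) fits the two-parameter gamma by minimizing the average absolute percent error in an available mean, standard deviation, and 90th percentile; the same template applies to the lognormal and Weibull.

```python
# Sketch: fitting a two-parameter gamma by minimizing the average absolute
# percent error in three available statistics.  Target values are hypothetical.
import numpy as np
from scipy import stats, optimize

target = np.array([1.2, 0.9, 2.4])           # hypothetical mean, sd, 90th percentile

def avg_abs_pct_error(theta):
    shape, scale = np.exp(theta)             # log scale keeps parameters positive
    d = stats.gamma(shape, scale=scale)
    fitted = np.array([d.mean(), d.std(), d.ppf(0.90)])
    return 100.0 * np.mean(np.abs(fitted - target) / target)

res = optimize.minimize(avg_abs_pct_error, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
shape, scale = np.exp(res.x)
print("gamma shape:", shape, "scale:", scale, "minimized MAAPE (%):", res.fun)
```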
If the original sample size n is known, then bootstrapping can be used to obtain p-values for GOF
as well as to obtain parameter uncertainty distributions. To illustrate these applications of bootstrapping,
assume that three statistics are originally available: for example, the mean, standard deviation, and 90th
percentile. Parameters have been estimated for each of the three models (gamma, lognormal, and
Weibull) by minimizing the average absolute percent error. To apply bootstrapping, first generate 1,000
random samples of size n from the estimated (gamma, lognormal, or Weibull) distribution. For each
sample, calculate the mean, standard deviation, and 90th percentile. Also for each sample, determine the
minimized average absolute percent error (MAAPE) and note which parameter values achieve the
minimum. Rank these 1,000 MAAPEs from largest to smallest. The p-value for GOF is determined by
the location of the original MAAPE among the 1,000 ordered simulated MAAPEs. For instance, if the
original MAAPE is between the 47th and 48th largest ordered simulated MAAPEs, then the p-value for
GOF is 0.048. The parameter uncertainty distribution for each model is simply the discrete distribution
that places mass 0.001 on each of the simulated parameter pairs for that model. The possibility of bias in
the bootstrapped parameter pairs should be checked. If necessary, such bias can be removed by a simple
translation so that the mean of the parameter uncertainty distribution is equal to the original estimated
parameter vector.
This bootstrap approach can be used for each of the three types of models. GOF p-values can be
used to decide whether model uncertainty requires that more than one of the three types of models be
used for risk assessment.
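The bootstrap procedure just described can be sketched as follows, again in Python and for the gamma model only; the lognormal and Weibull cases are handled identically. The reported statistics and sample size are hypothetical, and the fitting criterion from the previous sketch is repeated here so that the example is self-contained.

```python
# Sketch: bootstrap GOF p-value and parameter uncertainty distribution for
# Case 2 (mean, standard deviation, and 90th percentile available).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
n = 200                                      # original sample size (assumed known)
observed = np.array([1.2, 0.9, 2.4])         # hypothetical mean, sd, 90th percentile

def maape_fit(target):
    """Fit a gamma by minimizing the average absolute percent error (MAAPE)
    in the three statistics; return (minimized MAAPE, shape, scale)."""
    def crit(theta):
        shape, scale = np.exp(theta)
        d = stats.gamma(shape, scale=scale)
        fitted = np.array([d.mean(), d.std(), d.ppf(0.90)])
        return 100.0 * np.mean(np.abs(fitted - target) / target)
    res = optimize.minimize(crit, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    shape, scale = np.exp(res.x)
    return res.fun, shape, scale

orig_maape, shape0, scale0 = maape_fit(observed)

# Generate 1,000 samples of size n from the fitted gamma and refit each one.
boot_maape, boot_params = [], []
for _ in range(1000):
    x = stats.gamma.rvs(shape0, scale=scale0, size=n, random_state=rng)
    s = np.array([x.mean(), x.std(ddof=1), np.percentile(x, 90)])
    m, a, b = maape_fit(s)
    boot_maape.append(m)
    boot_params.append((a, b))

# GOF p-value: proportion of simulated MAAPEs at least as large as the original.
print("GOF p-value:", np.mean(np.array(boot_maape) >= orig_maape))

# Parameter uncertainty distribution: mass 0.001 on each bootstrapped pair,
# recentered on the original estimate to remove bootstrap bias.
boot_params = np.array(boot_params)
boot_params += np.array([shape0, scale0]) - boot_params.mean(axis=0)
print("95% uncertainty interval for the shape parameter:",
      np.percentile(boot_params[:, 0], [2.5, 97.5]))
```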
6.2.3 Case 3: Two Statistics Available
Summary of Methodology
• Models: two-parameter gamma, lognormal, and Weibull
• Estimation: exact agreement with the available statistics
• GOF tests: not applicable
• Uncertainty: bootstrap the available statistics for each model
Another fairly common EFH situation involves only two summary statistics, such as a mean and
upper percentile, or a mean and standard deviation. We will assume for illustrative purposes that the
mean and standard deviation are available. If bio-physico-chemical considerations do not dictate the type
of model, then determining the two-parameter gamma, lognormal, and Weibull distributions that agree
with the given information is recommended. Because of the considerable model uncertainty, at least the
first two types of models should be used in risk assessment. In some cases, such as when the CV is less than 50%, as in
Section 5 for inhalation rates, the differences between the models may be negligible relative to the
overall risk assessment, so that use of any one of the models may be sufficient.
Because of data limitations, the models fit the available data perfectly and formal GOF tests are
not possible.
For parameter uncertainty distributions for each type of model, bootstrapping from the estimated
model can be used to obtain a distribution of parameter uncertainty, as described in Section 6.2.2. That
is, using the original estimated model parameters, 1,000 random samples of the original size are
generated and summarized in terms of the same two quantities, mean and standard deviation. For each
such simulated pair, the model agreeing with the mean and standard deviation is determined. This yields
parameter uncertainty distributions.
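A corresponding sketch for this case is given below, using the gamma model, whose moment equations invert in closed form; the reported mean, standard deviation, and sample size are hypothetical.

```python
# Sketch: parameter uncertainty by bootstrapping in Case 3, where a mean and
# standard deviation determine each two-parameter model exactly (gamma shown).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mean, sd, n = 1.2, 0.9, 200                  # hypothetical reported values

def gamma_from_moments(m, s):
    # gamma: mean = shape*scale, var = shape*scale**2
    return (m / s) ** 2, s ** 2 / m

shape0, scale0 = gamma_from_moments(mean, sd)

# Generate 1,000 samples of size n from the fitted model, recompute the mean
# and sd of each, and invert the moment equations again; the resulting pairs
# form the parameter uncertainty distribution (mass 0.001 on each pair).
pairs = []
for _ in range(1000):
    x = stats.gamma.rvs(shape0, scale=scale0, size=n, random_state=rng)
    pairs.append(gamma_from_moments(x.mean(), x.std(ddof=1)))
pairs = np.array(pairs)
print("95% uncertainty interval for the shape parameter:",
      np.percentile(pairs[:, 0], [2.5, 97.5]))
```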
6.2.4 Case 4: At Most One Statistic Available
If this situation arises, it will have to be treated on a case-by-case basis, as described in the fifth
paragraph of Section 6.1. Subjective, even Bayesian, methods would seem to be required, using expert
judgment and analogies with other similar factors to hypothesize models and parameter distributions.
6.2.5 Topics for Future Research
In Section 1.1, we briefly discuss two important problems related to the iid (independent and
identically distributed) assumption: modeling data from complex survey designs and the need to
account for correlations among exposure factors. While both issues were beyond the scope of the present
study, their importance cannot be overstated. Since risk assessors often lack raw data and must work
with published data summaries that may not be properly weighted, it would be useful to investigate
(perhaps by simulation) the magnitudes and nature of inaccuracies that arise by ignoring various aspects
of sample designs. Further, it would be interesting to examine whether these biases might be
differentially reflected in different PDF models and/or estimation procedures; in particular, it would be
useful to compare the robustness of the nonparametric density estimators with that of the parametric
probability density function (PDF) models.
Because many exposures, especially through dietary intake, are strongly correlated, multivariate
PDF modeling may be preferable to the univariate approach presented here. While multivariate models
are more realistic, their complexity makes them much more difficult to fit, estimate, and validate, and
they require considerably more data than their univariate counterparts. Nonetheless, efforts should be
made to extend the topics covered in this report to the multivariate case. The recent availability of user-friendly software for implementing multivariate parametric PDFs in Monte Carlo risk assessment models
(Millard, 1998; Millard and Neerchal, 1999) suggests that if the data are available and the limitations and
requirements properly understood, multivariate PDF models could be utilized by risk assessors who have
a basic understanding of statistical methods.
Finally, it should be noted that this report does not address temporal correlations within
individuals. Frequently, risk assessors will want to model, longitudinally, an individual's exposure to
one or more risk factors from birth to some advanced age. However, it is likely that assessors will have
to utilize cross-sectional exposure data reported for discrete age classes. While the methods described in
this report can be used to fit parametric PDFs to such data, there is an implicit assumption that the
age-specific exposure distributions are mutually independent. In reality, a person's quantile values in the
various age-specific distributions will be correlated. Thus, a person who is in the first quartile of meat
ingestion in the jth age class is more likely to be in the first quartile of the (j+1)th age class than is a person
who was in the third quartile of meat ingestion in the jth age class. This problem is similar to the
multivariate exposure factor issue, just discussed, and should have a similar solution. It is important to
investigate and solve both in a manner that allows risk assessors to develop more realistic and flexible
models.