
Ann 7

The document outlines a methodology for developing data-driven models, emphasizing the importance of high-quality input data and the detection and removal of outliers to enhance model performance. It discusses various techniques for outlier detection, model selection, and validation, highlighting the need for proper alignment of data and the selection of appropriate parameters during model building. Additionally, it addresses the necessity of regular model tuning and adaptation to maintain performance over time in industrial applications.

Uploaded by

sklahiri70

Model development methodology
Data gathering and examination
• Data-driven models are constructed using a huge amount of historical data
• The quality of the produced model is determined by the quality of the input data

Pre-processing and conditioning of data
• There is a spike or an extremely high value in the data
• Noise is present in data as a result of the process or from measurement transmitters
• It's possible that data will have missing values or that values will be frozen

Detecting and replacing outliers
• Outliers are observations that do not match the majority of the data, such as missing data points and observations that differ greatly from normal values
• Outliers in process data can be caused by a variety of factors, including the failure of process equipment or measurement transmitters, the failure of a data collection system, and so on
• Outlier detection and removal from data sets is crucial for soft sensor development, as undetected outliers have a negative impact on the final soft sensor model's performance

Outlier detection using a univariate technique
• The Hampel identifier and the 3σ edit rule are two popular univariate approaches for detecting outliers

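The two univariate rules named above can be sketched as follows. This is a minimal illustration, not the deck's own implementation: the function names, the cut-off `k = 3`, the scaling constant 1.4826 (which makes the median absolute deviation comparable to a standard deviation for Gaussian data) and the sample readings are all assumptions made for the example.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Flag points lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) > k * std for v in values]

def hampel_outliers(values, k=3.0):
    """Flag points lying more than k scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # assumed Gaussian consistency factor
    return [abs(v - med) > k * scale for v in values]

# Hypothetical transmitter readings with one spike at index 12
readings = [10.1, 9.9, 10.0, 10.2, 9.8] * 4
readings[12] = 55.0

sigma_flags = three_sigma_outliers(readings)
hampel_flags = hampel_outliers(readings)
```

Because the Hampel identifier is built on the median and the MAD, a spike barely moves its limits. The 3σ rule uses the mean and standard deviation, which the spike itself inflates, so on very short records it can fail to flag an obvious spike.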
Outlier detection using a multivariate approach
• PCA is a multivariate statistical method for reducing data dimensionality by projecting the data matrix onto a lower-dimensional space using loading vectors
• The loading vectors corresponding to the k largest eigenvalues capture most of the variation in the data and hence retain the majority of its information
• The residual matrix and the Q statistic, which indicates the distance of a sample from the PCA model's space, can be used to assess the fit between the data and the model
• Hotelling's T² statistic indicates the distance between a given data point and the multivariate mean of the data, which provides a measure of variability within the normal subspace
• Outliers are detected using a combination of the Q and T² tests

Outlier detection using a multivariate approach (continued)
• Measurements with Q or T² values over the threshold are classed as outliers, based on the significance level for the Q and T² statistics
• Points outside the 99 percent confidence ellipse are outliers

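A rough sketch of the Q and T² calculation with NumPy is given below. The function name `pca_q_t2`, the autoscaling step, the synthetic data and the thresholds are assumptions for illustration. In particular, the limits here are simple empirical 99th percentiles, a stand-in for limits derived from a significance level as described above.

```python
import numpy as np

def pca_q_t2(X, k):
    """Return the Q and T2 statistics of every row of X for a k-component PCA model."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # autoscale each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]                # indices of the k largest eigenvalues
    P, lam = eigvecs[:, order], eigvals[order]           # loading vectors and their variances
    T = Xs @ P                                           # scores in the PCA subspace
    E = Xs - T @ P.T                                     # residual matrix
    Q = np.sum(E ** 2, axis=1)                           # squared distance from the model space
    T2 = np.sum(T ** 2 / lam, axis=1)                    # Hotelling's T2 within the model space
    return Q, T2

# Synthetic, strongly correlated data (illustrative only)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))
X[0] = [3.0, -6.0, -3.0]          # sample 0 breaks the correlation structure

Q, T2 = pca_q_t2(X, k=1)
q_limit, t2_limit = np.percentile(Q, 99), np.percentile(T2, 99)   # assumed stand-in limits
outliers = np.where((Q > q_limit) | (T2 > t2_limit))[0]
```

An empirical percentile always flags roughly 1% of the samples it was computed from, even in clean data, so it only shows where the thresholds enter the test. Sample 0 stands out mainly through its large Q value, because it violates the correlation that the PCA model captures.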
Selection of relevant input and output variables
• Choosing relevant inputs is an important step in modelling the input–output relationship in a process model

Assemble data
• ANN modelling is frequently used in multivariate systems whose variables are sampled at different rates
• In many industrial chemical processes, product quality parameters are normally measured offline in a laboratory or by an online analyzer with a long dead time
• Input variables such as temperature and pressure are measured and recorded every second or minute
• As a result, the data must be aligned on a common time scale
• It is critical that laboratory data be properly time-stamped and aligned with the other continuous data on a consistent time scale

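One simple way to picture the alignment is to pair every laboratory result with the latest continuous record taken at or before the sample time. The sketch below does this with the standard-library `bisect` module. The function name, the optional `delay` argument (a hypothetical fixed dead time in seconds) and all numbers are assumptions for illustration.

```python
from bisect import bisect_right

def align_lab_to_process(process_times, process_values, lab_times, lab_values, delay=0):
    """Pair each lab result with the latest process record at or before (lab_time - delay)."""
    pairs = []
    for t_lab, y in zip(lab_times, lab_values):
        idx = bisect_right(process_times, t_lab - delay) - 1
        if idx >= 0:                       # skip lab samples older than the first record
            pairs.append((process_values[idx], y))
    return pairs

# Process variable logged every 60 s; lab samples time-stamped irregularly (seconds)
process_times = [0, 60, 120, 180, 240, 300]
process_values = [350.0, 351.2, 352.1, 351.8, 350.9, 350.5]   # e.g. a temperature
lab_times = [130, 290]
lab_values = [0.82, 0.79]                                     # e.g. a quality parameter

aligned = align_lab_to_process(process_times, process_values, lab_times, lab_values)
delayed = align_lab_to_process(process_times, process_values, lab_times, lab_values, delay=60)
```

`process_times` must be sorted in ascending order for the binary search to work. A wrongly time-stamped lab sample would silently be paired with the wrong process record, which is why correct time stamps matter so much here.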
Selection, training, and validation of model parameters
• This is the most important step in creating an ANN model
• Because the model is at the heart of the ANN, selecting the right model is crucial to its success
• The model developer needs to give the following parameters as user input during the ANN model building phase:
  • Number of nodes in the hidden layer
  • Type of activation function in the input layer and output layer
  • Algorithm for updating the weights, etc.

Model selection
• Many options are available for model selection, but there are no clear-cut guidelines on which model to select under which conditions
• In most cases, the type of model is chosen by the developer based on personal preference and expertise
• This can be very detrimental to developing a good model
• The best approach is to remain open-minded about all the model types

Best practices for model selection
• Good practice is to start with a simple model type with a small number of nodes in the hidden layer and gradually increase model complexity as long as a significant improvement in the model's performance can be observed
• During the model building phase, the performance of an individual model can be judged on unseen validation data
• The same approach can also be applied to selecting the parameters of the pre-processing methods, for instance variable selection

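The "increase complexity only while it pays off" rule can be sketched as a small selection loop. The function name, the 5% relative-improvement threshold and the validation errors below are hypothetical. In practice each error would come from training a network of that size and scoring it on unseen validation data.

```python
def pick_hidden_nodes(validation_errors, min_improvement=0.05):
    """Return the smallest node count beyond which the relative gain drops below min_improvement.

    validation_errors maps a hidden-node count to the (positive) validation error
    obtained with that many nodes.
    """
    sizes = sorted(validation_errors)
    best = sizes[0]
    for n in sizes[1:]:
        gain = (validation_errors[best] - validation_errors[n]) / validation_errors[best]
        if gain < min_improvement:      # no significant improvement: stop growing the model
            break
        best = n
    return best

# Made-up validation errors for 2, 4, 6, 8 and 10 hidden nodes
errors = {2: 0.40, 4: 0.22, 6: 0.15, 8: 0.148, 10: 0.151}
chosen = pick_hidden_nodes(errors)
```

With these illustrative numbers the loop stops at 6 nodes, because moving to 8 nodes improves the validation error by only about 1%. What counts as "significant" is a judgement the developer has to make for the application.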
Cross validation
• Data-driven models normally need a large amount of data, which is usually available in modern industry
• However, in some instances where lab data is used, only a very small amount of data may be available
• For industrial processes where little reliable lab data is available, statistical error-estimation techniques such as K-fold cross-validation can be applied
• This method makes optimal use of the available data by partitioning it in such a way that all of the samples are used for validating the model's performance

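The partitioning behind K-fold cross-validation can be sketched in plain Python. The generator name is an assumption, and the sketch deliberately omits shuffling, which may or may not be appropriate for time-ordered process data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs so that every sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
```

A model would be trained on each `train` set and scored on the matching `val` set, and the K scores averaged to estimate its performance. Every sample contributes to both training and validation, which is what makes the method attractive when lab data is scarce.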
Model performance
• After finding the optimal model structure and training the model, the performance of the trained ANN model has to be judged once again on a new validation data set
• Mean squared error (MSE), which measures the average squared distance between the predicted and the correct values, is the most popular performance evaluation technique for the model

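As a small worked example, the mean squared error over n validation samples is the average of the squared differences between correct and predicted values. The values below are invented for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared distance between the correct and the predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values: squared errors are 0.25, 0.0 and 1.0
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.0 + 1.0) / 3
```

Because the errors are squared, one large miss weighs far more than several small ones.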
Model performance (continued)
• Another way of judging performance is to use a visual representation of the predictions
• Among these, the four-plot analysis is a useful tool, since it provides information about the relation between the predictions and the correct values together with an analysis of the prediction residuals
• A disadvantage of the visual methods is that they require the assistance of the model developer, and the final decision on whether the model performs adequately is up to the developer's subjective judgement

Important criteria
• It should be evaluated whether the developed model bears some resemblance to the underlying physics of the process
• Many model experts stress the necessity of applying process knowledge during the ANN model development phase

Model acceptance and model tuning
• After the ANN model is developed, it is tested in offline mode to see how well its predictions match fresh data currently generated in the DCS
• If the model predictions closely match the actual output, the model is accepted in industry
• The usual criterion is that the average prediction error should be less than 1%, with an R² value greater than 0.95

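The acceptance test can be sketched as below. The slide does not define "average prediction error" precisely. This sketch assumes it means the mean absolute percentage error, and the function name and the sample values are likewise illustrative.

```python
def acceptance_check(y_true, y_pred, max_error_pct=1.0, min_r2=0.95):
    """Return (mean absolute % error, R2, accepted) for fresh offline test data."""
    n = len(y_true)
    # Assumed reading of "average prediction error": mean absolute percentage error
    error_pct = 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return error_pct, r2, (error_pct < max_error_pct and r2 > min_r2)

# Invented actual vs. predicted values
actual = [50.0, 52.0, 54.0, 56.0, 58.0]
predicted = [50.2, 51.9, 54.1, 55.8, 58.1]
error_pct, r2, accepted = acceptance_check(actual, predicted)
```

The percentage error is undefined when an actual value is zero, and R² is undefined when the actual values are all equal, so real test data would need guarding against both cases.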
Model performance deterioration
• It is very common in industry for the performance of an ANN model to deteriorate over time
• There are many reasons:
  • The underlying process may change
  • Measuring transmitter data may drift
  • Analyzer readings may change due to recalibration, etc.
• All of these can cause the performance of the ANN model to deteriorate and have to be compensated for by adapting or re-developing the model

Model tuning and model update
• An ANN model has to be maintained and tuned on a regular basis
• In the literature, researchers have tried various adaptive approaches to update the model based on its performance
• The neural model is updated every six months with fresh current data when it is found that the present model's prediction capability has deteriorated over time
• Most of these automatic model update methods are still limited to research publications, and very few are actually applied in industry

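One simple way to notice that prediction capability has deteriorated is to track the average percentage error over the most recent lab results and raise a flag when it exceeds a limit. The sketch below is not one of the published adaptive methods. The class name, the window length and the 1% limit are all assumptions.

```python
from collections import deque

class ErrorMonitor:
    """Track the rolling mean absolute % error of the model on recent lab results."""

    def __init__(self, window=3, max_error_pct=1.0):
        self.errors = deque(maxlen=window)   # keeps only the latest `window` errors
        self.max_error_pct = max_error_pct

    def update(self, actual, predicted):
        """Record one new result; return True if retuning looks warranted."""
        self.errors.append(100.0 * abs((actual - predicted) / actual))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.max_error_pct

monitor = ErrorMonitor(window=3, max_error_pct=1.0)
pairs = [(50.0, 50.1), (50.0, 50.2), (50.0, 49.9), (50.0, 51.5)]   # invented (actual, predicted)
flags = [monitor.update(a, p) for a, p in pairs]
```

The monitor stays silent until the window is full, then flags the fourth result, where a 3% miss pushes the rolling average above 1%. A flag is only a prompt to investigate: the cause may be a drifting transmitter or a recalibrated analyzer rather than the model itself.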
