Ann 7

The document outlines a methodology for developing data-driven models, emphasizing the importance of high-quality input data and the detection and removal of outliers to enhance model performance. It discusses various techniques for outlier detection, model selection, and validation, highlighting the need for proper alignment of data and the selection of appropriate parameters during model building. Additionally, it addresses the necessity of regular model tuning and adaptation to maintain performance over time in industrial applications.

Uploaded by

sklahiri70

Model development methodology

Data gathering and examination
• Data-driven models are constructed using a large amount of historical data
• The quality of the produced model is determined by the quality of the input data

Pre-processing and conditioning of data
• The data may contain spikes or extremely high values
• Noise is present in the data as a result of the process itself or of the measurement transmitters
• The data may have missing values, or values may be frozen

Detecting and replacing outliers
• Outliers are observations that do not match the majority of the data, such as missing data points and observations that differ greatly from normal values
• Outliers in process data can be caused by a variety of factors, including the failure of process equipment or measurement transmitters, the failure of a data collection system, and so on
• Outlier detection and removal from data sets is crucial for soft sensor development, as undetected outliers have a negative impact on the final soft sensor model's performance

Outlier detection using a univariate technique
• The Hampel identifier and the 3σ edit rule are two popular univariate approaches for detecting outliers
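The two univariate rules named above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the threshold of 3, and the sample readings are assumptions made for the example. The 3σ edit rule compares each point with the mean and standard deviation, while the Hampel identifier uses the median and the scaled median absolute deviation (MAD) instead.

```python
import numpy as np

def three_sigma_outliers(x, n_sigma=3.0):
    """Flag points more than n_sigma standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) > n_sigma * x.std()

def hampel_outliers(x, n_sigma=3.0):
    """Flag points more than n_sigma scaled MADs from the median."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    # 1.4826 makes the MAD a consistent estimate of the standard
    # deviation when the data are normally distributed.
    mad = 1.4826 * np.median(np.abs(x - median))
    return np.abs(x - median) > n_sigma * mad

# Hypothetical temperature readings with one spike at index 5.
readings = [50.1, 49.8, 50.3, 50.0, 49.9, 95.0, 50.2, 50.1, 49.7, 50.0]
print(three_sigma_outliers(readings))
print(hampel_outliers(readings))
```

On these ten readings the 3σ rule flags nothing, because the spike itself inflates the mean and the standard deviation, whereas the Hampel identifier flags only the spike. This is one reason the median-based rule is often preferred for raw process data.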
Outlier detection using a multivariate approach
• PCA is a multivariate statistical method for reducing data dimensionality by projecting the data matrix onto a lower-dimensional space using loading vectors
• The loading vectors corresponding to the k greatest eigenvalues capture most of the data variation and hence contain the majority of the information in the data
• The residual matrix and the Q statistic, which indicates the distance of a sample from the PCA model's space, can be used to assess the fit between data and model
• Hotelling's T² statistic indicates the distance between a given data point and the multivariate mean of the data, and provides a measure of variability within the normal subspace
• Outliers are detected using a combination of Q and T² tests

Outlier detection using a multivariate approach
• Measurements with Q or T² values over the threshold are classed as outliers, based on the significance level chosen for the Q and T² statistics
• Points outside the 99 percent confidence ellipse are outliers
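A rough sketch of the Q and T² screening described above is given below. The function names, the synthetic three-variable data set, and the choice of k = 1 are assumptions for illustration. As a simplification, the 99 percent limits are taken as empirical percentiles of the reference data rather than from the theoretical distributions of Q and T².

```python
import numpy as np

def fit_pca(X, k):
    """Fit a PCA model with k components on autoscaled reference data."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k greatest eigenvalues
    return {"mean": mean, "std": std, "P": eigvecs[:, top], "lam": eigvals[top]}

def q_and_t2(model, X):
    """Return the Q and Hotelling's T^2 statistics for each row of X."""
    Xs = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    T = Xs @ model["P"]          # scores in the PCA subspace
    E = Xs - T @ model["P"].T    # residual matrix
    q = np.sum(E**2, axis=1)
    t2 = np.sum(T**2 / model["lam"], axis=1)
    return q, t2

# Synthetic reference data: three strongly correlated variables.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_ref = np.column_stack([t,
                         2 * t + 0.1 * rng.normal(size=200),
                         -t + 0.1 * rng.normal(size=200)])
model = fit_pca(X_ref, k=1)
q_ref, t2_ref = q_and_t2(model, X_ref)
# Empirical 99 percent limits (an assumption; see the lead-in).
q_lim, t2_lim = np.percentile(q_ref, 99), np.percentile(t2_ref, 99)

X_new = np.array([[0.5, 1.0, -0.5],    # follows the usual correlation
                  [1.0, -2.0, 1.0],    # breaks the correlation: large Q
                  [8.0, 16.0, -8.0]])  # extreme but consistent: large T^2
q_new, t2_new = q_and_t2(model, X_new)
is_outlier = (q_new > q_lim) | (t2_new > t2_lim)
print(is_outlier)
```

The second sample is caught by Q because it lies far from the PCA model's space, and the third by T² because it lies far from the mean within that space, which is why the two tests are used together.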
Selection of relevant input output variables
• Choosing relevant inputs is an important step in modelling the input–output relationship in a process model

Assemble data
• ANN modelling is frequently used in multivariate systems whose variables are sampled at different rates
• In many industrial chemical processes, product quality parameters are normally measured offline in a laboratory or by an online analyzer with a long dead time
• Input variables such as temperature and pressure are measured and recorded every second or minute
• As a result, the data must be aligned on a proper time scale
• It is critical that laboratory data be properly time stamped and aligned with other continuous data on a consistent time scale
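One possible way to carry out the alignment described above is to pair each time-stamped lab result with the average of the process readings recorded shortly before it. The sketch below assumes this pairing rule; the function name, the five-minute window, and the sample values are hypothetical, and a real application would also need to account for process and analyzer dead time.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from statistics import mean

def align_lab_to_process(lab, process, window=timedelta(minutes=5)):
    """Pair each lab result with the mean of the process readings
    recorded in the `window` ending at the lab sample time.

    lab, process: lists of (timestamp, value), sorted by timestamp.
    Returns a list of (timestamp, mean_process_value, lab_value).
    """
    times = [ts for ts, _ in process]
    rows = []
    for ts, lab_value in lab:
        lo = bisect_left(times, ts - window)
        hi = bisect_right(times, ts)
        if lo == hi:
            continue  # no process data in the window, so skip this lab sample
        rows.append((ts, mean(v for _, v in process[lo:hi]), lab_value))
    return rows

start = datetime(2024, 1, 1, 8, 0)
# Hypothetical temperature logged every minute: 100.0, 100.5, 101.0, ...
process = [(start + timedelta(minutes=i), 100.0 + 0.5 * i) for i in range(60)]
# Hypothetical lab results taken at minute 10 and minute 40.
lab = [(start + timedelta(minutes=10), 98.2),
       (start + timedelta(minutes=40), 97.6)]

rows = align_lab_to_process(lab, process)
for row in rows:
    print(row)
```

Averaging over a short window rather than taking a single reading reduces the effect of transmitter noise on the aligned input value.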
Selection, training, and validation of model parameters
• This is the most important step in creating an ANN model
• Because the model is at the heart of the ANN, selecting the right model is crucial to its success
• The model developer needs to supply the following parameters as user input during the ANN model building phase:
  • Number of nodes in the hidden layer
  • Type of activation function in the input layer and output layer
  • Algorithm for weight updating, etc.

Model selection
• Many options are available for model selection, but there are no clear-cut guidelines on which model to select under which conditions
• In most cases, the type of model is selected by the developer based on personal preference and expertise
• This can be very detrimental to developing a good model
• The best approach is to remain open-minded about all model types

Best practices for model selection
• Good practice is to start with a simple model type with a small number of nodes in the hidden layer and gradually increase model complexity as long as a significant improvement in the model's performance can be observed
• During the model building phase, the performance of each individual model can be judged on unseen validation data
• The same approach can also be applied to selecting the parameters of the pre-processing methods, for instance variable selection
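The "start simple and grow" practice above can be sketched with a small one-hidden-layer network written directly in NumPy. Everything here is an illustrative assumption: the tanh activation, plain gradient descent, the candidate node counts, the 5 percent improvement threshold, and the synthetic data. The point is only the selection loop, which stops adding hidden nodes once the validation error no longer improves significantly.

```python
import numpy as np

def train_mlp(X, y, n_hidden, epochs=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh network by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y
        # Gradients of half the mean squared error.
        dH = np.outer(err, W2) * (1.0 - H**2)
        gW2, gb2 = H.T @ err / n, err.mean()
        gW1, gb1 = X.T @ dH / n, dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

def select_hidden_nodes(X_tr, y_tr, X_val, y_val,
                        candidates=(1, 2, 4, 8), min_gain=0.05):
    """Grow the hidden layer while validation MSE improves by at least min_gain."""
    best_n, best_mse = None, np.inf
    for n_hidden in candidates:
        predict = train_mlp(X_tr, y_tr, n_hidden)
        mse = float(np.mean((predict(X_val) - y_val) ** 2))
        if best_n is not None and mse > best_mse * (1.0 - min_gain):
            break  # no significant improvement: keep the simpler model
        best_n, best_mse = n_hidden, mse
    return best_n, best_mse

# Synthetic one-input example, split into training and validation sets.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(120, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=120)
best_n, best_mse = select_hidden_nodes(X[:80], y[:80], X[80:], y[80:])
print(best_n, best_mse)
```

The same loop structure could be reused for other tuning decisions, such as the number of selected input variables, by replacing the list of candidates.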
Cross validation
• Normally, data-driven models need a large amount of data, which is usually available in modern industry
• However, in some instances where lab data is used, only a very small amount of data may be available
• For industrial processes where few reliable lab data are available, statistical error-estimation techniques such as K-fold cross-validation can be applied
• This method makes optimal use of the available data by partitioning it in such a way that all of the samples are used for validating the model's performance
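K-fold cross-validation as described above might be sketched as follows. The helper names and the 20-sample synthetic data set are assumptions, and an ordinary least-squares model is used as a stand-in for the ANN to keep the example short; any function that fits on a training subset and returns a predictor could be passed in instead.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) so that each sample is validated exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

def cross_val_mse(X, y, fit, k=5):
    """Average validation MSE over k folds; `fit` returns a predict function."""
    scores = []
    for train, val in k_fold_indices(len(y), k):
        predict = fit(X[train], y[train])
        scores.append(np.mean((predict(X[val]) - y[val]) ** 2))
    return float(np.mean(scores))

def fit_linear(X, y):
    """Stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef

# Twenty hypothetical lab samples with two input variables.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(20, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=20)
cv_mse = cross_val_mse(X, y, fit_linear, k=5)
print(cv_mse)
```

With five folds, each model is trained on 16 samples and validated on the remaining 4, so all 20 samples contribute to the error estimate.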
Model performance
• After finding the optimal model structure and training the model, the performance of the trained ANN model has to be judged once again on a new validation data set
• Mean squared error (MSE), which measures the average squared distance between the predicted and the correct values, is the most popular performance evaluation technique for models
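Written out, the mean squared error described above takes the following form. The symbols are chosen here for illustration: N is the number of validation samples, y_i the measured (correct) value, and ŷ_i the model prediction.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
```

Because the errors are squared, a few large prediction errors raise the MSE more than many small ones.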
Model performance
• Another way of judging performance is to use a visual representation of the predictions
• Among these, the four-plot analysis is a useful tool, since it provides information about the relation between the predictions and the correct values together with an analysis of the prediction residuals
• A disadvantage of the visual methods is that they require the assistance of the model developer, and the final decision on whether the model performs adequately is up to the developer's subjective judgement

Important criteria
• The developed model should be evaluated to check that it bears some resemblance to the underlying physics of the process
• Many model experts stress the necessity of applying process knowledge during the ANN model development phase

Model acceptance and model tuning
• After the ANN model is developed, it is tested in offline mode to see how its predictions match fresh data currently generated in the DCS
• If the model predictions closely match the actual output, the model is accepted in industry
• The usual criteria are that the average prediction error should be less than 1% and the R² value greater than 0.95
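The acceptance criteria above could be checked along the following lines. The slides do not define "average prediction error" precisely, so this sketch assumes it means the mean absolute percentage error; the function name and the sample values are likewise hypothetical.

```python
def acceptance_check(actual, predicted, max_avg_error_pct=1.0, min_r2=0.95):
    """Return (average error in percent, R^2, accepted flag).

    Assumes "average prediction error" means the mean absolute
    percentage error, and that no actual value is zero.
    """
    n = len(actual)
    avg_error_pct = 100.0 * sum(
        abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / n
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    accepted = avg_error_pct < max_avg_error_pct and r2 > min_r2
    return avg_error_pct, r2, accepted

# Hypothetical fresh plant data and the corresponding model predictions.
actual = [90.0, 92.0, 94.0, 96.0, 98.0]
predicted = [90.3, 91.8, 94.2, 95.9, 98.1]
avg_error_pct, r2, accepted = acceptance_check(actual, predicted)
print(avg_error_pct, r2, accepted)
```

For these values the average error is about 0.19 percent and R² is about 0.995, so the model would pass both criteria.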
Model performance deterioration
• It is very common in industry that the performance of an ANN model deteriorates over time
• There are many reasons:
  • The underlying process may change
  • Measuring transmitter data may drift
  • Analyzer readings may change due to recalibration, etc.
• All of these can cause the performance of the ANN model to deteriorate and have to be compensated for by adapting or re-developing the model
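One simple way to notice such deterioration is to track the recent prediction error against incoming lab values and raise a flag when it exceeds a limit. The class below is a hypothetical sketch of that idea; the window length, the 1 percent limit, and the sample values are assumptions, not figures from any particular plant.

```python
from collections import deque

class PerformanceMonitor:
    """Track the rolling average percentage error of a deployed model."""

    def __init__(self, window=30, max_avg_error_pct=1.0):
        self.errors = deque(maxlen=window)
        self.max_avg_error_pct = max_avg_error_pct

    def update(self, predicted, lab_value):
        """Record one comparison; return True when re-tuning looks necessary."""
        self.errors.append(100.0 * abs(predicted - lab_value) / abs(lab_value))
        window_full = len(self.errors) == self.errors.maxlen
        avg_error = sum(self.errors) / len(self.errors)
        return window_full and avg_error > self.max_avg_error_pct

monitor = PerformanceMonitor(window=5)
# Five good predictions followed by five drifted ones (hypothetical values).
predictions = [100.2] * 5 + [103.0] * 5
lab_values = [100.0] * 10
flags = [monitor.update(p, v) for p, v in zip(predictions, lab_values)]
print(flags)
```

Here the flag first turns on at the seventh sample, once two drifted predictions have pushed the five-sample average error above 1 percent. Waiting for a full window avoids reacting to a single bad lab value.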
Model tuning and model update
• The ANN model is to be maintained and tuned on a regular basis
• In the literature, researchers have tried various adaptive approaches to update the model based on its performance
• The neural model is updated every six months with fresh current data when it is found that the present model's prediction capability has deteriorated over time
• Most of these automatic model update methods are still limited to research publications, and very few are actually applied in industry
