Research and Application Workshop
AI4SE & SE4AI
NLP and Knowledge Engineering
to extract models from text
September 2022
The context
Models* provide a way to solve real-world problems safely and efficiently. They are an important
method of analysis that is easily verified, communicated, and understood. We use them when
conducting experiments on a real system is impossible or impractical, often because of cost or time
[AnyLogic]
Models are never as good as reality; they are only as good as their representation of the system they
model
They embed the knowledge we have of the system we want to represent
The more accurate and comprehensive the knowledge and its representation, the more accurate
=> useful the model is
This has merit in traditional modeling as well as in AI/ML models and in applications such as digital twins
*model is a physical, mathematical or logical representation of a system, entity, phenomenon, or process [SYS 611]
2
The study of Knowledge - Epistemology
Epistemological questions explore the nature of knowledge
Ask how someone has come to know something, inquire into the scope and
limits of knowledge or try to discover the degree of certainty attached to
particular knowledge
E.g.: the stick that appears to bend in the water
We use the knowledge of science to rationalize that the stick is not bent: it is the
refraction of light in the water that makes it look that way
But the epistemologist might ask: how do you really know that the stick does not
actually bend in the water?
3
Why is epistemology relevant for AI?
We cannot represent what we don’t know
If we don’t/cannot fully “know”, we should have an approximation of the knowledge,
supported by theoretical and empirical evidence, ideally knowing its limitations
Epistemology studies knowledge and advocates models to represent it
We should use epistemological models as input for the mathematical models used to
write the code for our AI systems
Without a solid framework for representing knowledge, we may face the risk of a new “AI
winter”
4
Philosophy and AI/ML
Philosophy → AI/ML

Rationalists
- Believe that knowledge comes from exercising the human ability to reason. Reason
not only enables people to know things that the senses do not reveal, it is also the
primary source of knowledge. Plato and Descartes were rationalists

Symbolic Reasoning – traditional AI
- Using preset symbolic structures to get knowledge about a given problem.
Taxonomies, ontologies, and rules (IF/THEN) are examples

Empiricists
- Believe that knowledge comes from experience. This is evidence provided by the
senses

Data Driven – Machine Learning
- Applying algorithms to large collections of data “describing” the reality to be
represented. This is in line with advanced statistical models, centered on pattern
recognition
5
Natural Language as source of Data
85-90 percent of all corporate data is in some kind of unstructured form, such
as text and multimedia [Gartner, 2019]
Tapping into these information sources is necessary to stay competitive
Source: m-files.com
Text conveys a great portion of the knowledge people have about a given
domain
6
Implementing NLP/NLU
Language is changing constantly, and NLP is following the changes, going from processing
based on predefined structures (taxonomies/ontologies, syntax) to structures deduced
from the text itself

Traditional deductive “symbolic” approach – limitations:
- Today, language is more fragmented, has less structure, and has more jargon
- Different points of view may provide different interpretations

Machine Learning/inductive approach:
- Extracting a numerical structure from text
- Different structures for different points of view
- Different structures automatically extracted over time
7
Testing the two approaches
In order to compare the two approaches, we defined two tasks:
- Named Entity Recognition (NER)
- Semantic Role Labeling (SRL), using SPO semantic triples (subject, predicate, object)
We selected these two tasks because they are essential building blocks of most models
To make the comparison more accurate, we are working on two types of documents:
- General purpose
- Domain specific
For each of the two types, we are analyzing a longer and a shorter document
Documents Total #sentences Total #words
Long_generic (HP) 6480 77290
Short_generic (NYtimes) 54 1121
Long_domain-specific (Neurology) 13719 235093
Short_domain-specific (Brain Inflammation) 712 8464
8
The tools
Symbolic approach
spaCy: to create customized training datasets for the data-driven approach
coreNLP: to extract NEs and SPO triples
NLTK: we use nltk.wordnet to capture the taxonomy structure (hypernym-hyponym) of
extracted named entities and SPO semantic triples
Data-driven approach
XLNet: this is a Transformer-based generalized NLP tool (Generalized Autoregressive
Pretraining for Language Understanding)
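As an illustrative sketch of the taxonomy-capture step above, the following uses a small hand-coded hypernym map in place of nltk.wordnet; the map, the terms, and the `hypernym_chain` helper are all invented for the example, not part of the project's actual code.

```python
# Toy sketch: organizing extracted named entities under hypernym chains.
# The map below is hand-coded for illustration; the deck's pipeline uses
# nltk.wordnet for this step.

HYPERNYMS = {
    "neuron": "cell",
    "cell": "organism_part",
    "astrocyte": "cell",
    "cortex": "brain_region",
    "brain_region": "organism_part",
}

def hypernym_chain(term, hypernyms):
    """Walk up the taxonomy from a term to its most general ancestor."""
    chain = [term]
    while term in hypernyms:
        term = hypernyms[term]
        chain.append(term)
    return chain

# Extracted entities grouped under their taxonomy chains
entities = ["neuron", "astrocyte", "cortex"]
chains = {e: hypernym_chain(e, HYPERNYMS) for e in entities}
```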
9
Preliminary results
Evaluating semantic accuracy in text requires human involvement; automatic evaluation would
be affected by the bias in the annotated text used for the evaluation
So far, we have run NER with both approaches and SRL with the symbolic one only
The data-driven approach has very good results on large generic datasets and poor results on
domain-specific ones, smaller ones in particular. This is because those models use large
generic texts as the semantic base for pre-training. It could be possible to use customized
semantic bases, but they would have to be sizable, and pre-training on reasonably large
datasets can take weeks of computing time
The symbolic approach is only as good as the “symbols” we use. We used general ones, with
good results on the large generic documents (but worse than the data-driven approach) and
good results on small generic documents (better than the data-driven approach). The results
on the domain-specific documents are inconclusive
10
Results interpretation
The data-driven approach uses a “mechanical” approach to semantics that does not reflect the
way we develop and use our knowledge. The underlying theoretical method is correlation/pattern
recognition, but we reason in more complex ways. The complexity of the algorithms inside those
models makes understanding what is going on inside practically impossible: the vast number of
layers in the neural networks would require keeping a memory of the status of each layer, and
this is not there. In theory, this approach could be totally unsupervised (with pre-definable bias),
but the cost of pre-training makes this option inapplicable
The symbolic approach is only as good as the symbols (taxonomies, rules, meta-languages) that
are used. Symbols are domain-specific and may change over time. This is a fully supervised approach
What is missing is a model representing the knowledge, able to use these algorithms and
approaches as its components
11
Moving forward
While we complete the comparison of the two traditional approaches (data-driven and symbolic),
we will introduce a method based on a knowledge representation model we developed (the “room
theory”) that uses graph theory
The “room theory” is a framework to address the relativity of the point of view by providing a
computational representation of the context
The non-computational theory was first released as “schema theory” by Sir Frederic Bartlett (1886–
1969) and revised for AI applications as “framework theory” by Marvin Minsky (mid-1970s)
For instance, when we enter a physical room, we instantly know whether it is a bedroom, a
bathroom, or a living room
Rooms/schemata/frameworks are mental frameworks we use to organize remembered information;
they represent an individual’s or domain-specific view of reality
12
How the “room theory” works
[Diagram: (1) the “Room”, a domain-specific knowledge base; (2) a list of n-grams to analyze;
(3) the “Benchmarks”, keywords defining the target elements. The list is compared with the
benchmarks, producing the proximity of each element in the list to the keywords]
“Room theory” enables the use of context-subjectivity in the analysis of the incoming documents
Context-subjectivity can be the point of view of a subject matter expert
The context-subjectivity in the analysis is represented by a domain-specific numerical
knowledge base, created from a large, representative domain-specific corpus that is then
transformed into a numerical dataset (the “embeddings table”)
The key components are:
1. A point of view for the comparison (the “room”). This is represented by the embeddings table extracted
from a large/representative corpus from the specific domain
2. A list of “extended” keywords (using synonyms and misspellings) to be used for the analysis (the
”benchmark”)
13
Our Approach – putting things together
[Diagram: two flows. Top flow: corpus/body of knowledge of the industry → cleaning and
n-gramming → the “room”, representing the knowledge of the domain, used together with the
“benchmarks” (keywords defining the domain) to compare keywords with nodes. Bottom flow:
system documentation → cleaning and n-gramming → vectorized documentation → generating
the semantic network → generating individual clusters/topics and causal chains → generating
the global causal chain]
The flow on top provides adjustments based on domain-specific knowledge
The flow at the bottom is the actual workflow to get NERs (from the clusters) and SRLs (from
the network)
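A minimal sketch of the bottom flow, with sentence-level co-occurrence standing in for the actual vectorization and connected components standing in for the actual clustering; both simplifications, the sample sentences, and the function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Toy sketch: build a semantic network from n-gram co-occurrence within
# sentences, then read clusters/topics off the network as connected
# components. Tokenization and linking rules are illustrative only.

sentences = [
    ["battery", "charges", "phone"],
    ["phone", "uses", "antenna"],
    ["neuron", "fires", "signal"],
]

def build_network(sents):
    """Edge between two terms if they co-occur in at least one sentence."""
    graph = defaultdict(set)
    for s in sents:
        for a, b in combinations(set(s), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def clusters(graph):
    """Connected components of the network, as sorted term lists."""
    seen, comps = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, stack = [], [node]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.append(n)
            stack.extend(graph[n] - seen)
        comps.append(sorted(comp))
    return sorted(comps)

net = build_network(sentences)
topics = clusters(net)  # NERs would be read from these clusters
```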
14
Thank you!
Dr. Carlo Lipizzi
[email protected]
Shiyu Yuan – PhD Candidate
[email protected]
15
Our Approach – putting things together
We prune the list of n-grams using the room theory
We create ego networks for the “subjects”. The number of degrees of separation is a function
of the size of the cluster
The ego networks represent the semantic dependency between the nodes within the topics
The approach can be extended to inter-cluster relations to recreate the complete formal
representation
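A minimal sketch of the ego-network step, assuming a toy adjacency map; in practice the degrees of separation k would be derived from the cluster size, and the graph and names below are invented for the example.

```python
from collections import deque

# Toy sketch of an ego network: all nodes within k degrees of separation
# of a "subject" node. The adjacency map is illustrative only.

GRAPH = {
    "phone":   {"battery", "antenna"},
    "battery": {"phone", "lithium"},
    "antenna": {"phone"},
    "lithium": {"battery"},
    "neuron":  {"signal"},
    "signal":  {"neuron"},
}

def ego_network(graph, subject, k):
    """Breadth-first expansion from the subject, to at most k degrees."""
    ego, frontier = {subject}, deque([(subject, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past k degrees of separation
        for neighbor in graph[node]:
            if neighbor not in ego:
                ego.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return ego
```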
16
How we use it so far
We used it to determine the causal chain in the domain of technologies
Each technology has “components”, which are other technologies required for the
first one. For example, cell phones <- batteries, displays, antennas, …
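A minimal sketch of such a component chain, using an invented dependency map; the data and the `causal_chain` helper are illustrative, not the project's actual model.

```python
# Toy sketch of a technology causal chain: each technology maps to the
# component technologies it requires. The dependency data is invented.

COMPONENTS = {
    "cell_phone": ["battery", "display", "antenna"],
    "battery": ["lithium_cell"],
    "display": ["led"],
}

def causal_chain(tech, components, seen=None):
    """All technologies (direct and transitive) that `tech` depends on."""
    if seen is None:
        seen = set()
    for part in components.get(tech, []):
        if part not in seen:
            seen.add(part)
            causal_chain(part, components, seen)  # expand sub-components
    return seen
```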
The model has been partially implemented in WRT-1010 “Meshing Capability and Threat-based
Science & Technology Resource Allocation”
17