Web Mining


Deeper Dive into Purpose-Built Search: A Bullet Point Journey
Core Concept

Tailored information retrieval systems designed for specific domains or user needs, offering superior relevance and efficiency compared to general-purpose search.
Key Benefits:

 Domain Expertise: Deep understanding of language, data structures, and search intent within a specific domain.
 Targeted Functionalities: Specialized features and operators tailored to the domain (e.g., legal citation search, product filtering).
 Streamlined Efficiency: Faster and more accurate results, saving time and effort.
Diverse Applications:

 E-commerce: Advanced product comparisons based on specific criteria.
 Legal Research: Efficient navigation of databases with specialized search operators.
 Enterprise Search: Role-specific search for internal documents and resources.
 Media & Entertainment: Granular search by genre, cast, release date, etc.
 Scientific Exploration: Domain-specific ranking algorithms for relevant research papers.
 Healthcare: Search medical databases based on symptoms, diagnoses, and medications.
 Education: Curated search experiences for students and educators across disciplines.
Technical Underpinnings:

 Advanced Indexing & Processing: Algorithms optimize data for specific domain searches.
 Specialized Query Understanding: Intent analysis tailored to the domain vocabulary and patterns.
 Domain-Specific Ranking: Prioritizes results based on relevance and search context within the domain (see the sketch below).
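
To make the ranking point concrete, here is a minimal Python sketch of a domain-specific re-ranker for scientific search. The field names (text_match, citations, year) and the weights are illustrative assumptions, not a description of any particular engine.

    # Minimal sketch: re-rank scientific search results so recent, highly cited
    # papers with strong text matches rise to the top. Fields/weights are illustrative.
    def rank_papers(results):
        def score(doc):
            recency = max(0, doc["year"] - 2000) / 25       # favour newer publications
            citations = min(doc["citations"], 1000) / 1000  # cap citation influence
            return 0.6 * doc["text_match"] + 0.25 * citations + 0.15 * recency
        return sorted(results, key=score, reverse=True)

    papers = [
        {"title": "Classic 1998 paper", "text_match": 0.90, "citations": 900, "year": 1998},
        {"title": "Recent 2023 survey", "text_match": 0.70, "citations": 150, "year": 2023},
    ]
    for paper in rank_papers(papers):
        print(paper["title"])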
Emerging Trends:

 AI-Powered Insights: Extracting deeper connections and patterns from search results.
 Cross-Domain Integration: Seamlessly search across specialized tools for broader exploration.
 Personalization & Adaptability: Intuitive interfaces learning from user habits and preferences.
Future Implications:

 Democratization of information access across various domains.
 Increased productivity and efficiency in knowledge-driven tasks.
 Personalized learning experiences and deeper understanding of complex topics.
Controlled Queries vs. Uncontrolled Queries in Web Mining
Concept

 Controlled queries: Formulated by the researcher with specific goals and requirements, often tailored to a particular domain or dataset. They leverage structured query languages (e.g., SQL, XPath) or web APIs to precisely retrieve relevant data (see the sketch after this list).
 Uncontrolled queries: Submitted by users (e.g., search keywords, reviews, forum posts) with varying levels of clarity, structure, and intent. They represent spontaneous information needs in diverse formats and require parsing, understanding, and interpretation.
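
A minimal Python sketch of the two query styles, using only the standard library; the hotel table, columns, and sample rows are illustrative assumptions.

    import sqlite3

    # Controlled query: precise retrieval against a known schema
    # (an in-memory table with illustrative columns keeps the sketch self-contained).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, city TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO hotels VALUES (?, ?, ?)",
        [("Airport Inn", "Lahore", 90.0), ("Grand Palace", "Lahore", 240.0)],
    )
    rows = conn.execute(
        "SELECT name, price FROM hotels WHERE city = ? AND price < ?",
        ("Lahore", 150),
    ).fetchall()
    print(rows)  # only the hotel matching the structured criteria

    # Uncontrolled query: free-form user text that must be parsed and interpreted
    # before it can drive any retrieval (here, a naive keyword split).
    user_query = "cheap hotel near the airport with free breakfast"
    keywords = [w for w in user_query.lower().split() if w not in {"the", "with", "near"}]
    print(keywords)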
Key Differences:
Relation to Web Mining:

 Controlled queries:
 Used to access well-organized data repositories (e.g., databases, websites with clean APIs)
 Support targeted extraction of specific data points for analysis or modeling
 Examples: Crawling product prices from e-commerce sites, extracting scientific literature through APIs
 Uncontrolled queries:
 Often require pre-processing, text analysis, and natural language processing (NLP) techniques
 Present challenges due to noise, subjectivity, and ambiguity
 Used for broader exploration, sentiment analysis, topic modeling, and understanding user behavior
 Examples: Analyzing customer reviews, mining social media trends, exploring unstructured knowledge bases (see the sentiment sketch after this list)
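
For the uncontrolled side, a minimal sentiment-analysis sketch over customer reviews using NLTK's VADER scorer; the reviews are made up, and the sketch assumes nltk is installed and can download the vader_lexicon resource.

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-off lexicon download
    sia = SentimentIntensityAnalyzer()

    reviews = [
        "The room was spotless and the staff were wonderful.",
        "Terrible wifi and the breakfast was cold.",
    ]
    for review in reviews:
        # compound score in [-1, 1]; noise and subjectivity make this approximate
        print(f"{sia.polarity_scores(review)['compound']:+.2f}  {review}")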
Considerations:

 Choice between controlled and uncontrolled queries depends on research objectives, data availability, and resource constraints.
 Both approaches can be valuable, and often they are combined for comprehensive web mining.
 Uncontrolled queries offer broader insights but necessitate deeper understanding and careful processing.
Web Mining Examples:

 Travel website data:
 Controlled queries could be used to extract hotel listings based on specific criteria (location, price, amenities).
 Uncontrolled queries could analyze visitor reviews to understand sentiment and identify areas for improvement.
 News analysis:
 Controlled queries could retrieve articles on specific topics from credible sources.
 Uncontrolled queries could explore broader social media discussions to uncover emerging trends and public opinion.
Future Directions:

 Integration of semantic web technologies and advanced NLP techniques to better understand unstructured data.
 Development of adaptive mining methods that can dynamically switch between controlled and uncontrolled queries based on context and needs.
 Enhanced use of explainable AI (XAI) to make query interpretation and analysis more transparent.
Understanding Word Embedding and Word2Vec for Efficient Language Processing

https://www.youtube.com/watch?v=viZrOnJclY0

 Word embeddings and the Word2Vec model can be used to assign numerical representations to words based on their context, allowing for more efficient processing of language and understanding of word similarities.

 Key insights
• Word embeddings allow similar words to have similar numbers, making it easier to analyze and understand text data.
• Words with similar meanings and usage should be assigned similar numbers in word embedding to help neural networks learn more efficiently.
• Backpropagation is used to optimize the random initial values of the weights in a neural network, enabling the network to make accurate predictions.
• The word embedding model uses input words to predict the next word in a phrase, assigning higher values to the desired output word.
• Optimizing the weights of word embeddings can potentially improve the performance of natural language processing models by capturing semantic relationships between words.
• Using word embeddings can optimize the weights in a neural network, allowing it to learn how similar words are used and improve language processing.
• Word2Vec efficiently creates word embeddings by selectively optimizing weights for specific outputs, allowing multiple embedding values per word even for a large vocabulary (see the gensim sketch below).
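
A minimal gensim sketch of the ideas above, trained on a toy corpus; the sentences and hyperparameters are illustrative. The sg and negative parameters correspond to the skip-gram/continuous bag-of-words and negative-sampling strategies discussed in the Q&A that follows.

    from gensim.models import Word2Vec

    sentences = [
        ["the", "king", "rules", "the", "kingdom"],
        ["the", "queen", "rules", "the", "kingdom"],
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat", "on", "the", "rug"],
    ]

    model = Word2Vec(
        sentences,
        vector_size=50,  # how many numbers represent each word
        window=2,        # context words considered on each side
        min_count=1,     # keep every word in this tiny corpus
        sg=1,            # 1 = skip-gram, 0 = continuous bag-of-words
        negative=5,      # negative sampling: update only a few output weights per step
        epochs=100,
    )

    print(model.wv["king"][:5])                  # part of the vector learned for "king"
    print(model.wv.similarity("king", "queen"))  # similar usage -> similar numbers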
Q&A

 What are word embeddings and Word2Vec?
 —Word embeddings and Word2Vec are methods used to convert words into numerical representations based on their context, making it easier to process language and understand word similarities in machine learning.
 How does a neural network determine word associations?
 —A simple neural network can determine the association between words and numbers based on their context in phrases, allowing for the prediction of the next word in a phrase.
 Why is training a neural network important for word embeddings?
 —Training a neural network is important for correctly predicting the next word in a phrase and adjusting word embeddings to make similar words more similar to each other based on their context.
 What strategies does Word2Vec use to increase context in word embeddings?
 —Word2Vec uses two strategies, continuous bag-of-words and skip-gram, to increase context in word embeddings by predicting surrounding words based on the middle word and vice versa.
 How does Word2Vec optimize training for word embeddings?
 —Word2Vec speeds up training by using negative sampling to optimize only for the words we want to predict, efficiently creating word embeddings by selecting a few words to predict and optimizing only a fraction of the total weights in the neural network.
Timestamped Summary


 00:00 Word embeddings and Word2Vec convert words into numbers, allowing similar words to have similar numerical representations for easier use in machine learning algorithms.
 02:38 Similar words should have similar numbers to help a neural network learn and apply knowledge, and a simple neural network can determine word-number associations based on context.
 04:54 We create a neural network with inputs for each unique word, connect them to activation functions, and optimize the weights through backpropagation to associate numbers with each word.
 06:20 Using word embeddings and the Word2Vec model, we can predict the next word in a phrase by training a neural network to assign values to input words, connect them to activation functions with weights, and run the outputs through the softmax function for classification (a minimal sketch follows this summary).
 08:18 Word embeddings are adjusted through backpropagation to


make words that appear in the same context more similar to each
other, and the neural network accurately predicts the next word
based on input.
 10:37 Training a neural network with Word2Vec can help process
language and understand how similar words are used by assigning
numbers to words based on their context.
 12:31 Word2Vec uses multiple activation functions and a large
vocabulary to efficiently create word embeddings by optimizing only
a fraction of the total weights in the neural network.
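
A minimal numpy sketch of the network described in the summary: one input per vocabulary word, a tiny embedding layer, a softmax over the vocabulary to predict the next word, and hand-written backpropagation. The toy corpus, embedding size, and learning rate are illustrative assumptions.

    import numpy as np

    corpus = ["embeddings", "make", "learning", "easier"]  # toy phrase
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    V, D = len(vocab), 2                          # vocabulary size, embedding size

    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(V, D))     # word embeddings (input weights)
    W_out = rng.normal(scale=0.1, size=(D, V))    # output weights before the softmax

    pairs = [(idx[corpus[i]], idx[corpus[i + 1]]) for i in range(len(corpus) - 1)]

    for _ in range(500):                          # backpropagation loop
        for x, y in pairs:
            h = W_in[x]                           # look up the input word's embedding
            logits = h @ W_out
            p = np.exp(logits - logits.max())
            p /= p.sum()                          # softmax over the vocabulary
            grad = p.copy()
            grad[y] -= 1.0                        # cross-entropy gradient at the output
            grad_h = W_out @ grad
            W_out -= 0.1 * np.outer(h, grad)
            W_in[x] -= 0.1 * grad_h               # the embedding itself is adjusted too

    x = idx["make"]
    p = np.exp(W_in[x] @ W_out)
    print(vocab[int(p.argmax())])                 # most likely next word after "make"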
GOOGLE BERT

 https://jalammar.github.io/illustrated-bert/
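
As a quick, hedged sketch (not taken from the linked post), contextual embeddings from a pre-trained BERT checkpoint can be obtained with the Hugging Face transformers library; the example sentence is arbitrary.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Purpose-built search relies on good embeddings.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One 768-dimensional contextual vector per token: (batch, tokens, hidden_size)
    print(outputs.last_hidden_state.shape)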
How to download pre-trained models and corpora

 https://radimrehurek.com/gensim/auto_examples/howtos/run_downloader_api.html
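
A minimal sketch of the gensim downloader API documented at the link above; "glove-wiki-gigaword-50" is one of the pre-trained vector sets it serves, and the first call downloads it.

    import gensim.downloader as api

    print(list(api.info()["models"])[:5])         # a few of the downloadable pre-trained models

    vectors = api.load("glove-wiki-gigaword-50")  # downloads on first use, then loads
    print(vectors.most_similar("computer", topn=3))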
Pre-trained corpus

 A pre-trained corpus is a massive collection of text data that has already been used to train a language model. Think of it like a vast library of books that a language model has already read and learned from. This "reading" process lets the model understand the nuances of language, like how words are used together, sentence structure, and different writing styles.
What's in it?

 A pre-trained corpus can contain diverse sources like books, articles, code, websites, and even social media conversations.
 The size can vary, with some corpora containing billions of words!
Why is it used?

 Training a language model from scratch requires immense computing power and data.
 Pre-trained corpora save time and resources by providing a foundation of knowledge.
 The model can then be fine-tuned on specific tasks like summarizing text, translating languages, or writing different kinds of creative content.
Benefits:

 Faster training of language models.
 Improved performance on various NLP tasks.
 Adaptability to diverse domains by fine-tuning.
Examples:

 Well-known pre-trained corpora include Wikipedia, BookCorpus, and Common Crawl.
 Specialized corpora exist for legal documents, medical texts, or scientific papers.
