NLP Module 6 Notes

Module 6 covers various applications of Natural Language Processing (NLP) including machine translation, information retrieval, question answering systems, sentiment analysis, text categorization, named entity recognition, and ethical considerations. Each application is defined, with examples provided, and challenges are discussed, particularly in relation to bias and fairness. Additionally, transfer learning is highlighted as a method for improving NLP tasks using pre-trained models.


Module 6: Applications of NLP

- Machine translation:

- Rule-based

- Statistical

- Neural approaches

- Information retrieval:

- Search engines

- Semantic search

- Ranking algorithms

- Question Answering (QA) systems:

- Open-domain QA

- Closed-domain QA

- Conversational QA

- Text processing applications:

- Categorization

- Summarization (extractive & abstractive)

- Sentiment and opinion analysis (aspect-based sentiment analysis, emotion recognition)

- Named Entity Recognition (NER) and entity linking

- Ethical considerations in NLP: Bias in language models, fairness, interpretability

1. Machine Translation (MT)

Definition:

Machine Translation (MT) is an NLP task that automatically converts text from one language to another. It goes
beyond word-for-word translation to preserve the meaning, tone, and context of the source language.

Types of MT Systems:

| Type | Description | Example |
|---|---|---|
| Rule-Based (RBMT) | Uses linguistic grammar rules and bilingual dictionaries. | SYSTRAN |
| Statistical (SMT) | Learns translation probabilities from large bilingual corpora (parallel texts). | IBM Translation Model |
| Neural (NMT) | Uses deep learning models (e.g., LSTMs, Transformers) to learn translation patterns contextually. | Google Translate, DeepL |

Process Flow (Flowchart):

Input Text → Tokenization → POS Tagging → Parsing → Semantic Analysis → Translation Generation → Target
Language Output
Example Input/Output:

Input (English): “How are you?”


Output (Hindi): “आप कैसे हैं ?”
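The rule-based row of the table above can be illustrated with a deliberately tiny sketch: a hypothetical bilingual lexicon plus one hand-written reordering rule (real RBMT systems like SYSTRAN use full grammars and much larger dictionaries).

```python
# Toy RBMT sketch: word lookup in a tiny hypothetical English-Hindi lexicon,
# plus one hand-coded reordering rule. Illustrative only, not a real system.
LEXICON = {
    "how": "कैसे",
    "are": "हैं",
    "you": "आप",
}

def toy_rbmt(sentence: str) -> str:
    words = [w.strip("?.!,").lower() for w in sentence.split()]
    translated = [LEXICON.get(w, w) for w in words]
    # Reordering rule: Hindi places the subject first and the verb last,
    # so "how are you" becomes "you how are" before lookup.
    if words == ["how", "are", "you"]:
        translated = [LEXICON["you"], LEXICON["how"], LEXICON["are"]]
    suffix = " ?" if sentence.rstrip().endswith("?") else ""
    return " ".join(translated) + suffix

print(toy_rbmt("How are you?"))  # आप कैसे हैं ?
```

The hand-written reordering rule is exactly why RBMT does not scale: every structural difference between the two languages needs its own rule.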

Key Challenges:

• Ambiguity: Multiple meanings for the same word.

• Idioms: “Break a leg” → “Good luck” (not literal).

• Cultural Nuances: Context lost due to cultural references.

• Context Sensitivity: Same word changes meaning with context.

Applications:

• Website localization (e.g., multilingual websites).

• Document translation (technical, legal).

• Real-time translation (Google Translate).

• Language learning and accessibility tools.

2. Information Retrieval (IR)

Definition:

Information Retrieval is the process of fetching relevant documents from a large collection (corpus) based on a user
query. It powers search engines like Google or Bing.

Core Process (Flowchart):

User Query → Preprocessing (Tokenization, Normalization) → Term Weighting (TF-IDF) → Document Matching → Ranking → Top-k Results Displayed

Approaches to Matching:

1. Direct Match: Exact string match (inefficient).

2. Regex Matching: Uses patterns for flexible search.

3. Fuzzy Matching: Allows minor spelling variations.

4. Distance-based: Hamming/Levenshtein distances.

5. TF-IDF: Weighted word frequency.

6. Embedding Similarity: Uses word vectors and cosine similarity.
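Distance-based matching (approach 4) can be sketched with a standard Levenshtein implementation; fuzzy matching (approach 3) is then just accepting candidates whose distance falls under a small threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn string a into string b."""
    # prev[j] holds the distance between a[:i-1] and b[:j] for the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# A transposed pair counts as two edits under plain Levenshtein:
print(levenshtein("retrieval", "retreival"))  # 2
```

A fuzzy matcher would accept "retreival" as a hit for the query "retrieval" because the distance (2) is within a typical typo tolerance.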

Ranking Techniques:

• Pointwise: Regression-based ranking using relevance score.


• Pairwise: Compares document pairs (RankNet, LambdaRank).

• Listwise: Optimizes ranking metrics like NDCG (Normalized Discounted Cumulative Gain).
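The listwise metric NDCG mentioned above is simple to compute directly: each result's graded relevance is discounted by the log of its rank, and the sum is normalized against the best possible ordering.

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: graded relevance discounted by
    log2 of the (1-indexed) rank position."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering,
    so a perfect ranking scores exactly 1.0."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that places an irrelevant document at position 2 is penalized:
print(ndcg([1, 0, 1]))   # < 1.0
print(ndcg([3, 2, 1]))   # 1.0 (already ideal)
```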

Example Input/Output:

Input Query: “Best NLP research papers 2024”


Output: Ranked list of documents based on relevance (via cosine similarity or TF-IDF).
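A minimal version of this query-to-ranked-list step can be written from scratch: build TF-IDF vectors over a toy corpus (the three documents below are made up for illustration), then sort documents by cosine similarity to the query.

```python
import math
from collections import Counter

# Hypothetical mini-corpus for illustration.
docs = {
    "d1": "nlp research papers on transformers",
    "d2": "best restaurants in town",
    "d3": "survey of nlp papers and benchmarks",
}

def tfidf_vectors(corpus):
    """Build a TF-IDF weight vector per document (raw TF, smoothed IDF)."""
    tokenized = {d: text.split() for d, text in corpus.items()}
    n = len(corpus)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1 for t in df}  # +1 keeps common terms nonzero
    vecs = {d: {t: tf * idf[t] for t, tf in Counter(toks).items()}
            for d, toks in tokenized.items()}
    return vecs, idf

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs, idf = tfidf_vectors(docs)
query = {t: idf.get(t, 0.0) for t in "best nlp research papers".split()}
ranked = sorted(docs, key=lambda d: cosine(query, vecs[d]), reverse=True)
print(ranked)  # d1 ranks first: it shares the most weighted terms with the query
```

Production engines use the same idea at scale, with inverted indexes and learned ranking models on top.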

Applications:

• Search engines (Google, Bing)

• Job search tools (LinkedIn)

• E-commerce recommendations

• Research databases (Google Scholar)

3. Question Answering (QA) Systems

Definition:

QA systems allow computers to answer human questions directly by understanding the query and extracting or
generating precise answers.

Types:

| Type | Description |
|---|---|
| Open-domain | General knowledge questions. Example: "Who is the Prime Minister of India?" |
| Closed-domain | Domain-specific (medical, education). |
| Factoid | Short factual answers. |
| Non-factoid | Long explanatory answers. |

Process Flow:

User Question → Natural Language Understanding → Information Retrieval → Answer Extraction → Response Generation

Example Input/Output:

Input: “Who invented Python?”


Output: “Guido van Rossum in 1991.”
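The retrieval and extraction steps of that flow can be sketched with pure keyword overlap: score each candidate sentence in a (hypothetical) mini-corpus by how many question keywords it contains, and return the best one. Real QA systems replace this overlap score with neural reading comprehension.

```python
# Minimal retrieval-based QA sketch over a made-up three-sentence corpus.
CORPUS = [
    "Guido van Rossum created Python in 1991.",
    "Dennis Ritchie created C at Bell Labs.",
    "Tim Berners-Lee invented the World Wide Web.",
]

STOPWORDS = {"who", "what", "when", "the", "is", "a", "in", "of"}

def answer(question: str) -> str:
    # Keep only content words from the question.
    keywords = {w.strip("?.").lower() for w in question.split()} - STOPWORDS

    def overlap(sentence: str) -> int:
        return len(keywords & {w.strip(".").lower() for w in sentence.split()})

    # Return the corpus sentence sharing the most keywords with the question.
    return max(CORPUS, key=overlap)

print(answer("Who created Python?"))
# Guido van Rossum created Python in 1991.
```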

Applications:

• Search engine featured snippets.

• Chatbots and voice assistants (Alexa, Siri).


• Customer support automation.

• Educational tutoring systems.

4. Sentiment and Opinion Analysis

Definition:

Sentiment Analysis (or Opinion Mining) determines whether the emotional tone in text is positive, negative, or
neutral.

Levels of Analysis:

1. Document-level: Overall emotion of the document.

2. Sentence-level: Sentiment for each sentence.

3. Aspect-based: Opinion on specific attributes (e.g., “battery life poor, camera great”).

Approaches:

• Rule-based: Uses sentiment lexicons.

• Machine Learning-based: Trained models (Naive Bayes, SVM).

• Deep Learning-based: LSTM, BERT, Transformer models.

Example Input/Output:

Input: “The product quality is amazing but the delivery was slow.”
Output:

• Product quality → Positive

• Delivery → Negative
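The rule-based approach applied to exactly this example can be sketched with a tiny sentiment lexicon and a clause-splitting heuristic. Both the lexicon and the aspect heuristic below are illustrative stand-ins for real resources such as VADER or SentiWordNet.

```python
# Toy aspect-based sentiment: split on contrastive conjunctions so each
# clause carries one opinion, then score each clause against a tiny lexicon.
LEXICON = {"amazing": 1, "great": 1, "good": 1, "slow": -1, "poor": -1, "bad": -1}

def aspect_sentiment(review: str) -> dict:
    results = {}
    for clause in review.lower().replace(",", " but ").split(" but "):
        words = clause.strip(". ").split()
        score = sum(LEXICON.get(w, 0) for w in words)
        # Crude aspect heuristic: first content word that is neither a
        # sentiment word nor a function word.
        aspect = next((w for w in words
                       if w not in LEXICON
                       and w not in {"the", "is", "was", "a", "very"}), None)
        if aspect is not None and score != 0:
            results[aspect] = "Positive" if score > 0 else "Negative"
    return results

print(aspect_sentiment("The product quality is amazing but the delivery was slow."))
# {'product': 'Positive', 'delivery': 'Negative'}
```

Deep-learning approaches (LSTM, BERT) learn these clause and aspect boundaries from data instead of hand-written rules.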

Applications:

• Customer feedback monitoring

• Brand reputation tracking

• Market trend analysis

• Healthcare emotion detection

5. Text Categorization

Definition:

Text categorization (or text classification) is assigning predefined labels to text based on its content.
Process Flow:

Input Text → Preprocessing (Tokenization, Stopword Removal) → Feature Extraction (TF-IDF/Embeddings) → Classification (Naive Bayes, SVM, BERT) → Output Category

Example Input/Output:

Input: “Stock prices are falling rapidly.”


Output: Category = “Finance News”
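A compact Naive Bayes classifier shows the whole flow end to end. The four training sentences below are made up for illustration; a real classifier needs far more data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical two-class training set.
TRAIN = [
    ("stock prices fell sharply today", "finance"),
    ("markets rally as stock indexes climb", "finance"),
    ("team wins the championship game", "sports"),
    ("player scores in the final game", "sports"),
]

def train_nb(data):
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in data:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label, count in class_counts.items():
        lp = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            # Laplace smoothing so unseen words don't zero out a class.
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(TRAIN)
print(classify("Stock prices are falling rapidly", *model))  # finance
```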

Applications:

• News categorization

• Spam email detection

• Topic classification

• Sentiment tagging on social media posts

6. Named Entity Recognition (NER) and Entity Linking

Definition:

NER identifies named entities such as persons, organizations, locations, dates, etc.
Entity Linking connects these entities to structured databases (e.g., Wikipedia, DBpedia).

Example Input/Output:

Input: “Elon Musk is the CEO of Tesla.”


Output:

• Elon Musk → Person

• Tesla → Organization

Entity Linking: Tesla → “Tesla, Inc.” (Wikipedia link)
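The simplest form of NER plus entity linking is a gazetteer lookup, sketched below with a hand-built two-entry table. Real systems use statistical taggers (CRF, BERT) and large knowledge bases such as Wikidata; the hard part they solve, disambiguation, is exactly what a lookup table cannot do.

```python
# Gazetteer-based NER + linking sketch. Entity names and their knowledge-base
# targets are hard-coded here for illustration.
GAZETTEER = {
    "Elon Musk": ("Person", "https://en.wikipedia.org/wiki/Elon_Musk"),
    "Tesla": ("Organization", "https://en.wikipedia.org/wiki/Tesla,_Inc."),
}

def tag_entities(text: str):
    found = []
    # Match longer names first so multi-word entities win over substrings.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        if name in text:
            etype, link = GAZETTEER[name]
            found.append((name, etype, link))
    return found

for name, etype, link in tag_entities("Elon Musk is the CEO of Tesla."):
    print(f"{name} -> {etype} ({link})")
```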

Challenges in NER:

• Ambiguity (e.g., “Apple” = fruit or company)

• Multilingual data

• Inconsistent capitalization

• Data bias or lack of representation

Applications:

• Information extraction

• News summarization

• Knowledge graph building


• Chatbots and Q&A systems

7. Ethical Considerations in NLP

Definition:

Ethics in NLP refers to ensuring fairness, transparency, and privacy in NLP systems and datasets.

Major Ethical Issues:

| Issue | Description |
|---|---|
| Bias and Fairness | Models inherit biases from unbalanced training data. |
| Privacy | Personal or sensitive data leakage during text processing. |
| Transparency | Lack of explainability in model decisions. |
| Misinformation | Automated text generation can spread false information. |

Example:

If a dataset overrepresents one gender in job roles, a resume-screening NLP model may show gender bias.
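One standard check for this kind of bias is to compare selection rates across groups (the demographic-parity gap). The decisions below are fabricated for illustration.

```python
from collections import Counter

# Hypothetical screening decisions: (group, selected).
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

def selection_rates(records):
    """Fraction of candidates selected, per group."""
    totals, selected = Counter(), Counter()
    for group, picked in records:
        totals[group] += 1
        selected[group] += picked  # True counts as 1
    return {g: selected[g] / totals[g] for g in totals}

rates = selection_rates(decisions)
print(rates)                                      # {'group_a': 0.75, 'group_b': 0.25}
# A large demographic-parity gap flags the model for investigation:
print(abs(rates["group_a"] - rates["group_b"]))   # 0.5
```

A gap this large would trigger the mitigation steps listed below: rebalancing the data, applying fairness-aware training, and auditing the model's decisions.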

Mitigation Strategies:

1. Use diverse and representative datasets.

2. Apply fairness-aware algorithms.

3. Maintain explainability (model transparency).

4. Implement user feedback loops.

5. Follow ethical guidelines and legal frameworks.

8. Transfer Learning in NLP

Definition:

Transfer Learning is the process of using a pre-trained NLP model (like BERT, GPT) for a new task with limited data. It
transfers knowledge learned from one domain to another.

Process:

Pre-trained Model (on large corpus) → Fine-tuning (on specific task) → Task-specific Output

Example Input/Output:

Input: Pre-trained BERT on Wikipedia → Fine-tune for Sentiment Analysis


Output: Classifies sentences as Positive/Negative with high accuracy.
Advantages:

• Reduces training time.

• Requires less labeled data.

• Provides high accuracy with fewer resources.

Applications:

• Text classification

• Sentiment analysis

• Named entity recognition

• Question answering

Summary Table: Applications Overview

| Application | Goal | Techniques Used | Example Tool/Model |
|---|---|---|---|
| Machine Translation | Translate text between languages | Neural MT, attention mechanism | Google Translate |
| Information Retrieval | Fetch relevant documents | TF-IDF, ranking, cosine similarity | Google Search |
| Sentiment Analysis | Detect emotion/opinion | SVM, LSTM, BERT | Tweepy sentiment classifier |
| Text Categorization | Classify text into topics | TF-IDF, Naive Bayes, BERT | Spam filters |
| NER | Identify named entities | CRF, spaCy, BERT | Chatbots |
| Transfer Learning | Use pre-trained models | Fine-tuning BERT/GPT | HuggingFace models |
| Ethics in NLP | Ensure fairness, privacy | Bias detection, explainability | Responsible AI frameworks |
