Key Notes for Natural Language Processing (NLP) Chapter
1. Introduction to NLP
• NLP enables computers to understand, interpret, and process human (natural) languages.
• It bridges the gap between human language and machine language.
• NLP is a subfield of AI that draws on Linguistics, Computer Science, and Information Engineering.
2. Applications of NLP
1. Automatic Summarization: Condenses long text into a shorter version that keeps the key points (e.g., tools like ChatGPT).
2. Text Classification: Categorizes data, e.g., email spam filtering.
3. Sentiment Analysis: Identifies opinions or sentiments from text (e.g., customer reviews).
4. Virtual Assistants: Tools like Siri and Alexa process and respond to voice commands.
5. Chatbots: Automated systems for text or voice interaction (e.g., customer support or therapy).
3. Challenges in NLP
• Syntax Issues: Understanding grammar and sentence structure.
• Semantics: Working out what a sentence means, which depends on context.
• Ambiguity: The same word can have several meanings (e.g., "bank" may be a riverbank or a financial institution).
• Perfect Syntax, No Meaning: A grammatically correct sentence may still make no logical sense (e.g., "Colorless green ideas sleep furiously").
4. Text Normalization
Steps to simplify text for machine processing:
1. Sentence Segmentation: Dividing text into sentences.
2. Tokenization: Splitting sentences into individual words or tokens.
3. Stopwords Removal: Removing common but uninformative words (e.g., "and," "the").
4. Lowercase Conversion: Ensures case consistency (e.g., "HELLO" → "hello").
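5. Stemming: Strips affixes to reduce words to a root form (e.g., "playing" → "play"). It is fast, but the result may not be a real word (e.g., "studies" → "studi").
6. Lemmatization: Converts words to their dictionary base form, which is always a meaningful word (e.g., "studies" → "study"). It is slower than stemming.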
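The normalization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the stopword list and the suffix rules in naive_stem are made up for the example, and real projects usually rely on a library for these steps. Lemmatization is left out because it needs a dictionary.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "in", "on", "of", "to"}

def segment_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Extract word tokens (letters, digits, apostrophes), dropping punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Return one list of normalized tokens per sentence."""
    result = []
    for sentence in segment_sentences(text):
        tokens = [t.lower() for t in tokenize(sentence)]     # lowercase
        tokens = [t for t in tokens if t not in STOPWORDS]   # drop stopwords
        result.append([naive_stem(t) for t in tokens])       # stem
    return result

print(normalize("The kids are playing. Aman studies HARD!"))
# [['kid', 'play'], ['aman', 'studi', 'hard']]
```

In this sketch lowercasing runs before stopword removal so that "The" matches the lowercase entry "the". Note that "studies" becomes "studi", which is not a real word: this is the typical weakness of stemming that lemmatization avoids.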
5. Bag of Words (BoW) Model
• A technique to convert text into numerical features for machine learning.
• Key Outputs:
o Vocabulary: Unique words in the corpus.
o Frequency: Count of each word's occurrences.
• Steps:
1. Normalize text.
2. Create a vocabulary.
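3. Generate a document vector table: one row per document, one column per vocabulary word, each cell holding that word's count in the document.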
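The BoW steps above can be sketched as follows. This is a minimal example that assumes the text is already normalized into lists of tokens; the function names and the three sample documents are made up for illustration.

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect the unique words across all documents, in first-seen order."""
    vocabulary = []
    seen = set()
    for tokens in documents:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary

def document_vectors(documents, vocabulary):
    """Build one count vector per document, aligned with the vocabulary."""
    counts_per_doc = [Counter(tokens) for tokens in documents]
    return [[counts[word] for word in vocabulary] for counts in counts_per_doc]

# Hypothetical corpus, already normalized into tokens.
docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "chased", "dog", "dog"],
]

vocab = build_vocabulary(docs)
table = document_vectors(docs, vocab)

print(vocab)  # ['cat', 'sat', 'mat', 'dog', 'log', 'chased']
for row in table:
    print(row)
# [1, 1, 1, 0, 0, 0]
# [0, 1, 0, 1, 1, 0]
# [1, 0, 0, 2, 0, 1]
```

Each row is the numerical feature vector for one document. Word order within a document is lost, which is why the model is called a "bag" of words.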
6. Types of Chatbots
1. Script Bots: Follow predefined rules or scripts (e.g., basic customer service bots).
2. Smart Bots: AI-powered, capable of dynamic interactions (e.g., Google Assistant).
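To make the difference concrete, a script bot can be sketched as a fixed table of keyword rules. The rules, replies, and fallback message below are hypothetical. A smart bot would replace this lookup with a trained model, which is beyond a short example.

```python
import re

# Hypothetical rule table: (trigger keywords, canned reply).
RULES = [
    ({"hello", "hi", "hey"}, "Hello! How can I help you?"),
    ({"hours", "open", "timing"}, "We are open 9 am to 5 pm, Monday to Friday."),
    ({"refund", "return"}, "You can request a refund within 30 days of purchase."),
]
FALLBACK = "Sorry, I did not understand that."

def reply(message):
    """Return the reply of the first rule whose keywords appear in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for keywords, response in RULES:
        if words & keywords:  # at least one keyword is present
            return response
    return FALLBACK

print(reply("Hi there"))
print(reply("What are your opening hours?"))
print(reply("Tell me a joke"))
```

The bot can only answer what its script anticipates: the last message matches no rule and falls through to the fallback reply.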
7. Human vs. Machine Language Processing
• Human Language: Intuitive, context-dependent, and ambiguous.
• Machine Language: Numeric and precise, but it fails or gives wrong results when the input is malformed or ambiguous.